Big Data Privacy Scenarios Elizabeth Bruce, Karen Sollins, Mona Vernon, and Danny Weitzner
Computer Science and Artificial Intelligence Laboratory Technical Report
MIT-CSAIL-TR
October 1, 2015
Big Data Privacy Scenarios
Elizabeth Bruce, Karen Sollins, Mona Vernon, and Danny Weitzner
Massachusetts Institute of Technology, Cambridge, MA USA
Big Data Privacy Scenarios
Big Data Privacy Working Group
September 2015
Big Data Privacy Working Group Chairs: Elizabeth Bruce (MIT), Karen Sollins (MIT), Mona Vernon (Thomson Reuters), Danny Weitzner (MIT)
Acknowledgements
We gratefully acknowledge the many contributors to this Scenario Working Document. This includes all of the Big Data Privacy Working Group leaders, team members, and guides for their thoughtful efforts. A special thank you to Dazza Greenwood of MIT Media Lab and Simon Thompson from BT for creating the original template for the scenario summaries.

Big Data Privacy Scenario Contributors/Teams: Micah Altman (MIT), Elizabeth Bruce (MIT), David Dietrich (EMC), John Ellenberger (SAP), Dazza Greenwood (MIT), Maritza Johnson (Facebook), Lalana Kagal (MIT), Jake Kendall (Gates Foundation), Cameron Kerry (MIT), Ilaria Liccardi (MIT), Yves-Alexandre de Montjoye (MIT), Una-May O'Reilly (MIT), Michael Power (Osgoode Hall Law School), Arnie Rosenthal (Mitre), Karen Sollins (MIT), Simon Thompson (BT), Mona Vernon (Thomson Reuters), Evelyne Viegas (Microsoft), and James Williams (Google/University of Toronto)

Big Data Privacy Working Group Editor: Barbara Mack (Pingry Hill Enterprises, Inc.)
Table of Contents
Executive Summary
  Use Case: Massive Open Online Courses (MOOCs) and Online Learning Environments (OLEs)
  Use Case: Research Infrastructure for Social Media
  Use Case: Data for Good: Public Good and Public Policy Research Using Sensor Data/Mobile Devices
  Other Use Cases
  Conclusions
1 Introduction
  1.1 Overarching Observations
  1.2 Stakeholders
  1.3 Open Questions and Issues
  1.4 Remainder of This Document
2 Privacy Issues for Data Collected from MOOCs and Online Learning Environments
  2.1 Abstract
  2.2 Detailed Narrative
  Privacy Impact Assessment: The Specific Context of Scenario
  Goals of OLEs
  Data
  Systems
  Risks
  Rules/Regulations
  Technologies
  Privacy Constraints
  Technology Informing and Supporting OLE Data Privacy and Confidentiality
  Policy
3 Research Infrastructure for Social Media
  Abstract
  Scenario Introduction
  Stakeholders and Interactions
  3.4 Systems
  Analyze the Scenario
  Innovation Ideas and Opportunities
  Notes on Scenario
  References
4 Data for Good: Public Good and Public Policy Research Using Sensor Data/Mobile Devices
  Abstract
  Scenario Development
  Operation of Scenarios
  Regulatory Environment
  Data Utility
  Privacy
  Critical Issues
  Promising Paths Forward
  References
5 Additional Use Cases
  Privacy in Aggregated Diverse Data Sets
  Creation, Management, Application and Auditing of Consent on Personal Data
  Consumer Privacy/Retail Marketing
  Genomics and Health
Conclusions
Appendix A: Privacy Scenario Template
Appendix B: Stakeholders
Appendix C: Stakeholder Data from MOOCs and Online Learning Environments (OLEs)
Executive Summary
Karen Sollins (MIT)

The MIT Big Data Privacy Working Group launched a series of workshops beginning in 2013 to explore the challenges of privacy in big data and possible technological solutions to elements of those challenges. As a successor to those workshops, the working group began to focus on a collection of real-world scenarios and use cases to illuminate the challenges more concretely.

The deeper question explored by this exercise is what is distinctive about privacy in the context of big data. Although privacy as a general issue in computing and communications remains a topic of significant attention and disagreement, in this effort we narrow our attention to the "Big Data" context, to understand more clearly the particular challenges and possible approaches that derive from the collection, pooling, and combination of vast amounts of data, specifically about people. This focus on people as the subjects of attention in the big data context is central to the definition of privacy, which itself focuses on control over data, information, and inferences about people, and over how they can or should be used, exposed, or otherwise made available.

We summarize here an initial list of privacy issues that derive specifically from the nature of big data. These derive from observations across the real-world scenarios and use cases explored in this project, as well as from wider reading and discussions.
- Scale: The sheer size of the data sets leads to challenges in creating, managing, and applying privacy policies.
- Diversity: Big Data collection, management, and use increasingly involve more, and more diverse, participants, each with its own agenda and objectives. By nature, these agendas and objectives are likely to conflict.
- Integration: Improved data management technologies (e.g., cloud services, data lakes, and so forth) make integration across data sets easier, creating new and often surprising opportunities for cross-product inferences, and with them new "information" about individuals and their behaviors.
- Impact on secondary participants: Because many pieces of information reflect not only the targeted subject but also secondary, often overlooked, participants, the inferences and resulting information will increasingly reflect other people who were not originally considered subjects of privacy concerns and approaches.
- Need for emergent policies for emergent information: As inferences over merged data sets occur, emergent information or understanding will arise. Although each individual data set may have existing privacy policies and enforcement mechanisms, it is not clear that it is possible to develop the requisite and appropriate emergent privacy policies, and appropriate enforcement of them, automatically.

The primary content of this report is a number of real-world scenarios, resulting from discussion and then subgroup efforts within the privacy working group. Each case was analyzed along a collection of axes: key stakeholders, data lifecycle, key systems, potential privacy risks, and existing best practices within the context of that scenario. The template was laid out initially by Dazza Greenwood of the MIT Media Lab and Simon Thompson of BT and can be found in Appendix A.
As a result of collating these scenarios, two kinds of points emerged across them. The first is a small set of common questions. The second is a list of categories of stakeholders. We summarize those here.

The key questions that arose are:
- What new or unique challenges emerge when it comes to managing privacy in the context of big data?
- How do we assess benefit vs. risk?
- How do we evaluate "harm"? Given that harm is subjective, difficult to quantify, and falls on a spectrum from inappropriate online advertisements, to discrimination in setting insurance rates, to life-or-death medical intervention, is it possible to evaluate harm uniformly, and if so, how would one do that?
- How can we establish and assess trust among the stakeholders? What mechanisms or models do we have for understanding trust?

A table of the categories of stakeholders derived from the scenarios can be found in Appendix B. In addition, Appendix C demonstrates an application of these stakeholder categories to the first use scenario, on MOOCs and OLEs.

The initial list of categories of stakeholders includes:
- Data subject(s)
- Decision-maker
- Data collector
- Data curator
- Data analyst
- Data platform provider
- Policy enforcer
- Auditor

Both of these sets of points are discussed in more detail in the companion technology-mapping document, and are provided here to identify cross-cutting observations from the various scenarios. Although the currently identified set of potential stakeholders is listed here, it is important to recognize that privacy is a much more complex problem that concerns more than the stakeholders alone.

The Working Group explored seven use cases. This report presents three in their complete forms in Sections 2 through 4; those three cases are described briefly below. In addition, Section 5 presents summaries of the additional four cases, which were studied in less detail.
Use Case: Massive Open Online Courses (MOOCs) and Online Learning Environments (OLEs)
Any online learning situation provides an opportunity to record all the activities of everyone involved in the teaching experience, primarily but not exclusively students and teaching staff. MOOCs, as a subset of online learning, take this to new scales and often to new levels of automation, and they expand the roles involved in the collection of, responsibility for, and use of the data that derive from those teaching experiences.
In focusing on privacy in this context, one is concentrating on questions of which behaviors and information about individuals may be exposed in ways that those individuals may find contradict their models of privacy. The challenges arise at least in part from the new opportunities that MOOCs provide to collect, merge, and reason over educational data at a scale and with an ease not previously possible. The data may now be used in novel ways and involve new stakeholders, including data curators, data platform providers, researchers, and those interested in novel approaches to pedagogy. The challenge is to achieve this in ways that respect the privacy of the individual student, perhaps the teaching staff, and possibly secondary people as well, such as parents and guardians, especially in the face of asymmetric power relationships. One aspect of the challenge is to understand the implications of privacy "violations" in this context. They may arise not only from the direct exposure of information about the individual that was neither intended nor desired, but also from more subtle concerns over discrimination, harassment, inaccessibility, or violation of other civil and human rights.
The contribution identifies a number of key insights into privacy challenges that arise in the MOOC and OLE arenas, including:
- The nature of the information being collected, including clickstreams, contributions to online discussions, forums, and questionnaires, as well as behaviors with respect to both accessing and submitting content (reading, watching online lectures or videos, attempts at doing homework, etc.);
- Tools and norms for expressing privacy policies, covering current and future use, aggregation, and integration with other data;
- The tussles in objectives among students; teaching staff; owners of the educational content; crowd or student contributors (through grading or social networking facilities) to the experience of other students; institutional hosts; educational systems (such as municipal school systems or state university systems); researchers and analysts; and service providers such as data curators and data storage and analysis services;
- The nature of the potential harms from privacy violations to the various stakeholders;
- Translation of the Family Educational Rights and Privacy Act (FERPA) into this increasingly rich, complex, growing, and evolving domain, in which educational data are collected, curated, collated, and perhaps integrated;
- The fact that this is previously uncharted territory, with social, legal, and moral challenges not yet clearly identified, which is also evolving due to increased technological capabilities, often independently of privacy objectives and interests.

Use Case: Research Infrastructure for Social Media
The behaviors of individuals and groups online can provide the basis for significantly deeper understanding and prediction of human behaviors and interests. The kinds of data that can be useful in gaining that increased "social" understanding range from the various contributions made by individuals, such as text, photos, various kinds of streaming media, and other information relating to the participants, to logged information such as clickstreams, frequency and other patterns of access, etc. At present, access to such social media information is largely restricted to in-house analysis by social media organizations.
The question explored by this group is whether and how one might provide a "privacy" framework for such information, giving the subjects opt-in control of which information about themselves can be made available for broader studies and wider availability of the information. The intention is that permission for use remains with the subjects, but giving them the opportunity to share enables richer and larger studies, with all the potential societal benefits that those studies might entail. The subject must be given control over both the granularity and the types of the data, including both static data, such as birth date, address, job history, and so forth, and dynamic data, such as ongoing posts in various media.

In terms of stakeholders, there are three key participants: 1) the subjects themselves; 2) the social media organizations, who will play the role of data collectors and often of data curators and data platform providers; and 3) the data analysts, who may also play the role of data curators if they provide added understanding (curation) over the data sets. There are two general approaches to making the data available. The first is to generate slices, on some regular basis, of the data that is to be exposed and deliver those to the analysts. The alternative is to retain all data on a controlled service with a clearly defined API, providing only constrained access to the data. The first gives the analyst more freedom to explore, but reduces the subject's ability to retain control, especially with respect to withdrawing from a study retroactively.

There are at least four contexts in which such a system must operate: legal, social, business, and technical. The challenge is that privacy must be respected in all of these domains simultaneously.
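The difference between the two general approaches described above, delivering periodic slices versus retaining the data behind a controlled API, can be illustrated with a minimal Python sketch. It assumes a toy in-memory store in which each subject carries a single consent flag; the class and method names (`SubjectRecord`, `ControlledStore`, `export_slice`, `count_where`) and the sample data are hypothetical and are not part of the study's actual infrastructure. It shows why a delivered slice cannot honor a later withdrawal, while a query answered through the API can.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectRecord:
    subject_id: str
    consented: bool
    data: dict


@dataclass
class ControlledStore:
    """Hypothetical controlled service that keeps all subject data in one place."""

    records: dict = field(default_factory=dict)

    def add(self, record: SubjectRecord) -> None:
        self.records[record.subject_id] = record

    def withdraw(self, subject_id: str) -> None:
        # The subject revokes consent; only data still held here is affected.
        self.records[subject_id].consented = False

    def export_slice(self) -> list:
        # Approach 1: copy the currently consented data and deliver it to the analyst.
        return [dict(r.data) for r in self.records.values() if r.consented]

    def count_where(self, key: str, value: object) -> int:
        # Approach 2: answer a constrained aggregate query, re-checking consent each time.
        return sum(
            1 for r in self.records.values() if r.consented and r.data.get(key) == value
        )


store = ControlledStore()
store.add(SubjectRecord("s1", True, {"city": "Copenhagen"}))
store.add(SubjectRecord("s2", True, {"city": "Copenhagen"}))

snapshot = store.export_slice()  # the analyst now holds an independent copy
store.withdraw("s1")             # the subject withdraws after the slice was delivered

print(len(snapshot))                            # 2: the copy still contains s1's data
print(store.count_where("city", "Copenhagen"))  # 1: the API honors the withdrawal
```

The design choice is visible in the last two lines: once a slice is exported, the copy is outside the subject's reach, whereas the API re-checks consent on every call at the cost of restricting what the analyst can ask.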
The study group identified a list of risks or challenges to privacy that must be considered in such a scenario, including:
- Unexpected inference resulting from the analysis;
- Unexpected harm due to modifications of the data platform, due to inferences, or due to the nature of the research itself;
- Unpredictable bias in the resulting research, based on bias in the self-selecting nature of participation;
- Unexpected correlation between the study subject population and the general population;
- Removal from studies after agreeing to participate;
- Control of downstream use of the data, beyond the original analyst agreement. This raises questions of provenance (who has touched the data and how they might have modified it), of how to enforce policies beyond the bounds of pairwise agreements, and of identification of and recourse for misuse, for starters;
- Responsibility for data breaches, both by the social media provider acting as repository and curator and by the researchers and analysts;
- Finding the balance between privacy and publication of results;
- Management of informed consents;
- Automation of as much of this as possible, while understanding the risks that may be introduced through such automation.

The study also identified some key technologies that exist and some places where technologies are needed but not yet available.

The scenario is based on a current collaborative study involving the Technical University of Denmark and the MIT Human Dynamics Laboratory.
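The provenance question raised under control of downstream use (who has touched the data and how they might have modified it) can be made concrete with a hash-chained, append-only log, one common technique for tamper-evident records. The following Python sketch is illustrative only: the report does not prescribe this mechanism, and `ProvenanceLog`, its methods, and the actor names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so that the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical append-only record of who touched a data set and how."""

    entries: list = field(default_factory=list)

    def record(self, actor: str, action: str, detail: str = "") -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "detail": detail, "prev": prev}
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        # Recompute the chain; altering an entry (or its stored hash) makes this fail.
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("actor", "action", "detail", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def actors(self) -> list:
        return [entry["actor"] for entry in self.entries]


log = ProvenanceLog()
log.record("social-media-provider", "export", "consented slice")
log.record("analyst-a", "transform", "aggregated posts by week")
log.record("analyst-b", "receive", "copy shared by analyst-a")
print(log.actors())  # ['social-media-provider', 'analyst-a', 'analyst-b']
print(log.verify())  # True
```

A hash chain only makes alterations to the log itself detectable. It cannot stop a recipient from copying the data without recording it, which is why enforcement beyond pairwise agreements remains an open problem in this scenario.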
Use Case: Data for Good: Public Good and Public Policy Research Using Sensor Data/Mobile Devices
The challenge faced in this scenario is to take advantage of mobile phone data (mobility data) with one of two possible objectives. The first is to model and predict outbreaks of epidemics, and the second is to enable micro-targeting of individuals or groups of people with interventions in order to reduce or prevent outbreaks of epidemics. The geographic region of focus in this work is Africa. Of particular interest are people moving between areas where an epidemic may be more prevalent and those where it may be less so.

In addition to the two kinds of objectives, the study examines two distinct system designs or implementations. In all cases, the original data is collected by the mobile network operators (MNOs). In one implementation, each MNO anonymizes and coarsens the data both spatially and temporally. Thus, for example, the time may be reported in 12-hour blocks representing day and night, and location may be represented as particular regions where malaria is or is not prevalent. The individuality of each record is retained. This enables the targeting of individuals through one of two means: the anonymized identifier is presented to the MNO, which in turn either provides access information to the analyst or acts as an intermediary conveying information between the analyst and the subject. In the other implementation design, data is merged on a regional basis before being aggregated; for example, the MNO might report that a specific percentage of the residents of one area spent a different specific percentage of nights in a different target area. This second design significantly increases the subject's privacy and reduces the possibility of re-identification or exposure, but it also reduces the accuracy and potential utility of the data.
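The two MNO-side designs can be sketched in Python to show what each one releases. Everything concrete here is an assumption for illustration: the cell-to-region lookup (`CELL_TO_REGION`), the day/night boundary at 06:00 and 18:00, the salted-hash pseudonym, the rule that a subject's home is the region where most of their nights were observed, and the sample records. The first function keeps one coarsened row per record with a stable pseudonymous identifier (the first design); the second releases only a regional percentage (the second design).

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical lookup from cell tower to a coarse region label.
CELL_TO_REGION = {"c1": "low-malaria", "c2": "low-malaria", "c3": "high-malaria"}


def is_night(hour: int) -> bool:
    # Two 12-hour blocks (assumed): day is 06:00-17:59, night is everything else.
    return not 6 <= hour < 18


def pseudonym(subject_id: str, salt: str) -> str:
    # Salted hash: hides the raw identifier but stays stable, so the MNO, which
    # holds the salt and its subscriber list, can recompute it to reach the subscriber.
    return hashlib.sha256((salt + subject_id).encode("utf-8")).hexdigest()[:12]


def coarsen(record: tuple, salt: str) -> tuple:
    """First design: one coarsened row per original record, individuality retained."""
    subject_id, day, hour, cell = record
    block = "night" if is_night(hour) else "day"
    return (pseudonym(subject_id, salt), day, block, CELL_TO_REGION[cell])


def regional_share(records, home: str, target: str, min_fraction: float) -> float:
    """Second design: share of `home` residents spending >= min_fraction of nights in `target`.

    Simplification: each night-time record counts as one observed night, and a
    subject's home is the region where most of their nights were observed.
    """
    nights = defaultdict(Counter)  # subject -> region -> observed nights
    for subject_id, _day, hour, cell in records:
        if is_night(hour):
            nights[subject_id][CELL_TO_REGION[cell]] += 1
    residents = [c for c in nights.values() if c.most_common(1)[0][0] == home]
    if not residents:
        return 0.0
    movers = sum(1 for c in residents if c[target] / sum(c.values()) >= min_fraction)
    return movers / len(residents)


records = [
    ("alice", 1, 22, "c1"), ("alice", 2, 23, "c1"), ("alice", 3, 21, "c3"),
    ("alice", 3, 10, "c3"),  # daytime record: ignored by the night-based summary
    ("bob", 1, 22, "c2"), ("bob", 2, 23, "c2"), ("bob", 3, 2, "c2"),
]
print(coarsen(records[0], salt="mno-secret")[1:])  # (1, 'night', 'low-malaria')
print(regional_share(records, "low-malaria", "high-malaria", 0.25))  # 0.5
```

Because the pseudonym in the first design is stable and the MNO can recompute it, re-identification remains possible by construction, which is what enables targeted intervention; the second design releases a single number per pair of regions, trading that capability for stronger privacy.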
This study identified a number of challenges:
- The scenario exposes a direct tradeoff between health risks (and possible mitigation) for the individual and personal privacy;
- The scenario also exposes a direct tradeoff between analysis capabilities and personal privacy;
- MNOs are generally not in the business of anonymizing, curating, and providing data to other entities. In these cases, the analyst role is often taken on by national health ministries;
- The legal bases for privacy in African countries are complex and generally rooted in the historical traditions of the countries that colonized them in previous centuries. Those in Western and Northern Africa mostly derive from the French civil code, have explicit privacy frameworks, and are closely related to the European privacy directive. Those, such as South Africa, that derive from the English common law tradition have much less concrete policies with respect to privacy. To add to this, as populations move from one country to another, they may also be moving from one privacy policy model to another.

The intention of this use-case study was to allow the group to elicit commonalities and distinctions among the cases that might allow us to generalize. That in turn has also provided the basis for a companion paper, which concentrates on current and near-term future tools to improve the possibility of providing privacy, while continuing to allow for Big Data analysis and the benefits that accrue from it.
Other Use Cases
The report concludes with a brief summary of the additional four use cases examined by the working group. These include privacy under conditions of integrating diverse data sets, the creation and management of user consent over exposure and use of personal data, consumer privacy and retail marketing, and genomics and health.

Conclusions
From these scenarios we draw three categories of conclusions. The first is a set of common overarching challenges. In order of increasing complexity, these are:
- Scale: The sheer size of both the data itself and the accompanying metadata that is necessary to manage it and provide privacy policies is increasing.
- Diversity: With growth, we also see an increase in the types of data, the interests of analysts or users of the data, and the richness of privacy policies in these new scenarios.
- Integration: There is increasing pressure and opportunity to merge or cross-fertilize among these diverse data sets. This leads to results that may previously have been inaccessible, but that are exposed through perhaps differing integrated observations of the individual.
- Secondary subjects: Although much data is based on primary subjects, it may also, perhaps inadvertently, reflect on secondary subjects. Handling privacy policies for this more integrated situation is significantly more complex than handling the policies applicable to a single subject.
- Emergent privacy policies: With both the integration of data sets and the increasing capture of data about secondary subjects, there is also a need for privacy policies that reflect this emergent data. The challenge of how these new policies come into existence will play an increasingly important role.
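The conflicting agendas described under Diversity, and the need to derive policies for merged data under Emergent privacy policies, can be illustrated by the simplest possible composition rule: a merged data set permits a use only if every input data set permits it, and it keeps the shortest retention period. This Python sketch is an assumption for illustration, not a method proposed in this report; `Policy`, `compose_most_restrictive`, the purpose labels, and the retention figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, greatly simplified policy attached to one input data set."""

    source: str
    allowed_purposes: frozenset
    retention_days: int


def compose_most_restrictive(policies: list) -> tuple:
    """Derive a policy for a merged data set: a use is allowed only if every input allows it."""
    if not policies:
        raise ValueError("at least one input policy is required")
    purposes = frozenset.intersection(*(p.allowed_purposes for p in policies))
    retention = min(p.retention_days for p in policies)
    # Inputs whose permissions were narrowed signal a conflict worth human review.
    narrowed = sorted(p.source for p in policies if p.allowed_purposes - purposes)
    return purposes, retention, narrowed


course_logs = Policy("course-logs", frozenset({"teaching", "research"}), 365)
platform_data = Policy("platform-data", frozenset({"teaching", "research", "marketing"}), 1825)
merged = compose_most_restrictive([course_logs, platform_data])
print(sorted(merged[0]), merged[1], merged[2])
# ['research', 'teaching'] 365 ['platform-data']
```

Such a rule is easy to automate, but it only reconciles policies that already exist. It says nothing about facts that become inferable only after the merge, which is precisely why emergent policies for emergent information remain an open challenge.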
These scenarios have provided us with a basis for an initial observation about the differing stakeholders involved in the handling of big data and the privacy policies applicable to them. We begin with the subjects themselves, perhaps both primary and secondary, and the decision-makers who set out to have the data collected and made available. We then identify a set of different stakeholders having to do with the collection, management, and provision of the data. This includes the actual data collector, the data curator, and the data platform provider. We then identify three kinds of stakeholders involved in the usage of the data: the data analyst, the privacy policy enforcer, and the data access auditor.

With these challenges and observations in mind, we also recognize that there are a number of open questions. These questions revolve around several key elements. The first is whether or not big data brings new challenges to the provision of privacy or whether it exposes existing problems more clearly. More important are questions of risk vs. benefit tradeoffs. One of the challenges here is that privacy, and the risk of its violation, is not binary and perhaps not even measurable. Thus, one is led to ask about the harms that may result from different levels of privacy policies and/or from violations of those policies. Finally, we are left with a set of questions related to trust: how it comes into existence, how it may evolve, how humans' trust can be modeled, and how trust may be supported technically.
We note that this set of observations, challenges, and questions is only representative of what one might draw even from this limited set of scenarios. A broader study might lead to yet more challenges and questions.
1 Introduction
Karen Sollins (MIT)

The vast amounts of diverse data that are now being called Big Data present society with an extremely interesting set of challenges, beginning with how to use any one such data set for a wide and increasing set of opportunities. These may range from improved product recommendations, to improved modeling of human mobility in regions of infectious diseases, to many other points in between. But big data presents additional opportunities that include a broader and deeper understanding across such data sets. If one can merge mobility data with medical histories, for example, one might provide a much more accurate model of potential epidemics, depending on both mobility and prior epidemics of diseases to which immunities are developed.

At the same time, societies and communities are becoming increasingly concerned over the questions of who knows what about them and whether or not they have control over those data collectors and analyzers knowing things about them. The concern is captured in the word "privacy." The "problem of privacy" is in fact a complex and subtle one, with many challenges and often too few solutions to those challenges. One must ask questions such as, "Who is the subject of the data?" There may be a primary subject, but data about interactions may have multiple primary subjects. There may be secondary subjects, such as the parents or legal guardians of a child who happens to be the subject of the data. In addition, one can ask questions about who else is involved with the data in various ways, such as collecting or storing it, protecting it, "curating" it for accuracy and completeness, analyzing it, and so forth. One can also ask what policies should be applied to the data for controlling access to it, to meet any privacy constraints from a legitimate policy source. Or, how might that policy be enforced? Or how can one be confident (trust) that the policy is either being defined by a legitimate policy source or being enforced by a trustworthy enforcer? And so forth. The questions of what is meant by privacy, who can define appropriate privacy, and how that might be implemented are only now beginning to be examined, with significant progress in some areas and less advancement in others.
The challenge we face in the Big Data arena is at the intersection of these two driving forces: big data itself, and all that it has the potential to provide, and privacy, as it becomes increasingly well understood to be a design driver for systems in the cyber age.

The MIT Big Data Privacy Working Group concentrates on this problem domain. To that end, several workshops were organized by and held at MIT.[1] In addition, the Working Group took on two initial agenda items: 1) documentation of a set of scenarios in order to better illuminate some of the central challenges to providing privacy in a "Big Data" world; and 2) roadmapping of current and near-term future technologies that have promise of addressing parts of the privacy-in-big-data challenge. This document is the first of these.

[1] See workshop reports:
1. Big Data Privacy: Exploring the Future Role of Technology in Protecting Privacy, June 19, 2013. Available at: report. (
2. MIT White House Big Data Privacy Workshop: Advancing the State of the Art in Technology and Practice, March 3, 2014. Available at: report. ( priv/images/mitbigdataprivacyworkshop2014_final pdf)

Below, in the remainder of this section, we summarize a number of conclusions we draw from the scenarios. These take three forms. The first is a set of issues that derive from the larger challenge. The second is a set of categories of stakeholders we extract from the scenarios. Finally, we conclude the introduction with a set of questions, which remain unanswered but appear to be central to the problem domain.

1.1 Overarching Observations
In examining the use scenarios here, we can identify an initial set of significant issues in the consideration of privacy that derive specifically from the nature of big data. These are also informed by wider reading and discussions on the topic:
- Scale: The sheer size of the data sets leads to challenges in creating, managing, and applying privacy policies. Because the data sets themselves are of such increasing size, the metadata that reflects privacy policies about them will grow in parallel. One of the challenges is that as data sets grow, efficiency will play an increasingly important role. That will also be true of the privacy policy management associated with the growing data.
- Diversity: As data sets become "big data," it will be increasingly likely that more and more diverse stakeholders will be involved. Each may come to the effort with his or her own agenda. With an increasing number of stakeholders with different responsibilities will also come an increased probability that their interests, agendas, and objectives will be less aligned with each other, and hence that their approaches to privacy policies will also be more divergent and possibly conflicting. Thus, privacy policy conflict resolution will play an increasingly important role.
- Integration: With improved data management technologies (e.g., cloud services, data lakes, and so forth) comes integration across data sets, with new and often surprising opportunities for cross-product inferences, and with them new "information" about individuals and their behaviors. The challenge is that reasoning, inference, and other analysis tools will allow for the recognition or discovery of hitherto hidden facts (data) about the subjects. This raises the question of how to create and enforce privacy policies on this new "data."
- Impact on secondary participants: Much data about individual subjects tends to reflect on other people as well. This may range from people who "liked" a post, to people who are mentioned in posts, to true secondary participants, such as family members or co-workers. One question that will become increasingly important is how to observe the privacy rights of these other people, who are not the primary subject of the data and may not be available to apply a privacy policy when that is possible. Even if these secondary people are available, it is not clear how to handle conflicting privacy policies in this domain.
- Need for emergent policies for emergent information: As inferences over merged data sets occur, emergent information or understanding will arise. As mentioned above, this will be based both on simply merging data sets and, perhaps more importantly, on the exposure of previously hidden data that is only revealed when data sets are merged. Although each individual data set may have existing privacy policies and enforcement mechanisms, it is not clear that it is possible to automatically develop the requisite and appropriate emergent privacy policies and appropriate enforcement of them.
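How merging can expose emergent information is easiest to see in a linkage (join) on shared quasi-identifiers. In this minimal Python sketch, with made-up rows and a hypothetical function name, a de-identified learning record and a public profile are each unremarkable on their own, but a unique match on ZIP code and birth year attaches a name to a course. The matching rule (link only on an exact, unique match) is an assumption chosen for simplicity.

```python
from collections import defaultdict


def link_on_quasi_identifiers(deidentified: list, public: list, keys: tuple) -> list:
    """Attach a name to each de-identified row that matches exactly one public row."""
    index = defaultdict(list)
    for row in public:
        index[tuple(row[k] for k in keys)].append(row)
    linked = []
    for row in deidentified:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # a unique match re-identifies the subject
            linked.append({**row, "name": matches[0]["name"]})
    return linked


# Made-up rows: neither data set alone pairs a name with a course.
learning_records = [
    {"zip": "02139", "birth_year": 1990, "course": "intro-programming"},
    {"zip": "02139", "birth_year": 1985, "course": "calculus"},
]
public_profiles = [
    {"name": "Student A", "zip": "02139", "birth_year": 1990},
    {"name": "Person B", "zip": "02139", "birth_year": 1985},
    {"name": "Person C", "zip": "02139", "birth_year": 1985},
]
for row in link_on_quasi_identifiers(learning_records, public_profiles, ("zip", "birth_year")):
    print(row["name"], row["course"])  # Student A intro-programming
```

The second learning record stays unlinked only because two public profiles share its ZIP code and birth year; adding one more shared attribute could remove that ambiguity.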
1.2 Stakeholders
As the reader will see in the scenarios themselves, there are a number of key stakeholder categories that appear repeatedly. Not all cases will include all of these stakeholders. In some cases, individuals may play more than one stakeholder role. Thus, for example, the data collector and the data curator may be the same, or the data platform provider, the policy enforcer, and the auditor might be the same. But other combinations are likely to be found as well. It is also important to remember that the privacy policies for a data set may be defined by people in different roles in different situations and, in some cases, the policies may be defined by outsiders on behalf of one or more of these stakeholders, as may be true under a regulatory regime. Thus, it may be that the government requires certain privacy policies on behalf of the data subject.
- Data subject(s)
- Decision-maker
- Data collector
- Data curator
- Data analyst
- Data platform provider
- Policy enforcer
- Auditor
This list was drawn from the scenarios and should be considered representative rather than complete. Appendix B includes a table with definitions of each of these stakeholder roles. They are also considered at greater length in the companion paper on technologies. Appendix C demonstrates an application of these definitions to the first scenario, on MOOCs.

1.3 Open Questions and Issues
In studying these scenarios, we are left with a number of challenging questions and issues:
- Novelty: What new or unique challenges emerge when it comes to managing privacy in the context of big data?
- Tradeoff: How do we assess benefit vs. risk? Part of the challenge in these domains is that the risks and tradeoffs need to be evaluated, to the extent that they can be evaluated by metrics, both by different metrics and at different timescales. As an extremely simple example, the benefits of MOOC analysis may accrue to future students, while the risks may fall on the subjects of the data, the students about whom data has been collected. A keystroke logging system may help current students if the teaching staff can get immediate feedback on how long it takes each individual student to complete a particular exercise, but systematic changes may only occur on a longer-term basis than the period during which a particular student is involved with a particular course. At the same time, to the extent that the data can profile individual students in numerous ways, both in real time and perhaps over the longer lifetime of the data set, and perhaps in conjunction with the data from other courses the student has taken, the students' risks of privacy violation may continue to grow, and those risks are unrelated to the benefits for future students. One of the challenges in this domain of metrics is that privacy is not binary. In part because it is contextual, and in part simply because the privacy of some information is more critical than that of other information, this question of the tradeoff or balance between benefit and risk is both complex at any instant and a moving target.
- Harm: How do we evaluate "harm"? As mentioned above, the risk to privacy is neither binary nor necessarily stable. The deeper challenge is to understand the potential harm that may accrue from potential risks. In fact, we may need to turn this issue around. The question we may need to ask is, "Which harms are important to the individuals, and in what contexts?" Thus, harms could be imagined on a spectrum from inappropriate online advertising, to discrimination in setting insurance rates, to something that is a life-or-death matter in terms of medical intervention. From that we might be able to consider whether there is some metric for evaluating harm generically, or whether any comparative evaluation can only be done in terms of specific harms. In turn, from the identification of harms, we may also be able to identify the risks that would lead to those harms. This is another way of talking about the related topic from the security community: threats.
- Trust: How can we establish and assess trust among the stakeholders? What does it mean for the various stakeholders to trust or mistrust each other or sets of others? What models do we have for understanding trust? What are the current and predictable future mechanisms and technologies for establishing trust, and how do they relate to the models in people's minds and perceptions? How is trust established and maintained? How does it evolve over time?

With all these questions and issues in mind, the remainder of this report presents the scenario analysis done by various subgroups of the Big Data Privacy Working Group, from which we drew these observations, thoughts, and questions.
1.4 Remainder of This Document
The remainder of the document focuses on descriptions of the scenarios as outlined by subgroups of the larger working group. The first focuses on MOOCs (Massive Open Online Courses) and OLEs (Online Learning Environments). The second addresses the challenges in using social networking data for research. The third considers the use of mobile phone data to reflect human mobility into and out of regions of highly infectious diseases, especially in developing parts of the world. The final section of the paper summarizes a number of additional scenarios addressed by the group, but in less depth. They illuminate more of the breadth of the problem domain. The paper concludes with three appendices: A) the template developed by the group for organizing the individual scenarios, B) a more in-depth table of the stakeholder categories, and C) an application of the stakeholder analysis to the first scenario, about MOOCs and OLEs, as an example.
2 Privacy Issues for Data Collected from MOOCs and Online Learning Environments
Team: Una-May O'Reilly (MIT), David Dietrich (EMC), Lalana Kagal (MIT)

2.1 Abstract
MOOCs (Massive Open Online Courses) represent a specific type of Online Learning Environment (OLE), which can be deployed on Internet-served platforms that collect large volumes of granular behavioral information about students' learning activities. Some data reveal each individual student's detailed study behavior, such as video usage, consultation of text or learning tools, and the sequence in which material was navigated. Other data include assessments, grades, and social interactions and communication on forums within the platform. Collectively, the data can be linked to auxiliary demographic information such as age, sex, and socioeconomic status. They can also be linked, if not anonymized, to public online behavior. A general set of legitimate uses of this data includes education research, examination, and analyses that directly or indirectly help instructors teach and conduct student assessments. Some, but not all, of these use cases have commercializable models for parties beyond the platform provider.

Definition of a MOOC and the Scope of OLE and MOOC in This Document
MOOC is an acronym (Massive Open Online Course) originating in 2012. The acronym has been short-lived, as MOOC has evolved into a noun with meanings falling outside the acronym. For example, today we see MOOCs that are not open to all comers and MOOCs that are only partially online, because they are integrated into blended learning or flipped classroom models.[2] MOOCs share history with ITS (Intelligent Tutoring Systems) and other learning management systems, such as Blackboard and Moodle.

We are focusing on data and its related privacy and confidentiality issues in this document.
No OLE platform collects exactly the same data, but wherever it is largely unimportant to differentiate each platform by its specific name, we will refer to them all as OLEs.

State of Data Privacy Organization
OLEs, and MOOCs in particular, are relatively recent at their current scale, so data privacy and access policies are emergent and dynamic. Policymakers range in governance scale from the federal government to platform providers, and further to institutional and independent content providers. De facto policies and interim policies that have been necessary to cover fast-paced OLE activity both exist. Furthermore, existing policies on data privacy have been interpreted in new circumstances. Policy committees and meetings abound. Policymaking is at the information-collecting, option-drafting, and revision stages. There is a potential to leverage the experience from many other data domains and shape a strong national example. This will require input from data stakeholders, the legal community, and technology experts. The latter are important because they can advise on technical risks of privacy and confidentiality breaches, while also indicating the capabilities and potential power of new technologies.

[2] Given this fluidity of the meaning of MOOC, some people reasonably dispute the origin of the widely recognized first MOOC, believing large-scale online courses at the college level preceding Ng's or Thrun's at Stanford in 2012 to be valid examples. It is arguable that the Coursera and MITx/edX examples are more precisely called "xMOOCs," while previous online learning courses, which are generally much more fluid in nature in terms of content deployment, are more precisely called "connectivist" or "cMOOCs."

2.2 Detailed Narrative
The Online Learning Environment (OLE) data privacy scenario is relatively straightforward compared to some other domains, such as health records or personal genotyping, for several reasons:
- Because OLEs are recent, there are few data legacy complexities.
- Because the number of platforms is modest right now, the kinds of data are enumerable and their formats are known. However, this will change.
- Because there are enumerable classes of stakeholders in the space and policy precedents in related domains, there is generally less divergence and/or disagreement on what a policy should cover and what its principles should be.

Open Issues
- Recognizing the dynamic nature of control of the data and acknowledging that the circumstances around that control may change. The data is replicated and passed by the platform provider to the institution of the content provider. At this point, two parties have control. Thereafter, designated controllers may expand in number, or control may be passed from party to party in stages. Different controllers have different interests in the data and allow various parties to access it under a diverse set of goals and agreements. There is no uniformity to institutional practices across the country. If a broader policy and set of practices were to be developed by government, their interpretation might still result in heterogeneous local practices.
- Defining and determining legitimate uses of the data and how these uses should be controlled in a clear, specific, and open-ended manner.
- Setting guidelines or stated policies related to the sale, trade, or sharing of this data in OLEs and MOOCs.
- Defining and determining the legitimate commercial use of the data, if any.
- Defining the role of technology in aiding the drafting and governance of policy.
- Anticipating commercial and educational activities around OLE data, as well as potential malicious activities, and considering what technology can do to support them (or prevent them), as necessary.

The tradeoffs for policy around data control and access include:
- Students' right to confidentiality, privacy, and access to their own data.
- Institutions' and content providers' right to access because of content provision.
- Platform providers' right to access because of service provision.
- The benefit of research, the research-motivated right to access, and the countervailing risk of identification.
- The potential linking of anonymized data with outside data.
- Commercialization opportunities that may be unforeseen or unanticipated by students who grant permission to collect and control their data.
- The reasonable limits of technology for privacy and confidentiality policy support.

Additional Privacy Concerns
- Forum discussions and data linkability.
- Peer grading, one common way to grade assignments in MOOCs, may create power relationships and opportunities for misuse.
- Power dynamics may not respect basic rights as they relate to the linked data or the textual information from the discussion forums. In addition, MOOCs can present asymmetrical power dynamics. Consider the case of children and prisoners, where people within a system (educational, correctional) may be required to do things as part of that system, or in this case the MOOC, and may be influenced to bend the rules, given the existing power dynamics. Therefore, this area needs additional protection, since MOOCs have the potential to enable coercion and power imbalance. There are free MOOCs and MOOCs focused on certifications and jobs. There is an asymmetric power relationship in some situations, and when this exists, there should be separate regulations governing these MOOCs to ensure that the dynamics are fair and that there is free will and clear consent.

2.3 Privacy Impact Assessment: The Specific Context of Scenario

Actors
- Students: Users who take the course, complete the assignments, and receive a grade.
- Teaching content providers: Faculty and teaching staff who provide the teaching material, monitor and support the discussions, and handle the grading.
- Crowd Participants: At-large parties who might volunteer to grade or offer feedback on assessments, programming assignments, and so forth, but who are not students or core teaching staff.
- Peer Graders: A specific case of students, in which students are expected to grade each other's work in order to manage grading at large scales, as occurs in some MOOC contexts.
- Institutional content provider: The institution behind the teaching content providers. Examples include an enterprise offering an in-house learning platform, a university offering a MOOC, or an enterprise offering product education for clients or the general public.
- Platform provider (e.g., Coursera, edX, Stanford University): A party that deploys the course on the Web via a platform. In some cases, the same party develops and maintains the platform. For example, edX is a not-for-profit organization that develops, maintains, and deploys a MOOC platform as a service with a consortium of university partners, including MIT and Harvard. Coursera is a commercial entity and has different university relationships. Open edX is an open source platform that any content provider can adopt and use for content deployment.
- Analyst: A party who examines the data collected from OLEs. Analysts include researchers, their students (if the researchers are academics), and education technologists. Teaching staff, platform providers, and institutional content providers may also act as analysts.
- Data controllers: Control of OLE data is not always centralized or stationary. Examples of data controllers include the platform provider and the institutional content provider. Within each of these institutions, there could be multiple controllers. They may control the data at different times, or they may control it concurrently. For example, at MIT, the Office of Digital Learning receives the data, controls its distribution at one point, and then later passes this role on to the Institute Registrar.

Actors and Relationships
- Analysts interact with data controllers to gain access to the data. The data controller often asks the analysts to formally submit to a policy. Eventually, analysts will transform source data, by linking and interpretation, into more abstract representations of student behavior, e.g., variables for modeling, all the while trying to enforce student anonymity. Analysts will interact with data controllers to work out how to share such variables more widely and to evaluate the risk of re-identification that these variables, and models using them, present.
- Data controllers interact with each other to pass or share the data.
- Students interact with the platform provider and the teaching content provider. They register with the platform provider to gain entry to the platform and course. They provide background information, participate in the course, including its forums and assessments, and provide survey information. As data controllers, both providers will access this information. It should be noted that students often confuse the platform and content providers. A student is shown a privacy and access policy by the platform when he or she registers, and agrees to a platform use policy when registering. For example, edX's use policy stipulates no scraping.
- Students indirectly interact via the OLE with the institutional content provider when they have grades placed in their academic records, or when they receive credit or proficiency certificates.
- Students indirectly interact with analysts. They gain a benefit from assistance that could be founded on the researchers' analysis of their data: both a student's individual data and the data of other students in aggregate.
- Students interact with other students, generating data of great interest. These interactions frequently take place on forums within the platform. Importantly, for data privacy reasons, they may also take place outside the platform, arising informally rather than being organized by the course structure. Examples of digital records of these interactions are Facebook or LinkedIn groups. Sometimes students assess the work of other students in peer-to-peer relationships. Students may also work in groups on projects or homework.
- Students interact with Crowd Participants when they receive feedback from them. For example, one course at MIT invites alumni to comment on student software designs.
- Students rarely interact with data controllers at this time and have little or no access to their data beyond official records created for their education purposes.
- Institutional content providers employ the teaching staff (i.e., teaching content providers) and have agreements with them regarding intellectual property related to the course and remuneration for instruction. The institution, rather than the teaching content provider, is usually the data controller. In fact, the latter party may need to seek permission for data access to the very course he or she has taught.
- Teaching content providers interact with Crowd Participants to provide guidelines on grading and to get feedback on student performance and interest in the course.
- Teaching content providers provide feedback to institutional content providers and platform providers on, for example, usability, additional features, and student performance.
- Teaching content providers may interact with analysts to understand how students learn and interact with their teaching content, in order to improve that content.
- Teaching content providers may interact with data controllers to get access to data about their course in order to analyze it and improve the teaching content.
- Institutional content providers interact with the platform providers to ensure that the courses are supported properly and to provide feedback on additional features.
- Institutional content providers interact with data controllers to identify and/or specify the policies that they wish to enforce and to discuss enforcement mechanisms.

The core interaction is the student learning via an OLE. Around this point, students interact with each other and with teaching staff. In terms of privacy, students are identified by their login ID on the OLE platform. They may also reveal their "offline" identity to each other and to staff in the content of their discussion posts. Students agree to a platform use agreement that implies that they accept the platform's data use policy. During the learning process, the platform provider captures clickstream, assessment, discussion, and wiki data. In real time, or at longer intervals, the platform provider aggregates this data from many students' interactions. The platform and the institutional content providers control these data. They are generally not accessible to the student, but they are accessible to teaching content providers and analysts. Institutions controlling the data are responsible for meeting FERPA requirements and for pseudo-anonymizing the data to which they will link and provide access. They also develop and provide technical support for data access policies.

Analysts transform source data in the course of their modeling activities. They may combine low-level observations (e.g., mouse click activity) into variables (e.g., referrals to text during problem solving) and compile large datasets of them. These datasets describe student behavior at a recognizable level of human activity. They are destined to become the data "currency" of analytic research. How to handle the control and privacy protection of such secondary data (i.e., with whom it can be shared, given the potential for student re-identification) remains to be resolved.

2.4 Goals of OLEs

General: To educate. With college OLEs, the education could have secondary outreach and accessibility goals. With corporate OLEs, the education could have secondary product adoption, sentiment, and publicity goals. In addition, goals specific to actors are:
- Teaching content providers: Providing teaching materials; job tasks for an employer.
- Crowd Participants: Altruistic or professional education goals.
- Peer Graders: Evaluating other students' work in an appropriate, objective manner.
- Institutional content provider: Sometimes generating revenue, directly or indirectly; reputation.
- Platform provider: Revenue streams via advertising, Signature Tracks, and recruiting. Possible cross-selling to steer people toward formal degree programs at universities that provide content. Owning the ecosystem, as they own the actual platform and have access to the data.
- Analysts: Research into education; improvement of the OLE experience for students and teachers by interpreting historical data. Inevitably, financial profit could be a goal for this kind of actor.
- Data controllers: These are the data gatekeepers. They currently regulate access to the data for analysts and other potential controllers. Their goal is to ensure that the privacy and confidentiality policies governing the data are respected, while providing access to appropriate analysts.

There is a lurking, unnamed adversarial goal/actor in this space: those exploiting the data for commercial or hacking purposes outside the realm of educational use, i.e., to identify someone and target her or him specifically for revelations or for profit-based activities. For example, there is a significant potential for targeted advertising.

2.5 Data

MOOCs offer a potential social science laboratory or study setting where students' behavior and interaction with course content can be observed almost microscopically. Technology allows us to capture a tremendous amount of detailed data, including:
- Click-stream interactions between a student and content.
- Use of videos and other e-resources, such as digitized reference material, wikis, and forums.
- Assessment behavior: attempts, correctness, use of immediate feedback.
- Self-reported background, pre- and post-test surveys.
- More data than in a residential setting, but with less contextual information accompanying it.

This data can be segmented in several ways, as outlined below.

Course-related
- Course content from the content provider.
- Data exhaust from the platform, as students interact with Web servers. This is often called clickstream data. For edX, it is JSON logs of every GET/POST of data to the Web site.
- Student input to the OLE via wiki and discussion forum entries, questionnaires, and self-reporting surveys.
- Assessments: both grades and responses; certificate achievement.

Institution- or Platform-related
- Curricular data related to courses taken, timing, and learning paths.
- Registration data, such as profile information about students.
- Payment data, perhaps (e.g., Coursera Signature Tracks, other third parties).
- Certificate data.
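The course-related clickstream data above, together with the analyst transformation described in Section 2.3 (low-level observations such as clicks turned into variables such as referrals to text during problem solving), can be illustrated with a small Python sketch. It is not the edX log schema: the field names (`username`, `event_type`, `time`) and the three event-type strings are simplified assumptions, and counting a text view only between opening and submitting a problem is one hypothetical way to operationalize that variable.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified event types; real platform logs use their own names.
PROBLEM_OPEN = "problem_view"
PROBLEM_SUBMIT = "problem_check"
TEXT_VIEW = "textbook_view"


def student_sequences(log_lines):
    """Turn JSON-lines log records into a student-oriented view:
    username -> list of (timestamp, event_type), sorted by time."""
    by_student = defaultdict(list)
    for line in log_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        when = datetime.fromisoformat(record["time"])
        by_student[record["username"]].append((when, record["event_type"]))
    return {user: sorted(events) for user, events in by_student.items()}


def text_referrals_during_problems(events):
    """Count text views that occur after a problem is opened and before it is submitted."""
    referrals = 0
    solving = False
    for _, event_type in events:
        if event_type == PROBLEM_OPEN:
            solving = True
        elif event_type == PROBLEM_SUBMIT:
            solving = False
        elif event_type == TEXT_VIEW and solving:
            referrals += 1
    return referrals


# Invented sample records, for illustration only.
SAMPLE_LOG = [
    '{"username": "s1", "event_type": "problem_view", "time": "2015-03-01T10:00:00"}',
    '{"username": "s1", "event_type": "textbook_view", "time": "2015-03-01T10:01:00"}',
    '{"username": "s1", "event_type": "problem_check", "time": "2015-03-01T10:05:00"}',
    '{"username": "s1", "event_type": "textbook_view", "time": "2015-03-01T10:06:00"}',
    '{"username": "s2", "event_type": "problem_view", "time": "2015-03-01T11:00:00"}',
]

features = {
    user: text_referrals_during_problems(events)
    for user, events in student_sequences(SAMPLE_LOG).items()
}
print(features)  # {'s1': 1, 's2': 0}
```

The derived variables are still keyed by the pseudonymous username, so they remain subject to the re-identification concerns this section raises for secondary data.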
These data are in diverse formats and can be linked to form student-oriented or time-oriented descriptions (the former being more actionable) of learning activity within the platform. One such open organization of MOOC platform data is MoocDB, within the MoocDB project. MoocDB is a platform-agnostic functional data model for data exhaust from MOOCs. The MoocDB project will provide open source software of MOOC tools and frameworks.

2.6 Systems

Business systems. As an example, Coursera is a for-profit organization providing an online service. In the past, Coursera offered a "freemium" model in the marketplace, and it has evolved to offer low-cost courses and specializations. Signature Track verifies student authenticity, recruiters are in the model and serve as a revenue source, and lifelong learners take courses well beyond the typical student years. In the case of a corporate MOOC, HR learning systems are part of this picture as well.

2.7 Risks

The biggest data risk is that someone in the data is identified and this causes harm to them. Data has to be pseudo-anonymized before release, but that does not assure with 100% confidence that re-identification will not be possible. Re-identification can take place in at least three ways:
- Pseudo-randomized data has confidential cross-reference tables to true identity. These tables, if not adequately protected, could be compromised.
- Some reference in the content of the data, for example free-text posts in discussions or timestamps, will directly or indirectly allow cross-referencing to public data that reveals identity.
- A previously compromised dataset can potentially be used to learn the behavior of a student, and this behavior pattern can then be applied to new datasets to identify the student.

Several additional risks exist:
- Data control is not in the hands of the data providers, i.e., the students. Therefore, there is a risk that the data can be used in a way that the data provider did not anticipate, or for a reason that they do not approve of.
- Data released for research purposes may be used for commercial purposes.
- Data may be used to evaluate the teaching ability of the teaching content provider and to compare teaching content across different institutional content providers without explicit consent related to individual data sharing.
- Students may not understand the privacy policy that they have agreed to at sign-up, and their personal data may get shared or monetized without their informed consent.

2.8 Rules/Regulations

In the United States, much academic data is regulated by the Family Educational Rights and Privacy Act of 1974 (FERPA),[3] which defines the rights of parents and guardians to access, and to exercise some control over who has access to, information about children under 18 years old. It also defines the rights of students over 18, such as students in college. It is important to recognize that there may be a number of non-FERPA regulations with respect to the privacy of information about students. An example of this is the U.S. Health Insurance Portability and Accountability Act (HIPAA), but there are others as well. This group did not discuss the relationships among these various factors in the privacy of educational data, but noted that such differences and possible conflicts exist.

[3] See http://www2.ed.gov/policy/gen/guid/fpco/ferpa/students.html for general information about FERPA.

2.9 Technologies
- Learning platforms (using this term broadly to refer to platforms such as edX, Coursera, Udacity, and other MOOC providers, as well as more traditional Learning Management Systems (LMS) such as Blackboard);
- Software frameworks for processing large datasets, such as Hadoop, and data lakes that store a combination of structured and unstructured data;
- Web browsers and front-end tools;
- Analytical tools;
- Cloud computing platforms (e.g., Amazon Web Services and others);
- Code on different systems;
- Mobile devices.

Privacy Constraints

Privacy constraints in a MOOC are very different from those of a physical classroom experience. There is a perspective that since MOOCs are much more open, students are more vulnerable online than in a traditional classroom setting.

Technology Informing and Supporting OLE Data Privacy and Confidentiality Policy

What tools and approaches can (new) technology provide? Some possible technologies:
- Differential privacy.
- Analysis carried out on encrypted data, so that even the platform provider does not see the data (homomorphic encryption).
- A trusted, privacy-aware API that the analyst uses to write up their analysis and submit their code to the data controller; the API prevents abuse of the data.
- Extensive audit logs of analyst access, to ensure that the analyst is not able to chain queries in order to gain access to inappropriate data.
- A privacy-aware analysis framework that helps the analyst be policy compliant.

Some initial thinking has been given to managing MOOC data via decision and policy engines based on heuristics. This approach would require separating the databases and using different access controls.
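Three of the technologies listed above (differential privacy, a restricted API through which the data controller answers the analyst's queries, and audit logs of analyst access) can be combined in one small sketch. It uses the Laplace mechanism, a standard differential privacy technique for counting queries, and tracks a total privacy budget so that chained queries cannot accumulate unlimited information. The class and method names are hypothetical, the budget values are arbitrary, and Python's `random` module is not a cryptographically secure noise source, so this is an illustration rather than a deployable design.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


@dataclass
class PrivateCountAPI:
    """Hypothetical controller-side query interface: analysts never see raw rows."""
    records: list
    total_epsilon: float                      # overall privacy budget for this dataset
    rng: random.Random = field(default_factory=random.Random)
    spent_epsilon: float = 0.0
    audit_log: list = field(default_factory=list)

    def _audit(self, analyst, description, epsilon, outcome):
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), analyst, description, epsilon, outcome)
        )

    def count(self, analyst, description, predicate, epsilon):
        """Return a noisy count of records matching `predicate`, charging `epsilon`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent_epsilon + epsilon > self.total_epsilon:
            self._audit(analyst, description, epsilon, "refused")
            raise PermissionError("privacy budget exhausted")
        self.spent_epsilon += epsilon
        true_count = sum(1 for r in self.records if predicate(r))
        # A count changes by at most 1 when one student is added or removed
        # (sensitivity 1), so Laplace noise with scale 1/epsilon gives epsilon-DP.
        self._audit(analyst, description, epsilon, "answered")
        return true_count + laplace_noise(1 / epsilon, self.rng)


api = PrivateCountAPI(
    records=[{"grade": g} for g in (55, 70, 82, 91, 64)],  # invented grades
    total_epsilon=1.0,
    rng=random.Random(0),
)
print(api.count("analyst-1", "grade >= 60", lambda r: r["grade"] >= 60, epsilon=0.5))
```

Each call both answers and logs. Once `total_epsilon` is used up, further queries are refused and the refusal itself is recorded, which is one way an audit trail can surface attempts to chain queries.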
Risks

What risks are there even to the new technology?
- Differential privacy only works within a closed dataset; privacy breaches are possible when external datasets are linked.
- Encryption acts like access control and is useful when the platform provider is untrusted.
- A restricted API acts like access control combined with audit.
- Auditing can handle post facto problems.
- The analyst platform provides a holistic approach to access control, privacy awareness, and ensuring policy compliance. However, it restricts the analyst to a single platform.
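The concern about linking external datasets, raised above and in the re-identification paths listed in Section 2.7, can be made concrete with a toy linkage attack. The sketch joins a pseudonymized release to an external, identified dataset on quasi-identifiers (age, sex, and ZIP code, chosen here as assumptions) and computes k, the size of the smallest group of released records sharing the same quasi-identifier values, as in k-anonymity. The records are invented, and k-anonymity is named as one common measure, not one the working group prescribes.

```python
from collections import Counter, defaultdict


def key(row, quasi_identifiers):
    return tuple(row[q] for q in quasi_identifiers)


def smallest_group_size(rows, quasi_identifiers):
    """k in k-anonymity: size of the smallest set of rows sharing quasi-identifier values."""
    groups = Counter(key(row, quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def link(released, external, quasi_identifiers):
    """Re-identify pseudonyms that match exactly one released row and one known person."""
    release_counts = Counter(key(row, quasi_identifiers) for row in released)
    people = defaultdict(list)
    for person in external:
        people[key(person, quasi_identifiers)].append(person["name"])
    matches = {}
    for row in released:
        k = key(row, quasi_identifiers)
        if release_counts[k] == 1 and len(people.get(k, [])) == 1:
            matches[row["pseudonym"]] = people[k][0]
    return matches


QUASI_IDENTIFIERS = ("age", "sex", "zip")

# Invented toy data: a pseudonymized course release and a public, identified list.
RELEASED = [
    {"pseudonym": "u01", "age": 34, "sex": "F", "zip": "02139", "grade": 88},
    {"pseudonym": "u02", "age": 34, "sex": "F", "zip": "02139", "grade": 71},
    {"pseudonym": "u03", "age": 52, "sex": "M", "zip": "02142", "grade": 93},
]
EXTERNAL = [
    {"name": "Alice", "age": 34, "sex": "F", "zip": "02139"},
    {"name": "Bob", "age": 52, "sex": "M", "zip": "02142"},
]

print(smallest_group_size(RELEASED, QUASI_IDENTIFIERS))  # 1
print(link(RELEASED, EXTERNAL, QUASI_IDENTIFIERS))       # {'u03': 'Bob'}
```

Here u01 and u02 are shielded only because they share quasi-identifiers with each other (a group of size 2); u03 is unique in the release (k = 1) and is re-identified together with their grade.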
3 Research Infrastructure for Social Media

Team: Maritza Johnson (Facebook), Dazza Greenwood (MIT), Mona Vernon (Thomson Reuters)

3.1 Abstract

Most social media platforms provide at least two basic features: the ability to share user-generated content and the ability to connect with an audience. Different social media platforms make it possible for users to share a range of content types, and some allow the user to selectively choose the audience for individual pieces of content. On Facebook, for example, the user could share text-based status updates, photos, or website URLs. The user is also able to comment on content posted by other users, install applications that utilize the Facebook API, or communicate with others within a self-organized group of people. Between the user-generated content and the server logs that capture how and when people interact with the platform, these services are an invaluable source of information about human behavior at the individual, group, and even country levels.

The goal of this scenario is to evaluate technical solutions that would open this data up to researchers while offering data subjects informed consent and control over their data. Studies of social media to date have provided insights on topics as wide-ranging as social capital, social influence, meme evolution, emotional contagion, mobility, and politics. For a variety of reasons, much of this research is currently limited to employees of social media companies.

3.2 Scenario Introduction

Studies of social media to date have provided insights on topics as wide-ranging as social capital, social influence, meme evolution, emotional contagion, mobility, and politics.
Unfortunately, much of this research is currently limited to in-house researchers at social media companies. Academics and other researchers have, in some cases, leveraged publicly available content or APIs, when they are available, but there are notable limitations to collecting data through these channels. In some cases, studying a group of people yields the most interesting insights, but this requires that a critical mass of the population opt in to a research program. In other cases, the user-generated content is best supplemented by information that can only be found in the server logs, such as how frequently a person visits the platform, how much time they spend, and the proportion of time spent consuming content versus producing content.

One way to increase the volume of research in this area is to develop a social media research infrastructure that allows users (data subjects) to opt in to a program that makes some subset of their social media content and the accompanying server logs available to researchers. The result would be a large-scale, rich dataset that would empower researchers to generate varied and reproducible research. Social media platforms might participate in a data release program with varying options. For example, one successful implementation of the program might include a predefined set of user data and data from server logs, and a feature that allows researchers to contact participants for supplementary data or follow-up surveys. It might also include a portal with educational content for individuals to visit to help them understand the information they've chosen to donate, to see how researchers are using it, and to gauge the long-term benefits of participation.
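The opt-in program described above releases only a participant-chosen subset of content, and the scenario goes on to describe granular choices: static profile fields, dynamic streams such as posts and photos, and historical versus future data with exceptions defined by audience, keyword, or tags. The following minimal Python sketch shows how a provider might filter one participant's data against such choices. The `Consent` structure, the field names, and the keyword-matching rule are assumptions, and tag-based exceptions are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Consent:
    """Hypothetical record of one participant's opt-in choices."""
    static_fields: set          # e.g. {"current_city", "education_history"}
    streams: set                # e.g. {"posts", "photos"}
    opted_in_at: datetime
    include_historical: bool    # content created before opting in
    include_future: bool        # content created after opting in
    excluded_audiences: set = field(default_factory=set)
    excluded_keywords: set = field(default_factory=set)


def shareable_profile(profile, consent):
    """Keep only the static fields the participant agreed to share."""
    return {name: value for name, value in profile.items() if name in consent.static_fields}


def shareable_items(items, consent):
    """Keep only dynamic items allowed by stream, time window, and exceptions."""
    allowed = []
    for item in items:
        if item["stream"] not in consent.streams:
            continue
        historical = item["created"] < consent.opted_in_at
        if historical and not consent.include_historical:
            continue
        if not historical and not consent.include_future:
            continue
        if item["audience"] in consent.excluded_audiences:
            continue
        text = item.get("text", "").lower()
        if any(word.lower() in text for word in consent.excluded_keywords):
            continue
        allowed.append(item)
    return allowed


# Invented example: "only future data with exceptions".
consent = Consent(
    static_fields={"current_city"},
    streams={"posts"},
    opted_in_at=datetime(2015, 6, 1),
    include_historical=False,
    include_future=True,
    excluded_audiences={"close_friends"},
    excluded_keywords={"health"},
)
items = [
    {"id": "a", "stream": "posts", "created": datetime(2015, 5, 1), "audience": "public", "text": "Old post"},
    {"id": "b", "stream": "posts", "created": datetime(2015, 7, 1), "audience": "public", "text": "Great lecture today"},
    {"id": "c", "stream": "posts", "created": datetime(2015, 7, 2), "audience": "close_friends", "text": "Dinner"},
    {"id": "d", "stream": "photos", "created": datetime(2015, 7, 3), "audience": "public"},
    {"id": "e", "stream": "posts", "created": datetime(2015, 7, 4), "audience": "public", "text": "My health update"},
]
print([item["id"] for item in shareable_items(items, consent)])  # ['b']
print(shareable_profile({"current_city": "Boston", "hometown": "Springfield"}, consent))  # {'current_city': 'Boston'}
```

Because such a filter runs before release, changing the consent later affects only what is released afterward; it cannot retract items from slices already published.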
The incentive for the Study Participants and Social Media Providers is to act for the public good. The risk for the Study Participants is that they might experience negative effects as a result of contributing their data to the general dataset. The data exchanged may contain several features of data known to be personally identifying or sensitive in nature, including race, sexual preferences, gender choice, and political views. The data exchanged could also be used for making unexpected inferences that the participant was unaware of at the time of consent.

As a high-level overview, the program would be initialized by the Social Media Provider. The Social Media Provider would advertise the opt-in research program to users (potential participants), give an overview of the structure of the program, the risks, and the benefits, and present the choices that represent how a user might participate. This information would include the main features of the program: the basic set of information that is required to participate; additional optional fields that the participant may choose to include; and the features that would allow a researcher to contact a user for additional information.

The participant will have granular opt-in choices for sharing a subset of their personal data. For example, some basic (static) fields are included in the set, such as birth month and year, current city, school history, job history, etc. The participant is also given the ability to contribute dynamic streams of their data, including photos, posts, comments, and interests.

The information will clearly describe the policies that researchers will be held to, while making it clear that the dataset is not believed to be anonymous or de-identified in a robust manner.

3.3 Stakeholders and Interactions
- Social media providers are the data collectors and would initially serve as the data platform providers.
- Social media users are the data subjects and are asked to provide informed consent for the data to be transmitted by the social media provider to researchers for purposes of a research study.
- Researchers are data analysts and receive data from data collectors (social media providers) by permission of the data subjects (social media users). The researchers become data curators of the data that they receive, at the time of receipt, and of any derivative data that is produced as a result of the research activities.
- The data collectors (social media providers) remain data curators for the underlying data of all social media users that they continue to maintain.
- Social media users will continue to interact with the social media platform to generate new content.
- Researchers might contact social media users to collect additional data to supplement the social media data.
- Social media users will contact the social media provider if they experience issues or have concerns about the overall program. Users will expect that the social media provider is ultimately responsible for ensuring a positive experience.
- Researchers would provide information to the data subjects about the research that results from using the data subjects' data.

Data

Examples of the data that could be made available:
- Posts: photos, status updates, location check-ins, etc.
- Comments and the number of likes on individual posts
- Education history
- Hometown
- Current city
- Religious and political views
- Information about the friend network: summary statistics like count, and breakdown by age range, current city (location), gender, political views, education level, etc.

For the dynamic fields, the informed consent dialog might offer the ability to contribute:
- All historical data
- All historical data with some exceptions
- Only future data
- Only future data with exceptions
- Historical and future data
- Historical and future data with exceptions

Audience, keyword, tags, or some other mechanism could define the exceptions.

Making the data available:

Option 1. The social media provider generates data slices:
- On a monthly/quarterly/annual basis, the Social Media Provider would create a new data slice for all active participants in the program.
- Participants would be able to opt out of the program, but they would not be able to remove their data from the datasets that had already been published. This is mainly because no practical guarantees could be made about deletion requests once the data has been released to researchers.
- Researchers would conduct queries on the available datasets, or download the entire available set for a given time period.

Option 2. The social media platform provides an API specifically for this program.

3.4 Systems

Legal systems: The privacy policy, or data use policy, currently governs how data can be used.

Social systems: What are the existing expectations around who owns shared content? Social media data sometimes involves more than one data subject. Consider, for example, a Facebook status update with a set of comments and "Likes." The simple text of the post belongs to the original poster (the person we consider the data subject throughout this scenario). But the post might also include "tags" to other people. These structured references to other users represent other individuals. What's the best way to handle providing this information in the dataset? Similarly, on Facebook, comments on a post are stored with the account of the post author rather than the commenter. Who does this content belong to? The comments are relevant to the context of the post, but are generated by other people. Is consent required to know which users "liked" a post? Do we limit the data so that only the number of likes is available?

Business systems: Human subjects research requires the approval of an ethics committee if the Common Rule[4] applies.

Technical systems: Informed consent; a permission-based system to allow the user to participate in a way they feel comfortable with; transparency and control over how data is shared; deletion protocols; de-identification of data to protect individuals when it is aggregated; and auditable systems to understand who has access.

3.5 Analyze the Scenario

Goals
- The participants benefit from contributing to a general body of knowledge, and perhaps they will learn something about themselves on an individual basis too.
- Researchers have access to a dataset that was previously unavailable.
- The social media provider gains insights about the user base and contributes to the general body of knowledge.
- The Researchers may be acting for the public good, or they may be acting to develop their own careers.

Risks
- Participants agree to participate in the program and then later experience an unexpected harm, due to an unexpected inference that arises from the research.
- Participants agree to participate in the program and then later experience an unexpected harm, due to modification of the site based on those inferences, or as a part of the experiment itself.
- The dataset would be a valuable resource for researchers, but it would be difficult to quantify the bias introduced to the dataset based on the characteristics of the people who decide to opt in to the program.
- Researchers identify a correlation in the study population that can be extrapolated to the general population, well beyond the pool of participants who opted in.
- Deletion requests: Is it reasonable to design the program such that people can opt in or choose to opt out, but cannot remove their data from the already-released data slices? If not, then how would deletion be handled when the data slices have already been released?
- Lack of control on the downstream use of the data, or derived data: What are the expectations and commitments to the people who opt in regarding downstream uses of the data? When new insights emerge, how do you ensure that the inferences/derivative data have been created in a way that is consistent with an individual's expectations? How would we detect a misuse of the data? How would we tag derivative data to understand where it came from and understand the original policy, in order to determine whether the action and the future uses are policy compliant?
- The data copy may be disposed of by the Researchers after the study, or may be retained in a corpus for further study. The data copy must be held securely and the Researchers are liable for a breach. However, the Social Media Provider may be liable if they have not assured that the researchers are acting properly, and also may risk collateral damage in the case of a breach, even if the proper processes have been followed. A variety of technologies and systems will be used to store and transmit the data, including Internet links and various databases. The data must be held according to the various data protection regulations in the territory that the data has been exported to, provided the export is legal in the first place.

[4] The Common Rule is the name of the U.S. federal policy on the ethics of use of human subjects in biomedical and behavioral research. For more details see

Rules
- Terms and Conditions of the social media provider
- The social media platform's existing audience controls for content
- Notice and consent when the user opts in to the program
- FTC Section 5
- For the Researchers: applicable human subjects research protections (e.g., the Belmont Report or the Common Rule)
- The policies of publication venues

Time

Roughly two to four years.

Existing Relevant Best Practices

Human subjects review committee: Where the Common Rule applies, an ethics committee would be required to give approval for human subjects research, and an appropriate risk assessment would be undertaken to validate the arrangements that have been put in place to manage data security and disclosure.

OAuth2 for enabling access to authorized users: Once the data subject has provided the click-based grant of authorization, the researcher could be granted an OAuth2 token to request and receive that individual's data via the API. The data would then be transferred to a research platform and database to conduct the analysis. The OAuth2 token would be provisioned with a scope of authorized access that corresponds to the personal data that the data subject agreed to provide.
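The OAuth2 practice described above amounts to a standard authorization-code request in which the requested scopes mirror what the participant agreed to share. A minimal sketch, assuming a hypothetical authorization endpoint, client ID, and research-specific scope names (no real provider's scopes are implied); the request parameters `response_type`, `client_id`, `redirect_uri`, `scope`, and `state` are those of the OAuth 2.0 authorization code grant.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical mapping from data categories a participant can consent to,
# to research-specific OAuth2 scope strings.
SCOPE_FOR_CATEGORY = {
    "posts": "research.posts.read",
    "photos": "research.photos.read",
    "current_city": "research.profile.city.read",
    "education_history": "research.profile.education.read",
}


def authorization_request(endpoint, client_id, redirect_uri, consented_categories):
    """Build an OAuth 2.0 authorization-code request limited to consented data.

    Returns (url, state); the caller must keep `state` and compare it with the
    value echoed back on the redirect to guard against cross-site request forgery.
    """
    unknown = set(consented_categories) - set(SCOPE_FOR_CATEGORY)
    if unknown:
        raise ValueError(f"no research scope defined for: {sorted(unknown)}")
    # OAuth 2.0 scopes are a space-delimited list; duplicates are dropped.
    scope = " ".join(sorted({SCOPE_FOR_CATEGORY[c] for c in consented_categories}))
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{endpoint}?{urlencode(params)}", state


url, state = authorization_request(
    "https://social.example/oauth2/authorize",  # hypothetical endpoint
    "lab-study-app",                            # hypothetical client ID
    "https://lab.example/callback",
    ["posts", "current_city"],
)
print(url)
```

After the participant approves, the provider redirects back with an authorization code, which the lab exchanges for an access token limited to these scopes.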
In the UK, organizations like the UK Data Archive can be consulted to manage the privacy processes and publication of results without breaching privacy.

Gaps

The above description includes a few caveats that are based on the limitations of our technical abilities. For example, it's important that the participants understand that researchers would agree to a policy that prohibited attempts to re-identify participants within the dataset, but it would be difficult to make any guarantees along those lines given today's technical solutions. Similarly, there could be contractual limitations in place around deletion and retention; however, we are lacking technical systems to enforce the policies.
- The management of access to data and the risks associated with publication present an impediment to the use of social media data.
- Gathering informed consent from social media users is particularly problematic.
- To enable research of this kind, we need to streamline these processes and provide automatic verification of the safety of disclosures.

3.6 Innovation Ideas and Opportunities

Looking at 3-5 year opportunities and challenges:

One of the main opportunities lies in the ability to combine social data from different sources in order to conduct more insightful research and enable reproducibility of research. This will require technology to allow for privacy preservation, or the application of rules as the data is combined with other datasets.

How do we develop legislation, if it is not already in place, to set up a baseline that is not country-specific, since country-specific rules make it difficult for social media providers to comply with multiple forms of legislation? Ideally, there will be a mechanism for allowing social science research to be conducted on a global scale.

The essence of computational social science may become more common and "normal," compared to the niche role that computation currently has in the social sciences. A true limitation of the research area now is that only social media platforms have easy access to large-scale datasets. Most academics who work in the space have partnerships with corporate entities to acquire large datasets. How will the research community change when large-scale datasets are available to all social computing researchers?

Shifting norms are expected to continue even beyond the 3-5 year horizon, and this means that we expect continued deep uncertainty.

Open Questions
- What if we developed a "Common Program & Protocol" for infrastructure-level services to enable population-wide living labs social media research?
WhatifFacebooksupportedafeatureforusersto"optVin"forparticipationinpreV qualifiedresearchstudiesandwemodeled/testedthatasacommonservice availabletoanyapprovedmitlivinglabapplication?intheory,thissortof capabilitycouldenablerevusableoreasyupdateofconsentacrosssimultaneous researchstudiesandforfuturestudies.thistypeofservicecouldcomprise fundamentalcapabilitiesthatarenowmissingforoperationalizingfairpermissionv baseduseofpersonaldatainbigdatacontexts. AnOAuth2scopetypedevelopedforresearchcontentcouldbeamodelforother socialnetworkstouse.oneofthebestaspectsofthefacebookandgeneralweb 2.0designpatternwithOAuth2isthattheauthorizationscanbeseenona dashboardandindividuallymodifiedorrevokedaccordingtotheagreements, potentiallyatanytime. Howcouldacommonservicetypeandinterfacespecificationbeusedby researcherstoenableothersocialmediaproviders(e.g.linkedin,googleplus, Twitter)toprovideconsentVbaseddatausinginteroperableprogramsand 30
according to the standard protocol developed by MIT and Facebook? What issues of scaling, cost/risk management, business value, and usability would need to be addressed, and at what phase of design, development, testing, iteration, and deployment (alpha, beta, v1, v2)?

Could MIT Living Labs partner with Facebook to test a model OpenPDS (Personal Data Store) deployment that further developed infrastructure-grade service interfaces, pipes, and gauges? Would, or should, it matter whether OpenPDS was situated at the research institution (e.g. MIT for MIT Living Labs) or at a third-party provider?

Alternative A: Interactions of People

The participant has an account with the Social Media Provider, provides Informed Consent to Participate in the Study and, within the scope of the study, provides authorization to the Social Media Company to release personal data to Researchers via their applications.

A laboratory has an approved research study, has received the informed consent of individual participants, has registered an application with a social media provider, and has selected the OAuth2 scopes for grant of authorized access that correspond to the personal data used to conduct the research. Once the individual has provided the click-based grant of authorization, the lab's app uses an OAuth2 token to request and receive that individual's personal data via the app and into a research platform and database used to conduct the analysis.

The Social Media Provider provides an account to the individual under its terms and conditions, and provides a developer account to the lab under another set of terms and conditions. It also provides the personal data authorized by the individual for sharing with the lab's application, upon permission of the individual user.

Data

All past and current data available during the course of participation in the study through OAuth2 individual consent from the included social media providers.

3.7 Notes on Scenario

This example is based on a study that is currently happening at the Technical University of Denmark in collaboration with the MIT Human Dynamics Lab. However, references to potential downstream sharing arrangements by participants and researchers represent prospective future-phase research and assume a future state of perhaps 1-3 years from now.
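The OAuth2 flow in Alternative A can be sketched roughly as follows. This is a minimal illustration assuming a standard OAuth 2.0 authorization-code request (RFC 6749); the endpoint URL and the research scope names are hypothetical and do not correspond to any real provider's API. The second function reflects the fact that a provider may grant fewer scopes than requested, so the lab's app should check the granted scopes before pulling data.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values: real providers define their own endpoints and scope names.
AUTHORIZE_ENDPOINT = "https://provider.example/oauth2/authorize"
RESEARCH_SCOPES = [
    "research.study_1234.posts:read",
    "research.study_1234.friend_count:read",
]


def build_authorization_url(client_id, redirect_uri, scopes):
    """Build an authorization-code request; 'state' protects against CSRF."""
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # OAuth2 scopes are space-delimited
        "state": state,
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}", state


def missing_scopes(requested, granted_scope_string):
    """Return the scopes the study needs but the participant did not grant."""
    granted = set(granted_scope_string.split())
    return [s for s in requested if s not in granted]


if __name__ == "__main__":
    url, state = build_authorization_url(
        "lab-app-id", "https://lab.example/callback", RESEARCH_SCOPES)
    print(parse_qs(urlparse(url).query)["scope"][0])
    print(missing_scopes(RESEARCH_SCOPES, RESEARCH_SCOPES[0]))
```

Keeping each scope narrow (per study, per data type) is what would let a dashboard show and revoke research authorizations individually, as described above.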
3.8 References

Related to applicable rules:

* When Facebook has the data, these terms apply:
Platform Policy (applies via Researcher's registered "Client" App/Service)
Statement of Rights and Responsibilities
Data Use Policy
Facebook Community Standards
Facebook Principles

* When the Researchers Receive the Data
SensibleDTU Example Computational Social Science Research Study

* When the Participants Share Downstream Via Personal Data Services
MIT Human Dynamics Lab Model Personal Data System Rules stem_rules.md
Draft Data Rights Services Agreement Agreement.md
4 Data for Good: Public Good and Public Policy Research Using Sensor Data/Mobile Devices

Team: Jake Kendall (Gates Foundation), Yves-Alexandre de Montjoye (MIT), Cameron Kerry (MIT)

4.1 Abstract

There is little doubt that the capacity to collect and analyze mobile phone data at large scale has great potential for good [UN][D4D]. There are, however, numerous barriers that need to be overcome before this data can be broadly used by non-governmental organizations (NGOs) and researchers:
- The data is generated by the carriers' infrastructure and belongs to them;
- The infrastructure to manage and analyze this data at scale for good has to be developed;
- Data-science skills are needed within NGOs to take full advantage of the data;
- These data are highly sensitive and personal: simply anonymized mobile phone metadata has been shown to be re-identifiable [unique]; and
- The legal and regulatory environment is at best uncertain and may prevent certain uses of the data.

This group is studying the technical and legal solutions that could make this data available in an operational context. We first focus our analysis on two scenarios inspired by the available academic literature. We then sketch proposed practical implementations to operationalize these scenarios and analyze them from a privacy angle, focusing on re-identification, and from a legal perspective, with a focus on African countries.

4.2 Scenario Development

After considering a number of different scenarios, we focused on two that contrast in scope and purpose:

Scenario 1: Tracking population mobility within and across borders to model epidemic spread.

Scenario 2: Micro-targeting behavior change interventions to individuals or specific subsets of the population.
Scenario 1 is modeled on the use of location data coming from mobile phones in order to better understand and quantify the spread of malaria. The location of users is recorded at the antenna level every time a user interacts with his or her phone (phone call, text, or Internet session). This location data is used to estimate the user's migrations between a set of predefined regions, for example from Nairobi to Lake Victoria, as well as the total number of nights spent by every user in every region. The main expected outcomes of this work are two matrices that show the average monthly parasite importation by returning residents and by visitors. In the scenario we consider, such matrices would be computed on a monthly basis and shared with local CDCs, ministries of health, and NGOs. We also consider a case where data from multiple operators across neighboring countries would be used to estimate the monthly parasite importations per region. While this scenario has a clear public purpose, the sensitivity, re-identifiability, and potential for misuse of fine-grained
location data, such as targeting of individuals or groups for malicious purposes, have to be considered.

Scenario 2, inspired by [bigdatadriven], uses mobile phone metadata to micro-target people for specific behavior change purposes: agriculture techniques and health-seeking behaviors, for example. In this case, location data at the antenna level, as well as other metadata fields such as anonymized call and text logs (excluding content) and recharge information, are used to estimate an individual's status (farmer, other socio-economic status) and/or propensity to change behavior. In this scenario, mobile phone metadata are used by machine-learning algorithms through a set of pre-computed metrics (e.g. daily distance traveled, recharging behavior, time it takes to answer a text). Users can then be targeted for various behavior change or informational campaigns through text messages or phone calls sent by the carrier or by a third party. While computing the metrics requires a rich set of data, this scenario aims to emphasize the challenges associated with micro-targeting individuals, introducing an element of intrusiveness that is not present in Scenario 1 while serving the same public purposes.
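To make "pre-computed metrics" concrete, the sketch below computes one of them, daily distance traveled, from a single user's antenna-level events. It is an illustration under stated assumptions, not the metric definition used in [bigdatadriven]: it assumes each antenna ID maps to a known (latitude, longitude), and defines daily distance as the sum of great-circle hops between consecutive antennas on the same calendar day.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def daily_distance_km(events, antenna_coords):
    """events: iterable of (timestamp, antenna_id) for ONE user.
    antenna_coords: hypothetical lookup of antenna_id -> (lat, lon).
    Returns {date: km traveled between consecutive antennas that day}."""
    by_day = defaultdict(list)
    for ts, antenna in sorted(events):  # chronological order
        by_day[ts.date()].append(antenna_coords[antenna])
    return {
        day: sum(haversine_km(p, q) for p, q in zip(points, points[1:]))
        for day, points in by_day.items()
    }


if __name__ == "__main__":
    coords = {"ant_A": (0.0, 0.0), "ant_B": (0.0, 1.0)}
    events = [(datetime(2015, 3, 1, 8), "ant_A"),
              (datetime(2015, 3, 1, 12), "ant_B"),
              (datetime(2015, 3, 1, 19), "ant_A")]
    print(daily_distance_km(events, coords))
```

Even this single derived feature requires the full per-user event sequence, which illustrates why the metrics in Scenario 2 cannot be computed from coarsened or aggregated data.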
4.3 Operation of Scenarios

For each scenario, we propose two potential implementations. We will subsequently analyze these four implementations from a privacy angle and a legal perspective.

4.3.1 Scenario 1

In Scenario 1 implementation A, the different mobile network operators (MNOs) involved would share simply anonymized individual mobility data with one third party. To limit the risks of re-identification, the data would be coarsened spatially and temporally. Matching the study [quantifying], the spatial resolution of the data would be at a predefined regional level, or approximately 1000 km² (692 settlements for the 581,309 km² of Kenya). Similarly, given the importance of nights for malaria infections (mosquito bites), the temporal resolution of the data would be 12 hours (e.g. 6am-6pm). Finally, as malaria symptoms may take up to 30 days to manifest themselves, we work under the assumption that three months of such mobility data are needed to estimate the impact of human mobility on malaria. The different MNOs would share a salted, hashed version of each subscriber's mobile phone number to allow the third party to reconcile the data. Scenario 1 implementation A is represented below.
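The coarsening and pseudonymization steps of implementation A might look roughly like this. It is a minimal sketch, assuming a keyed hash (HMAC-SHA256) as one way to realize the "salted hash" shared across MNOs, a hypothetical secret held only by the MNOs, and a hypothetical antenna-to-region lookup. Because the space of phone numbers is small, anyone holding the secret could recompute pseudonyms by enumerating numbers, so the secret must never reach the third party.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical secret agreed among the MNOs; never shared with the third party.
MNO_SHARED_SECRET = b"example-secret-known-only-to-mnos"


def pseudonymize(msisdn: str, secret: bytes = MNO_SHARED_SECRET) -> str:
    """Keyed hash of a phone number: the same subscriber gets the same
    pseudonym at every MNO, so the third party can reconcile records."""
    return hmac.new(secret, msisdn.encode("utf-8"), hashlib.sha256).hexdigest()


def half_day_bin(ts: datetime) -> str:
    """Coarsen a timestamp to a 12-hour bin: 'day' is 06:00-18:00, 'night'
    is 18:00-06:00, with after-midnight hours credited to the previous date."""
    if 6 <= ts.hour < 18:
        return f"{ts.date()} day"
    night_of = ts.date() if ts.hour >= 18 else (ts - timedelta(days=1)).date()
    return f"{night_of} night"


def coarsen(records, antenna_to_region):
    """records: iterable of (msisdn, timestamp, antenna_id).
    Returns sorted, de-duplicated (pseudonym, time_bin, region) rows."""
    rows = {
        (pseudonymize(msisdn), half_day_bin(ts), antenna_to_region[antenna])
        for msisdn, ts, antenna in records
    }
    return sorted(rows)
```

Several events by the same person in the same region and 12-hour bin collapse to a single row, which is how the coarsening discards the fine-grained detail.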
In contrast to implementation A, in implementation B the MNOs only share aggregated information with third parties. In this implementation, every MNO will provide a modified version of the mobility matrices developed by [quantifying] to the third party. Using three months of data, every MNO will assign each of its users to one region. This region will be the user's home location. The MNO will provide the third party with a region-region matrix containing how much time users whose home is in region i have been spending in region j. For example, the row corresponding to region i will look like the following:

Region:         i-2    i-1    i      i+1    i+2
Share of time:  1%     2%     87%    0.5%   2%

This reads that all the users whose home location is in region i have been spending 87% of their time (e.g. hours or nights) in region i, 2% in region i-1, and 1% in region i-2 over the course of three months.

Each MNO will also provide the third party with the number of its subscribers who have been assigned to each region.
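The aggregation an MNO would run inside its own systems for implementation B can be sketched as follows. This is an illustration, not the procedure of [quantifying]: it assumes each input observation is one (user, region) pair per time bin (e.g. per night), takes the home region to be the region where a user is observed most often (ties broken alphabetically), and pools observations of all users sharing a home region, which is only one reading of "how much time users whose home is in region i have been spending in region j".

```python
from collections import Counter, defaultdict


def home_regions(observations):
    """observations: iterable of (user, region), one per time bin.
    Home = most frequently observed region (ties broken by region name)."""
    per_user = defaultdict(Counter)
    for user, region in observations:
        per_user[user][region] += 1
    return {
        user: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for user, counts in per_user.items()
    }


def mobility_matrix(observations):
    """Return (matrix, subscribers): matrix[i][j] is the share of time bins
    that users with home region i spent in region j; subscribers[i] is the
    number of users assigned to home region i."""
    obs = list(observations)
    homes = home_regions(obs)
    time_spent = defaultdict(Counter)
    for user, region in obs:
        time_spent[homes[user]][region] += 1
    matrix = {
        home: {region: n / sum(counts.values()) for region, n in counts.items()}
        for home, counts in time_spent.items()
    }
    return matrix, Counter(homes.values())
```

Only `matrix` and the per-region subscriber counts would leave the MNO; the individual observations stay behind its firewall.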
4.3.2 Scenario 2

Here we will also consider a third-party platform provider, although the architecture is fairly similar if only the MNO is involved. The only difference is that the end users would have to take it upon themselves to link to multiple MNOs if they wanted to be able to target clients of each.

Here the analytic transformation of the data conducted by the service provider would select a set of unique users (not identified by name or other PII, but by an encrypted key or other anonymous unique identifier), based on their usage patterns and inferences about their social status or other traits. The provider would then pass the unique IDs to the MNO, which would be able to match them to the corresponding phone numbers for re-contact with an SMS or automated voice message encouraging program participation.

Case 1: Third parties may analyze anonymous data to select individuals, but the mobile operator is the only one in touch with targets, and the targets are not identified to third parties. Third parties may pass back an encrypted key or other identifier to trigger sending a message.

Case 2: A third party is put directly in touch with the targets, or can identify them itself.

4.4 Regulatory Environment

A review of online sources on data privacy laws in Africa indicates a landscape that is evolving along two lines. Francophone countries in West Africa and North Africa that reflect the French civil code system have tended to adopt privacy frameworks modeled on the 1995 European Privacy Directive, supervised by data protection authorities. English-speaking countries with common law systems have less defined privacy laws.

Data protection authorities in a number of French-speaking countries around the world have united in an association under the leadership of the French CNIL, and at least
Benin, Burkina Faso, Gabon, Ivory Coast, Senegal, Madagascar, Mali, Mauritius, and Morocco have such privacy regimes in place, with new laws expected in Mauritania and Niger. Many countries (e.g., Côte d'Ivoire) in both categories do not have any data protection laws, but do appear to have constitutional provisions for a right to privacy that provide at least some authority for protection.

In the English-speaking countries, the systems are less developed. South Africa recently adopted legislation, the Protection of Personal Information Bill, that adopts privacy principles to be enforced by a data protection authority; it takes effect at the end of this year. Nigeria and Kenya are both considering broader bills that resemble each other.

Based on this framework, we will use the European Privacy Directive (EPD) as a benchmark for civil code countries. We will also look to the [Consumer Privacy Bill of Rights] as a way of exploring its application and developing an alternative framework.

4.5 Data Utility

Scenario 1

Implementation A: In this case, the utility seems close to that of having access to the full raw data. Data preprocessing and cleaning are harder to do on coarsened data, as unusual behavior (e.g. an unusually high number of phone calls) might be hidden by the coarsening.

Implementation B: In this case, the aggregation that is done at the MNO level decreases the utility of the data. Considerations include tracking people across borders, removing dual-SIM users, and taking specific periods of time into account.

4.6 Privacy

Implementation A: There is a risk of re-identification even when the data is coarsened. We will look at the number of antennas over several regions to match to the unicity formula on spatial resolution. Similarly, the temporal resolution here would be twelve hours. This should allow a very rough estimate of the likelihood of re-identification given x points.
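Alongside the analytical unicity formula from [unique], unicity can also be estimated empirically on the coarsened data itself. The sketch below is one such estimator, not the formula from [unique]: it assumes traces are sets of (region, 12-hour bin) points per pseudonym, repeatedly draws a user and p of that user's points, and counts how often no other user's trace contains all p points.

```python
import random


def unicity(traces, p, trials=1000, seed=0):
    """Estimate the fraction of users uniquely identified by p of their points.

    traces: dict mapping a pseudonym to a set of (region, time_bin) points.
    p: number of points an adversary is assumed to know about the target.
    """
    rng = random.Random(seed)
    eligible = sorted(u for u, pts in traces.items() if len(pts) >= p)
    if p < 1 or not eligible:
        raise ValueError("need p >= 1 and at least one user with >= p points")
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        known = set(rng.sample(sorted(traces[user]), p))
        # How many traces (including the target's own) contain all known points?
        matching = sum(1 for pts in traces.values() if known <= pts)
        if matching == 1:
            unique += 1
    return unique / trials
```

Running this for increasing p on the actual coarsened dataset would give the rough "likelihood of re-identification given x points" mentioned above, measured directly rather than extrapolated.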
Implementation B: When data is aggregated, the risk of re-identification is lower; the edge cases would be very small regions that have been assigned as home regions to very few people. The risk to consider here would be at the group level, e.g. people from one region who only go to one other region (members of the same ethnic group, for example). A related case would be people who spend an unusually large share of their time in another region. This goes beyond privacy understood purely as risk of re-identification, and many other cases should be considered.

5 Craig Mundie, in a recent Foreign Affairs article, suggests a new model where governance and regulations should be focused not so much at the point of collection and storage of personal data, but rather on how that personal data is used and retained. The President's Council of Advisors on Science & Technology (of which Craig Mundie is a member) echoed many of these recommendations and thoughts in their document, Big Data: Seizing Opportunities, Preserving Value; in particular, the belief that regulating use cases and enforcing privacy with stiff contractual obligations and deterrents may be needed to extract value while maintaining data security and privacy.
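For the small-region edge case of implementation B, one simple mitigation is to withhold the rows of the region-region matrix, and the subscriber counts, for home regions with fewer than k assigned subscribers. The sketch below assumes the matrix and counts are plain dictionaries and that k is a policy choice (the parameter is hypothetical). It does not address the group-level inferences described above, which survive suppression whenever the group is large.

```python
def suppress_small_regions(matrix, subscribers, k):
    """Release only rows whose home region has at least k subscribers.

    matrix: dict home_region -> dict region -> share of time
    subscribers: dict home_region -> number of subscribers assigned to it
    Returns (released_rows, released_counts, suppressed_regions).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    released, suppressed = {}, []
    for home in sorted(matrix):
        if subscribers.get(home, 0) >= k:
            released[home] = dict(matrix[home])
        else:
            suppressed.append(home)
    # Subscriber counts for suppressed regions are withheld as well.
    released_counts = {home: subscribers[home] for home in released}
    return released, released_counts, suppressed
```

Suppression trades utility for privacy: the smallest regions, which may matter most epidemiologically, are exactly the ones that disappear from the release.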
4.7 Critical Issues

- Business case for mobile carriers. Mobile carriers are not in the business of conducting social science or public health research. NGOs will need to develop a business plan that makes data-sharing interesting and worthwhile for the carriers from their own perspective. Support of governments (e.g., health ministries and communications regulators) will be pivotal.
- Scenario 1 presents technical issues of de-identification. The spatial and temporal coarsening of call detail records (CDRs) substantially mitigates privacy risks and, if strong enough, can sidestep the application of the EU Privacy Directive. However, it can also limit the reliability and utility of the data.
- In Scenario 2, de-identification, at least for significant applications, is not an option, because interventions will be targeted to specific individuals. This scenario will require engagement of governments to enable the data use and identification; without affirmative support by relevant health and data protection authorities, this scenario may be impossible. The involvement of governments will also require careful development of mechanisms to avoid misuse and unwanted identification.
- Specific practices and technical methods to manage privacy protection in accordance with various principles of the EU Privacy Directive and the Consumer Privacy Bill of Rights (e.g. data retention, accountability) still need further development.

4.8 Promising Paths Forward

Across both of these scenarios there are promising paths forward in terms of employing different technical architectures and practices to meet data protection needs while still extracting value from the data.

4.8.1 Scenario 1

In this case, there are already private sector companies that obtain mobility data from mobile operators and sell it without user permission (i.e., on the basis of ostensibly achieving anonymity).
Airsage is an example in the U.S. that demonstrates a number of innovative approaches to sharing anonymous mobility data. They improve the quality of the position signal over what a CDR would be able to provide through triangulation, which they achieve by upgrading the base station software of the MNO. They then install software within the MNO firewall that anonymizes the data by stripping it down to just mobility patterns, and aggregates the output to a minimum of seven mobile traces per observation. Hence, if two people moved from A to B in a given time period, they would report that "fewer than seven" people moved. The fact that they do their anonymization within the firewall removes the need to share raw data.

A company called Grandata in Mexico uses a form of differential privacy algorithm to add some random noise and limit the fidelity of queries on the mobility data that they sell to retail marketers.

Other techniques to explore further include emerging differential privacy approaches, as well as synthetic dataset generation via modeling methodologies (e.g. DP-WHERE).
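As a generic illustration of the differential privacy approach mentioned above (Grandata's actual algorithm is not described here), the sketch below applies the standard Laplace mechanism to an origin-destination count table. It assumes each person contributes at most one trip to the table, so the L1 sensitivity is 1 and noise of scale 1/epsilon suffices. Noise is added to every origin-destination pair, including zeros, so that the set of non-empty cells is not itself revealed. Laplace noise is drawn as the difference of two exponentials.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two Exponential(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_od_counts(counts, regions, epsilon, seed=None):
    """Release an origin-destination table under epsilon-differential privacy.

    counts: dict (origin, destination) -> number of people (missing = 0)
    regions: list of all regions, so every cell is released, even empty ones
    Assumes each person contributes to at most one cell (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    # Rounding and clamping at zero are post-processing and keep the guarantee.
    return {
        (o, d): max(0, round(counts.get((o, d), 0) + laplace_noise(scale, rng)))
        for o in regions
        for d in regions
    }
```

Smaller epsilon means stronger privacy and noisier counts; unlike the "fewer than seven" threshold, the guarantee does not depend on what other data an adversary holds.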
4.8.2 Scenario 2

Because decisions are being made about actions involving individuals or small groups in this scenario, and because individual-level data (rather than aggregate data) are being used, the fact that data is anonymized by being stripped of PII does not fully ameliorate privacy concerns. Some approaches to investigate here are:
- ID key encryption schemes and anonymization approaches that go as far as possible to protect individual identity.
- Some form of regulatory exception (e.g. specific legal authorization or a public policy exception) might also be in order, since even fully anonymized data would still refer to individuals.
- Development of ethical principles to make sure that decisions being made about individuals are fair and do not explicitly disadvantage anyone. This requires careful thinking about the user experience: SMS or calls that are clearly targeting the person might feel "creepy," and care should be taken not to make data subjects feel uncomfortable or targeted in any way.
- The development of trust frameworks to manage the data and verify the legitimacy of its uses.

4.9 References

Overview of African Privacy Regulation
[D4D]
[UN]
[unique]
[quantifying]
[bigdatadriven] epvas/edit?usp=sharing
Scenario development document TkP20/edit#heading=h.gjdgxs
5 Additional Use Cases

Summarized by Karen Sollins (MIT)

In addition to the three scenarios developed above, four other groups provided briefer reports. They are summarized here in order to further broaden our understanding of the breadth of the problem domain of privacy in the world of big data. These additional topics are: (1) Privacy in Aggregated Diverse Data Sets, (2) Creation, Management, Application, and Auditing of Consent on Personal Data, (3) Consumer Privacy/Retail Marketing, and (4) Genomics and Health.

5.1 Privacy in Aggregated Diverse Data Sets

Team: Evelyne Viegas (Microsoft), Micah Altman (MIT), Yves-Alexandre de Montjoye (MIT), Elizabeth Bruce (MIT)

Overview

Microsoft is working with the research community on developing an open source platform for hosting datasets and code for the machine learning research community. CodaLab is a Machine Learning Service that allows researchers to share and browse code and data, and to create and share experiments and workflows. CodaLab helps nurture an environment of scientific rigor and opens up new avenues for collaboration between researchers.

The characteristics of submitted data might vary widely. Such data includes well-known, previously published data, such as official statistics and community-managed data obtained from third parties; data collected by the authors of the submission, generally for their own research; and derivative datasets prepared specifically for a publication, which may integrate, correct, annotate, and recode data from multiple sources.

The emerging challenges in this area are related to the variety of data and the limited resources available for vetting it. Owners of community repositories are particularly concerned with developing policies that 1) are strong enough to support replicability, 2) can be applied without intense case-specific scrutiny, and 3) recognize common disclosure threats, while still permitting posting and access.
Stakeholders
- Data collector: wide range; any party that collects original data; no direct interaction with the service or main scenarios; may have set terms under which the data was originally collected
- Service host: provides the CodaLab service and hosts storage; may impose restrictions on use
- Data subjects: wide range; no direct interaction with the service or main scenarios
- Data curator: curators create "competitions" on the site, provide data to the service, set terms of use that are presented to competitors, and (optionally) vet competitors
- Data analyst: entrants in a particular competition, typically researchers who aim to develop or tune algorithms or models to optimize some quantitative competition criterion, such as % correctly predicted or mean-squared error (MSE)
- Data users: synonymous with data analysts
Questions and challenges

Key goals:
- Share research showing advancement in the field (not just incremental advances)
- Find experts who can work on a (societal) problem

Key risks:
- Re-identification attacks
- Inadvertent disclosure of personal information

Identified challenges:
- What is the data lifecycle?
- How does a service owner manage privacy-related risks resulting from running a service that accepts data from curators?
- Low-effort methods: these must apply to many different datasets of heterogeneous types without expert analysis of each database.
- Reuse across challenges: most competitions do not support reuse across challenges, or long-term access. In contrast, a goal of CodaLab is to contribute to a long-term evidence base for research in this area.
- Automatic or guided identification of PII currently focuses on medical/health data and may not be appropriate for the range of data being considered in this use case.
- How do we measure trade-offs between utility and privacy in this use case?
- Are there automated techniques for identifying potential PII in datasets being submitted by researchers?

5.2 Creation, Management, Application and Auditing of Consent on Personal Data

Team: Simon Thompson (BT), Karen Sollins (MIT), Arnie Rosenthal (Mitre)

Overview

Personal data has many stakeholders. This scenario focuses on the ability of the subject, as an important stakeholder, to influence how their data is treated (collected, shared, used, and protected), and on the ability of the controllers of personal data to abide by these preferences. Patients and other stakeholders must have incentives to share (with disincentives minimized), and must be able to trust others to behave as they say they will. Otherwise, patients may withhold data from clinicians and record holders will resist forwarding data to others, harming patients' health, increasing costs, and slowing operational improvements and research progress.

Personal data is of many kinds, often requiring different policies. These distinctions in kind are multi-dimensional, and no single distinction dominates. We note that audit metadata and the subject's own consent specifications are themselves personal data. They do not require fundamentally different treatment, but may have some specific policies attached.
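The observation that consent specifications are themselves personal data (illustrated by the abortion-preference example in the key challenges below) can be made concrete with tag inheritance: a preference record carries the sensitivity tags of the data it governs. This is a hypothetical sketch; the class names, the "consent-metadata" tag, and the release rule (a recipient must be cleared for every tag) are illustrative assumptions, not a mechanism proposed by the working group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    subject: str
    content: str
    tags: frozenset  # sensitivity labels, e.g. {"abortion"}


@dataclass(frozen=True)
class ConsentPreference:
    subject: str
    governs_tags: frozenset        # kinds of data this preference is about
    allowed_recipients: frozenset  # who may receive that data

    def as_personal_data(self) -> Record:
        """The preference reveals which kinds of data it governs, so as a
        record it inherits their tags, plus a consent-metadata marker."""
        return Record(
            subject=self.subject,
            content=f"consent:{sorted(self.allowed_recipients)}",
            tags=self.governs_tags | {"consent-metadata"},
        )


def may_release(record: Record, recipient_clearances: set) -> bool:
    """Hypothetical rule: release only if the recipient is cleared for every tag."""
    return record.tags <= recipient_clearances
```

Under this rule, a record holder cleared only for generic consent metadata could not even learn that an abortion-related preference exists, which is the confidentiality-versus-usability tension the challenge describes.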
This scenario is relevant to many important verticals, including several each in Healthcare, Education, and Commerce, but what is central to this scenario is the interplay among stakeholders' wishes. These depend on the kind of information involved. In particular, the subject may have different rights with regard to different kinds of data, especially in terms of medical content.

A critical aspect of this arena is that stakeholders, especially the subjects, deserve appropriate controls, but can rarely handle the technical complexity of specifying them. They need a way to customize behavior so that it is approximately correct. The regulatory framework may need to allow for situations where the user did not specify or understand all behavioral details (just as it allows sign-off on legalese that few citizens understand).

Stakeholders:

The key stakeholders considered in this review are:
- Data Subjects: those described by the data
- Record holders: the collectors and repositories
- Recipients: those who may receive the data, including, for example with medical records, caregivers, payers, researchers, marketers, or legal authorities, who then may become record holders

Questions, challenges, and observations:

Key Goals:
- Provide subjects with appropriate (to themselves) understanding and control (user preferences) over privacy policies for information about themselves.
- Balance the interplay between the interests and responsibilities of different stakeholders, for example the subject, regulators, caregivers, insurance companies, etc.
- Tag or otherwise label and govern data in order to enable the application of policies.
- Certify and maintain the quality of the data.

Key Challenges:
- Preference data is itself metadata about the subject: Consumer preference data must be tagged by what content the preference itself reveals. A patient preference about releasing abortion data should itself be tagged as abortion-related, and cannot be shared with all record holders. It is an open question how best to combine confidentiality and usability for such data.
- Standards for composition when global standards are impossible: Global standards, complied with across all industries, are unlikely, especially as one adds more and more details. (A few basic practices might be standardized and complied with, but not the diversity in a modern economy.) How should stakeholders express policies that are robust, even when some information is absent?
- The diversity of enforcement mechanisms will complicate implementation: Techniques for a major corporation may be inappropriate for a small business, and techniques suitable for managing large documents may be inappropriate for
millions of values in a database. For example, omitting a document differs from redacting a database value (whose absence may be noted).
- Trust: To provide an effective privacy management mechanism, the privacy metadata of personal information must be trusted and used by trusted components; i.e., one needs an effective trust network that assures that everyone will behave appropriately.

5.3 Consumer Privacy/Retail Marketing

Team: John Ellenberger (SAP), Ilaria Liccardi (MIT), Dazza Greenwood (MIT)

Overview:

This group considered a specific example in marketing: a customer loyalty program at a brick-and-mortar retailer. They envisioned a system with three elements: (1) the customer's smartphone, (2) a cloud-based intermediary service, and (3) the retailer's backend. The intermediary service handles communication with the smartphone, both collecting data and pushing offers. The retailer's backend collects, manages, and utilizes the customer data, and as part of that provides the support for any privacy policies and meets any legal requirements for privacy.

As an extension to this, the group also considered a case where third-party data may become available to the backend service. The group considered the problem of mapping between the "identified" data collected by the retailer and the potentially anonymized data from a third-party marketing firm.

Stakeholders:
- Subject
- Cloud service provider
- Retailer running the backend data collection, management, and analysis services
- Possible third-party marketing data source

Key goals:
- Improve the customer experience in the store
- Increase the retailer's market share
- To the extent there are regulatory requirements on privacy policy enforcement, comply with the law

Key challenges:
- Fusion of identified data, legitimately collected by the retailer, with third-party marketing data. Simply fusing these correctly is extremely difficult.
- To the extent that merging data may create "new data" about the subject, this is subject to regulation, especially in Europe, and it will require permission from the subject.
- Inference of other facts about a subject from the base-level information. For example, it is well understood that patterns of "likes" may be a good predictor of preferences that are not directly exposed and are therefore subject to privacy policies. The demonstrated example is prediction of sexual preferences.

More broadly, this group did not consider the ethics of these approaches.
5.4 Genomics and Health

Team: James Williams (Google/University of Toronto), Michael Power (Osgoode Hall Law School)

Overview:

This scenario focuses on sharing health information (including genetic information) for both health-related research and personalized medicine. The scenario involves numerous healthcare providers (e.g., hospitals) and research groups (e.g., universities) collaborating to exchange information for a variety of purposes, including the provision of care. As a result, it is inherently complex; not only are there numerous organizations involved, but each of these may be subject to different legal requirements based on the jurisdiction (e.g., country, state, province) in which it operates.

While advances in genomic research methods have major ramifications for the biological sciences in general, they are particularly interesting from the standpoint of health-related research. In fact, some researchers have argued that the analysis of large genomic databases (i.e., containing millions of samples, as opposed to thousands) may be the key to unlocking new discoveries related to human health. To name but two advantages: 1) larger datasets empower researchers by supporting a wider range of queries and observations, and 2) the use of modern, distributed computing infrastructure supports interactive modes of research that offer major advantages over traditional approaches.

The situation becomes even more pressing when one realizes that many research problems can only be answered by combining genotype and phenotype data. In practice, this means merging genomic repositories with electronic medical records (EMRs). Indeed, the emerging field of personalized medicine is based on the ability to correlate information between these two domains. Given the multitude of health-related issues facing human populations, and the promise of genomic research and personalized medicine to address a significant number of them, it is important to develop tools and methods for fostering the sharing of genetic and phenotypic information for research purposes.
Of course, privacy is one of the most commonly cited concerns that arise when individuals are surveyed about their attitudes towards sharing health information. It is vital that such data sharing be accomplished in a manner that minimizes risks to privacy. As part of respecting privacy, individuals must be provided with the ability to control the use of their information, including withdrawing consent.

While informational privacy concerns are explicitly addressed in data protection law, fair information practices, and data sharing agreements, it is an open question whether we can design better mechanisms to give effect to these norms.

Stakeholders:
- Patients, subjects of the data
- Clinicians, including both physicians and allied health professionals
- Researchers
- Healthcare service providers
- Institutional Review Boards (IRBs) or Research Ethics Boards (REBs)
- Regulators
Key Goals:
- Delivery of timely and effective healthcare (patients, clinicians)
- Participate in research (patients, possibly clinicians, researchers)
- Act in accordance with "fiduciary" responsibility (clinicians)
- Obtain and utilize large genomic datasets (researchers)
- Obtain and utilize large clinical (i.e. phenotype) datasets (researchers)
- Integrate across these two types of datasets (researchers)
- Maximize efficiency of healthcare delivery (healthcare service providers)
- Utilize (and profit from) intellectual property inherent in patient records (healthcare service providers)
- Maintain security of records systems (healthcare service providers)
- Minimize privacy risks (regulators)
- Provide recourse for privacy violations (regulators)

Key Challenges:
- At present, integration is almost impossible. Most dataset access is restricted to people within the organization collecting the data.
- Integration across different regulatory authorities is poorly understood.
- The trade-offs between privacy and utility in the context of technical privacy preservation mechanisms are particularly acute in the case of genomic research.
- There is also a tension between the ability of patients (data subjects) to control the use of their information and the ability of researchers to accumulate stable datasets for research purposes. For instance, dynamic consent mechanisms give patients control of data at the expense of researchers, whose activities may be interdicted by requests to remove data from their corpus.
- Enabling inter-jurisdictional transfer of data may require the harmonization of regulatory regimes, as well as the adoption of common standards.
- The current transaction costs for data sharing agreements are onerous for many organizations, creating a landscape of "silos" of health information that have great utility but cannot be accessed.
- Existing approaches to sharing health data between organizations rely heavily upon bilateral data sharing agreements. This approach scales poorly when there are multiple organizations that wish to jointly share data.
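The tension between dynamic consent and stable research datasets can be illustrated with a small registry that researchers must consult before every analysis. This is a hypothetical sketch, not a system described by the team: the class, the record layout, and the `participant_id` field are assumptions. It shows why a withdrawal immediately shrinks the research view, and why the registry should keep an append-only history for auditing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRegistry:
    """Hypothetical dynamic-consent registry: current status per participant
    plus an append-only history of grants and withdrawals."""
    _active: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def _record(self, participant_id, active):
        self._active[participant_id] = active
        action = "grant" if active else "withdraw"
        self.history.append((datetime.now(timezone.utc), participant_id, action))

    def grant(self, participant_id):
        self._record(participant_id, True)

    def withdraw(self, participant_id):
        self._record(participant_id, False)

    def is_active(self, participant_id):
        return self._active.get(participant_id, False)


def research_view(records, registry):
    """Records whose participant currently consents; recompute before each
    analysis, since the view changes whenever consent is withdrawn."""
    return [r for r in records if registry.is_active(r["participant_id"])]
```

Participants who never granted consent are excluded by default, and a study that reruns an analysis after a withdrawal will see a smaller corpus, which is exactly the instability researchers worry about.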
6 Conclusions

Karen Sollins (MIT)

We generalize three sets of conclusions from the review of the scenarios described above in sections 2 through 5. The first is a set of overarching challenges derived from the systemic approaches taken across these big data scenarios in consideration of privacy. The second is a common, although not universal, set of types of stakeholders involved both in handling the big data itself and in supporting the application of privacy policies. Finally, we observe a number of key open questions raised by the set of scenarios.

We observe five key challenges from the scenarios:
- Scale: Not only are we observing increasing sizes of datasets, but those increases in size will also lead to increases in the size of the accompanying meta-data that is critical to the support of privacy. Without significant improvements in efficiency, the growth in both data and meta-data will lead to untenable processing times, and those efficiency gains must be achieved without cost to privacy.
- Diversity: With increasing dataset sizes will also come an increase in interests and types of responsibilities. This increase is likely to raise the probability of non-aligned interests. This diversity of objectives and interests will lead at least to a divergence of privacy policies, and more likely to increased incompatibility of privacy policies. Capabilities for both observing and handling such differences will become increasingly important.
- Integration: In addition to the points above about scale and diversity, services increasingly support the integration of previously independent datasets. At a minimum, this can lead to surprising or unintended inferences across these newly integrated datasets, resulting in previously unknown facts about subjects. Thus a new challenge arises from this integration in terms of privacy policies for these newly discovered facts or data.
- Impact on secondary participants: Although data may itself have a primary subject, increasingly there will also be secondary participants or subjects, such as friends, parents, guardians, or bystanders, reflected in the data. Providing privacy through privacy policies for these secondary participants may be even more challenging than for the primary subjects of data.
- Need for emergent privacy policies for emergent data: Integration may lead to emergent, or previously unobservable, data about subjects. This newly observable data will also require privacy policies, and it is not clear that those new policies will simply be a derivative of the policies applicable to the underlying original data. It is likely that new, emergent privacy policies will be needed, and the challenge is how those new policies will be created, by whom, and under what conditions.

The second set of key observations we derive from these scenarios is a list of types of stakeholders who play a role in setting and enforcing privacy policies, and in mitigating failures in their application. We begin with the subjects of the data itself. In some cases, but not all, they play a role in determining applicable privacy policies. Additionally, a decision-maker, who decides what data to collect and how to handle it, may play a significant or central role in setting privacy policies. From there we move to the "handlers" of the data. That data will be collected by some party, and may be separately curated for completeness, accuracy, and so forth by a curator. The data may then be stored, managed, and made available by a data platform provider. It will then be used by a data analyst. All of these last four have access to
48 thedatainoneformoranother.wehavethenalsoidentifiedtwoadditiontypesof stakeholders,whoserolesfocusonenforcementofprivacypoliciesandrecordingor auditingofusageofthedata.thesetwofinalrolesaredistinctfromeach.itispossibleto haveauditingwithoutenforcement,foreitherlegalormitigationreasons,ifapolicyis violated.enforcementbenefitssignificantlyfromauditing,butisnotdependentonit. Finally,werecognizethattherearemanyopenquestions.Wehighlightfourhere: Novelty:Althoughweidentifiedanumberofchallengesabove,thereremainsa questionofwhetherbigdataleadstonewanduniquechallengesintheprovision ofprivacy,orwhetherthesechallengesareonlymoreobviousinthebigdata arena. Tradeoff:Eachofthescenariospresentsasignificantbenefit.Thesemaybe economic,social,medical,andsoforth.inaddition,eachpresentsriskstoprivacy, bothinherentlyandperhapsbecausethesituationisstillnewandnotwell understood.wemustaskhowtoevaluatethetradeoffsbetweenbenefitsandrisks, specificallytoprivacy.atthispoint,wedonotevenhaveametricorspectrum alongwhichtoconsiderthistradeoff,anditisnotclearthatasingleoneexists. Harm:Therisktoprivacymentionedaboveisneitherbinarynornecessarilystable. Thisleadstoaquestionofwhetherandhowtoevaluatetheharmthatmayresult fromdifferentchoicesinthetradeoffspacebetweenbenefitsandrisks. Trust:Trustreflectsawillingnessamongstakeholderstoacceptvulnerabilities. Thus,wemustaskhowitisthatstakeholdersdeterminetheirleveloftrustor mistrustinotherstakeholders,withrespecttotheapplicabilityofprivacypolicies. Thisincludesboththestakeholders modelsoftrust,howthoserelatetopeople s perceptionsofeachother,aswellaswhatmechanismsandtechnologiescan provideinsupportofthoselevelsoftrust.furthermore,onemustaskhowsuch trustevolveswithtimeandhowthatmightbesupportedtechnically. 
It is important to recognize that our observations here are limited. They are based on this limited set of scenarios and, even in that context, may be incomplete. They are presented to give the reader a clearer sense of the sorts of challenges and questions that arise from the intersection of big data and privacy.
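The Integration and emergent-data challenges above can be made concrete with a small sketch of a linkage across two data sets. It is illustrative only: every record, field name, and value is hypothetical, and joining on shared quasi-identifiers (here a ZIP code and birth year) is one familiar way such inferences arise, not a mechanism described in the scenarios.

```python
# Hypothetical records; field names and values are illustrative only.

# Data set A: course activity with names removed.
course_activity = [
    {"zip": "02139", "birth_year": 1990, "course": "C1", "forum_posts": 42},
    {"zip": "02139", "birth_year": 1985, "course": "C1", "forum_posts": 3},
    {"zip": "02142", "birth_year": 1990, "course": "C2", "forum_posts": 17},
]

# Data set B: a separately published roster with names but no course data.
public_roster = [
    {"name": "Alice", "zip": "02139", "birth_year": 1990},
    {"name": "Carol", "zip": "02139", "birth_year": 1985},
    {"name": "Bob", "zip": "02142", "birth_year": 1990},
    {"name": "Dave", "zip": "02142", "birth_year": 1990},
]

QUASI_IDENTIFIERS = ("zip", "birth_year")


def link(left, right, keys):
    """Join two record lists on shared quasi-identifier fields.

    Only unique matches are kept: a unique match re-identifies the record,
    while several matches leave the identity ambiguous.
    """
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:
            linked.append({**row, "name": matches[0]["name"]})
    return linked


reidentified = link(course_activity, public_roster, QUASI_IDENTIFIERS)
for record in reidentified:
    print(record["name"], record["course"], record["forum_posts"])
```

Neither data set alone pairs a name with forum activity; that pairing exists only after integration, which is the kind of emergent data for which no policy may yet apply. The third course record escapes re-identification only because two roster entries share its quasi-identifiers.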
A. Appendix: Privacy Scenario Template
Team: Simon Thompson (BT), Dazza Greenwood (MIT Media Lab)

Elements of Big Data scenario
  People/Stakeholders? (i.e., Who are the parties, and what are their respective roles and relationships? Who is the data owner (data controller)? Who is using the data and what is the intended purpose? Who are the data subjects? Who is doing the data analytics?)
  Interactions? (i.e., What transactions or other exchanges occur between Actors? What is the power dynamic?)
  Data
    What kind of personal data?*
    What type of big data models, analytics, or other outputs result from this scenario?
    How is the data used?
    What is the data lifecycle?
  Systems? (i.e., What business, legal, technical, or social systems matter most?)
    Business Systems (Ethics committees, sign-off by authorized officers, record keeping, audit)
    Legal Systems (Contracts, employee rules/procedures, certification/accreditations, compliance reviews, insurance/bonding requirements, industry standard policy/guidelines, etc.)
    Technical Systems (System permissions and security, alarms & automated detection of pai, automatic anonymization of data, cryptography, etc.)
    Social Systems (What social systems and context exist?)

Analysis of scenario
  Goals (i.e., What are the incentives and the benefits driving the Actors? Who benefits? What are the financial incentives?)
  Rules (i.e., What are the relevant laws and regulations, and other enforceable rules?)
    Are there existing statutes, contractual agreements or other commitments associated with the data?
      i. Rules about retention?
      ii. Liability for breach?
      iii. Accuracy?
      iv. Others...
    If there are no statutory or other binding rules, how would the principles from the Consumer Privacy Bill of Rights guide the development of rules?
      i. INDIVIDUAL CONTROL: Consumers have a right to exercise control over what personal data companies collect from them and how they use it.
      ii. TRANSPARENCY: Consumers have a right to easily understandable and accessible information about privacy and security practices.
      iii. RESPECT FOR CONTEXT: Consumers have a right to expect that companies will collect, use, and disclose personal data in ways that are consistent with the context in which consumers provide the data.
      iv. SECURITY: Consumers have a right to secure and responsible handling of personal data.
      v. ACCESS AND ACCURACY: Consumers have a right to access and correct personal data in usable formats, in a manner that is appropriate to the sensitivity of the data and the risk of adverse consequences to consumers if the data is inaccurate.
      vi. FOCUSED COLLECTION: Consumers have a right to reasonable limits on the personal data that companies collect and retain.
      vii. ACCOUNTABILITY: Consumers have a right to have personal data handled by companies with appropriate measures in place to assure they adhere to the Consumer Privacy Bill of Rights.
  Risks: What are the potential harms? What are the risks of those harms occurring? To whom? If the risk is an externality, how might it be mitigated?

Assessment of scenario
  Existing or related best practices for the context of this scenario
    What business, legal, and/or technical best practices apply?
  Gap
    Issues Not Addressed by Existing Practices and Solutions
      Business Systems
      Legal Systems
      Technical Systems
      Social Systems
    Shortfall Between Current and Needed Practices and Solutions

Key outcomes for each scenario
  Promising best practices
  Gaps that need to be filled with new technical solutions or policy approaches

* Personal Data is defined broadly, as follows, in the Consumer Privacy Bill of Rights: "This term refers to any data, including aggregations of data, which is linkable to a specific individual. Personal data may include data that is linked to a specific computer or other device. For example, an identifier on a smartphone or family computer that is used to build a usage profile is personal data. This definition provides the flexibility that is necessary to capture the many kinds of data about consumers that commercial entities collect, use, and disclose."
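The template's definition of personal data covers an identifier that is used to build a usage profile. A minimal sketch, assuming keyed hashing (HMAC-SHA256 from Python's standard library) as the pseudonymization step, suggests why removing the raw identifier does not by itself take data outside that definition: each device still maps to one stable token, so per-device profiles can still be assembled. The key, identifiers, and events are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the data collector. In practice it would be
# generated randomly and stored separately from the data it protects.
SECRET_KEY = b"example-key-held-by-collector"


def pseudonymize(device_id: str) -> str:
    """Replace a device identifier with a keyed HMAC-SHA256 token."""
    return hmac.new(SECRET_KEY, device_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical usage events keyed by a smartphone identifier.
events = [
    ("device-123", "opened news app"),
    ("device-456", "searched for clinic hours"),
    ("device-123", "searched for clinic hours"),
]

profiles = {}
for device_id, action in events:
    # The raw identifier is dropped, but the same device always yields the
    # same token, so a per-device usage profile is still built.
    profiles.setdefault(pseudonymize(device_id), []).append(action)

for token, actions in profiles.items():
    print(token[:12], actions)
```

Because the token is stable, anyone holding the pseudonymized events can still build the kind of usage profile the definition describes, and anyone holding the key can map tokens back to devices.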
B. Appendix: Stakeholders
Elizabeth Bruce (MIT), Karen Sollins (MIT)

Data Stakeholders: Description/Examples

"data collector"
  Party that collects the "raw" or original data from the data subjects.

"data subject(s)"
  A person (e.g. a patient, student, customer) or group of people (or entity) from whom data is being collected; this is the group of data providers or participants. Subjects may contribute data with informed consent (e.g. by opting in to a research study), or data may be collected indirectly or in aggregate. Data may be generated by:
    an individual/consumer (e.g. taking an online class, a customer at a bank)
    the interactions of a group of individuals (e.g. peer-to-peer interactions; social network graphs)
    combining/aggregating data over a group/population of subjects.

"data curator" (also: controller, provider or caretaker)
  Party that stores and manages the data and is responsible for granting/controlling access to the data; the data curator is often the stakeholder that requires others to formally submit to a policy (or data use agreement) in order to access the data. There may be more than one data curator:
    original data curator
    third-party data curators

"data analyst" (also: data scientist)
  Party doing the analytics on the data; may use many different types of tools, software, etc. for analysis, exploration and visualization ("relying party").

"decision maker"
  The stakeholder(s) that benefit from the data. A decision maker ultimately derives new insights and value from the data analysis, makes decisions based on the data, and may or may not take action for some purpose. This purpose or use of the data may be for: personal benefit; for-profit or commercial use; or societal benefit (e.g. NGOs/government). The data beneficiary may be:
    an individual
    a group of individuals
    an institution or organization (private; commercial; government; non-profit)
    a content provider
    a service provider

"data platform provider"
  The party that builds the system(s) for data collection and provides a service. The platform provider and data collector may or may not be the same entity/organization. If they are different, the platform provider may have its own data use policy separate from that of the data collector.

"data regulator(s)"
  An arbiter that sets policies; the governing regulatory body that develops policies controlling data collection, sharing and use among stakeholders. It could be at the local, state, federal, or international level (e.g. HIPAA, FERPA, etc.).

"data auditor"
  The enforcing body responsible for ensuring that policies and regulations are enforced. It may require audit logging and documentation to ensure that policies are enforced and that data is managed as required.
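The data auditor row above, together with the conclusions' point that auditing and enforcement are distinct roles, can be illustrated with a small sketch in which the two are switched on and off independently. It assumes a hypothetical policy expressed as permitted (role, purpose) pairs per data set; the class and function names are illustrative and do not correspond to any system in the scenarios.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AccessRequest:
    requester: str
    role: str      # stakeholder role, e.g. "data_analyst"
    purpose: str
    dataset: str


@dataclass
class AuditLog:
    """Append-only record of requests, kept independently of enforcement."""
    entries: list = field(default_factory=list)

    def record(self, request: AccessRequest, compliant: bool, granted: bool) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "requester": request.requester,
            "dataset": request.dataset,
            "purpose": request.purpose,
            "compliant": compliant,
            "granted": granted,
        })


# Hypothetical policy: (role, purpose) pairs permitted for each data set.
POLICY = {
    "course_clickstream": {
        ("data_analyst", "research"),
        ("decision_maker", "course_improvement"),
    },
}


def complies(request: AccessRequest) -> bool:
    """Check the request against the policy without acting on the result."""
    return (request.role, request.purpose) in POLICY.get(request.dataset, set())


def handle(request: AccessRequest, log: Optional[AuditLog] = None,
           enforcing: bool = True) -> bool:
    """Grant or deny a request; enforcement and auditing are switched separately."""
    compliant = complies(request)
    # With enforcement off, every request is granted, but a violation can
    # still be detected later from the audit log.
    granted = compliant or not enforcing
    if log is not None:
        log.record(request, compliant, granted)
    return granted


log = AuditLog()
request = AccessRequest("analyst-1", "data_analyst", "marketing", "course_clickstream")

# Auditing without enforcement: access is granted, the violation is recorded.
granted = handle(request, log, enforcing=False)
print(granted, log.entries[-1]["compliant"])

# Enforcement without auditing: access is denied, nothing new is recorded.
print(handle(request, log=None, enforcing=True), len(log.entries))
```

With enforcement off, the non-compliant request is granted but the log records the violation, which is what makes later legal or mitigation action possible; with auditing off, the same request is denied and leaves no record.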
C. Appendix: Stakeholder Data from MOOCs and Online Learning Environments (OLEs)
Elizabeth Bruce (MIT)

Data Stakeholder: Example

Type of Data
  All clickstream data capturing interactions between student and content, including when students watch videos/lessons, quiz answers, text from discussion forums, etc. Use of videos and other e-resources, such as digitized reference material, wikis, and forums.
  Assessment behavior: attempts, correctness, use of immediate feedback.
  May include PII (name, address), depending on what information is required when registering for the course.
  Self-reported background; pre- and post-test surveys.

Data Subjects
  Students who take the online course, complete assignments and receive credit.
  Students have little or no access to their data beyond the official grades/records created for their educational purposes.

Data Platform Provider
  Coursera, edX, Udacity, Stanford U, etc.

Content Provider
  Individual Content Providers include faculty, teachers, and staff who provide the teaching content and material (videos, lessons, quizzes, etc.), support discussions, interact with students (the data subjects) directly, and are responsible for grading/credit.
  Institutional Content Providers include institutions and organizations that are behind the teaching content (e.g. MIT, Harvard, or an individual private enterprise).

Data Collector
  Data Platform Providers and Institutional Content Providers

Data Curator
  Data Platform Providers and Institutional Content Providers

Data Scientist
  Analysts include researchers, their students (if the researchers are academics), and education technologists. Teaching staff, platform providers, and institutional content providers may also act as analysts.

Decision Maker
  Typically the Data Platform Providers and Institutional Content Providers, sometimes the Individual Content Providers (i.e. the teachers)

Data Auditor (and Compliance)
  Government

Data Regulator
  Government; FERPA policies
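Appendix C lists self-reported background surveys among the data that may reach analysts. One common screen a data curator might apply before sharing such records is a k-anonymity check on quasi-identifiers; the sketch below assumes that technique, which the report does not prescribe, and uses hypothetical field names and records. Passing the check limits singling out individuals by those fields but does not rule out every inference.

```python
from collections import Counter

# Hypothetical de-identified survey records from an online course.
records = [
    {"country": "US", "age_band": "20-29", "education": "bachelor"},
    {"country": "US", "age_band": "20-29", "education": "bachelor"},
    {"country": "US", "age_band": "30-39", "education": "master"},
    {"country": "IN", "age_band": "20-29", "education": "bachelor"},
    {"country": "IN", "age_band": "20-29", "education": "bachelor"},
]

QUASI_IDENTIFIERS = ("country", "age_band", "education")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def is_k_anonymous(rows, keys, k):
    """True if every combination of values is shared by at least k rows."""
    sizes = group_sizes(rows, keys)
    return bool(sizes) and min(sizes.values()) >= k


def groups_below_k(rows, keys, k):
    """Combinations that would single out fewer than k students."""
    return [combo for combo, n in group_sizes(rows, keys).items() if n < k]


print(is_k_anonymous(records, QUASI_IDENTIFIERS, k=2))
print(groups_below_k(records, QUASI_IDENTIFIERS, k=2))
```

In this toy example, releasing country alone would leave every group with at least two students, at the cost of less detail for analysts.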