What is Big Data used for?
Harnessing scientific discoveries
Initiating early warning of natural disasters (e.g., floods, volcanic eruptions, and earthquakes)
Reports
» Track business processes, transactions
What is Big Data used for?
Diagnosis
» Why is user engagement dropping?
» Why is the system slow?
» Prevent failures
» Detect spam, worms, viruses, DDoS attacks
Decisions
» Personalized medical treatment
» Decide what ads to show
Who is collecting what?
Credit Card Companies
What data are they getting?
» Airline ticket
» Restaurant check
» Grocery bill
» Hotel bill
Adopted from a presentation by Stu Miller, How Big Data will change your life, at Osher Lifelong Learning Institute
Why are they collecting all this data?
Target Marketing
» To send you catalogs for exactly the merchandise you typically purchase.
» To suggest medications that precisely match your medical history.
» To push television channels to your set instead of your pulling them in.
» To send advertisements on those channels just for you!
Targeted Information
» To know what you need before you even know you need it, based on past purchasing habits!
» To notify you of your expiring driver's license or credit cards, or your last refill on an Rx, etc.
» To give you turn-by-turn directions to a shelter in case of emergency.
Adopted from a presentation by Stu Miller, How Big Data will change your life, at Osher Lifelong Learning Institute
5 Ways Big Data Will Change the World
Medicine
Aetna is using large volumes of data to support early diagnosis, prevention, and treatment of heart disease and diabetes. UCLA is using Big Data analysis to prevent complications from brain injuries. The American Society of Clinical Oncology is using Big Data to help it find the best treatments for cancer.
http://insights.wired.com/profiles/blogs/5-ways-big-data-will-change-the-world#axzz3nhefva1j
Security
There is a pedometer application that can identify people based on their gait, that is, how they walk. A new security firm called Pindrop is using Big Data analysis to help banks and other financial institutions identify callers and ensure the person on the other end of the line is who they say they are. Pindrop can analyze more than 100 different background sounds on a phone call to tell where the call is coming from and whether it is a cell phone, a landline, or VoIP. They can tell you if the person claiming to be in Nebraska is actually calling from Nigeria.
http://insights.wired.com/profiles/blogs/5-ways-big-data-will-change-the-world#axzz3nhefva1j
Urban Planning
Tracking the movements of people, and how that could impact urban planning. Cities are using data discovery techniques to examine the many ways small changes can affect large urban centers. The Urban Center for Computational Data talks about computer models that help cities figure out how something like a new bus line might impact crime, employment, and energy usage in different parts of a city. There is little question that the way our cities are built and function will be changed by data analytics.
http://insights.wired.com/profiles/blogs/5-ways-big-data-will-change-the-world#axzz3nhefva1j
Consumer Products
The tremendous rise in online shopping has created piles of data that help companies better understand what consumers want and how they shop. It even allows companies to customize their pricing models based on who is shopping and when they want to buy.
http://insights.wired.com/profiles/blogs/5-ways-big-data-will-change-the-world#axzz3nhefva1j
Elections
In the 2012 presidential election, the Obama campaign made use of voter models on a scale never before seen. They were able to identify specific voters who would make a difference in the election and target messages to those voters. I am not talking about something general like "we need to appeal to soccer moms"; I am talking about true specifics like "the Johnson family on Maple Lane in Columbus, Ohio will vote for us if they know our stance on Social Security." It seems insane to think that presidential politics has gotten that local, but it has, and it worked. There is little question that the Obama campaign's sophisticated get-out-the-vote and swing-voter identification methods swung a very close election their way.
http://insights.wired.com/profiles/blogs/5-ways-big-data-will-change-the-world#axzz3nhefva1j
Usage Examples of Big Data
What do self-driving cars have to do with Big Data?
Computers in cars know where you go, when you go, how fast you go, how many times you stop along the way, whether you stay in your lane, what your average MPG is, how you like your temperature, how close you get before stepping on the brake, and tens of thousands of other facts, instantly. Analyzing all of this data rapidly allows a self-driving car to:
» Anticipate where you are going by looking at your driving history
» Check road signs using sensors to know what the speed limit is or whether a stop sign is approaching
» Alert you and activate your braking and steering systems if pedestrians are in the street, you're too close to the curb, you drift into another lane, or you doze off
Adopted from a presentation by Stu Miller, How Big Data will change your life, at Osher Lifelong Learning Institute
Usage Example of Big Data: Moneyball: The Art of Winning an Unfair Game
Oakland Athletics baseball team and its general manager Billy Beane
- The Oakland A's front office took advantage of more analytical gauges of player performance to field a team that could compete successfully against richer competitors in MLB
- Oakland: approximately $41 million in salary; New York Yankees: $125 million in payroll that same season. Oakland was forced to find players undervalued by the market
- Moneyball had a huge impact on other teams in MLB
And there is a Moneyball movie!
Adopted from a presentation by Kayvan Tirdad, The Age of Big Data, York University
Usage Example of Big Data: US 2012 Election
- Predictive modeling
- mybarackobama.com, to drive traffic to other campaign sites: Facebook page (33 million "likes"), YouTube channel (240,000 subscribers and 246 million page views)
- A contest to dine with Sarah Jessica Parker
- Every single night, the team ran 66,000 computer simulations
- Amazon Web Services
- Data mining for individualized ad targeting
- Orca big-data app
- YouTube channel (23,700 subscribers and 26 million page views)
Adopted from a presentation by Kayvan Tirdad, The Age of Big Data, York University
Big Data: Challenges
Volume (Scale)
Data volume
» 44x increase from 2009 to 2020
» From 0.8 zettabytes to 35 ZB
Data volume is increasing exponentially
Exponential increase in collected/generated data
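The slide's two endpoints can be turned into an implied yearly rate, which makes "exponential" concrete. The short Python sketch below assumes smooth compound growth between 2009 and 2020. That is a simplifying assumption, because real growth does not follow a perfectly constant rate. The helper name `growth_summary` is hypothetical. From the 0.8 ZB and 35 ZB figures it derives the overall multiplier, the compound annual growth rate, and the implied doubling time.

```python
import math


def growth_summary(start: float, end: float, years: int) -> tuple[float, float, float]:
    """Return (overall multiplier, compound annual growth rate, doubling time in years),
    assuming the quantity grew by the same factor every year (an assumption)."""
    factor = end / start                          # overall growth multiplier
    cagr = factor ** (1 / years) - 1              # constant yearly growth rate
    doubling = math.log(2) / math.log(1 + cagr)   # years needed to double at that rate
    return factor, cagr, doubling


# Figures from the slide: 0.8 ZB in 2009, 35 ZB in 2020.
factor, cagr, doubling = growth_summary(0.8, 35.0, 2020 - 2009)
print(f"{factor:.2f}x overall, {cagr:.1%} per year, doubling every {doubling:.1f} years")
# -> 43.75x overall, 41.0% per year, doubling every 2.0 years
```

These inputs give a multiplier of 43.75, which the slide rounds to 44x. A steady rate of about 41% per year means the volume doubles roughly every two years.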
» 12+ TBs of tweet data every day
» 25+ TBs of log data every day
» ? TBs of data every day
» 30 billion RFID tags today (1.3B in 2005)
» 4.6 billion camera phones worldwide
» 76 million smart meters in 2009, 200M by 2014
» 100s of millions of GPS-enabled devices sold annually
» 2+ billion people on the Web by end of 2011
Variety (Complexity)
» Relational data (tables/transactions/legacy data)
» Text data (Web)
» Semi-structured data (XML)
» Graph data: social networks, Semantic Web (RDF)
» Streaming data: you can only scan the data once
A single application can be generating/collecting many types of data
Big public data (online, weather, finance, etc.)
To extract knowledge, all these types of data need to be linked together
Intro to Big Data INRIA
A Single View to the Customer (figure: the Customer at the center, linked to Social Media, Banking, Finance, Gaming, Entertainment, Purchase, and Our Known History)
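Linking several data types around one customer can be sketched in Python. Everything in this sketch is hypothetical. The customer IDs, names, purchases, and follow graph are invented. The three sources stand in for relational data (an in-memory SQLite table), semi-structured data (XML), and graph data (an adjacency list). The shared customer ID is the assumed join key. In practice, matching records across sources is often the hard part, and this sketch skips it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: IDs, names, purchases and follows are invented.


def load_purchases() -> dict[str, list[tuple[str, float]]]:
    """Relational data: a purchases table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer_id TEXT, item TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [("c1", "laptop", 999.0), ("c1", "mouse", 25.0), ("c2", "novel", 12.5)],
    )
    by_customer: dict[str, list[tuple[str, float]]] = {}
    for cid, item, amount in conn.execute(
        "SELECT customer_id, item, amount FROM purchases ORDER BY rowid"
    ):
        by_customer.setdefault(cid, []).append((item, amount))
    conn.close()
    return by_customer


PROFILES_XML = """
<customers>
  <customer id="c1"><name>Alice</name><city>Paris</city></customer>
  <customer id="c2"><name>Bob</name><city>Lyon</city></customer>
</customers>
"""


def load_profiles() -> dict[str, dict[str, str]]:
    """Semi-structured data: customer profiles stored as XML."""
    root = ET.fromstring(PROFILES_XML.strip())
    return {
        c.get("id"): {"name": c.findtext("name"), "city": c.findtext("city")}
        for c in root.iter("customer")
    }


# Graph data: a tiny social network as an adjacency list (who follows whom).
FOLLOWS = {"c1": ["c2"], "c2": ["c1", "c3"]}


def single_view(customer_id: str) -> dict:
    """Link the three sources on the shared customer ID (the assumed join key)."""
    purchases = load_purchases().get(customer_id, [])
    return {
        "id": customer_id,
        **load_profiles().get(customer_id, {}),
        "purchases": purchases,
        "total_spent": sum(amount for _, amount in purchases),
        "follows": FOLLOWS.get(customer_id, []),
    }


print(single_view("c1"))
```

A customer missing from one source still gets a partial view, because each lookup falls back to an empty value.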
Velocity (Speed)
Data is being generated fast and needs to be processed fast
Online data analytics
Late decisions → missed opportunities
Examples
» E-promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
» Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires immediate reaction
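Streaming data can be scanned only once, and healthcare monitoring must react immediately to abnormal measurements. Here is a minimal Python sketch of that constraint. It assumes a hypothetical stream of heart-rate readings and uses a simple z-score rule as a stand-in for real clinical thresholds. It keeps a running mean and variance with Welford's online algorithm, so it checks each reading as it arrives and uses constant memory. The names `RunningStats` and `monitor`, the 3.0 threshold, and the 5-reading warm-up are illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; reported as 0.0 until two readings exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def monitor(readings: Iterable[float], z_threshold: float = 3.0,
            warmup: int = 5) -> Iterator[tuple[int, float]]:
    """Scan the stream exactly once and yield (index, value) for abnormal readings."""
    stats = RunningStats()
    for i, x in enumerate(readings):
        if stats.n >= warmup and stats.std > 0:
            z = abs(x - stats.mean) / stats.std
            if z > z_threshold:
                yield i, x  # react now instead of waiting for a batch job
                continue    # keep the outlier out of the baseline
        stats.update(x)


# Hypothetical heart-rate stream (beats per minute).
print(list(monitor([72, 75, 71, 74, 73, 72, 140, 74])))  # [(6, 140)]
```

Flagged readings are left out of the running baseline, so a single spike does not inflate the mean and standard deviation used for the readings that follow.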
Some Make It 4 V's
Big Data and Privacy
Goodbye Anonymity
What are some impacts of Big Data?
Decisions like your credit score and your insurance rates may be based on the analysis of big data, for good or bad.
After Haiti's 2010 earthquake, Columbia University tracked the movements of 2 million refugees by the SIM cards in their cell phones and was able to determine where health risks would likely develop.
Adopted from a presentation by Stu Miller, How Big Data will change your life, at Osher Lifelong Learning Institute
Is Big Data good or bad for consumers?
How would you feel about paying more for the same product than the person checking out in front of you?
The real challenge: are you willing to get better value and more innovation in exchange for some loss of privacy?
Since there is no way to stop the accumulation of Big Data, should its use be regulated by the federal government?
Adopted from a presentation by Stu Miller, How Big Data will change your life, at Osher Lifelong Learning Institute
How Can You Avoid Big Data?
» Pay cash for everything!
» Never go online!
» Don't use a telephone!
» Don't fill any prescriptions!
» Never leave your house!
Adopted from a presentation by Stu Miller, How Big Data will change your life, at Osher Lifelong Learning Institute
The Data Science: The 4th Paradigm for Scientific Discovery
» Thousand years ago: description of natural phenomena
» Last few hundred years: Newton's laws, Maxwell's equations, ... $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
» Last few decades: simulation of complex phenomena
» Today and the future: unify theory, experiment and simulation with large multidisciplinary data, using data exploration and data mining (from instruments, sensors, humans, ...); distributed communities
Credits: Dennis Gannon
Big Data Science: The art of understanding huge volumes of data
Data Science is not just data analysis. Four main topics:
» Data architecture: how the data would need to be routed and organized to support the analysis, visualization and presentation of the data
» Data acquisition: how the data are collected, and, importantly, how the data are represented prior to analysis and presentation
» Data analysis: involves many technical, mathematical, and statistical aspects; still, the results have to be effectively communicated to the data user
» Data archiving: preservation of collected data in a form that makes it highly reusable (data curation)
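The four topics can be read as stages of a small pipeline. The Python sketch below is one illustration under stated assumptions, not a prescribed architecture. The CSV feed, its field names, and the functions `acquire`, `organize`, `analyze`, and `archive` are all hypothetical. The sketch runs the stages in pipeline order (acquisition before architecture):

- It acquires raw rows and gives them a typed representation, dropping one malformed row.
- It organizes the rows by sensor to suit the analysis.
- It computes a small summary.
- It archives the cleaned data together with a schema and timestamp so it can be reused.

```python
import csv
import io
import json
import statistics
from datetime import datetime, timezone

# Hypothetical raw feed: temperature readings as CSV text, including one bad value.
RAW = "sensor,temp_c\ns1,21.5\ns2,bad\ns1,22.1\ns2,19.8\n"


def acquire(raw: str) -> list[dict]:
    """Acquisition: collect raw rows and choose a typed representation."""
    records = []
    for row in csv.DictReader(io.StringIO(raw)):
        try:
            records.append({"sensor": row["sensor"], "temp_c": float(row["temp_c"])})
        except ValueError:
            continue  # skip values that cannot be parsed, such as "bad"
    return records


def organize(records: list[dict]) -> dict[str, list[float]]:
    """Architecture: route records into the layout the analysis needs (per sensor)."""
    by_sensor: dict[str, list[float]] = {}
    for r in records:
        by_sensor.setdefault(r["sensor"], []).append(r["temp_c"])
    return by_sensor


def analyze(by_sensor: dict[str, list[float]]) -> dict[str, float]:
    """Analysis: a per-sensor mean, rounded for presentation to the data user."""
    return {s: round(statistics.mean(v), 2) for s, v in sorted(by_sensor.items())}


def archive(records: list[dict], summary: dict[str, float]) -> str:
    """Archiving: store cleaned data with schema and timestamp for later reuse."""
    return json.dumps(
        {
            "created": datetime.now(timezone.utc).isoformat(),
            "schema": {"sensor": "str", "temp_c": "float, degrees Celsius"},
            "records": records,
            "summary": summary,
        },
        indent=2,
    )


records = acquire(RAW)
summary = analyze(organize(records))
print(summary)  # {'s1': 21.8, 's2': 19.8}
print(archive(records, summary))
```

The archive keeps the schema next to the data so that a later user can reinterpret the records without the original code. That is the "highly reusable" goal of data curation in miniature.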
Data Scientist skills
Evolution from the data analyst role:
» Computer science, software engineering methodologies, modeling, statistics, analytics, visualization, databases, machine learning, data mining, big data and maths
» Business skills: influence in making decisions in a business environment
The data scientist guides a data science project:
» Engineer: collect & scrub disparate data sources; manage a large computing cluster
» Mathematician: machine learning, statistics
» Artist: visualize data beautifully, tell a convincing story
Data Science Venn Diagram
Cross-Cutting Data Requirements (figure)
» Data Analysis: machine learning, statistical methods, scalability, quality, multi-model
» Data Management: schema, sharing, retention, search, I/O, storage tech.
» Data Processing: acquisition, workflow, reduction, system arch., provenance
Data Scientist
"I keep saying the sexy job in the next ten years will be statisticians." "The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it."
Hal Varian, Google's chief economist
Acknowledgments: Gabriel Antoniu (Inria), Alexandru Costan (INSA)
Thank you!
Shadi Ibrahim
shadi.ibrahim@inria.fr