Why the Big Deal about Big Data?
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
Founding Director, eScience Institute
University of Washington
Technology Alliance Insight to Impact, March 2015
http://lazowska.cs.washington.edu/ta.pdf, pptx
Today
  A quick tutorial on exponentials
  Big Data and Smart Everything
  Some closer-in examples
  Components of the ecosystem
  Computer Science: The ever-expanding sphere
Every aspect of computing has experienced exponential improvement
  Processing capacity
  Storage capacity
  Network bandwidth
  Sensors
  Astonishingly, even algorithms in some cases!
Exponentials are rare; we're not used to them, so they catch us unaware
  1, 2, 4, 8, ..., 128, 256, 65,536, 16,777,216, 4,294,967,296, 9,223,372,036,854,780,000
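The figures on the slide above all look like powers of two, so repeated doubling can be sketched in a few lines. This is a minimal illustration, not part of the talk: the list of exponents is inferred from the values shown, and the function name `doublings` is made up for the example. The largest figure on the slide agrees with 2^63 to about 15 significant digits; the exact value is 9,223,372,036,854,775,808.

```python
# Exponents chosen to reproduce the values listed on the slide.
# This mapping is an inference from the numbers, not something the slide states.
EXPONENTS = [0, 1, 2, 3, 7, 8, 16, 24, 32, 63]


def doublings(n: int) -> int:
    """Return the value reached after n successive doublings, starting from 1."""
    value = 1
    for _ in range(n):
        value *= 2
    return value


if __name__ == "__main__":
    for n in EXPONENTS:
        # Python integers are arbitrary precision, so 2**63 is printed exactly.
        print(f"after {n:>2} doublings: {doublings(n):,}")
```

The first few doublings look harmless (1, 2, 4, 8); the surprise is how few more steps it takes to reach numbers in the quintillions.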
In Computer Science, we can exploit these exponential improvements in two ways
  Constant capability at exponentially decreasing cost
  Exponentially increasing capability at constant cost
[Figure: Storage Price / MB, USD (semi-log plot); series: RAM, Disk, Flash]
[Figure: Microprocessor Performance, MIPS (semi-log plot)]
Credits: John McCallum / Havard Blok; Ray Kurzweil
The 1970s to today: 1970 Ford Mustang vs. 2014 Ford Mustang
  Size: roughly comparable
  Speed: roughly comparable
  Efficiency (MPG): roughly comparable
  Value (cost relative to performance): roughly comparable
The 1970s to today: 1971 Intel 4004 (2,300 transistors) vs. 2014 Intel Xeon (4,300,000,000 transistors)
  Size: area occupied by a transistor reduced by 1,000,000x
  Speed: operations per second increased by 100,000x
  Efficiency (operations per watt): improved by 6,750x
  Value (dollars per instruction executed): improved by 2,700x
The 1970s to today 1970 Ford Mustang 2014 Intel Xeon What if cars had improved as rapidly as microprocessors?
The 1970s to today Size: A car would be smaller than an ant! (About 1/5th of an inch long!)
The 1970s to today Speed: A car would go 6,000,000 miles per hour! (San Francisco to New York in 1.7 seconds!)
The 1970s to today Efficiency: A car would get 100,000 miles per gallon! (San Francisco to New York on 1/2 cup of fuel!)
The 1970s to today Cost: A car would cost less than $10!
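The car figures above can be roughly reproduced by applying the microprocessor improvement factors (1,000,000x area, 100,000x speed, 6,750x efficiency, 2,700x value) to a baseline car. The sketch below is a back-of-the-envelope check, not the speaker's own calculation: the baseline length, speed, fuel economy, price, and the San Francisco to New York distance are all assumptions chosen for illustration.

```python
import math

# Improvement factors quoted on the Intel 4004 vs. Xeon slide.
AREA_FACTOR = 1_000_000
SPEED_FACTOR = 100_000
EFFICIENCY_FACTOR = 6_750
VALUE_FACTOR = 2_700

# ASSUMED baselines for the car (illustrative values, not from the talk).
CAR_LENGTH_IN = 188.0       # overall length in inches
CAR_SPEED_MPH = 60.0        # highway cruising speed
CAR_MPG = 15.0              # fuel economy
CAR_PRICE_USD = 25_000.0    # purchase price
SF_TO_NY_MILES = 2_900.0    # approximate driving distance
CUPS_PER_GALLON = 16


def scaled_car() -> dict:
    """Apply the microprocessor improvement factors to the assumed car."""
    # Area shrinks by AREA_FACTOR, so each linear dimension shrinks by its square root.
    length_in = CAR_LENGTH_IN / math.sqrt(AREA_FACTOR)
    speed_mph = CAR_SPEED_MPH * SPEED_FACTOR
    mpg = CAR_MPG * EFFICIENCY_FACTOR
    price_usd = CAR_PRICE_USD / VALUE_FACTOR
    return {
        "length_in": length_in,
        "speed_mph": speed_mph,
        "mpg": mpg,
        "price_usd": price_usd,
        "trip_seconds": SF_TO_NY_MILES / speed_mph * 3600,
        "trip_cups_of_fuel": SF_TO_NY_MILES / mpg * CUPS_PER_GALLON,
    }


if __name__ == "__main__":
    for name, value in scaled_car().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions the results land close to the slide figures: a car about a fifth of an inch long, 6,000,000 miles per hour, roughly 100,000 miles per gallon, a coast-to-coast trip in under two seconds on about half a cup of fuel, and a price under $10.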
Today, these exponential improvements in technology and algorithms are enabling a big data revolution
A proliferation of sensors
  Think about the sensors on your phone
More generally, the creation of almost all information in digital form
  It doesn't need to be transcribed in order to be processed
Dramatic cost reductions in storage
  You can afford to keep all the data
Dramatic increases in network bandwidth
  You can move the data to where it's needed
Dramatic cost reductions and scalability improvements in computation
  With Amazon Web Services, 1000 computers for 1 day costs the same as 1 computer for 1000 days
Dramatic algorithmic breakthroughs
  Machine learning, data mining: fundamental advances in computer science and statistics
Ever more powerful models producing ever-increasing volumes of data that must be analyzed
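The point about 1000 computers for 1 day follows from pay-per-use pricing: the bill depends on machine-days consumed, not on how they are arranged in time. The sketch below assumes a flat, purely linear rate and a job that splits perfectly across machines. The rate is a hypothetical number, not an actual Amazon Web Services price, and the function names are invented for the example.

```python
# HYPOTHETICAL flat rate in US cents per machine-day; not an actual AWS price.
CENTS_PER_MACHINE_DAY = 240


def rental_cost_cents(machines: int, days: int, rate: int = CENTS_PER_MACHINE_DAY) -> int:
    """Cost of renting `machines` computers for `days` days at a flat rate."""
    return machines * days * rate


def wall_clock_days(total_machine_days: float, machines: int) -> float:
    """Elapsed time, ASSUMING the work divides perfectly across machines."""
    return total_machine_days / machines


if __name__ == "__main__":
    wide = rental_cost_cents(machines=1000, days=1)
    narrow = rental_cost_cents(machines=1, days=1000)
    print(f"1000 machines x 1 day:  {wide / 100:,.2f} USD, "
          f"done in {wall_clock_days(1000, 1000):g} day(s)")
    print(f"1 machine x 1000 days:  {narrow / 100:,.2f} USD, "
          f"done in {wall_clock_days(1000, 1):g} day(s)")
```

Same bill either way; the difference is getting the answer tomorrow rather than in nearly three years. Real workloads rarely parallelize perfectly, so the speedup is an upper bound.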
So, exactly what is meant by big data? Credit: Dan Ariely, Duke University
Serious answer: big data is enabling computer scientists to put the smarts into everything
  Smart homes
  Smart cars
  Smart health
  Smart robots
  Smart crowds and human-computer systems
  Smart education
  Smart interaction (virtual and augmented reality)
  Smart cities
  Smart discovery
Smart homes (the leaf nodes of the smart grid)
  Shwetak Patel, University of Washington, 2011 MacArthur Fellow
Smart cars
  DARPA Grand Challenge
  DARPA Urban Challenge
  Google Self-Driving Car
Smart health
  Larry Smarr: quantified self
  Evidence-based medicine
  P4 medicine
Smart robots
Smart crowds and human-computer systems
  Zoran Popovic, UW Computer Science & Engineering
  David Baker, UW Biochemistry
Smart education
  Zoran Popovic, UW Computer Science & Engineering
Smart interaction
Smart cities
Smart discovery (data-intensive discovery, or eScience)
Nearly every field of discovery is transitioning from data poor to data rich
  Oceanography: OOI
  Astronomy: LSST
  Physics: LHC
  Biology: Sequencing
  Neuroscience: EEG, fMRI
  Sociology: The Web
  Economics: POS terminals
Some closer-in examples of big data in action Collaborative filtering
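For readers unfamiliar with the term, collaborative filtering recommends items to a person based on the ratings of other people with similar tastes. The sketch below is one simple user-based variant using cosine similarity. It is an illustration only, not the method used by any particular service, and the ratings, user names, and function names are all invented for the example.

```python
from math import sqrt

# HYPOTHETICAL ratings (user -> {item: rating}); invented purely for illustration.
RATINGS = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 4, "B": 5, "D": 4},
    "cal": {"C": 5, "D": 2, "E": 4},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two users' rating vectors (missing items count as 0)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user: str, ratings: dict) -> list:
    """Rank items the user has not rated by similarity-weighted sum of others' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(recommend("ana", RATINGS))
```

In this toy data "ana" rates items much as "ben" does, so the item "ben" liked that "ana" has not seen ranks first. With three users the result is trivial; the technique becomes useful when the rating table covers millions of people.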
Fraud detection
Price prediction
Hospital re-admission prediction
Travel time prediction and route recommendation under specific circumstances
Coaching / play calling in all sports
Speech recognition
Machine translation
  Speech -> text
  Text -> text translation
  Text -> speech in speaker's voice
http://www.youtube.com/watch?v=nu-nlqqfckg&t=7m30s (7:30 to 8:40)
Presidential campaigning
Electoral forecasting
Secret government surveillance of American citizens
Hemisphere Project
  26 years of records of every call that passed through an AT&T switch
  New records added at a rate of 4 billion per day
Secret government surveillance of foreign heads of state
Large Scale Deep Learning
Jeff Dean, Google Senior Fellow
Joint work with many colleagues at Google
Deep Learning: A form of Machine Learning
  A modern reincarnation of Artificial Neural Networks from the 1980s and 1990s
  Made practical by vast amounts of data (e.g., billions of images on the web) and vast computing resources
  Fully automated: General algorithms are trained and then turned loose
Generating Image Captions from Pixels Human: Three different types of pizza on top of a stove. Model sample 1: Two pizzas sitting on top of a stove top oven. Model sample 2: A pizza sitting on top of a pan on top of a stove.
Generating Image Captions from Pixels Human: A tennis player getting ready to serve the ball. Model: A man holding a tennis racquet on a tennis court.
Generating Image Captions from Pixels
  Model sample 1: A close up of a child holding a stuffed animal.
  Model sample 2: A baby is asleep next to a teddy bear.
Arthur C. Clarke: Any sufficiently advanced technology is indistinguishable from magic.
Components of the ecosystem: Infrastructure/Platforms
Tools
  Elastic MapReduce = Hadoop
Verticals/Services
  Real estate
  Traffic
  Government data
  IT operations
  Business expense management
  IT management
  Predictive analytics for businesses
Sensor systems
Intensive users
The open data movement: Civic data for civic good
Computer Science: The ever-expanding sphere Credit: Alfred Spector, Google
High Demand Fields in WA State, Baccalaureate Level & Above (WSAC / SBCTC / WTECB, October 2013)
[Bar chart: Current Completions vs. Additional Annual Completions Needed, 2016-21, on a scale of 0 to 5000, for Computer Science; Engineering; Health Professions*; Research, Science, Technical*]
*Gap exists at the graduate and/or professional level only
Data from Table 2 of the report linked at http://www.wsac.wa.gov/sites/default/files/2013.11.16.skills.report.pdf
Is this a great time or what?