03: What You Need to Know About Big Data: Understanding and Better Utilizing Data Analytics
Trainers: Mike Holland, NYU Center for Urban Science and Progress; Timothy Savage, NYU Center for Urban Science and Progress; Alan Mitchell, KPMG; Stephen C. Beatty, KPMG
Mike Holland and Tim Savage, March 7, 2015
Applied Sciences NYC
"Applied Sciences NYC is the City's unparalleled opportunity to build or expand world-class applied sciences and engineering campuses in New York City. We are seeking to dramatically expand our capacity in the applied sciences to maintain our global competitiveness and create jobs. These campuses would not only enrich the City's existing research capabilities, but also lead to innovative ideas that can be commercialized, catalyzing hundreds of spinoff companies and increasing the probability that the next high-growth company (a Google, Amazon, or Facebook) will emerge in New York City." (New York City Economic Development Corporation)
The NYU-led Center for Urban Science and Progress, a multi-sector research and education collaborative, was announced on April 23, 2012.
Big Cities + Big Data
The world is urbanizing. Cities are the loci of consumption, economic activity, and innovation; they are the cause of our problems and the source of the solutions.
Global network traffic is growing at a 30% compound annual growth rate (CAGR).
Informatics capabilities are exploding: storage, transmission, and analysis; a proliferation of static and mobile sensors; the Internet of Things.
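As a worked illustration of what a 30% CAGR implies (plain compound-growth arithmetic, not a figure taken from the slides), let V_0 be today's traffic volume, r the annual growth rate, and V_n the volume after n years:

```latex
\begin{aligned}
V_n &= V_0\,(1 + r)^n, \qquad r = 0.30 \\
\frac{V_5}{V_0} &= 1.3^5 \approx 3.71 \\
T_{\text{double}} &= \frac{\ln 2}{\ln 1.3} \approx \frac{0.693}{0.262} \approx 2.64 \text{ years}
\end{aligned}
```

At that rate, traffic nearly quadruples in five years and doubles roughly every 2.6 years.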
Graduate Programs in Applied Urban Science and Informatics
Degree: Master of Science
Length: one year, three semesters (full-time)
Class size: approx. 60 students
Projects for the City & State
City Lights; Building Informatics; Urban Soundscape; Neuroeconomics of Decision Making; Economic Mapping; Greener Greater Buildings Plan; MTA Bus Driver Optimization; MTA Origin/Destination Study; New York City Police Department; 911/311; Trash Informatics; Parks Attendance & Utilization; Property Ownership Records Assessment; School Property Use Assessment; Taxi Visualization; Transit Operations
What does it mean to instrument a city?
Infrastructure: condition, operations
Environment: meteorology, pollution, noise, flora, fauna
People
Properly acquired, integrated, and analyzed, data can:
take government beyond imperfect understanding, toward better (and more efficient) operations, better planning, and better policy;
improve governance and citizen engagement;
enable the private sector to develop new services for citizens, governments, and firms;
enable a revolution in the social sciences (relationships, location, economic/communications activities, health, nutrition, opinions, etc.).
Data Types
Urban Data Sources: Acquire, Integrate, Use
Organic data flows: administrative records (census, permits, etc.); transactions (sales, communications, etc.); operational (traffic, transit, utilities, health system, etc.); social media (Twitter, Facebook, blogs, etc.); personal (location, activity, physiological)
Sensors: fixed in situ sensors; crowd sourcing (mobile phones, etc.); choke points (people, vehicles)
Novel technologies: visible, infrared, and spectral imagery; RADAR, LIDAR; gravity and magnetic; seismic, acoustic; ionizing radiation, biological, chemical
Privacy, Big Data, and the Public Good: Frameworks for Engagement
The book identifies ways in which vast new sets of data on human beings can be collected, integrated, and analyzed to improve urban systems and quality of life while protecting confidentiality. It was sponsored by CUSP, the American Statistical Association and its Privacy and Confidentiality subcommittee, and the Research Data Centre of the German Federal Employment Agency.
Editors: Julia Lane, American Institutes for Research; Victoria Stodden, Columbia; Stefan Bender, German Federal Employment Agency; Helen Nissenbaum, NYU
Chapter authors: Alessandro Acquisti, Carnegie Mellon University; Cynthia Dwork, Microsoft; Peter Elias, University of Warwick; Robert Goerge, University of Chicago; Alan Karr, National Institute of Statistical Sciences, and Jerry Reiter, Duke University; Steve Koonin and Michael Holland, CUSP; Frauke Kreuter, University of Maryland, and Roger Peng, Johns Hopkins; Carl Landwehr, George Washington University; Helen Nissenbaum and Solon Barocas, NYU; Paul Ohm, University of Colorado; Alexander Pentland et al., MIT; Katherine Strandburg, NYU; Victoria Stodden, Columbia; John Wilbanks, Sage Bionetworks/Kauffman Foundation
Visit dataprivacybook.org.
Analysis of Massive Taxi GPS Data
Overview: Data from yellow cabs covers almost 800 million trips, which are nearly impossible to manage, explore, visualize, and analyze with existing tools.
Objective and goal: Build scalable, usable tools for experts and non-experts alike; work with relevant city agencies on development and deployment of the technology.
Status: Initial deployment of TaxiVis at the NYC Taxi & Limousine Commission and the Department of Transportation.
Source: Freire, Silva, Vo, et al.
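A basic interaction in a taxi exploration tool is selecting trips by region and time window and then aggregating them. The Python sketch below shows that query over an in-memory list. The `Trip` schema, its field names, and the functions `select_trips` and `pickups_per_hour` are hypothetical illustrations, not TaxiVis's actual data model or API, and the sample trips are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:
    """Hypothetical minimal trip record; real taxi records carry many more fields."""
    pickup_time: datetime
    pickup_lat: float
    pickup_lon: float


def select_trips(trips, lat_range, lon_range, start, end):
    """Return trips picked up inside the lat/lon box during [start, end)."""
    (lat_lo, lat_hi), (lon_lo, lon_hi) = lat_range, lon_range
    return [
        t for t in trips
        if lat_lo <= t.pickup_lat <= lat_hi
        and lon_lo <= t.pickup_lon <= lon_hi
        and start <= t.pickup_time < end
    ]


def pickups_per_hour(trips):
    """Count pickups by hour of day (0-23)."""
    return Counter(t.pickup_time.hour for t in trips)


# Made-up sample trips, for illustration only.
sample = [
    Trip(datetime(2013, 5, 1, 9, 15), 40.75, -73.99),
    Trip(datetime(2013, 5, 1, 9, 40), 40.76, -73.98),
    Trip(datetime(2013, 5, 1, 14, 5), 40.70, -74.01),  # outside the box
    Trip(datetime(2013, 5, 3, 9, 0), 40.75, -73.99),   # outside the time window
]
selected = select_trips(sample, (40.74, 40.77), (-74.00, -73.97),
                        datetime(2013, 5, 1), datetime(2013, 5, 2))
print(dict(pickups_per_hour(selected)))  # {9: 2}
```

A linear scan like this is only practical for small samples; at the scale of roughly 800 million trips, an interactive tool needs spatial and temporal indexing so that each query touches only the relevant records.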
Taxis as Sensors for Manhattan
Taxis are sensors that can provide unprecedented insight into city life: economic activity, human behavior, mobility patterns, etc.
April 2011: taxi drivers petition the TLC for higher fares to compensate for rising gasoline prices.
August 2011: Hurricane Irene.
October 2012: Hurricane Sandy.
Urban Observatory: Persistent and Synoptic Analytics for Urban Science
Manhattan in the Thermal IR
199 Water Street: built 1993; 998,000 sq ft; electricity, natural gas, steam; LEED certified
Photo by Tyrone Turner/National Geographic
Other synoptic modalities: hyperspectral, RADAR, LIDAR, gravity, magnetic, etc.
Plumes of Opportunity
[Figure panels: raw image; background subtracted]
Background subtraction: register each frame to a reference image; form 10 absolute difference images from the surrounding frames; construct the minimum difference image pixel by pixel.
Plume identification and tracking: denoise the background-subtracted image; identify excesses/deficits in luminosity space; cross-check object locations in color space; localize centroids and track them with probability weighting.
Upcoming use cases: plume rate; urban winds; carbon vs. steam emissions; TOO (triggered) observations.
Source: Dobler, et al.
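The background-subtraction steps above can be sketched in pure Python, treating each image as a list of rows of pixel intensities. This is an illustrative reading of the slide, not Dobler et al.'s code: registration is assumed to have been done already, and taking five frames on either side of the target (ten in total) is an assumption about how the "surrounding frames" are chosen. The function names are hypothetical.

```python
def abs_diff(a, b):
    """Pixel-wise absolute difference of two equal-sized images (lists of rows)."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def surrounding_frames(frames, i, half_width=5):
    """Up to `half_width` frames on each side of frame i, excluding frame i.

    half_width=5 yields the ten neighbors mentioned on the slide (an assumed split).
    """
    lo, hi = max(0, i - half_width), min(len(frames), i + half_width + 1)
    return [frames[j] for j in range(lo, hi) if j != i]


def min_difference_image(frame, neighbors):
    """Pixel-wise minimum of |frame - neighbor| over all neighbors.

    Assumes every frame is already registered to a common reference image.
    """
    diffs = [abs_diff(frame, n) for n in neighbors]
    rows, cols = len(frame), len(frame[0])
    return [[min(d[r][c] for d in diffs) for c in range(cols)] for r in range(rows)]


def flat():
    """Toy 3x3 frame with a uniform background intensity of 10."""
    return [[10] * 3 for _ in range(3)]


frames = [flat() for _ in range(5)]
frames[0][0][0] = 90  # bright artifact in one neighboring frame only
frames[2][1][1] = 50  # signal present in the target frame
result = min_difference_image(frames[2], surrounding_frames(frames, 2))
print(result)  # [[0, 0, 0], [0, 40, 0], [0, 0, 0]]
```

Taking the pixel-wise minimum means a pixel survives only if the target frame differs from every neighbor, so an artifact that appears in just one neighboring frame is suppressed.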
Street Environment: Attention, Distraction, and Interaction Dynamics
P. Glimcher, M. Grubb, M. Ghandehari, G. Dobler, M. Sharma, A. Chiang
Hyperspectral Imaging of Manhattan Bridge Lights
Source: Dobler, et al.
Open Data
Federal Open Data Policies
https://project-open-data.cio.gov/
State & Local Open Data
its.github.io/open data handbook/opendatahandbook.pdf
*Seattle's 911 dispatches table, with 438,000 downloads, is the most downloaded table.
Source: Barbosa, Luciano, et al. "Structured open urban data: understanding the landscape." Big Data 2.3 (2014).
Cities and States with Chief Data Officers
Blue signifies a state-level officer, green a local-level officer, and yellow an officer in education.
Source: Steve Towns, "Which States and Cities Have Chief Data Officers?", govtech.com, June 13, 2014.
Open Data Can Lead to Open Innovation
A consortium of public-sector transit agencies, commercial firms, nonprofits, academic researchers, and interested individuals
Real-time arrival predictions:
94% reported increased or greatly increased satisfaction with public transit.
Significant decrease in actual wait time per user, and an even greater decrease in perceived wait time.
78% of riders reported increased walking, a significant public health benefit.
[Figure: U.S. local government expenditures, grouped into safety, core city services, general government, and human services. Amounts and services as extracted: $397B (sanitation, utilities, parks, roads); $180B (emergency management, courts, jails, police, fire, streets); $245B (planning, public buildings, financial administration, community development); $826B (health, education, social services).]
We need to understand: How does data flow within agencies? How interoperable can data be? What data can be shared, and how is it shared to support delivery of city services?
Local government expenditures: U.S. Census Bureau, 2012 Census of Governments: Surveys of State and Local Government Finances.
Tools and Uses of Big Data
Tools
Data acquisition and synthesis; exploration and data mining; formulation of meaningful policy questions; formal modeling and interpretation
Tools: Data acquisition and synthesis
This picture merges an image captured from video, a 3-D LIDAR map of NYC, the PLUTO (Primary Land Use Tax Lot Output) database, and LL84 energy benchmarking data.
Source: Dobler, et al.
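Merging PLUTO tax-lot records with LL84 benchmarking data is, at heart, a key-based join. A minimal sketch, assuming both datasets carry a Borough-Block-Lot (BBL) identifier stored as a 10-digit string (1-digit borough, 5-digit block, 4-digit lot). The field names `lot_area` and `site_eui`, the helper names, and the records are hypothetical placeholders; real column names differ by dataset release.

```python
def make_bbl(borough, block, lot):
    """Build a 10-digit BBL string: 1-digit borough, 5-digit block, 4-digit lot."""
    return f"{borough}{block:05d}{lot:04d}"


def join_on_bbl(pluto_rows, ll84_rows):
    """Attach each LL84 record to the PLUTO lot with the same BBL.

    Returns (merged, unmatched): merged rows combine both records' fields
    (LL84 values win on key collisions); unmatched holds LL84 rows with no lot.
    Assumes at most one PLUTO row per BBL.
    """
    lots = {row["bbl"]: row for row in pluto_rows}
    merged, unmatched = [], []
    for rec in ll84_rows:
        lot = lots.get(rec["bbl"])
        if lot is None:
            unmatched.append(rec)
        else:
            merged.append({**lot, **rec})
    return merged, unmatched


# Made-up records; field names are hypothetical placeholders.
pluto = [{"bbl": make_bbl(1, 12, 3), "lot_area": 5000}]
ll84 = [
    {"bbl": "1000120003", "site_eui": 80.0},
    {"bbl": make_bbl(3, 45, 6), "site_eui": 50.0},
]
merged, unmatched = join_on_bbl(pluto, ll84)
print(len(merged), len(unmatched))  # 1 1
```

Returning the unmatched LL84 records, rather than silently dropping them, makes key mismatches visible so they can be investigated.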
Tools: Exploration and data mining
TaxiVis: Interactive Visual Exploration of NYC Taxi Records
Source: Freire, Silva, Vo, et al.
Uses of Data Analytics
Regulatory compliance; targeted enforcement; improved understanding of municipal ecosystems via crowd sourcing
Some Examples: Targeted enforcement
Apartment Fires in the Bronx and Brooklyn
20,000+ complaints per year of unsafe illegal conversions.
Department of Buildings: 200 building inspectors for 900,000 buildings, relying on expert judgment to prioritize. Historically, only 8% of inspections found serious violations.
Strongest predictors of an unsafe illegal conversion: whether the building is current on its property taxes (data at the Department of Finance); whether banks have filed any mortgage foreclosures (data at the Office of Court Administration).
Teaming fire marshals up with building inspectors: firefighters are 15X more likely to die responding to a fire in an illegal conversion than to other fires.
Vacate orders jumped to more than 70%.
Source: Mike Flowers, "Beyond Open Data: The Data-Driven City," in Beyond Transparency: Open Data and the Future of Civic Innovation, Brett Goldstein and Lauren Dyson, eds. San Francisco, CA: Code for America Press (2013).
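The shift from expert judgment to data-driven prioritization can be illustrated with a simple scoring rule over the two predictors named above. This is a hypothetical sketch, not the city's actual model: the `Complaint` fields, the weights, and the function names are made up for illustration, and a real model would be fit to past inspection outcomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complaint:
    building_id: str
    tax_delinquent: bool     # not current on property taxes (Department of Finance data)
    foreclosure_filed: bool  # bank filed a mortgage foreclosure (Office of Court Administration data)


# Hypothetical weights for illustration only; not the city's model.
WEIGHTS = {"tax_delinquent": 2.0, "foreclosure_filed": 1.5}


def risk_score(c):
    """Weighted sum of the two binary predictors (True counts as 1)."""
    return (WEIGHTS["tax_delinquent"] * c.tax_delinquent
            + WEIGHTS["foreclosure_filed"] * c.foreclosure_filed)


def prioritize(complaints, capacity):
    """Highest-risk complaints first, truncated to inspection capacity.

    sorted() is stable even with reverse=True, so ties keep their original order.
    """
    return sorted(complaints, key=risk_score, reverse=True)[:capacity]


queue = [
    Complaint("A", False, False),
    Complaint("B", True, False),
    Complaint("C", True, True),
    Complaint("D", False, True),
]
print([c.building_id for c in prioritize(queue, 2)])  # ['C', 'B']
```

With 200 inspectors facing 20,000+ complaints a year, `capacity` stands in for the number of inspections that can be scheduled; ranking by risk concentrates those inspections on the complaints most likely to involve serious violations.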
Thank you
cusp.nyu.edu