Dr. Willa Pickering
Lockheed Martin Senior Fellow
March 2012

Data, Data Everywhere
- Big Data: what is it?
- Protecting Data in Cloud: how do we handle it?
- Data Analysis: are we prepared to use it?
Big Data
- Data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
Source: McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity, May 2011

3 Big Data Challenges
- Volume:
  - 5 exabytes of data (images, video, audio, text) every 2 days, according to Google
  - 10 TB
  - US and EU online capacity of 27M terabytes
  - 60,000 Libraries of Congress
- Variety:
  - Web logs, wireless RFID sensors, unstructured info from social networks
  - Sensors embedded in mobile phones, energy meters, cars, machines
  - Social media, smart phones, laptops
- Velocity:
  - Demand for delivery in hrs, months, seconds
  - Exponential growth in price-performance of storing, aggregating, and filtering data
Source: Information Week, 3 Big Data Challenges: Expert Advice, 10/18/2011
Data Forecast
- Success of individual firms rooted in use of big data
- Use of big data will generate new waves of productivity growth and consumer surplus
- Big data will impact all industries (finance, insurance, government, computer and electronics, information processing)
- Big data will force businesses and society to address fundamental assumptions and priorities (privacy, liability)
Source: Trends E Magazine, The Big Data Revolution, October 2011

Smart Data
- How to protect data in clouds
- Self-identifying
- Self-protected
- Smart data wrapper
  - Tagging
  - Processing operations
  - Provenance functions
  - Encryption
  - Metadata
- Track who touches, where created, worked on, etc.
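The smart data wrapper idea (tagging, provenance, metadata travelling with the data) can be illustrated with a minimal Python sketch. Everything here is hypothetical and not taken from the slides: the class name `SmartData`, its fields, and the choice of a SHA-256 digest are assumptions made for illustration. Encryption is deliberately left out because it would require a third-party library; the digest only detects changes made outside the wrapper, it does not protect confidentiality.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SmartData:
    """Hypothetical self-identifying wrapper: payload plus tags and provenance."""
    payload: bytes
    tags: dict                                   # e.g. {"owner": "finance"}
    provenance: list = field(default_factory=list)

    def _digest(self) -> str:
        return hashlib.sha256(self.payload).hexdigest()

    def record(self, user: str, node: str, action: str) -> None:
        """Append one provenance entry: who touched the data, where, and when."""
        self.provenance.append({
            "user": user,
            "node": node,
            "action": action,
            "time": datetime.now(timezone.utc).isoformat(),
            "sha256": self._digest(),            # payload digest at this step
        })

    def update(self, new_payload: bytes, user: str, node: str) -> None:
        """Change the payload through the wrapper so the change is logged."""
        self.payload = new_payload
        self.record(user, node, "update")

    def is_intact(self) -> bool:
        """True if the payload still matches the last recorded digest."""
        if not self.provenance:
            return False
        return self.provenance[-1]["sha256"] == self._digest()


doc = SmartData(b"quarterly figures", tags={"owner": "finance"})
doc.record(user="alice", node="node-1", action="create")
doc.update(b"quarterly figures v2", user="bob", node="node-2")
print([entry["action"] for entry in doc.provenance])
print(doc.is_intact())
```

Because every change goes through `update`, the provenance list doubles as a lineage record; a payload modified behind the wrapper's back makes `is_intact()` return `False`.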
Tagged Data
- Tagging: object (credential, attribute, policy, RBAC)
- Processing: lifecycle
- Lineage (history, traceability)
- Provenance cookie: user, node, process, date, time, action, data object change

Data Equity
- Creating transparency
  - Relevant data available for analysis
- Enabling experiments
  - Controlled experiments for better management decisions
- Customizing actions to more effectively address a population segment
  - Narrower segmentation of customers with more precisely tailored products or services
- Enabling better and timelier decisions
  - Sophisticated analytics improve/replace human decision making
- Enabling computer-assisted innovations
  - Improve development of next generation
Source: Trends E Magazine, The Big Data Revolution, October 2011
Data to Decisions
- Specialized databases, data warehousing, data warehouse appliances
- Query on columns
- Compression
- Information management tools, ETL, sorting and manipulating data
- Mining social networks
Source: Information Week, 10/18/2011

Semistructured, Unstructured
- Read and integrate data
- Edit
- Eliminate stop words
- Synonym replacement
- Homographic resolution
- Thematic clustering
- Glossary/taxonomy overlay
- Stemming
- Alternate spellings
- Foreign language accommodation
- Search support
- Data mine, search from data warehouse
Source: Information Week, 10/18/2011
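A few of the text-preparation steps listed for semistructured and unstructured data (eliminating stop words, synonym replacement, alternate spellings, stemming) can be sketched in plain Python. The word lists and the function names `stem` and `normalize` are assumptions made for illustration, and the stemmer is a crude suffix stripper rather than a real algorithm such as Porter's.

```python
import re

# Tiny illustrative vocabularies (hypothetical, not from the slides).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}
SYNONYMS = {"automobile": "car"}                      # synonym replacement
SPELLINGS = {"colour": "color", "centre": "center"}   # alternate spellings


def stem(word: str) -> str:
    """Crude suffix stripping; keeps at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list:
    """Lowercase, tokenize, drop stop words, unify spellings and synonyms, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = SPELLINGS.get(token, token)
        token = SYNONYMS.get(token, token)
        result.append(stem(token))
    return result


print(normalize("The automobile is parked in the colour sensors"))
# -> ['car', 'park', 'color', 'sensor']
```

The normalized tokens are what a search index or a thematic clustering step would consume; homographic resolution and foreign language accommodation need context or dictionaries and are not attempted here.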
Partitioning
- Different tiers of servers
- Operational Data Store

Data Analysis
- Predictive analysis: data mining, machine learning
  - Statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns
  - Capturing relationships between explanatory variables and the predicted variables
- Latent semantic analysis
- Vector representation/petri dish
- Probabilistic analysis
- Ontology
- Goal/Question/Metric
- Wikipad
Source: Information Week, 10/18/2011
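As a small illustration of predictive analysis, that is, capturing the relationship between an explanatory variable and a predicted variable, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The data values and the names `fit_line` and `predict` are invented for the example; real predictive work would normally use a statistics or machine learning library and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new explanatory value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up history: stored data volume (TB) observed over five months.
months = [1, 2, 3, 4, 5]
volume_tb = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(months, volume_tb)
print(model)              # slope 2.0, intercept 10.0 for this exact-line data
print(predict(model, 6))  # forecast for month 6: 22.0
```

Here the month is the explanatory variable and the stored volume is the predicted variable; with noisy data the same code returns the best-fitting line rather than an exact one.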
New Demands
- Near-real-time business intelligence
- Business Performance Management demand for operational data
- Nightly data warehouse updates not sufficient
- Proactive push reporting
- Exception-based reporting
- Time-intensive data loading
- Provide when needed
- More diverse users
  - Different types of users
  - More users
- More data sources
  - External data
  - Semistructured, unstructured data
Source: Agile Analytics, Ken Collier, 2012

Challenge
- National shortage
  - 140K to 190K analysts
  - 1.5M managers skilled at making decisions based on analysis of big data
- Needed skills
  - Data management
  - Data processing
  - Data quality
  - Privacy and liability policies
Source: The Economist, Data, data everywhere: A special report on managing information, February 27, 2010