BIG DATA
DR. KLARA NELSON
THE UNIVERSITY OF TAMPA
TBTLA PRESENTATION
AUGUST 14, 2014

WHAT IS BIG DATA?
"Volumes of data that are unusually large, or types of data that are unstructured"
Thomas Davenport, Keeping Up with the Quants, 2013, p. 6
The emerging technologies and practices that enable the collection, processing, discovery, analysis, and storage of large volumes and disparate types of data, quickly and cost effectively.
SAS Best Practices Team Definition
http://tamaradull.com/2013/02/20/the-5-ws-what-is-big-data/

WHAT IS BIG DATA?
                    Big data                            Traditional analytics
Type of data        Unstructured formats                Formatted in rows and columns
Volume of data      100 TB to PB                        Tens of TB or less
Flow of data        Constant flow of data               Static pool of data
Analysis methods    Machine learning                    Hypothesis-based
Primary purpose     Data-based products                 Internal decision support and services
Source: Thomas Davenport, Big Data @ Work, 2014, Table 1-1, p. 4

THE 5 V'S OF BIG DATA
Volume: Data size
Velocity: High-velocity capture, discovery, and/or analysis
Variety: Many different types
Veracity: Quality / Trustworthiness
Value
http://www-01.ibm.com/software/data/bigdata/
http://www-05.ibm.com/fr/events/netezzaDM_2012/Solutions_Big_Data.pdf
TYPICAL DATA SET SIZE

CUSTOMER TRANSACTIONS: #1 SOURCE OF LARGE DATA

Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 31.
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 9.

THE 5 V'S OF BIG DATA: VALUE
Integrating V: doing something valuable with the data, turning data into dollars
Being able to translate massive amounts of data into real insights and realizing value from that insight
Big Data at UPS: shaving ONE MILE off each DRIVER's daily ROUTE would save the firm $50 MILLION a year.

BIG DATA = BIG ROI
Healthcare: 20% decrease in patient mortality by analyzing streaming patient data
Telco: 92% decrease in processing time by analyzing networking and call data
Utilities: 99% improved accuracy in placing power generation resources by analyzing 2.8 petabytes of untapped data
Healthcare, Telco, Utilities: http://www-01.ibm.com/software/data/bigdata/industry.html
UPS: Christian Science Monitor, Aug 12, 2013, p. 32

THE 8 MOST IN-DEMAND BIG DATA ROLES
Role                                            Average Annual Salary ($)
Visualization Tool Developers (Expert Level)    150,000 - 175,000
Hadoop Developers                               150,000 - 175,000
Data Scientists                                 125,000 - 140,000
Information Architects                          113,750 - 135,350
ETL Developers                                  110,000 - 130,000
Predictive Analytics Developers                 103,700 - 129,000
Data Warehouse Appliance Specialist              97,950 - 123,600
OLAP Developers                                  97,900 - 115,550
http://www.computerworld.com/slideshow/detail/138836/the-8-most-in-demandbig-data-roles-#slide7, February 17, 2014
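The salary table above gives two figures per role. As a minimal sketch, assuming each pair is the low and high end of a salary range (the slide does not say so explicitly), the roles can be compared by the midpoint of that range. The names SALARY_RANGES, midpoint, and ranked_by_midpoint are illustrative and not part of the presentation.

```python
# Salary figures (USD) copied from the table above. Reading each pair as the
# low and high end of a range is an assumption, not something the slide states.
SALARY_RANGES = {
    "Visualization Tool Developers (Expert Level)": (150_000, 175_000),
    "Hadoop Developers": (150_000, 175_000),
    "Data Scientists": (125_000, 140_000),
    "Information Architects": (113_750, 135_350),
    "ETL Developers": (110_000, 130_000),
    "Predictive Analytics Developers": (103_700, 129_000),
    "Data Warehouse Appliance Specialist": (97_950, 123_600),
    "OLAP Developers": (97_900, 115_550),
}


def midpoint(low, high):
    """Return the midpoint of a salary range."""
    return (low + high) / 2


def ranked_by_midpoint(ranges):
    """Return (role, midpoint) pairs sorted from highest to lowest midpoint."""
    pairs = [(role, midpoint(low, high)) for role, (low, high) in ranges.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for role, mid in ranked_by_midpoint(SALARY_RANGES):
        print(f"{role:<45} {mid:>10,.0f}")
```

A midpoint is only a rough summary: it says nothing about how salaries are distributed inside each range.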
THE BIG DATA LANDSCAPE
http://blogs-images.forbes.com/davefeinleib/files/2014/06/big-data-landscape-jul-4-2012-00111.png

WHAT IS BIG DATA TECHNOLOGY?
"Big data technology is capable of handling a lot of data. Big data handles data cheaply. Big data handles data in the form of unstructured strings of data. Big data does its searches independently. Big data is used to store and manage large amounts of data. That's what big data is."
Bill Inmon
Source: "Big Data Technology Does Not Replace a Data Warehouse", http://www.b-eye-network.com/view/16714, January 10, 2013

TECHNOLOGIES: DATA WAREHOUSE VS. BIG DATA
Use the best tool for the job depending on the business requirements:
  Discovery of unexplored business questions
  Clean, consistent, high quality data
  Low latency, interactive reports, OLAP
  Raw unstructured data
  Analysis of preliminary data
Source: http://tamaradull.com/2013/03/20/the-5-ws-when-should-we-use-big-data-vs-data-warehousingtechnologies/

WHICH DATA MINING/ANALYTIC TOOLS ARE USED?
The average data miner reports using 5 tools, but conducts 76% of their work in their primary tool.
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 31.
PREPARING STUDENTS TO WORK WITH BIG DATA
Analytics courses
  ITM 466 Business Intelligence and Analytics (Elective)
  ITM 615 Business Analytics (MBA Decision Analysis Elective)
Course topics
  Assessing analytics competencies of organizations (e.g., Davenport's DELTA)
  Analytical thinking stages
  Ethics of analytics / big data
  Data quality
  Data warehouses & other technologies
  Data mining methods

TECHNOLOGIES USED IN THE BUSINESS ANALYTICS COURSES
  SAP Business Objects
  Microsoft Excel
  Tableau Software
  SQL Server Data Tools for building analysis databases and data mining
  IBM SPSS Statistics Suite for research and analysis
  IBM SPSS Modeler for predicting future behavior (data mining)
  IBM SPSS Text Analytics for mining unstructured data sources
  IBM Digital Analytics (formerly Coremetrics Web Analytics)

DATA MINING ALGORITHMS DATA MINERS & ITM 466/615 STUDENTS ARE USING
Marked algorithms are covered hands-on in ITM 466/615.
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 36.

THE CHALLENGES OF BIG DATA & BIG DATA ANALYTICS
Delivering Value
  "Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage." (Gartner)
Data
  Silos
  Quality
  Storage
Enterprise strategy
Talent
  Lack of IT/technical skills
  Lack of domain knowledge
  Lack of analytical thinking skills
Organizational culture
Technologies and tools
Big data as IT-driven projects
Gartner quote: http://www.gartner.com/technology/topics/big-data.jsp
THE CHALLENGES OF BIG DATA AND BIG DATA ANALYTICS
Ethics
"A code of conduct to refer to in judging what is right and what is wrong" regarding the ways we
  gather data and use data
  guide individual and organizational conduct through use of data

Frank Buytendijk quotes on Analytics and Ethics from the TDWI Las Vegas 2012 World Conference
  "Are there things you shouldn't do?"
  "It seems like we are doing things because we can."
  "The key thing is that technology is answering questions that weren't even asked."
  "Tools are creating ethical issues, and we don't even have the mechanism to do something about it."

THANK YOU!