1 Modern (Computational) Approaches to Big Data Analytics
CSC 576
Computer Science, University of Rochester
Instructor: Ji Liu
2 Big Data in Academia
- SIGKDD 2014: the program page has 14 matches for "big data" and 50+ for "large scale"
- ICML 2014: 3 of 6 tutorials are about big data
3 Big Data in Industry
A search on LinkedIn returned:
- 2,107 results for data scientist positions
- 865 results for Java programmer positions
- 436 results for C++ programmer positions
4 What is ``Big Data''?
A quip from a professor of psychology and behavioral economics:
"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it." (Dan Ariely)
5 Big Data Everywhere!
Lots of data is being collected and warehoused:
- Web data, e-commerce
- Purchases at department/grocery stores
- Bank/credit card transactions
- Social networks
6 How ``Big''?
- Google processes 20 PB a day (2008)
- The Wayback Machine has 3 PB + 100 TB per month (3/2009)
- Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
- eBay has 6.5 PB of user data + 50 TB/day (5/2009)
- CERN's Large Hadron Collider (LHC) generates 15 PB a year
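To put the daily figures above on a common scale, the following sketch converts them to an average per-second rate. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a uniform rate over a 24-hour day; the function name `per_second_gb` is illustrative and does not come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def per_second_gb(bytes_per_day: float) -> float:
    """Average rate in GB/s for a daily volume, assuming a uniform rate."""
    return bytes_per_day / SECONDS_PER_DAY / BYTES_PER_GB


# Daily volumes quoted on the slide (assumption: decimal units).
daily_volumes = {
    "Google (processed, 2008)": 20 * BYTES_PER_PB,
    "Facebook (new data, 4/2009)": 15 * BYTES_PER_TB,
    "eBay (new data, 5/2009)": 50 * BYTES_PER_TB,
}

rates = {name: per_second_gb(volume) for name, volume in daily_volumes.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} GB/s")
```

Under these assumptions, 20 PB a day corresponds to an average of more than 200 GB every second, which gives a sense of why a single machine cannot keep up.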
8 [Figure: Variety, Volume, Velocity, Veracity, Value] 2014 Advanced Performance Institute, BWMC Ltd. All rights reserved.
9 Volume
Volume refers to the vast amounts of data generated every second. We are not talking terabytes but zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology. New big data tools use distributed systems so that we can store and analyse data across databases that are dotted around anywhere in the world.
10 Velocity
Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyse the data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
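The idea of analysing data as it arrives, without first storing it, can be illustrated with a running statistic. This is a minimal sketch, assuming a stream of numeric values. It uses Welford's online algorithm for the mean and variance, a standard technique that the slides do not prescribe; the class name `RunningStats` and the sample values are hypothetical.

```python
class RunningStats:
    """Running mean and variance over a stream in O(1) memory (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; returns 0.0 until two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # stand-in for a live stream
    stats.update(value)  # each value is processed once and never stored

print(stats.count, stats.mean, stats.variance)
```

The memory used does not grow with the length of the stream, which is the property that makes this style of computation suitable for high-velocity data.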
11 Variety
We see an increasing variety of data types. Variety refers to the different types of data we can now use. In the past we only focused on structured data that neatly fitted into tables or relational databases, such as financial data. In fact, 80% of the world's data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types such as messages, social media conversations, photos, sensor data, video or voice recordings.
12 Veracity
Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hash tags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but technology now allows us to work with this type of data.
13 Value: The most important V of all!
Then there is another V to take into account when looking at big data: Value! Having access to big data is no good unless we can turn it into value. Companies are starting to generate amazing value from their big data.
14 Recommendation System Example 1
15 Recommendation System Example 2
16 Video Analysis
17 Video Surveillance
18 Steps of Data Analysis
- Pose a problem
- Collect data (this yields raw and dirty data)
- Pre-process data, e.g. extract features (this yields clean data)
- Design a mathematical model (formulation)
- Find a solution
- Evaluation
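The steps above can be sketched end to end on a toy problem. This is a minimal illustration, not the course's method: the task (predicting y from x), the made-up records, the least-squares line as the model, and the helper names `clean` and `fit_line` are all assumptions chosen to keep the example small.

```python
# 1. Pose a problem: predict y from x (hypothetical task).

# 2. Collect data: raw records arrive as strings, and some are dirty.
raw = [("1", "3.1"), ("2", "4.9"), ("3", ""), ("4", "9.2"),
       ("five", "11.0"), ("6", "13.1"), ("7", "14.8"), ("8", "17.0")]


# 3. Pre-process: keep only records where both fields parse as numbers.
def clean(records):
    rows = []
    for x, y in records:
        try:
            rows.append((float(x), float(y)))
        except ValueError:
            continue  # drop the dirty record
    return rows


data = clean(raw)
train, test = data[:4], data[4:]  # hold out the last records for evaluation


# 4. Design a model: y is approximately a * x + b, fitted by least squares.
# 5. Find a solution: the closed-form least-squares estimates of a and b.
def fit_line(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


a, b = fit_line(train)

# 6. Evaluate: mean squared error on the held-out records.
mse = sum((a * x + b - y) ** 2 for x, y in test) / len(test)
print(f"a={a:.3f}, b={b:.3f}, held-out MSE={mse:.4f}")
```

In practice each step is far larger than shown here, but the order is the same: the model is only fitted after the data has been cleaned, and it is judged on data it did not see during fitting.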