Modern (Computational) Approaches to Big Data Analytics. CSC 576 Computer Science, University of Rochester Instructor: Ji Liu





Big Data in Academia
- SIGKDD 2014: a search of the program page found 14 matches for "big data" and 50+ for "large scale". http://www.kdd.org/kdd2014/program.html
- ICML 2014: 3 of the 6 tutorials are about big data. http://icml.cc/2014/index/article/17.html

Big Data in Industry
A search on LinkedIn found:
- 2,107 results for data scientist positions
- 865 results for Java programmer positions
- 436 results for C++ programmer positions

What is "Big Data"?
A quip from a professor of psychology and behavioral economics:
"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it." (Dan Ariely)

Big Data Everywhere!
Lots of data is being collected and warehoused:
- Web data, e-commerce
- Purchases at department and grocery stores
- Bank and credit card transactions
- Social networks

How "Big"?
- Google processes 20 PB a day (2008)
- Wayback Machine has 3 PB per month (3/2009)
- Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
- eBay has 6.5 PB of user data + 50 TB/day (5/2009)
- CERN's Large Hadron Collider (LHC) generates 15 PB a year
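To put the daily figures above on a common scale, they can be converted to an average sustained rate. The sketch below is only back-of-the-envelope arithmetic. It assumes decimal units (1 PB = 10**15 bytes), which the slide does not specify, and the dictionary keys are illustrative labels rather than official names.

```python
# Assumption: decimal (SI) units. The slide does not say whether
# "PB" means 10**15 or 2**50 bytes.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(bytes_per_day):
    """Average sustained rate implied by a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


# Daily volumes quoted on the slide (labels are illustrative).
daily_volumes = {
    "Google processing (2008)": 20 * PB,
    "Facebook new data (4/2009)": 15 * TB,
    "eBay new data (5/2009)": 50 * TB,
}

# Convert each daily volume to gigabytes per second.
rates_gb_per_s = {
    name: bytes_per_second(volume) / 10**9
    for name, volume in daily_volumes.items()
}

for name, rate in rates_gb_per_s.items():
    print(f"{name}: {rate:.2f} GB/s")
```

Under these assumptions, 20 PB a day works out to roughly 230 GB every second, sustained around the clock.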

The five Vs of Big Data: Variety, Volume, Velocity, Veracity, Value
2014 Advanced Performance Institute, BWMC Ltd. All rights reserved.

Volume
Volume refers to the vast amounts of data generated every second. We are not talking Terabytes but Zettabytes or Brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology. New big data tools use distributed systems so that we can store and analyse data across databases that are dotted around anywhere in the world.

Velocity
Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology allows us now to analyse the data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
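One way to picture analysing data while it is being generated is a single-pass (online) computation that updates a summary as each value arrives and never stores the values. The sketch below uses Welford's algorithm for a running mean and variance. It is an illustration chosen for this note, not the technique any particular big data tool uses, and `message_lengths` is a hypothetical stand-in for a live stream.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the summary."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def message_lengths():
    """Hypothetical stream: yields one measurement at a time."""
    for text in ["big", "data", "moves", "fast"]:
        yield len(text)


stats = RunningStats()
for value in message_lengths():
    stats.update(value)  # the value itself is not kept after this call

print(stats.count, stats.mean, stats.variance)
```

The memory used is constant no matter how many values pass through, which is what makes this style of computation suit fast-moving data.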

Variety
We see an increasing variety of data types. Variety refers to the different types of data we can now use. In the past we only focused on structured data that neatly fitted into tables or relational databases, such as financial data. In fact, 80% of the world's data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types such as messages, social media conversations, photos, sensor data, video or voice recordings.

Veracity
Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hash tags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but technology now allows us to work with this type of data.

Value: The most important V of all!
Then there is another V to take into account when looking at Big Data: Value! Having access to big data is no good unless we can turn it into value. Companies are starting to generate amazing value from their big data.

Recommendation System Example 1

Recommendation System Example 2

Video Analysis

Video Surveillance

Steps of Data Analysis
1. Pose a problem
2. Collect data (raw and dirty data)
3. Pre-process data, e.g. extract features (clean data)
4. Design a mathematical model (formulation)
5. Find a solution
6. Evaluation
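The steps above can be sketched end to end on a toy problem. Everything in this example is an assumption made for illustration: the problem (fitting a line), the made-up records in `raw`, and the choice of least squares as the model. It is not taken from the course material.

```python
# Step 1. Pose a problem (hypothetical): predict y from x with a line y = a*x + b.

# Step 2. Collect data: raw records are strings, some missing or malformed
# (made-up values for illustration).
raw = ["1, 2.1", "2, 3.9", "3, ", "4, 8.2", "bad row", "5, 9.8"]


# Step 3. Pre-process: parse each record, drop dirty rows, keep (x, y) pairs.
def clean(records):
    rows = []
    for rec in records:
        parts = [p.strip() for p in rec.split(",")]
        if len(parts) != 2:
            continue  # wrong number of fields
        try:
            rows.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue  # missing or non-numeric field
    return rows


data = clean(raw)


# Step 4. Model: choose a and b to minimise sum_i (a*x_i + b - y_i)**2.
# Step 5. Solve: closed-form least-squares solution for a single feature.
def fit_line(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


a, b = fit_line(data)


# Step 6. Evaluate: mean squared error, computed here on the fitting data
# for brevity; a held-out set would normally be used.
def mean_squared_error(rows, a, b):
    return sum((a * x + b - y) ** 2 for x, y in rows) / len(rows)


error = mean_squared_error(data, a, b)
print(f"a={a:.3f}, b={b:.3f}, mse={error:.4f}")
```

Two of the six raw records are dropped in step 3, which is typical: most of the effort in practice goes into turning raw, dirty data into clean data before any model is fitted.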