Big Data Analytics Process & Building Blocks

Similar documents
Big Data Analytics Building Blocks; Simple Data Storage (SQLite)

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Scaling Up HBase, Hive, Pegasus

Scaling Up 2 CSE 6242 / CX Duen Horng (Polo) Chau Georgia Tech. HBase, Hive

Text Analytics (Text Mining)

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview

Computer Forensics Application. ebay-uab Collaborative Research: Product Image Analysis for Authorship Identification

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

COMP9321 Web Application Engineering

Big Data and Analytics: Challenges and Opportunities

The Need for Training in Big Data: Experiences and Case Studies

B490 Mining the Big Data. 0 Introduction

How To Make Sense Of Data With Altilia

Google Product. Google Module 1

Social Media Marketing UCSB Extension

Mining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University

Information Management course

Tax Fraud in Increasing

SAFARI. Future Work Ideas. Alberto Garcia-Robledo, Abel Sanchez, Rongsha Li, Juan-Carlos Murillo-Torres, John Williams and Sascha Boheme

Using HP ArcSight API for data visualization

White Paper: Datameer s User-Focused Big Data Solutions

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015

CIS4930/6930 Data Science: Large-scale Advanced Data Analysis Fall Daisy Zhe Wang CISE Department University of Florida

A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

BIG DATA TRENDS AND TECHNOLOGIES

SEO, Search Engine and Online Reputation Management

Big Data: calling for a new scope in the curricula of Computer Science. Dr. Luis Alfonso Villa Vargas

BIG DATA IN BUSINESS ENVIRONMENT

Big Data Analytics. Genoveva Vargas-Solar French Council of Scientific Research, LIG & LAFMIA Labs

Student Project 1 - Explorative Data Analysis with Hadoop and Spark

The Big Data Paradigm Shift. Insight Through Automation

Elements of a Successful Marketing Plan Part 1. Inbound Marketing Strategies

CMU SCS Large Graph Mining Patterns, Tools and Cascade analysis

Large-Scale Data Processing

Search in BigData2 - When Big Text meets Big Graph 1. Introduction State of the Art on Big Data

Analyzing Big Data with AWS

Big Analytics: A Next Generation Roadmap

Online Marketing Module COMP. Certified Online Marketing Professional. v2.0

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Big Data Spatial Analytics An Introduction

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Extreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh Stratis Viglas Extreme Computing 1

Big Data: A Closer Look

Social Media. A brief overview of the Social Media module

Jay Buckingham Dynamic Signal

How Using Big Data in Security Helps (and Hurts) Us

Fighting Future Fraud A Strategy for Using Big Data, Machine Learning, and Data Lakes to Fight Mobile Communications Fraud

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Reducing Usage on a Service Plan

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce

Collaborations between Official Statistics and Academia in the Era of Big Data

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Introduction to Data Mining

Sunnie Chung. Cleveland State University

Big Systems, Big Data

Big Data. What is Big Data? Over the past years. Big Data. Big Data: Introduction and Applications

Big Data Analytics. The Hype and the Hope* Dr. Ted Ralphs Industrial and Systems Engineering Director, Laboratory

Big data and its transformational effects

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

UC Small Farm Program Agritourism Webinar Series. Social Media Kristin York - Instructor June 2, 2016

Getting Started with Oracle Data Miner 11g R2. Brendan Tierney

Title/Description/Keywords & Various Other Meta Tags Development

How Companies are! Using Spark

Reputation based Security. Vijay Seshadri Zulfikar Ramzan Carey Nachenberg

ISSN: CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

Are You Ready for Big Data?

FRAUD DETECTION AND PREVENTION: A DATA ANALYTICS APPROACH BY SESHIKA FERNANDO TECHNICAL LEAD, WSO2

The Power of Social Data: Transforming Big Data into Decisions. Andreas Weigend

ANALYSIS. wikia.com. YOUR NAME & SLOGAN Call Me:

Application Development. A Paradigm Shift

11 Core Elements Of A Successful Digital and Content Marketing Campaign. RODA marketing is a full service digital marketing and consulting agency.

Transcription:

Big Data Analytics Process & Building Blocks Duen Horng (Polo) Chau Georgia Tech CSE 6242 A / CS 4803 DVA Jan 10, 2013 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos

What is Data & Visual Analytics? 2

What is Data & Visual Analytics? No formal definition! 2

What is Data & Visual Analytics? No formal definition! Polo s definition: the interdisciplinary science of combining computation techniques and interactive visualization to transform and model data to aid discovery, decision making, etc. 2

What are the ingredients? 3

What are the ingredients? Need to worry (a lot) about storage, complex system design, scalability of algorithms, visualization techniques, etc. Used to be simpler before the big data era (why?) 3

What is big data? Why care? Many companies businesses are based on big data (Google, Facebook, Amazon, Apple, Symantec, LinkedIn, and many more) Web search Rank webpages (PageRank algorithm) Predict what you re going to type Advertisement (e.g., on Facebook) Infer users interest; show relevant ads Infer what you like, based on what your friends like Recommendation systems (e.g., Netflix, Pandora, Amazon) Online education

Good news! Many big data jobs What jobs are hot? Data scientist Emphasize breadth of knowledge This course helps you learn some of the skills

Big data analytics process and building blocks

Collection Cleaning Integration Analysis Visualization Presentation Dissemination

Process, not steps Collection Cleaning Integration Analysis Visualization Presentation Dissemination Can skip some Can go back (two-way street) Examples Data types inform visualization design Data informs choice of algorithms Visualization informs data cleaning (dirty data) Visualization informs algorithm design (user finds that results don t make sense)

How big data affects the process? Collection Cleaning Integration Analysis Visualization Presentation Dissemination The 3V of big data Volume: billions, petabytes are common Velocity: think Twitter, fraud detection, etc. Variety: text (webpages), video (e.g., youtube), etc. http://www-01.ibm.com/software/data/bigdata/

Schedule Collection Wk 2 Wk 8 (Hadoop) Cleaning Week 3 Integration Analysis Visualization Presentation Dissemination Wk 4: Classification Wk 5: Clustering Wk 6: Feature selection Wk 7: Graphs Wk 9: Hadoop Wk 12: Text Wk 13: anomaly detection

Two example analytics processes

NetProbe: Fraud Detection in Online Auction WWW 2007 NetProbe http://www.cs.cmu.edu/~dchau/papers/p201-pandit.pdf

NetProbe: The Problem Find bad sellers (fraudsters) on ebay who don t deliver their items $$$ Seller Buyer Auction fraud is #3 online crime in 2010 source: www.ic3.gov 13

14

NetProbe: Key Ideas Fraudsters fabricate their reputation by trading with their accomplices Fake transactions form near bipartite cores How to detect them? 15

NetProbe: Key Ideas Use Belief Propagation F A H Fraudster Accomplice Darker means more likely Honest 16

NetProbe: Main Results 17

18

18

Belgian Police 18

19

What analytics process does NetProbe go through? Collection Scraping Cleaning Integration Analysis Design detection algorithm Visualization Presentation Dissemination Paper, talks, lectures Not released

Discovr movie app

What analytics process would you go through to build the app? Collection Cleaning IMDB, Rotten tomatoes, youtube May have duplicate trailers Integration Analysis Determine which movies are related Visualization Presentation Dissemination Mac app, ios app

Homework 1 (out Jan 17) Collection Cleaning Integration Analysis Visualization Presentation Dissemination Simple End-to-end analysis Collect data from Rotten Tomatoes (using API) Movies (Actors, directors, related movies, etc.) Store in SQLite database Transform data to movie-movie network Analyze, using SQL queries (e.g., create graph s degree distribution) Visualize, using Gephi Describe your discoveries