Big Data: A Closer Look




What is Big Data? Increasingly popular search query http://www.google.com/trends/explore#q=big%20data

What is Big Data? The V's

Characteristics of Big Data
Volume http://www.businessweek.com/articles/2014-03-06/interactive-graphichow-big-is-big-data
Velocity http://blog.lionbridge.com/enterprise-crowdsourcing/files/2013/06/bigdata-blog-post-graphic.png
Variety http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data
Veracity
http://www.sec.gov/archives/edgar/data/51143/000110465913015636/g61551bii004.gif
http://www.datasciencecentral.com/profiles/blogs/data-veracity
http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-vs-of-big-data.jpg

What to do with Big Data?
Advances in Computer Science (CS)
Sample companies
Sample applications in analytics
Focus of speakers

Sample Companies
EMC (data storage hardware) http://www.emc.com/campaign/bigdata/index.htm
Oracle (database software) http://www.oracle.com/us/technologies/big-data/index.html
SAS (analytics software) http://www.sas.com/en_us/insights/big-data.html
IBM (computational hardware, database & analytics software, services) http://www.ibm.com/big-data/

Sample Applications in Analytics
Automated recommendation of products (Amazon, Netflix)
Organization of news articles (Google News)

Product Recommendation

Recommendation Systems
Netflix: recommending movies
Amazon: recommending products of many kinds

Netflix Prize (netflixprize.com)
$1 million for a 10% improvement in accuracy over the Netflix algorithm/system (within 5 years)
51K contestants, 41K teams, 186 countries
Did any team win the prize? If so, how long did it take?
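The slide does not say how "accuracy" was measured. The Netflix Prize scored submissions by root-mean-square error (RMSE) between predicted and actual ratings, so a "10% improvement" means an RMSE 10% lower than the baseline system's. The sketch below shows that metric and the improvement check; the function names and the sample ratings are illustrative assumptions, not contest data.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Made-up ratings on the 1-5 scale, for illustration only.
actual = [4, 3, 5, 1]
baseline = [3, 3, 3, 3]    # hypothetical system that always predicts 3
candidate = [4, 3, 4, 2]   # hypothetical improved system

gain = relative_improvement(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement over baseline: {gain:.1%}")
```

A lower RMSE is better, so a candidate qualifies under the 10% rule when `relative_improvement` returns at least 0.10.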

Netflix Prize Data (1998-2005)
Customers: 480,189 (ID: 1 to 2,649,429)
Movies: 17,770 (ID: 1 to 17,770); fields: ID, title, year
Ratings given in Training Set: 100,480,507
min=1; max=17,653; avg=209 ratings per customer
Rating scale: 1 to 5; each rating has a date
Ratings to predict in Qualifying Set: 2,817,131
Size: about 1 GB (700 MB compressed)
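The average on this slide can be checked from the counts it gives, and the same counts show how sparse the data is. The short sketch below uses only the slide's figures; the variable names are my own.

```python
# Counts taken from the slide.
customers = 480_189
movies = 17_770
training_ratings = 100_480_507

# Average number of ratings each customer contributed.
avg_per_customer = training_ratings / customers

# Share of all possible customer-movie pairs that actually have a rating.
density = training_ratings / (customers * movies)

print(f"average ratings per customer: {avg_per_customer:.0f}")
print(f"share of customer-movie pairs rated: {density:.2%}")
```

The average comes out at about 209, matching the slide, and only a little over 1% of customer-movie pairs carry a rating, which is why predicting the missing ones is the core of the problem.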

Netflix Problem
Columns: movies M1 to M9. Rows: customers C1 to C5 (each row lists the entries in order; empty cells are omitted).
C1: ? 1 3 4 ?
C2: 2 3 1 4 5
C3: 2 5 3 3 3 4 4 1
C4: 2 3 2
C5: 4 1 3 3
? = unknown rating to be predicted
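One simple way to fill in a "?" is a baseline that starts from the overall mean rating and adjusts for how generous the customer is and how well liked the movie is. The sketch below is only an illustration of that idea, not the method Netflix or any contestant used. The function name `baseline_predict` and the small ratings table are hypothetical (the table is not the one on the slide).

```python
def baseline_predict(ratings, customer, movie, lo=1.0, hi=5.0):
    """Global mean plus customer and movie offsets, clipped to the rating scale."""
    all_values = [r for row in ratings.values() for r in row.values()]
    if not all_values:
        raise ValueError("no known ratings to learn from")
    global_mean = sum(all_values) / len(all_values)

    # Known ratings by this customer and for this movie (either may be empty).
    cust_values = list(ratings.get(customer, {}).values())
    movie_values = [row[movie] for row in ratings.values() if movie in row]

    cust_offset = sum(cust_values) / len(cust_values) - global_mean if cust_values else 0.0
    movie_offset = sum(movie_values) / len(movie_values) - global_mean if movie_values else 0.0

    return min(hi, max(lo, global_mean + cust_offset + movie_offset))

# Hypothetical sparse table: customer -> {movie: rating}.
ratings = {
    "C1": {"M2": 1, "M4": 3, "M6": 4},
    "C2": {"M1": 2, "M2": 3, "M5": 1},
    "C3": {"M1": 2, "M4": 5, "M6": 3},
}

print(baseline_predict(ratings, "C1", "M1"))
```

Storing the table as nested dictionaries keeps only the known ratings, which suits data where most cells are empty.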

Organizing News Articles

Google News (news.google.com)
News organizer from many sources
Krishna Bharat, 2006 http://googleblog.blogspot.com/2006/01/and-now-news.html
using computers to organize the world's news in real time and providing a bird's eye view of what's being reported on virtually any topic.
By presenting news "clusters" (related articles in a group), we thought it would encourage readers to get a broader perspective by digging deeper into the news -- reading ten articles instead of one, perhaps -- and then gain a better understanding of the issues, which could ultimately benefit society.

Google News Problem
Input: news articles
Output: clusters of news articles
Algorithm (how)?

Animation of a Clustering Algorithm http://shabal.in/visuals/kmeans/1.html
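The linked animation shows k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below runs those two steps on 2-D points, as in the animation. It is not a description of how Google News works: real articles would first have to be turned into numeric vectors, and the function name, starting centroids, and sample points here are assumptions for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid

        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, clusters

# Two visibly separate groups of made-up points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The starting centroids are passed in explicitly so the run is repeatable; in practice they are usually chosen at random, and different starts can give different clusters.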

Sample Applications in Analytics
Automated Spam Detection -- Ryan Stansifer
DNA Sequence Analysis -- Debasis Mitra

Upcoming CS Outreach Events
Mar: teacher workshop
Mar-Apr: student competition, Boulder Dash (details in mid-March)
Jul: summer camps
cs.fit.edu/~pkc/cs4hs
Questions?