Big Data and Open Data

Bebo White
SLAC National Accelerator Laboratory / Stanford University
bebo@slac.stanford.edu

dekabytes hectobytes

Big Data IS a buzzword!

The Data Deluge
- From the beginning of recorded time until 2003, mankind generated 5 exabytes of data
- In 2011, the same amount of data was generated every two days
- In 2013, the same amount of data was generated every 10 minutes

Big Data - a Possible Definition
- Refers to datasets whose size is beyond the ability of single storage devices or typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute)
- This definition is not stated in terms of a fixed data size (which will increase)
- It can vary by sector/usage

So no cool Big Data apps for Mac or iOS - yet

Where is Big Data Coming From?
- EVERYWHERE!
- Any communication over a network involves transfer of data that is meaningful to someone
- Every e-mail, every tweet, every transaction, every social media interaction, etc.
- Sensors: the Internet of Things

The Cloud/IoT/WoT is/will be a very noisy place
- An unbelievable number of objects (theoretically more than 10^38) will be able to talk to us and to each other (orders of magnitude more than now)
- We will be interested in hearing what some of them have to say
- How can we manage these conversations?
- Traditional interfaces break down

High-volume, -velocity, and -variety information assets that demand cost-effective innovative forms of information processing for enhanced insight and decision making

Volume?
- Data volume worldwide in 2013: ~3.5 ZB (including 400 billion feature-length HD movies)
Velocity?
- Every 60 sec. on Facebook: 510K posted comments; 293K status updates; 136K uploaded photos
- 30 billion shares
- 20 million apps installed

Variety?
- Any type of data, both meaningful and meaningless
Veracity?
- How is trust established?
- What does "like" really mean?

Usage
Plus: ISPs, utilities, academic institutions, everyone

What/How Does Target Know About Pregnant Women?

Challenges of Harnessing Big Data
- Data mining huge datasets
- Shortages of Big Data experts
- Privacy, legal, and social issues
- Strategies for acquiring Big Data: a new form of currency

Big Data Analytics and Data Science

Data Science Skill Set

Typical Big Data Problem
- Iterate over a large number of records
- Extract something of interest from each (MAP)
- Shuffle and sort intermediate results
- Aggregate intermediate results (REDUCE)
- Generate final output
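The map, shuffle-and-sort, and reduce steps can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not of Hadoop or any real framework: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`) and the word-count task are assumptions chosen for the example, and a real runtime would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Iterate over all records and emit (key, value) pairs from each (MAP)."""
    for record in records:
        yield from mapper(record)


def shuffle_and_sort(pairs):
    """Group the emitted values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Aggregate the values of each key into one result (REDUCE)."""
    return {key: reducer(key, values) for key, values in groups}


# Hypothetical task for illustration: count word occurrences.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    return sum(values)


def word_count(lines):
    pairs = map_phase(lines, word_mapper)
    groups = shuffle_and_sort(pairs)
    return reduce_phase(groups, count_reducer)


if __name__ == "__main__":
    print(word_count(["big data is big", "open data is open"]))
```

Only the mapper and the reducer are specific to the problem; the iteration, grouping, and final assembly stay the same from one job to the next, which is the part an execution framework takes over.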

MapReduce Can Refer to
- The programming model
- The execution framework (aka runtime)
- The specific implementation

MapReduce Implementations
- Google has a proprietary implementation in C++
  - Bindings in Java, Python
- Hadoop is an open-source implementation in Java
  - Development led by Yahoo!, now an Apache project
  - Used in production at Yahoo!, Facebook, Twitter, LinkedIn, Netflix, etc.
  - The de facto Big Data processing platform
- Lots of custom research implementations

Example - Sentiment Analysis
- Goal: gauging mood on social network data
- Not a traditional survey or focus group
- Social sites operate 24/7
- Timeliness: not subject to time lags
- Useful to marketers, IT, customers, etc.

Difficult Comment Analysis (1/2)
- False negatives: "crying" and "crap" (negative) vs. "crying with joy" and "holy crap!" (positive)
- Relative sentiment: "I bought a Honda Accord" is great for Honda, bad for Toyota
- Compound sentiment: "I love the phone but hate the network"
- Conditional sentiment: "If someone doesn't call me back, I'm never doing business with them again!"

Difficult Comment Analysis Problems (2/2)
- Scoring sentiment: "I like it" vs. "I really like it" vs. "I love it"
- Sentiment modifiers: "I bought an iPhone today :-)" vs. "Gotta love the telephone company ;-<"
- International/cultural sentiments
  - Japanese: unique emoticons for crying, e.g. (;_;)
  - Italians: effusive, grandiose
  - British: drier, less effusive
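A toy word-list scorer shows why comments like these are hard. The lexicon and its weights below are invented for illustration (they are not from the slides or from any real sentiment tool), and `naive_score` simply adds up the weight of every known word.

```python
import re

# Hypothetical toy lexicon; both the words and the weights are assumptions.
LEXICON = {
    "love": 2,
    "like": 1,
    "great": 1,
    "hate": -2,
    "crying": -1,
    "crap": -1,
}


def naive_score(comment):
    """Sum the lexicon weight of each word; unknown words count as zero."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(LEXICON.get(word, 0) for word in words)


if __name__ == "__main__":
    comments = [
        "I love the phone but hate the network",  # compound sentiment
        "crying with joy",                        # false negative
        "I like it",
        "I really like it",                       # intensity is ignored
        "I love it",
    ]
    for comment in comments:
        print(naive_score(comment), comment)
```

With this lexicon the compound comment cancels out to a neutral score, "crying with joy" comes out negative, and "I like it" scores the same as "I really like it", so handling context, intensifiers, and emoticons needs more than word counting.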

Linked Data
- Provides access to the semantics of data items
- Based upon Semantic Web technologies and ontologies
- Designed for machines first and humans later
- Degree of structure in descriptions of things is high

- Big Data tends to be unstructured data (e.g., lists, e-mails, tweets, etc.)
- Therefore it tends to be "thin" rather than "thick"
- "Thin" means very little (if any) context
- Suppose I send some e-mail stating "My favorite book is The Dharma Bums"
- What can be added to this data to change it from "thin" to "thick"?

Linked Data is similar to Metadata but provides Context
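One way to picture the added context is as subject-predicate-object statements in the style of Linked Data. The sketch below uses plain Python tuples rather than a real RDF library; the `http://example.org/` namespace and every property name are hypothetical, and the author and publication year are standard bibliographic facts that are not taken from the slides.

```python
# "Thin" data: a bare string with no machine-readable context.
THIN = "My favorite book is The Dharma Bums"

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# "Thick" data: the same statement plus linked context, as
# (subject, predicate, object) triples. All identifiers are assumptions.
TRIPLES = [
    (EX + "me", EX + "favoriteBook", EX + "TheDharmaBums"),
    (EX + "TheDharmaBums", EX + "type", EX + "Novel"),
    (EX + "TheDharmaBums", EX + "title", "The Dharma Bums"),
    (EX + "TheDharmaBums", EX + "author", EX + "JackKerouac"),
    (EX + "TheDharmaBums", EX + "firstPublished", 1958),
]


def objects(triples, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    # Follow the links: my favorite book, then that book's author.
    book = objects(TRIPLES, EX + "me", EX + "favoriteBook")[0]
    print(objects(TRIPLES, book, EX + "author"))
```

The thin string can only be searched as text, while the triples let a program answer a question such as "who wrote my favorite book?" by following identifiers from one statement to the next.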

Linked Data Pros
- Far more parseable and machine processable than raw unstructured data
- Enhances data descriptions for complex analyses
- Can contribute to the VERACITY of our data
- Wide variety of discipline/data ontologies available

Linked Data Cons
- Much harder to do than adding keyword metadata
- Building efficient processing applications and parsers
- Implementing effective linked data stores

Linked Open Data
- LOD refers to data stores of Linked Data that are published (made available online and accessed via URLs) and free to use
- Open data means it must be available to all without copyright or ownership
- There is an increasing trend towards opening government data (US and UK, San Francisco and more) and scientific results
- Provides unprecedented ability to build mashup applications

But we are here to discuss Mac and iOS
- I believe that there is great future potential for powerful and creative Mac and iOS apps that leverage
  - Big Data analytics results
  - Linked Data and LOD data stores
- Great possibilities in E-Commerce, Education, Productivity, Social Interaction, etc.
- The people in this room can help define, drive and evangelize these concepts!

Thank You!
Questions? Comments?
bebo@slac.stanford.edu
Want copies of these slides?