
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang

Data Science Overview: Why, What, How, Who

Outline
  Why Data Science?
  What is Data Science?
  What are some prominent examples of Data Science?
  How to become a Data Scientist?
  Who is hiring Data Scientists now?

Why Data Science?

The Dawn of Big Data
  Google, Yahoo! (today): web search and computational advertising
    Google: 35,000 searches/sec
    Yahoo! scale: 600 million users per month, 4 billion clicks per day, 25 terabytes of data collected every day
  Netflix (2007): movie recommendations, Netflix Prize
    100 million ratings, 500,000 users, 18,000 movies
  Amazon (2003): product recommendations, reviews
    29 million customers, millions of products
  World Economic Forum 2011 at Davos
    Personal data, meaning digital data created by and about people, represents a new economic asset class, touching all aspects of society.

How Big is Your Data?
  Kilobyte  (1 000 bytes)
  Megabyte  (1 000 000 bytes)
  Gigabyte  (1 000 000 000 bytes)
  Terabyte  (1 000 000 000 000 bytes)
  Petabyte  (1 000 000 000 000 000 bytes)
  Exabyte   (1 000 000 000 000 000 000 bytes)
  Zettabyte (1 000 000 000 000 000 000 000 bytes)
  Yottabyte (1 000 000 000 000 000 000 000 000 bytes)
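As a small illustration of the scale above, the following sketch converts a raw byte count into the largest fitting decimal unit. The function name `human_size` and the unit abbreviations are assumptions made for this example; they do not come from the slides.

```python
# Decimal (SI) prefixes, matching the powers of 1000 listed on the slide.
# The abbreviations and the function name are illustrative assumptions.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_size(num_bytes):
    """Format an integer byte count using the largest fitting decimal unit."""
    unit = 0
    scale = 1  # kept as an exact integer to avoid floating-point drift
    while num_bytes >= scale * 1000 and unit < len(UNITS) - 1:
        scale *= 1000
        unit += 1
    return f"{num_bytes / scale:g} {UNITS[unit]}"


if __name__ == "__main__":
    # 25 terabytes per day is the Yahoo! figure quoted earlier in the deck.
    print(human_size(25 * 10**12))  # 25 TB
    print(human_size(1500))         # 1.5 KB
```

Note that these are powers of 1000, as on the slide; binary units (powers of 1024) would need a different divisor.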

5 Vs of Big Data
  Raw data: Volume
  Change over time: Velocity
  Data types: Variety
  Data quality: Veracity
  Information for decision making: Value

Cloud Computing
  Cloud computing is a style of computing where scalable and elastic IT-enabled capabilities are delivered as a (pay-as-you-go) service to external customers using Internet technologies. -- Gartner IT Glossary
  Cloud Computing is a new term for a long-held dream of computing as a utility. -- Above the Clouds, 2009

Cloud Computing = Cloud + SaaS
  Cloud computing refers to both:
    SaaS: the applications delivered as services over the Internet, and
    Cloud: the hardware and system software in the datacenters that provide those services
      Public Cloud (Utility Computing) vs. Private Cloud
  Cloud Computing started around 2006
  Big Data and Data Science (Big Data Analytics) started around 2011

Current Trends
  Applications have bigger data and need more advanced analysis
    Example: Web, corporate documents and emails -> Natural Language Processing
    Example: Social media -> Network/Graph Analysis
  IT infrastructure is moving to Cloud Computing
  Data Science arises from this application pull and technology push

What is Data Science?

Data Science: A Definition
  Data Science is the science that uses computer science, statistics and machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.

Goal of Data Science
  Turn data into data products.

Data to Data Products
  Transaction databases -> Fraud detection
  Wireless sensor data -> Smart home
  Text data, social media data -> Product review and consumer satisfaction
  Software log data -> Automatic troubleshooting
  Genotype and phenotype data -> New treatment for cancer

Other Data Products
  Financial products for investment or retirement funds
  Legal profession uses e-discovery tools for retrieval and review of legal documents
  Political campaign management
  Sports (e.g., Oakland baseball team)
  Remote sensing for environment monitoring

What are some prominent examples of Data Science?

Data Products: Google
  Web Search
  Google Ads
  News Recommendation Engine
  Google Maps
  Currently one of the best, if not the best, IT companies to work for. (Google event on Jan 21/22)

Data Products: Netflix
  Personalized Movie Ratings
  Movie Recommendations
  Similar Movies
  Movie Categories (e.g., 80s movies with a strong female lead, Kung Fu movies)
  Blockbuster is out of business

Data Products: LinkedIn/Facebook
  People you may know
  Applications you may like
  Jobs/Events you might be interested in
  Classifier for bad users and bad content
  With high accuracy, Facebook can guess whether you are single or married
  Who does not have a LinkedIn/Facebook account?

Data Products: Twitter
  Text Analysis
  Spam Filter/Similarity Search
  User Sentiment/Satisfaction/Feedback
  News Breakout
  Trends and Topics
  200 million users as of 2011, generating over 200 million tweets and handling over 1.6 billion search queries per day

Data Products: Splunk
  Degradation and Failure Detection
  Security Breach Identification
  Event Monitoring
  Troubleshooting Tools
  Cross-platform Event Correlation
  Founded 2004, IPO in 2012

How to become a Data Scientist?

The Life of Data (state-of-the-art)
  Data Sources -> Collect -> Clean -> Integrate -> Analysis -> Visualization -> Interface -> Users
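The stages named on this slide (collect, clean, integrate, analyze, visualize) can be sketched as a chain of small functions. This is only a toy illustration under stated assumptions: the review records, the product catalog, and every function name below are hypothetical and do not come from the course material.

```python
from collections import Counter


def collect():
    # Hypothetical raw inputs from two sources: product reviews and a catalog.
    reviews = [
        {"user": "u1", "product": "p1", "stars": "5"},
        {"user": "u2", "product": "p1", "stars": ""},  # incomplete record
        {"user": "u3", "product": "p2", "stars": "2"},
        {"user": "u1", "product": "p2", "stars": "4"},
    ]
    catalog = {"p1": "Headphones", "p2": "Keyboard"}
    return reviews, catalog


def clean(reviews):
    # Drop records with a missing rating and convert ratings to integers.
    return [{**r, "stars": int(r["stars"])} for r in reviews if r["stars"]]


def integrate(reviews, catalog):
    # Join each review with the product name from the second source.
    return [{**r, "name": catalog[r["product"]]} for r in reviews]


def analyze(records):
    # Compute the average rating per product.
    totals, counts = Counter(), Counter()
    for r in records:
        totals[r["name"]] += r["stars"]
        counts[r["name"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


def visualize(averages):
    # Render a plain-text bar chart, one line per product.
    return [
        f"{name:<12}{'*' * round(avg)} {avg:.1f}"
        for name, avg in sorted(averages.items())
    ]


def run_pipeline():
    reviews, catalog = collect()
    return visualize(analyze(integrate(clean(reviews), catalog)))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In a real system each stage would be far larger (streaming collection, entity resolution during integration, statistical models during analysis), but the data still flows through the stages in this order.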

Challenges in Data Science
  Preparing data (noisy, incomplete, diverse, streaming, ...)
  Analyzing data (scalable, accurate, real-time, advanced methods, probabilities and uncertainties, ...)
  Representing analysis results, i.e., the data product (story-telling, interactive, explainable, ...)

Skill Set of a Data Scientist
  Data Management
    Data collection, storage, cleaning, filtering, integration
  Large-scale Parallel Data Processing
    Parallel computing
  Statistics and Machine Learning
    Data modeling, inference, prediction, pattern recognition
  Interface and Data Visualization
    HCI design, visualization, story-telling

Who is hiring Data Scientists now?

Sexy Job in the Next 10 Years
  "The sexy job in the next ten years will be ... The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that's going to be a hugely important skill." -- Hal Varian, Google Chief Economist, 2009

Who's hiring Data Scientists?
  IT companies: Google, Twitter, Lexis/Nexis, Facebook, Pivotal/EMC
  Media and financial sectors: Fox, CNN, NYT, Bloomberg, ...
  Research: Biology, Medicine, Physics, Psychology, ...
  Information offices in government and corporations
  Law firms: e-discovery tools

Books on Data Science

Additional Reading Pointers
  Data Science Summit (Strata) (http://www.datascientistsummit.com/)
  Kaggle Competitions (http://www.kaggle.com/)
  Data Science course at Berkeley & Coursera (http://datascienc.es/)

Summary
  Why now: Dawn of Big Data, need for advanced analytics, and Cloud Computing
  What is it: Data -> Data Product, with many examples incl. Google, Netflix, Splunk, LinkedIn
  How to become: Data management, parallel computing and data processing, statistical machine learning, and visualization skills (the Life of Data)
  Who is hiring: Data Scientists are in great demand, from industry to government to science.