CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science



Similar documents
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

CIS4930/6930 Data Science: Large-scale Advanced Data Analysis Fall Daisy Zhe Wang CISE Department University of Florida

Doing Multidisciplinary Research in Data Science

Big Data Analytics. Genoveva Vargas-Solar French Council of Scientific Research, LIG & LAFMIA Labs

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

Big Data Analytics Process & Building Blocks

WHAT IS BIG DATA? David Bechtold

Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

Big Data Analytics Building Blocks; Simple Data Storage (SQLite)

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Introduction to Predictive Analytics. Dr. Ronen Meiri

Introduction to the Mathematics of Big Data. Philippe B. Laval

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

BIG DATA FUNDAMENTALS

Introduction to Data Mining

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP

COMP9321 Web Application Engineering

Collaborations between Official Statistics and Academia in the Era of Big Data

Big Data Analytics: Collecting, Analyzing and Decision Making

Big Data and Analytics: Challenges and Opportunities

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

The Next Wave of Data Management. Is Big Data The New Normal?

Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Introduction to Engineering Using Robotics Experiments Lecture 17 Big Data

Statistical Challenges with Big Data in Management Science

Big Data Introduction, Importance and Current Perspective of Challenges

What happens when Big Data and Master Data come together?

Two Recent LE Use Cases

Sunnie Chung. Cleveland State University

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

A Survey on Big Data Concepts and Tools

Statistics for BIG data

Sentiment Analysis on Big Data

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

We are Big Data A Sonian Whitepaper

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Industry 4.0 and Big Data

Intro to Big Data and Business Intelligence

So Just What Is Big Data? James E. Tcheng, MD, FACC, FSCAI

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA: ARE YOU READY? Andy Kyiet Demand Flow Intelligence May, 2013

SECURITY MEETS BIG DATA. Achieve Effectiveness And Efficiency. Copyright 2012 EMC Corporation. All rights reserved.

Social Media Marketing Measurement A research project to understand new possibilities

Beyond Watson: The Business Implications of Big Data

Big Systems, Big Data

CPS 216: Advanced Database Systems (Data-intensive Computing Systems) Shivnath Babu

International Journal of Engineering Research ISSN: & Management Technology November-2015 Volume 2, Issue-6

A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics

Changing the face of Business Intelligence & Information Management

Big Data Analytics. The Hype and the Hope* Dr. Ted Ralphs Industrial and Systems Engineering Director, Laboratory

BIG DATA I N B A N K I N G

Are You Ready for Big Data?

Big Data a threat or a chance?

1. Understanding Big Data

Big Data: Study in Structured and Unstructured Data

How To Make Sense Of Data With Altilia

What is Big Data used for? Intro to Big Data INRIA

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

Generating the Business Value of Big Data:

Age of Big data. Presented by: Mohammad Iqbal BCM -2014

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS November 7, Machine Learning Group

Big Data. Lyle Ungar, University of Pennsylvania

Online Marketing Module COMP. Certified Online Marketing Professional. v2.0

Modern (Computational) Approaches to Big Data Analytics. CSC 576 Computer Science, University of Rochester Instructor: Ji Liu

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May ISSN BIG DATA: A New Technology

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Large-Scale Data Processing

From Internet Data Centers to Data Centers in the Cloud

Mining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University

Big Data Systems CS 5965/6965 FALL 2014

Open source Google-style large scale data analysis with Hadoop

W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Ramesh Bhashyam Teradata Fellow Teradata Corporation

Big Data in Telco & Banking Analytics. Benjamin Sznajder IBM Research Haifa

Now, Next and the Future: IT, Big Data and other Implications for RIM. Presented by Michael S. Smith /

What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Application Development. A Paradigm Shift

Big data and its transformational effects

SOCIAL MEDIA ADVERTISING STRATEGIES THAT WORK

Are You Ready for Big Data?

Transcription:

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20

Review Overview of Data Science Why Data Science? What is Data Science? What are some prominent examples of Data Science? How to become a Data Scientist? Who are hiring Data Scientists Now? 21

The Dawn of Big Data Google, Yahoo today Core Business: Web Search and Computational advertising Google: 35,000 searches/sec Yahoo! scale: 600 million users per month, 4 billion clicks per day, 25 terabytes of data collected every day Netflix 2007 Movie recommendations 100 million ratings, 500,000 users, 18,000 movies netflix prize Amazon 2003 Product recommendations, reviews 29 million customers, millions of products Word Economic Forum 2011 at Davos Personal data digital data created by and about people represents a new economic asset class, touching all aspects of society. 22

How Big is Your Data? Kilobyte (1000 bytes) Megabyte (1 000 000 bytes) Gigabyte (1 000 000 000 bytes) Terabyte (1 000 000 000 000 bytes) Petabyte (1 000 000 000 000 000 bytes) Exabyte (1 000 000 000 000 000 000 bytes) Zettabyte (1 000 000 000 000 000 000 000 bytes) Yottabyte (1 000 000 000 000 000 000 000 000 bytes) It is not only about volumes! 23

5 Vs of Big Data Raw Data: Volume Change over time: Velocity Data types: Variety Data Quality: Veracity Information for Decision Making: Value 24

Multimodal Data Sources on World Cup Common Twitter ~5M tweets with 244,000 associating images News & Blogs 776 blog posts and associated pictures Facebook 250 posts and associated images Youtube 460 minutes of Videos and Audio Flickr 913 images and associated captions Google Images 1,920 images and associated captions 25

Current Trends Applications has bigger data and need more advanced analysis Example: Web, Corporate documents and Emails Natural Language Processing Example: Social Media Network/Graph Analysis IT Infrastructure moving to Cloud Computing (2006) Parallel Computing, SaaS Elastic Computing pay-as-you-go, scale up/down Data Science arise given this application pull and technology push (2011) 26

27

What is Data Science? 28

Data Science A Definition Data Science is the science which uses computer science, statistics and machine learning, visualization and human-computer interactions to collect, clean, integrate, analyze, visualize, interact with data to create data products. 29

Goal of Data Science Turn data into data products. 30

Data to Data Products Web Text Data Knowledge Bases with Facts & Relationships Twitter Text/Images Real-time Events and News Knowledge bases first-order inference rules and constraints Text Data, Social Media Data Product Review and Consumer Satisfaction Software Log Data Automatic Trouble Shooting Genotype and Phenotype Data common patterns New treatment for Cancer 31

Other Data Products Financial products for investment or retirement funds Legal profession uses e-discovery tool for retrieval and review of legal documents Political campaign management Sports (e.g., Oakland baseball team) Remote Sensing for Environment Monitoring 32

What are some prominent examples of Data Science? 33

Data Products Google Web Search Google Ads News Recommendation Engine Google Maps Currently one of the best if not the best IT company to work for. (Google event on Jan 21/22) 34

Data Products Netflix Personalized Movie Ratings Movie Recommendations Similar Movies Movie Categories (e.g., 80 s movie with a strong female lead, Kung Fu movies) BlockBuster is out of the business 35

Data Products LinkedIn/Facebook People you may know Applications you may like Jobs/Events you might be interested Classifier for bad users and bad content With high accuracy, Facebook can guess whether you are single or married Who does not have LinkedIn/Facebook Account? 36

Data Products Twitter Text Analysis Spam Filter/Similarity Search User Sentiment/Satisfaction/Feedback News Breakout Trend and Topics 200 million users as of 2011, generating over 200 million tweets and handling over 1.6 billion search queries per day 37

Data Products Splunk Degradation, Failure Detection Identify Security Breach Event Monitoring Troubleshoot Tools Cross-platform Event Correlation Founded 2004, IPO in 2012 38

How to become a Data Scientist? 39

The Life of Data (state-of-the-art) Users Interface Collect Clean Integrate Analysis Visualization Data Sources 40

Challenges in Data Science Find Data (Data Collection, Quality) Prepare Data (Noisy, Incomplete, Diverse, Streaming ) Analyze Data (Scalable, Accurate, Real-time, Advanced Methods, Probabilities and Uncertainties...) Represent Analysis Results (i.e. data product) (Story-telling, Interactive, explainable ) 41

Skill Set of a Data Scientist Data Management Data collection, storage, cleaning, filtering, integration Large-scale Parallel Data Processing Parallel computing Statistics and Machine Learning Data modeling, inference, prediction, pattern recognition Interface and Data Visualization HCI design, visualization, story-telling 42

Who are hiring Data Scientists Now? 43

Sexy Job in the next 10 years The sexy job in the next ten years will be The ability to take data to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that s going to be a hugely important skill. -- Hal Varian, Google Chief Economist, 2009 44

Who s hiring Data Scientist? IT companies: Google, Twitter, Lexis/Nexis, Facebook, Pivotal/EMC Media and Financial sectors Fox, CNN, NYT, Bloomburg, Research: Biology, Medicine, Physics, Psychology, Information office in government and corporations Law firms: e-discovery tools 45

46 Books on Data Science

Additional Reading Pointers Data Science Summit (Strata) (http://www.datascientistsummit.com/ ) Kaggle Competitions (http://www.kaggle.com/) Data Science course at Berkeley (http://datascienc.es/ ) & on Coursera (https://www.coursera.org/course/datasci/ ) 47

Summary Why now: Dawn of Big Data <<- Need for Advanced Analytics and Cloud Computing What is it: Data ->> Data Product, many examples incl. Google, Netflix, Splunk, LinkedIn How to become: Data management, parallel computing and data processing, statistical machine learning, and visualization skills Life of Data Who are hiring: Data Scientists are in great demands, from industry to government to science. 48

Announcements Course Website: http://www.cise.ufl.edu/class/cis6930fa14pro New registration slots will be available @ 4pm TODAY For first year MS students with no data mining background: please first take Introduction to Data Science 49

Next Few Lectures Graph Algorithms and Graph Databases Offline queries: page-rank over Web link structure Online queries: SPARQL (sub-graph matching) over RDF stores Shortest paths (social network transactions) Text and Image Extraction and Retrieval Features Extraction Bag of Words model + Solr indexing More advanced: Entity and event extractions/classifications Knowledge Base Construction, Querying, Integration and Mining General, Health, Ecology, Medical, Rule/Constraint Learning More advanced: Uncertainty Management and reasoning 50