What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015



Similar documents
What is Data Science? Data, Databases, and the Extraction of Knowledge Renée November 2014

An interdisciplinary model for analytics education

Integrating a Big Data Platform into Government:

Sunnie Chung. Cleveland State University

Proposal for New Program: BS in Data Science: Computational Analytics

ANALYTICS CENTER LEARNING PROGRAM

The 4 Pillars of Technosoft s Big Data Practice

Getting Started with Oracle Data Miner 11g R2. Brendan Tierney

The Future Of Computing

Statistics for BIG data

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Data tells stories, and business analytics depends on data. Hearing the stories in the data, however, only happens when you have people with the

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Introduction to Data Science: CptS Syllabus First Offering: Fall 2015

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

2015 Workshops for Professors

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Capturing Meaningful Competitive Intelligence from the Social Media Movement

Insight Data Science: Bridging the gap between academia and industry. Josiah Walton Physics Careers Seminar UIUC Department of Physics April 23, 2015

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

Master Specialization in Knowledge Engineering

Manu Marketing. Boutique marketing services. Web Development & SEO Social Media Marketing Video Production Mobile Applications

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Doing Multidisciplinary Research in Data Science

The following are the measurable objectives for graduated computer science students (ABET Standards):

Advanced In-Database Analytics

COMP9321 Web Application Engineering

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

ESS event: Big Data in Official Statistics

The Big Data Paradigm Shift. Insight Through Automation

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Big Data a threat or a chance?

Machine Learning for Data Science (CS4786) Lecture 1

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

21 SERIOUSLY COOL CAREERS THAT NEED MATHS.

Big Data & Netflix. Paul Ellwood February 9th, 2015

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Statistics Meets Big Data 統 計 遇 見 大 數 據

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Big data and its transformational effects

Big Data. Fast Forward. Putting data to productive use

Social Media: Managing Your Online Presence

Big Data Analytics. David Dietrich, EMC Education Services. April 4, 2013

Hexaware E-book on Predictive Analytics

Big Data Analytics Process & Building Blocks

Key Findings Advanced, Predictive Analytics Breaking the Barriers to Adoption

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

not think the same. So, the consumer, at the end, is the one that decides if a game is fun or not. Whether a game is a good game.

NetView 360 Product Description

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15

Big Data and Analytics: Challenges and Opportunities

Social Media. A brief overview of the Social Media module

Introduction to Data Mining

Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014

5 Tips for Growing Your Business with Social. Social Marketing in Action at T.H. March

Performing a data mining tool evaluation

SURVEY REPORT DATA SCIENCE SOCIETY 2014

MSCA Introduction to Statistical Concepts

Office: LSK 5045 Begin subject: [ISOM3360]...

Department of Computer Science School of Arts and Science.

Outline. What is Big data and where they come from? How we deal with Big data?

Social Media Creating an Approach That Will Bring You More Business

Social Media Measurement Meeting Robert Wood Johnson Foundation April 25, 2013 SOCIAL MEDIA MONITORING TOOLS

Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.

Essential Communication Methods for Today s Public Social Media 101. Presented by Tom D. Trimble, CIO Tulsa County Government

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Big Data and Data Science: Behind the Buzz Words

UC Small Farm Program Agritourism Webinar Series. Social Media Kristin York - Instructor June 2, 2016

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Big Data and Society: The Use of Big Data in the ATHENA project

CIS 8695 Big Data Analytics

Data UNC. Vinayak Deshpande

Deep Learning Meets Heterogeneous Computing. Dr. Ren Wu Distinguished Scientist, IDL, Baidu

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Data Science and Big Data: Below the Surface and Implications for Governance

WORKING RELATIONSHIP: This position reports directly to the President with mandatory check-in 1-2 times per week via phone or person.

Transcription:

What is Data Science? { Girl Develop It! Meetup Renée M. P. Teate, March 2015

Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa _Big_Data.jpg https://encryptedtbn2.gstatic.com/images?q=tbn:and9gcs9dku3_tzi-swwyaqee5y0ehuvoiznsya_raknubbd0jyxpx7pw http://fc01.deviantart.net/fs71/i/2012/326/3/4/cute_dog_by_tho masmeadows345-d5lsah9.jpg http://www.freefoto.com/images/1351/06/1351_06_2---books-- Shakespeare-and-Company-Bookstore--The-Latin-Quarter-- Paris_web.jpg

http://upload.wikimedia.org/wikipedia/commons/9/96/bill_nye,_barack_obama_and_neil_degrasse_tyson_selfie_2014.jpg https://c1.staticflickr.com/1/2/1349370_07 03fce74c.jpg https://c2.staticflickr.com/4/3273/3017878633_65beb1c7d6.jpg http://upload.wikimedia.org/wikipedia/commons/e/e4/gr een_bank_100m_diameter_radio_telescope.jpg

Around 100 hours of video are uploaded to YouTube every minute it would take about 15 years to watch every video uploaded in one day AT&T is thought to hold the world s largest volume of data in one unique database its phone records database is 312 terabytes in size, and contains almost 2 trillion rows. Every minute we send 204,000,000 emails, generate 1,800,000 Facebook likes, send 278,000 Tweets, and up-load 200,000 photos to Facebook 570 new websites spring into existence every minute of every day. http://smartdatacollective.com/bernardmarr/277731/big-data-25-facts-everyone-needs-know

http://pixabay.com/static/uploads/photo/2014/03/13/01/12/datacen ter-286386_640.jpg https://c2.staticflickr.com/2/1296/533233247_b6baa30fdb_z.jpg?zz=1

https://c1.staticflickr.com/3/2300/2596366618_2d6cb01735.jpg http://upload.wiki media.org/wikipedi a/commons/9/90/ke ncf0618facebookne twork.jpg http://upload.wikimedia.org/wikipedia/commons/b/bf/usda_hardine ss_zone_map.jpg http://upload.wikimedia.org/wikipedia/commons/1/1c/cms_higgs-event.jpg

Databases You Use Pretty much every website you interact with Social Media Online Shopping Banking Course Registration/Canvas File Sharing Travel Search Engines Etc. etc. etc.. You broadcast/generate data everywhere you go Cell phones Email Purchases Posting status updates Driving (GPS) Attending events Streaming music Etc. etc. etc..

https://www.google.com/maps/@38.8905569,-77.1721577,13z/data=!5m1!1e1 http://upload.wikimedia.org/wikipedia/commons/6/69/netflix_logo.svg https://c2.staticflickr.com/4/3324/3507973704_563846fe14_z.jpg?zz=1 How is data collected about you used to help you?

Who builds these systems?

Data Scientist Computer Scientist Data collection systems Machine Learning Algorithms Interface Design Design/Manage/Query Databases Data Aggregation Data Mining Mathematician Statistical Models Evaluation Metrics Predictive Analytics Data Visualizations Business Person Domain Expertise Knowing what questions to ask Interpreting results for business decisions Presenting outcomes Examples not a complete definition, and not all simultaneously necessary skills

Data Science Venn Diagram by Drew Conway http://static.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10 f77ab/1364352052403/data_science_vd.png?format=750w

From Doing Data Science by Cathy O Neill & Rachel Schutt http://www.becomingadatascientist.com/wpcontent/uploads/2014/06/ds_profile.png http://semanticommunity.info/@api/deki/files/27057/figure1-4.png?size=bestfit&width=484&height=541&revision=1 No need to be a unicorn, but do need to know something about all of these areas, and become expert in some

Some other names for Data Scientist Statistician Data Mining Specialist Biostatistician Social Science Researcher Big Data Analyst Spatial/GIS Analyst Natural Language Programmer Computational Physicist Pythonista Financial Analyst Recommendation System Engineer Information Architect Artificial Intelligence Researcher Neuroscientist Data Visualization Designer

Data Science jobs pay an average of $118,000 per year It is estimated that by 2018, US could have a shortage of 140,000+ people with advanced analytical skills & need 1.5M managers/analysts that can make decisions based on data analysis

Extraction of Knowledge Also known as knowledge discovery Goes beyond queries Data Mining Business Understanding Data Understanding Data Preparation Modeling Clustering Classification Regression Evaluation From Data Science for Business by Provost & Fawcett Images from ODU ECE 607 Lecture Slides by Prof. Jiang Li

Video clip: Interview with Neha Kothari, LinkedIN Data Scientist http://youtu.be/8dxke5cghda?t=17s

Examples Galaxy Classification using Convolutional Neural Networks http://benanne.github.io/2014/04/05/galaxy-zoo.html Choosing Facebook Audience for Content Promotion using Random Forests Predicting Wine Quality with Principal Component Analysis http://fastml.com/predicting-wine-quality/ Readmission Risk Score to decide which patients to give additional follow-up help at Mt. Sinai hospital http://citizennet.com/blog/2012/11/10/random-forestsensembles-and-performance-metrics/ http://www.technologyreview.com/news/518916/ahospital-takes-its-own-big-data-medicine/

http://xkcd.com/1425/

How to get started

Topics to learn about Programming Math Any language is good to start with. Gain core understanding. Python or R data analysis experience a plus Database design, SQL Calculus Linear Algebra Statistics Advanced: Optimization / Linear Programming Research and Analysis Others Science involving data collection and interpretation Working with messy real life data Business Analytics Data Mining Business / Communication Graphic Design

Read, read, read Doing Data Science by Cathy O Neil* & Rachel Schutt Data Science for Business by Forster Provost & Tom Fawcett Data Smart by John Foreman* (uses Excel) I review other books as I read them: http://www.becomingadatascientist.com/learning/ Blogs & News Feeds (FlowingData.com is a good one to start with) Twitter look for curated lists of people to follow https://twitter.com/becomingdatasci/lists/women-in-datascience/members *on Twitter and willing to chat!

Free Online Courses Python Fundamentals Codecademy http://www.codecademy.com/tracks/python Machine Learning Coursera / Stanford https://www.coursera.org/course/ml Data Analyst Nanodegree Udacity https://www.udacity.com/course/nd002 (includes Hadoop mini-course) Applied Data Mining and Statistical Learning Penn State https://onlinecourses.science.psu.edu/stat857/ Pretty comprehensive list here: http://www.kdnuggets.com/education/online.html TED talks on Data http://www.ted.com/search?q=data Susan Etlinger* http://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data Need to spend more time on critical thinking skills [because we have the] potential to make bad decisions far more quickly, efficiently, and with far greater impact than we did in the past. we need to be clear about..the methodologies that we use, because if I don't know what questions you asked, I don't know what questions you didn't ask.

Explore Volunteer to Analyze Data (DataKind) Play with public data sets http://101.datascience.community/2014/10/17/data-sources-for-cool-datascience-projects-part-1-guest-post/ https://www.opensciencedatacloud.org/publicdata/ http://catalog.data.gov/dataset https://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&nu matt=&numins=&type=&sort=nameup&view=table Data Science Competitions (Kaggle also has knowledge competitions for learning)

Questions? Renee Teate renee.parilak@gmail.com, @becomingdatasci http://www.becomingadatascientist.com