Leveraging Big Data in Introductory Programming Courses. David Reed Department of Journalism, Media & Computing Creighton University





Big Data is big!

working definition: the storage & processing of massive amounts of data in order to extract useful information

it's all over the news
- NSA phone & email surveillance
- DHS mining social media
- Google & Facebook collecting and studying online behavior
- CS Principles course includes using large data sets to explore and discover information and knowledge

it's driving the demand for faster computers, expanded storage media, innovative algorithmic approaches and capable technology specialists

Big Data is motivating

it is generally accepted that students (especially girls) are motivated by real-world problems
- even at an intro programming level, real-world data can make programming more interesting & relevant

unfortunately, finding accessible data sets is not easy
- often stored in compressed or proprietary formats
- may require special tools for data access & viewing

while specialized formats & tools may be justified at an advanced level, beginning students need transparency

Three examples

this session will present three Big Data examples
- used in introductory programming classes
- students programmed in Python, but any language would do
- the data sets enable students to write simple programs to answer interesting questions

the data sets vary in size and complexity
- ENABLE online wordlist (1 MB)
- NBA/WNBA player stats (5-10 MB)
- Twitter hashtags from 2006-2009 (1.8 GB)

ENABLE Dictionary

http://www.puzzlers.org/pub/wordlists/enable1.txt

ENABLE (Enhanced North American Benchmark Lexicon) is a public domain list of 172,823 words
- often used for word games such as Scrabble and Boggle

when teaching lists/arrays in an intro course, it can make simple tasks more interesting
- old school: how many grades in a list > 90?
- new school: how many 4-letter words in English? longest & most common word lengths? first and last words (alphabetically)? how many anagrams of "stale"?

Student questions

Python's built-in list processing makes it easy to pose and answer interesting questions (as do other languages)
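The word-list questions posed for the ENABLE dictionary can be answered with a few list operations. The following is a minimal sketch, not the code shown in the session: the function names are hypothetical, and it assumes a local copy of the word list saved as a plain-text file with one word per line.

```python
from collections import Counter


def load_words(path):
    """Read a word list stored one word per line (the assumed layout of enable1.txt)."""
    with open(path, encoding="utf-8") as infile:
        return [line.strip() for line in infile if line.strip()]


def count_of_length(words, n):
    """How many words have exactly n letters?"""
    return sum(1 for word in words if len(word) == n)


def longest_words(words):
    """All words that share the maximum length."""
    longest = max(len(word) for word in words)
    return [word for word in words if len(word) == longest]


def most_common_length(words):
    """The word length that occurs most often."""
    return Counter(len(word) for word in words).most_common(1)[0][0]


def first_and_last(words):
    """First and last words in alphabetical order."""
    return min(words), max(words)


def anagrams_of(words, target):
    """Words that use exactly the same letters as target, excluding target itself."""
    key = sorted(target)
    return [word for word in words if word != target and sorted(word) == key]
```

With the file downloaded, a student could call `words = load_words("enable1.txt")` and then, for example, `count_of_length(words, 4)` or `anagrams_of(words, "stale")`.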

Sports stats

http://www.nba.com/statistics/
http://www.wnba.com/statistics/

sports stats can be interesting to some (but not all) students, so know your audience

the NBA & WNBA maintain extensive online databases of team and player stats
- not easily downloadable, but you can filter as desired and then copy-and-paste into a CSV file

NBA scoring

Student questions

could easily generalize to consider other stats and/or other years
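Once the stats have been pasted into a CSV file, scoring questions reduce to reading rows and sorting them. The sketch below is illustrative only: the column names `player`, `team`, `games`, and `points` are assumptions about how the pasted file is laid out, not the headers used by the NBA or WNBA sites, and the function names are hypothetical.

```python
import csv


def read_stats(lines):
    """Parse CSV text (an open file or a list of lines) into a list of dicts.

    Assumes a header row with the hypothetical columns
    player, team, games, points.
    """
    rows = []
    for row in csv.DictReader(lines):
        rows.append({
            "player": row["player"],
            "team": row["team"],
            "games": int(row["games"]),
            "points": int(row["points"]),
        })
    return rows


def points_per_game(row):
    """Scoring average for one player; 0.0 if no games were played."""
    return row["points"] / row["games"] if row["games"] else 0.0


def top_scorers(rows, n=5):
    """The n players with the highest points-per-game average."""
    return sorted(rows, key=points_per_game, reverse=True)[:n]


def team_totals(rows):
    """Total points scored by each team's listed players."""
    totals = {}
    for row in rows:
        totals[row["team"]] = totals.get(row["team"], 0) + row["points"]
    return totals
```

A typical call would be `with open("nba_stats.csv", newline="") as f: rows = read_stats(f)`, where the file name is a placeholder. Swapping `points` for another column generalizes the same functions to other stats or other years.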

Twitter Hashtags

http://www.infochimps.com/datasets/twitter-censusconversation-metrics-one-year-of-urls-hashtags-sm

Infochimps has collected Twitter data from 3/06 to 11/09
- every token (hashtag/url/smiley) from 500M+ tweets, organized by month or by hour

hashtag  200910  1  csta
hashtag  200909  4  cstagego
hashtag  200907  1  cstar
hashtag  200811  1  cstc
hashtag  200812  5  cstc
hashtag  200903  3  cstc

Finally, BIG data!

the Twitter file contains 41M+ lines and takes 1.8 GB
- too big for standard viewing tools (e.g., Word, Excel)
- too slow to process in its entirety

first task for students is to filter the data
- keep only the hashtag entries, separated by year

hashtags2006 → 4 KB
hashtags2007 → 49 KB
hashtags2008 → 1.9 MB
hashtags2009 → 33.4 MB
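Because the file is too large to open in a viewer, the filtering step is best done by streaming it one line at a time. This is a rough sketch under stated assumptions: each line is taken to hold four whitespace-separated fields (token type, date beginning with the year, count, token), matching the sample rows from the data set, and the function names and output file names are hypothetical.

```python
def hashtag_lines_by_year(lines):
    """Yield (year, line) for every hashtag entry, skipping other token types.

    Assumes four whitespace-separated fields per line:
    token type, date starting with YYYY, count, token.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[0] == "hashtag":
            yield fields[1][:4], line


def split_by_year(in_path, out_prefix="hashtags"):
    """Stream the big file once, writing hashtag lines to one file per year.

    Only one line is held in memory at a time, so file size is not a problem.
    Returns the sorted list of years that were found.
    """
    outputs = {}
    try:
        with open(in_path, encoding="utf-8", errors="replace") as infile:
            for year, line in hashtag_lines_by_year(infile):
                if year not in outputs:
                    outputs[year] = open(f"{out_prefix}{year}.txt", "w", encoding="utf-8")
                outputs[year].write(line)
    finally:
        for outfile in outputs.values():
            outfile.close()
    return sorted(outputs)
```

Calling `split_by_year("tokens_by_month.txt")`, with a placeholder name for the downloaded file, would produce files such as `hashtags2008.txt` and `hashtags2009.txt` that are small enough for students to inspect directly.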

Student questions

can then pose interesting, historical questions (e.g., Google sixwords, blogher)
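Historical questions about a hashtag amount to totaling its counts by month. The sketch below is one possible approach, not the code from the session; it assumes the same four-field line layout seen in the sample rows (token type, year and month, count, token), and the function names are hypothetical.

```python
from collections import Counter


def monthly_counts(lines, tag):
    """Total uses of one hashtag per month, as a dict ordered by date.

    Assumes four whitespace-separated fields per line:
    token type, YYYYMM, count, token.
    """
    counts = Counter()
    wanted = tag.lower()
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[0] == "hashtag" and fields[3].lower() == wanted:
            counts[fields[1]] += int(fields[2])
    return dict(sorted(counts.items()))


def top_hashtags(lines, n=10):
    """The n most-used hashtags across all the given lines."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[0] == "hashtag":
            totals[fields[3]] += int(fields[2])
    return totals.most_common(n)
```

Either function accepts an open file, so a student could run `monthly_counts(open("hashtags2009.txt"), "blogher")` on one of the per-year files (the file name here is a placeholder) to see when a tag first appeared and how its use changed.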

Other sources

more datasets at www.infochimps.com/datasets
- 60K+ UFO sightings
- crime rates by state
- Social Security benefits & payments

detailed major league baseball play-by-play data at http://www.retrosheet.org/game.htm

airplane departure/arrival/delay data at stat-computing.org/dataexpo/2009/the-data.html

U.S. census data at www.census.gov/main/www/access.html

From the audience

World Health Organization data repository (from Barb Ericson)
- http://www.who.int/gho/database/en/

IBM's Many Eyes data repository (from Barb Ericson)
- http://www-958.ibm.com/software/analytics/manyeyes/datasets

SAS's Data Depot, as part of its Curriculum Pathways (from Mason Matthews)
- http://www.sascurriculumpathways.com/portal/launch?id=3001