2 Tammy Pirmann: HS CS teacher in PA; NSF RET in Big Data with Temple University; teaches the CS Principles course. Slobodan Vucetic: Temple University; NSF research project involving Big Data education through the pipeline.
3 CS Principles: Big Idea III. Data: Data and information facilitate the creation of knowledge. CSTA K-12 Standards (5.3.A): CT 4. Compare techniques for analyzing massive data collections CPP 11. Describe techniques for locating and collecting small and large-scale data sets Job growth for data scientists
4 Slobodan Vucetic teaches a great graduate level course at Temple that uses BIG data sets He created an undergrad course based on the successful grad course He and I worked together so I would understand the data sets and how college students work with them I wrote a unit for HS students based on his undergrad course
5 In my class, this unit follows a few basic App Inventor tutorials and a lesson on abstraction Students have varying degrees of comfort with spreadsheets and databases Students have read the first two chapters of Blown to Bits by Abelson, Ledeen, Lewis
6 1. Orient: activate, motivate, prepare 2. Explore: observe, analyze 3. Form Concepts: questions 4. Apply: examples and problems 5. Close: reflect and assess
7 Wear two hats Take on the role of student and see how the student interacts with the material Remain an educator and think about what you can use in your situation Break into groups of 4, making sure that at least one member has a device and the files
9 Netflix offered a $1,000,000 prize to the team that could create a movie recommendation system 10% better than its existing one. In 2009 that prize went to "BellKor's Pragmatic Chaos". In this activity, we will work with a smaller (but still very large) set of movie data to explore how data can be used to generate useful information.
10 Why is a movie recommendation system worth a million dollars to Netflix?
11 There are three interrelated sets of data The movies
12 The people The ratings
13 1. What scale is being used for recommendations? How many stars?
14 2. What information are we keeping track of for each movie?
15 3. Can a person rate more than one movie?
16 4. What information are we capturing from our users? How are we capturing this data?
17 What additional movie data would be useful?
18 What might we want to know about the people doing the ratings?
19 Is it possible for a movie to never be rated? What effect does that have?
20 How would you go about determining which movie is the "best" movie?
21 Why would people rate movies?
22 Who might use this data, and how?
23 Discuss and agree on three potential problems inherent in an online rating and recommendation system. Be prepared to report out to the class. Discuss and agree on three questions you would like the answers to based on this data. Are there any additional data points needed in order to answer any of your questions? What additional data points would it have been helpful to have access to?
24 Open the people text file How is it formatted? What type of file would you expect this type of data to be in? Why? Open the movie text file How is it different?
25 These three files are related to each other: the people rated movies. We will make three tabs in one Excel file. Video tutorial. * I have a completed Excel workbook available on the next day for scaffolding, absences, etc.
26 What happened with the vote data? It turns out that Excel has limits There are too many rows in the vote data to be imported into Excel Google spreadsheets can only handle 400,000 cells!
27 One thing you should have noticed is that the data does not have any labels. We need to create field labels for this data. Let's start with the people tab: What do you think are good labels for the columns of data? The movie tab presents a significant problem. We have a file called a read-me file that tells us what each column is.
28 I have a question: can we trust this data? Can I use it to say "The data shows that males between 12 and 24 prefer action movies over romance movies"? Do I have confidence in the demographic data? Use the sort function to sort the people data on age. What do you notice?
29 We break into small groups based on previous experience with Excel I teach sort, filter, the count function, renaming tabs Students then use this to determine the percent of people who have probably lied on the form: liars/all people
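The liars/all people ratio from this slide can also be written out in code. The following is only a minimal sketch: the sample ages are made up, and the plausibility range of 5 to 100 is an assumption, since the slides do not say which cut-offs the class settles on.

```python
# Hypothetical sample of self-reported ages; the real people file is far larger.
ages = [25, 1, 34, 119, 42, 18, 0, 67]

# Assumed plausibility range; the class would choose its own cut-offs.
MIN_AGE, MAX_AGE = 5, 100


def percent_implausible(ages, low=MIN_AGE, high=MAX_AGE):
    """Return the percentage of ages that fall outside [low, high]."""
    if not ages:
        return 0.0
    liars = sum(1 for age in ages if age < low or age > high)
    return 100.0 * liars / len(ages)


print(f"{percent_implausible(ages):.1f}% of entries look implausible")
```

This mirrors the spreadsheet workflow: the filter picks out the suspicious rows, the count function tallies them, and the division gives the percentage.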
30 The original groups of students choose a question they wrote down on the first day They now determine how to go about getting the answer to that question from the data This is an analysis plan, not the actual analysis (since some of them have questions that may need a more powerful tool)
31 What genre of movies do people like me give the highest ratings to? We need to determine people like me from the people data We then need to find all the ratings provided by them We need to put those ratings into genre buckets
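The three steps of this analysis plan can be sketched in Python. Everything here is illustrative: the tiny data sets, the field names, and the `is_like_me` rule are assumptions, and the sketch assumes one genre per movie, which may not match the real movie file.

```python
from collections import defaultdict

# Hypothetical miniature versions of the three data sets.
people = {
    1: {"age": 16, "gender": "F"},
    2: {"age": 17, "gender": "F"},
    3: {"age": 45, "gender": "M"},
}
movies = {10: "Action", 11: "Romance", 12: "Action"}  # movie id -> genre
ratings = [(1, 10, 5), (1, 11, 3), (2, 12, 4), (2, 11, 2), (3, 11, 5)]  # (person id, movie id, stars)


def average_rating_by_genre(people, movies, ratings, is_like_me):
    # Step 1: determine "people like me" from the people data.
    like_me = {pid for pid, info in people.items() if is_like_me(info)}
    # Steps 2 and 3: collect their ratings and drop each one into a genre bucket.
    buckets = defaultdict(list)
    for person_id, movie_id, stars in ratings:
        if person_id in like_me:
            buckets[movies[movie_id]].append(stars)
    # Summarize each bucket with its average rating.
    return {genre: sum(stars) / len(stars) for genre, stars in buckets.items()}


result = average_rating_by_genre(
    people, movies, ratings,
    is_like_me=lambda p: p["gender"] == "F" and 13 <= p["age"] <= 19,
)
print(result)
```

Writing the plan this way makes it clear that the ratings data is the link between the people and the movies, which is the idea the database slides build on.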
32 Basic formulas Advanced filtering
33 Spreadsheets gave us more tools than the text file. Databases give us more tools than the spreadsheet. We have a database on our computers as part of Microsoft Office. Open Access.
34 We will import our original txt files into Access Each file will become a table in the database The people file has an id for each person which will be defined as our primary key The movie file has an id for each movie which will be defined as our primary key The ratings file has the people id and the movie id, but no ratings id.
35 Each record in the database must be uniquely identifiable. The primary key is how we identify each record. Since a person can rate a given movie only once, the combination of person id and movie id can be the primary key for our vote table.
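The class does this in Access; as a stand-in, the sketch below uses Python's built-in sqlite3 module to show the same idea of a two-column primary key. The table and column names are assumptions, not the names used in the actual files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The vote table has no id of its own, so the pair (person_id, movie_id)
# serves as the primary key. Column names are hypothetical.
conn.execute("""
    CREATE TABLE vote (
        person_id INTEGER NOT NULL,
        movie_id  INTEGER NOT NULL,
        rating    INTEGER NOT NULL,
        PRIMARY KEY (person_id, movie_id)
    )
""")

conn.execute("INSERT INTO vote VALUES (1, 10, 4)")
conn.execute("INSERT INTO vote VALUES (1, 11, 5)")  # same person, different movie: allowed

try:
    conn.execute("INSERT INTO vote VALUES (1, 10, 2)")  # same person and movie again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the database refuses the second rating

row_count = conn.execute("SELECT COUNT(*) FROM vote").fetchone()[0]
print(duplicate_rejected, row_count)
```

The database itself enforces the "one rating per person per movie" rule, so no extra ratings id is needed.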
36 A relational database is one where the tables of data are related to each other by the primary keys Our tables are related through the vote table The primary key of the people table is present in the vote table The primary key of the movie table is present in the vote table
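To illustrate how the vote table ties the other two together, here is a small sketch using Python's sqlite3 module rather than Access. The table names, column names, and rows are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE movie  (movie_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE vote (
        person_id INTEGER REFERENCES people(person_id),
        movie_id  INTEGER REFERENCES movie(movie_id),
        rating    INTEGER,
        PRIMARY KEY (person_id, movie_id)
    );
    INSERT INTO people VALUES (1, 16), (2, 45);
    INSERT INTO movie  VALUES (10, 'Movie A'), (11, 'Movie B');
    INSERT INTO vote   VALUES (1, 10, 5), (2, 10, 3), (2, 11, 4);
""")

# Each vote row carries both primary keys, so one query can pull
# the rater's age and the movie's title alongside the rating.
rows = conn.execute("""
    SELECT people.age, movie.title, vote.rating
    FROM vote
    JOIN people ON people.person_id = vote.person_id
    JOIN movie  ON movie.movie_id  = vote.movie_id
    ORDER BY vote.person_id, vote.movie_id
""").fetchall()

for row in rows:
    print(row)
```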
37 The simplest query to write is one based on one table. We will use a query to recreate a sort and filter we had done in Excel. Using the people table, let's look only at the people who entered an age we consider valid. Sort these records by age. We can hide the postal code if we are not using it.
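A single-table query of this kind might look like the following. This is a sketch in Python's sqlite3 module, not the Access query the class builds; the column names, the sample rows, and the 5 to 100 age range are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (person_id INTEGER PRIMARY KEY, age INTEGER, gender TEXT, postal_code TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, 34, "M", "00001"),
        (2, 0, "F", "00002"),    # implausible age
        (3, 16, "F", "00003"),
        (4, 119, "M", "00004"),  # implausible age
    ],
)

# Filter to ages we consider valid, sort by age, and leave the
# postal code out of the SELECT list to "hide" it.
valid = conn.execute("""
    SELECT person_id, age, gender
    FROM people
    WHERE age BETWEEN 5 AND 100
    ORDER BY age
""").fetchall()

print(valid)
```

The WHERE clause plays the role of the Excel filter and ORDER BY plays the role of the sort.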
38 Go back to your written analysis plan Write the query iteratively Start with one table and get that query working Add more complexity to your query in small chunks, checking for accuracy at each step
39 After using the Movie data to teach spreadsheets and databases, we change data sets lest the students believe that big data and recommendation systems are synonymous. The Portland data is even larger than the movie data and represents the movements of the people of Portland, Oregon over a 24-hour period.
40 Locations - The city is divided into a grid with each square given a numeric representation Demographics - Each person has an id and demographic data associated with them Activities - Each type of activity is given a numeric representation Time - measured in seconds past midnight
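Because time is stored as seconds past midnight, students may want to turn a value back into a clock time when reading the data. A minimal sketch; the function name is hypothetical and the input value is made up.

```python
def seconds_to_clock(seconds_past_midnight):
    """Convert seconds past midnight into an HH:MM:SS string."""
    hours, remainder = divmod(seconds_past_midnight, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(seconds_to_clock(27000))  # 07:30:00
```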
41 With this information, what can we learn?
42 Is it possible there are questions that we should not ask?
43 This data could be used by an urban planner to determine if the city needs a large venue in a particular part of the city It could also be used to determine if a major highway needs more capacity It could be used to predict where utilities are most needed by hour
44 This type of data could show drivers which roads have the most traffic on them Data can show us how much time people spend on their commute Companies can use this type of data to determine where to open a franchise
45 Each group of students brainstorms and gives the teacher three proto-concepts for deeper analysis of the Portland data. The teacher returns the concepts with one chosen for the group (to eliminate duplication). Students work together to develop that concept and find the answers in the data. Students report out to the class what they did and the results.
46 Two options to allow you to scaffold the project to different ability levels Both options have the same format Proposal Data acquisition Data analysis plan Final report
47 Proposal Includes the community, the question you want to answer, why it should be answered, what data will be collected and how the answer will be provided to the community Data collection plan and form Data analysis plan Final report
48 Proposal Data collection plan and form Create a form in Google Docs, disseminate the form via email, forums, link on website, etc. Data analysis plan Final report
49 Proposal Data collection plan and form Data analysis plan You have several tools at your disposal to analyze your data. Decide which tools you will use and why. Develop the queries, sorts and filters that you will use when the data is collected. Be sure your data analysis plan covers the main questions that originally prompted you to collect this data. Final report
50 Proposal Data collection plan and form Data analysis plan Final report Produce a written report back to the community to share the information discovered by your analysis of the data provided by the community. This report should use illustrations or charts where appropriate
51 The only difference for option 2 is that the student will find/access existing data There are many large data sets available from the government Some organizations may also have raw data for the student to work with (Scouts, church groups, etc)
52 The Portland data could be used with Processing to create animated graphs of people movement. What's your idea?