Big Data Analytics Building Blocks. Simple Data Storage (SQLite)
1 CSE6242 / CX4242: Data & Visual Analytics Big Data Analytics Building Blocks. Simple Data Storage (SQLite) Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos
4 What is Data & Visual Analytics? No formal definition! Polo's definition: the interdisciplinary science of combining computation techniques and interactive visualization to transform and model data to aid discovery, decision making, etc.
6 What are the ingredients? Need to worry (a lot) about: storage, complex system design, scalability of algorithms, visualization techniques, interaction techniques, statistical tests, etc. It used to be simpler before this big data era. Why?
7 What is big data? Why care? ("Big data" is a buzzword; so is "IoT", the Internet of Things.)
8 (Fall 14) What is big data? Why care? Many companies' businesses are based on big data (Google, Facebook, Amazon, Apple, Symantec, LinkedIn, and many more). Web search: rank webpages (PageRank algorithm); predict what you're going to type. Advertisement (e.g., on Facebook): infer users' interests; show relevant ads; infer what you like, based on what your friends like. Recommendation systems (e.g., Netflix, Pandora, Amazon). Online education. Health IT: patient records (EMR). Bio and chemical modeling. Finance. Cybersecurity. Internet of Things (IoT).
9 Good news! Many big data jobs. What jobs are hot? Data scientist. Emphasize breadth of knowledge. This course helps you learn some important skills.
10 Big data analytics process and building blocks
11 Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination
12 Building blocks, not steps. Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination. Can skip some. Can go back (two-way street). Examples: data types inform visualization design; data informs choice of algorithms; visualization informs data cleaning (dirty data); visualization informs algorithm design (user finds that results don't make sense).
13 How does big data affect the process? Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination. The 4 Vs of big data (now 7 Vs). Volume: billions, petabytes are common. Velocity: think Twitter, fraud detection, etc. Variety: text (webpages), video (e.g., YouTube), etc. Veracity: uncertainty of data.
14 Schedule: Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination
15 Two analytics examples
16 NetProbe: Fraud Detection in Online Auction (WWW 2007)
17 NetProbe: The Problem. Find bad sellers (fraudsters) on eBay who don't deliver their items. [Figure: $$$ exchanged between Seller and Buyer.] Auction fraud was the #3 online crime in 2010. source:
19 NetProbe: Key Ideas. Fraudsters fabricate their reputation by trading with their accomplices. Fake transactions form near-bipartite cores. How to detect them?
20 NetProbe: Key Ideas. Use Belief Propagation. Node states: F = Fraudster, A = Accomplice, H = Honest. [Figure: darker means more likely.]
21 NetProbe: Main Results
24 Belgian Police
26 What analytics process does NetProbe go through? Collection: scraping (built a scraper/crawler). Cleaning. Integration. Analysis: design detection algorithm. Visualization. Presentation and Dissemination: paper, talks, lectures; not released.
27 Discovr app (iPhone & iPad)
29 What analytics process would you go through to build the app? Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination
30 Homework 1 (out next week). Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination. Simple end-to-end analysis. Collect data from Rotten Tomatoes (using API): movies (actors, directors, related movies, etc.). Store in SQLite database. Transform data to movie-movie network. Analyze, using SQL queries (e.g., create graph's degree distribution). Visualize, using Gephi. Describe your discoveries.
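As a rough illustration of the "analyze, using SQL queries" step, the sketch below computes a degree distribution for a small undirected network stored in SQLite. The table name `edges`, its columns `source` and `target`, and the four sample edges are assumptions made up for this example; they are not the homework's actual schema or data.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical edge table: one row per undirected movie-movie link.
conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER)")
sample_edges = [(1, 2), (1, 3), (1, 4), (2, 3)]  # made-up data
conn.executemany("INSERT INTO edges VALUES (?, ?)", sample_edges)

# Step 1 (innermost query): list every endpoint of every edge.
# Step 2: count how often each node appears, which is its degree.
# Step 3 (outer query): count how many nodes share each degree.
query = """
SELECT degree, COUNT(*) AS num_nodes
FROM (
    SELECT node, COUNT(*) AS degree
    FROM (
        SELECT source AS node FROM edges
        UNION ALL
        SELECT target AS node FROM edges
    )
    GROUP BY node
)
GROUP BY degree
ORDER BY degree
"""
degree_distribution = conn.execute(query).fetchall()
print(degree_distribution)
conn.close()
```

Each output row is a pair (degree, number of nodes with that degree), which can be exported and plotted directly.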
31 Data Collection. How? In order from low effort to high effort: Download, API, Scrape/Crawl.
32 Data you can just download: Yahoo Finance (csv), StackOverflow (xml), Yahoo Music (KDD Cup), Atlanta crime data (csv), soccer statistics. More on course website:
33 Data via API: Twitter (small subset), Last.fm (Pandora has unofficial API), Flickr, Facebook (your friends only), CrunchBase (database about companies), Rotten Tomatoes (not free anymore :-( ), iTunes. More on course website:
34 Data that needs scraping: Amazon (reviews, product info), ESPN, Google Scholar, eBay, Google Play. More on course website:
35 How to Scrape? Google Play example. Goal: build network of similar apps.
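The slide does not show the scraper itself, so the following is only a minimal sketch of the general idea: parse an app's page, pull out the links to similar apps, and record them as network edges. The HTML snippet, the `details?id=` link pattern, and the app ids are all hypothetical stand-ins chosen for illustration; a real page would need its own parsing rules and respect for the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page content for one app; real markup will differ.
SAMPLE_PAGE = """
<html><body>
  <h1>App A</h1>
  <div class="similar">
    <a href="/store/apps/details?id=com.example.b">App B</a>
    <a href="/store/apps/details?id=com.example.c">App C</a>
  </div>
  <a href="/about">About</a>
</body></html>
"""


class SimilarAppParser(HTMLParser):
    """Collects app ids from links that look like app detail pages."""

    def __init__(self):
        super().__init__()
        self.app_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        marker = "details?id="  # assumed link pattern, for illustration only
        if marker in href:
            self.app_ids.append(href.split(marker, 1)[1])


def similar_app_edges(app_id, page_html):
    """Return (app, similar_app) edges found on one app's page."""
    parser = SimilarAppParser()
    parser.feed(page_html)
    return [(app_id, other) for other in parser.app_ids]


edges = similar_app_edges("com.example.a", SAMPLE_PAGE)
print(edges)
```

Repeating this for each newly discovered app id (a crawl) would grow the edge list into the similar-apps network.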
36 SQLite: most popular embedded database in the world. Used in iPhone (iOS), Android, Chrome (browsers), Mac, etc. Self-contained: one file contains data + schema. Serverless: database right on your computer. Zero-configuration: no need to set up!
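A small sketch of what "self-contained" and "serverless" mean in practice, using Python's standard `sqlite3` module. The temporary directory and the file name `database.db` are arbitrary choices for this example.

```python
import os
import sqlite3
import tempfile

# Put the database in a throwaway directory; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "database.db")

# Connecting creates the file if it does not exist. No server process,
# no configuration step.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE student (ssn INTEGER, name TEXT)")
conn.commit()
conn.close()

# The whole database, schema included, is now one ordinary file.
file_exists = os.path.isfile(db_path)
file_size = os.path.getsize(db_path)

# Reopening the file recovers the schema from the built-in catalog table.
conn = sqlite3.connect(db_path)
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'student'"
).fetchone()[0]
conn.close()

print(file_exists, file_size, schema)
```

Copying that single file to another machine is enough to move the database.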
37 How does it work?
> sqlite3 database.db
sqlite> create table student(ssn integer, name text);
sqlite> .schema
CREATE TABLE student(ssn integer, name text);
Resulting (empty) table: ssn | name
38 How does it work?
insert into student values(111, 'Smith');
insert into student values(222, 'Johnson');
insert into student values(333, 'Obama');
select * from student;
ssn | name
111 | Smith
222 | Johnson
333 | Obama
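The same create/insert/select sequence can be run from a program instead of the `sqlite3` shell. This sketch uses Python's standard `sqlite3` module with an in-memory database; the `?` placeholders are a choice made here (not something the slides show) so that values are passed separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ssn INTEGER, name TEXT)")

# Same three rows as on the slide, inserted with parameter placeholders.
rows = [(111, "Smith"), (222, "Johnson"), (333, "Obama")]
conn.executemany("INSERT INTO student VALUES (?, ?)", rows)

# ORDER BY makes the row order explicit rather than relying on insertion order.
students = conn.execute("SELECT * FROM student ORDER BY ssn").fetchall()
for ssn, name in students:
    print(ssn, name)
conn.close()
```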
39 How does it work?
create table takes (ssn integer, course_id integer, grade integer);
Resulting (empty) table: ssn | course_id | grade
40 How does it work? More than one table: joins. E.g., create a roster for this course.
student: ssn | name
111 | Smith
222 | Johnson
333 | Obama
takes: ssn | course_id | grade
41 How does it work?
select name from student, takes where student.ssn = takes.ssn and takes.course_id = 6242;
student: ssn | name
111 | Smith
222 | Johnson
333 | Obama
takes: ssn | course_id | grade
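A runnable sketch of the roster join. The rows of `takes` did not survive in the transcript, so the enrollment rows below are invented purely for illustration; only the `student` rows and the query itself come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ssn INTEGER, name TEXT)")
conn.execute("CREATE TABLE takes (ssn INTEGER, course_id INTEGER, grade INTEGER)")

conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(111, "Smith"), (222, "Johnson"), (333, "Obama")],
)
# Hypothetical enrollments: two students in course 6242, one in another course.
conn.executemany(
    "INSERT INTO takes VALUES (?, ?, ?)",
    [(111, 6242, 4), (222, 6242, 3), (333, 6040, 4)],
)

# The query from the slide: pair each student row with matching takes rows,
# then keep only the pairs for course 6242.
roster_query = """
SELECT name
FROM student, takes
WHERE student.ssn = takes.ssn
  AND takes.course_id = 6242
"""
roster = [row[0] for row in conn.execute(roster_query)]
print(sorted(roster))
conn.close()
```

The comma-separated `FROM` list with the matching condition in `WHERE` is equivalent to writing `student JOIN takes ON student.ssn = takes.ssn`.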
42 SQL General Form
select a1, a2, ... an
from t1, t2, ... tm
where predicate
[group by ...]
[having ...]
[order by ...]
43 Find ssn and GPA for each student
select ssn, avg(grade) from takes group by ssn;
Input columns: ssn | course_id | grade. Output columns: ssn | avg(grade)
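A sketch of the GPA query, plus one variation that exercises the optional `having` and `order by` clauses from the general form. The grade rows are invented for illustration, since the slide's table contents are not in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE takes (ssn INTEGER, course_id INTEGER, grade INTEGER)")

# Hypothetical grades: student 111 took two courses, student 222 took one.
conn.executemany(
    "INSERT INTO takes VALUES (?, ?, ?)",
    [(111, 6242, 4), (111, 6040, 3), (222, 6242, 2)],
)

# One output row per ssn; avg() is computed within each group.
gpa = conn.execute(
    "SELECT ssn, avg(grade) FROM takes GROUP BY ssn ORDER BY ssn"
).fetchall()

# HAVING filters whole groups after aggregation (WHERE filters rows before it).
high_gpa = conn.execute(
    "SELECT ssn, avg(grade) FROM takes "
    "GROUP BY ssn HAVING avg(grade) >= 3 ORDER BY ssn"
).fetchall()

print(gpa)
print(high_gpa)
conn.close()
```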
44 What if slow? Build an index to speed things up. SQLite's indices use the B-tree data structure: O(log N) time for adding/finding/deleting an item.
create index student_ssn_index on student(ssn);
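One way to check whether an index is actually used is SQLite's `EXPLAIN QUERY PLAN`. The sketch below compares the plan for a lookup by `ssn` before and after creating the index from the slide. The exact wording of the plan text varies between SQLite versions, so the code only looks for the index name rather than matching the full message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ssn INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(111, "Smith"), (222, "Johnson"), (333, "Obama")],
)

lookup = "SELECT name FROM student WHERE ssn = ?"


def query_plan(connection, sql, params):
    """Return the human-readable plan steps (last column of each plan row)."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(str(row[-1]) for row in plan_rows)


# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, (222,))

conn.execute("CREATE INDEX student_ssn_index ON student(ssn)")

# With the index, the lookup becomes a B-tree search on ssn.
plan_after = query_plan(conn, lookup, (222,))

print("before:", plan_before)
print("after: ", plan_after)
conn.close()
```

An index speeds up lookups at the cost of extra storage and slightly slower inserts, so it pays off mainly on columns that are searched or joined on often.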
