The Social Science Data Revolution

1 The Social Science Data Revolution Gary King Department of Government Harvard University (Horizons in Political Science talk, Government Department, Harvard University, 3/30/11) Gary King (Harvard, Gov Dept) The Data Revolution 1 / 24

2 The Changing Evidence Base of Social Science Research
The Last 50 Years:
- Survey research
- Aggregate government statistics
- In-depth studies of individual places, people, or events
The Next 50 Years: Spectacular increases in new data sources, due to...
- Much more of the above: improved, expanded, and applied
- Shrinking computers & the growing Internet: data everywhere
- The replication movement: academic data sharing (e.g., Dataverse)
- Governments encouraging data collection, distribution, experimentation (e.g., GovData)
- Advances in statistical methods, informatics, & software
- The march of quantification: through academia, professions, government, & commerce (Super Crunchers, The Numerati, Moneyball)

3 Examples of what's now possible
- Opinions of activists: A few thousand interviews → millions of political opinions in social media posts (1B tweets/week)
- Exercise: A survey ("How many times did you exercise last week?") → 500K people carrying cell phones with accelerometers
- Social contacts: A survey ("Please tell me your 5 best friends") → continuous record of phone calls, emails, text messages, bluetooth, social media connections, electronic address books
- Economic development in developing countries: Dubious or nonexistent governmental statistics → satellite images of human-generated light at night, or networks of roads and other infrastructure
- Expert-vs-Statistician contests: Whenever enough information is quantified (& a right answer exists), stats wins every time
- Many, many more...

4 How to make progress in the new data-rich world?
1 Computer-assisted methods: Traditional quantitative-only or qualitative-only approaches are infeasible
2 Large-scale, interdisciplinary, collaborative research
3 New statistical methods & engineering required
4 Better theory: to respond to massive new evidence, privacy challenges, data-driven science
Bigger changes in the practice of social science than ever before

5 Two Examples of Automated Text Analysis

6 Example 1: How to Read a Billion Blog Posts (& Classify Deaths without Physicians)
- Daniel Hopkins and Gary King. "Extracting Systematic Social Science Meaning from Text," AJPS; commercialized via:
- Gary King and Ying Lu. "Verbal Autopsy Methods with Multiple Causes of Death," Statistical Science; used by (among others):

7 Data and Quantities of Interest
Input Data:
- Large set of text documents (e.g., all English-language blog posts)
- Categories (posts about US candidates): extremely negative, negative, neutral, positive, extremely positive, no opinion, not a blog
- A small training set of documents hand-coded into the categories
Quantities of interest:
- Computer science: individual document classifications (spam filters, Google searches)
- Social science: proportion in each category (proportion of email which is spam; proportion of extremely negative comments about Pres. Bush)
Estimation:
- Can get the 2nd by counting the 1st (if the 1st is accurate)
- High classification accuracy does not imply unbiased category proportions
- 70% classification accuracy is considered high, yet can be a disaster for category proportions
- New methodology: unbiased category proportions, even when classification accuracy is low
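
The gap between classifying documents and estimating proportions can be illustrated with a small two-category calculation. This is only a sketch under stated assumptions: the 10% true share and the symmetric 70% hit rates are made-up numbers, the function names are hypothetical, and the correction shown is a textbook misclassification adjustment that assumes the hit rates are known exactly. It is not the estimator from the papers cited on the previous slide.

```python
def naive_share(true_share, hit_rate_a, hit_rate_b):
    """Expected share of documents labeled A when we simply count classifications.

    hit_rate_a = P(labeled A | truly A), hit_rate_b = P(labeled B | truly B).
    """
    return hit_rate_a * true_share + (1.0 - hit_rate_b) * (1.0 - true_share)


def corrected_share(observed_share, hit_rate_a, hit_rate_b):
    """Invert the misclassification relationship to recover the true share of A."""
    denom = hit_rate_a + hit_rate_b - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("classifier is uninformative; the share is not identified")
    return (observed_share - (1.0 - hit_rate_b)) / denom


# Assumed numbers for illustration: 10% of posts are truly "negative" (category A),
# and the classifier is right 70% of the time in each category.
true_share = 0.10
observed = naive_share(true_share, 0.70, 0.70)       # about 0.34
recovered = corrected_share(observed, 0.70, 0.70)    # about 0.10
print(f"naive estimate: {observed:.2f}, corrected estimate: {recovered:.2f}")
```

With these assumed numbers, counting individual classifications reports roughly 34% negative when the truth is 10%, even though 70% of documents are classified correctly. In practice the hit rates would themselves have to be estimated from the hand-coded training set.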

8 Out-of-sample Comparison: 60 Seconds vs. 8.7 Days
[Figure: Affect in Blogs; axes: Estimated P(D) and Actual P(D)]

9 Reactions to John Kerry's Botched Joke
"You know, education, if you make the most of it... you can do well. If you don't, you get stuck in Iraq."
[Figure: Affect Towards John Kerry; axes: Proportion, by month (Sept to Mar)]

10 Example 2: Computer-Assisted Reading
Justin Grimmer and Gary King. "General-Purpose Clustering and Conceptualization," Proceedings of the National Academy of Sciences.
- Conceptualization through Classification: "one of the most central and generic of all our conceptual exercises.... Without classification, there could be no advanced conceptualization, reasoning, language, data analysis or, for that matter, social science research." (Bailey, 1994)
- Cluster Analysis: simultaneously (1) invents categories and (2) assigns documents to categories

11 What's Hard about Clustering? (Why Johnny Can't Classify)
Goal: Computer-assisted conceptualization & clustering
- Bell(n) = number of ways of partitioning n objects
- Bell(2) = 2 (AB, A|B)
- Bell(3) = 5 (ABC, AB|C, A|BC, AC|B, A|B|C)
- Bell(5) = 52
- Bell(100) far exceeds the number of elementary particles in the universe
- Now imagine choosing the optimal classification scheme by hand!
Available compromises pursue different goals:
- Standard Approach: Fully automated methods; no method works well in general, and it is impossible to know which to apply!
- Our Approach: Computer-assisted methods; you, not some computer algorithm, decide what's important, but with help
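
The Bell numbers quoted on this slide can be checked with a few lines of code. The sketch below uses the standard Bell-triangle recurrence; the function name `bell` is a hypothetical choice, and the 10^80 figure in the final comparison is a commonly cited rough estimate of the number of particles in the observable universe, assumed here for illustration rather than taken from the talk.

```python
def bell(n):
    """Number of ways to partition n labeled objects, via the Bell triangle."""
    row = [1]  # first row of the triangle; its last entry is Bell(1)
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry to its left and the one above-left.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


for n in (2, 3, 5):
    print(n, bell(n))

# Assumed rough estimate of the particle count, used only for comparison.
ROUGH_PARTICLE_ESTIMATE = 10 ** 80
print(bell(100) > ROUGH_PARTICLE_ESTIMATE)
```

Python integers have arbitrary precision, so Bell(100) is computed exactly rather than approximated.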

12 Switch from Fully Automated to Computer Assisted
Computer-Assisted Clustering:
- Easy in theory: list all clusterings; choose the best
- Impossible in practice: Too hard for us mere humans!
- An organized list will make the search possible
- Insight: Many clusterings are perceptually identical
- E.g., consider two clusterings that differ only because one document (of 10,000) moves from category 5 to 6
- Question: How to organize clusterings so humans can understand?
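
One way to make "perceptually identical" concrete is to count the document pairs on which two clusterings disagree (pairs grouped together in one clustering but apart in the other). The sketch below applies that pair-counting idea to the slide's example. The metric is an assumed stand-in chosen for simplicity and is not necessarily the distance used in the authors' method; the ten equal-sized categories and the function name `pair_disagreements` are likewise hypothetical.

```python
from collections import Counter
from math import comb


def pair_disagreements(labels_a, labels_b):
    """Count document pairs that are together in one clustering but apart in the other."""
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)
    size_ab = Counter(zip(labels_a, labels_b))
    together_a = sum(comb(n, 2) for n in size_a.values())
    together_b = sum(comb(n, 2) for n in size_b.values())
    together_both = sum(comb(n, 2) for n in size_ab.values())
    return together_a + together_b - 2 * together_both


# Assumed setup: 10,000 documents spread evenly over categories 1..10.
n_docs = 10_000
clustering_a = [i % 10 + 1 for i in range(n_docs)]

# Second clustering: identical, except one document moves from category 5 to 6.
clustering_b = list(clustering_a)
moved = clustering_b.index(5)
clustering_b[moved] = 6

disagreements = pair_disagreements(clustering_a, clustering_b)
share = disagreements / comb(n_docs, 2)
print(disagreements, share)
```

Under these assumptions the two clusterings disagree on only a tiny fraction of all document pairs, which is why a human reader would treat them as the same clustering.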

13 How to Zoom Out while Reading
You choose one (or more) clusterings, based on insight, discovery, useful information,...
[Figure: Space of Clusterings, with points labeled by clustering method (e.g., kmeans, kmedoids, hclust, affprop, mixvmf, spec and mspec variants, som, rock, mult_dirproc). Two clusterings of presidents, Clustering 1 and Clustering 2, are highlighted; cluster labels shown: "Reagan Republicans", "Other Presidents", "Roosevelt To Carter", "Reagan To Obama".]

14 Evaluation: More Informative Discoveries
- Found 2 scholars analyzing lots of textual data for their work
- Created 6 clusterings:
  - 2 clusterings selected with our method (biased against us)
  - 2 clusterings from each of 2 other methods (varying tuning parameters)
- Created info packet on each clustering (for each cluster: exemplar document, automated content summary)
- Asked for (6 choose 2) = 15 pairwise comparisons
- The user chooses, so we only care about the one clustering that wins
- Both cases have a Condorcet winner:
  - "Immigration": Our Method 1 → vMF 1 → vMF 2 → Our Method 2 → K-Means 1 → K-Means 2
  - "Genetic testing": Our Method 1 → {Our Method 2, K-Means 1, K-Means 2} → Dir Proc. 1 → Dir Proc. 2
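
The evaluation design above can be sketched in code: enumerate every pair of the six clusterings, record which one the evaluator prefers, and check whether a single clustering beats all others (a Condorcet winner). The labels and the judgments below are hypothetical; the judgments are generated from one fixed ranking purely for illustration and are not the study's data.

```python
from itertools import combinations


def condorcet_winner(candidates, prefers):
    """Return the candidate that wins every pairwise comparison, or None.

    `prefers` maps each unordered pair (a frozenset) to the preferred candidate.
    """
    for candidate in candidates:
        others = [c for c in candidates if c != candidate]
        if all(prefers[frozenset((candidate, other))] == candidate for other in others):
            return candidate
    return None


# Hypothetical labels for the six clusterings shown to an evaluator.
clusterings = ["ours_1", "ours_2", "vmf_1", "vmf_2", "kmeans_1", "kmeans_2"]
pairs = list(combinations(clusterings, 2))
print(len(pairs))  # number of pairwise comparisons requested

# Made-up judgments: the evaluator always prefers whichever clustering
# appears earlier in this illustrative ranking.
ranking = ["ours_1", "vmf_1", "vmf_2", "ours_2", "kmeans_1", "kmeans_2"]
prefers = {frozenset(pair): min(pair, key=ranking.index) for pair in pairs}

winner = condorcet_winner(clusterings, prefers)
print(winner)
```

A Condorcet winner need not exist in general: if preferences cycle (A over B, B over C, C over A), the function returns None.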

15 Evaluation: What Do Members of Congress Do?

16 Example Discovery
[Figure: space of clusterings, with points labeled by clustering method (e.g., kmeans, kmedoids, hclust, affprop, mixvmf, spec and mspec variants, som, rock, mult_dirproc)]

17 In Sample Illustration of Partisan Taunting
Taunting ruins deliberation
- "Senator Lautenberg Blasts Republicans as Chicken Hawks" [Government Oversight]
- Sen. Lautenberg on Senate Floor, 4/29/04

18 Out of Sample Confirmation of Partisan Taunting
- Discovered using 200 press releases; 1 senator.

19 Out of Sample Confirmation of Partisan Taunting
- Discovered using 200 press releases; 1 senator.
- Confirmed using 64,033 press releases; 301 senator-years.
- Apply supervised learning method: measure the proportion of press releases in which a senator taunts the other party
[Figure: histogram; axes: Prop. of Press Releases Taunting, Frequency]

20 Normative Implications of Taunting
Partisan taunting:
- Very common
- Makes deliberation less likely
- Occurs more often in homogeneously partisan districts (i.e., when preaching to the choir)
Incompatibility of the principles of representative democracy:
- To get reflection: Homogeneous (noncompetitive) districts
- To get deliberation (no taunting): Heterogeneous (competitive) districts
- You can't have both!

21 Some New Data Types
1 Unstructured text: emails (1 LOC every 10 minutes), speeches, government reports, blogs, social media updates, web pages, newspapers, scholarly literature
2 Commercial activity: credit cards, sales data, real estate transactions, product RFIDs
3 Geographic location: cell phones, Fastlane or EZPass transponders, garage cameras
4 Health information: digital medical records, hospital admittances, Google/MS Health, and accelerometers and other devices being included in cell phones
5 Biological sciences: effectively becoming social sciences as genomics, proteomics, metabolomics, and brain imaging produce huge numbers of person-level variables
6 Satellite imagery: increasing in scope, resolution, and availability
7 Electoral activity: ballot images, precinct-level results, individual-level registration, primary participation, and campaign contributions

22 Some More New Data Examples
8 Social media: Facebook, Twitter, social bookmarking, blog comments, product reviews, virtual worlds, game behavior, crowdsourcing
9 Web surfing artifacts: clicks, searches, and advertising clickthroughs (Google collects 1 petabyte/72 minutes on human behavior!)
10 Multiplayer web games and virtual worlds: Billions of highly controlled experiments on human behavior
11 Government bureaucracies: moving from paper to electronic databases, increasing availability
12 Governmental policies: requiring more data collection (e.g., the No Child Left Behind Act); allowing randomized policy experiments; Obama pushing data distribution
13 Scholarly data: the replication movement in academia, led in part by political science, is massively increasing data sharing

23 Enormous Emerging Opportunities for Social Scientists
- For the first time: technologies, policies, data, and methods are making it feasible to attack some of the most vexing problems that afflict human society
- A massive change from studying problems to understanding and solving problems
- Opportunities require a change in our job descriptions, with new:
1 Computer-assisted methods: Traditional quantitative-only or qualitative-only approaches are infeasible
2 Large-scale, interdisciplinary, collaborative research
3 New statistical methods & engineering required
4 Better theory: to respond to massive new evidence, privacy challenges, data-driven science
- And then there's you & me: In most legislatures, courts, academic departments,..., change comes from replacement, not conversion
- Will we wait to be replaced? Or put in the effort to convert and learn how to use the new information?

24 For more information


More information
