The Social Science Data Revolution
|
|
|
- Rudolph Conley
- 10 years ago
- Views:
Transcription
1 The Social Science Data Revolution Gary King Department of Government Harvard University (Horizons in Political Science talk, Government Department, Harvard University, 3/30/11) Gary King (Harvard, Gov Dept) The Data Revolution 1 / 24
2 The Changing Evidence Base of Social Science Research The Last 50 Years: Survey research Aggregate government statistics In depth studies of individual places, people, or events The Next 50 Years: Spectacular increases in new data sources, due to... Much more of the above improved, expanded, and applied Shrinking computers & the growing Internet: data everywhere The replication movement: academic data sharing (e.g., Dataverse) Governments encouraging data collection, distribution, experimentation (e.g., GovData) Advances in statistical methods, informatics, & software The march of quantification: through academia, professions, government, & commerce (SuperCrunchers, The Numerati, MoneyBall) Gary King (Harvard, Gov Dept) The Data Revolution 2 / 24
3 Examples of what s now possible Opinions of activists: A few thousand interviews millions of political opinions in social media posts (1B tweets/week) Exercise: A survey: How many times did you exercise last week? 500K people carrying cell phones with accelerometers Social contacts: A survey: Please tell me your 5 best friends continuous record of phone calls, s, text messages, bluetooth, social media connections, electronic address books Economic development in developing countries: Dubious or nonexistent governmental statistics satellite images of human-generated light at night, or networks of roads and other infrastructure Expert-vs-Statistician contests: Whenever enough information is quantified (& a right answer exists), stats wins every time Many, many more... Gary King (Harvard, Gov Dept) The Data Revolution 3 / 24
4 How to make progress in the new data-rich world? 1 Computer-assisted methods: Traditional quantitative-only or qualitative-only approaches are infeasible 2 Large-scale, interdisciplinary, collaborative research 3 New statistical methods & engineering required 4 Better theory: to respond to massive new evidence, privacy challenges, data-driven science Bigger changes in the practice of social science then ever before Gary King (Harvard, Gov Dept) The Data Revolution 4 / 24
5 Two Examples of Automated Text Analysis Gary King (Harvard, Gov Dept) The Data Revolution 5 / 24
6 Example 1: How to Read a Billion Blog Posts (& Classify Deaths without Physicians) Daniel Hopkins and Gary King. Extracting Systematic Social Science Meaning from Text AJPS, commercialized via: Gary King and Ying Lu. Verbal Autopsy Methods with Multiple Causes of Death, Statistical Science used by (among others): Gary King (Harvard, Gov Dept) The Data Revolution 6 / 24
7 Data and Quantities of Interest Input Data: Large set of text documents (e.g., all English language blog posts) Categories (posts about US candidates): extremely negative, negative, neutral, positive, extremely positive, no opinion, not a blog A small training set of documents hand-coded into the categories Quantities of interest Computer science: individual document classifications (spam filters, Google searches) Social Science: proportion in each category (proportion of which is spam; proportion extremely negative comment about Pres Bush) Estimation Can get the 2nd by counting the 1st (if 1st is accurate) High classification accuracy unbiased category proportions 70% classification accuracy is high disaster for category proportions New methodology: unbiased category proportions, even when classification accuracy is low Gary King (Harvard, Gov Dept) The Data Revolution 7 / 24
8 Out-of-sample Comparison: 60 Seconds vs. 8.7 Days Affect in Blogs Estimated P(D) Actual P(D) Gary King (Harvard, Gov Dept) The Data Revolution 8 / 24
9 Reactions to John Kerry s Botched Joke You know, education if you make the most of it... you can do well. If you don t, you get stuck in Iraq. Affect Towards John Kerry Proportion Sept Oct Nov Dec Jan Feb Mar Gary King (Harvard, Gov Dept) The Data Revolution 9 / 24
10 Example 2: Computer-Assisted Reading Justin Grimmer and Gary King General-Purpose Clustering and Conceptualization Proceedings of the National Academy of Sciences. Conceptualization through Classification: one of the most central and generic of all our conceptual exercises.... Without classification, there could be no advanced conceptualization, reasoning, language, data analysis or,for that matter, social science research. (Bailey, 1994). Cluster Analysis: simultaneously (1) invents categories and (2) assigns documents to categories Gary King (Harvard, Gov Dept) The Data Revolution 10 / 24
11 What s Hard about Clustering? (Why Johnny Can t Classify) Goal: Computer-assisted conceptualization & clustering Bell(n) = number of ways of partitioning n objects Bell(2) = 2 (AB, A B) Bell(3) = 5 (ABC, AB C, A BC, AC B, A B C) Bell(5) = 52 Bell(100) Number of elementary particles in the universe Now imagine choosing the optimal classification scheme by hand! Available compromises pursue different goals: Standard Approach: Fully automated methods no method works well in general; impossible to know which to apply! Our Approach: Computer-assisted methods You, not some computer algorithm, decides what s important, but with help Gary King (Harvard, Gov Dept) The Data Revolution 11 / 24
12 Switch from Fully Automated to Computer Assisted Computer-Assisted Clustering Easy in theory: list all clusterings; choose the best Impossible in practice: Too hard for us mere humans! An organized list will make the search possible Insight: Many clusterings are perceptually identical E.g.,: consider two clusterings that differ only because one document (of 10,000) moves from category 5 to 6 Question: How to organize clusterings so humans can understand? Gary King (Harvard, Gov Dept) The Data Revolution 12 / 24
13 How to Zoom Out while Reading You choose one (or more) clustering, based on insight, discovery, useful information,... Ford Obama Clustering 1 Space of Clusterings mixvmf Clustering 2 Nixon kmedoids stand.euc affprop info.costs Carter Johnson Carter Eisenhower Truman kmeans correlation rock hclust correlation single hclust pearson single affprop maximum Ford Eisenhower Truman Roosevelt Johnson Kennedy Clinton Roosevelt ``Reagan Republicans'' ``Other Presidents '' Bush kmeans binary hclust maximum single hclust canberra binary hclust centroid correlation pearson hclust median centroid correlation median hclust correlation pearson mcquitty average hclust kendall single hclust euclidean centroid kmeans hclust kendall canberra mcquitty hclust canberra binary median affprop biclust_spectral manhattan cosine mspec_max hclust canberra binary single mspec_canb hclust canberra average kmeans pearson hclust binary average divisive stand.euc mspec_cos som hclust manhattan centroid hclust kmedoids spearman maximum manhattan kendall manhattan centroid median single hclust euclidean median hclust correlation pearson complete hclust hclust manhattan spearman kendall maximum euclidean average median complete mcquitty average single affprop hclust euclidean manhattan hclust mcquitty average euclidean average hclust spearman single divisive euclidean kmedoids euclidean hclust spearman average mspec_mink mspec_euc hclust binary complete mcquitty divisive manhattan mspec_man hclust euclidean complete mcquitty hclust kendall complete hclust correlation hclust canberra ward complete clust_convex hclust spearman kendall mcquitty hclust euclidean dismea ward hclust binary ward hclust canberra ward hclust spearman complete hclust manhattan complete spec_canb hclust kendall ward mixvmfva spec_cos hclust manhattan ward spec_euc spec_man kmeans euclidean kmeans manhattan hclust pearson ward hclust spearman ward kmeans spearman spec_max hclust maximum ward kmeans maximum spec_mink Nixon ``Roosevelt To Carter'' Kennedy Bush Obama `` Reagan To Obama '' Reagan HWBush kmeans canberra Clinton HWBush mult_dirproc Reagan Gary King (Harvard, Gov Dept) The Data Revolution 13 / 24
14 Evaluation: More Informative Discoveries Found 2 scholars analyzing lots of textual data for their work Created 6 clusterings: 2 clusterings selected with our method (biased against us) 2 clusterings from each of 2 other methods (varying tuning parameters) Created info packet on each clustering (for each cluster: exemplar document, automated content summary) Asked for ( 6 2) =15 pairwise comparisons User chooses only care about the one clustering that wins Both cases a Condorcet winner: Immigration : Our Method 1 vmf 1 vmf 2 Our Method 2 K-Means 1 K-Means 2 Genetic testing : Our Method 1 {Our Method 2, K-Means 1, K-means 2} Dir Proc. 1 Dir Proc. 2 Gary King (Harvard, Gov Dept) The Data Revolution 14 / 24
15 Evaluation: What Do Members of Congress Do? Gary King (Harvard, Gov Dept) The Data Revolution 15 / 24
16 Example Discovery mult_dirproc hclust canberra ward kmeans correlation sot_cor divisive stand.euc mixvmf hclust correlation mixvmfva mcquitty hclust binary complete hclust pearson single affprop cosine hclust pearson median hclust correlation single hclust pearson mcquitty mec hclust pearson average hclust correlation complete hclust correlation median hclust binary single hclust correlation average hclust pearson complete hclust binary average kmeans pearson hclust pearson correlation centroid centroid rock som hclust binary median hclust canberra single hclust binary mcquitty biclust_spectral hclust spearman complete spec_cos spec_man kmeans hclust canberra kendall median spec_mink spec_max spec_euc affprop maximum hclust canberra average mspec_mink mspec_man spec_canb kmeans spearman kmeans manhattan mspec_max mspec_canb mspec_cos mspec_euc kmeans hclust binary canberra centroid hclust hclust hclust spearman kendall spearman kendall median single single average hclust spearman hclust median centroid kendall mcquitty mcquitty hclust canberra centroid hclust kendall complete hclust hclust hclust euclidean kmedoids single affprop manhattan manhattan hclust hclust manhattan euclidean manhattan median centroid divisive median single average hclust manhattan hclust maximum hclust euclidean single hclust manhattan euclidean centroid mcquitty average clust_convex hclust correlation ward hclust euclidean kmedoids mcquitty euclidean hclust pearson kmedoids wardstand.euc hclust maximum divisive hclust euclidean median centroid maximum affprop average euclidean hclust canberra mcquitty hclust hclust euclidean maximum complete complete hclust manhattan hclust maximum complete mcquitty dist_minkowski dist_ebinary dist_fbinary dist_binary dist_canb dist_max dist_cos dismea hclust manhattan ward affprop info.costs kmeans hclust euclidean euclidean ward hclust canberra complete sot_euc hclust binary ward hclust hclust kendall spearman ward ward hclust maximum ward kmeans binary kmeans maximum Gary King (Harvard, Gov Dept) The Data Revolution 16 / 24
17 In Sample Illustration of Partisan Taunting Taunting ruins deliberation - Senator Lautenberg Blasts Republicans as Chicken Hawks [Government Oversight] Sen. Lautenberg on Senate Floor 4/29/04 Gary King (Harvard, Gov Dept) The Data Revolution 17 / 24
18 Out of Sample Confirmation of Partisan Taunting - Discovered using 200 press releases; 1 senator. Gary King (Harvard, Gov Dept) The Data Revolution 18 / 24
19 Out of Sample Confirmation of Partisan Taunting - Discovered using 200 press releases; 1 senator. - Confirmed using 64,033 press releases; 301 senator-years. - Apply supervised learning method: measure proportion of press releases a senator taunts other party Frequency Prop. of Press Releases Taunting Gary King (Harvard, Gov Dept) The Data Revolution 19 / 24
20 Normative Implications of Taunting Partisan taunting: Very common Makes deliberation less likely Occurs more often in homogeneously partisan districts (i.e., when preaching to the choir) Incompatibility of the principles of representative democracy To get reflection: Homogeneous (noncompetitive) districts To get deliberation (no taunting): Heterogeneous (competitive) districts you can t have both! Gary King (Harvard, Gov Dept) The Data Revolution 20 / 24
21 Some New Data Types 1 Unstructured text: s (1 LOC every 10 minutes), speeches, government reports, blogs, social media updates, web pages, newspapers, scholarly literature 2 Commercial activity: credit cards, sales data, and real estate transactions, product RFIDs 3 Geographic location: cell phones, Fastlane or EZPass transponders, garage cameras 4 Health information: digital medical records, hospital admittances, google/ms health, and accelerometers and other devices being included in cell phones 5 Biological sciences: effectively becoming social sciences as genomics, proteomics, metabolomics, and brain imaging produce huge numbers of person-level variables. 6 Satellite imagery: increasing in scope, resolution, and availability. 7 Electoral activity: ballot images, precinct-level results, individual-level registration, primary participation, and campaign contributions Gary King (Harvard, Gov Dept) The Data Revolution 21 / 24
22 Some More New Data Examples 8 Social media: facebook, twitter, social bookmarking, blog comments, product reviews, virtual worlds, game behavior, crowd sourcing 9 Web surfing artifacts: clicks, searches, and advertising clickthroughs. (Google collects 1 petabyte/72 minutes on human behavior!) 10 Multiplayer web games and virtual worlds: Billions of highly controlled experiments on human behavior 11 Government bureaucracies: moving from paper to electronic data bases, increasing availability 12 Governmental policies: requiring more data collection, such e.g., No Child Left Behind Act ; allowing randomized policy experiments; Obama pushing data distribution 13 Scholarly data: the replication movement in academia, led in part by political science, is massively increasing data sharing Gary King (Harvard, Gov Dept) The Data Revolution 22 / 24
23 Enormous Emerging Opportunities for Social Scientists For the first time: technologies, policies, data, and methods are making it feasible to attack some of the most vexing problems that afflict human society A massive change from studying problems to understanding and solving problems Opportunities require a change in our job descriptions, with new: 1 Computer-assisted methods: Traditional quantitative-only or qualitative-only approaches are infeasible 2 Large-scale, interdisciplinary, collaborative research 3 New statistical methods & engineering required 4 Better theory: to respond to massive new evidence, privacy challenges, data-driven science And then there s you & me: In most legislatures, courts, academic departments,..., change comes from replacement not conversion Will we wait to be replaced? or put in the effort to convert and learn how to use the new information? Gary King (Harvard, Gov Dept) The Data Revolution 23 / 24
24 For more information Gary King (Harvard, Gov Dept) The Data Revolution 24 / 24
Exploring Big Data in Social Networks
Exploring Big Data in Social Networks [email protected] ([email protected]) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
FOR RELEASE: MONDAY, SEPTEMBER 10 AT 4 PM
Interviews with 1,022 adult Americans conducted by telephone by ORC International on September 7-9, 2012. The margin of sampling error for results based on the total sample is plus or minus 3 percentage
Big Data a threat or a chance?
Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but
Model Selection. Introduction. Model Selection
Model Selection Introduction This user guide provides information about the Partek Model Selection tool. Topics covered include using a Down syndrome data set to demonstrate the usage of the Partek Model
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
OBAMA IS FIRST AS WORST PRESIDENT SINCE WWII, QUINNIPIAC UNIVERSITY NATIONAL POLL FINDS; MORE VOTERS SAY ROMNEY WOULD HAVE BEEN BETTER
Tim Malloy, Assistant Director, Quinnipiac University Poll (203) 645-8043 Rubenstein Associates, Inc. Public Relations Contact: Pat Smith (212) 843-8026 FOR RELEASE: JULY 2, 2014 OBAMA IS FIRST AS WORST
CLINTON OR CUOMO THUMP GOP IN 2016 NEW YORK PRES RACE, QUINNIPIAC UNIVERSITY POLL FINDS; ADOPTED DAUGHTER RUNS BETTER THAN NATIVE SON
Maurice Carroll, Assistant Director, Quinnipiac University Poll (203) 582-5334 Rubenstein Associates, Inc. Public Relations Contact: Pat Smith (212) 843-8026 FOR RELEASE: AUGUST 21, 2014 CLINTON OR CUOMO
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Inside the Obama Analytics Cave Andrew Claster, Deputy Chief Analytics Officer Obama for America 2011-2012 W INNING K N OWLEDGE T M
Inside the Obama Analytics Cave Andrew Claster, Deputy Chief Analytics Officer Obama for America 2011-2012 W INNING K N OWLEDGE T M Political Landscape Obama faced the highest unemployment rate of any
Data Mining with R. Text Mining. Hugh Murrell
Data Mining with R Text Mining Hugh Murrell reference books These slides are based on a book by Yanchang Zhao: R and Data Mining: Examples and Case Studies. http://www.rdatamining.com for further background
Collaborations between Official Statistics and Academia in the Era of Big Data
Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI [email protected] What
Social Media and Digital Marketing Analytics ( INFO-UB.0038.01) Professor Anindya Ghose Monday Friday 6-9:10 pm from 7/15/13 to 7/30/13
Social Media and Digital Marketing Analytics ( INFO-UB.0038.01) Professor Anindya Ghose Monday Friday 6-9:10 pm from 7/15/13 to 7/30/13 [email protected] twitter: aghose pages.stern.nyu.edu/~aghose
Big Data how it changes the way you treat data
Big Data how it changes the way you treat data Oct. 2013 Chung-Min Chen Chief Scientist Info. Analysis Research & Services The views and opinions expressed in this presentation are those of the author
BIG DATA FUNDAMENTALS
BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
THE PRESIDENCY OF GEORGE W. BUSH January 11-15, 2009
CBS NEWS/NEW YORK TIMES POLL FOR RELEASE: Friday, January 16, 2009 6:30 pm EST THE PRESIDENCY OF GEORGE W. BUSH January 11-15, 2009 President George W. Bush will leave office with some of the most negative
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data
Technology has transformed the way in which science is conducted. Almost
Chapter 1 Educational Technology in the Science Classroom Glen Bull and Randy L. Bell Technology has transformed the way in which science is conducted. Almost every aspect of scientific exploration has
Breen Elementary School
Breen Elementary School Day: Monday Time: 1:40 pm to 3:10 pm (90 minutes) Session A Sept 21, Sept28, Oct 5, Oct 19, Oct 26, Nov 2 No class on Oct 12 Session B Nov 30, Dec 7, Dec 14, Jan 4, Jan 11, Jan
Connecting library content using data mining and text analytics on structured and unstructured data
Submitted on: May 5, 2013 Connecting library content using data mining and text analytics on structured and unstructured data Chee Kiam Lim Technology and Innovation, National Library Board, Singapore.
How To Change Medicine
P4 Medicine: Personalized, Predictive, Preventive, Participatory A Change of View that Changes Everything Leroy E. Hood Institute for Systems Biology David J. Galas Battelle Memorial Institute Version
International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: [email protected]
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
Statistical Challenges with Big Data in Management Science
Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
Analysis One Code Desc. Transaction Amount. Fiscal Period
Analysis One Code Desc Transaction Amount Fiscal Period 57.63 Oct-12 12.13 Oct-12-38.90 Oct-12-773.00 Oct-12-800.00 Oct-12-187.00 Oct-12-82.00 Oct-12-82.00 Oct-12-110.00 Oct-12-1115.25 Oct-12-71.00 Oct-12-41.00
Big Data, Official Statistics and Social Science Research: Emerging Data Challenges
Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division Building the Global Information System Elements of
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Big Data Analytics in Health Care
Big Data Analytics in Health Care S. G. Nandhini 1, V. Lavanya 2, K.Vasantha Kokilam 3 1 13mss032, 2 13mss025, III. M.Sc (software systems), SRI KRISHNA ARTS AND SCIENCE COLLEGE, 3 Assistant Professor,
Social Intelligence Report ADOBE DIGITAL INDEX Q4 2013
Social Intelligence Report ADOBE DIGITAL INDEX Q4 2013 ADOBE DIGITAL INDEX Q4 2013 key insights Facebook ad: Click-through-rate (CTR) is up 365% year-over-year and 41% quarter-over-quarter. Cost-per-click
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
Social Media Marketing Measurement A research project to understand new possibilities
Social Media Marketing Measurement A research project to understand new possibilities Boris Grinkot Associate Director of Product Development MECLABS Primary Research @grinkot Social Media Measurement
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Science Overview Why, What, How, Who Outline Why Data Science?
Big Data and Economics, Big Data and Economies. Susan Athey, Stanford University Disclosure: The author consults for Microsoft.
Big Data and Economics, Big Data and Economies Susan Athey, Stanford University Disclosure: The author consults for Microsoft. Lenses on big data 1. The science and practice of using big data 2. Management
AP / AOL / PEW RESEARCH CENTER SURVEY OF CELLULAR PHONE USERS March 8-28, 2006 Total N=1,503 / Landline RDD Sample N=752 / Cell Phone RDD Sample N=751
AP / AOL / PEW RESEARCH CENTER SURVEY OF CELLULAR PHONE USERS March 8-28, 2006 Total N=1,503 / Landline RDD Sample N=752 / Cell Phone RDD Sample N=751 Q.1 How do you feel about computers and technology...
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
The Big Picture on Big Data. Princeton Section 307 Dinner Meeting December 11, 2013 Richard Herczeg
The Big Picture on Big Data Princeton Section 307 Dinner Meeting December 11, 2013 Richard Herczeg Objective of Talk 1. Deliver a Primer on Big Data. 2. How does this emerging topic apply to Quality? 3.
Using Your Own Marketing Score to Measure Your Impact in Social Media A Market Smart 360 Whitepaper
Using Your Own Marketing Score to Measure Your Impact in Social Media A Market Smart 360 Whitepaper 1000 Commerce Park Drive Suite 301 Williamsport, PA 17701 P 866-623-0963 [email protected] www.marketsmart360.com
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Newsweek Poll Psychology of Voter Anger Princeton Survey Research Associates International. Final Topline Results (10/1/10)
Newsweek Poll Psychology of Voter Anger Princeton Survey Research Associates International Final Topline Results (10/1/10) N = 1,025 adults 18+ (691 landline interviews and 334 cell phone interviews) Margins
Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank
Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»
What Will Drive Growth in the Washington Area Economy Going Forward?
/4/25 Cardinal Bank & George Mason University 23 rd Annual Economic Conference What Will Drive Growth in the Washington Area Economy Going Forward? Stephen S. Fuller, Ph.D. Dwight Schar Faculty Chair and
We Must Prepare Ph.D. Students for the Complicated Art of Teaching - Commentary - The Chronicle of Higher Education
Commentary November 11, 2013 We Must Prepare Ph.D. Students for the Complicated Art of Teaching By Derek Bok Graduate study for the Ph.D. in the United States presents a curious paradox. Our universities
Online & Offline Correlation Conclusions I - Emerging Markets
Lead Generation in Emerging Markets White paper Summary I II III IV V VI VII Which are the emerging markets? Why emerging markets? How does the online help? Seasonality Do we know when to profit on what
Pretty Only TABLE 2 POSITIVE RATINGS: TRENDS SINCE 9/11/01: SUMMARY. April 2005. Nov. 2004. Feb. 2005
FOR IMMEDIATE RELEASE TABLE 1 CURRENT RATINGS OF PRESIDENT, SENIOR CABINET MEMBERS AND PARTIES IN CONGRESS How would you rate the job (READ ITEM) are/is doing excellent, pretty good, only fair, or poor?
i. Definition ii. Primary Activities iii. Support Activities iv. Information Systems role in value chain analysis
ACS 1803 Final Exam Topic Outline I. Enterprise Information Systems a. Enterprise systems vs. inter-organisational systems b. Value Chain Analysis ii. Primary Activities iii. Support Activities iv. Information
Unprecedented Exposure January 2014
Unprecedented Exposure January 2014 * * Mobile News App Page Views:10,802,154 * ipad App Page Views:1,940,225 Page Views: 4,099,363 * Page Views 34,318,508 Visits 9,946,681 Unique Visitors 7,629,425 Video
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Social Business Intelligence For Retail Industry
Actionable Social Intelligence SOCIAL BUSINESS INTELLIGENCE FOR RETAIL INDUSTRY Leverage Voice of Customers, Competitors, and Competitor s Customers to Drive ROI Abstract Conversations on social media
Advisors: Using Marketing to Build Your Pipeline. Presenter: Barbara Kotlyar Sr. Marketing Manager ByAllAccounts Managing Director, Bridge Marketing
Advisors: Using Marketing to Build Your Pipeline Presenter: Barbara Kotlyar Sr. Marketing Manager ByAllAccounts Managing Director, Bridge Marketing Who is this person and why should I listen to them? 88.4%
A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics
contents A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics Abstract... 2 Need of Social Content Analytics... 3 Social Media Content Analytics... 4 Inferences
2014 Midterm Elections: Voter Dissatisfaction with the President and Washington October 23-27, 2014
CBS NEWS POLL For release: Wednesday, October 29, 2014 6:30 pm EDT 2014 Midterm Elections: Voter Dissatisfaction with the President and Washington October 23-27, 2014 51% of voters expect the Republicans
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Conquering the Astronomical Data Flood through Machine
Conquering the Astronomical Data Flood through Machine Learning and Citizen Science Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ The Problem:
Data Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]
Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data
CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address
Domain Name Abuse Detection. Liming Wang
Domain Name Abuse Detection Liming Wang Outline 1 Domain Name Abuse Work Overview 2 Anti-phishing Research Work 3 Chinese Domain Similarity Detection 4 Other Abuse detection ti 5 System Information 2 Why?
Social Media Implementations
SEM Experience Analytics Social Media Implementations SEM Experience Analytics delivers real sentiment, meaning and trends within social media for many of the world s leading consumer brand companies.
Guide: Social Media Metrics in Government
Guide: Social Media Metrics in Government Guide: Social Media Metrics in Government Why Social Media Metrics Matter to Your Agency In a digital strategy, nearly anything can be measured, compiled and analyzed.
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
adjust reports The Undead App Store A 2014 retrospective report on App Store performance
1 adjust reports The Undead App Store The course for discovery in 2015 A 2014 retrospective report on App Store performance i Executive summary The app ecosystem is evolving, and it is becoming more Darwinistic
Predictive Analytics Certificate Program
Information Technologies Programs Predictive Analytics Certificate Program Accelerate Your Career Offered in partnership with: University of California, Irvine Extension s professional certificate and
Distances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Pallas Ludens. We inject human intelligence precisely where automation fails. Jonas Andrulis, Daniel Kondermann
Pallas Ludens We inject human intelligence precisely where automation fails Jonas Andrulis, Daniel Kondermann Chapter One: How it all started The Challenge of Reference and Training Data Generation Scientific
The Fed, Interest Rates, and Presidential Elections
The Fed, Interest Rates, and Presidential Elections by David Brubaker Summary During every four-year presidential term, there is a period of 24 months when the Federal Reserve Board s actions on interest
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
BIG Data Analytics Move to Competitive Advantage
BIG Data Analytics Move to Competitive Advantage where is technology heading today Standardization Open Source Automation Scalability Cloud Computing Mobility Smartphones/ tablets Internet of Things Wireless
Lead Generation Lessons From 4,000 Businesses. A study based on real data from 4,000 businesses
Lead Generation Lessons From 4,000 Businesses A study based on real data from 4,000 businesses Table of Contents Introduction: Real Data from 4,000 Businesses... 3 Factor 1: Blogging... 4 Factor 2: Web
Sentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
Analytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
Lead Generation in Emerging Markets
Lead Generation in Emerging Markets White paper Summary I II III IV V VI VII Which are the emerging markets? Why emerging markets? How does online help? Seasonality Do we know when to profit on what we
Introduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
Web analytics: Data Collected via the Internet
Database Marketing Fall 2016 Web analytics (incl real-time data) Collaborative filtering Facebook advertising Mobile marketing Slide set 8 1 Web analytics: Data Collected via the Internet Customers can
National and Transnational Security Implications of Big Data in the Life Sciences
Prepared by the American Association for the Advancement of Science in conjunction with the Federal Bureau of Investigation and the United Nations Interregional Crime and Justice Research Institute National
Democratic Process and Social Media: A Study of Us Presidential Election 2012
Democratic Process and Social Media: A Study of Us Presidential Election 2012 Abstract Susanta Kumar Parida M.Phil Scholar Dept. of Political Science Utkal University, Vanivihar, Bhubaneswar, Odisha India
