CS294-1 Final Project Report. Real-Time Event Detection by Twitter

Size: px
Start display at page:

Download "CS294-1 Final Project Report. Real-Time Event Detection by Twitter"

Transcription

1 CS294-1 Final Project Report Real-Time Event Detection by Twitter Cheng Lu & Yun Jin Abstract Twitter has become a popular microblogging service. An important characteristic of Twitter is its real-time nature. For example, when an earthquake occurs, people make many tweets related to the earthquake, which enables detection of earthquake occurrence promptly, simply by observing the tweets. In this report, we investigate the real-time interaction of events such as earthquakes, in Twitter, and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. We also establish a temporal model to estimate time of occurrence. Finally, we use geocharts from Google to show our results on maps. In this report, section 1 describes our goal of this project. Section 2 introduces how we collect data and how we processed these data for classification. Section 3 talks about the approach we used for the project in detail and why we choose it. Section 4 shows the results and performance analysis. Section 5 gives a short scenario about what will be done for next step. Section 6 summarizes what we have done and what we have learned through this project. Project Goal Since real-time nature is one of the most important features for Twitter, we could detect real-time events by monitoring tweets. And we try to find the real time of location of such events. These events share several properties in common: i) They are of large scale (many users experience the event). ii) They particularly influence people s daily life (for that reason, they are induced to tweet about it). iii) They have both spatial and temporal regions (so that real-time location estimation would be possible). The research questions we are trying to answer in this project are: i) Can we detect such event occurrence in real-time by monitoring twitter? ii) Can we find the real time and location information of such event?

2 In this report, we focus on the detection of earthquake that has the above properties. The goal of our project is to detect earthquake events by collecting tweets data and to estimate the time of location of earthquake occurrence. Data Collecting & Data Processing We start our analysis by searching the queries from the MarkLogic Twitter Database by XQueries. In our project, we first search the keyword earthquake for a subset of tweets that all contains the keyword in our searching queries. Subsequently, we limit the location of all tweets to US and set the language as English since we concentrate on the events that are in the region of US. Finally, we get 20,000 tweets that are dated from Nov 22nd, 2011 to Mar 13th, Follow by that, we employed simple statistical analysis of the twitter data we got. From these tweets data, we found that most are located in west in accordance to the fact that earthquake happens most in the west coast in US and are pretty rare in Midwest and Northeast area (this may also due to the possibility that the twitter population is way more dense in west coast). Figure 1 shows the regional distribution of the data that contains the keyword earthquake located in US. Figure1. Distribution of earthquake in US from tweets data Approach To detect an event like earthquake and estimate the time and location of such event, we build an overall analysis platform for classification and estimation. Figure 2 shows the procedure of our approach. In this report, we first introduce two assumptions for detecting an event like earthquake. Then we describe the semantic analysis for classification. Subsequently, we propose a temporal model for event occurrence probability. Finally, we estimate the time of location of the earthquake.

3 Figure2. Frame of approach Assumption If we search the tweet and find out one user posted a relevant tweet, then we classify it into a positive class. The users function like a sensor of the event, therefore a tweet can be considered as a sensor output. In order to make our event detection feasible, we made the following assumptions: 1. Each Twitter user is a sensor, which detects a target event and makes a report following a certain probability. 2. Each tweet is associated with a time and location. We called our Twitter users as virtual sensors, which have various characteristics. Each sensor has different sensitivity, which is contradictory to traditional sensors. This means that some users are only activate for a limited time and specific events, while others are active to a wider range of time and events. The number of sensors is enormous. Therefore, we concluded that the virtual sensors are noisier than traditional physical sensors. Each tweet is associated with its post time, and we use this information as the estimated occurrence time of our target event. Figure 3 shows the conceptual model of our analysis platform. Each twitter user could be considered as a virtual sensor for our targeting event. Upon the occurrence of such events, each twitter users will have a certain probability to post a tweet about it, and the tweet they posted is the output of every sensor, just like the output signal of normal thermal sensor. Trained from known data, the classifier will take these tweets as input and giving out positive or negative result. The probabilistic model will finally indicate the signal of the targeting event.

4 Figure3. Conceptual model Semantic Analysis To obtain tweets on the target event precisely, we apply semantic analysis of a tweet: For example, users might make tweets such as Earthquake! thus earthquake could be keywords, but users might also make tweets such as I am attending an Earthquake Conference. Moreover, even though a tweet refers to the target event, it might not be appropriate as an event report; for example a user makes tweets such as The earthquake yesterday was scaring, or Three earthquakes in four days. Japan scares me. These tweets are truly the mentions of the target event, but they are not real-time reports of the events. Therefore, it is necessary to clarify that a tweet is actually referring to an actual earthquake occurrence, which is denoted as a positive class. By preparing positive and negative examples as a training set, we can produce a model to classify tweets automatically into positive and negative categories. We prepare three groups of features for each tweet as follows:

5 Features A (statistical features): The number of words in a tweet message Features B (word context features): The position of the query word within a tweet. Features C (keyword features): Bag of words in tweets We use the dictionary built in HW2, which has the length of around 260K. After gathering these features, we find the dimension of the feature vector is 260K. However, the model has more degrees of freedom than we have data. We decide to use linear dimensionality reduction (LDA) to compress data for both efficiency and accuracy. After the process of LDA, the dimension of feature vector has been reduced to 10K. Classification In the training stage, we have to manually classify the twitter data to prepare the training set. We manually classify about 1000 tweets. One interesting facts we found out is that most of the earthquake tweets are not really talking about real-time event of earthquake. Some people use earthquake as an expression, while others simply post earthquake news obtained from other media source, which are not our targeting positive tweets. To classify a tweet into a positive class or a negative class, we use several methods to train our classifier, including Naïve Bayes, linear regression and kernel SVMs to classify the data into positive and negative class. After training, we get top ten positive words and top ten negative words and we also provide weights for these words (Table 1). Table1. Weights of top ten positive and negative words

6 We can tell that the table shows some insights about our system. For the positive term, most authentic real-time earthquake tweets will tweet like I was shocked by an earthquake or Woke up from bed or I had a weird feeling, maybe an earthquake. The most significant negative terms are predict, report and magnitude, which usually don t indicate a real-time earthquake. People are also like to post Japan earthquake to show sympathy. Another interesting facts is that the number of words in each tweet has a negative correlation, meaning that most of the positive real-time earthquake tweets are shorter than the report and other expression, which makes sense too. In the testing stage, we get a comparison of rate of accuracy with LDA and without LDA for different classifier. Figure4. Rate of accuracy for different algorithms with LDA and without LDA From the result, the Intersection SVM, Linear Regression and Naïve Bayes methods outperform others, which all exceed 90% of success. We get our best result by using Intersection kernel SVM, baring only 8% of error rate. Event Occurrence Probability After classifying a tweet into positive or negative case, we have to calculate the event occurrence probability. We form a probabilistic model that gives the probability of event occurrence, for a given tweet that is a positive example. If the probability is larger than a predetermined threshold, then it determines an actual occurrence of the target event. To assess an alarm, we must calculate the reliability of multiple sensor values. For example, a user might make a false alarm by writing a tweet. It is also possible that the classifier misclassifies a tweet into a positive class. We can design the alarm probabilistically using the following two facts:

7 1. The false-positive ratio P f of a sensor is approximately 0.3. (The misclassification error we got is about 10%, while the rest are those who reports there is an earthquake while it does not happen.) 2. Sensors are assumed to be independent and identically distributed. Assuming that we have n sensors at one specific time, which produce positive signals, the probability of all n sensors returning a false alarm is P f n. Therefore, the probability of event occurrence can be estimated as P occur = 1 P f n. We determine the threshold as 0.95, so the probability of event occurrence should be P occur We can calculate that the positive tweets we got should be no less than 3 (n 3). So our system will give out alarms when seeing more than 3 positive tweets about earthquake within a specific time. Estimate Time & Location For time estimation, we set the first post tweet time as the occurrence time. We didn t do much about the location estimation, since our data lost most of the GPS information. We simply used the city information for the location of events occurrence. Result We implemented a geochart by using HTML to show our final results of detecting the occurrence of earthquake in US from Nov 22nd, 2011 to Mar 13th, 2012 (Figure 5). We also collect real earthquake data from Table 2 shows a comparison of our results and real earthquake data. From the comparison, we find that the estimated time of earthquake is a bit late than the real time of earthquake, but within a tolerable level. Figure5. Results of detecting earthquake Tweet Time Tweet Location Real Time Real Location 03:55:02 UTC, 11/26/2011 Oklahoma 03:53:10UTC, 11/26/2011 Oklahoma

8 11:58:20 UTC 11/30/2011 Seattle 11:56:30 UTC 11/30/2011 Seattle, Washington 03:48:37 UTC 12/23/2011 Southern Alaska 03:44:27 UTC 12/23/2011 Southern Alaska 12:09:10 UTC 1/3/2012 Beaver 12:06:36 UTC 1/3/2012 Beaver, Utah 07:25:34 UTC 1/5/2012 Boise 07:20:10 UTC 1/5/2012 Southern Idaho 07:15:05 UTC 1/20/2012 Butte 07:05:55 UTC 1/20/2012 Western Montana 05:26:18 UTC 1/21/2012 Camden 05:20:30 UTC 1/21/2012 Philadelphia 03:50:22 UTC 1/29/2012 Riverton 03:46:24 UTC 1/29/2012 Lander, Wyoming 03:59:49 UTC 1/31/2012 Hawaii 03:54:43 UTC 1/31/2012 Hawaii 04:50:11 UTC 2/1/2012 Wells 04:40:31 UTC 2/1/2012 Wells, Nevada 07:29:45 UTC 2/12/2012 New York 07:28:30 UTC 2/12/2012 Boston, Massachusetts 03:33:28 UTC 2/15/2012 Coos Bay 03:31:20 UTC 2/15/2012 Coos Bay, Oregon 05:07:30 UTC 3/5/2012 San Francisco 05:03:30 UTC 3/5/2012 Richmond, California 03:45:18 UTC 3/5/2012 Milpitas 03:33:12 UTC 3/5/2012 San Francisco 03:18:34 UTC 3/6/2012 El Paso 03:11:50 UTC 3/6/2012 Western Texas Table2. Comparison of estimated occurrence of earthquake and real earthquake Future Work We would focus on more accurate temporal and spatial model for time and location estimation if we have more data with location information. For temporal model, we should consider about the interval time between the occurrence time and the post time. So we should not use post time as occurrence time. For spatial model, we d like to apply Karman filtering or particle filtering into estimating the locations of events. We plan to detect real-time events of various kinds using Twitter, such as traffic jam, accidents and typhoon. Our model includes the assumption that a single instance of the target event exists at a certain time. For example, we assume that we do not have two or more earthquakes or typhoons simultaneously. Although the assumption is reasonable for these cases, it might not hold for other events such as traffic jams, accidents, and rainbows. To realize multiple event detection, we must produce advanced probabilistic models that allow hypotheses of multiple event occurrences. A search query is important to search possibly relevant tweets. For example, we set a query term as earthquake because most tweets mentioning an earthquake occurrence use this word. However, to improve the recall, it is necessary to obtain a good set of queries. We can use advanced algorithms for query expansion, which is a subject of our future work. Conclusion The project we propose is Real Time Event Detection by Twitter, which utilizes the real-time nature of Twitter. We consider each Twitter user as a virtual sensor, and use semantic analysis to classify tweets into a positive and negative class. Afterwards, we use probabilistic temporal model for detecting such event. Finally we use the post time as estimated occurrence time of the earthquake. Since we lose the GPS information of data, we cannot investigate into estimating the location of earthquake. From this project, we learn how to use MarkLogic XQuery to collect data. Also, we learn how to parse and tokenize XML data. The semantic analysis and feature

9 extraction are the key issue in the whole project. We also use LDA to get more accurate classification result. Additionally, we utilize different classification techniques to train and test data. Finally, we learnt how to use geocharts to visualize our results on maps. Reference 1. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: Realtime event detection by social sensors.

Project Report CS294-1: Behavioral Data Mining Chronological and Geographical Visualization for Tweets. Hanzhong (Ayden) Ye, Hong Wu, Rohan Nagesh

Project Report CS294-1: Behavioral Data Mining Chronological and Geographical Visualization for Tweets. Hanzhong (Ayden) Ye, Hong Wu, Rohan Nagesh Project Report CS294-1: Behavioral Data Mining Chronological and Geographical Visualization for Tweets Hanzhong (Ayden) Ye, Hong Wu, Rohan Nagesh Abstract In this report, we present Tweet Visualizer, an

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

EST.03. An Introduction to Parametric Estimating

EST.03. An Introduction to Parametric Estimating EST.03 An Introduction to Parametric Estimating Mr. Larry R. Dysert, CCC A ACE International describes cost estimating as the predictive process used to quantify, cost, and price the resources required

More information

Predicting Flight Delays

Predicting Flight Delays Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

More information

AN EXAMINATION OF ONLINE FRAUD COMPLAINT OCCURRENCES

AN EXAMINATION OF ONLINE FRAUD COMPLAINT OCCURRENCES AN EXAMINATION OF ONLINE FRAUD COMPLAINT OCCURRENCES Lai C. Liu, University of Texas Pan American, liul@utpa.edu Kai S. Koong, University of Texas Pan American, koongk@utpa.edu Margaret Allison, University

More information

Can Twitter provide enough information for predicting the stock market?

Can Twitter provide enough information for predicting the stock market? Can Twitter provide enough information for predicting the stock market? Maria Dolores Priego Porcuna Introduction Nowadays a huge percentage of financial companies are investing a lot of money on Social

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

National Price Rankings

National Price Rankings National Price Rankings The links below provide the latest available electric service price data from a July 1, 2010 survey of more than 150 investor-owned utilities. Data source: Edison Electric Institute.

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Exploring Our World with GIS Lesson Plans Engage

Exploring Our World with GIS Lesson Plans Engage Exploring Our World with GIS Lesson Plans Engage Title: Exploring Our Nation 20 minutes *Have students complete group work prior to going to the computer lab. 2.List of themes 3. Computer lab 4. Student

More information

Less naive Bayes spam detection

Less naive Bayes spam detection Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:h.m.yang@tue.nl also CoSiNe Connectivity Systems

More information

An Analysis of Commercial Vehicle Driver Traffic Conviction Data to Identify High Safety Risk Motor Carriers

An Analysis of Commercial Vehicle Driver Traffic Conviction Data to Identify High Safety Risk Motor Carriers An Analysis of Commercial Vehicle Driver Traffic Conviction Data to Identify High Safety Risk Motor Carriers by Brenda M. Lantz, Research Associate The Upper Great Plains Transportation Institute and Michael

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Three-Year Moving Averages by States % Home Internet Access

Three-Year Moving Averages by States % Home Internet Access Three-Year Moving Averages by States % Home Internet Access Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana

More information

Recognizing Informed Option Trading

Recognizing Informed Option Trading Recognizing Informed Option Trading Alex Bain, Prabal Tiwaree, Kari Okamoto 1 Abstract While equity (stock) markets are generally efficient in discounting public information into stock prices, we believe

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Impacts of Sequestration on the States

Impacts of Sequestration on the States Impacts of Sequestration on the States Alabama Alabama will lose about $230,000 in Justice Assistance Grants that support law STOP Violence Against Women Program: Alabama could lose up to $102,000 in funds

More information

Job Market Intelligence:

Job Market Intelligence: March 2014 Job Market Intelligence: Report on the Growth of Cybersecurity Jobs Matching People & Jobs Reemployment & Education Pathways Resume Parsing & Management Real-Time Jobs Intelligence Average #

More information

Applying Machine Learning to Stock Market Trading Bryce Taylor

Applying Machine Learning to Stock Market Trading Bryce Taylor Applying Machine Learning to Stock Market Trading Bryce Taylor Abstract: In an effort to emulate human investors who read publicly available materials in order to make decisions about their investments,

More information

Server Load Prediction

Server Load Prediction Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that

More information

Public School Teacher Experience Distribution. Public School Teacher Experience Distribution

Public School Teacher Experience Distribution. Public School Teacher Experience Distribution Public School Teacher Experience Distribution Lower Quartile Median Upper Quartile Mode Alabama Percent of Teachers FY Public School Teacher Experience Distribution Lower Quartile Median Upper Quartile

More information

Information About Filing a Case in the United States Tax Court. Attached are the forms to use in filing your case in the United States Tax Court.

Information About Filing a Case in the United States Tax Court. Attached are the forms to use in filing your case in the United States Tax Court. Information About Filing a Case in the United States Tax Court Attached are the forms to use in filing your case in the United States Tax Court. It is very important that you take time to carefully read

More information

End-to-End Sentiment Analysis of Twitter Data

End-to-End Sentiment Analysis of Twitter Data End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India apoorv@cs.columbia.edu,

More information

Net-Temps Job Distribution Network

Net-Temps Job Distribution Network Net-Temps Job Distribution Network The Net-Temps Job Distribution Network is a group of 25,000 employment-related websites with a local, regional, national, industry and niche focus. Net-Temps customers'

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

More information

Sentiment analysis using emoticons

Sentiment analysis using emoticons Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was

More information

2016 ERCOT System Planning Long-Term Hourly Peak Demand and Energy Forecast December 31, 2015

2016 ERCOT System Planning Long-Term Hourly Peak Demand and Energy Forecast December 31, 2015 2016 ERCOT System Planning Long-Term Hourly Peak Demand and Energy Forecast December 31, 2015 2015 Electric Reliability Council of Texas, Inc. All rights reserved. Long-Term Hourly Peak Demand and Energy

More information

NON-RESIDENT INDEPENDENT, PUBLIC, AND COMPANY ADJUSTER LICENSING CHECKLIST

NON-RESIDENT INDEPENDENT, PUBLIC, AND COMPANY ADJUSTER LICENSING CHECKLIST NON-RESIDENT INDEPENDENT, PUBLIC, AND COMPANY ADJUSTER LICENSING CHECKLIST ** Utilize this list to determine whether or not a non-resident applicant may waive the Oklahoma examination or become licensed

More information

BUSINESS DEVELOPMENT OUTCOMES

BUSINESS DEVELOPMENT OUTCOMES BUSINESS DEVELOPMENT OUTCOMES Small Business Ownership Description Total number of employer firms and self-employment in the state per 100 people in the labor force, 2003. Explanation Business ownership

More information

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577 T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

More information

Twitter Analytics: Architecture, Tools and Analysis

Twitter Analytics: Architecture, Tools and Analysis Twitter Analytics: Architecture, Tools and Analysis Rohan D.W Perera CERDEC Ft Monmouth, NJ 07703-5113 S. Anand, K. P. Subbalakshmi and R. Chandramouli Department of ECE, Stevens Institute Of Technology

More information

VACA: A Tool for Qualitative Video Analysis

VACA: A Tool for Qualitative Video Analysis VACA: A Tool for Qualitative Video Analysis Brandon Burr Stanford University 353 Serra Mall, Room 160 Stanford, CA 94305 USA bburr@stanford.edu Abstract In experimental research the job of analyzing data

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

MAINE (Augusta) Maryland (Annapolis) MICHIGAN (Lansing) MINNESOTA (St. Paul) MISSISSIPPI (Jackson) MISSOURI (Jefferson City) MONTANA (Helena)

MAINE (Augusta) Maryland (Annapolis) MICHIGAN (Lansing) MINNESOTA (St. Paul) MISSISSIPPI (Jackson) MISSOURI (Jefferson City) MONTANA (Helena) HAWAII () IDAHO () Illinois () MAINE () Maryland () MASSACHUSETTS () NEBRASKA () NEVADA (Carson ) NEW HAMPSHIRE () OHIO () OKLAHOMA ( ) OREGON () TEXAS () UTAH ( ) VERMONT () ALABAMA () COLORADO () INDIANA

More information

State Tax Information

State Tax Information State Tax Information The information contained in this document is not intended or written as specific legal or tax advice and may not be relied on for purposes of avoiding any state tax penalties. Neither

More information

ObamaCare s Impact on Small Business Wages and Employment

ObamaCare s Impact on Small Business Wages and Employment ObamaCare s Impact on Small Business Wages and Employment Sam Batkins, Ben Gitis, Conor Ryan September 2014 Executive Summary Introduction American Action Forum (AAF) research finds that Affordable Care

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

State Estate Taxes BECAUSE YOU ASKED ADVANCED MARKETS

State Estate Taxes BECAUSE YOU ASKED ADVANCED MARKETS ADVANCED MARKETS State Estate Taxes In 2001, President George W. Bush signed the Economic Growth and Tax Reconciliation Act (EGTRRA) into law. This legislation began a phaseout of the federal estate tax,

More information

Neural Networks and Support Vector Machines

Neural Networks and Support Vector Machines INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

More information

Wyoming Community College Commission 2015 Tuition Review

Wyoming Community College Commission 2015 Tuition Review Wyoming Community College Commission 2015 Review In October 2006, the Wyoming Community College Commission (WCCC) adopted a long-term tuition policy. This policy still guides the WCCC in establishing annual

More information

Executive Summary. Public Support for Marriage for Same-sex Couples by State by Andrew R. Flores and Scott Barclay April 2013

Executive Summary. Public Support for Marriage for Same-sex Couples by State by Andrew R. Flores and Scott Barclay April 2013 Public Support for Marriage for Same-sex Couples by State by Andrew R. Flores and Scott Barclay April 2013 Executive Summary Around the issue of same-sex marriage, there has been a slate of popular and

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

Monitoring and Analyzing Customer Feedback Through Social Media Platforms for Identifying and Remedying Customer Problems

Monitoring and Analyzing Customer Feedback Through Social Media Platforms for Identifying and Remedying Customer Problems Monitoring and Analyzing Customer Feedback Through Social Media Platforms for Identifying and Remedying Customer Problems Sumit Bhatia, Jingxuan Li, Wei Peng, and Tong Sun Xerox Research Centre, Webster,

More information

Avalanche: Prepare, Manage, and Understand Crisis Situations Using Social Media Analytics

Avalanche: Prepare, Manage, and Understand Crisis Situations Using Social Media Analytics Avalanche: Prepare, Manage, and Understand Crisis Situations Using Social Media Analytics Sven Schaust AGT Group (R&D) GmbH sschaust@agtinternational.com Maximilian Walther AGT Group (R&D) GmbH mwalther@agtinternational.com

More information

Creating a NL Texas Hold em Bot

Creating a NL Texas Hold em Bot Creating a NL Texas Hold em Bot Introduction Poker is an easy game to learn by very tough to master. One of the things that is hard to do is controlling emotions. Due to frustration, many have made the

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

Workers Compensation State Guidelines & Availability

Workers Compensation State Guidelines & Availability ALABAMA Alabama State Specific Release Form Control\Release Forms_pdf\Alabama 1-2 Weeks ALASKA ARIZONA Arizona State Specific Release Form Control\Release Forms_pdf\Arizona 7-8 Weeks by mail By Mail ARKANSAS

More information

West Case Reporters and Digests Research Guide

West Case Reporters and Digests Research Guide West Case Reporters and Digests Research Guide H. Douglas Barclay Law Library Check the library s Location Guide and Summit/Voyager the Online catalog for the current location of sources mentioned in this

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Englishinusa.com Positions in MSN under different search terms.

Englishinusa.com Positions in MSN under different search terms. Englishinusa.com Positions in MSN under different search terms. Search Term Position 1 Accent Reduction Programs in USA 1 2 American English for Business Students 1 3 American English for Graduate Students

More information

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems

More information

Predicting Defaults of Loans using Lending Club s Loan Data

Predicting Defaults of Loans using Lending Club s Loan Data Predicting Defaults of Loans using Lending Club s Loan Data Oleh Dubno Fall 2014 General Assembly Data Science Link to my Developer Notebook (ipynb) - http://nbviewer.ipython.org/gist/odubno/0b767a47f75adb382246

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

Bayesian Spam Detection

Bayesian Spam Detection Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal Volume 2 Issue 1 Article 2 2015 Bayesian Spam Detection Jeremy J. Eberhardt University or Minnesota, Morris Follow this and additional

More information

Challenges for Big Data Applications in Japan: Hopes and Concerns

Challenges for Big Data Applications in Japan: Hopes and Concerns Challenges for Big Data Applications in Japan: Hopes and Concerns Kibi International University Takushi OTANI In this presentation, I will discuss three challenging big data applications that are thought

More information

14-Sep-15 State and Local Tax Deduction by State, Tax Year 2013

14-Sep-15 State and Local Tax Deduction by State, Tax Year 2013 14-Sep-15 State and Local Tax Deduction by State, Tax Year 2013 (millions) deduction in state dollars) claimed (dollars) taxes paid [1] state AGI United States 44.2 100.0 30.2 507.7 100.0 11,483 100.0

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

ITU Kaleidoscope 2015

ITU Kaleidoscope 2015 ITU Kaleidoscope 2015 Trust in the Informa0on Society Network Failure Detection System for Traffic Control using Social Information in Large-Scale Disasters Chihiro Maru 1, Miki Enoki 1,2, Nakao Akihiro

More information

Music Mood Classification

Music Mood Classification Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may

More information

other distance education innovations, have changed distance education offerings.

other distance education innovations, have changed distance education offerings. WEB TABLES DEPARTMENT OF EDUCATION JUNE 2014 NCES 2014-023 Enrollment in Distance Education Courses, by State: Fall 2012 Postsecondary enrollment in, particularly those offered online, has rapidly increased

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

US Department of Health and Human Services Exclusion Program. Thomas Sowinski Special Agent in Charge/ Reviewing Official

US Department of Health and Human Services Exclusion Program. Thomas Sowinski Special Agent in Charge/ Reviewing Official US Department of Health and Human Services Exclusion Program Thomas Sowinski Special Agent in Charge/ Reviewing Official Overview Authority to exclude individuals and entities from Federal Health Care

More information

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Chex Systems, Inc. does not currently charge a fee to place, lift or remove a freeze; however, we reserve the right to apply the following fees:

Chex Systems, Inc. does not currently charge a fee to place, lift or remove a freeze; however, we reserve the right to apply the following fees: Chex Systems, Inc. does not currently charge a fee to place, lift or remove a freeze; however, we reserve the right to apply the following fees: Security Freeze Table AA, AP and AE Military addresses*

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

NTT DATA Big Data Reference Architecture Ver. 1.0

NTT DATA Big Data Reference Architecture Ver. 1.0 NTT DATA Big Data Reference Architecture Ver. 1.0 Big Data Reference Architecture is a joint work of NTT DATA and EVERIS SPAIN, S.L.U. Table of Contents Chap.1 Advance of Big Data Utilization... 2 Chap.2

More information

State Tax Information

State Tax Information State Tax Information The information contained in this document is not intended or written as specific legal or tax advice and may not be relied on for purposes of avoiding any state tax penalties. Neither

More information

Feature Selection for Electronic Negotiation Texts

Feature Selection for Electronic Negotiation Texts Feature Selection for Electronic Negotiation Texts Marina Sokolova, Vivi Nastase, Mohak Shah and Stan Szpakowicz School of Information Technology and Engineering, University of Ottawa, Ottawa ON, K1N 6N5,

More information

High Risk Health Pools and Plans by State

High Risk Health Pools and Plans by State High Risk Health Pools and Plans by State State Program Contact Alabama Alabama Health 1-866-833-3375 Insurance Plan 1-334-263-8311 http://www.alseib.org/healthinsurance/ahip/ Alaska Alaska Comprehensive

More information

Data show key role for community colleges in 4-year

Data show key role for community colleges in 4-year Page 1 of 7 (https://www.insidehighered.com) Data show key role for community colleges in 4-year degree production Submitted by Doug Lederman on September 10, 2012-3:00am The notion that community colleges

More information

Geovisualization. Geovisualization, cartographic transformation, cartograms, dasymetric maps, scientific visualization (ViSC), PPGIS

Geovisualization. Geovisualization, cartographic transformation, cartograms, dasymetric maps, scientific visualization (ViSC), PPGIS 13 Geovisualization OVERVIEW Using techniques of geovisualization, GIS provides a far richer and more flexible medium for portraying attribute distributions than the paper mapping which is covered in Chapter

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

CAPSTONE ADVISOR: PROFESSOR MARY HANSEN

CAPSTONE ADVISOR: PROFESSOR MARY HANSEN STEVEN NWAMKPA GOVERNMENT INTERVENTION IN THE FINANCIAL MARKET: DOES AN INCREASE IN SMALL BUSINESS ADMINISTRATION GUARANTEE LOANS TO SMALL BUSINESSES INCREASE GDP PER CAPITA INCOME? CAPSTONE ADVISOR: PROFESSOR

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Concepts of digital forensics

Concepts of digital forensics Chapter 3 Concepts of digital forensics Digital forensics is a branch of forensic science concerned with the use of digital information (produced, stored and transmitted by computers) as source of evidence

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

Recruitment and Retention Resources By State List

Recruitment and Retention Resources By State List Recruitment and Retention Resources By State List Alabama $5,000 rural physician tax credit o http://codes.lp.findlaw.com/alcode/40/18/4a/40-18-132 o http://adph.org/ruralhealth/index.asp?id=882 Area Health

More information

Worksheet: Firearm Availability and Unintentional Firearm Deaths

Worksheet: Firearm Availability and Unintentional Firearm Deaths Worksheet: Firearm Availability and Unintentional Firearm Deaths Name: Date: / / 1. How might the possibility that... where there are more guns parents care less about their children s welfare.... influence

More information

Analysis of Social Media Streams

Analysis of Social Media Streams Fakultätsname 24 Fachrichtung 24 Institutsname 24, Professur 24 Analysis of Social Media Streams Florian Weidner Dresden, 21.01.2014 Outline 1.Introduction 2.Social Media Streams Clustering Summarization

More information

L25: Ensemble learning

L25: Ensemble learning L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this

More information

Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information