A CRF-based approach to find stock price correlation with company-related Twitter sentiment


POLITECNICO DI MILANO
Scuola di Ingegneria dell'Informazione
POLO TERRITORIALE DI COMO
Master of Science in Computer Engineering

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

Master Graduation Thesis by: Ekaterina Shabunina
Supervisor: Prof. Marco Brambilla
Academic Year 2012/13

This is an example of how powerful a Twitter post can be: on 23 April 2013 at 1:07 pm, the hacked Twitter account of the Associated Press posted a false tweet saying "Two explosions in the White House and Barack Obama is injured", causing a flash crash on the stock market as auto-trading computer systems sold $134 billion worth of stocks.

Shabunina Ekaterina - Politecnico di Milano - Como Campus - 24/7/2013

Background

Twitter
  554,750,000 registered users, of which 288 million are monthly active, posting on average 5 million tweets a day at an estimated rate of 9,100 tweets per second;
  User profiles are public by default;
  Users come from all over the world, with different distributions of age, nationality, household income, profession, and hobbies.
  Cashtags: clickable ticker symbols with a dollar-sign prefix (for example, $goog), which take a user to the search results about the company's finances and stock.

Sentiment Analysis (multi-class)
  Determining the attitude expressed in a tweet with respect to the company under study (in the context of this thesis).
  Hard on Twitter micro-blogs, which are only 140 characters long.
  Even harder in the specialized financial domain, which uses very specific jargon and slang, with particular abbreviations and symbols, and in which many words carry different meanings and are associated with distinct emotions.

Stock Markets
  Players in financial stock markets perform two categories of analysis to decide whether to buy or sell a given security:
  Technical (an attempt to apply mathematical models);
  Fundamental (based on studying the value of a company through its capacity to generate cash in the future).

Tools & Methods

Data Pre-Processing: crawling Twitter with the Twitter Search API; filtering.
Data Processing: manual labeling; POS tagging; training the model (templates, CRF++ tool); Twitter data labeling.
Regression analysis: stock market data; Minitab 16 tool.

[Figure: Architectural overview of the proposed approach.]

Results: Classifier's Performance

Templates description:
  Simple: current word and its POS tag;
  Previous: previous and current words and their POS tags;
  Prev+Next: previous, current and next words and their POS tags;
  Prevprev+Nextnext: two previous words, current word and two next words, and their POS tags;
  Word_combinations: the Prevprev+Nextnext template features plus the word pairs: word before previous / previous word, previous / current, current / next, and next / word after next.

[Figure: Classifier's accuracy, averaged over 10 folds, for Microsoft Inc. and Google Inc. with various templates.]

For both companies, Microsoft Inc. and Google Inc., the best performance was obtained with the Word_combinations template. It was therefore chosen to label the following one-month period, from 25 April to 24 May 2013, producing the daily volumes of Twitter sentiment needed for the next task: finding their correlation with the stock values of these companies.

[Figure: Effect of the training parameters on the classifier's performance, on the Google Inc. dataset.]
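The Word_combinations template can be illustrated with a small feature extractor. This is a minimal sketch in plain Python, not the CRF++ template file used in the thesis: the function name, the feature-key format, the padding token and the POS tags in the example are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

PAD: Token = ("<pad>", "<pad>")  # hypothetical padding token for positions outside the tweet


def word_combination_features(tokens: List[Token], i: int) -> Dict[str, str]:
    """Features for position i: a +/-2 window of words and POS tags plus adjacent word pairs."""
    def at(offset: int) -> Token:
        j = i + offset
        return tokens[j] if 0 <= j < len(tokens) else PAD

    features: Dict[str, str] = {}
    # Unigram features: the Prevprev+Nextnext window (offsets -2 .. +2).
    for offset in range(-2, 3):
        word, pos = at(offset)
        features[f"w[{offset}]"] = word.lower()
        features[f"pos[{offset}]"] = pos
    # Adjacent word pairs: (-2,-1), (-1,0), (0,+1), (+1,+2).
    for left in range(-2, 2):
        pair = f"{at(left)[0].lower()}/{at(left + 1)[0].lower()}"
        features[f"w[{left}]/w[{left + 1}]"] = pair
    return features


# Made-up tweet with illustrative POS tags.
tweet = [("$GOOG", "NN"), ("hits", "VBZ"), ("new", "JJ"), ("high", "NN")]
print(word_combination_features(tweet, 1))
```

Each position yields ten unigram features (five words, five POS tags) and four word-pair features; the simpler templates correspond to subsets of these keys.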

Results: Classifier's Performance

Performance measures of the resulting classification models for each company, selected out of the 10 folds.

[Classification model performance for Microsoft Inc.]
[Classification model performance for Google Inc.]

Results: Adherences

The initial results are summarized in the charts below for Microsoft Inc. and Google Inc.

[Figure: Sentiments and Closing price, Microsoft Inc. Axes: Number of Tweets, Closing Price (USD). Series: Total, Positive, Negative, Neutral, Closing Price.]
[Figure: Sentiments and Closing price, Google Inc. Axes: Number of Tweets, Closing Price (USD). Series: Total, Positive, Negative, Neutral, Closing Price.]

Results: Adherences

These two charts plot the variation of the stock price (compared to the closing price of the previous day), the number of positive-classified tweets, the number of negative-classified tweets, and the net number of positive tweets, i.e. the number of positive-classified tweets minus the number of negative-classified tweets.

[Figure: Sentiments, Net Positive and Price change, Microsoft Inc. Axes: Number of Tweets, Closing Price Variation; dates from 17/4 to 22/5. Series: Net Positive, Positive, Negative, Price Change (%).]
[Figure: Sentiments, Net Positive and Price change, Google Inc. Axes: Number of Tweets, Closing Price Variation. Series: Net Positive, Positive, Negative, Price Change (%).]

A visible adherence between the net positive value and the daily stock performance appears in the Google Inc. case, possibly due to the larger sample size.
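The two plotted quantities, net positive tweets and daily price change, can be computed as follows. This is a minimal sketch; the function names and the sample numbers are illustrative assumptions, not data from the thesis.

```python
from typing import List


def net_positive(positive: List[int], negative: List[int]) -> List[int]:
    """Daily net positive count: positive tweets minus negative tweets."""
    return [p - n for p, n in zip(positive, negative)]


def price_change_pct(closing: List[float]) -> List[float]:
    """Percentage change of each closing price relative to the previous day's close."""
    return [100.0 * (today - prev) / prev for prev, today in zip(closing, closing[1:])]


# Made-up illustrative numbers.
pos = [30, 45, 20]
neg = [10, 50, 5]
close = [100.0, 102.0, 99.96]

print(net_positive(pos, neg))   # [20, -5, 15]
print(price_change_pct(close))  # roughly [2.0, -2.0]
```

The price-change series is one element shorter than the closing-price series, because the first day has no previous close to compare against.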

Results: Adherences

Charts comparing the accumulated value of the net positive tweets with the stock closing price, for every studied day, to cope with the inertial effect shown by stock markets.

[Figure: Accumulated Net Positive and the Closing price, Microsoft Inc. Axes: Number of Tweets, Closing Price (USD). Series: Net Positive Accumulated, Pos. Accumulated, Neg. Accumulated, Closing Price.]
[Figure: Accumulated Net Positive and the Closing price, Google Inc. Series: Net Positive Accumulated, Pos. Accumulated, Neg. Accumulated, Closing Price.]

In both cases the plots follow similar patterns, even though the closing price shows higher volatility, denoting a good adherence of the classification model to the observed performance of the stock price.
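The accumulated series is a running total of the daily values. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
from itertools import accumulate
from typing import List


def accumulated(series: List[int]) -> List[int]:
    """Running total of a daily series, one value per studied day."""
    return list(accumulate(series))


daily_net_positive = [20, -5, 15, -10]  # made-up values
print(accumulated(daily_net_positive))  # [20, 15, 30, 20]
```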

Results: Adherences

The comparison of the observed traded volume and the total number of tweets for each trading day:

[Figure: Total number of tweets and traded volume, Microsoft Inc. Axes: Number of Tweets, Traded Volume; dates from 17/4 to 22/5. Series: Total, Volume.]
[Figure: Total number of tweets and traded volume, Google Inc. Axes: Number of Tweets, Traded Volume; dates from 17/4 to 22/5. Series: Total, Volume.]

This comparison showed the closest fit among all those presented, for both companies. It supports the expectation that notable events that affect the number of trades in a day also affect, in a similar manner, the volume of mentions of that company across social networks.

Results: Regression Analysis

Net positive tweets as the independent explanatory variable for the daily change, in percent, of the stock closing price (Minitab regression reports; Y: Price Change (%), X: Net Positive).

Microsoft Inc.: cubic model. The relationship between Price Change (%) and Net Positive is not statistically significant (p > 0.05). R-sq (adj) = 15.83%: 15.83% of the variation in Price Change (%) can be accounted for by the regression model.

Google Inc.: linear model. The relationship between Price Change (%) and Net Positive is statistically significant (p < 0.05). R-sq (adj) = 32.37%: 32.37% of the variation in Price Change (%) can be accounted for by the regression model. The positive correlation (r = 0.59) indicates that when Net Positive increases, Price Change (%) also tends to increase.

A statistically significant relationship does not imply that X causes Y.

[Figures: Diagnostic Report (Residuals vs Fitted Values) and Summary Report (Fitted Line Plot) for each company.]
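The quantities reported for a linear model (fitted line, correlation r, adjusted R-squared) can be reproduced with a few lines of code. This is a minimal sketch of ordinary least squares with one predictor, not the Minitab procedure itself; the function name and the sample data are assumptions for illustration.

```python
from math import sqrt
from typing import List, Tuple


def linear_fit(x: List[float], y: List[float]) -> Tuple[float, float, float, float]:
    """Return (intercept, slope, pearson_r, adjusted_r_squared) for y ~ a + b*x."""
    n = len(x)  # needs at least 3 points for the adjusted R^2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # Adjusted R^2 with one predictor: 1 - (1 - R^2) * (n - 1) / (n - 2).
    adj_r2 = 1.0 - (1.0 - r * r) * (n - 1) / (n - 2)
    return intercept, slope, r, adj_r2


# Made-up data: net positive tweets (x) and daily price change in percent (y).
net_pos = [-10.0, 0.0, 5.0, 20.0, 30.0]
change = [-1.0, 0.2, -0.3, 1.1, 1.4]
intercept, slope, r, adj_r2 = linear_fit(net_pos, change)
print(f"Y = {intercept:.4f} + {slope:.4f} X, r = {r:.2f}, R-sq(adj) = {100 * adj_r2:.2f}%")
```

For a single predictor, R-squared is the square of r, and the adjusted value is always slightly lower because it penalizes the small number of observations.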

Results: Regression Analysis

Accumulated net positive value versus the closing price behavior (Minitab regression reports; Y: Closing Price, X: Net Positive Accumulated).

Microsoft Inc.: quadratic model. The relationship between Closing Price and Net Positive Accumulated is statistically significant (p < 0.05). R-sq (adj) = 90.72%: 90.72% of the variation in Closing Price can be accounted for by the regression model.

Google Inc.: cubic model. The relationship between Closing Price and Net Positive Accumulated is statistically significant (p < 0.05). R-sq (adj) = 97.56%: 97.56% of the variation in Closing Price can be accounted for by the regression model.

If the model fits the data well, the fitted equation can be used to predict Closing Price for a value of Net Positive Accumulated. A statistically significant relationship does not imply that X causes Y.

[Figures: Diagnostic Report (Residuals vs Fitted Values) and Summary Report (Fitted Line Plot) for each company.]

Results: Regression Analysis

Traded volume change according to the total number of tweets for a given day (Minitab regression reports; Y: Volume, X: Total).

Microsoft Inc.: linear model. The relationship between Volume and Total is statistically significant (p < 0.05). R-sq (adj) = 31.21%: 31.21% of the variation in Volume can be accounted for by the regression model. The positive correlation (r = 0.58) indicates that when Total increases, Volume also tends to increase.

Google Inc.: linear model. The relationship between Volume and Total is statistically significant (p < 0.05). R-sq (adj) = 59.3%: 59.3% of the variation in Volume can be accounted for by the regression model. The positive correlation (r = 0.78) indicates that when Total increases, Volume also tends to increase.

A statistically significant relationship does not imply that X causes Y.

[Figures: Diagnostic Report (Residuals vs Fitted Values) and Summary Report (Fitted Line Plot) for each company.]

Conclusions

A multi-class sentiment classification model was built with Conditional Random Fields, achieving good performance, especially given the complexity of the financial domain:
  81.67% accuracy for Microsoft Inc.
  8.8% accuracy for Google Inc.

Interesting patterns and adherences were revealed between the sentiments of the company-related Twitter stream and the stock values. For the accumulated net positive versus the stock's closing price:
  97.56% explanatory capacity for Google Inc.
  90.72% explanatory capacity for Microsoft Inc.

The visible correlations between the company-related sentiments and the stock values also support the quality of the classification models built for the companies in the experiment.

Thank you!

Master of Science in Computer Engineering, Graduation Thesis
Student: Ekaterina Shabunina
Supervisor: Prof. Marco Brambilla

Appendix: Conditional Random Fields

Conditional Random Fields are a framework for building probabilistic models to segment and label sequence data.

DEFINITION: Let X be a random variable over data sequences to be labeled and Y a random variable over the corresponding label sequences. Let G = (V, E) be a graph such that $Y = (Y_v)_{v \in V}$, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field when, conditioned on X, the random variables $Y_v$ obey the Markov property with respect to the graph:

$$p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v),$$

where $w \sim v$ means that w and v are neighbors in G.

The joint distribution over the label sequence y given x has the form:

$$p(y \mid x) \propto \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big),$$

where x is the data sequence, y is the label sequence, v is a vertex from the vertex set V, e is an edge from the edge set E over V, $f_k$ is a Boolean edge feature, $g_k$ is a Boolean vertex feature, k indexes the features, $\lambda_k$ and $\mu_k$ are the parameters to be estimated, $y|_e$ is the set of components of y defined by edge e, and $y|_v$ is the set of components of y defined by vertex v.

Let $Y_0 = \text{start}$ and $Y_{n+1} = \text{stop}$ be special start and stop states. For each position i in the observation sequence x, define the $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix random variable $M_i(x) = [M_i(y', y \mid x)]$ by:

$$M_i(y', y \mid x) = \exp\Big( \sum_k \lambda_k f_k(e_i, y|_{e_i} = (y', y), x) + \sum_k \mu_k g_k(v_i, y|_{v_i} = y, x) \Big),$$

where $e_i$ is the edge with labels $(Y_{i-1}, Y_i)$ and $v_i$ is the vertex with label $Y_i$.

CRFs use the observation-dependent normalization factor Z(x), taken over all state sequences, for the conditional distributions. It is the (start, stop) entry of the product of these matrices:

$$Z(x) = \big( M_1(x)\, M_2(x) \cdots M_{n+1}(x) \big)_{\text{start},\,\text{stop}}.$$

The conditional probability of a label sequence y is then written as:

$$p(y \mid x) = \frac{\prod_{i=1}^{n+1} M_i(y_{i-1}, y_i \mid x)}{Z(x)},$$

where $y_0 = \text{start}$ and $y_{n+1} = \text{stop}$.
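The matrix form of a linear-chain CRF can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions: the scoring function `toy_score` stands in for the weighted feature sums and is entirely made up, the matrices are indexed by the labels plus the start and stop states, and transitions that cannot occur at a given position are set to zero.

```python
from itertools import product
from math import exp
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Score = Callable[[int, str, str], float]  # (position i, previous label, current label)


def transition_matrices(labels: Sequence[str], n: int, score: Score) -> List[Matrix]:
    """Build M_1 .. M_{n+1} over the states [start, *labels, stop] for n >= 1."""
    states = ["start", *labels, "stop"]
    mats: List[Matrix] = []
    for i in range(1, n + 2):
        m = [[0.0] * len(states) for _ in states]
        for a, prev in enumerate(states):
            for b, cur in enumerate(states):
                # start is only left at i = 1; stop is only entered at i = n + 1.
                prev_ok = prev == "start" if i == 1 else prev in labels
                cur_ok = cur == "stop" if i == n + 1 else cur in labels
                if prev_ok and cur_ok:
                    m[a][b] = exp(score(i, prev, cur))
        mats.append(m)
    return mats


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def partition(mats: List[Matrix]) -> float:
    """Z(x): the (start, stop) entry of the product M_1 M_2 ... M_{n+1}."""
    prod = mats[0]
    for m in mats[1:]:
        prod = matmul(prod, m)
    return prod[0][-1]


def sequence_probability(y: Sequence[str], labels: Sequence[str], mats: List[Matrix]) -> float:
    """p(y | x): product of the matrix entries along the path, divided by Z(x)."""
    states = ["start", *labels, "stop"]
    idx = {s: k for k, s in enumerate(states)}
    path = ["start", *y, "stop"]
    numerator = 1.0
    for i in range(1, len(path)):
        numerator *= mats[i - 1][idx[path[i - 1]]][idx[path[i]]]
    return numerator / partition(mats)


def toy_score(i: int, prev: str, cur: str) -> float:
    # Hypothetical weights: reward repeating a label, mildly favour "pos".
    return (1.0 if prev == cur else 0.0) + (0.5 if cur == "pos" else 0.0)


labels = ["pos", "neg", "neu"]
mats = transition_matrices(labels, 3, toy_score)
total = sum(sequence_probability(y, labels, mats) for y in product(labels, repeat=3))
print(total)  # the probabilities of all 27 label sequences should sum to about 1
```

Summing the probabilities over every possible label sequence is a quick sanity check that Z(x) really is the normalization factor.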

Appendix: Regression Analysis

Regression analysis is a statistical process for estimating the relationships among variables: a study that seeks an equation relating two (or more) variables, in the following form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon,$$

where $x_1, x_2, \dots, x_k$ are called factors (or independent variables) and $\varepsilon$ is called the error.

To understand whether the regression is significant, the ANOVA (analysis of variance) methodology is applied to the linear regression, for n observations. Starting from the set of hypotheses:

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0, \qquad H_1: \beta_j \neq 0 \text{ for at least one } j.$$

The total variation:

$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

The residual variation:

$$SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

And finally the variation explained by the regression model:

$$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = SS_T - SS_E.$$

From these definitions, it becomes possible to calculate the observed F value (based on the Fisher-Snedecor F distribution) as:

$$F_0 = \frac{SS_R / k}{SS_E / (n - k - 1)},$$

which should be compared to the critical F value $F_{\alpha,\, k,\, n-k-1}$, where $\alpha$ is the chance of misinterpretation (1 minus the desired confidence level). If $F_0 > F_{\alpha,\, k,\, n-k-1}$, $H_0$ should be rejected, which implies that the linear regression is statistically significant.
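The ANOVA test for a regression can be worked through on a toy dataset. This is a minimal sketch for a single factor (k = 1): the helper names and the data are made up, and the critical value is hard-coded from a standard F table rather than computed.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def anova_f(y: List[float], fitted: List[float], k: int) -> Tuple[float, float, float, float]:
    """Return (SS_total, SS_regression, SS_residual, observed F) for k factors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    f_observed = (ss_regression / k) / (ss_residual / (n - k - 1))
    return ss_total, ss_regression, ss_residual, f_observed


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
ss_t, ss_r, ss_e, f0 = anova_f(y, fitted, k=1)

F_CRITICAL = 10.13  # tabulated F value for alpha = 0.05 with (1, 3) degrees of freedom
print(f0 > F_CRITICAL)  # True: the null hypothesis is rejected for this toy data
```

For a least-squares fit with an intercept, the total sum of squares splits exactly into the regression and residual parts, which is what makes the F ratio meaningful.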

Correlation between Stock Prices and polarity of companies performance in Tweets: a CRF-based Approach

Correlation between Stock Prices and polarity of companies performance in Tweets: a CRF-based Approach Correlation between Stock Prices and polarity of companies performance in Tweets: a CRF-based Approach Ekaterina Shabunina Università degli Studi di Milano-Bicocca Dipartimento di Informatica Sistemistica

More information

CRF to find stock price correlation with company-related Twitter sentiment

CRF to find stock price correlation with company-related Twitter sentiment POLITECNICO DI MILANO Scuola di Ingegneria dell Informazione POLO TERRITORIALE DI COMO Master of Science in Computer Engineering CRF to find stock price correlation with company-related Twitter sentiment

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

JetBlue Airways Stock Price Analysis and Prediction

JetBlue Airways Stock Price Analysis and Prediction JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Forecasting stock markets with Twitter

Forecasting stock markets with Twitter Forecasting stock markets with Twitter Argimiro Arratia argimiro@lsi.upc.edu Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Sentiment analysis using emoticons

Sentiment analysis using emoticons Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Using Twitter as a source of information for stock market prediction

Using Twitter as a source of information for stock market prediction Using Twitter as a source of information for stock market prediction Ramon Xuriguera (rxuriguera@lsi.upc.edu) Joint work with Marta Arias and Argimiro Arratia ERCIM 2011, 17-19 Dec. 2011, University of

More information

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Drazen Pesjak Supervised by A.A. Tsvetkov 1, D. Posthuma 2 and S.A. Borovkova 3 MSc. Thesis Finance HONOURS TRACK Quantitative

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

Four of the precomputed option rankings are based on implied volatility. Two are based on statistical (historical) volatility :
