Twitter keyword volume, current spending, and weekday spending norms predict consumer spending


2012 IEEE 12th International Conference on Data Mining Workshops

Justin Stewart, Homer Strong, Jeffrey Parker, Mark A. Bedau
Reed College, Portland, Oregon, USA
Lucky Sort, Inc., Portland, Oregon, USA
Current address: Department of Economics, University of Pittsburgh, Pittsburgh, PA, USA

Abstract. We examine whether aggregate daily Twitter keyword volumes over eight months from November 2011 to June 2012 can be used to predict aggregate daily consumer spending as reported by Gallup. We also examine whether Twitter keyword volume improves predictive ability over prediction based solely on current spending, weekday spending norms, and spending history. We divide spending and Twitter data into (i) in-sample data used to identify which Twitter words are highly correlated with spending and to estimate model coefficients, and (ii) out-of-sample data used to measure model forecast success. Our methods are very general and include n-grams (e.g., pairs of words, like "going shopping"). We note that the historical spending data exhibit a weekday pattern of high spending on two days and low spending over the rest of the week. Spending history also shows some striking deviations from weekday norms, such as Black Friday (the day after the American Thanksgiving holiday) and Boxing Day (the day after Christmas), both historically large shopping days. We build models on combinations of Twitter keyword volume (T), current spending (S), and weekday spending norms (D), and compare four measures of model forecast success: the correlation between actual and forecast daily spending changes, the percentage of correctly forecast directions of daily spending change, the correlation between actual and forecast deviations from weekday spending norms, and the percentage of correctly forecast deviations from weekday norms. We test model forecasts over the period April to June 2012.
Our results show that Twitter keyword volume, current spending, and weekday spending norms all have significant value for predicting consumer spending three days in advance, but none demonstrates a significant predictive advantage over the others.

Index Terms: social media; Twitter; forecast; consumer spending.

I. INTRODUCTION

We examine whether the aggregate daily volume of keywords in Twitter can be used to forecast consumer spending. The exploding volume of on-line social media opens the door to new ways to forecast economic variables, and Twitter has become the E. coli model organism of computational social science [1]-[5]. Twitter has been used to forecast daily consumer confidence as reported by Gallup [3] and financial variables like the S&P, DJIA, and VIX [2], [6], [7], as well as movie box office receipts [4] and Amazon book sales [8]. We examine whether Twitter can be used to forecast something much broader and more diffuse: the total aggregation of all consumer spending. Furthermore, whereas traditional economic forecasts of consumer spending are monthly [9], [10], we use daily Twitter and spending data to forecast daily consumer spending values.

We compare models that are linear combinations of three parameters: current consumer spending (S), weekday average spending (D), and current Twitter keyword volume (T). We evaluate how well the models can predict daily spending changes and daily deviations from weekday spending norms. We choose Twitter keywords by looking for words whose Twitter frequencies are highly correlated with consumer spending. We also use historical spending and Twitter data to estimate model coefficients.
We mitigate the well-known problems with estimating model terms and parameters from data [11] in three ways: by providing a clear and compelling motivation for choosing the terms in our models; by separating in-sample and out-of-sample data, generating model terms and coefficients only in-sample and testing model forecast success only out-of-sample; and by emphasizing the statistical significance of our results.

The causal link between sentiment and economic activity can seem obscure and dubious when financial indexes are predicted by sentiment analysis applied to Twitter data [3], [6], [7], [12]. A plausible causal connection between consumer spending and Twitter keyword volume is that tweeters are a reasonable approximation of consumer spenders, and they tend to tweet about whatever they plan to do or are doing. So, if tweeters plan to spend more or are spending more, this will be reflected by an increased volume of tweets about spending. As we explain below, we observed that Twitter volume about spending peaks three days before peaks in consumer spending as reported by Gallup. This might enable models based on today's Twitter keyword volume to predict the spending levels that Gallup will report in three days.

II. CONSUMER SPENDING AND TWITTER DATA

We test each model's ability to predict daily consumer spending changes and deviations from weekday norms, derived from daily consumer spending values reported by Gallup. Gallup describes this data as the average dollar amount Americans report spending or charging on a daily basis, not counting the purchase of a home, motor vehicle, or normal household bills. Fig. 1 shows consumer spending ($S_t$) reported by Gallup, daily changes in spending ($\Delta S_t$), weekday statistical spending norms ($D_t$), and daily deviations from weekday norms ($S_t - D_t$) for November 2011 to March 2012. This is the in-sample data used in all experiments reported here. Fig. 2 is a blow-up of the data just from November.

The Gallup data is reported as a three-day moving average of consumer spending. To obtain the daily consumer spending time series $S_t$ used in this study, we had to reconstruct daily spending values. To do so, we identified a 4-day interval in the month preceding the first day of the sample which exhibited almost zero variation; assuming that two of the four were the true spending values for those days, we were able to infer the third value used to construct the moving average. We then decomposed the moving-average series into its component daily parts. We handle missing values as Gallup does, by dropping those days from the data set.

All of the data used in this study falls within the period 1 November 2011 through 30 June 2012. All experiments segregate data into non-overlapping in-sample and out-of-sample segments. Only in-sample data influenced Twitter keyword choice and model coefficient estimation, and only out-of-sample data tested model spending forecasts.

Fig. 3 is a scatter plot of daily consumer spending on each day of the week. One can see a slight but evident difference between the scatter plots on the top and bottom. The top shows the first five months of data (November to March) and the bottom shows all eight months of data (November to June). The similarity between top and bottom shows that five months of data provides a good but imperfect approximation of the typical pattern in spending across the days of the week. There is a noticeable pattern in mean spending across the week, consisting of four days of low spending, followed by a day of average spending, followed by two days of high spending (starting on the second day depicted in the figure).

We use the weekday spending pattern ($D_t$) in a number of ways. First, we ask how well weekday spending norms alone predict consumer spending (Model D), and we ask how much predictions improve with the addition of terms for recent spending history (Model DS) and Twitter keyword volume (Model DST). Second, we examine deviations from weekday spending norms ($S_t - D_t$), and evaluate how well models of recent spending history (Model DS) and Twitter keyword volume (Model DST) predict deviations from this norm ($D_t$). Third, the highest spending value on the chart (which is Black Friday) is reported by Gallup on Sunday. This indicates that Gallup's daily spending values report spending that actually occurred two days before. So, forecasting what Gallup reports three days in advance is equivalent to forecasting one day before money changes hands and spending actually occurs. Finally, we use $D_t$ to produce our null model H, which simply samples the data points scattered in Fig. 3. This string of spending forecasts is guaranteed to share much of the statistical quality of the actual spending values.

Fig. 1. Actual consumer spending ($S_t$), daily change in consumer spending ($\Delta S_t = S_t - S_{t-1}$), average weekday spending ($D_t$), and deviation from weekday norms ($S_t - D_t$) for November 2011 to April 2012. Twitter keywords and model coefficients for our experiments are based on the in-sample data in this figure.

Fig. 2. A blow-up of November from Fig. 1. Black Friday is the big spike in spending near the end of November. Note the gaps in the spending record (and, hence, gaps in the records of spending change and weekday deviation), due to missing data from Gallup on those days.
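The decomposition of Gallup's three-day moving average into daily values, described earlier in this section, can be illustrated with a short sketch. The paper does not give the exact formula, so the code assumes a trailing window in which each reported value is the mean of the current day and the two preceding days, and it assumes the two seed values are already known (the flat interval described above). The function name `reconstruct_daily` and the example numbers are hypothetical.

```python
def reconstruct_daily(moving_avg, seed0, seed1):
    """Invert a trailing three-day moving average (illustrative assumption).

    moving_avg[i] is assumed to equal (s[i] + s[i+1] + s[i+2]) / 3, where s is
    the unknown daily series. seed0 and seed1 are the first two daily values,
    assumed known (e.g. taken from a stretch with almost no variation).
    """
    daily = [seed0, seed1]
    for m in moving_avg:
        # Three times the average is the window sum; subtracting the two
        # previous days leaves the newest day in the window.
        daily.append(3.0 * m - daily[-1] - daily[-2])
    return daily


# Hypothetical daily values, used only to check that the inversion round-trips.
true_daily = [70.0, 70.0, 70.0, 64.0, 90.0, 110.0, 58.0]
averaged = [sum(true_daily[i:i + 3]) / 3 for i in range(len(true_daily) - 2)]
recovered = reconstruct_daily(averaged, true_daily[0], true_daily[1])
```

Because each recovered value depends on the two before it, any error in the seeds propagates through the whole series, which is presumably why a stretch with almost zero variation was chosen as the starting point.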
Our social media data source is Twitter, and our raw Twitter data is the daily word frequencies in a Twitter spritzer (which represents 1% of the total Twitter volume). Twitter word frequencies (T) appear as a term in some of the models we study here. We chose the words that compose T by examining Twitter feed data (tweets) during periods of high consumption. An initial set of words was found by ranking all the words used in tweets occurring around the Black Friday holiday sales by their frequency and selecting those that signaled general consumption behavior (e.g., "buy," "shopping," and "store"). This list of words was then reduced by computing the correlation between the lagged Twitter word frequency time series and the consumer spending index and keeping the words exhibiting high correlations. Word frequencies were lagged by three or two days before being correlated with spending, so a high correlation could be used to forecast future consumer spending. To maximize the statistical power of our words, we correlated consumer spending with the lagged frequency of thousands of randomly sampled words in Twitter and kept only those words with correlations in the highest 2.% tail of the distribution of correlations. Table I lists the resulting words, their lagged correlations with consumer spending, and their lag length.

TABLE I. WORD FREQUENCIES CORRELATED WITH CONSUMER SPENDING.
Word | Correlation | Lag | Word | Correlation | Lag
shopping | | days | fun | | days
store | .39 | 3 days | clubbing | | days
wal mart | | days | couch | | days
going shopping | | days | bar | .34 | 2 days
shop | | days | beer | | days
buy | | days | bought | | days

We also computed the correlation of every possible pair of words in the sample to find highly related words. A subset of words that are mutually highly correlated with one another forms a cluster. We found two distinct clusters of related words: the first was defined by high correlation with spending three days in advance, and the other by high correlation two days in advance. Fig. 4 displays a heat map revealing the two distinct clusters in the sample words. The top-right cluster consists of words with three-day lags, and the bottom-left cluster consists of words with two-day lags. Our hypothesis is that the two clusters represent two different signals that can be used to forecast consumer spending. Proper definition of the clusters will robustly capture the consumer spending signal and reduce its measurement noise.

Fig. 3. Scatter plot of Gallup's daily consumer spending values, $S_t$, for each day of the week (Monday through Sunday). The large blue dots show the mean spending on each day ($D_t$). The red line shows the mean spending over the entire week. Above: Data from only the five months November to March. Below: Data from the eight months November to June.

III. MODELS OF CONSUMER SPENDING

We compare the forecasting ability of a series of models of consumer spending. The models are driven by different sources of data. We use these models to predict daily changes and deviations from weekday norms in consumer spending three days in advance. Then we gauge a model's relative forecasting success by comparing correlations and percentage match of direction between model forecasts and actual consumer spending.

A. Model parameters

The models are driven by different combinations of the following data: current consumer spending, spending averaged over each day of the week, and current Twitter keyword frequencies:
$S_t$: the consumer spending as reported by Gallup at $t$,
$D_t$: the weekday average spending norm at $t$,
$T_t$: the Twitter keyword volume at $t$.
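The keyword screening step from Section II (correlating lagged word frequencies with spending and keeping the strongest candidates) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `pearson`, `lagged_correlation`, and `screen_keywords` are hypothetical, and the fixed cutoff `threshold=0.3` is an assumed stand-in for the paper's rule of keeping words in the upper tail of the correlation distribution of randomly sampled words.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(word_freq, spending, lag):
    """Correlate word_freq[t] with spending[t + lag] for a positive lag in days."""
    if lag <= 0 or lag >= len(spending):
        raise ValueError("lag must be between 1 and len(spending) - 1")
    return pearson(word_freq[:-lag], spending[lag:])


def screen_keywords(candidates, spending, lags=(2, 3), threshold=0.3):
    """Return {word: (best_lag, correlation)} for words that pass the cutoff.

    threshold is an assumed fixed cutoff, used here only for illustration.
    """
    selected = {}
    for word, freq in candidates.items():
        best_lag = max(lags, key=lambda k: lagged_correlation(freq, spending, k))
        r = lagged_correlation(freq, spending, best_lag)
        if r >= threshold:
            selected[word] = (best_lag, r)
    return selected


# Hypothetical data: "shopping" leads spending by exactly three days.
spending = [10, 12, 30, 11, 9, 28, 10, 13, 31, 12, 10, 29]
candidates = {
    "shopping": spending[3:] + [0, 0, 0],
    "flat": [1.0] * len(spending),
}
selected = screen_keywords(candidates, spending)
```

Only in-sample days should be passed to such a screen, so that keyword choice never sees the out-of-sample period.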

Fig. 4. Heat map depicting the magnitudes of the correlations between lagged word frequencies and consumer spending. Each box indicates the absolute correlation between one word pair, and brighter colors represent larger correlations. Words are clustered along the axes so that closely related words are adjacent. Two distinct clusters emerge: the large positive correlations among words with 3-day and 2-day lags. Axis labels: couch, clubbing, bar, beer, fun, bought, wal.mart, buy, shop, store, walmart, going.shopping, shopping.

Our raw Twitter data is parsed into a two-dimensional Twitter volume matrix, $W$, where $W_{tw}$ refers to the volume (number of occurrences or tokens) of word type $w$ at time $t$. We let $|t|$ be the total number of different times $t$ (rows), and $|w|$ be the total number of word types $w$ (columns) in $W$. Summing each column of $W$ over all times gives the total Twitter volume of word type $w$, $V_w$:

$$V_w = \sum_{t=1}^{|t|} W_{tw}. \tag{1}$$

Summing each row of $W$ over all word types gives the total Twitter volume at time $t$, $V_t$:

$$V_t = \sum_{w=1}^{|w|} W_{tw}. \tag{2}$$

Finally, the total absolute Twitter volume is:

$$V = \sum_{t=1}^{|t|} \sum_{w=1}^{|w|} W_{tw}. \tag{3}$$

These three volumes can then be used to estimate relative Twitter word frequencies. The probability that a randomly sampled word token in the entire Twitter volume is of type $w$ can be estimated to be $V_w / V$, and the probability that the token occurred at time $t$ can be estimated to be $V_t / V$.

We choose our set $K$ of keywords to be those candidate words that most highly correlate with spending and maximize our models' predictive power on in-sample training data. Some of our keywords are bi-grams, i.e., sequences of two words, like "wal mart" or "going shopping." We treat bi-grams as single words, and normalize bi-grams by dividing by the total number of bi-grams.
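As a small worked example of Eqs. (1) to (3), the sketch below computes the three volumes from a toy matrix stored as a list of rows (one row per day, one column per word type). The function name `volumes` and the counts are hypothetical and serve only to make the sums concrete.

```python
def volumes(W):
    """Return (V_w, V_t, V) for a volume matrix W with W[t][w] token counts."""
    V_t = [sum(row) for row in W]           # Eq. (2): total volume on each day
    V_w = [sum(col) for col in zip(*W)]     # Eq. (1): total volume of each word
    V = sum(V_t)                            # Eq. (3): total volume overall
    return V_w, V_t, V


# Toy matrix: 3 days (rows) by 3 word types (columns).
W = [
    [2, 1, 1],
    [4, 0, 4],
    [2, 3, 3],
]
V_w, V_t, V = volumes(W)

# Estimated probability that a random token is of each word type, V_w / V.
word_share = [v / V for v in V_w]
# Estimated probability that a random token occurred on each day, V_t / V.
day_share = [v / V for v in V_t]
```

For this toy matrix the word volumes are 8, 4, and 8, the daily volumes are 4, 8, and 8, and the total is 20, so the first word type accounts for an estimated 40% of all tokens.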
We define the Twitter index, $T_t^w$, for each word type $w \in K$ to be:

$$T_t^w = \frac{W_{tw}}{V_t V_w}, \tag{4}$$

which gives the volume of $w$ at $t$ compared to (divided by) the total volume at $t$ and the total volume of $w$. A word's Twitter index indicates the degree to which the word's volume is higher or lower than expected from the background volumes at $t$ relative to other times and of $w$ relative to other words. The total Twitter index we ultimately used in our models of consumer spending, $T_t$, is simply the mean of the Twitter indices of all keywords $w$ in $K$:

$$T_t = \frac{1}{|K|} \sum_{w \in K} T_t^w. \tag{5}$$

The influence of each keyword on the index is weighted by its relative importance, undistorted by fluctuations in the number of keywords.

B. Predicting consumer spending

We compare models on their ability to forecast consumer spending. The coefficients in the models ($\alpha, \beta, \gamma, \delta$) are estimated by an autoregressive distributed lag model from the history of consumer spending in the in-sample data, and the coefficients are estimated separately for each model. Here we study predictions that are three days into the future, but our methods generalize to predictions of other ranges.

The models we study here are constructed by regressing consumer spending history on weekday spending norms and current spending levels. The models explicitly forecast consumer spending, from which we calculate forecasts of daily changes in spending and deviations from weekday spending norms. Models built by regressing on daily spending changes or deviations from weekday spending norms produced results akin to the models studied here.

Model H predicts consumer spending by randomly resampling the history of actual spending values (from the training data). The model uses the actual historical sequence of spending values ($S_t$) in the entire training set, and then forecasts successive values by sampling with replacement from this history. If we let spending.history refer to the distribution of points scattered in Fig. 3, then Model H generates $n$ successive predictions of consumer spending by sampling $n$ times with replacement from spending.history, as in the following R command:

$$S^{H}_{t+3} = \texttt{sample(spending.history, n, replace = T)}. \tag{6}$$
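As a concrete reading of the Twitter index in Eqs. (4) and (5), the sketch below computes $T_t$ for every day from a toy volume matrix. It is written in Python rather than the R of Eq. (6), purely for illustration. The function name `twitter_index`, the toy counts, and the choice of keyword columns are hypothetical, and the code assumes every day and every keyword has nonzero total volume.

```python
def twitter_index(W, keyword_cols):
    """Mean keyword index per day, following Eqs. (4) and (5).

    W[t][w] is the token count of word type w on day t. keyword_cols lists the
    column positions of the keyword set K. Assumes no zero row or column sums.
    """
    V_t = [sum(row) for row in W]
    V_w = [sum(col) for col in zip(*W)]
    index = []
    for t, row in enumerate(W):
        # Eq. (4): count normalized by that day's volume and that word's volume.
        per_word = [row[w] / (V_t[t] * V_w[w]) for w in keyword_cols]
        # Eq. (5): plain mean over the keywords in K.
        index.append(sum(per_word) / len(per_word))
    return index


# Toy matrix: 3 days by 3 word types; columns 0 and 2 play the role of K.
W = [
    [2, 1, 1],
    [4, 0, 4],
    [2, 3, 3],
]
T = twitter_index(W, keyword_cols=[0, 2])
```

In this example the middle day has the highest index because both keywords take up a larger share of that day's tokens than they do on the other days.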

Model H guarantees that the scatter of future spending values will resemble the scatter seen in Fig. 3. We assess a model's forecasting ability by asking how much better it predicts spending than Model H.

Model D assumes that the consumer spending level matches the statistical average value for the day of the week, $D_t$. The model calculates the average spending on each day of the week in the in-sample data, and then forecasts that consumer spending every day out-of-sample will be equal to that weekday's average spending. The forecast for consumer spending three days in the future is given by an equation of the form:

$$S^{D}_{t+3} = \alpha + \beta D_{t+3} + \epsilon. \tag{7}$$

Note that $D_t$ is known for all times $t$ covered in our experiments.

Model DS builds on Model D and assumes that spending is determined by the combination of recent history of spending and average weekday spending. So, the spending forecast for three days in the future of Model DS is:

$$S^{DS}_{t+3} = \alpha + \beta D_{t+3} + \gamma S_t + \epsilon. \tag{8}$$

Model ST forecasts spending using only current spending and Twitter volume:

$$S^{ST}_{t+3} = \alpha + \gamma S_t + \delta T_t + \epsilon. \tag{9}$$

Model DT forecasts spending using only weekday norms and current Twitter volume:

$$S^{DT}_{t+3} = \alpha + \beta D_{t+3} + \delta T_t + \epsilon. \tag{10}$$

Model DST builds on Model DS by forecasting spending from today's Twitter index in addition to today's spending and weekday norms:

$$S^{DST}_{t+3} = \alpha + \beta D_{t+3} + \gamma S_t + \delta T_t + \epsilon. \tag{11}$$

Fig. 5 shows April to June consumer spending daily changes (top) and deviations from weekday norms (bottom), and the predictions of Models D, DS and DST. (Model D predicts no deviation from weekday norms, of course.) Note the comparable predictions of Models DS and DST.

C. Measuring model forecast success

We ask how well a model forecasts daily changes and deviations from weekday norms in consumer spending. We choose the Twitter keywords and estimate the coefficients ($\alpha, \beta, \ldots$) by training models on in-sample data about consumer spending history and Twitter keyword frequencies. We then have our models predict consumer spending three days in advance of time $t$, $S_{t+3}$, using only information available at $t$, such as $S_t$ and $T_t$. Each model predicts consumer spending on each day in the out-of-sample consumer spending data. The model's three-day forecast is then compared with the actual consumer spending reported by Gallup to see how well predicted and actual spending match.

Economic data like consumer spending is known to be autocorrelated, so we measure how well models forecast two detrended data streams calculated from consumer spending: the daily spending change ($\Delta S_t = S_t - S_{t-1}$) and the daily deviation from weekday norms ($S_t - D_t$). We use two measures of the success of a model $M$ in predicting each data stream: the correlation between actual and predicted values, $\mathrm{Cor}(S_t, S_t^M)$, and the percentage of pairs with the same sign (i.e., pairs of values that move up or down together). These four measures of model forecast success constitute a model's success profile.

Fig. 5. Above: Actual consumer spending changes ($\Delta S$) and predictions by models D, DS, DT, ST, and DST for out-of-sample data April to July 2012. Below: The same for deviations from weekday norms ($S_t - D_t$). Note that Model D predicts no deviations from weekday norms.

IV. RESULTS OF PREDICTING CONSUMER SPENDING

We calculate success profiles for each model on the two tasks of forecasting daily spending changes ($\Delta S_t$) and daily deviations from weekday spending norms ($S_t - D_t$). We first look for ability to predict spending significantly better than Model H. We test whether a model's correlation scores are significantly better than zero (Model H) by comparing measured values with the standard error ($1/\sqrt{n}$). To test whether a model is improved with the addition of one or more further terms, we check whether F statistics for the pair of models exceed the 5% significance threshold.$^1$ We do not quantify the statistical success of percentage match scores.

Table II shows the success profiles for model forecasts three days into the future. Model terms and coefficients were based on in-sample data from the months November to March, and model predictions were evaluated on out-of-sample data from the months April to June. D(1) collects norms over only the five in-sample months from November to March. Models DS, DT, and DST, and weekday norm deviations were based on D(1). Success results for Model H are reported with standard error bounds from resampling 1 times. The standard error of the other correlation measurements ($n = 88$) is close to 0.1. Note that the percentage of ups and downs in $\Delta S_t$ and $S_t - D_t$ is not exactly 50%.

As expected, Model H displays zero ability to forecast future spending. This makes Model H perfect as a "no success" baseline against which to measure the profiles of better models. Also note that Model D fails to forecast major deviations from weekday norms. The most significant positive finding from the table is that all of the models are significantly better than Model H at predicting spending.
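The success profile defined in Section III-C can be sketched in a few lines. This is an illustrative reading rather than the authors' implementation: the names `pearson`, `same_sign_share`, `success_profile`, and `standard_error` are hypothetical, the forecast change is assumed to be the difference between successive forecasts, and pairs in which either value is exactly zero are counted as mismatches.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def same_sign_share(actual, forecast):
    """Fraction of pairs with the same sign (zeros count as mismatches)."""
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return hits / len(actual)


def success_profile(spending, forecast, norms):
    """Four success measures for forecasts aligned day by day with spending.

    spending[t], forecast[t], and norms[t] all refer to the same day t.
    """
    n = len(spending)
    change_actual = [spending[t] - spending[t - 1] for t in range(1, n)]
    # Assumption: the forecast change is the difference of successive forecasts.
    change_forecast = [forecast[t] - forecast[t - 1] for t in range(1, n)]
    dev_actual = [s - d for s, d in zip(spending, norms)]
    dev_forecast = [f - d for f, d in zip(forecast, norms)]
    return {
        "cor_change": pearson(change_actual, change_forecast),
        "pct_change": same_sign_share(change_actual, change_forecast),
        "cor_deviation": pearson(dev_actual, dev_forecast),
        "pct_deviation": same_sign_share(dev_actual, dev_forecast),
    }


def standard_error(n):
    """Approximate standard error of a correlation near zero, 1 / sqrt(n)."""
    return 1.0 / math.sqrt(n)


# Hypothetical five-day example with a flat weekday norm of 65.
profile = success_profile(
    spending=[60, 70, 65, 90, 80],
    forecast=[62, 68, 66, 85, 83],
    norms=[65, 65, 65, 65, 65],
)
se_88 = standard_error(88)
```

With $n = 88$ the standard error evaluates to roughly 0.107, which matches the "close to 0.1" figure quoted for Table II.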
Since the standard error for the correlation results in Table II is about 0.1, the correlation profiles provide solid evidence that models of Twitter keyword volume in combination with either current spending or weekday norms have significant success at predicting consumer spending. The most significant negative finding from the table is that none of the models demonstrates a significantly better forecasting ability than any other model. The F statistics for nested model comparisons range from .8 to 1.19, and none is above the 5% significance threshold. In sum, Models D, DS, and DST all significantly predict spending changes, but the differences in their success profiles are statistically too weak to reject the null hypothesis of no difference in model success.

We completed other tests as well by applying the same basic methodology to variations of the in-sample and out-of-sample periods. For example, because November is known to include significant deviations from weekday norms (most notably Black Friday), we applied the above methods to forecast November spending. However, the results were consistent with those presented above.

TABLE II. SUCCESS PROFILES OF MODEL FORECASTS OF APRIL-JUNE.
Forecasting April - June (n = 88) from November - March (n = 13)
Model | Cor $\Delta S_t$ | % $\Delta S_t$ | Cor $S_t - D_t$ | % $S_t - D_t$
H (x1) | . ± | % ± 6% | .9 ± .9 | 49% ± %
D(1) | .14 | 42% | NA | NA
DS | .39 | 44% | .4 | %
DT | .13 | 44% | .3 | 3%
ST | .3 | 6% | .29 | 8%
DST | .38 | 46% | .42 | 6%

$^1$ We compute F-statistics for out-of-sample forecasts by noting that the ratio of the residual sum of squares for the reduced and full models should have approximately an $F(n, n)$ distribution, where $n$ is the number of out-of-sample observations.

V. CONCLUSIONS

Earlier work demonstrated strong correlations between Twitter and economic variables in narrowly focused economic contexts [4], [8], but it is much more difficult to predict aggregate consumer spending on a daily resolution.
To identify whether social media information significantly improves model forecasts, we compared forecasts of models based on social media data and models based on only recent spending history. Our results verified the significant three-day forecasting power of models based on Twitter volume and current spending (Model ST) and models based on Twitter volume and weekday norms (Model DT). But similar forecasting ability was also demonstrated by models based on only weekday norms and current spending (Model DS) or on weekday norms alone (Model D). The statistical resolution provided by our data detected no significant difference between models that do or do not depend on Twitter volume. So, Twitter keyword volume helps predict consumer spending, but not demonstrably better than current spending and weekday spending norms alone.

Future work could adjust certain aspects of the methodology presented above. One possibility is to develop the n-gram selection process further. To this end, it could be beneficial to include keyword co-occurrences (i.e., n-grams that are often contained in the same tweet as the objective keyword) in the models as well.

The difficulty of determining whether Twitter data improves model forecasting ability in the present work is partly due to our small sample size (the small number of days in the in-sample data). We are cautiously optimistic that this limitation can be surmounted with the accumulation and analysis of "big data" in the social sciences [13]-[15].

ACKNOWLEDGMENT

Thanks for helpful advice to Albyn Jones, Noah Pepper, and Norman Packard, and thanks for support to a Ruby grant from Reed College.

REFERENCES

[1] J. Bollen, H. Mao, and S. Counts, "Computational economic and finance gauges: Polls, search, and twitter," Meeting of the National Bureau of Economic Research - Behavioral Finance Meeting, Stanford, CA, November

[2] X. Zhang, H. Fuehres, and P. Gloor, "Predicting the stock market through twitter: i hope it is not as bad as i fear," Collaborative Innovation Networks (COINs), Savannah, GA, 2010.
[3] B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, "From tweets to polls: Linking text sentiment to public opinion time series," in Proceedings of the International AAAI Conference on Weblogs and Social Media, 2010.
[4] S. Asur and B. A. Huberman, "Predicting the future with social media," arXiv:1003.5699v1, March 2010.
[5] J. Weng, E.-P. Lim, Q. He, and C. W.-K. Leung, "What do people want in microblogs? Measuring interestingness of hashtags in twitter," in 2010 IEEE 10th International Conference on Data Mining, December 2010.
[6] J. Bollen, H. Mao, and X. Zeng, "Twitter mood predicts the stock market," Journal of Computational Science, vol. 2, no. 1, pp. 1-8, March 2011.
[7] E. Gilbert and K. Karahalios, "Widespread worry and the stock market," in Fourth International AAAI Conference on Weblogs and Social Media. Washington, DC: E. Gilbert and K. Karahalios, 2010.
[8] D. Gruhl, R. Guha, R. Kumar, J. Novak, and A. Tomkins, "The predictive power of online chatter," in Proceeding of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. New York, NY: ACM Press, 2005.
[9] J. Slacalek, "Forecasting consumption," working paper, German Institute for Economic Research, 2004.
[10] J. A. Wilcox, "Forecasting components of consumption with components of consumer sentiment," Business Economics, vol. 42, no. 4, 2007.
[11] M. C. Lovell, "Data mining," The Review of Economics and Statistics, vol. 65, no. 1, pp. 1-12.
[12] A. Pak and P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion finding," in Proceedings of the Seventh conference on international Language Resources Association (ELRA), Valletta, Malta, May 2010.
[13] D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabasi, D. Brewer, N. Christakis, N. Contractor, J. Fowler, M. Gutmann, T. Jebara, G. King, M. Macy, D. Roy, and M. V. Alstyne, "Computational social science," Science, vol. 323, no. 5915, 2009.
[14] J. Giles, "Making the links," Nature, vol. 488, 2012.
[15] D. J. Watts, "A twenty-first century science," Nature, vol. 445, p. 489.

A Description of Consumer Activity in Twitter

A Description of Consumer Activity in Twitter Justin Stewart A Description of Consumer Activity in Twitter At least for the astute economist, the introduction of techniques from computational science into economics has and is continuing to change

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS Huina Mao School of Informatics and Computing Indiana University, Bloomington, USA ECB Workshop on Using Big Data for Forecasting

More information

Predicting Stock Market Indicators Through Twitter I hope it is not as bad as I fear

Predicting Stock Market Indicators Through Twitter I hope it is not as bad as I fear Available online at www.sciencedirect.com Procedia Social and Behavioral Sciences Procedia - Social and Behavioral Sciences 00 (2009) 000 000 www.elsevier.com/locate/procedia COINs2010 Predicting Stock

More information

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

Data driven approach in analyzing energy consumption data in buildings. Office of Environmental Sustainability Ian Tan

Data driven approach in analyzing energy consumption data in buildings. Office of Environmental Sustainability Ian Tan Data driven approach in analyzing energy consumption data in buildings Office of Environmental Sustainability Ian Tan Background Real time energy consumption data of buildings in terms of electricity (kwh)

More information

Tweets Miner for Stock Market Analysis

Tweets Miner for Stock Market Analysis Tweets Miner for Stock Market Analysis Bohdan Pavlyshenko Electronics department, Ivan Franko Lviv National University,Ukraine, Drahomanov Str. 50, Lviv, 79005, Ukraine, e-mail: b.pavlyshenko@gmail.com

More information

IMPACT OF SOCIAL MEDIA ON THE STOCK MARKET: EVIDENCE FROM TWEETS

IMPACT OF SOCIAL MEDIA ON THE STOCK MARKET: EVIDENCE FROM TWEETS IMPACT OF SOCIAL MEDIA ON THE STOCK MARKET: EVIDENCE FROM TWEETS Vojtěch Fiala 1, Svatopluk Kapounek 1, Ondřej Veselý 1 1 Mendel University in Brno Volume 1 Issue 1 ISSN 2336-6494 www.ejobsat.com ABSTRACT

More information

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS Stacey Franklin Jones, D.Sc. ProTech Global Solutions Annapolis, MD Abstract The use of Social Media as a resource to characterize

Predicting Stock Market Fluctuations from Twitter

Predicting Stock Market Fluctuations from Twitter: An analysis of the predictive powers of real-time social media. Sang Chung & Sandy Liu, Stat 157, Professor Aldous, Dec 12, 2011. 1. Introduction

Industry Environment and Concepts for Forecasting 1

Industry Environment and Concepts for Forecasting 1 Table of Contents Industry Environment and Concepts for Forecasting 1 Forecasting Methods Overview...2 Multilevel Forecasting...3 Demand Forecasting...4 Integrating Information...5 Simplifying the Forecast...6

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market

Using INZight for Time series analysis. A step-by-step guide.

Using INZight for Time series analysis. A step-by-step guide. Using INZight for Time series analysis. A step-by-step guide. inzight can be downloaded from http://www.stat.auckland.ac.nz/~wild/inzight/index.html Step 1 Click on START_iNZightVIT.bat. Step 2 Click on

Module 6: Introduction to Time Series Forecasting

Module 6: Introduction to Time Series Forecasting Using Statistical Data to Make Decisions Module 6: Introduction to Time Series Forecasting Titus Awokuse and Tom Ilvento, University of Delaware, College of Agriculture and Natural Resources, Food and

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

Sensex Realized Volatility Index

Sensex Realized Volatility Index Sensex Realized Volatility Index Introduction: Volatility modelling has traditionally relied on complex econometric procedures in order to accommodate the inherent latent character of volatility. Realized

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques Chris MacLellan cjmaclel@asu.edu May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three

Define calendars. Pre specified holidays

Define calendars. Pre specified holidays PRACTICAL GUIDE TO SEASONAL ADJUSTMENT WITH DEMETRA+ Define calendars Demetra+ provides an easy tool for creating calendar regression variables using the Calendar module. The calendar regression variable

Italian Journal of Accounting and Economia Aziendale. International Area. Year CXIV - 2014 - n. 1, 2 e 3

Italian Journal of Accounting and Economia Aziendale. International Area. Year CXIV - 2014 - n. 1, 2 e 3 Italian Journal of Accounting and Economia Aziendale International Area Year CXIV - 2014 - n. 1, 2 e 3 Could we make better prediction of stock market indicators through Twitter sentiment analysis? ALEXANDER

Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression. Definition: A regression model is a mathematical equation that describes the

CALL VOLUME FORECASTING FOR SERVICE DESKS

CALL VOLUME FORECASTING FOR SERVICE DESKS CALL VOLUME FORECASTING FOR SERVICE DESKS Krishna Murthy Dasari Satyam Computer Services Ltd. This paper discusses the practical role of forecasting for Service Desk call volumes. Although there are many

Predicting IMDB Movie Ratings Using Social Media

Predicting IMDB Movie Ratings Using Social Media Predicting IMDB Movie Ratings Using Social Media Andrei Oghina, Mathias Breuss, Manos Tsagkias, and Maarten de Rijke ISLA, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

It has often been said that stock

It has often been said that stock Twitter Mood as a Stock Market Predictor Johan Bollen and Huina Mao Indiana University Bloomington Behavioral finance researchers can apply computational methods to large-scale social media data to better

Some Quantitative Issues in Pairs Trading

Some Quantitative Issues in Pairs Trading Research Journal of Applied Sciences, Engineering and Technology 5(6): 2264-2269, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2013 Submitted: October 30, 2012 Accepted: December

Analysis of Tweets for Prediction of Indian Stock Markets

Analysis of Tweets for Prediction of Indian Stock Markets Analysis of Tweets for Prediction of Indian Stock Markets Phillip Tichaona Sumbureru Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad-500 085,

Using Tweets to Predict the Stock Market

Using Tweets to Predict the Stock Market 1. Abstract Using Tweets to Predict the Stock Market Zhiang Hu, Jian Jiao, Jialu Zhu In this project we would like to find the relationship between tweets of one important Twitter user and the corresponding

Energy Savings from Business Energy Feedback

Energy Savings from Business Energy Feedback Energy Savings from Business Energy Feedback Behavior, Energy, and Climate Change Conference 2015 October 21, 2015 Jim Stewart, Ph.D. INTRODUCTION 2 Study Background Xcel Energy runs the Business Energy

Threshold Autoregressive Models in Finance: A Comparative Approach

Threshold Autoregressive Models in Finance: A Comparative Approach University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Informatics 2011 Threshold Autoregressive Models in Finance: A Comparative

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch The Viability of StockTwits and Google Trends to Predict the Stock Market By Chris Loughlin and Erik Harnisch Spring 2013 Introduction Investors are always looking to gain an edge on the rest of the market.

Probabilistic Forecasting of Medium-Term Electricity Demand: A Comparison of Time Series Models

Probabilistic Forecasting of Medium-Term Electricity Demand: A Comparison of Time Series Models Fakultät IV Department Mathematik Probabilistic of Medium-Term Electricity Demand: A Comparison of Time Series Kevin Berk and Alfred Müller SPA 2015, Oxford July 2015 Load forecasting Probabilistic forecasting

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling 1 Forecasting Women s Apparel Sales Using Mathematical Modeling Celia Frank* 1, Balaji Vemulapalli 1, Les M. Sztandera 2, Amar Raheja 3 1 School of Textiles and Materials Technology 2 Computer Information

Forecasting in supply chains

Forecasting in supply chains 1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the

FORECASTING. Operations Management

FORECASTING. Operations Management 2013 FORECASTING Brad Fink CIT 492 Operations Management Executive Summary Woodlawn hospital needs to forecast type A blood so there is no shortage for the week of 12 October, to correctly forecast, a

MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal

MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims

Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I

Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting

JetBlue Airways Stock Price Analysis and Prediction

JetBlue Airways Stock Price Analysis and Prediction JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue

Methodology For Illinois Electric Customers and Sales Forecasts: 2016-2025

Methodology For Illinois Electric Customers and Sales Forecasts: 2016-2025 Methodology For Illinois Electric Customers and Sales Forecasts: 2016-2025 In December 2014, an electric rate case was finalized in MEC s Illinois service territory. As a result of the implementation of

Machine Learning in Statistical Arbitrage

Machine Learning in Statistical Arbitrage Machine Learning in Statistical Arbitrage Xing Fu, Avinash Patra December 11, 2009 Abstract We apply machine learning methods to obtain an index arbitrage strategy. In particular, we employ linear regression

USING TWITTER TO PREDICT SALES: A CASE STUDY

USING TWITTER TO PREDICT SALES: A CASE STUDY USING TWITTER TO PREDICT SALES: A CASE STUDY Remco Dijkman 1, Panagiotis Ipeirotis 2, Freek Aertsen 3, Roy van Helden 4 1 Eindhoven University of Technology, Eindhoven, The Netherlands r.m.dijkman@tue.nl

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

Twitter mood predicts the stock market.

Twitter mood predicts the stock market. Johan Bollen, Huina Mao, Xiao-Jun Zeng (authors made equal contributions). arXiv:1010.3003v1 [cs.CE] 14 Oct 2010. Abstract Behavioral economics tells us that emotions

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.

A STOCHASTIC DAILY MEAN TEMPERATURE MODEL FOR WEATHER DERIVATIVES

A STOCHASTIC DAILY MEAN TEMPERATURE MODEL FOR WEATHER DERIVATIVES A STOCHASTIC DAILY MEAN TEMPERATURE MODEL FOR WEATHER DERIVATIVES Jeffrey Viel 1, 2, Thomas Connor 3 1 National Weather Center Research Experiences for Undergraduates Program and 2 Plymouth State University

VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR

VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR Andrew Goldstein Yale University 68 High Street New Haven, CT 06511 andrew.goldstein@yale.edu Alexander Thornton Shawn Kerrigan Locus Energy 657 Mission St.

AT&T Global Network Client for Windows Product Support Matrix January 29, 2015

AT&T Global Network Client for Windows Product Support Matrix January 29, 2015 AT&T Global Network Client for Windows Product Support Matrix January 29, 2015 Product Support Matrix Following is the Product Support Matrix for the AT&T Global Network Client. See the AT&T Global Network

A Regime-Switching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com

A Regime-Switching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com A Regime-Switching Model for Electricity Spot Prices Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com May 31, 25 A Regime-Switching Model for Electricity Spot Prices Abstract Electricity markets

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

Forecasting Methods. What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes?

Forecasting Methods. What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes? Forecasting Methods What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes? Prod - Forecasting Methods Contents. FRAMEWORK OF PLANNING DECISIONS....

Stock Market Forecasting Using Machine Learning Algorithms

Stock Market Forecasting Using Machine Learning Algorithms Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of

How can smart data analytics improve the way of information provision? -The trend of methods and an exploratory analysis-

How can smart data analytics improve the way of information provision? -The trend of methods and an exploratory analysis- How can smart data analytics improve the way of information provision? -The trend of methods and an exploratory analysis- Central Research Institute of Electric Power Industry Hidenori Komatsu Ken-ichiro

Exponential Smoothing with Trend. As we move toward medium-range forecasts, trend becomes more important.

Exponential Smoothing with Trend. As we move toward medium-range forecasts, trend becomes more important. Exponential Smoothing with Trend As we move toward medium-range forecasts, trend becomes more important. Incorporating a trend component into exponentially smoothed forecasts is called double exponential

Forecasting DISCUSSION QUESTIONS

Forecasting DISCUSSION QUESTIONS 4 C H A P T E R Forecasting DISCUSSION QUESTIONS 1. Qualitative models incorporate subjective factors into the forecasting model. Qualitative models are useful when subjective factors are important. When

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams 2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment

Jinadasa Gamage, Professor of Mathematics, Illinois State University, Normal, IL, e- mail: jina@ilstu.edu

Jinadasa Gamage, Professor of Mathematics, Illinois State University, Normal, IL, e- mail: jina@ilstu.edu Submission for ARCH, October 31, 2006 Jinadasa Gamage, Professor of Mathematics, Illinois State University, Normal, IL, e- mail: jina@ilstu.edu Jed L. Linfield, FSA, MAAA, Health Actuary, Kaiser Permanente,

Demand forecasting & Aggregate planning in a Supply chain. Session Speaker Prof.P.S.Satish

Demand forecasting & Aggregate planning in a Supply chain. Session Speaker Prof.P.S.Satish Demand forecasting & Aggregate planning in a Supply chain Session Speaker Prof.P.S.Satish 1 Introduction PEMP-EMM2506 Forecasting provides an estimate of future demand Factors that influence demand and

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

Can Twitter Predict Royal Baby's Name?

Can Twitter Predict Royal Baby's Name? Summary Can Twitter Predict Royal Baby's Name? Bohdan Pavlyshenko Ivan Franko Lviv National University,Ukraine, b.pavlyshenko@gmail.com In this paper, we analyze the existence of possible correlation between

Change-Point Analysis: A Powerful New Tool For Detecting Changes

Change-Point Analysis: A Powerful New Tool For Detecting Changes Change-Point Analysis: A Powerful New Tool For Detecting Changes WAYNE A. TAYLOR Baxter Healthcare Corporation, Round Lake, IL 60073 Change-point analysis is a powerful new tool for determining whether

When someone asks, How s the economy. A New Look at Economic Indexes for the States in the Third District. Theodore M. Crone*

When someone asks, How s the economy. A New Look at Economic Indexes for the States in the Third District. Theodore M. Crone* A New Look at Economic Indexes for the States in the Third District Theodore M. Crone A New Look at Economic Indexes For the States in the Third District *Ted Crone is a vice president in the Research

A Note on Using Calendar Module in Demetra+ (UNPUBLISHED MANUSCRIPT)

A Note on Using Calendar Module in Demetra+ (UNPUBLISHED MANUSCRIPT) A Note on Using Calendar Module in Demetra+ (UNPUBLISHED MANUSCRIPT) N. Alpay KOÇAK 1 Turkish Statistical Institute Ankara, 28 September 2011 1 Expert, alpaykocak@tuik.gov.tr The views expressed are the

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS Carlos Andre Reis Pinheiro 1 and Markus Helfert 2 1 School of Computing, Dublin City University, Dublin, Ireland

LOYOLAN. Los Angeles he 2015-2016. Advertising Guide ADVERTISING THAT REACHES LMU ONLINE 24/7 AT LALOYOLAN.COM ON TWITTER @LALOYOLAN

LOYOLAN. Los Angeles he 2015-2016. Advertising Guide ADVERTISING THAT REACHES LMU ONLINE 24/7 AT LALOYOLAN.COM ON TWITTER @LALOYOLAN Los Angeles he 2015-2016 Advertising Guide ADVERTISING THAT REACHES LMU ONLINE 24/7 AT LA.COM IN PRINT ONCE A WEEK ON TWITTER @LA ON OUR APP FOR MOBILE & TABLETS ON FACEBOOK /LOSANGELES THE IS EVERYWHERE

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas Regression and Time Series Analysis of Petroleum Product Sales in Masters Energy oil and Gas 1 Ezeliora Chukwuemeka Daniel 1 Department of Industrial and Production Engineering, Nnamdi Azikiwe University

Analysis One Code Desc. Transaction Amount. Fiscal Period

Analysis One Code Desc. Transaction Amount. Fiscal Period Analysis One Code Desc Transaction Amount Fiscal Period 57.63 Oct-12 12.13 Oct-12-38.90 Oct-12-773.00 Oct-12-800.00 Oct-12-187.00 Oct-12-82.00 Oct-12-82.00 Oct-12-110.00 Oct-12-1115.25 Oct-12-71.00 Oct-12-41.00

TIME SERIES ANALYSIS

TIME SERIES ANALYSIS TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 lmb@iasri.res.in. Introduction Time series (TS) data refers to observations

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller

Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence Dr. Sulkhan Metreveli Leo Keller The greed https://www.youtube.com/watch?v=r8y6djaeolo The money https://www.youtube.com/watch?v=x_6oogojnaw

8. Time Series and Prediction

8. Time Series and Prediction 8. Time Series and Prediction Definition: A time series is given by a sequence of the values of a variable observed at sequential points in time. e.g. daily maximum temperature, end of day share prices,

We provide the following resources online at http:// compstorylab.org/share/papers/dodds2014a/ and at

We provide the following resources online at http:// compstorylab.org/share/papers/dodds2014a/ and at S1 SUPPLEMENTARY INFORMATION Online, interactive visualizations: We provide the following resources online at http:// compstorylab.org/share/papers/dodds2014a/ and at http://hedonometer.org. Links to example

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships

EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES

EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES Zhenar Shaho Faeq 1,Kayhan Ghafoor 2, Bawar Abdalla 3 and Omar Al-rassam 4 1 Department of Software Engineering, Koya University, Koya,

ENERGY STAR for Data Centers

ENERGY STAR for Data Centers ENERGY STAR for Data Centers Alexandra Sullivan US EPA, ENERGY STAR February 4, 2010 Agenda ENERGY STAR Buildings Overview Energy Performance Ratings Portfolio Manager Data Center Initiative Objective

Social Market Analytics, Inc.

Social Market Analytics, Inc. S-Factors : Definition, Use, and Significance Social Market Analytics, Inc. Harness the Power of Social Media Intelligence January 2014 P a g e 2 Introduction Social Market Analytics, Inc., (SMA) produces

Predicting Asset Value Through Twitter Buzz

Predicting Asset Value Through Twitter Buzz Predicting Asset Value Through Twitter Buzz Xue Zhang a,b, Hauke Fuehres b, Peter A. Gloor b a Department of Mathematic and Systems Science, National University of Defense Technology, Changsha, Hunan,

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják Basic Idea Behind the Statistical Tests Generated random sequences properties as sample drawn from uniform/rectangular

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

Using Twitter as a source of information for stock market prediction

Using Twitter as a source of information for stock market prediction Using Twitter as a source of information for stock market prediction Ramon Xuriguera (rxuriguera@lsi.upc.edu) Joint work with Marta Arias and Argimiro Arratia ERCIM 2011, 17-19 Dec. 2011, University of

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

Data Analysis of Trends in iphone 5 Sales on ebay

Data Analysis of Trends in iphone 5 Sales on ebay Data Analysis of Trends in iphone 5 Sales on ebay By Wenyu Zhang Mentor: Professor David Aldous Contents Pg 1. Introduction 3 2. Data and Analysis 4 2.1 Description of Data 4 2.2 Retrieval of Data 5 2.3

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

The Influence of Sentimental Analysis on Corporate Event Study

The Influence of Sentimental Analysis on Corporate Event Study Volume-4, Issue-4, August-2014, ISSN No.: 2250-0758 International Journal of Engineering and Management Research Available at: www.ijemr.net Page Number: 10-16 The Influence of Sentimental Analysis on

Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality

Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality Anindya Ghose, Panagiotis G. Ipeirotis {aghose, panos}@stern.nyu.edu Department of

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

Is the Basis of the Stock Index Futures Markets Nonlinear?

Is the Basis of the Stock Index Futures Markets Nonlinear? University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Is the Basis of the Stock

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

Twitter Volume Spikes: Analysis and Application in Stock Trading

Twitter Volume Spikes: Analysis and Application in Stock Trading Twitter Volume Spikes: Analysis and Application in Stock Trading Yuexin Mao University of Connecticut yuexin.mao@uconn.edu Wei Wei FinStats.com weiwei@finstats.com Bing Wang University of Connecticut bing@engr.uconn.edu

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

I. Introduction. II. Background. KEY WORDS: Time series forecasting, Structural Models, CPS

I. Introduction. II. Background. KEY WORDS: Time series forecasting, Structural Models, CPS Predicting the National Unemployment Rate that the "Old" CPS Would Have Produced Richard Tiller and Michael Welch, Bureau of Labor Statistics Richard Tiller, Bureau of Labor Statistics, Room 4985, 2 Mass.

Integrated Resource Plan

Integrated Resource Plan Integrated Resource Plan March 19, 2004 PREPARED FOR KAUA I ISLAND UTILITY COOPERATIVE LCG Consulting 4962 El Camino Real, Suite 112 Los Altos, CA 94022 650-962-9670 1 IRP 1 ELECTRIC LOAD FORECASTING 1.1
