Clustering Time Series Based on Forecast Distributions Using Kullback-Leibler Divergence
Taiyeong Lee, Yongqiao Xiao, Xiangxiang Meng, David Duling
SAS Institute, Inc., 100 SAS Campus Dr., Cary, NC 27513, USA
{taiyeong.lee, yongqiao.xiao, xiangxiang.meng,

ABSTRACT

One of the key tasks in time series data mining is to cluster time series. Traditional clustering methods, however, focus on the similarity of time series patterns in past time periods. In many cases, such as retail sales, we would prefer to cluster based on the future forecast values. In this paper, we present an approach to clustering forecasts, or forecast time series patterns, based on the Kullback-Leibler divergences among the forecast densities. We use the same normality assumption for the error terms that is used in the calculation of forecast confidence intervals from the forecast model, so the method requires no additional computation to obtain the forecast densities for the Kullback-Leibler divergences. This makes our approach suitable for mining very large sets of time series. A simulation study and two real data sets are used to evaluate and illustrate the method. We show that using the Kullback-Leibler divergence results in better clustering when there is a degree of uncertainty in the forecasts.

Keywords

Time Series Clustering, Time Series Forecasting, Kullback-Leibler Divergence, Euclidean Distance

1. INTRODUCTION

Time series clustering has been used in many data mining areas such as retail, energy, weather, quality control charts, stock/financial data, and sequence/time series data generated by medical devices [3, 12, 14]. Typically, the observed data is used directly or indirectly as the source for time series clustering. For example, we can cluster the CO2 emission patterns of countries based on their historical data, or based on features extracted from the historical data. Numerous similarity/dissimilarity/distance/divergence measures have been proposed and studied [4, 5, 8].
Another category of time series clustering methods is model-based clustering, which clusters time series using the parameter estimates of the models or other statistics based on the errors associated with the estimates [10, 13]. In [11], Liao summarized time series clustering methods into three categories: raw data based, extracted feature based, and model based. Instead of using the observed time series, extracted features of the observations, or models fitted to past time periods, we consider the forecasts themselves, either at a specific future time point or over a future time period. For retail stores, for example, we can cluster the stores based on their sales forecast distributions at a particular future time, instead of on the observed sales data. Alonso et al. [1] used density forecasts for time series clustering at a specific future time point. However, since that method requires bootstrap samples, nonparametric forecast density estimation, and a specific distance measure between the forecast densities, it is not an efficient approach for clustering a large number of time series. In this paper, we use the Kullback-Leibler divergence [9] for clustering the forecasts at a future time point. Under the normality assumption on the errors, the Kullback-Leibler distance can be computed directly from the forecast means and variances provided by the forecast model. We also extend the method to cluster the forecasts at all future points in the forecast horizon, in order to capture forecast patterns that evolve over time. For instance, in the retail industry, business decisions such as stocking up or rearranging the shelves can be made after clustering the products based on their sales forecasts. Similarly, the clustering can be carried out at the store level, so that sales or pricing policies can be set for each group of stores.
Typically, the number of time series in the retail industry is very large, and the industry requires fast forecasting as well as fast clustering. The proposed method is suitable for clustering large numbers of forecasts. The paper is organized as follows. In Sections 2 and 3, we describe the KL divergence as a distance measure between forecast densities and explain how to cluster forecasts. Following that, a simulation study and real data analyses are presented.

2. DISTANCE MEASURE FOR CLUSTERING FORECASTS

Since forecasts are not observed values, the Euclidean distance between two forecast values may not be close to the true distance. Our proposed method uses a symmetric version of the Kullback-Leibler divergence to calculate the distance
between the forecast densities under the assumption that the forecast error terms are normally distributed. In other words, both the mean (the forecast) and the variance (the forecast error variance) are used in the calculation of the distance.

2.1 Kullback-Leibler Divergence

Suppose P0 and P1 are the probability distributions of two continuous random variables. The Kullback-Leibler divergence of P0 from P1 is defined as

    KLD(P1 || P0) = ∫ p1(x) log( p1(x) / p0(x) ) dx    (1)

where p0 and p1 are the density functions of P0 and P1. The Kullback-Leibler divergence KLD(P1 || P0) is not a symmetric measure of the difference between P0 and P1, but in clustering we need a symmetric distance measure for the items (in this paper, time series) to be grouped. A well-known symmetric version of the Kullback-Leibler divergence is the average of the two divergences KLD(P1 || P0) and KLD(P0 || P1):

    KLD_avg(P1, P0) = (1/2) { KLD(P1 || P0) + KLD(P0 || P1) }
                    = (1/2) ∫ ( p1(x) - p0(x) ) log( p1(x) / p0(x) ) dx    (2)

This is also known as the J-divergence of P0 and P1 [7].

When P1 and P0 are two normal distributions, that is, P1 ~ N(µ1, σ1^2) and P0 ~ N(µ0, σ0^2), the divergences simplify as follows:

    KLD(P1 || P0) = (1 / (2σ0^2)) [ (µ1 - µ0)^2 + σ1^2 - σ0^2 ] + log( σ0 / σ1 )

    KLD_avg(P1, P0) = (1/4) (1/σ0^2 + 1/σ1^2) (µ1 - µ0)^2 + (σ1^2 - σ0^2)^2 / (4 σ0^2 σ1^2)    (3)

In the rest of the paper, we denote the symmetric version of the KL divergence in (3) as the KL distance.

2.2 KL and Euclidean Distances for Clustering Forecasts

For two forecasts f0 and f1 with forecast values µ̂0, µ̂1 and standard errors σ̂0, σ̂1, the Euclidean distance between the two forecasts is defined as

    EUC(f1, f0) = (µ̂1 - µ̂0)^2    (4)

Consistent with the definition of the KL divergence for normal densities, we define the squared distance function here.

Using the Euclidean distance for clustering forecast time series ignores the variance information (σ̂0^2 and σ̂1^2) of the underlying forecast distributions. In contrast, under the normality assumption, the KL distance between the forecast distributions of f0 and f1 considers both the mean and the variance information of the forecasts, and it has the following relationship with the Euclidean distance:

    KLD_avg(f1, f0) = (1/4) (1/σ̂0^2 + 1/σ̂1^2) EUC(f1, f0) + (1/4) (K + 1/K - 2)    (5)

where K = σ̂1^2 / σ̂0^2 is the relative ratio of the noise in the two forecasts f1 and f0.

Figure 1: An example of forecasts with the same mean values but different errors.

Figure 2: An example of forecasts with different mean values but the same errors.

The plots of normal density functions in Figures 1 and 2 show that using the forecast values without considering their distributions (that is, using the EUC distance) may not be appropriate when clustering forecast values: the two distance measures can rank the same pairs of forecasts in reverse order.

When two forecast values are the same and the forecast distributions are ignored, the two forecasts are always clustered into the same category based on the mean difference (the Euclidean distance). When we use the KL distance, however, clustering may produce a different result even when the mean difference is zero (Figure 1). For example, consider the sales data of retail stores: the sales forecasts of two stores are both zero for the next week, but their standard deviations differ, as shown in Figure 1. When we do not consider the forecast distributions, the two stores are clustered into the same segment and may get the same sales policy for the
coming week. In contrast to Figure 1, Figure 2 shows two different forecast values of sales (0 and 50) with the same large standard deviations. Based on the KL distance, the forecast sales of the two stores in Figure 2 differ less than the forecast sales of the two stores in Figure 1 (KL distance 0.22 vs. 1.78). In other words, the two stores in Figure 1 are less likely to be clustered into the same segment than the two stores in Figure 2, even though their forecast values are identical (Figure 1).

We also observe the following properties of the symmetric Kullback-Leibler divergence as defined in Equation (5).

Property 1. KLD_avg is not scale free; that is, it depends on the forecast errors. In particular, when σ̂1 = σ̂0, KLD_avg = EUC(µ̂1, µ̂0) / (2σ̂0^2).

Property 1 is desirable for clustering forecasts, since we want to distinguish the forecast mean values together with their errors. It indicates that when the errors of the forecasts are the same, the KL distance differs from the Euclidean distance by a factor that depends on the error.

Property 2. Suppose there exists a constant c > 0 such that σ̂0 = c σ̂1. Then KLD_avg → 0 as σ̂0 → ∞.

Property 2 implies that the KL distance cannot distinguish two forecasts when both of their errors are very large. This indicates that the forecasting models are also very important when clustering forecasts: if a poor forecast model is fit, we may end up with few clusters, because the errors make the forecasts indistinguishable.

Property 3. Under the same condition as in Property 2, KLD_avg → ∞ as σ̂0 = c σ̂1 → 0.

Property 3 tells us that the KL distance cannot group two forecasts when their errors are very small. In theory, when we have perfect forecasts (the errors are zero), there is no need to consider the errors in clustering.
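The closed forms in Equations (3) and (5) and Property 1 can be checked numerically. Below is a minimal Python sketch (an illustration only, not the authors' implementation; the function names are ours):

```python
import math

def kld_avg(mu1, sigma1, mu0, sigma0):
    """Symmetric (averaged) KL divergence between N(mu1, sigma1^2) and N(mu0, sigma0^2)."""
    kld_10 = math.log(sigma0 / sigma1) + ((mu1 - mu0) ** 2 + sigma1 ** 2 - sigma0 ** 2) / (2 * sigma0 ** 2)
    kld_01 = math.log(sigma1 / sigma0) + ((mu0 - mu1) ** 2 + sigma0 ** 2 - sigma1 ** 2) / (2 * sigma1 ** 2)
    return 0.5 * (kld_10 + kld_01)

def kld_avg_via_euc(mu1, sigma1, mu0, sigma0):
    """Equation (5): a weighted squared Euclidean distance plus a penalty on
    the variance ratio K = sigma1^2 / sigma0^2."""
    euc = (mu1 - mu0) ** 2                     # squared Euclidean distance, Equation (4)
    k = sigma1 ** 2 / sigma0 ** 2
    return 0.25 * (1 / sigma0 ** 2 + 1 / sigma1 ** 2) * euc + 0.25 * (k + 1 / k - 2)
```

For equal standard errors the second term of Equation (5) vanishes and the distance reduces to EUC / (2σ̂0^2), as in Property 1; for identical means but different errors (the Figure 1 situation) the Euclidean distance is zero while the KL distance remains positive.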
However, in practice this does not hold, since the errors increase as we forecast further ahead, as shown by Equation (6) below.

2.3 Forecast Distributions for KL Divergence

To compute the KL distance among forecasts, we need the forecast densities. As stated before, we utilize the forecast distributions that are used in the calculation of forecast confidence intervals. Since forecast confidence intervals are readily available in any forecasting software, this saves a lot of time and computing resources compared to [1], which needs a full forecast density estimation to calculate the distance matrix.

As an example, we show how to obtain the k-step-ahead forecast value and variance in the simple exponential smoothing model. Under the assumption of a Gaussian white noise process,

    Y_t = µ_t + ε_t,  t = 1, 2, ...

the smoothing equation is S_t = αY_t + (1 - α)S_{t-1}, and the k-step-ahead forecast of Y_t is S_t, i.e., Ŷ_t(k) = S_t. The simple exponential smoothing model uses an exponentially weighted moving average of the past values and is equivalent to an ARIMA(0,1,1) model without constant:

    (1 - B)Y_t = (1 - θB)ε_t,  where θ = 1 - α.

Thus Y_t = ε_t + α Σ_{j>=1} ε_{t-j}. Therefore the variance of Ŷ_t(k) is

    V(Ŷ_t(k)) = V(ε_t) [ 1 + Σ_{j=1}^{k-1} α^2 ] = V(ε_t) [ 1 + (k - 1)α^2 ].    (6)

Under the Gaussian white noise assumption, the k-step-ahead value follows N(Ŷ_t(k), V(Ŷ_t(k))). Therefore the KL distance between two forecasts at a future time point can be easily obtained using Equation (3).

3. CLUSTERING THE FORECASTS

When a distance function has been defined between all pairs of forecasts, we can use available clustering algorithms to cluster the forecasts. A hierarchical clustering algorithm needs a distance matrix over all pairs, while the more scalable k-means clustering algorithm requires the distance between a group of points (typically represented by the centroid of the group) and any other single point.
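The simple exponential smoothing derivation above can be sketched directly. The following is a minimal illustration (the initialization of the smoothing state with the first observation and the estimation of V(ε_t) from one-step-ahead errors are our assumptions, not the paper's specification):

```python
def ses_forecast(y, alpha, k):
    """k-step-ahead simple exponential smoothing forecast: returns (mean, variance).

    The smoothing state is initialized with the first observation, and the
    white-noise variance V(eps) is estimated from the one-step-ahead errors.
    """
    s = y[0]
    errors = []
    for value in y[1:]:
        errors.append(value - s)              # one-step-ahead forecast error
        s = alpha * value + (1 - alpha) * s   # S_t = alpha*Y_t + (1 - alpha)*S_{t-1}
    var_eps = sum(e * e for e in errors) / len(errors)
    return s, var_eps * (1 + (k - 1) * alpha ** 2)   # Equation (6)
```

Together with the normality assumption, the returned (mean, variance) pair fully specifies the forecast density N(Ŷ_t(k), V(Ŷ_t(k))) that the KL distance needs; no separate density estimation is required.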
Thanks to the additive property of the normal distribution (the sum of two independent normal random variables is again normal, with mean and variance equal to the sums of the individual means and variances), the KL distance between a group of points and any other single point can be computed easily as well. Therefore, we can use both the hierarchical and the k-means clustering algorithms with the KL distance for clustering forecasts.

When clustering forecasts, we consider two scenarios: clustering the forecast values at a particular future time point, and clustering the forecast series over all future time points in the forecast horizon. Clustering the forecasts at a single future time point helps us understand the forecasts and their clusters at that time point, while clustering the forecast series helps us understand the overall forecast patterns.

3.1 Clustering at One Future Point

Let X̂_t(k) and Ŷ_t(k) be the k-step-ahead forecasts of two time series X_t and Y_t, and let σ̂_x(k) and σ̂_y(k) be the standard errors of the forecasts. The KL distance KLD_avg(X̂_t(k), Ŷ_t(k)) between the two forecasts can be calculated using Equation (5).
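Extending the pairwise calculation to all series gives the distance matrix consumed by a hierarchical clustering routine. A sketch, using the closed form of Equation (3) (the (mean, stderr) pair format of the input list is an assumed convention for illustration):

```python
def kl_distance_matrix(forecasts):
    """Symmetric KL distance matrix for forecasts given as (mean, stderr) pairs,
    using the closed form of Equation (3) for normal forecast densities."""
    n = len(forecasts)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mu1, s1 = forecasts[i]
        for j in range(i + 1, n):
            mu0, s0 = forecasts[j]
            v1, v0 = s1 ** 2, s0 ** 2
            d = 0.25 * (1 / v0 + 1 / v1) * (mu1 - mu0) ** 2 \
                + 0.25 * (v1 - v0) ** 2 / (v0 * v1)
            dist[i][j] = dist[j][i] = d
    return dist
```

The matrix can be passed to any off-the-shelf hierarchical clustering routine; for k-means, the additive property of the normal distribution noted above allows the same closed form to be evaluated between a cluster centroid and a single point.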
The steps for clustering the forecasts at a future time point are shown below. In this paper we consider hierarchical clustering, but the procedure can easily be modified for any non-hierarchical clustering algorithm such as k-means.

1. Apply forecasting models with forecast lead time k.
2. Obtain the forecasts (X̂_t(k), Ŷ_t(k)) and their standard errors (σ̂_x(k), σ̂_y(k)) for each pair of time series.
3. Calculate the KL distance matrix over all pairs of time series.
4. Apply a clustering algorithm with the KL distances.
5. Obtain the clusters of the forecasts.

3.2 Clustering the Forecast Series

The clusters at different future time points may differ. To capture changes in the whole forecast pattern, we can cluster the forecast series over all future time points. Given a total forecast lead h, we extend the KL distance as follows:

    KLD_avg(X̂_t, Ŷ_t) = Σ_{k=1}^{h} KLD_avg(X̂_t(k), Ŷ_t(k)).    (7)

Note that we still define the squared distance. The steps for clustering the forecast series are:

1. Apply forecasting models with total forecast lead h.
2. Obtain the forecasts (X̂_t(k), Ŷ_t(k)) and their standard errors (σ̂_x(k), σ̂_y(k)) at each lead time k, k = 1, 2, ..., h.
3. Calculate the KL distance matrix over all pairs of time series using Equation (7).
4. Apply a clustering algorithm with the KL distances.
5. Obtain the clusters of the forecasts.

4. A SIMULATION STUDY

To demonstrate the performance of the proposed KL distance for clustering forecasts, we simulate two groups of time series with the same autoregressive AR(2) [2] structure but different intercepts. Each time series is of length 100, and there are 50 time series in each group:

    X_t^(i) = µ_i + X_{t-1} - 0.5 X_{t-2} + σ_i ε_t,  t = 1, 2, ..., 100,  i = 1, 2,

where ε_t is standard Gaussian white noise. The two groups have µ1 = 0 and µ2 = 1, respectively. The standard errors of the white noise for the two groups (σ1 and σ2) vary from 0.5 to 5 in steps of 0.5, in order to examine the performance difference between the KL distance and the Euclidean distance for time series with different signal-to-noise ratios (SNR). This yields 100 settings of SNR combinations (σ1, σ2), and for each setting we repeat the simulation 400 times. We fit AR(2) models to the simulated time series and obtain the forecast values and variances. For the synthetic data, since we know the group label of each series, we can easily compute the clustering error rate (CER). We report the mean clustering error rate of both distance measures for each SNR setting.

Figure 3: The mean clustering error rates in 400 simulations for clustering two groups of time series with different combinations of noise standard errors (σ1, σ2). The forecast lead is 10. Top: Euclidean distance; Bottom: KL distance.

The density plots in Figure 3 show the mean CER for both methods over the 400 simulations when clustering all future time points with forecast length 10. The proposed KL distance clearly outperforms the traditional Euclidean distance across a variety of SNR combinations. In particular, when one group of time series has a relatively high SNR compared with the other group (the top-left and bottom-right corners of the density plots), the KL distance identifies the true grouping of the time series (mean CERs close to zero) while the Euclidean distance produces poor clustering results (mean CERs around 15%). When both groups of time series have high noise standard errors and the two standard errors are close (the top-right corners of the density plots), both distance measures perform poorly.
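The data-generating step of this simulation and the CER computation can be sketched as follows. Note that the AR(2) recursion is reconstructed from the description above, so its exact coefficients are an assumption, and the error-rate helper covers only the two-group case:

```python
import random

def simulate_group(mu, sigma, n_series=50, length=100, seed=0):
    """Simulate one group of AR(2)-type series:
    X_t = mu + X_{t-1} - 0.5*X_{t-2} + sigma*eps_t, eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    group = []
    for _ in range(n_series):
        x = [mu, mu]  # simple start-up values for the recursion
        for _ in range(2, length):
            x.append(mu + x[-1] - 0.5 * x[-2] + rng.gauss(0.0, sigma))
        group.append(x)
    return group

def clustering_error_rate(true_labels, pred_labels):
    """CER for a two-group problem: the fraction misclassified under the
    better of the two matchings of predicted cluster ids to true group ids."""
    mismatches = sum(t != p for t, p in zip(true_labels, pred_labels))
    return min(mismatches, len(true_labels) - mismatches) / len(true_labels)
```

Because cluster ids are arbitrary, the CER takes the minimum over label matchings: a clustering that exactly swaps the two group labels still scores a CER of zero.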
In Figure 4, we show the simulation results for the same SNR combinations as in Figure 3, but here the clustering is on one future forecast (forecast length 1).

Figure 4: The mean clustering error rates in 400 simulations for clustering two groups of time series with different combinations of noise standard errors (σ1, σ2). The forecast lead is 1. Top: Euclidean distance; Bottom: KL distance.

Clearly, the proposed KL distance measure still achieves better mean CERs than the traditional Euclidean measure when clustering one future time point. Consistent with the results in Figure 3, when the noise standard errors of the two groups of time series differ (the top-left and bottom-right corners of the density plots), the KL distance yields much smaller error rates than the Euclidean distance does.

Comparing Figure 3 and Figure 4, we find that for the same SNRs, clustering based on 10 forecast time points tends to produce better results than clustering based on just one future time point, which suggests that a sufficient forecast length is necessary for reliable clustering based on forecasts.

5. REAL LIFE DATA STUDY

Two real life data sets are investigated to further evaluate the proposed KL distance: one is the CO2 emissions of all the countries in the world, and the other is the weekly sales amounts by store of a retail chain. A summary of the data sets is shown in Table 1.

    Data Set | Number of Series | Length | Frequency
    CO2      | 216              | 48     | Yearly
    Sales    | 43               | 152    | Weekly

    Table 1: Summary of the Data Sets

The CO2 data consists of the CO2 emissions (metric tons per capita) of 216 countries from 1961 to 2008 (the CO2 data from 1960 to 1999 was used in [1]). There is one CO2 emission time series per country. For a better comparison of the results, we remove the series with missing values, which leaves 146 complete series.

The sales data is a small subset of the weekly sales data of a large retail chain in the USA, which has millions of products and thousands of stores. The subset has the aggregated sales of one department for 43 stores. Each store has 152 weeks of sales history from February 2008 to December 2010. There are 43 time series, and no exogenous variables besides the aggregated sales.

For the real life data sets, we cannot compute the clustering error rate, since the true class labels are not available. To compare the quality of the clustering results, we perform a holdout test: we hold out the most recent h periods of data for testing, and the forecasting models are built with the data prior to the holdout periods (the training data). For example, for the sales data, we hold out the most recent 12 weeks of data, and the forecasting models are built with the 140 weeks of data prior to them. After building the appropriate forecasting models, we use them to obtain forecast values and variances for the holdout periods.

Since our focus is on evaluating the distance functions in the clustering of the forecasts rather than the accuracy of the models, we simply fit the training data with the best ESM (Exponential Smoothing Model) for both the sales and the CO2 data sets. The candidate ESM models include simple, double, linear, damped trend, seasonal, and additive and multiplicative Winters methods. We select the best model by minimum Root Mean Squared Error (RMSE). After computing the distance matrix using either the KL or the Euclidean distance, hierarchical clustering is used.

5.1 CO2 Emission Data by Country

There are 146 complete time series after removing those with missing values.
The initial exploration of the data indicates that there is an outlier series (the country Qatar), which has significantly higher CO2 emissions per capita than the rest, so we remove it from further analysis. For the remaining 145 series, we set the holdout period to 5 years. After clustering the forecasts in the holdout period, we plot the actual time series for each cluster of the forecasts, with the number of clusters set to 3, in Figure 5.
Figure 6: Two forecast series with different forecast errors that are separated by the KL distance: Albania is in cluster 2 and Afghanistan in cluster 1. When using the Euclidean distance, both countries are in cluster 1. The clusters are shown in Figure 5.

Figure 5: Clusters of the time series for the CO2 data: each line shows the actual CO2 emissions of a country in the 5 holdout years, while the clusters are identified based on the clustering of the forecast series in the 5 holdout years using the Euclidean distance (EUC) and the KL distance (KLD).

We can see that the Euclidean distance separates the countries into 3 clusters: cluster 1 with relatively low CO2 emissions (106 countries), cluster 2 with medium CO2 emissions (29 countries), and cluster 3 with relatively high CO2 emissions (10 countries). When the errors in the forecasts are considered, the clusters are different: with the KL distance, clusters 1, 2 and 3 have 65, 52 and 28 countries, respectively.

Figures 6 and 7 illustrate the difference in the clustering results between the KL distance and the Euclidean distance. Note that the dashed lines are the fits and forecasts, the solid lines are the actual time series (including the holdouts), and the filled areas are the confidence intervals of the forecast periods. Using the Euclidean distance, Afghanistan and Albania in Figure 6 are both in cluster 1. However, Albania is separated from cluster 1 into cluster 2 when using the KL distance, because the errors in the forecasts for Albania are larger, and thus the KL distance separates the series from cluster 1. Figure 7 shows that Australia and the United Kingdom are in different clusters (3 and 2, respectively) when using the Euclidean distance. Instead, they are both put into cluster 3 when using the KL distance because their forecast distributions are similar.
Figure 7: Two forecast series with similar forecast errors that are grouped into one cluster by the KL distance: both Korea and Australia are in cluster 3. When using the Euclidean distance, Korea is in cluster 2 and Australia is in cluster 3. The clusters are shown in Figure 5.

We checked the accuracy of the forecasting models (the best ESM). It turned out that the best ESM models perform very well in forecasting the CO2 data, with a MAPE (Mean Absolute Percent Error) of about 10% in the holdout period.

It is also interesting to observe that the cluster pattern changes over time when we cluster at each time point in the holdout. Table 2 illustrates the cluster membership at each time point in the holdout, and the overall cluster membership, for four countries using the KL distance. The actual values, forecast values, and confidence intervals for these countries in the holdout period are shown in Figure 8. Within the five years, most countries stay in the same cluster, but the cluster membership of a country can change as the forecasts change. For example, the forecasts for China have an upward trend, and its cluster changes from 1 to 2 during the holdout period. The errors in the forecasts can also contribute to changes of cluster membership from one future time point to another.
Table 2: An illustration of the cluster changes over time for the CO2 emissions of four countries (Kenya, China, Mexico, and the USA). The cluster membership for each year is obtained based on the clustering of the forecasts at that year using the KL distance. The overall cluster membership is obtained based on the clustering of the whole forecast time series using the KL distance. For illustration, the number of clusters is set to 3.

Figure 8: The time series plots of the four countries listed in Table 2. The clustering of one future forecast is performed at each of the five years in the holdout period. Dashed: the forecast values; solid: the actual holdout values; filled areas: the forecast confidence intervals.

5.2 Retail Store Sales Data

We set the holdout period to 12 weeks. With the number of clusters set to 5, the actual time series for the clusters based on the forecasts in the holdout period are shown in Figure 9. With the Euclidean distance, the stores are grouped into 5 clusters of 17, 16, 5, 3, and 2 stores, respectively. With the KL distance, clusters 1 to 5 have 14, 6, 14, 7, and 2 stores, respectively.

We checked the forecast accuracy of the models: the best ESM models have a MAPE of 40% in the holdout period. Note that for the sales data the forecasting models could be enhanced with other models such as ARIMA [2] and UCM [6], and with exogenous variables such as prices, holidays, promotions, etc.

Figure 9: Clusters of the time series for the sales data: each line shows the actual sales in the 12 holdout weeks, while the clusters are identified based on the clustering of the forecast series in the 12 holdout weeks using the Euclidean distance (EUC) and the KL distance (KLD).

We then experiment without the holdout; that is, all available data points are used to build the forecasting models. Forecast values for the next six weeks are obtained using the best ESM models, and the KL distances are calculated at three specific future weeks. The interesting forecast lead points and forecast values are marked with squares in Figure 10. Figure 11 shows the cluster changes over time. We can see that some stores which are grouped in the same cluster in the first week (2Jan2011) may be separated into different clusters in later weeks (30Jan2011). Thus the retail company may need to adjust its sales or promotion policy for each store, and may set different business goals from week to week.

Figure 10: Forecasts for the sales data: each line shows a series of forecast values for the future 6 weeks. The squares mark the weeks with interesting forecast clustering pattern changes.

Figure 11: Cluster changes over time in the future for the sales data. The cluster membership for each week is obtained based on the clustering of the forecasts at that week using the KL distance.

6. CONCLUSIONS AND FUTURE WORK

In this paper we used the KL distance for clustering the forecast distributions of time series. The KL distance requires density functions, and clustering a large number of time series with full density estimations of the forecasts takes substantial computing resources. We approximated each forecast density by a normal density with the forecast mean and variance, which are directly provided by the forecast model, so the KL distances among forecasts can be obtained easily. We demonstrated the advantage of using the KL distance over the Euclidean distance with simulations from autoregressive models. We also showed that the KL distance improved the clustering results on two real life data sets. We would like to enhance the KL distance to handle cases in which the forecast errors are extremely large or small, where the current KL distance may not perform well. Different clustering algorithms such as k-means can be applied as well. Combining Dynamic Time Warping [8] with the KL distance to detect forecast pattern changes for series of different lengths is another possible extension of this paper.

7. REFERENCES

[1] A. Alonso, J. Berrendero, A. Hernandez, and A. Justel. Time series clustering based on forecast densities. Computational Statistics & Data Analysis, 51(2), 2006.
[2] G. Box, G. Jenkins, and G. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, 3rd edition, 1994.
[3] P. S. P. Cowpertwait and T. F. Cox. Clustering population means under heterogeneity of variance with an application to a rainfall time series problem. Journal of the Royal Statistical Society, Series D (The Statistician), 41(1), 1992.
[4] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh. Querying and mining of time series data: experimental comparison of representations and distance measures. Proc. VLDB Endow., 1, 2008.
[5] A. W.-C. Fu, E. Keogh, L. Y. H. Lau, and C. A. Ratanamahatana. Scaling and time warping in time series querying. In Proceedings of the 31st International Conference on Very Large Data Bases, VLDB '05, 2005.
[6] A. C. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, 1989.
[7] H. Jeffreys. An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. A, 186:453-461, 1946.
[8] E. Keogh. Exact indexing of dynamic time warping. In Proceedings of the 28th International Conference on Very Large Data Bases, VLDB '02, 2002.
[9] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86, 1951.
[10] M. Kumar and N. R. Patel. Clustering data with measurement errors. Computational Statistics & Data Analysis, 51, 2007.
[11] T. W. Liao. Clustering of time series data - a survey. Pattern Recognition, 38:1857-1874, 2005.
[12] M. F. Macchiato, L. La Rotonda, V. Lapenna, and M. Ragosta. Time modelling and spatial clustering of daily ambient temperature: An application in southern Italy. Environmetrics, 6(1):31-53, 1995.
[13] P. C. Mahalanobis. On the generalized distance in statistics. Proceedings of the National Institute of Science, Calcutta, 2(1):49-55, 1936.
[14] Y. Kakizawa, R. H. Shumway, and M. Taniguchi. Discrimination and clustering for multivariate time series. Journal of the American Statistical Association, 93(441), 1998.
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4
4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression
An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis]
An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis] Stephan Spiegel and Sahin Albayrak DAI-Lab, Technische Universität Berlin, Ernst-Reuter-Platz 7,
Cross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
Time Series Analysis
JUNE 2012 Time Series Analysis CONTENT A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken at regular intervals (days, months, years),
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Recall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS
OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
How To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
Time series Forecasting using Holt-Winters Exponential Smoothing
Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Time Series Analysis
Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)
Financial TIme Series Analysis: Part II
Department of Mathematics and Statistics, University of Vaasa, Finland January 29 February 13, 2015 Feb 14, 2015 1 Univariate linear stochastic models: further topics Unobserved component model Signal
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables
Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
Time Series Analysis and Forecasting Methods for Temporal Mining of Interlinked Documents
Time Series Analysis and Forecasting Methods for Temporal Mining of Interlinked Documents Prasanna Desikan and Jaideep Srivastava Department of Computer Science University of Minnesota. @cs.umn.edu
Chapter 27 Using Predictor Variables. Chapter Table of Contents
Chapter 27 Using Predictor Variables Chapter Table of Contents LINEAR TREND...1329 TIME TREND CURVES...1330 REGRESSORS...1332 ADJUSTMENTS...1334 DYNAMIC REGRESSOR...1335 INTERVENTIONS...1339 TheInterventionSpecificationWindow...1339
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Data Mining mit der JMSL Numerical Library for Java Applications
Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale
Using simulation to calculate the NPV of a project
Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial
TOURISM DEMAND FORECASTING USING A NOVEL HIGH-PRECISION FUZZY TIME SERIES MODEL. Ruey-Chyn Tsaur and Ting-Chun Kuo
International Journal of Innovative Computing, Information and Control ICIC International c 2014 ISSN 1349-4198 Volume 10, Number 2, April 2014 pp. 695 701 OURISM DEMAND FORECASING USING A NOVEL HIGH-PRECISION
I. Introduction. II. Background. KEY WORDS: Time series forecasting, Structural Models, CPS
Predicting the National Unemployment Rate that the "Old" CPS Would Have Produced Richard Tiller and Michael Welch, Bureau of Labor Statistics Richard Tiller, Bureau of Labor Statistics, Room 4985, 2 Mass.
2) The three categories of forecasting models are time series, quantitative, and qualitative. 2)
Exam Name TRUE/FALSE. Write 'T' if the statement is true and 'F' if the statement is false. 1) Regression is always a superior forecasting method to exponential smoothing, so regression should be used
How to Get More Value from Your Survey Data
Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2
SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS. J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID
SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID Renewable Energy Laboratory Department of Mechanical and Industrial Engineering University of
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Demand Management Where Practice Meets Theory
Demand Management Where Practice Meets Theory Elliott S. Mandelman 1 Agenda What is Demand Management? Components of Demand Management (Not just statistics) Best Practices Demand Management Performance
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
Booth School of Business, University of Chicago Business 41202, Spring Quarter 2015, Mr. Ruey S. Tsay. Solutions to Midterm
Booth School of Business, University of Chicago Business 41202, Spring Quarter 2015, Mr. Ruey S. Tsay Solutions to Midterm Problem A: (30 pts) Answer briefly the following questions. Each question has
The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network
, pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and
Energy Demand Forecasting Industry Practices and Challenges
Industry Practices and Challenges Mathieu Sinn (IBM Research) 12 June 2014 ACM e-energy Cambridge, UK 2010 2014 IBM IBM Corporation Corporation Outline Overview: Smarter Energy Research at IBM Industry
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
How To Predict Web Site Visits
Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many
OUTLIER ANALYSIS. Data Mining 1
OUTLIER ANALYSIS Data Mining 1 What Are Outliers? Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism Ex.: Unusual credit card purchase,
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
Introduction to time series analysis
Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples
Module 6: Introduction to Time Series Forecasting
Using Statistical Data to Make Decisions Module 6: Introduction to Time Series Forecasting Titus Awokuse and Tom Ilvento, University of Delaware, College of Agriculture and Natural Resources, Food and
UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH
QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS
Forecasting the first step in planning. Estimating the future demand for products and services and the necessary resources to produce these outputs
PRODUCTION PLANNING AND CONTROL CHAPTER 2: FORECASTING Forecasting the first step in planning. Estimating the future demand for products and services and the necessary resources to produce these outputs
Leveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Probabilistic Forecasting of Medium-Term Electricity Demand: A Comparison of Time Series Models
Fakultät IV Department Mathematik Probabilistic of Medium-Term Electricity Demand: A Comparison of Time Series Kevin Berk and Alfred Müller SPA 2015, Oxford July 2015 Load forecasting Probabilistic forecasting
Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009
Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation
Improved Trend Following Trading Model by Recalling Past Strategies in Derivatives Market
Improved Trend Following Trading Model by Recalling Past Strategies in Derivatives Market Simon Fong, Jackie Tai Department of Computer and Information Science University of Macau Macau SAR [email protected],
AP Physics 1 and 2 Lab Investigations
AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks
Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts
Page 1 of 20 ISF 2008 Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Andrey Davydenko, Professor Robert Fildes [email protected] Lancaster
Ch.3 Demand Forecasting.
Part 3 : Acquisition & Production Support. Ch.3 Demand Forecasting. Edited by Dr. Seung Hyun Lee (Ph.D., CPL) IEMS Research Center, E-mail : [email protected] Demand Forecasting. Definition. An estimate
Aspen Collaborative Demand Manager
A world-class enterprise solution for forecasting market demand Aspen Collaborative Demand Manager combines historical and real-time data to generate the most accurate forecasts and manage these forecasts
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
TIME SERIES ANALYSIS
TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 [email protected]. Introduction Time series (TS) data refers to observations
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
SINGULAR SPECTRUM ANALYSIS HYBRID FORECASTING METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND
SINGULAR SPECTRUM ANALYSIS HYBRID FORECASTING METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND K. Adjenughwure, Delft University of Technology, Transport Institute, Ph.D. candidate V. Balopoulos, Democritus
Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
Forecasting in STATA: Tools and Tricks
Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be
There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images
A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr A. Kowalski, Piotr Kulczycki, Szymon Łukasik, and Sławomir Żak Abstract Methods
IDENTIFICATION OF DEMAND FORECASTING MODEL CONSIDERING KEY FACTORS IN THE CONTEXT OF HEALTHCARE PRODUCTS
IDENTIFICATION OF DEMAND FORECASTING MODEL CONSIDERING KEY FACTORS IN THE CONTEXT OF HEALTHCARE PRODUCTS Sushanta Sengupta 1, Ruma Datta 2 1 Tata Consultancy Services Limited, Kolkata 2 Netaji Subhash
Applications of improved grey prediction model for power demand forecasting
Energy Conversion and Management 44 (2003) 2241 2249 www.elsevier.com/locate/enconman Applications of improved grey prediction model for power demand forecasting Che-Chiang Hsu a, *, Chia-Yon Chen b a
The CUSUM algorithm a small review. Pierre Granjon
The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Univariate and Multivariate Methods PEARSON. Addison Wesley
Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston
TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS
TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS 1. Bandwidth: The bandwidth of a communication link, or in general any system, was loosely defined as the width of
Overview of Factor Analysis
Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,
Outline: Demand Forecasting
Outline: Demand Forecasting Given the limited background from the surveys and that Chapter 7 in the book is complex, we will cover less material. The role of forecasting in the chain Characteristics of
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Mining Airline Data for CRM Strategies
Proceedings of the 7th WSEAS International Conference on Simulation, Modelling and Optimization, Beijing, China, September 15-17, 2007 345 Mining Airline Data for CRM Strategies LENA MAALOUF, NASHAT MANSOUR
On Entropy in Network Traffic Anomaly Detection
On Entropy in Network Traffic Anomaly Detection Jayro Santiago-Paz, Deni Torres-Roman. Cinvestav, Campus Guadalajara, Mexico November 2015 Jayro Santiago-Paz, Deni Torres-Roman. 1/19 On Entropy in Network
Time Series Analysis
Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos
CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA
We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I
Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting
Luciano Rispoli Department of Economics, Mathematics and Statistics Birkbeck College (University of London)
Luciano Rispoli Department of Economics, Mathematics and Statistics Birkbeck College (University of London) 1 Forecasting: definition Forecasting is the process of making statements about events whose
Sections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
