Fine-grained Photovoltaic Output Prediction using a Bayesian Ensemble
Prithwish Chakraborty 1,2, Manish Marwah 3, Martin Arlitt 3, and Naren Ramakrishnan 1,2

1 Department of Computer Science, Virginia Tech, Blacksburg, VA
2 Discovery Analytics Center, Virginia Tech, Blacksburg, VA
3 Sustainable Ecosystems Research Group, HP Labs, Palo Alto, CA

Abstract

Local and distributed power generation is increasingly reliant on renewable power sources, e.g., solar (photovoltaic or PV) and wind energy. The integration of such sources into the power grid is challenging, however, due to their variable and intermittent energy output. To effectively use them on a large scale, it is essential to be able to predict power generation at a fine-grained level. We describe a novel Bayesian ensemble methodology involving three diverse predictors. Each predictor estimates mixing coefficients for integrating PV generation output profiles but captures fundamentally different characteristics. Two of them employ classical parameterized (naive Bayes) and non-parametric (nearest neighbor) methods to model the relationship between weather forecasts and PV output. The third predictor captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology. We demonstrate the success and superiority of our methods on real PV data from two locations that exhibit diverse weather conditions. Predictions from our model can be harnessed to optimize scheduling of delay-tolerant workloads, e.g., in a data center.

Introduction

Increasingly, local and distributed power generation, e.g., through solar (photovoltaic or PV), wind, and fuel cells, is gaining traction. In fact, integration of distributed, renewable power sources into the power grid is an important goal of the smart grid effort.
There are several benefits of deploying renewables, e.g., decreased reliance (and thus, demand) on the public electric grid, reduction in carbon emissions, and significantly lower transmission and distribution losses. Finally, there are emerging government mandates on increasing the proportion of energy coming from renewables, e.g., Senate Bill X1-2 in California, which requires that one-third of the state's electricity come from renewable sources by 2020. However, renewable power sources such as photovoltaic (PV) arrays and wind are both variable and intermittent in their energy output, which makes integration with the power grid challenging. PV output is affected by temporal factors such as the time of day and day of the year, and environmental factors such as cloud cover, temperature, and air pollution. To effectively use such sources at a large scale, it is essential to be able to predict power generation.

(Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved.)

As an example, a fine-grained PV prediction model can help improve workload management in data centers. In particular, a data center's workloads may be shaped so as to closely match the expected generation profile, thereby maximizing the use of locally generated electricity.

In this paper, we propose a Bayesian ensemble of three heterogeneous models for fine-grained prediction of PV output. Our contributions are:

1. The use of multiple diverse predictors to address fine-grained PV prediction; while two of the predictors employ classical parametrized (naive Bayes) and non-parametric (nearest neighbor) methods, we demonstrate the use of a novel predictor based on motif mining from discretized PV profiles.

2. A systematic approach to weighting profiles using a Bayesian ensemble, which accommodates variations in weather profiles and thereby captures both local and global characteristics in PV prediction.

3.
Demonstration of our approach on real data from two locations, and an exploration of its application to data center workload scheduling.

Related Work

Comprehensive surveys on time series prediction (Brockwell and Davis 2002; Montgomery, Jennings, and Kulahci 2008) provide overviews of classical methods, from ARMA models to modeling heteroskedasticity (we implement some of these for comparison purposes). Closer to energy prediction, a range of methods have been explored, e.g., weighted averages of the energy received during the same time of day over a few previous days (Cox 1961). Piorno et al. (2009) extended this idea to include current-day energy production values as well. However, these works did not explicitly use the associated weather conditions as a basis for modeling. Sharma et al. (2011b) considered the impact of weather conditions explicitly and used an SVM classifier with an RBF kernel to predict solar irradiation. In earlier work (Sharma et al. 2011a), the same authors showed that irradiation patterns
and solar PV generation obey a highly linear relationship, and thus concluded that predicting irradiance implicitly predicts solar PV generation. In other work, Lorenz et al. (2009) used a benchmarking approach to estimate the power generation of photovoltaic cells. Bofinger et al. (2006) proposed an algorithm in which the mid-range forecasts of a European weather prediction center were refined by local statistical models to obtain a fine-tuned forecast. Other work on temporal modeling with applications to sustainability focuses on motif mining; e.g., Patnaik et al. (2011) proposed a novel approach to convert multivariate time-series data into a stream of symbols and mine frequent episodes in the stream to characterize sustainable regions of operation in a data center. Hao et al. (2011) describe an algorithm for peak-preserving time series prediction with application to data centers.

Problem Formulation

Our goal is to predict photovoltaic (PV) power generation from (i) historic PV power generation data and (ii) available weather forecast data. Without loss of generality, we focus on fine-grained prediction for the next day in one-hour intervals. Such predictions are useful for scheduling delay-tolerant workload in a data center that follows renewable supply (Krioukov et al. 2011). Furthermore, these predictions need to be updated as more accurate weather forecast data and generation data from earlier in the day become available.

Let a_{i,j} denote the actual PV generation for the j-th hour of the i-th day. Let I be the number of days and J the maximum number of hours per day for which data is available. The actual PV generation values for all time points can then be expressed as an I × J matrix, A = [a_{i,j}]_{I×J}. Corresponding to each entry of PV generation, there is a vector of weather conditions.
Assuming K unique weather attributes (such as temperature, humidity, etc.), each time point is associated with a vector ω_{i,j} = (ω_{i,j}[1], ω_{i,j}[2], ..., ω_{i,j}[K]), where ω_{i,j}[k] denotes the value of the k-th weather condition for time slot (i, j). Corresponding to the PV generation matrix, we can define a matrix of weather conditions, ω = [ω_{i,j}]_{I×J}. Finally, for each time slot, weather forecast data is continuously collected. The predicted weather condition for the j-th hour of the i-th day, at an offset of t hours, is given by the vector ρ_{i,j,t} = (ρ_{i,j,t}[1], ρ_{i,j,t}[2], ..., ρ_{i,j,t}[K]). Then, with reference to time slot (e, f), given the values of a_{i,j} and ω_{i,j} for all (i, j) up to (e, f), and the weather forecasts ρ_{e,f+1,1}, ρ_{e,f+2,2}, ..., ρ_{e,J,J−f}, we need to predict the PV generation for the remaining time slots of the day, i.e., (e, f+1), ..., (e, J).

Methods

We propose a novel Bayesian ensemble method that aggregates diverse predictors to solve the PV prediction problem described above. An architectural overview of the method is shown in Figure 1. As an initial pre-processing step, we determine the common modes of daily PV generation profiles via distance-based clustering of historical generation data. Once such profiles are discovered, we represent the unknown PV generation of a day as a mixture of these profiles. The mixture coefficients are independently estimated using three different predictors and finally averaged using a Bayesian approach. These predictors are (i) a modified naive Bayes classifier that derives its features from the weather forecast; (ii) a weighted k-nearest-neighbor classifier that uses past generation during the same day as features; and (iii) a predictor based on motifs of day profiles in the recent past.

Figure 1: Schematic diagram of the proposed Bayesian ensemble method. All the predictors make use of the profiles discovered from historical data.

Profile Discovery.
The first step is profile discovery, which takes as input the available historic PV generation data and outputs characteristic day-long profiles. Let the actual PV generation for the entire i-th day be the vector x_i = (a_{i,1}, a_{i,2}, ..., a_{i,J}). Then the entire PV dataset can be expressed as A = (x_1, x_2, ..., x_I)^T. This dataset A is clustered into N clusters by the k-means algorithm (Lloyd 1982), using the Euclidean distance between the J-dimensional feature vectors (i.e., the power generation for each day). The value of N is a parameter of the ensemble method and is estimated by minimizing the cross-validation error of the ensemble, keeping the other parameters fixed. This yields N day-long profiles. Let us denote these profiles by D = {D_1, D_2, ..., D_N} and the corresponding centroids by μ = {μ_1, μ_2, ..., μ_N}. This step needs to be run only once on the entire dataset.

Naive Bayes predictor.

The NB predictor estimates the mixture coefficients given the weather forecast, assuming conditional independence of features (we assume they follow Gaussian distributions). Denote by γ all the training information obtained from the weather-PV table, such as the likelihood functions and the priors of the profiles, and let the weather forecast be ρ_i,j = (ρ_{i,j+1,1}, ρ_{i,j+2,2}, ..., ρ_{i,J,J−j}). The posterior probability of the profile labels, over the remaining time slots, is computed
as:

Pr(D_n | ρ_{i,j+t,t}, γ) ∝ [ Π_k L(D_n | ρ_{i,j+t,t}[k]) ] Pr(D_n)

finally giving

Pr(D_n | ρ_i,j, γ, C_1) = [ Π_{t=1}^{J−j} Pr(D_n | ρ_{i,j+t,t}, γ) ] / [ Σ_{n=1}^{N} Π_{t=1}^{J−j} Pr(D_n | ρ_{i,j+t,t}, γ) ]    (1)

where C_1 denotes classifier 1.

k-NN based predictor.

The k-NN based predictor (Dudani 1976) uses prior PV generation during the same day as features and assigns mixing coefficients based on the Euclidean distance from the centroids of the discovered daily profiles. To make a prediction at the j-th hour of the i-th day for the rest of that day, we consider the already observed PV output values for the i-th day, x_i(1:j) = (a_{i,1}, a_{i,2}, ..., a_{i,j}). Next, we compute the Euclidean distance of this vector to the truncated centroid vectors (first j dimensions) of the PV profiles, and obtain the probability of the i-th day belonging to a cluster as:

Pr(x_i ∈ D_n | x_i(1:j), C_2) = (1/φ) · 1 / ||x_i(1:j) − μ_n(1:j)||²    (2)

where φ is a normalizing constant, φ = Σ_n 1 / ||x_i(1:j) − μ_n(1:j)||², and C_2 denotes classifier 2.

Motif based predictor.

The final predictor exploits the sequentiality of PV generation across successive days to find motifs and gives membership estimates of the profiles based on those motifs. For this step, we consider the entire PV dataset as a stream of profile labels. We further consider a window size W, the maximum number of past days that can influence the profile, and treat the stream as a group of vectors of the form (d_{i−1}, d_{i−2}, ..., d_{i−W}), where d_j ∈ D (j < i) denotes the profile label of the j-th day. Sliding the window, we obtain different such vectors and can mine for motifs.

Definition 1: For a window W_i (|W_i| = W) containing labels d_{i−W}, ..., d_{i−2}, d_{i−1}, eligible episodes are defined as all sequences ep = (d_{p1}, d_{p2}, d_{p3}, ...) such that p_1 < p_2 < p_3 < ....

Definition 1 formalizes the term "eligible". As evident from the definition, we allow episodes to contain gaps; the only criterion is that they must maintain the temporal order.
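Returning to the k-NN predictor, the inverse-squared-distance weighting of Eq. (2) can be sketched as below. This is an illustrative reimplementation, not the authors' code; the guard against a zero distance is our own addition.

```python
import numpy as np

def knn_mixing_coefficients(x_prefix, centroids):
    """Mixing coefficients for classifier C2: the weight of profile D_n is
    proportional to 1 / ||x_i(1:j) - mu_n(1:j)||^2, normalized so the
    weights sum to one (the constant phi in Eq. 2)."""
    x_prefix = np.asarray(x_prefix, dtype=float)
    j = len(x_prefix)
    # squared distance to each centroid, truncated to the observed hours
    d2 = np.array([np.sum((x_prefix - np.asarray(mu)[:j]) ** 2)
                   for mu in centroids])
    d2 = np.maximum(d2, 1e-12)  # avoid division by zero on an exact match
    weights = 1.0 / d2
    return weights / weights.sum()
```

For example, an observed morning close to the first profile's centroid yields a mixture dominated by that profile.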
Furthermore, we define an episode ep_1 to be a sub-episode of ep_2, denoted ep_1 ⊑ ep_2, if ep_1 is a subsequence of ep_2. From this definition it is evident that if ep_1 ⊑ ep_2 then ep_2 ∈ W_j implies ep_1 ∈ W_j; i.e., the sub-episode property is anti-monotonic. The support of an episode ep_i, denoted sup_{ep_i}, is the number of windows W_j such that ep_i ∈ W_j. Finally, we can define a maximal frequent eligible episode as follows:

Definition 2: An eligible episode ep_i is maximal frequent iff:

1. sup_{ep_i} > τ; and
2. there is no frequent eligible episode ep_j such that ep_i ⊑ ep_j and ep_i ≠ ep_j.

Figure 2: Illustration of the process of finding motifs.

Thus, we can use an Apriori approach (Agrawal and Srikant 1994) to prune episodes when searching for the maximal frequent ones (as the sub-episode relation is anti-monotonic). From the training dataset, we consider such windows of size W and, through Apriori counting as in (Patnaik et al. 2011), obtain the entire corpus of maximal frequent episodes (FE) and their corresponding supports. We refer to these maximal frequent episodes as motifs. Here the support threshold τ is a parameter; as before, we estimate its value through cross-validation, keeping the other parameters constant. When predicting for the i-th day, we need to find motifs that contain the i-th label together with the labels of some of the previous days. For this we choose the immediately preceding window of size W − 1 (since the label for the i-th day must already be part of the motifs). To find the mixing coefficient of profile D_n for the i-th day, we consider all maximal frequent episodes that end with D_n. Let us denote the set of all such episodes by ep(D_n) = {ep(D_n)_1, ..., ep(D_n)_P}, where P denotes the number of such episodes in the set.
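This support-based membership estimate can be sketched as follows. The gapped-subsequence containment test and the uniform fallback when no motif matches are our own illustrative choices, not details specified by the paper.

```python
def occurs_in(episode, window):
    """True if `episode` occurs in `window` as a subsequence: gaps are
    allowed, but the temporal order must be preserved."""
    it = iter(window)
    return all(label in it for label in episode)

def motif_membership(motifs, profiles, recent_window):
    """Mixing coefficients for classifier C3: sum the supports of the mined
    motifs that end in profile D_n and whose remaining labels occur in the
    immediately preceding window of W-1 profile labels, then normalize."""
    score = {n: 0.0 for n in profiles}
    for episode, support in motifs:  # motifs: [(label sequence, support), ...]
        if episode[-1] in score and occurs_in(episode[:-1], recent_window):
            score[episode[-1]] += support
    total = sum(score.values())
    if total == 0:  # no motif matched: fall back to a uniform estimate
        return {n: 1.0 / len(profiles) for n in profiles}
    return {n: s / total for n, s in score.items()}
```

Note the `label in it` idiom consumes the iterator, so repeated labels in the episode must appear in order in the window.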
Then, within a window W_i, the support of the entire set is given by:

sup(ep(D_n)) = Σ_p sup_{ep(D_n)_p}

and the membership of a profile is given by:

Pr(x_i ∈ D_n | x_{i−1}, x_{i−2}, ..., x_{i−(W−1)}, C_3) = sup(ep(D_n)) / Σ_{n=1}^{N} sup(ep(D_n))    (3)

where C_3 denotes classifier 3. This counting step can potentially be very expensive. However, for the current problem the best window sizes are small. Through cross-validation we picked a window of size
5 (from the range 3 to 10). Hence the counting step, even if naively implemented, is not too expensive.

Bayesian Model Averaging

Finally, we aggregate the memberships obtained from the three predictors and combine them to arrive at the final prediction as follows. Let D_{i,j} = (x_i(1:j), ρ_i,j, x_{i−1}, x_{i−2}, ..., x_{i−(W−1)}) denote all the data available at time slot (i, j). Then:

P(x_i ∈ D_n | D_{i,j}) = Σ_{l=1}^{3} P(x_i ∈ D_n, C_l | D_{i,j})
 = Σ_{l=1}^{3} P(x_i ∈ D_n | C_l, D_{i,j}) P(C_l | D_{i,j})
 = P(x_i ∈ D_n | C_1, ρ_i,j) P(C_1 | D_{i,j}) + P(x_i ∈ D_n | C_2, x_i(1:j)) P(C_2 | D_{i,j}) + P(x_i ∈ D_n | C_3, x_{i−1}, ..., x_{i−(W−1)}) P(C_3 | D_{i,j})    (4)

We use Bayesian Model Averaging (BMA), as outlined in (Raftery et al. 2005), operating on mutually exclusive parts of the data to compute the values of P(C_l | D_{i,j}):

P(C_l | D_{i,j}) ∝ P(D_{i,j} | C_l) P(C_l)    (5)

Assuming a uniform prior over the classifiers, equation (5) can be written as:

P(C_l | D_{i,j}) ∝ P(D_{i,j} | C_l)    (6)

The value of P(D_{i,j} | C_l) can be viewed as the proportion of the data explained (truly predicted) by classifier C_l. It can be estimated by constructing a confusion matrix and taking the relative frequency of true positives as an estimate. The predicted solar PV values can then be estimated as:

E(x_i(j+1:J) | D_{i,j}) = Σ_{x_i(j+1:J)} x_i(j+1:J) P(x_i(j+1:J) | D_{i,j})
 = Σ_{x_i(j+1:J)} x_i(j+1:J) Σ_{n=1}^{N} P(x_i(j+1:J), x_i ∈ D_n | D_{i,j})
 = Σ_{x_i(j+1:J)} x_i(j+1:J) Σ_{n=1}^{N} P(x_i(j+1:J) | x_i ∈ D_n) P(x_i ∈ D_n | D_{i,j})
 = Σ_n [ Σ_{x_i(j+1:J)} x_i(j+1:J) P(x_i(j+1:J) | x_i ∈ D_n) ] P(x_i ∈ D_n | D_{i,j})
 = Σ_n μ_n(j+1:J) P(x_i ∈ D_n | D_{i,j})    (7)

Baseline Models

Previous day as prediction. This is the simplest baseline model: the prediction for a particular hour of the day is simply the PV generation during the same hour on the previous day.

Autoregression with weather. In this model, autoregression outputs as in (Piorno et al. 2009) are explicitly modified by the influence of weather information.
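The model averaging of Eqs. (4)-(7) above reduces, under the uniform classifier prior, to an evidence-weighted mixture. The sketch below is a hypothetical reimplementation; the dictionary-based data layout is our own choice.

```python
def bma_combine(memberships, evidence):
    """Combine the per-classifier profile memberships (Eq. 4): `evidence`
    holds P(D_ij | C_l) for each classifier, which under a uniform prior
    is proportional to P(C_l | D_ij) (Eqs. 5-6)."""
    z = float(sum(evidence))
    weights = [e / z for e in evidence]  # normalized P(C_l | D_ij)
    return {n: sum(w * m[n] for w, m in zip(weights, memberships))
            for n in memberships[0]}

def expected_generation(membership, centroids, j, J):
    """Point prediction for hours j+1..J (Eq. 7): the membership-weighted
    mixture of the truncated profile centroids mu_n(j+1:J)."""
    return [sum(membership[n] * centroids[n][t] for n in membership)
            for t in range(j, J)]
```

Each predictor contributes its full membership vector, so a classifier with low evidence still nudges the mixture rather than being discarded outright.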
All weather attributes except sunrise and sunset times are clustered into N_C groups using k-means clustering (Lloyd 1982), based on the actual solar PV generation corresponding to these conditions in the training set. A set of such weather attributes is represented by the mean solar PV value of the corresponding cluster, denoted l(i, j). The model can then be given separately for each prediction offset t ≥ 1 (in hours) as:

A_{i,j} = β₁ᵗ + β₂ᵗ l(i, j) + β₃ᵗ [j − SR(i)] + β₄ᵗ [ST(i) − j] + β₅ᵗ P^a_{i,j,t} + ε    (8)

where P^a_{i,j,t} is the autoregressed prediction from (Piorno et al. 2009), SR(i) and ST(i) are the sunrise and sunset times of day i, ε is the error term we try to minimize, and the weights {β} are estimated using linear least-squares regression.

Stagewise modelling. Another baseline is a stagewise model, drawing inspiration from (Hocking 1976), where the prediction is made in correlated stages, improving the result at every additional stage. Here we fit an average model to the actual data as the first stage, autoregression as the next, and weather regression as the final one, where only the error values from the preceding stage are passed on to the next.

Experimental Results

Datasets. Data was collected from a 154 kW PV installation at a commercial building in Palo Alto, CA. PV output (in kWh) in 5-minute intervals was collected from March 2011 to November 2011 (267 days) and aggregated hourly. Hourly weather data was collected from a nearby weather station and included temperature, humidity, visibility, and weather conditions, which mainly relate to cloud cover and precipitation. We also needed weather forecast data, which is typically not available after the fact, so we ran a script to collect weather forecast data every hour. We also collected solar and weather data from a site in Amherst, MA.

Data preparation. The data for this experiment contains a total of 3,747 data points with 69 missing values.
While the PV generation values are numeric, the corresponding weather data contains both numeric and categorical variables. The numeric values were imputed using linear interpolation, while a majority mechanism over a window was used for the categorical ones. We use 6-fold cross-validation for evaluating predictor performance and for estimating some parameters. To account for seasonal dependence, each month is split uniformly among all folds.

Parameters. Some heuristics were applied to compute the probabilities of the Bayesian ensemble method. As mentioned, the likelihoods P(D_{i,j} | C_l) need to be estimated for the classifiers. We assume these values depend only on the hour of the day, i.e., we neglect the effect of seasons on classifier beliefs. Under this assumption, the values would ideally be estimated for each hour; instead, we apply a simple heuristic. We assume that the data is progressively explained better by the k-NN estimator (C_2) as the day progresses, while the motif estimator (C_3), which estimates in a global sense, i.e., without looking at the current day's data, explains the data in a consistent manner irrespective of the hour of the day. These heuristics are:

P(D_{i,j} | C_3) = θ
P(D_{i,j} | C_2) = min(1 − θ, αj + β)
P(D_{i,j} | C_1) = 1 − θ − P(D_{i,j} | C_2)    (9)

where all the values on the left-hand side are bounded between 0 and 1 during computation.
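The heuristic of Eq. (9) can be written directly as a function of the hour; the parameter values in the signature below are purely illustrative defaults, not the values used in the paper.

```python
def classifier_evidence(j, theta=0.2, alpha=0.05, beta=0.1):
    """Heuristic likelihoods P(D_ij | C_l) as a function of the hour j:
    the motif predictor C3 gets a constant share, the k-NN predictor C2
    explains more of the data as the day progresses, and the naive Bayes
    predictor C1 takes the remainder; all values are clamped to [0, 1]."""
    clamp = lambda v: min(1.0, max(0.0, v))
    p3 = clamp(theta)
    p2 = clamp(min(1.0 - theta, alpha * j + beta))
    p1 = clamp(1.0 - theta - p2)
    return p1, p2, p3
```

As the day progresses, weight shifts from the forecast-driven NB predictor toward the k-NN predictor, which has more of the current day's generation to work with.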
Predictor training. For all methods, parameters (including the heuristic combination weights) were selected by training over a range of values and picking the setting with the least cross-validation error. The basic daily generation profiles were extracted by k-means clustering over the entire dataset. The number of clusters was set to 10 based on cross-validation over 5 to 15 clusters. Some of the daily profiles obtained are shown in Figure 3. Each plot represents a cluster, and the intra-cluster variance is shown by boxplots. Cluster 2 shows a high level of generation and low variance, likely corresponding to clear summer days, while Cluster 6 is likely related to cloudy conditions, winter days when the angle of the sun is low, or periods when some of the PV panels are not operating due to an anomaly.

Figure 3: Six of the ten daily profiles obtained via clustering.

For the motif-based predictor, a support threshold of τ = 70 and a window size of 5 were used. A total of 49 maximal frequent episodes were found. Since this number is not large, even a naive strategy for counting the support of motifs that end in a particular symbol suffices during prediction. Also, since the motifs are day-long, this needs to be done only once per day.

Error metrics. For comparing the performance of the models, three distinct error metrics were considered. Using A to denote the actual output and P the predicted output, these are defined as:

(1) Percentage absolute error: Σ_{A_{i,j}>3} |A_{i,j} − P_{i,j}| / A_{i,j} × 100%;
(2) Percentage root mean square error: sqrt( mean_{A_{i,j}>3} ((A_{i,j} − P_{i,j}) / A_{i,j})² ) × 100%;
(3) Relative absolute error: Σ_{i,j} |P_{i,j} − A_{i,j}| / Σ_{i,j} |A_{i,j} − Ā_j| × 100, where Ā_j is the mean actual output for hour j.

For (1) and (2), only values with A_{i,j} > 3 are considered, as errors corresponding to small denominators may dominate the overall error; moreover, for the system concerned, any PV generation less than 3 kWh is unusable.

Results. Based on the cross-validation error, we evaluate two variants of the proposed method: Ensemble2, a Bayesian combination of two predictors (NB and k-NN), and Ensemble3, which includes the motif-based predictor as well. In addition, three baseline methods are included: previous day as prediction (PreviousDay), autoregression with weather (ARWeather), and stagewise regression (Stagewise). The results of these methods are summarized in Table 1. Our proposed methods perform better than the three baseline methods: Ensemble3 performs about 4% better than the best baseline method (Stagewise), and about 1% better than Ensemble2. We also present an unpaired t-test of the competing methods against Ensemble3 in Table 2 and find that our results are statistically significant. The box plot of percentage errors together with the raw values is shown in Figure 4; the red dot indicates the average value. The average error and variance are lowest for Ensemble3. As expected, PreviousDay fares the worst. Figure 5 shows the residual errors for the five methods. Again, the superior performance of Ensemble2 and Ensemble3 (both in terms of average and variance) is apparent.

Table 1: Performance at 1-hour offset. Columns: percentage absolute error, percentage RMS error, relative absolute error; rows: PreviousDay, ARWeather, Stagewise, Ensemble2, Ensemble3. (Numeric entries not preserved in this copy.)

Figure 4: Comparison of the error (%) of different methods.

Figure 5: Residual error of the methods.
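The three error metrics above can be computed as follows. The array layout (days × hours) and the helper itself are our sketch; the 3 kWh cutoff follows the paper's definition.

```python
import numpy as np

def error_metrics(A, P, cutoff=3.0):
    """Percentage absolute error and percentage RMS error (both restricted
    to hours where actual output exceeds `cutoff`), and relative absolute
    error normalized by deviation from the hourly mean output.
    A and P are (days x hours) arrays of actual and predicted kWh."""
    A, P = np.asarray(A, dtype=float), np.asarray(P, dtype=float)
    mask = A > cutoff
    pct_abs = 100.0 * np.mean(np.abs(A[mask] - P[mask]) / A[mask])
    pct_rms = 100.0 * np.sqrt(np.mean(((A[mask] - P[mask]) / A[mask]) ** 2))
    hourly_mean = A.mean(axis=0)  # A-bar_j, the mean actual output at hour j
    rel_abs = 100.0 * np.abs(P - A).sum() / np.abs(A - hourly_mean).sum()
    return pct_abs, pct_rms, rel_abs
```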
Figure 6: Error conditioned on weather and hour of day (Ensemble3 method).

Table 2: Unpaired t-tests on 1-hour prediction data compared against Ensemble3 (N = 6, degrees of freedom = 10). The differences are extremely significant for PreviousDay, ARWeather, and Stagewise, and very significant for Ensemble2. (Numeric standard errors, t values, and confidence intervals not preserved in this copy.)

Figure 6 shows the percentage error distribution for Ensemble3 conditioned on weather and hour of day. The errors are lowest for clear days and worst for rain conditions; cloudy conditions increase both the average error and the variance.

We observed that some of the predictions were higher than the capacity of the PV system, so for all methods we capped the maximum predicted output. In fact, we could do better: from almost a year of data, we found the maximum generation for each hour and bounded the output by that amount plus a small margin. This could be further improved by conditioning on the month (season) as well. This optimization was applied to all methods, and the gain in performance was in the range of 0.6% to 1%.

Updating predictions. As we get closer to the hour for which a prediction is made, we update the prediction based on the improved weather forecast and the PV generation already observed that day. Figure 7 shows the progressive improvement in the average accuracy of the PV output prediction for 12pm. The plot shows the cross-validation percentage absolute errors with the standard deviation marked. The Bayesian ensemble method (Ensemble3) performs best. One likely reason is that the ensemble method predicts memberships of the characteristic daily profiles and thus takes a more global view. The regression models, on the other hand, are tailored to one-hour prediction; for offsets of more than one hour, future-hour predictions are made by treating earlier predictions as the true values for the unknown hours. Furthermore, the standard deviation of the ensemble method is lower than that of the other models (except the average one), mainly because we are selecting among known patterns, so the standard deviation results from differences in the mixing coefficients rather than from completely new predictions, as is the case in the regression models.

Figure 7: Change in error (%) as the prediction is updated.

Applying fine-grained PV prediction to data centers. Accurate prediction of PV generation is an important functionality required by proposed net-zero energy data centers (Arlitt et al. 2012). Data centers use a lot of electricity, and their operators are looking for ways to reduce their dependence on public electric grids. Within data centers, an opportunity exists to reschedule non-critical (i.e., delay-tolerant) workloads so that they run when PV-generated electricity is available, giving operators the option of turning off IT and cooling infrastructure at other times of the day. Accurate predictions of the available PV output enable this renewable source of energy to be utilized much more effectively. For example, Figure 8(a) shows a typical data center workload in which non-critical jobs are scheduled uniformly throughout the day. If accurate, fine-grained PV predictions are available, the non-critical jobs can instead be scheduled to run only when PV power is available (Figure 8(b)). In this specific case, the optimized workload schedule results in 65% less grid energy being consumed than with the original schedule, as the data center is able to consume most of the renewable energy directly (Arlitt et al. 2012).

Discussion

We have demonstrated a systematic approach to integrating multiple predictors for PV output prediction. Our encouraging results set the foundation for more detailed modeling. In particular, it is known (Sharma et al. 2011b) that solar irradiance and PV generation generally follow a highly linear relationship.
Thus, our models can easily be adapted to predict solar irradiance. We have analyzed historical data for the site mentioned in (Sharma et al. 2011b) for the period March 2006 to May. The cross-fold average RMS error of the Bayesian ensemble method over the
entire period was found to be 101 W/m². This is significantly lower than for the SVM-RBF model proposed in (Sharma et al. 2011b), where, over an observation period of Jan 2010 to Oct 2010, the reported prediction RMS error was 128 W/m². (However, due to the unavailability of predicted weather conditions for the site, we used actual weather conditions to make our prediction.) Ongoing work is focused on validating the efficacy of predicting irradiance as a precursor to PV generation.

Figure 8: (a) Data center workload schedule; (b) optimized workload schedule based on predicted PV energy supply (Arlitt et al. 2012).

References

Agrawal, R., and Srikant, R. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB).
Arlitt, M.; Bash, C.; Blagodurov, S.; Chen, Y.; Christian, T.; Gmach, D.; Hyser, C.; Kumari, N.; Liu, Z.; Marwah, M.; McReynolds, A.; Patel, C.; Shah, A.; Wang, Z.; and Zhou, R. 2012. Towards the design and operation of net-zero energy data centers. In 13th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm).
Bofinger, S., and Heilscher, G. 2006. Solar electricity forecast - approach and first results. In 20th European PV Conference.
Brockwell, P. J., and Davis, R. A. 2002. Introduction to Time Series and Forecasting. New York: Springer-Verlag, 2nd edition.
Cox, D. R. 1961. Prediction by exponentially weighted moving averages and related methods. Royal Statistical Society 23(2).
Dudani, S. A. 1976. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics SMC-6(4).
Hao, M.; Janetzko, H.; Mittelstädt, S.; Hill, W.; Dayal, U.; Keim, D.; Marwah, M.; and Sharma, R. 2011. A visual analytics approach for peak-preserving prediction of large seasonal time series. In EuroVis.
Hocking, R. R. 1976. A Biometrics invited paper: The analysis and selection of variables in linear regression.
Biometrics 32(1).
Krioukov, A.; Goebel, C.; Alspaugh, S.; Chen, Y.; Culler, D.; and Katz, R. 2011. Integrating renewable energy using data analytics systems: Challenges and opportunities. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
Lloyd, S. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2).
Lorenz, E.; Remund, J.; Müller, S. C.; Traunmüller, W.; Steinmaurer, G.; Pozo, D.; Ruiz-Arias, J. A.; Fanego, V. L.; Ramirez, L.; Romeo, M. G.; Kurz, C.; Pomares, L. M.; and Guerrero, C. G. 2009. Benchmarking of different approaches to forecast solar irradiance. In 24th European Photovoltaic Solar Energy Conference.
Montgomery, D. C.; Jennings, C. L.; and Kulahci, M. 2008. Introduction to Time Series Analysis and Forecasting. Wiley.
Patnaik, D.; Marwah, M.; Sharma, R. K.; and Ramakrishnan, N. 2011. Temporal data mining approaches for sustainable chiller management in data centers. ACM Transactions on Intelligent Systems and Technology 2(4).
Piorno, J. R.; Bergonzini, C.; Atienza, D.; and Rosing, T. S. 2009. Prediction and management in energy harvested wireless sensor nodes. In 1st International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology (Wireless VITAE).
Raftery, A. E.; Gneiting, T.; Balabdaoui, F.; and Polakowski, M. 2005. Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review 133.
Sharma, N.; Gummeson, J.; Irwin, D.; and Shenoy, P. 2011a. Leveraging weather forecasts in energy harvesting systems. Technical report, University of Massachusetts Amherst.
Sharma, N.; Sharma, P.; Irwin, D.; and Shenoy, P. 2011b. Predicting solar generation from weather forecasts using machine learning. In Proceedings of the Second IEEE International Conference on Smart Grid Communications (SmartGridComm).
A Study on the Comparison of Electricity Forecasting Models: Korea and China
Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data
A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction
Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction Jin Xu, Shinjae Yoo, Dantong Yu, Dong Huang, John Heiser, Paul Kalb Solar Energy Abundant, clean, and secure
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Machine Learning in Statistical Arbitrage
Machine Learning in Statistical Arbitrage Xing Fu, Avinash Patra December 11, 2009 Abstract We apply machine learning methods to obtain an index arbitrage strategy. In particular, we employ linear regression
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
Detection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Environmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR
VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR Andrew Goldstein Yale University 68 High Street New Haven, CT 06511 [email protected] Alexander Thornton Shawn Kerrigan Locus Energy 657 Mission St.
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION
MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION Matthew A. Lanham & Ralph D. Badinelli Virginia Polytechnic Institute and State University Department of Business
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
Wikipedia in the Tourism Industry: Forecasting Demand and Modeling Usage Behavior
kipedia in the Tourism Industry: Forecasting Demand and Modeling Usage Behavior Pejman Khadivi Discovery Analytics Center Computer Science Department Virginia Tech, Blacksburg, Virginia Email: [email protected]
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
REDUCING UNCERTAINTY IN SOLAR ENERGY ESTIMATES
REDUCING UNCERTAINTY IN SOLAR ENERGY ESTIMATES Mitigating Energy Risk through On-Site Monitoring Marie Schnitzer, Vice President of Consulting Services Christopher Thuman, Senior Meteorologist Peter Johnson,
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort [email protected] Motivation Location matters! Observed value at one location is
Modeling of System of Systems via Data Analytics Case for Big Data in SoS 1
Modeling of System of Systems via Data Analytics Case for Big Data in SoS 1 Barnabas K. Tannahill Aerospace Electronics and Information Technology Division Southwest Research Institute San Antonio, TX,
Local outlier detection in data forensics: data mining approach to flag unusual schools
Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
Clustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
Visual exploration of frequent patterns in multivariate time series
430769IVI0010.1177/1473871611430769Hao et al.information Visualization 2011 Article Visual exploration of frequent patterns in multivariate time series Information Visualization 0(0) 1 13 The Author(s)
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013
A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
CS229 Project Report Automated Stock Trading Using Machine Learning Algorithms
CS229 roject Report Automated Stock Trading Using Machine Learning Algorithms Tianxin Dai [email protected] Arpan Shah [email protected] Hongxia Zhong [email protected] 1. Introduction
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
HT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
Car Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen
Tracking Groups of Pedestrians in Video Sequences
Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal
Structural Health Monitoring Tools (SHMTools)
Structural Health Monitoring Tools (SHMTools) Getting Started LANL/UCSD Engineering Institute LA-CC-14-046 c Copyright 2014, Los Alamos National Security, LLC All rights reserved. May 30, 2014 Contents
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee 1. Introduction There are two main approaches for companies to promote their products / services: through mass
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Machine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov [email protected] Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
Visual Exploration of Frequent Patterns in Multivariate Time Series
Visual Exploration of Frequent Patterns in Multivariate Time Series Ming C. Hao 1, Manish Marwah 1, Halldór Janetzko 2, Umeshwar Dayal 1, Daniel A. Keim 2, Debprakash Patnaik 3, Naren Ramakrishnan 3 and
How To Forecast Solar Power
Forecasting Solar Power with Adaptive Models A Pilot Study Dr. James W. Hall 1. Introduction Expanding the use of renewable energy sources, primarily wind and solar, has become a US national priority.
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
Drugs store sales forecast using Machine Learning
Drugs store sales forecast using Machine Learning Hongyu Xiong (hxiong2), Xi Wu (wuxi), Jingying Yue (jingying) 1 Introduction Nowadays medical-related sales prediction is of great interest; with reliable
Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
Predicting Flight Delays
Predicting Flight Delays Dieterich Lawson [email protected] William Castillo [email protected] Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
Predicting daily incoming solar energy from weather data
Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University - CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting
Probabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg [email protected] www.multimedia-computing.{de,org} References
Big Data Analytic Paradigms -From PCA to Deep Learning
The Intersection of Robust Intelligence and Trust in Autonomous Systems: Papers from the AAAI Spring Symposium Big Data Analytic Paradigms -From PCA to Deep Learning Barnabas K. Tannahill Aerospace Electronics
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Analysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
An Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia [email protected] Tata Institute, Pune,
Towards Online Recognition of Handwritten Mathematics
Towards Online Recognition of Handwritten Mathematics Vadim Mazalov, joint work with Oleg Golubitsky and Stephen M. Watt Ontario Research Centre for Computer Algebra Department of Computer Science Western
Invited Applications Paper
Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK [email protected] [email protected]
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Machine Learning in FX Carry Basket Prediction
Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines
