GMP Data Warehouse Data Analysis and Reporting in GMP DWH


Jiří Jarkovský, Jiří Kalina, Ladislav Dušek, Jana Klánová, Richard Hůlek, Daniel Klimeš, Daniel Schwarz, Petr Holub, Jakub Gregor, Jana Borůvková, Kateřina Šebková


Contents

1 Introduction - guide to chapters
2 Technical note on statistical analysis in GMP
   Introduction
   General principles
      Data pre-processing
      From primary to aggregated POPs concentrations
      Statistical testing and its power
      Exploratory and confirmatory statistics: estimates and comparisons
   Proposed outcomes of the statistical processing of GMP data
      Summary statistics of POPs concentrations in the atmosphere
      Uncertainty analysis
      Power analysis: quantification of minimum detectable difference
      Stochastic identification of time trends
      Quantification of time trends
   References
3 Evaluation of atmospheric concentrations of selected POPs from GMP 1 data collection campaign - content analysis
   Available data in GMP 1 database
   Six levels of data selection
   Statistical methodology
   Results - air monitoring using passive sampling
   Results - air monitoring using active sampling
4 Reporting of GMP data collected in the period and its extension
   World map - monitoring overview
   Available data - parameters
   Available data - years
   Reported values - summary statistics
   Reported values - advanced interactive map visualization
   Time series analysis
5 Documentation of new GIS module proposed for GMP DWH

1 Introduction - guide to chapters

The statistical methodology and data processing in GMP 2 are based on the approaches adopted for the data analysis in GMP 1, with new improvements and extensions that aim at trend comparison. The standards and principles adopted from GMP 1 are presented in the following chapters:
- Technical note on statistical analysis in GMP
- Evaluation of atmospheric concentrations of selected POPs from GMP 1 data collection campaign - content analysis

New improvements and annexes to the proposed data analysis methodology are summarized in the chapters:
- Reporting of GMP data in and its extension
- Documentation of new GIS module for GMP DWH

The robust design of the methods allows the set of algorithms to be applied to the records of any GMP data collection campaign, including subsequent campaigns, and to the comparison and testing of time series. The methodology supports the following tasks:
- Validation and description of data availability
- Summary statistics of reported concentration values
- Visualization of reported concentration values using a novel GIS map technology
- Complex time series analysis of POPs concentrations and confirmatory assessment of time-related changes in contamination

All outcomes of the statistical processing are generated as user-friendly interactive reports that are fully accessible online. The proposed concept of statistical data analysis is depicted in Figure 1.

Figure 1. Factsheet on statistical analysis of GMP2 data

2 Technical note on statistical analysis in GMP

(Published in Guidance on the global monitoring plan for persistent organic pollutants, document UNEP/POPS/COP.6/INF/31, version January 2013.)

2.1 Introduction

The aim of this technical note is to complement the statistical methodology published in the earlier version of the GMP Guidance Document and to specify the methodical procedures that should be employed when processing POPs data reported to the GMP. Based on practical experience with data processing for the first GMP reports, this technical note specifies the obligatory data fields that should be filled in to allow correct interpretation of the data analysis. Finally, the text highlights the most important data pre-processing and processing steps and proposes a logical sequence of statistical outcomes. The methodology is based on robust statistical methods that can be applied generally to the statistical analysis of POPs concentrations in any environmental matrix reported in GMP 2 (air, human tissues, water).

2.2 General principles

2.2.1 Data pre-processing

Correct definition of data is an unavoidable prerequisite of all subsequent statistical analyses. Only reliably reported concentration values can be accepted for spatial or temporal comparisons. Therefore, a multilevel evaluation procedure based on the annually aggregated concentrations is proposed to maintain a high predictive value of the GMP records while avoiding bias in the concentration values.

The evaluation of the first GMP reports revealed a number of challenges related to data standardization, such as the lack of a standardized taxonomy for the listed POPs, their isomers, transformation products and summations. Some records provided detailed primary data, including rarely measured compounds, while others contained only the sums of the key groups of POPs. The data heterogeneity was further increased by the reporting of various toxic equivalents (TEQ), based on the WHO TEF values from different years, rather than concentrations of each PCDD, PCDF and PCB congener. Unclear identification of matrices, units and time scales of reported concentrations, as well as insufficient specification of aggregated data, were also observed. Large volumes of valuable data have been generated in all regions, and further standardization of reporting formats would significantly improve their applicability. More elaborate guidance for handling, reporting and analyzing/interpreting these data will thus improve their applicability and support the development of the second GMP reports.

The practical experience with processing and validating GMP1 records established more precise data handling rules for future data collection. The rules define mandatory data fields which correspond to the standardized structure of the GMP DWH: typology of the background site; definition of the matrix; taxonomy of parameters; sampling frequency (and data aggregation, if applied); and the measured value specified by its unit and data variability.

The proposed data evaluation procedure guarantees comparability of the different samples, taking into account the type of site, matrix, sampling method, time span and sampling frequency. Heterogeneity in these factors might dramatically increase the uncertainty in the final outcomes. The pre-processing procedures also limit the impact of uncontrolled covariates and thus reduce the risk of false trend detection or of neglecting truly significant changes.

a) Initial data filtering stratifies the records according to objective entities, such as site-matrix type and analyzed compounds. The filters must also verify the completeness of the primary GMP DWH database records with respect to the reported sampling frequency, the number of detected LOQs and their handling rules.

b) In the statistical part, the validation procedure excludes obvious extreme or unreliable values from quantitative analyses. The outlying concentrations can be identified by checking their quantile position in the sample distribution function. The estimated mean and standard deviation of log-transformed annually aggregated data can be used to reconstruct the normal or log-normal distribution; the resulting pattern can be used to assess the probabilistic position of the point values.

2.2.2 From primary to aggregated POPs concentrations

Seasonal dynamics are the most important source of variability in the POPs concentrations measured in the atmosphere. This is not the case for human tissue data. Provided that primary concentration data sets (atmosphere) are available, the impact of seasonality can be quantified and extracted from the time series by proper smoothing techniques and adjusted statistical models. Seasonally adjusted time series constitute a pool for the subsequent trend detection and quantification. Annually aggregated data can also be used for spatial and temporal comparisons and for the quantification of time-related trends. Most records reported to the first GMP data collections are annually aggregated arithmetic means. Nevertheless, experience shows that in the future the aggregated values must be reported with appropriate variability estimates generated from primary data (the 5th-95th percentile range and standard deviation are recommended).
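The aggregation step described above can be illustrated with a minimal sketch that turns one year of primary concentrations into an arithmetic mean accompanied by the recommended variability estimates. The function name `aggregate_year`, the use of `None` to mark non-detects and the substitution of non-detects by half the LOQ are assumptions made only for this illustration; the guidance requires that the handling rule be reported but does not prescribe one here.

```python
import statistics


def aggregate_year(values, loq, nd_factor=0.5):
    """Aggregate one year of primary concentrations (hypothetical helper).

    values    -- list of floats; None marks a non-detect (below LOQ)
    loq       -- limit of quantification, same unit as the values
    nd_factor -- assumed substitution rule: non-detect -> nd_factor * loq
    """
    # Replace non-detects by the assumed substitution value.
    substituted = [nd_factor * loq if v is None else v for v in values]
    # 19 cut points for n=20; the first is the 5th, the last the 95th percentile.
    cuts = statistics.quantiles(substituted, n=20, method="inclusive")
    return {
        "n": len(values),
        "n_non_detects": sum(v is None for v in values),
        "mean": statistics.fmean(substituted),
        "sd": statistics.stdev(substituted),
        "p05": cuts[0],
        "p95": cuts[-1],
    }


if __name__ == "__main__":
    print(aggregate_year([1.0, 2.0, None, 3.0], loq=1.0))
```

Returning the non-detect count together with the mean keeps the aggregated record self-describing, so that a later analysis can judge how strongly the substitution rule influenced the annual value.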
The quantity of non-detects (values below LOQ) in the primary records and their handling in the mean calculations must be provided as well.

2.2.3 Statistical testing and its power

Power analysis is an obligatory step for reliably defining the magnitude of the changes identified by the statistical methods. It minimizes the risk of misinterpretation or of incorrect generalization on the basis of the observed values. The power calculation must be applied both to the statistical trend detection (Mann-Kendall algorithms) and to the quantification (paired tests assessing the difference in the annual arithmetic or geometric means). Two approaches are recommended for GMP data testing, in particular for time trend detection:

a) Quantification of the minimum detectable difference between the annually aggregated values, which allows the identified changes to be benchmarked against the statistically detectable levels;

b) Prospective calculation of the sample size needed to detect a given relative time change in the POPs concentrations (e.g. a 50% annual decrease).

2.2.4 Exploratory and confirmatory statistics: estimates and comparisons

Simplicity and robustness are the main principles when processing the GMP records. Non-parametric tests and summary statistics with no or negligible assumptions about the distribution patterns are highly recommended:
- Median estimates supplied with a 5th-95th percentile range, and geometric means estimated from log-transformed data with a corresponding 95% confidence interval, are recommended for the summary statistics.
- The Mann-Whitney U test, the Kruskal-Wallis test and the Wilcoxon paired rank sum test are recommended for comparative analyses.
- Spearman's rank correlation coefficient is recommended for correlation analysis.

2.3 Proposed outcomes of the statistical processing of GMP data

The following outcomes are proposed for the statistical processing of GMP data:
1) Summary statistics of POPs concentrations in the atmosphere
2) Uncertainty analysis
3) Power analysis: quantification of the minimum detectable difference as a basis for relevant estimates of changes over time and, if possible, of time trends
4) Stochastic identification of time trends
5) Quantification of time trends

2.3.1 Summary statistics of POPs concentrations in the atmosphere

Annually aggregated POPs concentrations, calculated as arithmetic means of the primary values, can be used for both the quantitative and the qualitative analyses (see also section 2.2). In order to evaluate the heterogeneity of the primary data, the values of the individual compounds have to be assessed for their sample size and type of data sources. Two approaches are recommended for the summary statistics of the concentration values:

a) Median estimate reported together with a 5th-95th percentile range

b) Geometric mean estimate based on the log-transformed annual averages, with a corresponding 95% confidence interval

The variability of the baseline values can be evaluated at local, regional or global scale by merging appropriate data sets. The underlying data pooling (of both primary and aggregated data), however, has to be supported by an uncertainty analysis (see 3.2). Non-parametric tests such as the Mann-Whitney U test or the Kruskal-Wallis test are recommended for the interregional comparison of POPs concentrations. Parametric tests such as ANOVA models or analysis of covariance can be applied only after an effective normalizing transformation of the concentration estimates.

2.3.2 Uncertainty analysis

Data reported to the GMP are generated by a variety of programmes at several background sites in each UN region; therefore, they have to be inspected for intra-regional and inter-regional homogeneity in the annually averaged POPs concentrations. Graphically, regional variability can be reported as the intra-regional 5th-95th percentile range. Sample distribution functions of the regional samples can then be compared and tested by suitable robust methods (Kolmogorov-Smirnov test, Kruskal-Wallis test). The same applies to the geometric means of the averaged concentrations and their 95% confidence intervals. The uncertainty analysis identifies regions or data subsets with an increased intra-regional variability in the annually averaged concentrations, as well as the sources of such variability (evident outlying values should be excluded). Any spatial or temporal comparison should be preceded by an assessment of the internal homogeneity of concentration values in the areas of interest.
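As an illustration of the regional summaries discussed above, the following sketch computes, for each region, the median with its 5th-95th percentile range and the geometric mean with an approximate 95% confidence interval derived on the log scale. The function name `regional_summaries`, the input layout (a dictionary of annually averaged concentrations per region) and the default critical value of 1.96 (a normal approximation in place of the Student's t quantile) are assumptions made for this example only.

```python
import math
import statistics


def regional_summaries(values_by_region, crit=1.96):
    """Summarize annually averaged concentrations per region (hypothetical helper).

    values_by_region -- dict: region name -> list of positive concentrations
    crit             -- critical value for the 95% CI; 1.96 is the normal
                        approximation (an assumption), a t quantile with
                        n - 1 degrees of freedom can be passed instead
    """
    out = {}
    for region, values in values_by_region.items():
        logs = [math.log(v) for v in values]
        mean_log = statistics.fmean(logs)
        # Standard error of the mean on the log scale.
        se_log = statistics.stdev(logs) / math.sqrt(len(logs))
        # 19 cut points for n=20: first = 5th percentile, last = 95th percentile.
        cuts = statistics.quantiles(values, n=20, method="inclusive")
        out[region] = {
            "median": statistics.median(values),
            "p05": cuts[0],
            "p95": cuts[-1],
            "geo_mean": math.exp(mean_log),
            "geo_ci95": (math.exp(mean_log - crit * se_log),
                         math.exp(mean_log + crit * se_log)),
        }
    return out


if __name__ == "__main__":
    demo = {"Region A": [1.0, 10.0, 100.0], "Region B": [2.0, 4.0, 8.0, 16.0]}
    for name, stats in regional_summaries(demo).items():
        print(name, stats)
```

Overlapping percentile ranges or confidence intervals between regions give a first graphical impression of homogeneity; a formal comparison would still rely on the tests named in the text.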

Similarly, homogeneity should also be assessed in the time trend analysis (i.e. the presence and the same direction of the trend change and of the annual difference). A year-to-year difference can be compared among time series based on the individual sites. Such variability can be expressed as a standardized year-to-year difference or as a coefficient of variation (expressed in %). Applying time-related regression models and their residuals is possible as well. In the existing time series, homogeneity (or non-homogeneity) in the year-to-year variance indicates the degree of representativeness and the stability of the identified time trends. The time series reported from various sites can be merged for a more powerful trend analysis only if their homogeneity has been confirmed.

2.3.3 Power analysis: quantification of minimum detectable difference

The heterogeneity and limited accessibility of primary data in the first GMP reports significantly limit the power analysis of time trends for many compounds. Time-related analysis performed on the annually aggregated data (see 2.2) partially reduces its sensitivity and its ability to detect significant differences. Therefore, any time trend analysis must be accompanied by a power analysis, and the identified trends must always be reported together with the corresponding minimum detectable difference. Power analysis estimates the minimum difference between two annually aggregated concentration values that is detectable by a paired t-test on log-transformed data (α = 0.05 and β = 0.20). Appropriate non-parametric alternatives such as the Wilcoxon rank-sum test or the Mann-Kendall test can be used as well, especially where the analysis is based on primary concentration data rather than on normalized data. This approach should first be tested in a pilot study performed on the available primary data sets.

2.3.4 Stochastic identification of time trends

Time trends are identified via a qualitative test for the statistical significance of the time-related changes observed in consecutive measurements. At least five consecutive annually aggregated concentration values are required when assessing time trends using one of the following robust techniques:
- The Daniels test, as an application of Spearman's rank correlation coefficient between the concentration values and the corresponding time ranks;
- The Mann-Kendall test, as a non-parametric test for detecting a trend in a time series, based on binary coding of the changes between measurements consecutive in time.

The direction of the time trend (whether concentration values are increasing or decreasing over time) has to be recorded whenever it is confirmed as statistically significant. In addition, any concentration change over time should be reported in the same way, even if it is not statistically significant. Both the statistically significant and the non-significant changes over time must be correctly quantified in the reports and marked with the p value generated by the appropriate tests (see 2.3.5).

2.3.5 Quantification of time trends

Time trends should be quantified whenever the proper statistical tests confirm significant and consistent time-related differences in POPs concentrations (see 3.4). A quantified trend means a difference Δ = y1 - y2, where y1 and y2 are the annually aggregated concentration values recorded in two consecutive years. The time-related difference in the concentration value should be expressed with the following attributes:
- The difference as an absolute value expressed in concentration units;
- The relative change (%) expressed as an index of the value detected in the baseline year;
- The 95% confidence interval of the time-related difference;
- The p value of the trend test;
- The corresponding minimum detectable annual difference.

In addition, it can be useful to monitor temporal changes of relative contribution, together with the evaluation of the temporal changes in POPs concentrations in core matrices. Such information can provide new insight into the changing primary and secondary sources or into the transport pathways of POPs.

2.4 References

- Bails, D. G. and Peppers, L. C. (1982) Business fluctuations: Forecasting techniques and applications. Englewood Cliffs, NJ: Prentice-Hall.
- Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences (second ed.). Lawrence Erlbaum Associates, Hillsdale, New Jersey.
- Daniels, H.E. (1950) Rank correlation and population models. Journal of Royal Statistical Society Series B, 12.
- Hirsch, R.M., Slack, J.R. and Smith, R.A. (1982) Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1).
- Holoubek, I., Klanova, J., Jarkovsky, J. and Kohoutek, J. (2007) Trends in background levels of persistent organic pollutants at Kosetice observatory, Czech Republic. Part I. Ambient air and wet deposition. Journal of Environmental Monitoring 9(6).
- Kendall, M.G. (1975) Rank Correlation Methods. Charles Griffin, London.
- Libiseller, C. and Grimvall, A. (2002) Performance of Partial Mann-Kendall Test for Trend Detection in the Presence of Covariates. Environmetrics 13.
- Libiseller, C. (2004) MULTMK/PARTMK A program for the computation of multivariate and partial Mann-Kendall tests.
- Limpert, E., Stahel, W. and Abbt, M. (2001) Log-normal Distributions across the Sciences: Keys and Clues. BioScience 51(5).
- Mann, H.B. (1945) Nonparametric tests against trend. Econometrica 13.
- Shasha, D. (2004) High Performance Discovery in Time Series. Berlin: Springer.
- Wei, W. W. (1989) Time series analysis: Univariate and multivariate methods. New York: Addison-Wesley.
- Zar, J.H. (2009) Biostatistical Analysis (5th Edition). Prentice Hall, Upper Saddle River, New Jersey.

3 Evaluation of atmospheric concentrations of selected POPs from GMP 1 data collection campaign - content analysis

3.1 Available data in GMP 1 database

The GMP 1 data repository aggregates air pollution data from multiple sources covering 177 POPs compounds (Tables 1 and 2) from 76 countries (Figure 2). The numbers of countries, sites and measurements are summarized in Table 1.

Table 1. Countries, sampling sites and samples from air monitoring in the five UN regions (samples are aggregated annually)
Columns: CEEC, WEOG, GRULAC, Asia and Pacific, Africa
Rows:
- Number of countries
- Number of sites
- Number of samples
- Active sampling: sites with short time series; sites with long time series
- Passive sampling: sites with short time series; sites with long time series

Table 2. List of compounds reported to the GMP 1 data repository (N=171)

Figure 2. GMP air data availability in the world map (the darker the colour, the higher the quantity of available samples)

3.2 Six levels of data selection

For a reliable assessment of the baseline POPs concentrations, a six-step validation procedure was applied to the annually aggregated GMP1 data. This procedure was introduced in order to maintain a high predictive value of the GMP1 records and to avoid bias in the concentration values. The initial data filtering stratified the records by objective entities such as the sampling matrix, site type and compounds analyzed, and subsequently excluded apparently extreme or unreliable values from the quantitative analysis.

Figure 3. Six-step validation process of the GMP 1 records:
- Air samples: 12,075 records
- Site type filter: background sites (8,649 records); anthropogenic sites (3,426 records)
- POPs filter: selected compounds (3,536 records); other compounds (5,113 records)
- Statistics filter: aggregated values available (3,473 records); incomplete values and unclearly coded sites (63 records)
- Extreme values: non-outlying values (3,455 records); outliers identified by analysis of the sample distribution function (18 records)
- Sampling type: passive sampling (1,611 records); active sampling (1,844 records)

This case study is focused on POPs concentrations in ambient air. Other matrices reported in GMP1 (breast milk and blood) were not included. Therefore, in the first step, only 10,216 records resulting from atmospheric monitoring were used from a total of 16,317 records.

The second validation step verified the relevance and completeness of the identification of the sampling site (spatially aggregated atmospheric samples were excluded, except for the GRULAC region, where they were aggregated over the remote background sites). The subsequent analysis considers only clearly coded background sites: sampling sites with increased pollution stemming from industry, transport or agricultural production were filtered out. These controls resulted in a smaller subset of 7,120 records from well-described background sites.

In the third step, the procedure separated 1,694 records of selected substances that are well represented in GMP1 in order to estimate their concentrations. Among the initial 12 legacy POPs reported in GMP1, DDTs (p,p'-DDT and p,p'-DDE), hexachlorobenzene (HCB) and a sum of 6 polychlorinated biphenyls (PCBs) were selected. In addition, alpha-HCH and gamma-HCH from the newly listed POPs were also reported frequently. Further selection was carried out according to the accessibility of time series data for pilot estimates of historical time trends and of changes over time. At least one time series with five consecutive years of measurement at one site was required for this purpose. This requirement was fulfilled for all of the 12 parameters (groups) listed in Table 3.

The fourth step, the statistical filter, removed 85 records in which the atmospheric concentration was expressed by a statistical characteristic other than the mean value of the measurements performed during the one-year period (i.e. percentiles, minimum, maximum and median estimates, as well as values that covered unclear periods of time, were excluded).

The fifth step excluded statistically verified extreme values, i.e. points exceeding the 95th percentile in a computer-assisted reconstruction of the log-normal distribution of primary values. The final pool of data consisted of 1,560 records, which were classified according to the sampling method, i.e. active and passive samples. Statistical analysis was performed separately for these two groups.

Table 3. Parameters selected for the GMP 1 pilot analysis
- p,p'-DDT
- p,p'-DDE
- hexachlorobenzene (HCB)
- α-hexachlorocyclohexane
- γ-hexachlorocyclohexane
- sum of 6 polychlorinated biphenyls (PCB)
- PCB 28 (2,4,4'-trichlorobiphenyl)
- PCB 52 (2,2',5,5'-tetrachlorobiphenyl)
- PCB 101 (2,2',4,5,5'-pentachlorobiphenyl)
- PCB 138 (2,2',3,4,4',5'-hexachlorobiphenyl)
- PCB 153 (2,2',4,4',5,5'-hexachlorobiphenyl)
- PCB 180 (2,2',3,4,4',5,5'-heptachlorobiphenyl)

The compounds in Table 3 were included in the model training data set to demonstrate the validity of the proposed methodology. The selection of POPs for the analysis was based on the list of 12 initial POPs reported in GMP1, with respect to the availability of data. Furthermore, α-HCH and γ-HCH were also included, since the volume of GMP1 data for these compounds was large enough to enable a reliable analysis.

3.3 Statistical methodology

The statistical analysis was carried out for the 5 individual parameters listed in Table 3 and separately for the sum of the six indicator PCB congeners. Quantitative and qualitative analyses were performed solely on the annually aggregated concentration values reported in GMP1 or on values obtained from GMP1 primary data as arithmetic means. All statistical calculations were based on standard and robust methods, without excessive requirements on the sample size or on the shape of the sample distribution of primary values.

The sample size and range of data sources were assessed for each parameter. This step is very important for clarifying the heterogeneity of the primary GMP1 records, as some compounds were reported from several regions/sites, while others were reported from only one site. The following descriptive items were evaluated and summarized:
- Data source: information on the site/laboratory/project reporting the data. In the case of a single source, this is directly identified. For a more robust estimation, data combined from several reporting units were taken; in that case the "multiple reports" code is used.
- Period: time span (in years) for which data assessment can take place.
- No. of records: sample size used for the calculation of the relevant statistical quantity.
- Value: the value of the statistical quantity (e.g. quantified concentrations, time trend, etc.), if available; otherwise the "N/A" code was used.

The following statistical measures were examined as key endpoints:

1) Summary statistics of POPs concentrations. A robust set of descriptive statistics based on the selected data records was applied in the computation of:

a) Regional median supplemented by the 5th and 95th percentiles: computed for all records over the accessible sampling period. This value describes the overall range of all values and serves as a robust description of the central tendency in the sample data distribution.

b) Regional geometric mean and its 95% confidence interval: computed over the accessible sampling period. It provides a parametric estimate of the concentration based on the assumption of a log-normal distribution of values.

c) Regional arithmetic mean: provides a supplementary description of the concentration values in the region over the accessible sampling period. This procedure was used as a quantitative estimate, although the sample distributions do not correspond to the normal distribution model.

2) Detectable alternative for time trend estimates, expressed as the minimum detectable annual difference. It was estimated in the available time series using power analysis for the minimum statistically significant detectable difference between two arithmetic means (annually averaged concentration values) in a paired t-test with α = 0.05 and power = 0.80. The difference between the start and the end of the series was computed and divided by the length of the time series to obtain the average annual difference for each site with at least two concentration values available. The standard deviation of the differences was used as input to the standard power analysis of the paired t-test, together with the desired power of 0.80 (1 - β) and α = 0.05. The paired t-test was selected as a standard test for analysing differences in situations where the computed differences follow a normal distribution; this was also valid for cases where the primary concentration data were log-normally distributed. The computation is based on the following formula:

$$DA = t_{1-\alpha/2,\,\nu} \cdot \frac{SD}{\sqrt{N}}$$

where $DA$ is the estimated detectable alternative, $SD$ is the standard deviation of the differences, $N$ is the number of differences, and $t_{1-\alpha/2,\,\nu}$ is the value of Student's t distribution with $\nu$ degrees of freedom ($\nu = N - 1$).

3) Time trend identification, as a qualitative test which confirms a consistent time trend among the consecutive POPs values in terms of statistical significance. Robust Mann-Kendall tests were used. The evaluation was applicable only to sites with at least five consecutive annually aggregated mean concentration values available. A significance level of α = 0.05 was used in all tests.

The Mann-Kendall test (Mann, 1945; Kendall, 1975) is a non-parametric test for identifying trends in time series data. The test compares the relative magnitudes of sample data instead of the exact values. The main benefit of this test is that the data do not need to conform to any particular distribution. Moreover, data reported as not detected can be included in the test by replacing them with a common value smaller than the smallest measured value in the data set. The procedure assumes that there is only one value per point in time. The median is used for multiple data points over a single time period.

Data values are evaluated as a sorted time series. Each data value is compared to all subsequent data values. The initial value of the Mann-Kendall statistic, S, is assumed to be 0 (i.e., no trend). If a value from a subsequent time period is higher, S is increased by 1. On the other hand, if the value in the following time period is lower than a value sampled earlier, S is decreased by 1. The net result of all such increments and decrements yields the final value of S.

Let $x_1, x_2, \ldots, x_n$ represent n data points, where $x_j$ is the data point at time j. Then the Mann-Kendall statistic S is given by

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sign}(x_j - x_k),$$

where

$$\operatorname{sign}(x_j - x_k) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases}$$

A high positive value of S is an indicator of an increasing trend, and a low negative value indicates a decreasing trend. However, it is necessary to compute the probability associated with S and with the sample size, n, to quantify the statistical significance of the observed trend. The variance of S is defined as

$$\mathrm{VAR}(S) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2 t_p + 5) \right],$$

where n is the number of data points, g is the number of tied groups (a tied group is a set of sample data having the same value), and $t_p$ is the number of data points in the p-th group. For example, in the model sequence {2, 3, non-detect, 3, non-detect, 3}, n = 6, g = 2, $t_1 = 2$ for the non-detects and $t_2 = 3$ for the tied value 3.

A normalized test statistic Z can then be computed as follows:

$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\mathrm{VAR}(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sqrt{\mathrm{VAR}(S)}} & \text{if } S < 0 \end{cases}$$

The trend is considered decreasing if Z is negative and the computed p value is lower than the level of significance α. The trend is considered increasing if Z is positive and the computed p value is lower than α. If the computed p value is higher than α, no trend is confirmed. A standardized version of the statistic S, called Kendall's tau, is used for the description of trends in Tables 3 and 5.

4) Time trend quantification, computed using a linear regression slope estimate. The quantified trend represents the mean annual change based on a linear model and is supplemented by its 95% confidence interval. Simple linear regression (Zar, 1998) for a time series trend estimate is defined as:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

where α and β are constants and $\varepsilon_i$ is referred to as the error or residual. The parameter β is also called the regression coefficient, the slope or, in time series analysis, the trend estimate, and the parameter α is called the Y intercept. If b is the best estimate of β and a is the best estimate of α, then the sample regression equation is expressed as:

$$\hat{Y}_i = a + b X_i$$

The trend, b, of the regression line computed from sample data expresses quantitatively the straight-line dependence of Y on X in the sample. It is possible to obtain a sample of data points (i.e. by random sampling) where the calculated b would suggest that β is positive even though it is, in fact, zero. It is unlikely that random sampling would yield points with a calculated slope of exactly zero. To assess this likelihood, we can examine the null hypothesis $H_0: \beta = 0$ against the alternative hypothesis $H_A: \beta \neq 0$. Thus, if the probability of obtaining the calculated b is small (say 5% or less), $H_0$ is rejected and $H_A$ is assumed to be true. This null hypothesis for β can be tested, for example, by using Student's t statistic.

The (1-α) confidence interval for the estimated parameter β can be calculated using Student's t statistic as:

$$b \pm t_{\alpha(2),\,(n-2)}\, s_b$$

where $s_b$ is the standard error of b, calculated from the residual mean square, which is often written as $s^2_{Y \cdot X}$:

$$s_b = \sqrt{\frac{s^2_{Y \cdot X}}{\sum x^2}}, \qquad \sum x^2 = \sum_i (X_i - \bar{X})^2.$$

The residual mean square depends on the residual sum of squares and the residual degrees of freedom as follows:

$$\text{residual SS} = \sum_i (Y_i - \hat{Y}_i)^2$$

$$\text{residual DF} = \text{total DF} - \text{regression DF}$$

$$s^2_{Y \cdot X} = \frac{\text{residual SS}}{\text{residual DF}}$$

In simple linear regression, the regression DF equals 1 and the total DF is n-1; thus the residual DF equals n-2, where n denotes the number of observations.
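The Mann-Kendall procedure described in this section can be sketched in a few lines of Python. The sketch below follows the definitions of S, VAR(S) with the tie correction, and Z given above; the function name `mann_kendall` and the use of a two-sided p value from the standard normal distribution are assumptions made for this illustration, since the text does not state how the probability was computed. Non-detects are expected to be replaced beforehand by a common value smaller than the smallest measured value, as described in the text.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(values):
    """Mann-Kendall trend test on a time-ordered series (hypothetical helper).

    Returns (S, VAR(S), Z, p), where p is a two-sided p value based on the
    normal approximation (an assumption of this sketch).
    """
    n = len(values)
    # S: sum of the signs of all pairwise differences x_j - x_k with j > k.
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = values[j] - values[k]
            s += (diff > 0) - (diff < 0)
    # Tied groups: values that occur more than once.
    tie_sizes = [t for t in Counter(values).values() if t > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    # Normalized statistic Z with the continuity correction of one unit.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, var_s, z, p


if __name__ == "__main__":
    # Model sequence from the text; non-detects replaced by 1 (below the minimum of 2).
    print(mann_kendall([2, 3, 1, 3, 1, 3]))
    # A strictly increasing five-year series.
    print(mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For the model sequence, the two tied groups (two non-detects and three values of 3) reduce the variance from 510/18 to 426/18, which shows how the tie correction enters the test.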

3.4 Results - air monitoring using passive sampling

Table 4 contains aggregated entries summarizing data sources for the statistical analysis performed, i.e. the sites where relevant passive sampling data were available. After the six-step validation, the entire data set consists of 1,279 entries from all five UN regions. A cell in the table is shaded if no data entries were available.

The statistical summary is complemented by the variability/uncertainty analysis shown in Figure 4, which consists of six parts, each focusing on an individual parameter (compound) in the analysis. The analysis compares median and geometric mean estimates and their variability among the individual UN regions. The boxes in the charts indicate medians and the circles geometric means of the data sets from the individual regions; whiskers correspond to the 5th-95th percentile range. The shaded area of the chart corresponds to the 5th-95th percentile range of the entire set of sites for all regions; the grey vertical stripe corresponds to the global median estimate and the black vertical stripe to the global geometric mean estimate of this set.

Table 5 displays results of the statistical analysis of validated atmospheric POPs concentrations obtained by passive sampling.

Table 4. Accessible data sources for air monitoring using passive sampling (if data entries from no site are available, the cell is shaded)

Columns: Compound | Africa | CEEC | GRULAC | WEOG | Asia and Pacific

p,p'-DDT: 3 Total sites: 20 Total records: 23; 9 Total sites: 70 Total records: 105; Total countries: 3 Total sites: 6 Total records: 11; 3 Total sites: 25 Total records: 37; Total countries: 3 Total sites: 8 Total records: 11
p,p'-DDE: 4 Total sites: 21 Total records: 24; 9 Total sites: 70 Total records: 106; Total countries: 4 Total sites: 8 Total records: 17; 3 Total sites: 27 Total records: 39; Total countries: 3 Total sites: 8 Total records: 11
hexachlorobenzene (HCB): 2 Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 8 Total records: 11
Σ 6 polychlorinated biphenyls (PCB): 2 Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 7 Total records: 10
PCB 28: 2 Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 7 Total records: 10
PCB 52: 2 Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 7 Total records: 10
PCB: Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 7 Total records: 10
PCB: Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 8 Total records: 11

PCB: Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 8 Total records: 11
PCB: Total sites: 16 Total records: 16; 9 Total sites: 67 Total records: 100; Total countries: 3 Total sites: 8 Total records: 11
α-HCH: 1 Total sites: 15 Total records: 15; 9 Total sites: 70 Total records: ; Total sites: 27 Total records: 39; Total countries: 3 Total sites: 8 Total records: 11
γ-HCH: 0 Total sites: 14 Total records: 14; 9 Total sites: 70 Total records: ; Total sites: 27 Total records: 39; Total countries: 3 Total sites: 8 Total records: 11

Records: annual POPs concentration values aggregated from primary records by arithmetic mean

Figure 4. Variability analysis of the atmospheric concentrations of POPs in the five UN regions, air monitoring by passive samplers (regional median: boxes; regional geometric mean: circles; regional 5th-95th percentile ranges: whiskers; global median: grey vertical stripe; global geometric mean: black vertical stripe; shaded area: global 5th-95th percentile range).
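The aggregation of primary records into annual values and the summary statistics reported for each compound (median, 5th-95th percentile range, geometric mean, mean and standard deviation) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record layout `(site, year, concentration)`, the function names and the sample values are hypothetical, and the percentile interpolation rule (`method="inclusive"`) is an assumption because the report does not state which definition was used.

```python
from collections import defaultdict
from statistics import geometric_mean, mean, median, quantiles, stdev


def annual_means(primary_records):
    """Aggregate (site, year, concentration) tuples to one arithmetic mean per site and year."""
    groups = defaultdict(list)
    for site, year, conc in primary_records:
        groups[(site, year)].append(conc)
    return {key: mean(vals) for key, vals in groups.items()}


def summarize(values):
    """Summary statistics of annual concentration values (needs at least two values)."""
    # n=20 yields the cut points at 5%, 10%, ..., 95%; the first and last are used
    cuts = quantiles(values, n=20, method="inclusive")
    return {
        "records": len(values),
        "median": median(values),
        "p05": cuts[0],
        "p95": cuts[-1],
        "geometric_mean": geometric_mean(values),  # requires strictly positive values
        "mean": mean(values),
        "sd": stdev(values),
    }


# Hypothetical primary records: (site, year, concentration in pg/m3)
primary = [
    ("site A", 2008, 2.0),
    ("site A", 2008, 4.0),
    ("site B", 2008, 9.0),
    ("site B", 2009, 27.0),
]
annual = annual_means(primary)
summary = summarize(list(annual.values()))
```

The geometric mean and the median are less sensitive to single extreme values than the arithmetic mean, which is why the tables report them alongside the mean and its standard deviation.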

Table 5. Data summary for air monitoring using passive samplers

Columns: Compound | Baseline concentration | Detectable alternative (1) | Historical time trends: trend identification | Historical time trends: trend quantification

p,p'-DDT
Baseline concentration: ; ; Records: 187; Median (5th-95th percentile): 1.8 ( ) pg/m³; Geometric mean (95% CI): 2.2 ( ) pg/m³; Mean (standard deviation): 9.8 (62.2) pg/m³
Detectable alternative: Sites: 45; Value: 5.3 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

p,p'-DDE
Baseline concentration: ; ; Records: 197; Median (5th-95th percentile): 16.8 ( ) pg/m³; Geometric mean (95% CI): 10.3 ( ) pg/m³; Mean (standard deviation): 31.7 (65.5) pg/m³
Detectable alternative: Sites: 47; Value: 10.4 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

HCB
Baseline concentration: ; ; Records: 127; Median (5th-95th percentile): 36.0 ( ) pg/m³; Geometric mean (95% CI): 32.4 ( ) pg/m³; Mean (standard deviation): 48.1 (55.0) pg/m³
Detectable alternative: Sites: 22; Value: 52.0 pg/m³/y
Trend identification: Series (sites): 2; Significant: 2 (2-); Kendall tau: -1.0
Trend quantification: Series (sites): (-170.2; ) pg/m³/y

Σ 6 PCB
Baseline concentration: ; Period: ; Records: 126; Median (5th-95th percentile): 30.0 ( ) pg/m³; Geometric mean (95% CI): 35.0 ( ) pg/m³; Mean (standard deviation): 54.2 (65.1) pg/m³
Detectable alternative: Sites: 22; Value: 22.2 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

PCB 28
Baseline concentration: ; ; Records: 126; Median (5th-95th percentile): 9.0 ( ) pg/m³; Geometric mean (95% CI): 10.7 ( ) pg/m³; Mean (standard deviation): 19.4 (32.2) pg/m³
Detectable alternative: Sites: 22; Value: 1.2 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

PCB 52
Baseline concentration: ; ; Records: 126; Median (5th-95th percentile): 7.7 ( ) pg/m³; Geometric mean (95% CI): 8.8 ( ) pg/m³; Mean (standard deviation): 12.4 (13.1) pg/m³
Detectable alternative: Sites: 22; Value: 3.5 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

PCB 101
Baseline concentration: ; ; Records: 126; Median (5th-95th percentile): 4.3 ( ) pg/m³; Geometric mean (95% CI): 4.4 ( ) pg/m³; Mean (standard deviation): 6.7 (7.4) pg/m³
Detectable alternative: Sites: 22; Value: 2.3 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

PCB 138
Baseline concentration: ; ; Records: 127; Median (5th-95th percentile): 2.4 ( ) pg/m³; Geometric mean (95% CI): 3.1 ( ) pg/m³; Mean (standard deviation): 6.0 (14.1) pg/m³
Detectable alternative: Sites: 22; Value: 10.7 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

PCB 153
Baseline concentration: ; ; Records: 127; Median (5th-95th percentile): 4.5 ( ) pg/m³; Geometric mean (95% CI): 5.1 ( ) pg/m³; Mean (standard deviation): 8.0 (14.1) pg/m³
Detectable alternative: Sites: 22; Value: 4.6 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

PCB 180
Baseline concentration: ; ; Records: 127; Median (5th-95th percentile): 1.6 ( ) pg/m³; Geometric mean (95% CI): 2.0 ( ) pg/m³; Mean (standard deviation): 3.4 (7.5) pg/m³
Detectable alternative: Sites: 22; Value: 3.1 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

α-HCH
Baseline concentration: ; ; Records: 171; Median (5th-95th percentile): 20.9 ( ) pg/m³; Geometric mean (95% CI): 20.9 ( ) pg/m³; Mean (standard deviation): (2841.9) pg/m³
Detectable alternative: Sites: 37; Value: 5.2 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

γ-HCH
Baseline concentration: ; ; Records: 170; Median (5th-95th percentile): 28.9 ( ) pg/m³; Geometric mean (95% CI): 25.0 ( ) pg/m³; Mean (standard deviation): 71.6 (436.1) pg/m³
Detectable alternative: Sites: 37; Value: 15.8 pg/m³/y
Trend identification: Series (sites): 2; Significant: 0; Kendall tau: N/A
Trend quantification: Series (sites): 0; Period: N/A; N/A (N/A; N/A) pg/m³/y

(1) Detectable alternative is expressed as the detectable mean annual difference with power = 0.8 and α = . The computation is based on the set of sites with available trend data, which are used for the estimation of the variability of differences.

3.5 Results - air monitoring using active sampling

Table 6 contains aggregated entries summarizing data sources for the statistical analysis performed on sites with relevant active sampling concentration records available. After the six-step validation process described in the introductory section, the set consisted of 281 entries from three regions: WEOG (81 records), Asia and Pacific (134 records) and CEEC (66 records). No valid sets were obtained from Africa and GRULAC. The respective cells are shaded if no data entries were available.

The statistical summary is complemented by the variability/uncertainty analysis provided in Figure 5, which consists of six parts, each focusing on an individual parameter (compound) in the analysis. The analysis compares median and geometric mean estimates and their variability among the UN regions. The boxes in the charts indicate medians and the circles geometric means of sets of values for individual regions; whiskers correspond to the 5th-95th percentile range. The shaded area in the chart corresponds to the 5th-95th percentile range of the entire set of sites for all regions. The grey vertical stripe corresponds to the global median estimate and the black vertical stripe to the global geometric mean of this set.

Table 7 displays results of the statistical analysis of the validated atmospheric POPs concentrations obtained by active sampling. Both baseline concentrations and historical time trends are estimated.

Table 6. Data sources for air monitoring using active sampling (if data entries from no site are available, the cell is shaded)

Columns: Compound | Africa | CEEC | GRULAC | WEOG | Asia and Pacific

p,p'-DDT: Total sites: 1 Total records: 14; Total sites: 3 Total records: 7; Total countries: 9 Total sites: 16 Total records: 140; Total countries: 9 Total sites: 33 Total records: 39
p,p'-DDE: Total sites: 1 Total records: 14; Total sites: 3 Total records: 7; Total countries: 9 Total sites: 16 Total records: 140; Total countries: 9 Total sites: 33 Total records: 40
hexachlorobenzene (HCB): Total countries: 2 Total sites: 4 Total records: 19; Total countries: 8 Total sites: 21 Total records: 145; Total countries: 9 Total sites: 32 Total records: 38
Σ 6 polychlorinated biphenyls (PCB): Total sites: 1 Total records: 14; Total countries: 9 Total sites: 11 Total records: 81; Total sites: 21 Total records: 21
PCB 28: Total sites: 1 Total records: 14; Total countries: 9 Total sites: 13 Total records: 100; Total sites: 21 Total records: 21
PCB 52: Total sites: 1 Total records: 14; 0 Total sites: 18 Total records: 153; Total sites: 21 Total records: 21
PCB 101: Total sites: 1 Total records: 14; 0 Total sites: 16 Total records: 136; Total sites: 21 Total records: 21
PCB 138: Total sites: 1 Total records: 14; Total countries: 9 Total sites: 11 Total records: 85; Total sites: 21 Total records: 21
PCB 153: Total sites: 1 Total records: 14; Total countries: 9 Total sites: 11 Total records: 85; Total sites: 21 Total records: 21
PCB 180: Total sites: 1 Total records: 14; Total countries: 9 Total sites: 11 Total records: 86; Total sites: 21 Total records: 21
α-HCH: Total countries: 2 Total sites: 4 Total records: 19; Total countries: 9 Total sites: 25 Total records: 175
γ-HCH: Total countries: 2 Total sites: 4 Total records: 19; Total countries: 9 Total sites: 25 Total records: 173

Records: annual POPs concentration values aggregated from primary records by arithmetic mean

Figure 5. Variability analysis of atmospheric concentrations of POPs in the five UN regions, air monitoring using active sampling (regional median: boxes; regional geometric mean: circles; regional 5th-95th percentile ranges: whiskers; global median: grey vertical stripe; global geometric mean: black vertical stripe; global 5th-95th percentile range: shaded area).

Table 7. Data summary for air monitoring using active sampling

Columns: Compound | Overall concentration | Detectable alternative (1) | Historical time trends: trend identification | Historical time trends: trend quantification

p,p'-DDT
Overall concentration: ; Period: ; Records: 200; Median (5th-95th percentile): 0.8 ( ) pg/m³; Geometric mean (95% CI): 1.8 ( ) pg/m³; Mean (standard deviation): 26.2 (161.1) pg/m³
Detectable alternative: Period: ; Sites: 23; Value: pg/m³/y
Trend identification: Series (sites): 12; Period: ; Significant: 4 (4-); Kendall tau: -0.6
Trend quantification: Series (sites): 4; Period: ; (-1.6; -0.2) pg/m³/y

p,p'-DDE
Overall concentration: ; Period: ; Records: 201; Median (5th-95th percentile): 1.8 ( ) pg/m³; Geometric mean (95% CI): 3.5 ( ) pg/m³; Mean (standard deviation): 16.6 (60.9) pg/m³
Detectable alternative: Period: ; Sites: 24; Value: 20.6 pg/m³/y
Trend identification: Series (sites): 13; Period: ; Significant: 4 (4-); Kendall tau: -0.6
Trend quantification: Series (sites): 4; Period: ; (-15.2; 4.6) pg/m³/y

hexachlorobenzene (HCB)
Overall concentration: ; Period: ; Records: 202; Median (5th-95th percentile): 60.5 ( ) pg/m³; Geometric mean (95% CI): 52.2 ( ) pg/m³; Mean (standard deviation): 89.7 (176.1) pg/m³
Detectable alternative: Period: ; Sites: 25; Value: 31.1 pg/m³/y
Trend identification: Series (sites): 13; Period: ; Significant: 4 (4-); Kendall tau: -0.4
Trend quantification: Series (sites): 4; Period: ; (-57.2; 17.5) pg/m³/y

Σ 6 polychlorinated biphenyls (PCB)
Overall concentration: ; Period: ; Records: 116; Median (5th-95th percentile): 8.8 ( ) pg/m³; Geometric mean (95% CI): 12.7 ( ) pg/m³; Mean (standard deviation): 22.4 (34.2) pg/m³
Detectable alternative: Period: ; Sites: 12; Value: 27.5
Trend identification: Series (sites): 9; Period: ; Significant: 5 (4-; 1+); Kendall tau: 0.5
Trend quantification: Series (sites): 5; Period: ; (-110.2; 54.0) pg/m³/y

PCB 28
Overall concentration: ; Period: ; Records: 135; Median (5th-95th percentile): 3.5 ( ) pg/m³; Geometric mean (95% CI): 4.6 ( ) pg/m³; Mean (standard deviation): 7.2 (10.8) pg/m³
Detectable alternative: Period: ; Sites: 14; Value: 4.9
Trend identification: Series (sites): 11; Period: ; Significant: 3 (3-); Kendall tau: -0.8
Trend quantification: Series (sites): 3; Period: ; (-15.0; -1.6) pg/m³/y

PCB 52
Overall concentration: ; Period: ; Records: 188; Median (5th-95th percentile): 3.6 ( ) pg/m³; Geometric mean (95% CI): 4.3 ( ) pg/m³; Mean (standard deviation): 6.5 (8.0) pg/m³
Detectable alternative: Period: ; Sites: 18; Value: 5.2
Trend identification: Series (sites): 15; Period: ; Significant: 7 (5-; 2+); Kendall tau: -0.2
Trend quantification: Series (sites): 7; Period: ; (-17.3; 10.3) pg/m³/y

PCB 101
Overall concentration: ; Period: ; Records: 171; Median (5th-95th percentile): 1.7 ( ) pg/m³; Geometric mean (95% CI): 2.2 ( ) pg/m³; Mean (standard deviation): 3.4 (5.1) pg/m³
Detectable alternative: Period: ; Sites: 16; Value: 2.9
Trend identification: Series (sites): 13; Period: ; Significant: 7 (5-; 2+); Kendall tau: -0.2
Trend quantification: Series (sites): 7; Period: ; (-6.6; 3.4) pg/m³/y

PCB 138
Overall concentration: ; Period: ; Records: 120; Median (5th-95th percentile): 0.5 ( ) pg/m³; Geometric mean (95% CI): 1.0 ( ) pg/m³; Mean (standard deviation): 2.2 (5.1) pg/m³
Detectable alternative: Period: ; Sites: 12; Value: 6.0
Trend identification: Series (sites): 9; Period: ; Significant: 4 (4-); Kendall tau: -0.6
Trend quantification: Series (sites): 4; Period: ; (-27.7; 13.9) pg/m³/y

PCB 153
Overall concentration: ; Period: ; Records: 120; Median (5th-95th percentile): 0.7 ( ) pg/m³; Geometric mean (95% CI): 1.4 ( ) pg/m³; Mean (standard deviation): 3.1 (7.3) pg/m³
Detectable alternative: Period: ; Sites: 12; Value: 8.2
Trend identification: Series (sites): 9; Period: ; Significant: 4 (4-); Kendall tau: -0.6
Trend quantification: Series (sites): 4; Period: ; (-37.0; 18.2) pg/m³/y

PCB 180
Overall concentration: ; Period: ; Records: 121; Median (5th-95th percentile): 0.2 ( ) pg/m³; Geometric mean (95% CI): 0.5 ( ) pg/m³; Mean (standard deviation): 1.3 (4.2) pg/m³
Detectable alternative: Period: ; Sites: 12; Value: 6.8
Trend identification: Series (sites): 9; Period: ; Significant: 5 (5-); Kendall tau: -0.6
Trend quantification: Series (sites): 5; Period: ; (-28.4; 16.5) pg/m³/y

α-HCH
Overall concentration: ; Period: ; Records: 194; Median (5th-95th percentile): 18.1 ( ) pg/m³; Geometric mean (95% CI): 17.9 ( ) pg/m³; Mean (standard deviation): 29.7 (34.9) pg/m³
Detectable alternative: Period: ; Sites: 24; Value: 25.0 pg/m³/y
Trend identification: Series (sites): 15; Period: ; Significant: 12 (12-); Kendall tau: -0.9
Trend quantification: Series (sites): 12; Period: ; (-129.4; 30.3) pg/m³/y

γ-HCH
Overall concentration: ; Period: ; Records: 192; Median (5th-95th percentile): 9.9 ( ) pg/m³; Geometric mean (95% CI): 9.5 ( ) pg/m³; Mean (standard deviation): 14.9 (17.8) pg/m³
Detectable alternative: Period: ; Sites: 23; Value: 13.4 pg/m³/y
Trend identification: Series (sites): 15; Period: ; Significant: 10 (10-); Kendall tau: -0.7
Trend quantification: Series (sites): 10; Period: ; (-51.5; 17.0) pg/m³/y

(1) The detectable alternative is expressed as the detectable mean annual difference with power = 0.8 and α = . The computation is based on the set of sites with available trend data, which are used for the estimation of the variability of time-related differences.
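The idea behind the detectable alternative in the footnote can be illustrated with a common normal-approximation formula for the minimum detectable mean difference. This is a sketch of the general technique, not necessarily the computation used for the tables. The function name and the input values are hypothetical, and the default α = 0.05 is an assumption because the value is not preserved in the footnote.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(sd_diff, n_sites, alpha=0.05, power=0.8):
    """Minimum detectable mean annual difference (normal approximation).

    sd_diff: standard deviation of annual differences across sites
    n_sites: number of sites with trend data
    alpha:   two-sided significance level (0.05 is an assumed default)
    power:   required power of the test
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * sd_diff / sqrt(n_sites)


# Hypothetical inputs: SD of annual differences 10 pg/m3/y, 25 sites
delta_25 = detectable_difference(10.0, 25)
delta_100 = detectable_difference(10.0, 100)
```

Under this approximation the detectable difference shrinks with the square root of the number of sites, so quadrupling the number of sites halves the smallest difference that can be detected at the same power.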

4 Reporting of GMP data collected in the period and its extension

The interactive reporting tools prepared for the GMP 2 campaign are based on the on-line visualization created for GMP 1, which ensures consistency of the reported content. For GMP 2, the functionality of these tools is extended with time series analysis and additional map functions. All tools share standardized data selection and customization of analysis settings.

4.1 World map - monitoring overview

An animated map shows the geographical coverage of data, with predefined filters for displaying available datasets: matrix, compound and year. For the year, the user can select either a single point in time or a time interval. The output is the data coverage either for a selected matrix and a point or interval in time, or for a particular matrix, compound and point or interval in time.

Figure 6. Position of sampling sites on a world map

Figure 7. Coverage of countries by monitoring

4.2 Available data - parameters

A chart shows the sampling frequency for each compound in a particular user-defined year. Compounds are listed on the x-axis, countries on the y-axis. The predefined filters include: matrix, sampling method (for air), type of measurement site (for air: urban, rural, remote, etc.), UN region, compound and year. Detailed data are displayed by clicking on a chart cell. Multiple selections are enabled in the predefined filters.

Figure 8. Available data - parameters

4.3 Available data - years

A chart shows the sampling frequency in a user-defined time period. Years are listed on the x-axis, countries on the y-axis. The predefined filters include: matrix, sampling method (for air), type of measurement site (for air: urban, rural, remote, etc.), UN region, compound and time period (three years at a minimum). Detailed data are displayed by clicking on a chart cell. Multiple selections are enabled in the predefined filters.

Figure 9. Available data - years

4.4 Reported values - summary statistics

A chart shows summary statistics for each reported value (mean, median, minimum, maximum). Concentration values are displayed as box-and-whisker plots with the mean or median value, minimum and maximum. Concentrations are listed on the x-axis, reported sites or countries on the y-axis. The predefined filters include: matrix, sampling method (for air), type of measurement site (for air: urban, rural, remote, etc.), UN region, compound and year.

4.5 Reported values - advanced interactive map visualization

The output is a map with a visualization at each site corresponding to the measured POPs concentrations (the user can select the preferred statistic: mean, median, etc.). Filters for the analysis include: matrix, sampling method (for air), type of measurement site (for air: urban, rural, remote, etc.), compound, year and UN region. Multiple selections are enabled for the lists of compounds and years. The map allows interactive panning and zooming.

4.6 Time series analysis

The report offers a screening examination of trends in the data as well as an exact statistical evaluation of trend significance via standard tests (according to the GMP guidance document, Chapter 3: Statistical Considerations, and the Technical Note to Chapter 3). Filters for the analysis include: matrix, compound and measurement site. The analysis settings can be adjusted, e.g. by selecting the method for trend evaluation.
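The data availability charts described in sections 4.2 and 4.3 are essentially counts of records per chart cell after the predefined filters have been applied. The following Python sketch illustrates that idea for the country-by-year view. The record fields (`matrix`, `compound`, `country`, `year`), the function name and the sample records are hypothetical and do not reflect the actual GMP DWH data model.

```python
from collections import Counter


def availability_by_year(records, matrix, compound=None, year_from=None, year_to=None):
    """Count records per (country, year) cell after applying the selected filters."""
    counts = Counter()
    for rec in records:
        if rec["matrix"] != matrix:
            continue
        if compound is not None and rec["compound"] != compound:
            continue
        if year_from is not None and rec["year"] < year_from:
            continue
        if year_to is not None and rec["year"] > year_to:
            continue
        counts[(rec["country"], rec["year"])] += 1
    return counts


# Hypothetical records for illustration only
records = [
    {"matrix": "air", "compound": "HCB", "country": "Country A", "year": 2008},
    {"matrix": "air", "compound": "HCB", "country": "Country A", "year": 2008},
    {"matrix": "air", "compound": "HCB", "country": "Country B", "year": 2010},
    {"matrix": "air", "compound": "PCB 28", "country": "Country A", "year": 2009},
    {"matrix": "water", "compound": "HCB", "country": "Country A", "year": 2008},
]

# Interval selection: air records for HCB between 2008 and 2010
cells = availability_by_year(records, "air", compound="HCB", year_from=2008, year_to=2010)
# Single point in time: set year_from and year_to to the same year
cells_2008 = availability_by_year(records, "air", year_from=2008, year_to=2008)
```

Each key of the returned counter corresponds to one chart cell, and the count is the sampling frequency shown in it.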

Figure 10. Reported values - summary statistics

Figure 11. Reported values - advanced interactive map visualization

Figure 12. Time series analysis

5 Documentation of new GIS module proposed for GMP DWH

A GIS server extends the functions of the GMP Data Warehouse by providing support for spatial data visualization, map distribution and real-time spatial analysis. The GIS server also allows additional information to be linked to stored data on the basis of spatial references. The GIS server is an important module of the common ICT (information and communication technologies) infrastructure, which provides the technological background for all GMP DWH tools and services. The GIS server is connected to the GMP DWH and, together with the application server and the R server, provides data in a form suitable for presentation, visualization and export.

Figure 13. GIS server as a component of GMP DWS

The GIS module of the GMP DWH enables work with spatial data. Custom GIS server software components will be developed to enable visualization of spatial data, their analysis, extraction of hidden information, and advanced spatially based queries, processing and transformations.

Table 8. Specification of new GIS module functions in the GMP DWS

GIS Module for GMP DWH

Software component: GIS server integration
Technology background: The GIS server is based on ESRI's ArcGIS Server integration. ArcGIS Server represents high-end technology in the area of GIS and provides a broad range of additional features for data integration and interoperability.
Benefits: ArcGIS Server makes it possible to publish static map compositions, generate dynamic map layers based on data from various sources, and process calculations over spatial features based on individual user inputs. In general, the GIS server provides a platform for publishing and sharing spatially based information.

Software component: Dynamic map services
Technology background: Integration of ArcGIS Server with the database and data services of the GMP DWH.
Benefits: Interconnection of ArcGIS Server with the GMP data warehouse enables the creation of specialized map layers dynamically generated from the current data warehouse content. This enables, e.g., the development of map visualizations of a selected pollutant in a given geographic area and in a specific time range.

Software component: GMP Spatial Web Engine
Technology background: Web technologies, software libraries for communication with ArcGIS Server (ArcGIS API for JavaScript / Flex) and specialized software frameworks and technologies focused on spatial and data visualizations.
Benefits: GMP Spatial Web Engine is a specialized software product derived from the GIS module. It is based on common web technologies and is accessible via the Internet. The engine is connected to the GIS server and other software components. GMP Spatial Web Engine is a basic platform on which other software components are being developed in the form of plugins (software extensions).

Software component: Map Browser Plugin for GMP Spatial Web Engine
Technology background: GMP Spatial Web Engine, web technologies, software libraries for communication with ArcGIS Server.
Benefits: The Map Browser Plugin allows map services published upon the GMP DWH to be displayed in a common internet browser.

Software component: GMP Spatial Web Engine - interoperability
Technology background: Interoperability of the whole GIS Module is ensured by support of standardized data formats designed by the OGC (Open Geospatial Consortium).
Benefits: The feature of inserting custom map layers enables users to combine prepared map layers with custom map services published on any server accessible via the Internet.

Software component: GMP Spatial Web Engine - import feature
Technology background: GMP Spatial Web Engine, data formats able to store tabular data.
Benefits: GMP Spatial Web Engine enables users to insert spatial data directly from their computer and display them together with prepared map layers in one composition.

Software component: Time-lapse spatial visualizations
Technology background: Time-lapse spatial visualizations are based on the so-called time-aware feature layer of ArcGIS.
Benefits: Time-aware feature layers will be used to visualize temporally changing spatial data in appropriate cases.

Software component: Geoprocessing
Technology background: Based on the geoprocessing capabilities of ArcGIS Server.
Benefits: In general, geoprocessing services make it possible to run calculations over spatial data. Results can be used in map compositions accessible via the Map Browser of the GIS Module.

Software component: Quality print capabilities and export to PDF
Technology background: GMP Spatial Web Engine, web technologies, software libraries for communication with ArcGIS Server.
Benefits: The GIS Module is equipped with a quality print and export engine; thus, the whole GIS Module can be used as a tool for creating reports.


More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Tools in the Global Monitoring Plan Data Warehouse

Tools in the Global Monitoring Plan Data Warehouse Tools in the Global Monitoring Plan Data Warehouse version 3 September 2015 Stockholm Convention Regional Centre in the Czech Republic and Research Centre for Toxic Compounds in the Environment Masaryk

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 [email protected] 1. Descriptive Statistics Statistics

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected]

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Statistics in Medicine Research Lecture Series CSMC Fall 2014 Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Quality. Guidance for Data Quality Assessment. Practical Methods for Data Analysis EPA QA/G-9 QA00 UPDATE. EPA/600/R-96/084 July, 2000

Quality. Guidance for Data Quality Assessment. Practical Methods for Data Analysis EPA QA/G-9 QA00 UPDATE. EPA/600/R-96/084 July, 2000 United States Environmental Protection Agency Office of Environmental Information Washington, DC 060 EPA/600/R-96/08 July, 000 Guidance for Data Quality Assessment Quality Practical Methods for Data Analysis

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I

Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test 1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Forecasting in supply chains

Forecasting in supply chains 1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

THE KRUSKAL WALLLIS TEST

THE KRUSKAL WALLLIS TEST THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Pearson's Correlation Tests

Pearson's Correlation Tests Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation

More information

Non-Parametric Tests (I)

Non-Parametric Tests (I) Lecture 5: Non-Parametric Tests (I) KimHuat LIM [email protected] http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent

More information

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Instructions for SPSS 21

Instructions for SPSS 21 1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

HLM software has been one of the leading statistical packages for hierarchical

HLM software has been one of the leading statistical packages for hierarchical Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics [email protected] http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

7 Time series analysis

7 Time series analysis 7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

The Statistics Tutor s Quick Guide to

The Statistics Tutor s Quick Guide to statstutor community project encouraging academics to share statistics support resources All stcp resources are released under a Creative Commons licence The Statistics Tutor s Quick Guide to Stcp-marshallowen-7

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST EPS 625 INTERMEDIATE STATISTICS The Friedman test is an extension of the Wilcoxon test. The Wilcoxon test can be applied to repeated-measures data if participants are assessed on two occasions or conditions

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Robust procedures for Canadian Test Day Model final report for the Holstein breed

Robust procedures for Canadian Test Day Model final report for the Holstein breed Robust procedures for Canadian Test Day Model final report for the Holstein breed J. Jamrozik, J. Fatehi and L.R. Schaeffer Centre for Genetic Improvement of Livestock, University of Guelph Introduction

More information

Quantitative Inventory Uncertainty

Quantitative Inventory Uncertainty Quantitative Inventory Uncertainty It is a requirement in the Product Standard and a recommendation in the Value Chain (Scope 3) Standard that companies perform and report qualitative uncertainty. This

More information

Time series analysis as a framework for the characterization of waterborne disease outbreaks

Time series analysis as a framework for the characterization of waterborne disease outbreaks Interdisciplinary Perspectives on Drinking Water Risk Assessment and Management (Proceedings of the Santiago (Chile) Symposium, September 1998). IAHS Publ. no. 260, 2000. 127 Time series analysis as a

More information