Descriptive Statistics on Companies in the Forest Products Industry




Kristoffer Öström
IAMSR, Åbo Akademi University
Lemminkäineng. 14 B, 252 Åbo, Finland
Kristoffer.Ostrom@abo.fi

Barbro Back
IAMSR, Åbo Akademi University
Lemminkäineng. 14 B, 252 Åbo, Finland
Barbro.Back@abo.fi

Hannu Vanharanta
Pori School of Technology and Economics
Pohjoisrantak. 11, P.O. Box 33, 281 Pori, Finland
Hannu.Vanharanta@pori.tut.fi

Ari Visa
Tampere University of Technology
P.O. Box 553, 3311 Tampere, Finland
Ari.Visa@cs.tut.fi

Turku Centre for Computer Science
TUCS Technical Report No 33
February 2
ISBN 952-12-614-4

ABSTRACT

A prerequisite of many statistical methods is that the underlying data material is normally or approximately normally distributed. If the data material violates this assumption, the results of an analysis might be misleading or incorrect. The overall objective of this paper is to provide the reader with descriptive statistics for two separate sets of empirical data and to test these sets against the assumption of normality. The data material consists of financial ratios calculated from the international forest products industry. This paper is part of a larger study in which we want to analyse the financial performance of forest products companies world-wide. For that study it is of utmost importance that we know whether the variables in each of our sets of data are normally distributed. Three different test statistics, skewness, kurtosis and Kolmogorov-Smirnov with Lilliefors correction, are used to test the data material against normality. The results show that the Kolmogorov-Smirnov test statistic rejected normality for all variables. The skewness ratio for all variables fell within the specified range of approximately normal distributions, while half of the kurtosis ratios exceeded the boundaries for approximately normally distributed data.

Key words: Normality, Skewness, Kurtosis, Kolmogorov-Smirnov

TUCS Research Group
Computational Intelligence in Business

1. Introduction

The most widely recognised distribution in the world is perhaps the normal distribution: the well-known bell-shaped curve. Textbooks in statistics often go into great depth when describing how various statistical methods can be used in modelling large sets of data. The examples used in these books are often constructed on clean sets of data, and both the techniques and the outcomes work well. This textbook situation is very rare in the real world. It is the nature of real-world data to be skewed, poorly distributed or otherwise incorrect. These issues, if not properly addressed, can prevent even the most robust statistical method from yielding reliable and stable results. Current research reporting tends to ignore the shape of a distribution [Hopkins et al., 1990]. All too often sets of data are dumped into statistical software packages without a thought being given to the distribution of the data material, in the hope of the best outcome.

In this paper six financial ratios will be investigated and tested against the assumption of normality. The ratios will be extracted from two empirical databases with information about forest products companies world-wide. The first database contains data from the period 1985 to 1989, while the other includes financial information from the years 1996 to 1997. The ratios will be inspected for anomalous values, pre-processed with appropriate statistical methods and tested for normality. Financial ratios are used because they are good indicators of a company's overall performance. As the financial reports of a business contain a wealth of financial information, it is these ratios that are essential when financially measuring a company or even an entire industrial sector.
Moreover, financial ratios are especially exposed to distortions in their distributions, since they are a result of human management activities to measure and control natural business processes. Many studies have shown that financial data, and the financial ratios extracted from them, do not tend to be normally distributed. For example, Tam et al. (1992) tested 19 financial ratios for normality, and the tests indicated that 15 of the 19 ratios were not normally distributed. Dorfman (1993) likewise showed that a large portion of agricultural economic data is inconsistent with the assumption of normality.

This paper is part of a larger study in which we want to analyse the financial performance of forest products companies world-wide. For that study it is of utmost importance that we know whether the variables in each of our sets of data are normally distributed or not. The questions to be answered in this paper are therefore: 1) do financial ratios in the forest products industry diverge from normality, and 2) should conventional multivariate statistical methods be applied, or are non-parametric methods such as neural networks preferable when analysing real-world, empirical sets of data?

The rest of the paper is organised as follows: Section 2 describes the sets of data and the procedure applied for selecting the financial ratios. Section 3 presents the preliminary data inspection and the statistical methods used. In Section 4 the results are presented, while the conclusions of this study are found in the final Section 5.

2. The Database and Selection of Ratios

In this paper two sets of data are used for the statistical calculations. The Green Gold Financial Reports database [Salonen et al., 1990a, 1990b, 1991] is the source of the 1985 to 1989 data, while the set of data collected in [Öström, 1999] provides information from the years 1996 and 1997. The Green Gold database consists of standardised income statements, balance sheets and cash flow statements of 106 companies in the international forest products industry. The database also contains specific financial ratios calculated from the standardised reports, as well as general company information concerning production volumes, products etc. The latter database includes financial information collected from the companies' WWW pages on the Internet. This database consists of 34 major forest products companies from the Nordic countries and North America.

The selection of ratios is one of the most important decisions an analyst makes when preparing a statistical analysis [Stein, 1993]. In the first database there were more than 5 different key ratios that represented the financial situation of each company, and in the latter enough information to calculate several important financial ratios. The selection of appropriate financial ratios in this paper is based on a study by Vanharanta et al. (1995). In that paper a cognitive approach was used to select the most important variables that would adequately represent a company's overall performance.
Ten corporate analysts with long experience in the field of capital investments and accounting were asked to choose the most important variables from the Green Gold Financial Reports database for a performance analysis task. This approach resulted in nine different financial key ratios. From those nine ratios six were selected to represent the points of reference in this paper. The reduction from nine to six ratios was done in order to compile the two sets of data and make comparative studies possible.

The financial ratios retrieved from the databases and used in this paper are:

1. Operating profit / sales
2. Profit after financial items / sales
3. Return on equity
4. Solidity
5. Current ratio
6. Funds from operations / sales

We note that there are three profitability measures (1-3), one measure of capital structure (4), one liquidity measure (5) and one cash flow measure (6). Certainly, including more financial ratios measuring the capital structure, working capital or indebtedness might further improve the behavioural picture of this industry sector. However, the limited quantity of available financial information in the 1996-1997 set of data resulted in a trade-off where the added benefits of using more financial ratios would not have outweighed the hardship of obtaining them. The selected ratios are therefore considered sufficient in number for normality testing in the international forest products industry.

3. Data Preparation and Test Methods

The financial ratios collected from the databases were first processed and cleaned of anomalous outliers, i.e. of erroneous data entries that can have a very disruptive influence on any analysis. For this purpose boxplots and frequency histograms of individual variables were used. The boxplot is a box summarising the distribution of a set of data values, in which anomalous data entries appear as individual points far outside the box. In Figure 1 the frequency histogram of Return on equity is presented. Figure 1 reveals that the data material contains at least two erroneous data entries. Such anomalous values are discarded once it has been confirmed that they are in fact actual errors and not divergent behaviour of some of the companies in the database.

Figure 1 Distribution of Return on equity
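The screening step described above can also be approximated numerically. The following is a minimal sketch, not the procedure used in this paper (which relied on visual inspection of boxplots and histograms): it flags candidate outliers with the conventional boxplot rule, i.e. values lying more than 1.5 interquartile ranges outside the quartiles. The function names, the factor k = 1.5 and the sample values are assumptions made purely for illustration.

```python
import statistics


def boxplot_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the boxplot fences."""
    lower, upper = boxplot_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical Return on equity values (%), two of them implausible entries.
roe = [12.0, 15.5, 9.8, 14.1, 18.3, 11.2, 16.7, 13.4, 950.0, -400.0]

print(boxplot_fences(roe))
print(flag_outliers(roe))
```

A flagged value is only a candidate: in line with the text above, it would be discarded only after checking that it is an entry error rather than genuine divergent behaviour of a company.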

Each financial ratio was examined in the same manner and anomalies were identified. These values, a total of 14, were then excluded. The final set of data contained a total of 3177 observations for the years 1985 to 1989, with 525 to 531 observations per financial ratio, and a total of 355 observations from the years 1996 to 1997, with 57 to 60 observations per financial ratio. In Table 1 the minimum and maximum values for both databases are presented. The companies included in the study are listed in Table 2 and Table 3.

Financial ratio                         Unit     1985-1989           1996-1997
                                                 min       max       min       max
Operating profit / sales                %        -5.1      35.       -3.1      19.3
Profit after financial items / sales    %        -2.9      27.2      -15.6     19.
Return on equity                        %        -57.5     85.1      -19.1     52.5
Solidity                                ratio    1.3       85.6      -13.9     63.4
Current ratio                           ratio    .3        4.5       .4        3.7
Funds from operations / sales           %        -14.9     42.8      -5.1      34.3

Table 1 Range of the financial ratios

We note that the range for the period 1996-1997 is considerably smaller than that for the period 1985-1989. The overall minimum and maximum values of each variable are also found in the 1985-1989 set of data, except for solidity, which has its minimum in the latter set of data.

As mentioned earlier, natural processes that possess similar features may well produce distributions that are approximately normal when there is an appreciable number of observations, the so-called law of large numbers [Feller, 1968]. However, when analysing real-world processes and problems, normally distributed data is rarely available [Pryzdek, 1995]. One reason for this is that the objective of most human management activity is to measure and control natural processes. Other distortions, which obstruct normality, also occur when people try to measure the results of different activities. In this paper three main methods, skewness, kurtosis and Kolmogorov-Smirnov with Lilliefors correction, are used to test the variables for normality.
Skewness is a measure of the asymmetry of a distribution around its mean. It indicates whether the distribution is shaped like a symmetric bell or is lopsided, with one tail longer than the other. The normal distribution is symmetric and has a skewness value of zero. In a normal distribution the mean and the median also have the same value. Data from a positively skewed distribution, i.e. one skewed to the right, will have values grouped together below the mean but also a long tail of values above the mean. Negatively skewed distributions have values grouped above the mean, with tails extending toward more negative values. If the distribution is skewed to the right, the mean of the distribution will be larger than the median, and vice versa. A skewness value whose absolute value is greater than 2.0 [SPSS, 1999] indicates a distribution that differs significantly from a normal distribution.
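As an illustration of the properties just described, the sketch below computes a sample skewness coefficient and compares the mean with the median. It assumes the adjusted Fisher-Pearson formula, n/((n-1)(n-2)) times the sum of cubed standardised deviations; this is one of several definitions in use, so the formula implemented by a given software package should be checked before results are compared. The function name and the data values are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(z**3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


# Hypothetical ratio values: one set with a long right tail, one symmetric.
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

print(sample_skewness(right_skewed))   # positive: tail towards large values
print(statistics.fmean(right_skewed), statistics.median(right_skewed))
print(sample_skewness(symmetric))      # zero up to rounding error
```

For the right-skewed set the mean lies above the median, as stated in the text; for the symmetric set the two coincide.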

Table 2 Companies included in the 1985-1989 set of data

Sweden: AB Statens Skogsindustrier; Graningeverkens AB; Korsnäs AB; Mo och Domsjö AB; Munkedals AB; Munksjö AB; Norrlands Skogsägarens Cellulosa AB; Norrsundets Bruks AB; Obbola Linerboard AB; Rottneros Bruk AB; Svenska Cellulosa AB (SCA); Stora Kopparbergs Bergslags AB; Södra Skogsägarna AB
Finland: A. Ahlström Oy; Enso-Gutzeit Oy; Kemi Oy; Kymmene Oy; Metsä-Serla Oy; Rauma-Repola Oy; Sunila Oy; Oy Tampella Ab; Oy Veitsiluoto Ab
Norway: Norske Skogindustrier A.S.; A/S Union
Austria: Laakirchen Lenzig Leykam Nettingsdorfer Steyrermuhl Zellstoff
Germany: Europa Carton Feldmuhle Haindl Papier Hannover MD Papier Schwöbishe Waldhof-A. Zanders
Italy: Saffa Cartiera
Holland: Berghuizer Buchmann Crown van Gelder Gelderse Koninklijke N.V. Papierfabriek Parenco
Portugal: Portucel-Empresa Group Caima Celbi Soporcel
UK: Associated Bowater Industries BPB Industries David S. Smith James Cropper
France: Arjomari Aussedat Beghin-Say Kaysersberg La Cellulose Du. La Rochette Sibylle
Spain: Empresa Nacion. La Papelera
Switzerland: Attisholz Biber Holding Holstoff Industrieholding Papierfab. Perlen
Canada: Abitibi-Price Inc.; Canfor Corporation; Cascades Inc.; Canadian Pacific Forest Products Ltd.; Crestbrook Forest Industries Ltd.; Doman Industries Ltd.; Domtar Inc.; Donohue Inc.; Fletcher Challenge Canada Ltd.; International Forest Products Ltd.; MacMillan Bloedel Ltd.; Noranda Inc.; Noranda Forest Inc.; Perkins Papers Ltd.; Repap Enterprises Corporation Inc.; Rolland Inc.; Scott Paper Ltd.; Tembec Inc.
USA: Boise Cascade Corporation; Champion International Corporation; Chesapeake Corporation; Consolidated Papers Inc.; Dennison Manufacturing Company; The Dexter Corporation; Federal Paper Board Company; Gaylord Container Corporation; Georgia-Pacific Corporation; P.H. Glatfelter Company; Great Northern Nekoosa Corporation; International Paper Company; James River Corporation; Kimberly-Clark Corporation; Longview Fibre Company; Louisiana-Pacific Corporation; Mead Corporation; Mosinee Paper Corporation; Pentair Inc.; Potlatch Corporation; The Procter & Gamble Company; Scott Paper Company; Sonoco Products Company; Stone Container Corporation; Tambrands Inc.; Temple-Inland Inc.; Union Camp Corporation; Wausau Paper Mills Company; Westvaco Corporation; Weyerhaeuser Company; WTD Industries Inc.

Table 3 Companies included in the 1996-1997 set of data

Sweden: MoDo AB; Munksjö AB; SCA AB; Södra AB
Finland: ENSO OY; Metsä-Serla OY; UPM-Kymmene OY
Norway: Hunsfoss Fabrikker; Norske Skog A.S.
USA: Boise Cascade; Bowater; Champion International; Consolidated Papers; Fort James; Georgia-Pacific Group; International Paper; Jefferson-Smurfit Corp.; Kimberly-Clark; Mead; Potlatch Corp.; Stone Container; Union Camp; Westvaco; Weyerhaeuser; Willamette Industries
Canada: Alliance; Canfor; Cascades Inc.; Crestbrook Forest Ind. Ltd.; Doman Industries; Domtar Inc.; MacMillan Bloedel; Sonoco Inc.; Tembec Inc.

The second test for normality measures the kurtosis. While skewness indicates the asymmetry of the distribution around its mean, kurtosis measures the extent to which observations cluster around a central point. The value of the kurtosis coefficient is 0 for normally distributed data. Positive kurtosis denotes a relatively peaked distribution, while negative kurtosis indicates a relatively flat distribution. A kurtosis value outside ±2.0 indicates, in the same way as for skewness, a distribution that differs significantly from the normal distribution.

The third test is the Kolmogorov-Smirnov test. Its test statistic is defined as the maximum vertical distance between an empirical distribution function (the data sample) and a specified distribution function (the normal distribution function). Here the mean and variance of the normal distribution are not specified beforehand but have to be estimated from the sample. With the standard critical values this implies a risk that the method becomes less powerful, i.e. the probability of rejecting a null hypothesis when it is in fact false decreases, and therefore a Lilliefors correction [Lilliefors, 1969] is added to the method. The Lilliefors test adjusts the Kolmogorov-Smirnov test specifically for testing normality when the mean and variance are unknown. Further information on the Kolmogorov-Smirnov test statistic can be found in books devoted exclusively to non-parametric statistics, such as Conover (1980).

Most statistical packages and spreadsheet programs provide some form of the skewness and kurtosis functions. It is worth remembering that several different measures of skewness and kurtosis have been proposed [Groeneveld et al., 1984]. As the measures differ slightly from each other, they also yield slightly different results. It is therefore important to determine which formula is used in the software in which the analysis is carried out. In this paper some basic statistical measures such as the mean, median and standard deviation are also used.
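The two measures described above can be sketched as follows. This is an illustrative example under stated assumptions, not the SPSS implementation used in this paper: the kurtosis function uses the bias-adjusted excess kurtosis formula (0 for normal data), and the Lilliefors-type significance is obtained by simulation, comparing the observed distance with the distances of simulated normal samples of the same size, whereas statistical packages typically rely on tables or analytical approximations. All function names, the number of simulations and the two artificial samples are hypothetical.

```python
import math
import random
import statistics


def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def ks_distance(values):
    """Largest vertical gap between the empirical CDF and a normal CDF
    whose mean and standard deviation are estimated from the sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


def lilliefors_p_value(values, simulations=500, seed=0):
    """Monte Carlo significance: share of simulated normal samples whose
    distance is at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(values)
    observed = ks_distance(values)
    hits = sum(
        ks_distance([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
        for _ in range(simulations)
    )
    return hits / simulations


# Two artificial samples of 100 values: evenly spaced quantiles of an
# exponential distribution (skewed) and of a standard normal distribution.
n = 100
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
normal_like = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

for name, sample in (("skewed", skewed), ("normal-like", normal_like)):
    print(name, excess_kurtosis(sample), ks_distance(sample),
          lilliefors_p_value(sample))
```

Because the mean and standard deviation are estimated from each simulated sample in the same way as from the data, the simulated distances reflect the situation the Lilliefors correction is designed for.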

4. Results

In this section the analysis of the databases is carried out. The analysis was performed using the SPSS version 9.0 statistical program package [SPSS, 1999]. The numerical data was presented to the program after outliers and distributional anomalies had been removed. Each variable was processed separately, and in Table 4 summarised statistics for the financial ratios of both databases are presented.

Financial ratio                         Unit     1985-1989                     1996-1997
                                                 Mean     Median   Std.Dev.    Mean     Median   Std.Dev.
Operating profit / sales                %        14.94    14.2     6.31        7.49     7.17     4.88
Profit after financial items / sales    %        7.75     7.5      6.58        5.1      4.84     5.4
Return on equity (ROE)                  %        15.63    14.9     12.46       6.       5.58     11.25
Solidity                                ratio    42.69    43.4     14.65       38.68    39.16    12.23
Current ratio                           ratio    1.73     1.7      .6          1.57     1.5      .59
Funds from operations / sales           %        11.57    1.7      6.          9.36     8.67     6.66

Table 4 Summarised descriptive statistics for the data sets

A breakdown of the table shows that the average Operating profit was, for example, 14.9 % during the eighties, which is much higher than the 7.5 % recorded in recent years. The Return on equity was also twice as large in the former database. The other financial ratios follow the same tendency, being somewhat higher during the years 1985-1989, though not by as much as the Operating profit and the Return on equity.

Figure 2 presents the distribution of each of the six ratios during 1985-1989 and 1996-1997 in a frequency histogram format. From the picture it is possible to compare graphically the distribution of the variables with their theoretical normal distributions, generated from the mean and the standard deviation of each sample. The histograms created from the 1985-1989 set of data are presented to the left, while the histograms to the right are from the 1996-1997 set of data. The values on the X-axis, the size of the bins and the number of bins are the same for each ratio. Endpoints on the X-axis are chosen to adequately represent the entire range of data.
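The theoretical normal curve against which each histogram is compared can be expressed as expected bin counts. The sketch below is only an illustration of that comparison: it takes the mean and standard deviation of Operating profit / sales for 1985-1989 as printed in Table 4, while the number of observations (529) and the bin edges (width 4 from -6 to 34) are assumptions, as is the function name.

```python
from statistics import NormalDist


def expected_bin_counts(mean, sd, n_obs, edges):
    """Expected number of observations per bin under Normal(mean, sd)."""
    dist = NormalDist(mean, sd)
    return [
        n_obs * (dist.cdf(right) - dist.cdf(left))
        for left, right in zip(edges, edges[1:])
    ]


# Mean and standard deviation of Operating profit / sales, 1985-1989 (Table 4).
mean, sd = 14.94, 6.31
n_obs = 529                                   # hypothetical; the paper reports 525 to 531
edges = [-6.0 + 4.0 * k for k in range(11)]   # assumed bin edges: -6, -2, ..., 34

counts = expected_bin_counts(mean, sd, n_obs, edges)
for left, count in zip(edges, counts):
    print(f"[{left:5.1f}, {left + 4.0:5.1f})  {count:6.1f}")
```

Observed bin frequencies that rise well above these expected counts near the centre, and fall below them on the shoulders, correspond to the peaked shapes discussed in the text.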

[Figure 2: frequency histograms of Operating profit / sales, Profit after financial items / sales, Return on equity (ROE), Solidity, Current ratio and Funds from operations / sales; 1985-1989 in the left column, 1996-1997 in the right column.]

Figure 2 Histograms of the distributions

The statistical tests for normality indicated that none of the ratios were precisely normally distributed. Results of the Kolmogorov-Smirnov test are presented in Table 5. This test (at a significance level of α = .05) rejected normality for all financial ratios in both the former and the latter database.

Financial ratio                         1985-1989                   1996-1997
                                        Statistic   Significance    Statistic   Significance
Operating profit / sales                .71         .               .78         .2*
Profit after financial items / sales    .68         .               .96         .2*
Return on equity (ROE)                  .86         .               .17         .8344
Solidity                                .47         .722            .18         .7687
Current ratio                           .11         .               .156        .142
Funds from operations / sales           .74         .               .116        .4867

* This is a lower bound of the true significance.

Table 5 Kolmogorov-Smirnov with Lilliefors significance correction

The skewness ratio for all financial ratios fell within the specified range of [-2,2]. However, a breakdown of the skewness ratios in Table 6 shows that most of the values that diverge from zero are positive, which indicates slightly asymmetric distributions with tails extending more towards positive values, i.e. there are somewhat more companies performing below the industry average. The kurtosis was not as consistent with normality as the skewness. In the first database two values, and in the latter four values, exceeded the right boundary of the interval [-2,2], indicating somewhat peaked distributions when compared to the normal distribution.

Financial ratio                         1985-1989               1996-1997
                                        Skewness   Kurtosis     Skewness   Kurtosis
Operating profit / sales                .46        .57          .3         -.19
Profit after financial items / sales    1.48  .45  .25  [one value missing; column assignment unclear]
Return on equity (ROE)                  .19        6.52         1.21       4.33
Solidity                                .2         .19          -1.41      4.83
Current ratio                           1.6        1.85         1.59       3.86
Funds from operations / sales           .53        3.64         1.         3.1

Table 6 Skewness and kurtosis for the databases

To investigate the cause of the non-normal results further we can re-examine the frequency histograms. It is well worth combining the statistical analysis with a visual examination of the frequency histograms, since no measure of normality is conclusive. In this case we again notice that the shapes of most histograms are quite peaked, further strengthening the hypothesis that the underlying data material is not normal.

5. Conclusions

In this paper two sets of data from the international forest products industry were tested against the assumption of normality. The data material consisted of six financial ratios: three profitability measures, one capital structure measure, one liquidity measure and one ratio measuring the cash flow. These ratios were considered to give an adequate description of a company as well as of the entire forest products industry. When testing the data material for normality, anomalous values in the data sets were first identified and removed. Three statistical measures, skewness, kurtosis and Kolmogorov-Smirnov with Lilliefors correction, were used to test the cleaned data material against the assumption of normality. Apart from the normality tests, basic statistical measures such as the mean, median and standard deviation were also used in order to give a descriptive picture of the data material.

It comes as no surprise that some of the variables in the two sets of data could not be considered to fulfil the requirements of normally distributed data. The Kolmogorov-Smirnov test statistic rejected normality for all variables. The skewness ratio for all variables fell within the specified range of approximately normal distributions. Two ratios in the 1985-1989 database and four in the 1996-1997 set of data exceeded the kurtosis boundaries for approximately normally distributed data. From these results it is apparent that normally distributed data in empirical databases are anything but common, also in the international forest products industry. It is therefore worth emphasising the importance of proper normality testing before performing any kind of statistical analysis, and of considering non-parametric methods for further analysis of the data material.

References

Conover, W. J. (1980). Practical Nonparametric Statistics. 2nd ed. New York: Wiley & Sons
Dorfman, J. H. (1993). Should normality be a normal assumption? Economics Letters, 42, pp. 143-147. Elsevier Science Publ.
Feller, W. (1968). Laws of Large Numbers. An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., pp. 228-247, New York: Wiley & Sons
Groeneveld, R. A., G. Meeden (1984). Measuring Skewness and Kurtosis. The Statistician, 33, pp. 391-399
Hopkins, K., D. Weeks (1990). Tests for normality and measures of skewness: Their place in research reporting. Educational & Psychological Measurement, Vol. 50, Issue 4, pp. 717-730. Sage Publ. Inc.
Lilliefors, H. W. (1969). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 64, pp. 387-389
Pryzdek, T. (1995). Why Normal Distributions Aren't [All That Normal]. Quality Engineering, Vol. 7, No. 4, pp. 769-777
Salonen, H., H. Vanharanta (1990a). Financial Analysis World Pulp and Paper Companies 1985-1989, Nordic Countries. Green Gold Financial Reports. Vol. 1., Finland: Ekono OY
Salonen, H., H. Vanharanta (1990b). Financial Analysis World Pulp and Paper Companies 1985-1989, North America. Green Gold Financial Reports. Vol. 2., Finland: Ekono OY
Salonen, H., H. Vanharanta (1991). Financial Analysis World Pulp and Paper Companies 1985-1989, Europe. Green Gold Financial Reports. Vol. 3., Finland: Ekono OY
SPSS Incorporated (1999). SPSS for Windows. Release 9.0.1.
Stein, R. (1993). Selecting Data for Neural Networks. AI Expert. February issue, pp. 43-47
Tam, K. Y., M. Kiang (1992). Managerial Applications of Neural Networks: The Case of Bank Failure Predictions. Management Science. Vol. 38. No. 7, pp. 926-947
Vanharanta, H., T. Käkölä, B. Back (1995). Validity and utility of a Hyperknowledge-Based Financial Benchmarking System. Proceedings of the Twenty-Eighth Annual Hawaii International Conference on Systems Science. Vol. 3, pp. 221-230. IEEE Computer Society Press
Öström, K. (1999). Addressing Benchmarking Complexity with Neural Networks and Self-organising Maps. Master's thesis, Åbo Akademi University, Turku