NOMCLUST: AN R PACKAGE FOR HIERARCHICAL CLUSTERING OF OBJECTS CHARACTERIZED BY NOMINAL VARIABLES



Similar documents
A Holistic Method for Selecting Web Services in Design of Composite Applications

Behavior Analysis-Based Learning Framework for Host Level Intrusion Detection

Hierarchical Clustering and Sampling Techniques for Network Monitoring

Chapter 1 Microeconomics of Consumer Theory

Weighting Methods in Survey Sampling


Sebastián Bravo López

Channel Assignment Strategies for Cellular Phone Systems

Classical Electromagnetic Doppler Effect Redefined. Copyright 2014 Joseph A. Rybczyk

A Keyword Filters Method for Spam via Maximum Independent Sets

arxiv:astro-ph/ v2 10 Jun 2003 Theory Group, MS 50A-5101 Lawrence Berkeley National Laboratory One Cyclotron Road Berkeley, CA USA

Improved SOM-Based High-Dimensional Data Visualization Algorithm

Open and Extensible Business Process Simulator

Capacity at Unsignalized Two-Stage Priority Intersections

Findings and Recommendations

Henley Business School at Univ of Reading. Pre-Experience Postgraduate Programmes Chartered Institute of Personnel and Development (CIPD)

FOOD FOR THOUGHT Topical Insights from our Subject Matter Experts

Computer Networks Framing

UNIVERSITY AND WORK-STUDY EMPLOYERS WEB SITE USER S GUIDE

Social Network Analysis Based on BSP Clustering Algorithm

Intelligent Measurement Processes in 3D Optical Metrology: Producing More Accurate Point Clouds

FIRE DETECTION USING AUTONOMOUS AERIAL VEHICLES WITH INFRARED AND VISUAL CAMERAS. J. Ramiro Martínez-de Dios, Luis Merino and Aníbal Ollero

Algorithm of Removing Thin Cloud-fog Cover from Single Remote Sensing Image

Granular Problem Solving and Software Engineering

Static Fairness Criteria in Telecommunications

10.1 The Lorentz force law

Recovering Articulated Motion with a Hierarchical Factorization Method

1.3 Complex Numbers; Quadratic Equations in the Complex Number System*

Pattern Recognition Techniques in Microarray Data Analysis

Deadline-based Escalation in Process-Aware Information Systems

Discovering Trends in Large Datasets Using Neural Networks

A novel active mass damper for vibration control of bridges

Big Data Analysis and Reporting with Decision Tree Induction

AUDITING COST OVERRUN CLAIMS *

i_~f e 1 then e 2 else e 3

Learning Curves and Stochastic Models for Pricing and Provisioning Cloud Computing Services

HEAT CONDUCTION. q A q T

An Enhanced Critical Path Method for Multiple Resource Constraints

WORKFLOW CONTROL-FLOW PATTERNS A Revised View

THE PERFORMANCE OF TRANSIT TIME FLOWMETERS IN HEATED GAS MIXTURES

VOLUME 13, ARTICLE 5, PAGES PUBLISHED 05 OCTOBER DOI: /DemRes

) ( )( ) ( ) ( )( ) ( ) ( ) (1)

Chapter 5 Single Phase Systems

Impedance Method for Leak Detection in Zigzag Pipelines

How To Fator

Henley Business School at Univ of Reading. Chartered Institute of Personnel and Development (CIPD)

4.15 USING METEOSAT SECOND GENERATION HIGH RESOLUTION VISIBLE DATA FOR THE IMPOVEMENT OF THE RAPID DEVELOPPING THUNDERSTORM PRODUCT

Effects of Inter-Coaching Spacing on Aerodynamic Noise Generation Inside High-speed Trains

An integrated optimization model of a Closed- Loop Supply Chain under uncertainty

A Survey of Usability Evaluation in Virtual Environments: Classi cation and Comparison of Methods

Trade Information, Not Spectrum: A Novel TV White Space Information Market Model

Customer Efficiency, Channel Usage and Firm Performance in Retail Banking

International Journal of Supply and Operations Management. Mathematical modeling for EOQ inventory system with advance payment and fuzzy Parameters

Supply chain coordination; A Game Theory approach

An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism

SLA-based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments

Performance Analysis of IEEE in Multi-hop Wireless Networks

Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising

5.2 The Master Theorem

On Some Mathematics for Visualizing High Dimensional Data

RESEARCH SEMINAR IN INTERNATIONAL ECONOMICS. Discussion Paper No The Evolution and Utilization of the GATT/WTO Dispute Settlement Mechanism

' R ATIONAL. :::~i:. :'.:::::: RETENTION ':: Compliance with the way you work PRODUCT BRIEF

User s Guide VISFIT: a computer tool for the measurement of intrinsic viscosities

State of Maryland Participation Agreement for Pre-Tax and Roth Retirement Savings Accounts

The Basics of International Trade: A Classroom Experiment

Criminal Geographical Profiling: Using FCA for Visualization and Analysis of Crime Data

A Three-Hybrid Treatment Method of the Compressor's Characteristic Line in Performance Prediction of Power Systems

Software Ecosystems: From Software Product Management to Software Platform Management

Table of Contents. Appendix II Application Checklist. Export Finance Program Working Capital Financing...7

A Robust Optimization Approach to Dynamic Pricing and Inventory Control with no Backorders

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 3, MAY/JUNE

Programming Basics - FORTRAN 77

Electrician'sMathand BasicElectricalFormulas

cos t sin t sin t cos t

THE EFFECT OF WATER VAPOR ON COUNTERFLOW DIFFUSION FLAMES

RATING SCALES FOR NEUROLOGISTS

Dynamic and Competitive Effects of Direct Mailings

Dataflow Features in Computer Networks

Agile ALM White Paper: Redefining ALM with Five Key Practices

Intuitive Guide to Principles of Communications By Charan Langton

TRENDS IN EXECUTIVE EDUCATION: TOWARDS A SYSTEMS APPROACH TO EXECUTIVE DEVELOPMENT PLANNING

protection p1ann1ng report

Revista Brasileira de Ensino de Fsica, vol. 21, no. 4, Dezembro, Surface Charges and Electric Field in a Two-Wire

Parametric model of IP-networks in the form of colored Petri net

REDUCTION FACTOR OF FEEDING LINES THAT HAVE A CABLE AND AN OVERHEAD SECTION

From a strategic view to an engineering view in a digital enterprise

Hierarchical Beta Processes and the Indian Buffet Process

Transcription:

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 NOMCLUST: AN R PACKAGE FOR HIERARCHICAL CLUSTERING OF OBJECTS CHARACTERIZED BY NOMINAL VARIABLES Zdeněk Šul Hana Řezanková Abstrat Clustering of objets haraterized by nominal (ategorial) variables is not still suffiiently solved in software systems. Most of them provide only hierarhial luster analysis with one basi similarity measure for nominal data, the simple mathing oeffiient, whih provides worse results than lustering based on some of reently proposed measures. In this paper, we introdue the nomlust R pakage, whih ompletely overs hierarhial lustering of objets haraterized by nominal variables from a proimity matri omputation to final lusters evaluation. It enables to hoose among 11 similarity measures, three linkage methods of hierarhial lustering, and uses si evaluation riteria based on the mutability and the entropy. Some of the riteria were onstruted for omparison of different luster approahes and some for determining the optimal number of lusters. The pakage ontains both the main funtion, whih overs the whole lustering proess, and a set of subsidiary funtions, whih allow omputing only a part of lustering proess, e.g. a proimity matri, or evaluation of given luster solution. We illustrate the methodology of this pakage on a small dataset. Key words: luster analysis, nominal variables, similarity measures, evaluation riteria JEL Code: C38, C88 Introdution In this paper, we present the nomlust pakage for the R software, whih was developed for hierarhial lustering of objets haraterized by nominal variables. This pakage is available on the Comprehensive R Arhive Network (CRAN) web site 1. We developed it due to the lak of software implementation of similarity measures determined for nominal variables. Nominal variables are urrently lustered by means of methods based on the simple mathing (SM) oeffiient, whih are usually used in statistial software. This treatment is not 1 http://cran.r-projet.org/ 1581

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 suffiient though, beause it neglets important harateristis of a dataset, suh as frequenies of ategories or number of ategories, whih ould be used for better similarity determination. The other possibility is to use measures determined for mied type data. One of the famous ones is the Gower similarity oeffiient, see (Gower, 1971), whih an be found e.g. in funtion daisy in the R pakage luster, or the log-likelihood distane in two-step luster analysis, see (Chiu et al., 001), in the IBM SPSS software. However, the Gower oeffiient treats nominal variables in the same way as measures based on the SM oeffiient, and as far as we know, the log-likelihood distane is not available in any non-ommerial software. Another possibility is to transform the original nominal variables into a set of dummy variables, to whih measures for quantitative or binary data an be applied. However, this proedure always provides some loss of information aused by dummy transformation. Overall, urrent software solutions do not provide satisfatory results. In reent years, there have been introdued many similarity measures whih were proposed for nominal variables, e.g. the Lin measure, see (Lin, 1998), or the Eskin measure, see (Eskin et al., 00). Many of them provide very good results as well, see (Šul and Řezanková, 014) or (Šul, 015). The problem is that those measures have not beome widely known, mostly due to the lak of software implementation. By introduing our R pakage, we aspire to deal with this problem. The pakage ontains 11 similarity measures determined for nominal data, and ompletely overs the lustering proess from proimity matri omputation, over hierarhial lustering analysis, to evaluation riteria of resulting lusters determination. In this paper, we introdue the funtionality of this pakage in detail. The paper is organized as follows. The Setion 1 offers a short introdution of used similarity measures, linkage methods, and evaluation riteria. The Setion presents several typial use illustrations on a simple eample. Final remarks are plaed in Conlusion. 1 Theoretial bakground of the pakage The pakage ontains 11 similarity measures for nominal data, 10 of them were summarized in (Boriah et al., 008), and one of them was introdued by Morlini and Zani, (01). All the measures were more thoroughly eamined in (Šul and Řezanková, 014) or (Šul, 015). Tab. 1 presents equations whih determine similarity between the i-th and j-th objets by the -th variable. This is the first (and the most important) level of a proimity matri omputation. On the seond level, similarity for objets aross all variables is omputed. The last level involve omputation of the dissimilarity. All formulas in Tab. 1 are based on the 158

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 data matri X = [i], where i = 1,,..., n (n is the total number of objets); = 1,,..., m (m is the total number of variables). The number of ategories of the -th variable is denoted as n, absolute frequeny as f, and relative frequeny as p. Tab. 1: Similarity measures in the nomlust pakage(without dummy transformation) Measure Similarity between the i-th and j-th objets by the -th variable Eskin 1 if Goodall 1 Goodall S ( i S ( i S ( i,,, j j j ) ) ) n otherwise n 1 i p j q 0 otherwise 1 if i j q Q, Q X :, ( ) ( ) q Q p q p i p q 0 otherwise Goodall 3 1 p S ( i, j ) if i j q Q, Q X :, ( ) ( ) q Q p q p i i 0 otherwise Goodall 4 p IOF Lin Lin 1 OF SM S ( i S ( i S ( i S ( i S ( i S ( i,,,,,, j j j j j j ) ) ) ) ) ) i if 0 otherwise 1 if i j if i i 1 1 ln f ( ) ln f ( ln p( i ln( p( qq ln 1if ln p qq i i ) if i q n 1 ln f ( 1if i i j ) p( if i j j j j otherwise ) )) otherwise j p qotherwise j i 0 otherwise j, Q X : q Q, p ( i) p ( q) p ( j ) 1 otherwise n ln ) f ( ) j 1583

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 The Morlini and Zani s similarity measure uses dummy transformation; thus, it is omputed in a different way than the rest of measures. Similarity between the objets i and j is epressed: m K 1 ( i, u ln 1 u1 fu S( i, j ), m K m K 1 1 ( i, u ln ( i, f u ln 1 u1 f u 1 u1 f u (1) Where u = 1,,... K is an inde of the u-th dummy variable of the -th original variable, fu is the seond power of the frequeny of the u-th ategory of the original -th variable. When using Eq. (1), the following onditions have to be held: ( i, ( i, u u 1if iu 1and 0 otherwise, ju 1, ( i, ( i, 1if iu 1 0 otherwise. jt 1and it 1 ju 0, for u t, After omputation of proimity matri, hierarhial luster analysis is performed. The pakage enables to use three different linkage methods omplete, single and average one. They differ from the way, how they determine dissimilarity between two lusters. The omplete linkage method onsider this dissimilarity as dissimilarity between two furthest objets from two different lusters. The single linkage method uses dissimilarity between two losest objets, and the average linkage method takes average pairwise dissimilarity between objets in two different lusters. The resulting lusters an be evaluated by si riteria, whih were introdued in (Řezanková et al., 011). They are based on the within-luster variability of lusters, whih an be measured either by the mutability or by the entropy. The riteria are presented in Tab., where k is the number of lusters, n g is the number of objets in the g-th luster (g = 1,,...,, n giu is the number of objets in the g-th luster by the i-th objet with the u-th ategory (u = 1,,..., K ; K is the number of ategories), = 1,,..., m (m is the number of variables). WCM and WCE measure the within-luster variability in a dataset. They take values from zero to one, where values lose to one indiate high variability. With inreasing number of lusters, the within-luster variability dereases, and thus, values of these indies derease. 1584

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 Values of oeffiients PSTau and PSU take values from zero to one as well. This time, values lose to one indiate the low within-luster variability and vie versa. Pseudo F oeffiients, originally determined for quantitative data, see e.g. (Löster, 014), are based on the mutability or the entropy for nominal variables. They were developed to determine the optimal luster solution, i.e. the solution with the relative highest derease of within-luster variability. Maimal value should indiate the best solution. Tab. : Evaluation riteria in the nomlust pakage Within-luster mutability oeffiient K K 1 g (WCM) WCM ( 1 Within-luster entropy oeffiient k n n m m g1 1 u1 g WCE( (WCE) K n n gu g k m K n 1 n gu ngu ln 1 n m 1 ln K u1 ng ng Pseudo tau oeffiient (PSTau) WCM(1) WCM( PSTau( WCM(1) g Pseudo unertainty oeffiient (PSU) WCE(1) WCE( PSU ( WCE(1) Pseudo F oeffiient based on the ( n ( WCM(1) WCM( ) mutability (PSFM) PSFM( ( k 1) WCM( k ) Pseudo F oeffiient based on the ( n ( WCE(1) WCE( ) entropy (PSFE) PSFE( ( k 1) WCE( k ) Coeffiients based on the mutability and the entropy usually do not differ very muh. They are inluded in the pakage in order to provide two independent ways of variability omputation. Big differenes between mutability- and entropy-based oeffiients should attrat researher s attention. The use of two types of oeffiients is espeially useful when determining the optimal number of lusters using the pseudo F oeffiients. If both the oeffiients prefer the same luster solution, there is strong evidene for hoosing this partiular solution by a researher. Illustrations of use There are three possible ways how to use the nomlust pakage. One an use it to a proimity matri omputation using one of 11 available similarity measures, or to perform the omplete 1585

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 lustering proess, where the input is data matri X, and the output onsists of luster membership variables, and a set of evaluation riteria. The third way of the use is to ompute a set of evaluation riteria of a dataset with luster membership variables, whih were obtained by any other lustering proess. In this setion, all the ways of use will be desribed in detail. Eamples will be demonstrated on the dataset data0, whih ontains five nominal variables, eah with 0 observations. It is available in the pakage after typing: R> data(data0).1 Proimity matri omputation In ase one wants to ompute a proimity matri using only one of available similarity measures, e.g. for researh purposes, the best way is to use diretly the funtion of a partiular measure. All the funtional alls are presented in Tab. 3. The resulting proimity matries have standard proportions; their size is nn, and they have zero diagonal elements. Thus, they an be used as the input in other pakages as well, for instane hlust in the stats pakage. Tab. 3: Funtional alls for proimity matries omputation Measure Funtional all Measure Funtional all Eskin R>eskin(data) Morlini R>morlini(data) Goodall 1 R> good1(data) Lin R> lin(data) Goodall R> good(data) Lin 1 R> lin1(data) Goodall 3 R> good3(data) OF R> of(data) Goodall 4 R> good4(data) SM R>sm(data) IOF R>iof(data). Complete lustering proess The most omple funtion in the pakage is the nomlust funtion. It overs the whole lustering proess, and it has following synta: R>nomlust(data, measure = "iof", lu_low =, lu_high = 6, eval = TRUE, pro = FALSE, method = "omplete") The only mandatory parameter is data, whih represents the input data for the analysis. The rest of parameters is preset by default in a way whih should suit to majority of users, but it an be hanged if needed. Parameter measure hooses the similarity measure for the analysis. 1586

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 As a default hoie, the IOF measure is hosen, beause it often provides the best results, see (Šul and Řezanková, 014). Generally, any of measures from Tab. 3 an be used in its plae. Parameters lu_low and lu_high determine the lower and upper bounds for the number of reated luster solutions. There are two logial parameters eval and pro, whih enable to ompute evaluation riteria or to display a proimity matri respetively. The last parameter is method, whih enables to use one of three linkage methods. By default, the omplete linkage method is hosen, beause it usually provides the best results; see (Šul, 014). In the pakage, the agnes algorithm from the pakage luster was used for performing the hierarhial luster analysis. In order to use the nomlust funtion on the data0 dataset with the IOF measure, and displayed proimity matri, the following synta is needed: R> data <- nomlust(data0, pro = TRUE) The output of the nomlust funtion has a form of a list with up to three omponents. The mem omponent ontains luster memberships for all ases. Beause the output would be large, only first si rows of the output are displayed: R>lu_mem<- head(data$mem) lu_ lu_3 lu_4 lu_5 lu_6 1 1 1 1 1 1 1 1 1 1 1 3 1 4 1 1 1 3 3 5 1 1 1 1 1 6 3 3 4 4 The evaluation riteria an be found in the eval omponent after typing: R>lu_eval<- data$eval luster WCM WCE PSTau PSU PSFM PSFE 1 1 0.9666 0.9600 NA NA NA NA 0.7330 0.7010 0.1987 0.084 4.4646 4.738 3 3 0.617 0.5789 0.3365 0.3607 4.3106 4.7949 4 4 0.4546 0.4066 0.5005 0.5491 5.3446 6.4961 1587

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 5 5 0.4136 0.3658 0.5373 0.5807 4.3551 5.1943 6 6 0.3600 0.3085 0.6004 0.6514 4.074 5.37 The output ontains values for all the evaluation riteria presented in Tab. for the defined range of lusters. In the first row, there is always displayed variability in the whole dataset by means of the WCM and WCE oeffiients. The other evaluation riteria have always symbol NA in this row. The other rows represent the within-luster variability derease pretty well. When studying the PSFM and PSFE oeffiients, both of them prefer the four-luster solution. The pro omponent displays proimity matri, if the logial parameter is set to TRUE. Sine the proimity matri would be large, the output is omitted. R>lu_pro<- data$pro.3 Clustering evaluation Sometimes it may arise a need to evaluate set of luster solutions obtained by a similarity measure or method whih is not available in the nomlust pakage, for instane, results obtained by two-step luster analysis in IBM SPSS. For suh ases, one an use the evallust funtion, whih has the following synta: R>evallust(data, num_var, lu_low =, lu_high = 6) The funtion has two mandatory parameters; data, whih is an original dataset with luster membership variables in an inreasing order, and num_var, whih defines the number of original variables in the dataset. Again, lu_low and lu_high parameters define the range of used luster solutions. Thus, evaluation riteria omputation for the data0 dataset with five additional luster membership variables ( to 6 lusters) an be written in the following way: R>evallust(data0, 5, lu_low =, lu_high = 6) luster WCM WCE PSTau PSE PSFM PSFE 1 1 0.9666 0.9600 NA NA NA NA 0.7601 0.714 0.1943 0.366 4.3400 5.580 3 3 0.6118 0.5670 0.3490 0.4034 4.5570 5.7473 4 4 0.4615 0.400 0.4851 0.5488 5.05 6.4857 5 5 0.3938 0.384 0.566 0.6355 4.8955 6.5379 1588

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 6 6 0.383 0.649 0.693 0.6941 4.7540 6.354 The output has the same form as omponent eval in the nomlust funtion. Thus, a diret omparison of outputs is ensured. For instane, in this partiular ase, two-step luster analysis provides omparable results to those ones gotten by the IOF measure used in the nomlust funtion. Moreover, PSFM and PSFE determine the same optimal solution in ase of IOF, whereas their values in ase of two-step luster analysis evaluation differ. For more information about two-step luster analysis lustering performane, see e.g. (Šul and Řezanková, 014). Conlusion In this paper, we introdued the main features of the nomlust pakage. We think it offers a omprehensive look at the nominal lustering issue whih annot be found in any other R pakages or ommerial software. We are aware that the R pakage development is a living proess; and thus, we are going to improve it steadily. In net releases, we would like to fous on performane optimization, beause proimity matries omputation takes a lot of omputational time by large datasets. Net, we will inorporate graphial outputs of the analysis, whih should make a luster evaluation proess even easier. Aknowledgments This paper was proessed with ontribution of long term institutional support of researh ativities by Faulty of Informatis and Statistis, University of Eonomis, Prague. Referenes Boriah, S., Chandola, V., Kumar, V. (008). Similarity measures for ategorial data: a omparative evaluation. In Proeedings of the 8th SIAM International Conferene on Data Mining. SIAM, 43-54. Chiu, T., Fang, D., Chen, J., Wang, Y., Jeris, C. (001). A robust and salable lustering algorithm for mied type attributes in large database environment. In Proeedings of the seventh ACM SIGKDD international onferene on knowledge disovery and data mining, 63. Gower, J. C. (1971). A general oeffiient of similarity and some of its properties. In Biometris, 8(4), 857-871. 1589

The 9 th International Days of Statistis and Eonomis, Prague, September 10-1, 015 Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. (00). A geometri framework for unsupervised anomaly detetion. In Appliations of Data Mining in Computer Seurity, 78-100. Morlini, I., Zani, S. (01). A new lass of weighted similarity indies using polytomous variables. In Journal of Classifiation, 9(), 199-6. Lin, D. (1998). An information-theoreti definition of similarity. In ICML '98: Proeedings of the 15th International Conferene on Mahine Learning. San Franiso: Morgan Kaufmann Publishers In., 96-304. Löster, T. (014). The evaluation of CHF oeffiient in determining the number of lusters using Eulidean distane measure. In The 8th International Days of Statistis and Eonomis. Slaný: Melandrium, 014, 858-869. http://msed.vse.z/msed_014/artile/463-loster-tomaspaper.pdf. Řezanková, H., Löster, T.,Húsek, D. (011). Evaluation of ategorial data lustering. In Mugellini, E, Szzepaniak, P. S., Pettenati, M. C. et al. (Eds.), In Advanes in Intelligent Web Mastering 3. Berlin: Springer Verlag, 173-18. Šul, Z. (014). Similarity Measures for Nominal Variable Clustering. In The 8th International Days of Statistis and Eonomis. Slaný: Melandrium, 1536-1545. http://msed.vse.z/msed_014/artile/75-sul-zdenek-paper.pdf. Šul, Z. (015). Appliation of Goodall's and Lin's similarity measures in hierarhial lustering. In Sborník praí vědekého semináře doktorského studia FIS VŠE. Praha: Oeonomia, 11-118. http://fis.vse.z/wp-ontent/uploads/015/01/dd_fis_015_cely_sbornik.pdf. Šul, Z., Řezanková, H. (014). Evaluation of reent similarity measures for ategorial data. In Proeedings of the 17th International Conferene Appliations of Mathematis and Statistis in Eonomis. Wrolaw: Wydawnitwo Uniwersytetu Ekonomiznego we Wroławiu, 49-58. Zdeněk Šul University of Eonomis, Prague, Dept. of Statistis and Probability W. Churhill sq. 4, 130 67 Prague 3, Czeh Republi zdenek.sul@vse.z Hana Řezanková University of Eonomis, Prague, Dept. of Statistis and Probability W. Churhill sq. 4, 130 67 Prague 3, Czeh Republi hana.rezankova@vse.z 1590