Gaining Insights to the Tea Industry of Sri Lanka using Data Mining




Proceedings of the International MultiConference of Engineers and Computer Scientists 2008 Vol I

H. C. Fernando, W. M. R. Tissera, and R. I. Athauda

Abstract: This paper describes an application of data mining techniques to study certain aspects of the tea industry in Sri Lanka using the different types of data available. The study showed that exports and production have increased continuously over time. It further revealed how production varies across the three elevations at which tea is grown, namely High, Low and Mid. Time series analysis produced ART(p) models for production and price. The study yielded significant insights and knowledge about the tea industry, which contributes significantly to the Sri Lankan economy.

Index Terms: Data mining, cluster analysis, time series analysis

I. INTRODUCTION

Data mining (DM) is a powerful technology for analyzing information in large databases and extracting hidden predictive information from them. There is growing interest in using this technology to convert large passive databases into useful, actionable information. An advantage of data mining techniques is that they can explore massive volumes of data automatically, unlike queries posed by a human analyst.

The process of data mining consists of three stages: (i) the initial exploration, (ii) model building or pattern identification and (iii) deployment [8]. The initial exploration usually starts with data preparation, which may involve cleaning data, transforming data, selecting subsets of records and, for data sets with large numbers of variables, performing some preliminary feature selection to bring the number of variables to a manageable range. Data mining borrows techniques from many different disciplines such as databases ([9], [13], [15], [16], [18], [24], and others), statistics [12], machine learning and pattern recognition [5], and visualization [14].

The aim of this study is to investigate how data mining techniques can be used to find relationships among factors such as production, export and auction price of tea.
It is possible to obtain comparatively large sets of data on tea exports from different sources, namely the Sri Lanka Tea Board (SLTB) [4], the Central Bank of Sri Lanka [1], [3], Sri Lanka Customs [22] and the Department of Census and Statistics [6]. The data for this study was provided by SLTB. We hope that the patterns identified in this study will help in making policy decisions to improve tea production and exports, which contribute significantly to the national economy of Sri Lanka.

Manuscript received December 30, 2007. H. C. Fernando (corresponding author; phone: +94-11-230-1904; fax: +94-11-230-1906; e-mail: chandrika.f@sliit.lk) and W. M. R. Tissera (e-mail: menik.t@sliit.lk) are with the Sri Lanka Institute of Information Technology, Level #16, BOC Merchant Tower, St. Michael's Road, Colombo 03, Sri Lanka. R. I. Athauda is with the Faculty of Science and Information Technology, School of Design, Communication and Information Technology, The University of Newcastle, NSW 2308, Australia (e-mail: Rukshan.Athauda@newcastle.edu.au).

II. METHODOLOGY

The initial exploration started with making the data comparable with respect to currency. Auction prices are recorded in Sri Lanka Rupees (LKR). Because of exchange rate fluctuations and currency depreciation, a comparable price in USD was calculated as follows:

$$\text{Comparable price in USD} = \frac{\text{Actual monthly average auction price in LKR per kg}}{\text{Monthly average exchange rate in USD}}$$

Monthly average exchange rates [2] were used to perform this transformation. As there were only a few variables, reducing the dimension of the data was not necessary. It was essential to select data for the same period of time for a given variable. Preliminary analysis was carried out as the next step of the initial exploration using SPSS [20].

Microsoft SQL Server [19], [21], [26] is a powerful database management system (DBMS) that offers a range of DM algorithms and techniques and is user friendly. We therefore selected Microsoft SQL Server 2005 for its DBMS capabilities and DM algorithms [19], [26].

A. Clustering

Cluster analysis is a branch of statistics that has been successfully applied in many applications. Its main goal is to identify clusters present in the data. There are two main approaches to clustering, namely hierarchical clustering and partitioning clustering. The Microsoft clustering algorithm, k-means [10], is a partitioning algorithm that divides a database of N objects into a set of k clusters. It minimizes the total intra-cluster variance, or squared error function, V:

$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

where there are k clusters $S_i$, $i = 1, 2, \ldots, k$, and $\mu_i$ is the centroid or mean point of all the points $x_j \in S_i$.

Monthly data was available for production and prices but not for exports in our data set. Therefore, cluster analysis was possible only for the three variables time, price and production, for the three types of tea.

B. Time series analysis

The trends of the variables with time (monthly or yearly) were analyzed using time series analysis. SQL Server provides the facility to perform time series analysis using the autoregressive tree (ART) model [17]. An ART is a piecewise linear autoregressive model in which the boundaries are defined by a decision tree and the leaves of the decision tree contain linear autoregressive models. An ART(p) model is an ART model in which each leaf of the decision tree contains an AR(p) model, and the split variables for the decision tree are chosen from among the previous p variables in the time series. ART(p) models are more powerful than AR models because they can model non-linear relationships in time series data [25]. ART models are particularly suitable for data mining because computationally efficient methods are available for learning them from data. The resulting models also yield accurate forecasts and are easily interpretable.

The ART(p) model is given by:

$$f(y_t \mid y_{t-p}, \ldots, y_{t-1}, \theta) = \prod_{i=1}^{L} f_i(y_t \mid y_{t-p}, \ldots, y_{t-1}, \theta_i)^{\Phi_i} = \prod_{i=1}^{L} N\Big(m_i + \sum_{j=1}^{p} b_{ij}\, y_{t-j},\ \sigma_i^2\Big)^{\Phi_i}$$

where L is the number of leaves, $\theta = (\theta_1, \ldots, \theta_L)$, and $\theta_i = (m_i, b_{i1}, \ldots, b_{ip}, \sigma_i^2)$ are the model parameters for the linear regression at leaf $l_i$, $i = 1, \ldots, L$. Here $\Phi_i$ is an indicator that equals 1 when the lagged values $(y_{t-p}, \ldots, y_{t-1})$ fall in leaf $l_i$ and 0 otherwise, so exactly one leaf's regression is active for each observation.

The experiment was carried out for three different data sets.

Data set I: Annual export and production of tea from 1911 to 2006.
Data set II: Monthly production according to elevation from 1986 to 2006.
Data set III: Monthly auction prices according to elevation from 1986 to 2006.

III. RESULTS

Data set I contains annual figures for the export and production of tea (in metric tons) across all elevations. Fig. 1 shows continuous growth in production over time. The same relationship was found between exports and time (see Fig. 2). It is important to observe that in some instances more tea was exported than was produced (see Fig. 3). This was possible because tea was imported to blend with Ceylon tea, thereby adding to the tea available in Sri Lanka (for example, in 2006 exports were 102% of production). So far, it has been possible to export any amount produced (see Fig. 6).

Figure 1. Production by year (vertical axis: tea production, MT)

Figure 2. Exports by year (vertical axis: tea exports, MT)

Figure 3. Export as a percentage of production by year (1977 to 2005)

Data set II shows how the average production of a particular month over a period of 20 years is distributed (see Fig. 4 and Fig. 5) among the three elevations at which tea is grown, namely High, Low and Mid [23]. Low grown tea is produced at 1500-1800 feet. Mid grown tea comes from heights between 1800 and 3500 feet. High grown tea is produced at heights of 3500 to 7500 feet. Low grown tea contributes almost half of the production (see Fig. 4). Mid grown tea production is the lowest throughout the year (see Fig. 5).

Figure 4. Average tea production by elevation (low grown 51%, high grown 29%, mid grown 20%)
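The currency normalization described in Section II (auction price in LKR divided by the monthly average exchange rate) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `comparable_price_usd` is hypothetical, the exchange rate is assumed to be quoted as LKR per 1 USD, and the monthly figures are invented placeholders rather than SLTB data.

```python
def comparable_price_usd(price_lkr_per_kg, lkr_per_usd):
    """Convert a monthly average auction price from LKR/kg to USD/kg.

    Assumption: the exchange rate is quoted as LKR per 1 USD.
    """
    if lkr_per_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return price_lkr_per_kg / lkr_per_usd


# Hypothetical monthly figures, for illustration only (not the paper's data).
auction_price_lkr = {"month_1": 200.0, "month_2": 210.0}
exchange_rate = {"month_1": 100.0, "month_2": 105.0}

# Each month is converted with that month's own average rate.
comparable = {
    month: comparable_price_usd(price, exchange_rate[month])
    for month, price in auction_price_lkr.items()
}
print(comparable)
```

With these placeholder numbers both months come out at 2.0 USD per kg even though the LKR price rose, which is the kind of depreciation effect the conversion is meant to remove.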

Figure 5. Average tea production by elevation (monthly averages for low, mid and high grown tea, January to December)

A. Regression of exports on production

Data set I consists of yearly exports and production of tea from 1911 (see Fig. 1 and Fig. 2). Fig. 2 shows continuous growth in exports over time, and the same relationship was found for production (Fig. 1). This indicates that exports and production follow the same pattern over time: the more tea is produced, the more can be exported (see Fig. 3). Since the scatter plot showed a strong linear correlation (r = 0.993) between these two variables, a simple linear regression was fitted (export = 6625 + 0.914 * production; $R^2$ = 0.98). Fig. 6 shows how well the regression line fits the data. The positive slope amounts to an increase of about 914,000 kg in exports for every additional 1,000 MT of tea produced, further justifying our earlier comments.

Figure 6. Regression graph between exports and production of tea (horizontal axis: production, MT; vertical axis: exports, MT)

Data set III contains monthly auction prices according to the elevation of tea (see Fig. 7). On average, the price of low grown tea stands above that of the other types. However, prices of all three types decrease in the period June-July. It is important to note that price is not governed by production alone. Simple linear regressions of price on production explain almost none of the variation in price for any of the three types of tea (Low: R-Sq = 7.8%, Mid: R-Sq = 1.8% and High: R-Sq = 0.2%). This indicates that the auction price has essentially no linear dependence on the amount of production.

Figure 7. Average price distribution by elevation (comparable average tea prices by month for low, mid and high grown tea)

B. Cluster analysis for production and price

This section presents the results of cluster analysis using the K-means algorithm [19], [26] available in Microsoft SQL Server 2005.
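To make the K-means procedure named above concrete, here is a minimal sketch of the standard assign-and-update (Lloyd) iteration together with the squared error function V from Section II.A. It is a generic textbook version and is not claimed to match the internals of the Microsoft SQL Server 2005 implementation; the function names and the four (price, production) points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = assign(points, centroids)
    for _ in range(max_iter):
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(c) if c else mu for c, mu in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
        clusters = assign(points, centroids)
    return centroids, clusters


def within_cluster_variance(centroids, clusters):
    """V: sum over clusters of squared distances from each point to its centroid."""
    return sum(squared_distance(p, mu)
               for mu, cluster in zip(centroids, clusters) for p in cluster)


# Hypothetical (price, production) observations, for illustration only.
points = [(1.4, 3.8), (1.5, 4.0), (1.9, 13.0), (1.9, 13.6)]
centroids, clusters = kmeans(points, initial_centroids=[points[0], points[2]])
V = within_cluster_variance(centroids, clusters)
```

The initial centroids are passed in explicitly so the run is deterministic; in practice the result of k-means depends on the starting centroids, so several starts are usually compared by their value of V.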
This analysis was performed only for monthly production and prices. Since monthly data was not available for exports, exports were excluded from the cluster analysis. We obtained seven clusters, as presented in Fig. 8 and Table I.

It is important to note that low grown tea forms two distinct clusters: clusters 5 and 7. Cluster 7, which has the highest average production, consists of 100% low grown tea. The other cluster consisting of 100% low grown tea, cluster 5, has the second highest average production. The third highest average production is found in cluster 6, which consists of 88.5% low grown tea, the balance being high grown tea. It can be concluded that low grown tea is the main contributor to production.

The pattern in the clustering of price is the same as that of production. That is, the highest average price is found in cluster 7, followed by cluster 5 and cluster 6. Therefore, the most significant contributor to the price of tea is low grown tea.

Table I. Distribution of mean and standard deviation by cluster

Cluster   Category participation          Price             Production
          Low %   Medium %   High %       Mean ± SD         Mean ± SD
1         1.8     85.9       12.3         1.45 ± 0.27       3.89 ± 0.58
2         32.6    7.3        60.1         1.51 ± 0.28       5.55 ± 0.48
3         30.8    -          69.2         1.56 ± 0.03       7.19 ± 0.51
4         64.2    -          35.8         1.53 ± 0.25       9.01 ± 0.45
5         100     -          -            1.89 ± 0.25       13.31 ± 0.73
6         88.5    -          11.5         1.73 ± 0.31       10.77 ± 0.63
7         100     -          -            1.90 ± 0.16       16.08 ± 1.10

Figure 8. Cluster profile
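The ranking argument in this section can be checked directly against the cluster means in Table I. The short sketch below copies the mean price and mean production of each cluster from the table and lists the three clusters with the largest means in each column; the names `table_i` and `top_clusters` are hypothetical helpers introduced only for this illustration.

```python
# Cluster means copied from Table I: cluster id -> (mean price, mean production).
table_i = {
    1: (1.45, 3.89),
    2: (1.51, 5.55),
    3: (1.56, 7.19),
    4: (1.53, 9.01),
    5: (1.89, 13.31),
    6: (1.73, 10.77),
    7: (1.90, 16.08),
}


def top_clusters(column, n=3):
    """Return the n cluster ids with the largest mean in a column.

    column 0 is mean price, column 1 is mean production.
    """
    ranked = sorted(table_i, key=lambda c: table_i[c][column], reverse=True)
    return ranked[:n]


top_by_production = top_clusters(1)
top_by_price = top_clusters(0)
print(top_by_production, top_by_price)
```

Both lists come out as clusters 7, 5 and 6, in that order, which matches the text. The agreement covers only the top three: further down, clusters 3 and 4 swap places between the price and production rankings.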

C. Time series analysis for export and production

It is important to know the trend in production and exports so that the near future can be predicted. One of the data mining techniques, time series analysis [25], was used to identify the trend in exports and production. For this analysis the time frame was fixed to start from 1977, the year in which the open economy was introduced to the country. After the introduction of the open economy, exports increased considerably.

To fit a time series model to the trend found in exports, the following ART model was calculated (Table II). The decision tree produces two distinct nodes. One node represents the period before mid 1992 and the other represents the period after mid 1992 (see Fig. 9).

Figure 9. Decision tree of the ART model of annual exports

The time series models for the two periods mentioned above are given in Table II. Both show that the export figure for a given year depends only on the export figures of the previous two years. Exports are also continuously increasing with time, and the time series model predicts continued growth in exports (i.e., the coefficients are positive) (see Table II).

Table II. ART model for annual exports

Region         ART model
>= 1992.500    Exports = 49164.421 + 0.254 * Exports(-2) + 0.600 * Exports(-1)
< 1992.500     Exports = 93197.567 + 0.279 * Exports(-2) + 0.244 * Exports(-1)

To fit a time series model to the trend found in production, the following ART model was calculated (see Table III). The decision tree produces two distinct nodes (see Fig. 10).

Figure 10. Decision tree of the ART model of annual production

One node represents the period before 1993 and the other represents the period from 1993 onward (see Fig. 10). The time series models for the two regions show that the production figure for a given year depends only on the production figures of the previous three years (see Table III). Production is also continuously increasing with time. The time series model predicts continued but slow growth in production.

Table III. ART model for production

Region     ART model
>= 1993    Production = 73215.369 + 0.166 * Production(-3) + 0.232 * Production(-2) + 0.379 * Production(-1)
< 1993     Production = 99579.595 + 0.629 * Production(-3) + 0.049 * Production(-1) - 0.149 * Production(-2)

To investigate how exports and production are predicted to vary under the two time series models, Fig. 11 was constructed. This graph uses the values calculated by the two models. The gap between exports and production is predicted to persist in the future. This shows that production does not adequately meet the continuous increase in exports (see Fig. 11).

Figure 11. Prediction using the ART models (exports and production, quantity in MT, 2007 to 2022)

D. Time series analysis for prices

This analysis was carried out separately for the three types of tea. As low grown tea is the largest contributor, we started the analysis with low grown tea. The decision tree for this model had only the root node, hence one leaf (see Fig. 12). Therefore, the linear autoregressive model needs to be calculated for only one region and is presented in Table IV. This model shows a positive trend in the price of low grown tea, which depends only on the price of the previous month.

Figure 12. Decision tree of the ART model for prices of low grown tea

Table IV. ART model for price of low grown tea

Region    AR model
All       Low Grown Price = 0.078 + 0.956 * Low Grown Price(-1)

The decision tree for mid grown tea also has only the root node, hence one leaf (see Fig. 13). Therefore, the linear autoregressive model needs to be calculated for only one region and is presented in Table V. This model shows a positive trend in the price of mid grown tea, which depends only on the prices of the previous two months.
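As an illustration of how a piecewise ART model is evaluated, the sketch below applies the two-leaf export model of Table II: the decision tree first selects a leaf from the time index, and the AR(2) equation of that leaf then produces the prediction. The function names and the starting export figures are hypothetical, and the lags are assumed to be in years because the table describes annual exports.

```python
def art_exports(year, prev1, prev2):
    """One-step prediction from the two-leaf ART model of Table II.

    prev1 is Exports(-1) and prev2 is Exports(-2), both in MT.
    """
    if year >= 1992.5:
        # Leaf for the region ">= 1992.500".
        return 49164.421 + 0.254 * prev2 + 0.600 * prev1
    # Leaf for the region "< 1992.500".
    return 93197.567 + 0.279 * prev2 + 0.244 * prev1


def forecast(start_year, history, steps):
    """Iterate the model forward, feeding each prediction back in as a lag.

    history holds the two latest observed values, oldest first.
    """
    prev2, prev1 = history
    predictions = []
    for k in range(steps):
        nxt = art_exports(start_year + k, prev1, prev2)
        predictions.append(nxt)
        prev2, prev1 = prev1, nxt
    return predictions


# Hypothetical starting values (MT), for illustration only.
path = forecast(2007, (300000.0, 310000.0), steps=5)

# Level at which the later leaf reproduces itself: y = c / (1 - b1 - b2).
long_run_level = 49164.421 / (1 - 0.254 - 0.600)
```

Because the two lag coefficients of the later leaf sum to less than 1, iterated forecasts from that leaf alone rise toward `long_run_level` (about 336,700 MT) rather than growing without bound. This follows from the coefficients in Table II and is not a result reported in the paper.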

Figure 13. Decision tree for the price of mid grown tea

Table V. ART model for comparable price fluctuations of mid grown tea

Region    AR model
All       Mid Grown Price = 0.132 + 1.093 * Mid Grown Price(-1) - 0.183 * Mid Grown Price(-2)

The decision tree for high grown tea produced three regions (see Fig. 14). The piecewise linear autoregressive model was calculated for the three regions and is presented in Table VI. This model revealed that the price of high grown tea is constant at present.

Figure 14. Decision tree for the price of high grown tea

Table VI. ART model for price of high grown tea

Region                                                  AR model
High Grown Price-3 >= 1.957                             High Grown Price = 1.651
High Grown Price-3 >= 1.957 and Month >= 5/19/1999      High Grown Price = 1.651
High Grown Price-3 >= 1.957 and Month < 5/19/1999       High Grown Price = 0.744 - 0.472 * High Grown Price(-3) + 0.121 * High Grown Price(-2) + 0.433 * High Grown Price(-1) + 0.715 * High Grown Price(-96)

IV. CONCLUSION

This paper discusses a study carried out with some data related to the Sri Lankan tea industry. Sri Lanka considers the export of tea a lucrative business. Understanding parameters such as production, costs and prices can assist growth, development and higher profitability in this sector.

As discussed, the strong correlation between production and exports leads us to believe that tea production is the main factor governing tea exports. Ninety-eight per cent of the variation in exports is explained by production. This is further supported by the fact that the auction price has little or no correlation with exports for all types of tea.

As shown in Fig. 5, mid and high grown tea production is considerably lower than that of low grown tea. Auction prices for tea start rising by the end of July, and from August to March prices remain comparatively high. Low grown tea also seems to fetch a higher average price than the other types. It would be interesting to determine the profitability of each type of tea (high, mid and low) taking the cost of production into account.
Considering monthly data, it is important to note that production is comparatively low in February, March, September and October (see Fig. 5 and Fig. 7). In 1994, 1998 and 2003 there were marked differences in prices for all three types of tea. Considering the auction prices for the different categories of tea from 1986 to 2006, the highest prices were reported in the period 1996-1997. One significant event in 1996 was Sri Lanka winning the World Cup in cricket. The marketing campaign and the publicity obtained by the country may have helped to improve the prices of tea. If a correlation between marketing and tea prices can be established, then considering different strategies to market tea could result in higher auction prices. This correlation, although possible, remains to be shown by future research.

The authors believe that further research in data analysis with respect to the tea industry can yield significant insights and knowledge, which can be used for better alignment and growth in a sector that contributes to the Sri Lankan economy. This paper demonstrates the possibility of applying techniques such as data mining to gain useful insights in sectors such as the tea industry. In future, we plan to investigate this area of research further in close collaboration with domain experts from the tea sector.

ACKNOWLEDGMENT

This research was inspired by the Sri Lanka Tea Board (SLTB), which also provided the necessary data. We would like to acknowledge Mr. Palitha Sarukkali, Statistician, SLTB, for his valuable guidance and support.

REFERENCES

[1] Annual report, 2005, Central Bank of Sri Lanka.
[2] Average monthly exchange rates, [cited Aug. 2007] [online], available from World Wide Web [http://www.cbsl.gov.lk/nfo/ _ce/er/e_1.asp].
[3] Central Bank of Sri Lanka: Official web page for Central Bank of Sri Lanka [cited Aug 2007] [online], available from [http://www.cbsl.gov.lk/].
[4] Ceylon tea board: Official web site for Ceylon tea board [online], [cited Aug 03 2007], available from World Wide Web [http://www.pureceylontea.com].
[5] Crone S.F., Lessmann S. and Stahlbock R., Utility based data mining for time series analysis: cost-sensitive learning for neural network predictors, Chicago, pp. 59-68, 2005.
[6] Department of Census and Statistics of Sri Lanka: Official web page for Census and Statistics Department of Sri Lanka [cited Aug 2007] [online], available from [http://www.statistics.gov.lk/].
[7] Department of Commerce: Official web page for Department of Commerce [cited Aug 2007] [online], available from [http://www.doc.gov.lk/].
[8] Dunham M.H., Data Mining Introductory and Advanced Topics, Pearson, 2005.
[9] Ester M., Kriegel H., and Schubert M., Web Site Mining: A new way to spot Competitors, Customers and suppliers in the World Wide Web, Proceedings of SIGKDD-2002, 2002.
[10] Faber V., Clustering and the Continuous k-means Algorithm, Los Alamos Science, Number 22, 1994.
[11] Grabmeier J., and Rudolph A., Techniques of Cluster Algorithms in Data Mining, Data Mining and Knowledge Discovery, Springer Netherlands, Vol. 6, No 4, pp 303-360, 2002.
[12] Hosking R. M. J., Edwin P. D. and Sudan M., A statistical perspective on data mining, Future Generation Computing Systems, special issue on Data Mining, 1997.
[13] Julisch K., and Dacier M., Mining Intrusion Detection Alarms for Actionable Knowledge, Proceedings of KDD-2002, 2002.

[14] Keim D.A., Information visualization and visual data mining, Proceedings of IEEE Transactions on Visualization and Computer Graphics, Vol. 8, pp 1-8, 2002.
[15] Kitts B., Freed D., and Kommers J., Cross sell: A Fast Promotion-Tunable Customer-Item Recommendation Method Based on Conditionally Independent Probabilities, Proceedings of KDD-2000, 2000.
[16] Ma Y., Liu B., Wong C. K., Yu P.S., Lee S. M., Targeting the Right Student Using Data Mining, Proceedings of KDD 2000, Boston, MA USA, 2000.
[17] Meek C., Chickering D.M., and Heckerman D., Autoregressive Tree Models for Time-Series Analysis, 2002.
[18] Rosset S., Murad U., Neumann E., Idan Y., and Pinkas G., Discovery of Fraud Rules for Telecommunications: Challenges and Solutions, Proceedings of KDD-99, 1999.
[19] SQL Server 2005 Data Mining information: [cited Aug 2007] [online], available from World Wide Web [http://www.sqlserverdatamining.com].
[20] SPSS: Official web page for SPSS: [cited Jan 2008] [online], available from [http://www.spss.com/spss/].
[21] SQL Server 2005: Official web page for SQL Server 2005 [cited Aug 2007] [online], available from [http://www.microsoft.com/sql/default.mspx].
[22] Sri Lanka Customs: Official web page for Sri Lanka Customs [cited Aug 2007] [online], available from [http://www.customs.gov.lk/].
[23] The story of the Ceylon tea [cited Aug. 2007] [online], available from World Wide Web [http://www.angelfire.com/w/srlanka/ ceyl_tea.htm].
[24] Tissera W.M.R., Athauda R. I., and Fernando H.C., Discovery of Strongly Related Subjects in the Undergraduate Syllabi using Data Mining, Proceeding of ICIA 2006, 2006.
[25] Tong H., Threshold models in Nonlinear Time Series Analysis, Springer-Verlag, New York, 1983.
[26] Zhao H. T. and Maclennan J., Data Mining with SQL Server 2005, Wiley Publishing Inc. USA., 2005.