A Data Mining-Based OLAP Aggregation of Complex Data: Application on XML Documents


Running head: A DATA MINING-BASED OLAP AGGREGATION

Riadh Ben Messaoud, Omar Boussaid, Sabine Loudcher Rabaséda
{rbenmessaoud omar.boussaid sabine.loudcher}@univ-lyon2.fr
Laboratory ERIC - University of Lyon 2
5 avenue Pierre Mendès-France, 69676 Bron Cedex, France

ABSTRACT

Nowadays, most organizations deal with complex data having different formats and coming from different sources. The XML formalism is evolving and becoming a promising solution for modelling and warehousing these data in decision support systems. Nevertheless, classical OLAP tools are still not capable of analyzing such data. In this paper, we associate OLAP and data mining to support advanced analysis of complex data. We provide a generalized OLAP operator, called OpAC, based on Agglomerative Hierarchical Clustering (AHC). OpAC is adapted to all types of data since it deals with data cubes modelled within XML. Our operator builds significant aggregates of facts expressing semantic similarities. Evaluation criteria for partitions of aggregates are proposed in order to assist the choice of the best partition. Furthermore, we developed a Web application for our operator. We also provide performance experiments and conduct a case study on XML documents dealing with the breast cancer research domain.

Keywords: OLAP; data warehouse; data mining; aggregation; agglomerative hierarchical clustering; evaluation of aggregates; XML documents

INTRODUCTION

Data warehouses were introduced to provide support for making decisions from huge amounts of data. A data warehouse is an analysis-oriented structure that stores a large collection of subject-oriented, integrated, time-variant and non-volatile data (Kimball, 1996; Inmon, 1996). Online analytical processing (OLAP) is a key feature supported by most data warehouse systems. Based on visualization techniques (Maniatis et al., 2005), OLAP tools enable exploration and navigation into multidimensional data views, commonly called data cubes, in order to present interesting information to end users and decision makers. A data cube is a multidimensional data model used to conceptualize data in a data warehouse (Chaudhuri & Dayal, 1997). The data cube contains facts or cells that are measures or values based on a set of dimensions, where each dimension consists of a set of categorical descriptors, called attributes, and may be organized within hierarchical structures. Consider for example a retail sales application where the dimensions of interest may include Customer, Product, Location, and Time. If the measure of interest in this application is sales amount, then an OLAP fact represents the sales measure corresponding to the previous dimensions according to a single attribute in each dimension.

Dimensions often form a hierarchy. For instance, the Time dimension may form a day-month-year hierarchy, and the Location dimension may form a city-state-region hierarchy. Dimensions allow different levels of granularity in the warehouse. For example, a region corresponds to a high level of granularity whereas a city corresponds to a low level of granularity. Classical aggregation in OLAP is considered the process of consolidating data values into a single summarized one by moving from a hierarchical level of a dimension to a higher one. Typically, additive data are well suited to be aggregated by elementary operations (Sum, Average, Max, Min and Count) in a simple computation of

measures. For example, a user wants to observe the sum of sales amount of products according to years and regions. This aggregation should use attributes to describe the targeted facts and compute over their measures.

In recent years, as more organizations see the web as an integral part of their communication and business, we have been dealing with a proliferation of new data formats. These data are complex, quite different from classical ones, and harder to treat. They need new methodologies to be warehoused first, and then to be analyzed. XML (eXtensible Markup Language) provides some promising solutions for integrating complex information from different sources and warehousing it. Many recent works have proposed modelling approaches for XML data warehouses (Golfarelli, Rizzi & Vrdoljak, 2001; Trujillo, Mora & Song, 2004; Pokorný, 2001; Baril & Bellahsène, 2000; Hümmer, Bauer & Harde, 2003; Rusu, Rahayu & Taniar, 2005; Nassis, Rajagopalapillai, Dillon & Rahayu, 2005). The general purpose of these approaches is to design or to feed warehouses through the XML formalism. For instance, Golfarelli et al. (2001) affirm that the use of XML will become a standard for warehousing heterogeneous and complex data in the next few years.

This evolution in the way of warehousing complex data has some drawbacks for modelling and analysis tasks. In fact, classical OLAP tools are unsuitable for complex data. For example, when treating images, sounds, videos, texts or even XML documents, aggregating information with classical OLAP does not make sense. Indeed, we are not able to compute a sum or an average over such kinds of data. However, when users analyze complex data, they need more expressive aggregates than those created from elementary computation of additive measures. We think that OLAP facts representing complex objects need appropriate tools and new ways of aggregation if we wish to analyze them.
To summarize information about complex data, we should rather gather their similar facts into a single group and separate dissimilar facts into different groups.
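As a point of comparison, the classical roll-up aggregation described above (summing the sales amount by year and by a higher level of the Location dimension) can be sketched in a few lines of Python. The fact rows, the city-to-state mapping, and the name `roll_up_sum` are hypothetical and serve only as an illustration; they are not taken from any dataset used in this paper.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, day, sales_amount).
FACTS = [
    ("Los Angeles", "2004-01-15", 120.0),
    ("Oakland", "2004-03-02", 80.0),
    ("Los Angeles", "2005-07-21", 200.0),
    ("San Diego", "2005-11-30", 50.0),
]

# Hypothetical hierarchy of the Location dimension: city level -> state level.
CITY_TO_STATE = {
    "Los Angeles": "California",
    "Oakland": "California",
    "San Diego": "California",
}


def roll_up_sum(rows):
    """Roll up from (city, day) to (state, year) and sum the measure."""
    totals = defaultdict(float)
    for city, day, amount in rows:
        state = CITY_TO_STATE[city]  # move one level up in Location
        year = day[:4]               # move from day to year in Time
        totals[(state, year)] += amount
    return dict(totals)


result = roll_up_sum(FACTS)
```

The grouping here is fixed entirely by the hierarchy mapping; the measure values play no role in deciding which facts end up together, which is the limitation the rest of the paper addresses.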

In this case, it is necessary to consider an aggregation that computes over both descriptors and measures. Instead of grouping facts only by computing their measures, we also take their descriptors into account to obtain aggregates expressing semantic similarities. In order to do so, we intend to couple OLAP with data mining to create a new type of online aggregation of complex data.

OLAP and data mining can be viewed as two complementary fields. Associating them can be a solution to cope with their respective defects. On the one hand, when supported by database systems, OLAP has a powerful ability to organize views and structure data adapted to analysis, but it is restricted to simple navigation and exploration of data, which weakens its analysis power. On the other hand, data mining is not very powerful for organizing data, but it is known for its descriptive and predictive power, which can discover knowledge from both simple and complex data. The general issue of coupling data mining with database systems was already discussed and motivated by Imielinski and Mannila (1996). The authors argue that data mining sets new challenges to database technology. Their combination will lead to a second generation of database systems able to manage KDD (Knowledge Discovery in Databases) applications just as classical ones manage business applications. Furthermore, a data cube structure can provide a suitable context for applying data mining methods. More generally, the association of OLAP and data mining allows elaborated analysis tasks exceeding the simple exploration of a data cube. Our idea is to take advantage of OLAP as well as data mining techniques and to integrate them into the same analysis framework in order to analyze complex objects.

Although OLAP and data mining were considered two separate fields for a long time, several recent works proved the capability of their association to provide interesting analysis processes. In addition to these works, we have already proposed in (Messaoud, Boussaid & Rabaséda, 2004) a new OLAP

operator, called OpAC (Operator for Aggregation by Clustering), that combines OLAP with an automatic clustering technique. We use the Agglomerative Hierarchical Clustering (AHC) as an aggregation strategy for complex data. We proved the interest of this new operator and its efficiency in creating semantic aggregates over an image data cube. More generally, the aggregates provided by OpAC give interesting knowledge about the analyzed domain.

In this paper, we propose a generalization of our operator which makes it possible to deal with all types of data by handling a data cube modeled and fed directly by XML sources. In fact, since XML is able to represent and structure complex objects collected from different sources and having different formats (Darmont, Boussaid, Bentayeb, Rabaséda & Zellouf, 2003), adapting OpAC to XML will lead to a considerable generalization of its analysis capability. In order to validate this generalization on a real world domain, we base our current study on screening mammography data taken from breast cancer research. We have structured these data as XML documents and have modeled them as a multidimensional data cube. Furthermore, we also propose some evaluation criteria that support the results of our operator. These criteria aim at assisting the user and helping him/her to choose the best partition of aggregates that will fit well with his/her analysis requirements.

This paper is organized as follows. In the second section, we present the state of the art of works that combine OLAP and data mining. In the third section, we present an overview of our approach. We also introduce the general context, the XML screening mammography data cube, and the objectives of our operator. In the fourth section, we develop a formal background of our approach. The fifth section presents the criteria we propose to evaluate the results of our approach. In the sixth section, we describe the architecture of a Web platform, called MiningCubes, which we have developed to validate our generalized approach. 
We also carry out some experiments concerning the performance and the processing time of this Web application. In the seventh section, we

propose a case study on the XML documents that represent a screening mammography data cube. Finally, in the eighth section, we draw conclusions from this work and propose some future research directions.

RELATED WORK TO COUPLING OLAP AND DATA MINING

The major difficulty of combining OLAP and data mining is that traditional data mining algorithms are mostly designed for tabular datasets organized in individuals-variables form (Fayyad, Shapiro, Smyth & Uthurusamy, 1996). Therefore, multidimensional data are not suited for these algorithms. Nevertheless, many previous works motivated and proved the interest of coupling OLAP with data mining methods. We distinguish three major approaches in this field.

The first approach tries to extend the query language of decision support systems in order to achieve data mining tasks. The DBMiner system, proposed by Han (1998), summarizes this approach. Some extended OLAP operators perform data mining methods such as association, classification, prediction, clustering and sequencing. Han defines OLAP Mining as a new concept that integrates OLAP technology with data mining techniques and allows analysis to be performed on different portions and levels of abstraction of a data cube. He also introduces OLAM (On-Line Analytical Mining) as a process of extracting knowledge from multidimensional databases. He expects that, in the future, OLAM will be a natural addition to OLAP technology that enhances the power of multidimensional data analysis. Chen, Dayal and Hsu (2000) discover behavior patterns by mining association rules about customers from transactional e-commerce data. They extend OLAP functions and use a distributed OLAP server with a data mining infrastructure, and the resulting association rules

are represented in particular cubes called Association Rule Cubes. Goil and Choudhary (1998) think that dimension hierarchies can be used to provide interesting information at multiple concept levels. Their approach summarizes information in a data cube, extends OLAP operators and mines association rules. Some other works consist in integrating mining functions in the database system using SQL. Chaudhuri (1998) argues that data mining promises a giant leap over OLAP. He proposes a data mining system based on extending SQL and constructs data mining methods over relational databases. Chaudhuri, Fayyad and Bernhardt (1997) developed a client-server middleware that performs a decision tree classifier over MS SQL Server 7.0. Meo, Psaila and Ceri (1996) propose a model that enables a uniform description of the problem of discovering association rules. The model also extends SQL and provides an operator called MINE RULE.

The second approach consists in adapting multidimensional data inside or outside the database system and applying classical data mining algorithms on the resulting datasets. This approach can be viewed according to two strategies. The first one consists in taking advantage of a multidimensional database management system (MDBMS) in order to help the construction of learning models. In (Laurent, Bouchon-Meunier, Doucet, Gançarski & Marsala, 2000), the authors propose a cooperation between Oracle Express and a fuzzy decision tree software (Salammbô). This cooperation allows transferring learning tasks, storage constraints and data handling to the MDBMS. The second strategy transforms multidimensional data and makes them usable by data mining methods. For instance, Pinto et al. (2001) integrate multidimensional information in data sequences and apply the discovery of frequent patterns to them. In order to apply decision trees on multidimensional data, Goil and Choudhary (2001) flatten data cubes and extract a contingency matrix for each dimension at each construction step of the tree. Chen, Zhu and Chen (2001) think that OLAP should be

adopted as a pre-processing step in the knowledge discovery process. In the same context, Maedche, Hotho and Wiese (2000) combine databases with classical data mining systems by using an OLAP engine as an interface to process telecommunication data. In this interface, OLAP tools create a target data set to generate new hypotheses by applying data mining methods. Tjioe and Taniar (2005) propose a method for mining association rules in data warehouses. Based on the multidimensional data organization, this method is capable of extracting associations from multiple dimensions at multiple levels of abstraction by focusing on measurements of summarized data. In order to do this, the authors propose to prepare multidimensional data for the mining process according to four algorithms: VAvg, HAvg, WMAvg, and ModusFilter. These algorithms prune all rows in the fact table which have less than the average quantity and provide an initialized table. The latter table is then used for mining both non-hybrid (non-repeatable predicate) and hybrid (repeatable predicate) association rules. Fu (2005) proposes an algorithm, called CubeDT, for constructing decision tree classifiers based on data cubes. This algorithm works on statistic trees, which are representations of multidimensional data especially suitable for the construction of decision trees.

The third approach is rather based on adapting data mining methods and applying them directly on multidimensional data. Palpanas (2000) thinks that adapting data mining algorithms is an interesting solution to provide elaborated analysis and valuable knowledge. Parsaye (1997) claims that decision-support applications must consider data mining within multiple dimensions. He proposes a theoretical OLAP Data Mining System that integrates a multidimensional discovery engine in order to perform discovery along multiple dimensions. Sarawagi, Agrawal and Megiddo (1998) propose to integrate a multidimensional regression module, called Discovery-driven, in OLAP servers. This module guides the

user to detect relevant areas at various hierarchical levels of a cube. In (Sarawagi, 2001), the author proposes another tool called Diff. It detects both relevant areas in a data cube and the reasons for their presence. The same approach was adopted by Favero and Robin (2001) to generate quantitative analysis reports from data cubes. They integrate in a platform, called HYSSOP, a content determination component based on data mining methods. Imielinski, Khachiyan and Abdulghani (2002) propose a generalized version of association rules called Cubegrades. The authors claim that association rules can be viewed as the change of an aggregate's measure due to a change in the cube's structure. They also introduce the CGQL language for querying the Cubegrades. Dong, Han, Lam, Pei and Wang (2001) enhanced the Cubegrades and introduced constrained gradient analysis. Their proposition focuses on extracting pairs of cube cells that are quite different in aggregates and similar in dimensions. Instead of dealing with the whole cube, constraints on significance, probability, and gradient are added to limit the search range.

These previous works have proved that associating data mining with OLAP is a promising way to carry out elaborated analysis tasks. They affirm that data mining methods are able to extend OLAP analysis power. In addition to these works, we have proposed in (Messaoud et al., 2004) another contribution to this field by developing an Operator for Aggregation by Clustering called OpAC. Besides enhancing classical OLAP with a clustering method, this operator also couples OLAP and data mining in order to deal with complex data in a multidimensional context. We have shown in (Messaoud et al., 2004) the interest of applying our approach on a cube of image files, and we have proven the semantic significance of its aggregates of facts. In this paper, we propose to generalize our operator, to adapt it in order to handle XML data cubes, and to apply it to the breast cancer domain.

OVERVIEW AND OBJECTIVES OF OUR APPROACH

Nowadays, in almost any area of scientific research or business application domain, there is an increasing availability of data. These data are growing not only in size, but also in complexity. Data have different types, come from heterogeneous sources, and are supported by different formats. Analyzing and extracting features from these data is therefore a complex task. To learn from these data, we need analysis tools that can make sense of them.

OLAP is a powerful means of exploring and extracting pertinent information from data through multidimensional analysis. In this context, data are organized in multidimensional views, commonly called data cubes. The construction of a data cube targets a precise analysis context and describes real world facts. For instance, these facts can be viewed according to several dimensions such as Customer, Product, Location, and Time. The choice of these dimensions closely depends on the user and the way (s)he would like to analyze the facts. In addition to dimensions, an OLAP fact is also evaluated by a set of quantitative measures such as revenue, profitability, and customer retention. By organizing information into dimensions and measures, OLAP allows us to follow trends in a customer realm, spot anomalies across products, and compare annual sales in a region by product line or customer type.

Furthermore, a dimension is usually organized according to several hierarchies defining various levels of data granularity. Each hierarchical level contains a set of attributes (also called members), and each attribute may conceptually include other attributes from the hierarchical level immediately below. For example, as the Location dimension may form the hierarchy city-state-region, the attribute California from the state level could include Los Angeles, Long Beach, Oakland, San Diego, and Santa Monica as attributes from the city level. Therefore,

by moving from a hierarchical level to a higher one, attributes are gathered together into aggregates. In consequence, measures related to the attributes are computed and so information is summarized into a small number of sets.

In many application domains, a user sometimes has to make critical decisions. Analysis tools should be efficient. For instance, aggregated measures need to reflect significant values of a set of facts sharing a relation deeper than a simple order of membership. In the medical domain, experts need to see aggregates of objects, like tumors or any other pathology, that have a maximum number of common medical properties. For example, in the breast cancer research field, associating malignant and benign patients in the same aggregate can have dramatic consequences. In recent years, clinical data were widely treated by data mining techniques in medical outcome analysis (Chen & Lu, 2005; Hu et al., 2005). In fact, medicine is one of the most important application domains where a lot of effort is needed for structuring and analysing data in order to support sound medical research. We therefore base our study on an XML data cube which describes suspicious regions of tumors detected on mammography screens. We constructed this cube from the Digital Database for Screening Mammography (DDSM¹). In the following, we present the DDSM and the XML data cube of the screening mammography data.

Presentation of the DDSM

The DDSM is basically a resource used by the mammographic image analysis research community in order to facilitate sound research in the development of analysis and learning algorithms (Heath, Bowyer, Kopans, Moore & Jr, 2000). The database contains approximately studies, where each study corresponds to a patient case.

Figure 1. An example of a patient case study from the DDSM

A patient case is a collection of image and text files containing medical information collected during a screening mammography exam. The DDSM contains four types of patient cases: Normal, Benign without callback, Benign, and Cancer. Normal cases are mammograms from screening exams that were read as normal and had a normal screening exam. Benign without callback cases are exams that had an abnormality that was noteworthy but did not require the patient to be recalled for any additional workup. In Benign cases, something suspicious was found and the patient was recalled for some additional workup that resulted in a benign finding. The Cancer type corresponds to cases in which a proven cancer was found.

As Figure 1 shows, a case consists of a set of text and image files. There are an ics file (ASCII format) which describes general information about a patient, four LJPEG scanner files (images compressed with lossless JPEG encoding), and zero to four OVERLAY files. Only cases having suspicious regions in their scanner images are associated with overlay files. Normal cases are not. An overlay file contains information about the location, the

subtlety value, and a spatial description of the marked suspicious regions. This information is specified by an expert mammography radiologist.

The XML Cube of the Screening Mammography Data

Since a patient study is composed of several data formats and presented on heterogeneous supports, we consider it a complex object. To warehouse and analyze such complex objects, we first need to structure them and make them as homogeneous as possible. In order to do so, we use XML to represent these complex screening mammography data and model them in a data cube.

Basically, XML is considered a standard syntax for the exchange of semistructured data. The structure of XML, composed of nested custom-defined tags, can describe the meaning of the content itself. XML documents can also be associated with and validated against either a Document Type Definition (DTD) or an XML Schema. Both of them allow describing the structure of an XML document and constraining its content. Many works have addressed methodologies based on XML for the multidimensional design of data warehouses in order to integrate information from different sources (Golfarelli et al., 2001; Trujillo et al., 2004; Pokorný, 2001; Baril & Bellahsène, 2000; Hümmer et al., 2003; Rusu et al., 2005; Nassis et al., 2005). Since a large amount of complex data is needed in a decision making process, the importance of integrating XML in data warehousing environments is becoming increasingly high. According to Golfarelli et al. (2001), using XML sources for designing and feeding data warehouse systems will become a standard in the next few years. Furthermore, as XML sources are becoming widely employed, we naturally expect important evolutions of query languages to extract knowledge from them for decision support (Termier, Rousset & Sebag, 2002; Braga, Campi, Ceri, Klemettinen & Lanzi, 2003; Feng & Dillon, 2005).

In the case of the screening mammography data, an OLAP fact corresponds to a suspicious region (abnormality) detected by an expert. The set of collected facts concerns only Benign, Benign without callback, and Cancer patient cases. Normal cases are not concerned since they do not contain suspicious regions. As the conceptual model in Figure 2 shows, a suspicious region can be analyzed according to several axes: the lesion type, the assessment code, the subtlety, the pathology, the date of study, the digitizer, the patient age, etc. A suspicious region is measured by the length of its boundary. We have also added the number of regions having the same abnormality per patient as a derived measure to the data cube model.

Figure 2. Conceptual model of the screening mammography data cube

The conceptual model of the screening mammography data cube is described with an XML Schema. An instance of this XML Schema is presented by the XML document of Figure 3. The fact is associated with the root element of the XML Schema, whereas its dimensions correspond to sub-elements. The measures of a fact are attributes in the root element, and the attribute value of each dimension is an attribute in the element corresponding

to that dimension. The screening mammography data cube contains a collection of XML documents, where each document corresponds to an OLAP fact.

Figure 3. Example of an XML document from the screening mammography data cube

Objectives of our Approach

In the OLAP context, the hierarchical structure of a dimension induces sets of attributes organized according to the logical order of membership. Through a dimension, a classical OLAP aggregation computes measures of facts and gathers these facts into groups according to the hierarchical order of their attributes in that dimension. For example, in the screening mammography data cube, according to the age class of patients, we can build aggregates of suspicious regions such as those of Figure 4. In this example, we can note that, in a single aggregate, detected regions do not have relevant common medical properties. They have different forms and lengths of boundaries. We also note that regions of a single aggregate can

have different types of lesion. Some of them can represent benign tumors while others are cancers. For example, according to expert annotations, suspected regions (c), (e) and (g) of 40 to 49 years old patients represent cancer tumors whereas the rest of the regions are benign. In the aggregate of 50 to 59 years old patients, an expert declares that only regions (b) and (c) are cancers. The classical aggregation presented above is fully established in the conceptual step of the data cube. Therefore, it does not provide breast cancer experts with significant relations between suspicious regions.

Figure 4. Example of classical OLAP aggregation

We wish to build aggregates of objects having similar medical properties. In the case of the screening mammography data cube, we would like to construct more homogeneous aggregates of suspected regions of tumors. These aggregates should reflect relations between objects and help experts to extract knowledge from their common properties.

The main idea of our operator OpAC is to exploit the cube's facts describing complex objects in order to provide a more significant aggregation over them. In order to do so, we use a clustering method and automatically highlight aggregates semantically richer than those provided by the current OLAP operators. So the clustering method provides a new OLAP aggregation concept. This aggregation provides hierarchical groups of objects summarizing information and enables navigation through levels of these groups. Existing OLAP tools, like

the Slicing operator, can create new restricted aggregates in a cube dimension, too. However, these tools always need manual assistance, whereas our operator is based on a clustering algorithm that automatically provides relevant aggregates. Furthermore, with classical OLAP tools, aggregates are created in an intuitive way in order to compare some measure values, whereas OpAC creates significant aggregates that express deep relations with the cube's measures. Thus, the construction of such aggregates is interesting to establish a more elaborated online analysis context.

According to the above objectives, we choose the AHC as an aggregation method. Our choice is motivated by the fact that the hierarchical aspect constitutes a relevant analogy between AHC results and hierarchical structures of dimensions. The objectives and the results expected for OpAC match the AHC strategy well. Furthermore, AHC adopts an agglomerative strategy that starts from the finest partition, where each individual is considered a cluster. Therefore, OpAC results include the finest attributes of a dimension. Moreover, AHC is compatible with the exploratory aspect of OLAP. Its results can also be reused by classical OLAP operators. In fact, AHC provides several hierarchical partitions. By moving from a partition level to a higher one, two aggregates are joined together. Conversely, by moving from a partition level to a lower one, an aggregate is divided into two new ones. These operations are strongly similar to the classical operators Roll-up and Drill-down. AHC is a well suited clustering method to summarize information into OLAP aggregates from complex facts.
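The analogy between partition levels and the Roll-up and Drill-down operators can be illustrated with a minimal Python sketch. It assumes that a merge history has already been produced by some agglomerative procedure; the helper name `partition_at`, the four individuals and the merge order are hypothetical and are not part of OpAC itself.

```python
def partition_at(individuals, merges, n_clusters):
    """Replay an agglomerative merge history until n_clusters groups remain.

    `merges` is the ordered list of (cluster_a, cluster_b) pairs that were
    joined, each cluster being a frozenset of individuals.
    """
    clusters = [frozenset([x]) for x in individuals]  # finest partition
    for a, b in merges:
        if len(clusters) <= n_clusters:
            break
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)  # moving one level up joins two aggregates
    return set(clusters)


# Hypothetical hierarchy over four individuals.
INDIVIDUALS = ["a", "b", "c", "d"]
MERGES = [
    (frozenset("a"), frozenset("b")),
    (frozenset("c"), frozenset("d")),
    (frozenset("ab"), frozenset("cd")),
]

level_3 = partition_at(INDIVIDUALS, MERGES, 3)    # {a, b}, {c}, {d}
rolled_up = partition_at(INDIVIDUALS, MERGES, 2)  # {a, b}, {c, d}
```

Going from `level_3` to `rolled_up` joins the aggregates `{c}` and `{d}`, which mirrors a Roll-up; reading the same pair in the opposite direction splits `{c, d}` into two aggregates, which mirrors a Drill-down.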

FORMAL BACKGROUND OF OUR APPROACH

Individuals and Variables of the Clustering Algorithm

This formalization defines the domains of individuals and variables of the clustering problem. Note that these domains are extracted from a multidimensional environment. Thus, we should respect some constraints to ensure the statistical and logical validity of the extracted data. Let $\Omega$ be the set of individuals, and $\Sigma$ be the set of variables. We also assume that:

- $C$ is a data cube having $d$ dimensions and $m$ measures. According to Figure 2, the XML screening mammography data cube consists of nine dimensions and two measures; in this case $d = 9$ and $m = 2$;
- $D_1, \ldots, D_i, \ldots, D_d$ are the dimensions of $C$. For example, in Figure 2, the Subtlety dimension corresponds to $D_3$;
- $M_1, \ldots, M_q, \ldots, M_m$ are the measures of $C$. For example, in Figure 2, Region length corresponds to $M_1$, and Boundary length corresponds to $M_2$;
- $\forall i \in \{1, \ldots, d\}$, the dimension $D_i$ contains $n_i$ hierarchical levels. For instance, the Patient dimension ($D_8$) of Figure 2 is composed of two hierarchical levels. So, we note $n_8 = 2$;
- $h_{ij}$ is the $j$-th hierarchical level of $D_i$, where $j \in \{1, \ldots, n_i\}$;
- $\forall j \in \{1, \ldots, n_i\}$, the hierarchical level $h_{ij}$ contains $l_{ij}$ attributes (or members);
- $g_{ijt}$ is the $t$-th attribute of $h_{ij}$, where $t \in \{1, \ldots, l_{ij}\}$;
- $G(h_{ij})$ is the set of attributes of $h_{ij}$.

Let us suppose that we intend to aggregate attributes from level $h_{ij}$. So the user may choose the dimension $D_i$, the hierarchical level $h_{ij}$ in $D_i$, and even select individuals in $G(h_{ij})$. We assume that selected attributes are elements of $\Omega$. Therefore, we define the set of individuals as follows:

$$\Omega \subseteq G(h_{ij}) = \{ g_{ij1}, \ldots, g_{ijt}, \ldots, g_{ijl_{ij}} \} \qquad (1)$$

Now, we adopt the following notations:

- $*$ is a meta-symbol indicating the total aggregate of a dimension;
- $\forall q \in \{1, \ldots, m\}$, we define the measure $M_q$ as the function $M_q : G \rightarrow \mathbb{R}$.

As Formula (2) shows, $G$ is the set of $d$-tuples of all the hierarchical levels' attributes of the cube $C$, including the total aggregates of dimensions:

$$G = \prod_{i=1}^{d} \Big( \bigcup_{j \in \{1, \ldots, n_i\}} G(h_{ij}) \cup \{*\} \Big) = \Big( \bigcup_{j \in \{1, \ldots, n_1\}} G(h_{1j}) \cup \{*\} \Big) \times \cdots \times \Big( \bigcup_{j \in \{1, \ldots, n_i\}} G(h_{ij}) \cup \{*\} \Big) \times \cdots \times \Big( \bigcup_{j \in \{1, \ldots, n_d\}} G(h_{dj}) \cup \{*\} \Big) \qquad (2)$$

For example, for the data cube of Figure 2, by using the above notations, we can say that:

- $M_1(\text{calcification}, 2, *, \ldots, *)$ points out the aggregated value of the length of all suspicious regions having calcification as lesion type and 2 as subtlety code;
- $M_2(*, \ldots, *, \text{Patient between 50 and 59 years old}, \text{lumisys laser})$ points out the number of suspicious regions of patients between 50 and 59 years old, scanned by a lumisys laser digitizer.

Recall that the objective of OpAC is to establish a semantic aggregation via a clustering technique on real data cube facts. In order to do so, we adopt the cube measures as

quantitative variables describing the individuals of $\Omega$. However, it is necessary to satisfy two fundamental constraints on variables:

- First constraint. Hierarchical levels belonging to the dimension $D_i$ which is retained for the individuals cannot generate variables. In fact, describing an individual by a property which contains it does not make logical sense. Conversely, a variable which specifies a property of an individual would only describe this one;
- Second constraint. In a dimension, only one hierarchical level should be selected to generate variables. This constraint ensures the independence of variables. In fact, a value taken by an attribute from a hierarchical level can be calculated from the values of attributes belonging to the lower level.

Once $\Omega$ is selected, we formulate the possible extracted set of variables $\Sigma$ as defined in Formula (3):

$$\Sigma \subseteq \Big\{ V \;/\; \forall t \in \{1, \ldots, l_{ij}\},\; V(g_{ijt}) = M_q(*, \ldots, *, g_{ijt}, *, \ldots, *, g_{srv}, *, \ldots, *) \Big\} \qquad (3)$$

with $s \neq i$, $r$ unique for each $s$, $v \in \{1, \ldots, l_{sr}\}$, and $q \in \{1, \ldots, m\}$.

A user can define the set of variables by selecting dimensions $D_s$, hierarchical levels $h_{sr}$, and measures $M_q$. In order to achieve precise analysis tasks, a user may also select precise attributes $g_{srv}$ in $h_{sr}$. The selection of $g_{srv}$ depends naturally on the objectives carried out by the user.
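To make the extraction of individuals and variables more concrete, the following Python sketch builds an individuals-variables table from a small list of facts. It is only an illustration under simplifying assumptions: each dimension is flattened to a single hierarchical level (so the second constraint holds by construction), the measure is aggregated by a sum, and the fact values, dimension names and the function `build_table` are all hypothetical.

```python
def build_table(facts, individual_dim, variable_dims, measure, aggregate=sum):
    """Return (individuals, variables, table) for a flat list of facts.

    Each variable is a pair (dimension, attribute); its value for an
    individual g is the aggregated measure over the facts matching both g
    and that attribute, all other dimensions being totally aggregated.
    """
    # First constraint: the dimension retained for the individuals
    # cannot generate variables.
    if individual_dim in variable_dims:
        raise ValueError("the individuals' dimension cannot generate variables")

    individuals = sorted({f[individual_dim] for f in facts})
    variables = sorted({(dim, f[dim]) for f in facts for dim in variable_dims})

    table = {}
    for g in individuals:
        row = []
        for dim, value in variables:
            cells = [f[measure] for f in facts
                     if f[individual_dim] == g and f[dim] == value]
            row.append(aggregate(cells))
        table[g] = row
    return individuals, variables, table


# Hypothetical facts: one dict per suspicious region.
FACTS = [
    {"lesion": "calcification", "subtlety": 2, "pathology": "benign", "boundary_length": 100},
    {"lesion": "calcification", "subtlety": 3, "pathology": "cancer", "boundary_length": 250},
    {"lesion": "mass", "subtlety": 2, "pathology": "cancer", "boundary_length": 400},
    {"lesion": "mass", "subtlety": 2, "pathology": "benign", "boundary_length": 150},
]

individuals, variables, table = build_table(
    FACTS, "lesion", ["subtlety", "pathology"], "boundary_length"
)
```

In this sketch the lesion types play the role of the individuals, and each column of `table` corresponds to one variable in the sense of Formula (3), for instance the summed boundary length of the regions with subtlety 2.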

The Agglomerative Hierarchical Clustering Algorithm

Once individuals and variables are selected, we can run the AHC algorithm. We note $X$ the individuals-variables table. $X$ is an $(n, p)$ matrix. Its rows represent individuals of $\Omega$, and its columns represent variables of $\Sigma$. We suppose that $n$ is the number of individuals, and $p$ is the number of variables. Dissimilarities between all pairs of individuals are pre-computed. Thus an $(n, n)$ dissimilarity matrix $S$ is constructed. The dissimilarity of two individuals is computed according to a distance function. Many distances can be used, such as the Euclidean distance. The general term of $S$ is $s_{ij}$, which corresponds to the distance between the individuals $i$ and $j$. The greater $s_{ij}$ is, the less similar individuals $i$ and $j$ are. We sum up the AHC algorithm by the following steps:

Step 1. The $n$ individuals of $X$ are assigned to $n$ distinct clusters indexed by $\{A_1, A_2, \dots, A_n\}$;

Step 2. Two distinct clusters $A_i$ and $A_j$ are picked such that their dissimilarity measure is the smallest;

Step 3. The two clusters $A_i$ and $A_j$ are merged into a new cluster $A_{n+1}$. At each step two clusters are merged to form a new cluster. Therefore, the number of clusters is reduced by one;

Step 4. Steps 2 and 3 are repeated until the number of obtained clusters is reduced to a required number $n_c$, or the smallest dissimilarity value between clusters reaches a given threshold.
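The four steps above can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not the MiningCubes implementation: the function name `ahc` and its parameters are hypothetical, the Euclidean distance is used, and the dissimilarity between two clusters is taken as the smallest pairwise distance (nearest neighbor); passing `linkage=max` would give the furthest neighbor variant.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ahc(X, n_clusters=1, linkage=min):
    """Naive agglomerative hierarchical clustering over the rows of X.

    Returns (clusters, merges): clusters is a list of lists of row
    indices, merges records (cluster_a, cluster_b, dissimilarity) for
    each iteration.
    """
    n = len(X)
    # Pre-compute the (n, n) dissimilarity matrix S.
    S = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Step 1: every individual starts in its own cluster.
    clusters = [[i] for i in range(n)]
    merges = []
    # Step 4: repeat steps 2 and 3 until n_clusters remain.
    while len(clusters) > max(n_clusters, 1):
        best = None
        # Step 2: find the pair of clusters with the smallest dissimilarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(S[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Step 3: merge the two clusters; the cluster count drops by one.
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


X = [[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]]
clusters, merges = ahc(X, n_clusters=3)
```

The sketch only stops on the required number of clusters; a threshold on the merge dissimilarity could be added as a second stopping condition inside the loop.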

In the specific context of our operator OpAC, it is up to the user to choose the number $n_c$ of clusters he or she wants to see at the end of the AHC algorithm. Otherwise, by default, the AHC algorithm stops when it reaches a single cluster.

EVALUATION OF AGGREGATES

Recall that we propose to use AHC as an aggregation operator over the attributes of a cube dimension. For $n$ individuals to classify, the AHC generates $n$ hierarchical partitions. Like almost all unsupervised mining methods, the main defect of AHC is that it does not provide any built-in evaluation of its results. In particular, we do not have any indicator about the provided partitions of clusters. Therefore, it is quite tedious to choose the partition best suited to the analysis objectives. Furthermore, the choice of the best partition is more difficult when we deal with a great number $n$ of individuals. Usually, it is the expert who decides on the number of aggregates that corresponds both to the context and to the goal of the analysis.

In the data mining literature, many efforts have provided a set of statistical measures for cluster quality evaluation. We emphasize that in our current study, the terms cluster and class refer to an OLAP aggregate provided by our operator. Note that unsupervised clustering methods lack a universal criterion of cluster quality. Any measure of cluster quality in this field closely depends on the way it is computed. It also depends on the orientations of the user's analysis (Lamirel, François, Shehabi & Hoffmann, 2004). Hence, for our operator, we propose to use more than one quality criterion. The comparison of many criteria seems mandatory in order to study the quality of the resulting aggregates and to decide on the best partition according to the user's requirements.

In the following, we present the intra and inter-clusters inertias (Lebart, Morineau & Fénelon, 1982) and the Ward's method (Ward, 1963) that we used as criteria to measure the quality of aggregates obtained by OpAC. In addition to these two criteria, we also propose a

new criterion based on the separability of classes (Zighed, Lallich & Muhlenbach, 2002). In order to formulate these criteria, we assume the following notations:

$\Omega = \{\omega_1, \omega_2, \dots, \omega_n\}$ is the set of individuals to cluster;

each individual $\omega_i$ takes the weight $P(\omega_i)$, and it is described by $p$ numerical variables $V_1, V_2, \dots, V_p$;

let $k \in \{0, \dots, n-1\}$ be the index of AHC iterations (or partitions). $k = 0$ corresponds to the initial AHC partition where each individual represents a single cluster. In general, an iteration $k$ corresponds to a partition with $n-k$ clusters;

in iteration $k$, clusters $A_i$ and $A_j$ are merged together, and we move from the partition $k-1$ to the partition $k$. $A_1, A_2, \dots, A_{n-k}$ represents the current partition of $\Omega$;

$n_i$ is the size of the cluster $A_i$, i.e. the number of individuals in $A_i$;

$\forall i \in \{1, \dots, n-k\}$, the cluster $A_i$ takes the weight $P(A_i) = \sum_{\omega \in A_i} P(\omega)$;

$G(A_i) = \frac{1}{P(A_i)} \sum_{\omega \in A_i} P(\omega)\, V(\omega)$ is the gravity center of $A_i$;

$G = \sum_{\omega \in \Omega} P(\omega)\, V(\omega)$ is the gravity center of $\Omega$;

$d$ is the Euclidean distance, and $d^2$ is the Squared Euclidean distance.

Intra and Inter-Clusters Inertias

These criteria derive from the classical measures of inertia (Lebart et al., 1982). They consist of:

minimizing the intra-cluster distances, i.e. the distance between individuals within a cluster;

maximizing the inter-cluster distances, i.e. the distance between the gravity centers of the clusters.

For a given subset of individuals $A_i$, the intra-cluster inertia is defined as:

$$I(A_i) = \sum_{\omega \in A_i} P(\omega)\, d(V(\omega), G(A_i)) \tag{4}$$

The total intra-clusters inertia of a partition $k$ is defined by the sum of its $(n-k)$ subsets' inertia:

$$I_{intra}(k) = \sum_{i=1}^{n-k} I(A_i) \tag{5}$$

The inter-clusters inertia is defined by the weighted sum of distances between the gravity center of $\Omega$ and the gravity centers of all the subsets $A_i$ of the partition $k$:

$$I_{inter}(k) = \sum_{i=1}^{n-k} P(A_i)\, d(G(A_i), G) \tag{6}$$

According to the theorem of Huygens, for each partition, the sum of the two inertias is constant and equal to the inertia of $\Omega$:

$$\forall k \in \{0, \dots, n-1\}, \quad I_{intra}(k) + I_{inter}(k) = I(\Omega) \tag{7}$$

The intra-cluster inertia (respectively inter-clusters inertia) is an increasing (respectively decreasing) function according to the index of partitions $k$. Remember that the iteration $k$ corresponds to a partition with $(n-k)$ aggregates. Therefore, the intra-cluster inertia is a decreasing function according to the number of aggregates. While moving from a partition to another, a remarkable break point of the intra or inter-clusters inertia will be an indicator in the choice of the number of aggregates. Through these criteria, we help the user to reach a better compromise between the minimization of the intra-clusters inertia, the

maximization of the inter-clusters inertia, the number of aggregates, the significance of the aggregates, and the analysis objectives.

The intra and inter-clusters inertias may present some limits since they have a monotonous general trend. We also propose to use the Ward's method, which is another way of evaluating the AHC results by measuring the merging cost when moving from a partition to another.

Ward's Method

The Ward's method, proposed in (Ward, 1963), constructs a criterion that considers what happens to the sum of squared deviations from the gravity centers of two merged clusters $A_i$ and $A_j$. This merging cost amounts to calculating the Squared Euclidean distance between the gravity centers of the merged clusters, weighted according to their respective sizes, at each AHC iteration. The formula of this criterion is written as follows:

$$W(A_i, A_j) = \frac{n_i\, n_j}{n_i + n_j}\, d^2(G(A_i), G(A_j)) \tag{8}$$

At each AHC iteration, this criterion measures the variation of internal inertia when two clusters are merged together. Recall that the aim is to find a partition whose clusters are as homogeneous as possible. This leads to minimizing the internal inertia of clusters. Therefore, when the Ward's method provides a high criterion at iteration $k$, it implies a great variation of internal inertia when moving from a partition $k-1$ to a partition $k$. This variation is an indicator that helps users to prefer the previous partition $k-1$, which corresponds to $(n-k+1)$ aggregates. In general, the Ward's method provides more than one relevant variation in a hierarchical clustering. Once again, it is up to users to choose the partition that provides the best solution to the analysis objectives.
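The inertia criteria and the Ward merging cost can be checked numerically with a small sketch. It rests on assumptions that are ours: individuals carry uniform weights $1/n$ unless weights are given, and inertias are computed with the squared Euclidean distance $d^2$, the form under which the Huygens decomposition holds exactly. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points, weights):
    """Weighted gravity center of a list of points."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def inertias(X, partition, weights=None):
    """Return (intra, inter) inertia of a partition of row indices of X."""
    n = len(X)
    P = weights or [1.0 / n] * n  # assumption: uniform weights by default
    G = centroid(X, P)
    intra = inter = 0.0
    for A in partition:
        pts = [X[i] for i in A]
        w = [P[i] for i in A]
        GA = centroid(pts, w)
        intra += sum(wi * sq_dist(p, GA) for p, wi in zip(pts, w))
        inter += sum(w) * sq_dist(GA, G)
    return intra, inter


def ward_cost(X, A, B):
    """Ward merging cost of clusters A and B, following Formula (8)."""
    GA = centroid([X[i] for i in A], [1.0] * len(A))
    GB = centroid([X[i] for i in B], [1.0] * len(B))
    return len(A) * len(B) / (len(A) + len(B)) * sq_dist(GA, GB)


X = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
before = inertias(X, [[0], [1], [2], [3]])   # partition k = 0
after = inertias(X, [[0, 1], [2], [3]])      # partition k = 1
cost = ward_cost(X, [0], [1])
```

On this toy data the sum of the two inertias is the same before and after the merge, and the increase of the intra-clusters inertia equals the Ward cost divided by $n$, which is the factor introduced by the uniform weights.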

Note that the two previous criteria are mainly related to the principle of inertia, which measures the homogeneity of clusters. In order to provide a complementary way of evaluating aggregates, we propose a new alternative criterion that rather measures the quality of aggregates according to the property of separability of classes (Zighed et al., 2002).

Separability Based Criterion

This criterion is derived from the method of separability of classes originally introduced by Zighed et al. (2002). This criterion starts by constructing a neighborhood graph for the whole set of objects to aggregate. A neighborhood graph, also called a proximity graph, is a visual presentation which displays the overall arrangement of individuals in their representation space. In such a graph, individuals are represented by points, and two points are connected by an edge if they are, by a certain measure, close together. Specifically, two points are linked together if there are no other points in a certain forbidden region defined by these two points.

The Gabriel graph is a particular case of neighborhood graphs proposed in (Gabriel & Sokal, 1969). It has been studied in the field of classification as a way to edit and condense large data sets. In the Gabriel graph, two points A and B are connected if their diametral sphere (i.e. the sphere such that AB is its diameter) does not contain any other points. Figure 5 (a) shows a plane representation of a Gabriel graph constructed on a set of objects described by two variables $X_1$ and $X_2$.
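The diametral sphere test can be written without computing the sphere itself: a point C lies strictly inside the sphere of diameter AB exactly when $d^2(A, C) + d^2(B, C) < d^2(A, B)$. The brute-force sketch below relies on this property. The function name `gabriel_graph` is hypothetical, and treating points that fall exactly on the sphere as not contained is a convention chosen for this illustration.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_graph(points):
    """Return the Gabriel edges (i, j), with i < j, of a list of points.

    Brute force in O(n^3): every pair is tested against every other point.
    """
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        # Keep the edge if no third point is strictly inside the sphere
        # whose diameter is the segment between points i and j.
        if all(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
            for k in range(len(points))
            if k not in (i, j)
        ):
            edges.add((i, j))
    return edges


# The third point sits almost on the segment between the first two,
# so it blocks the edge (0, 1) but connects to both ends.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
edges = gabriel_graph(points)
```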

Figure 5: Principle of the separability based criterion

We assume that $g_\Omega$ is the Gabriel graph constructed on the whole set $\Omega$ of individuals. At each AHC iteration $k \in \{0, \dots, n-1\}$, our criterion consists in building for each constructed cluster $A_i$ ($i \in \{1, \dots, n-k\}$) its own Gabriel graph noted $g_{A_i}$. Note that:

$$\bigcup_{i=1}^{n-k} \{ g_{A_i} \} \neq g_\Omega$$

In fact, in a partition of individuals, the union of the sub-graphs of its clusters $A_i$ ($i \in \{1, \dots, n-k\}$) does not correspond to the whole graph of $\Omega$.

Let $e_{ij}$, also noted $\{\omega_i\, \omega_j\}$, be the edge that connects two individuals $\omega_i$ and $\omega_j$ in a neighborhood graph. Each edge $e_{ij}$ can be associated to a weight $P(e_{ij})$ according to the inverse of the Euclidean distance that separates its connected points $\omega_i$ and $\omega_j$.

$$P(e_{ij}) = P(\{\omega_i\, \omega_j\}) = \frac{1}{d(\omega_i, \omega_j)} \tag{9}$$

The weight associated to edges quantifies the importance of each connection in a neighborhood graph. In fact, two points separated by a large distance are easily separable, so their connection is relatively weak. Conversely, two close points are less separable, and their connection is quite strong. In a simple case, we can also consider that all connections in a neighborhood graph have the same separability level. Hence, we associate the same weight ($P(e_{ij}) = 1$) to all the edges of the graph.

For each AHC iteration, the separability based criterion consists in computing the sum of new built connections for the Gabriel graphs of clusters $A_i$ ($i \in \{1, \dots, n-k\}$). Let $\xi^k$ be the set of the new built edges at iteration $k$ of the AHC. For example, according to Figure 5, at the iteration $k = 3$, the cluster 2 is merged with the cluster {3,4}. The new built connections in this case are {2 3} and {2 4}. Therefore, we note $\xi^3 = \{\{2\ 3\}, \{2\ 4\}\}$. Let $J(k)$ be the sum of new connections of Gabriel graphs built at iteration $k$. $J(k)$ is written according to the following formula:

$$J(k) = \sum_{e \in \xi^k} P(e) \tag{10}$$

Our criterion aims at evaluating the separability of clusters for each AHC partition. Two clusters are more separable when they are connected via a small number of edges with weak connections. Nevertheless, the importance of new built edges at each iteration should also take into account the current number of clusters. Thus, the formula of our separability based criterion is written as follows:

$$S(k) = \frac{J(k)}{n-k} = \frac{\sum_{e \in \xi^k} P(e)}{n-k} \tag{11}$$
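Formulas (9) to (11) can be illustrated for a single merge. The sketch below makes one interpretive assumption: the new built edges are taken as the edges of the Gabriel graph of the merged cluster that were absent from the Gabriel graphs of the two clusters before the merge. The function names and the sample coordinates are hypothetical, and the weighted variant assumes that no two individuals coincide.

```python
import math
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_edges(X, members):
    """Gabriel edges among the rows of X listed in `members`."""
    edges = set()
    for i, j in combinations(sorted(members), 2):
        d_ij = sq_dist(X[i], X[j])
        if all(
            sq_dist(X[i], X[k]) + sq_dist(X[j], X[k]) >= d_ij
            for k in members
            if k != i and k != j
        ):
            edges.add((i, j))
    return edges


def separability(X, A, B, n_clusters_after, weighted=False):
    """Return (new_edges, J, S) for the merge of clusters A and B.

    n_clusters_after is the number of clusters (n - k) of the partition
    obtained after the merge.
    """
    old = gabriel_edges(X, A) | gabriel_edges(X, B)
    new_edges = gabriel_edges(X, A + B) - old
    if weighted:
        # Formula (9): weight 1 / d; assumes distinct points (d > 0).
        J = sum(1.0 / math.sqrt(sq_dist(X[i], X[j])) for i, j in new_edges)
    else:
        # Simple case: every edge has the weight 1.
        J = float(len(new_edges))
    return new_edges, J, J / n_clusters_after


# Four individuals; cluster [0] is merged with cluster [1, 2] at k = 2,
# which leaves n - k = 2 clusters.
X = [(-1.0, 0.0), (1.0, 1.0), (1.0, -1.0), (9.0, 9.0)]
new_edges, J, S = separability(X, [0], [1, 2], n_clusters_after=2)
```

As in the example of Figure 5, merging a singleton with a two-element cluster creates two new connections here, so $J = 2$ and $S = 1$ with unit weights.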

$S(k)$ computes, per cluster, the ratio of new built edges when AHC merges two clusters by moving from partition $(k-1)$ to $k$. In the criterion formula, we divide $J(k)$ by $(n-k)$ in order to get a relative evaluation of separability according to the current number of clusters. When $J(k)$ has a relatively low value compared to other partitions, it means that, in moving from the partition $(k-1)$ to the partition $k$, weak connections are built, and therefore the merged clusters are quite separable. So, the user may prefer to select the partition $(k-1)$ rather than the partition $k$.

For example, Figure 5 (c) displays the process of building edges of the Gabriel graph at each iteration of the AHC provided in Figure 5 (b). We suppose in this example that all connections have the same weight ($P(e_{ij}) = 1$). This example also provides the number of built edges $J(k)$ and the criterion value $S(k)$ at each step. We note that $S(k)$ marks a relatively low value for the partition $k = 5$. This can help the user to select the previous partition ($k = 4$) with six separable clusters.

IMPLEMENTATION AND EXPERIMENTAL RESULTS

To validate our approach, we have developed a Web-based platform called MiningCubes. We have included in this platform an implementation of OpAC. In the following, we detail the architecture of this Web application and present some performance experiments that we have conducted on it.

Architecture of the Web Application

MiningCubes contains a set of OLAP modules such as a connection to classical data cubes via MS SQL Server 2000/Analysis Services, a connection to XML data cubes, and an exploration of multidimensional data. In addition to these OLAP tools, we have also integrated analysis modules based on data mining methods. Among these, we developed a

module for our operator OpAC which is composed of four components: a Data loader component from Analysis Services of MS SQL Server 2000 or directly from XML documents, a Parameter setting interface, a Clustering component that provides aggregates of objects, and an Aggregates evaluation component to measure the pertinence of partitions of aggregates according to the criteria presented in the previous section. Figure 6 shows the general architecture of the OpAC module. In the following, we detail the functions of each component.

Figure 6: General architecture of the OpAC module

The data loader component. This component connects and loads information about the structure (labels of dimensions, hierarchical levels and measures),

and the content of a data cube. It can work either on a data cube stored in the Analysis Services of MS SQL Server 2000 or directly on XML data cubes. To connect to a data cube on Analysis Services, the data loader component uses MDX (Multidimensional Expressions) queries to import information about the cube's structure. In the case of a connection to an XML data cube, the component uses the MSXML DOM (Document Object Model) to parse the XML schema that represents the conceptual model of the data cube. The DOM is also used to load the data of the cube from its corresponding XML documents. As the application is based on Web technology, a user should enter, in a Web form, a cube name, its XML schema and its corresponding XML documents (see Figure 7). The application will automatically load the XML schema and the XML documents on the Web server.

The parameter setting interface. This component assists the user in extracting both individuals and variables from a data cube. It enables navigation into hierarchical levels of dimensions, selection of attributes $g_{ijt}$ for individuals, selection of attributes $g_{srv}$, and selection of measures $M_q$ for the variables of the clustering problem. It also assists the user in respecting the constraints which we have defined in the previous formalization.

The clustering component. The clustering component enables the selection of the dissimilarity measure and the aggregation criterion. We implemented four dissimilarity measures (the Euclidean Distance, the Squared Euclidean Distance, the Manhattan Distance, and the Chebychev Distance), and seven aggregation criteria (the Ward's criterion, the Nearest Neighbor criterion, the Furthest Neighbor criterion, the Average Distance criterion, the McQueen's

criterion, the Median Clustering criterion, and the Centroid Clustering criterion). Once the user selects the dissimilarity measure and the aggregation criterion, the clustering component constructs the AHC model, and plots its results within a dendrogram.

The aggregates evaluation component. This component computes at each step of the AHC the criteria presented in the previous section. In fact, for each constructed partition, this component calculates the inter and intra-clusters inertias, and the separability based criterion. When AHC moves from a partition to the next one, this component also calculates the sum of squared deviations according to the Ward's method. At the end of the AHC, the aggregates evaluation component plots the previous criteria results within graphs. Each graph presents a curve of a criterion according to partitions. This component gives an idea about the quality of AHC partitions. It also helps the user to decide on the best number of aggregates he or she wants to consider.

Figure 7: An XML data cube loaded by MiningCubes
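The four dissimilarity measures offered by the clustering component have standard definitions, sketched below for two individuals described by the same numerical variables. The function names and the helper `dissimilarity_matrix` are hypothetical and are not taken from MiningCubes.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def dissimilarity_matrix(X, distance=euclidean):
    """(n, n) matrix of pairwise dissimilarities between the rows of X."""
    return [[distance(a, b) for b in X] for a in X]


a, b = (0.0, 0.0), (3.0, 4.0)
S = dissimilarity_matrix([a, b], distance=manhattan)
```

Passing the distance as a parameter mirrors the way the user selects a dissimilarity measure before the AHC model is constructed.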

Performances of the Web Application

We have experimentally evaluated the performance of our Web application on datasets of XML documents. We have constructed these datasets by random sampling from the whole collection of OLAP facts of the screening mammography data cube presented in the third section. Recall that this data cube contains OLAP facts, where each fact is represented by an XML document, as shown in the example of Figure 3. The current experiments measure processing times for different situations of input data and parameters of our operator OpAC supported by the Web application MiningCubes. We conducted this series of experiments under Windows XP on a 1.60 GHz PC with 480 MB of RAM and an Intel Pentium 4 processor.

Figure 8: (a) Effect of XML documents' number on DOM parsing time. (b) Effect of XML documents' number on AHC time processing

We have measured the running time of the data loader component for loading XML documents and for constructing an XML data cube. The running time of the DOM parser is summarized by the curve of Figure 8 (a). The general trend of the curve shows that the parsing time increases linearly with the number of XML documents. Note that these experiments were carried out on localhost, so in a real client/server architecture, in addition to the parsing time, we should also take into account the communication time of the network used.

We also evaluated the processing time of the clustering component. According to Figure 8 (b), the processing time of AHC increases polynomially with the number of documents. Indeed, there are two main expensive steps in the agglomerative clustering. The first one corresponds to the computation of the pairwise dissimilarity between all the documents. Let $n$ be the number of XML documents to cluster; the complexity of this step is $O(n^2)$. The second step is the repeated selection of the pair of most similar clusters. During the iteration $k$, the AHC algorithm requires $O((n-1)^2)$ time. This leads to an overall complexity of $O(n^3)$. Nevertheless, in the OLAP context, we should note that we usually deal with data cube dimensions with a relatively small number of attributes. In addition, in the context of our operator, the cost of the AHC complexity is limited since a user focuses on targeted analysis with a precise, and small, number of facts to aggregate. In the next section, we introduce a real case study on the XML screening mammography data cube.

APPLICATION ON THE XML SCREENING MAMMOGRAPHY DATA CUBE

To illustrate the results of our operator, we propose to run it on the screening mammography data cube presented in Figure 2. We suppose that a user needs to create aggregates from the attributes of the Scanner name level ($h_{71}$) of the Scanner image dimension ($D_7$). We suppose that (s)he selects from $G(h_{71})$ a set of 36 mammogram scanners. Figure 9 shows the set of the selected individuals $\Omega$.

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

A Simple Approach to Clustering in Excel

A Simple Approach to Clustering in Excel A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Cluster Analysis. Cluster Analysis

Cluster Analysis. Cluster Analysis Cluster Analyss Cluster Analyss What s Cluster Analyss? Types of Data n Cluster Analyss A Categorzaton of Maor Clusterng Methos Parttonng Methos Herarchcal Methos Densty-Base Methos Gr-Base Methos Moel-Base

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information



