Essays on using Formal Concept Analysis in Information Engineering

Size: px
Start display at page:

Download "Essays on using Formal Concept Analysis in Information Engineering"

Transcription

1 FACULTEIT ECONOMIE EN BEDRIJFSWETENSCHAPPEN KATHOLIEKE UNIVERSITEIT LEUVEN Essays on using Formal Concept Analysis in Information Engineering Proefschrift voorgedragen tot het behalen van de graad van Doctor in de Toegepaste Economische Wetenschappen door Jonas POELMANS Nummer

2 Committee Supervisor Prof. Dr. Guido Dedene K.U.Leuven Supervisor Prof. Dr. Stijn Viaene K.U.Leuven Supervisor Prof. Dr. Monique Snoeck K.U.Leuven Prof. Dr. Henrik Scharfe Prof. Dr. Ir. Rik Maes Prof. Dr. Aimé Heene Aalborg University Universiteit van Amsterdam Universiteit Gent Daar de proefschriften in de reeks van de Faculteit Economische en Toegepaste Economische Wetenschappen het persoonlijk werk zijn van hun auteurs, zijn alleen deze laatsten daarvoor verantwoordelijk

3

4 Acknowledgements First and foremost I wish to express special thanks to my supervisor Guido Dedene. His guidance during my doctoral trajectory and numerous research ideas were of great value and helped me achieve this end result. I also want to express special thanks to Stijn Viaene for carefully reading and giving feedback on every text I wrote. Our numerous discussions helped me in staying enthusiast on doing research and improving my work. I also want to thank Monique Snoeck for her help with the software engineering part of my thesis and for helping me with Microsoft Word every time I got stuck again. I am very grateful to Paul Elzinga from the Amsterdam-Amstelland police for our close cooperation on the domestic violence, human trafficking and terrorism data. Our discussions always resulted in interesting research avenues and we made many interesting discoveries that will hopefully be beneficial to police officers working in the field. I would like to thank the police of Amsterdam-Amstelland for granting us the liberty to conduct and publish this research. In particular, we are most grateful to Deputy Police Chief Reinder Doeleman and Police Chief Hans Schönfeld for their continued support. I also want to thank my mother Gerda for our research work on breast cancer treatment and for her overall support during the course of my PhD. I want to thank the Gasthuiszusters Antwerpen and in particular Herman Van der Mussele for giving me access to their medical data. I would like to thank Edward Peters for our cooperation on the medical data analysis. I also want to thank Marc Van Hulle for his insights and help with the self organizing maps. I also want to thank my doctoral committee members Rik Maes, Aime Heene and Henrik Scharfe for their feedback on my final text. I would also like to thank my colleagues for the stimulating work environment. In particular, I would like to thank Karel for his help with Matlab programming, Raf for the many snooker evenings filled with fruit beers, Tom for always wishing me a good morning in the early afternoon, Geert and Pieter for the enjoyable talks in the evenings after work, David and Filip for their always funny jokes, Willem for the many enjoyable breaks during long working days, Wouter for his valuable advice on data mining, and finally Jochen and Helen for the pleasant talks during lunch breaks. Special thanks go to Vicky Verhaert for helping me prepare the final version

5 of my thesis. I would like to thank our secretary Nicole Meesters for the numerous times she helped me arranging my conference travels. I also want to thank my sister, father and my friends for supporting me during the course of my PhD. Finally I want to thank the Fonds Voor Wetenschappelijk Onderzoek Vlaanderen for funding this research. This PhD was written as part of my aspirant of FWO research mandate. Jonas, Leuven, July 2010

6 Contents ACKNOWLEDGEMENTS... III CONTENTS...V SAMENVATTING... XIII CHAPTER 1... XXI INTRODUCTION... XXI CHAPTER FORMAL CONCEPT ANALYSIS IN INFORMATION ENGINEERING: A SURVEY Introduction Formal Concept Analysis FCA essentials FCA software Web portal Dataset Studying the literature using FCA Knowledge discovery and data mining Some FCA mining applications Association rule mining Software mining Web mining Extending FCA for data mining FCA mining applications in biology and medicine FCA in relation to other machine learning techniques 38 Fuzzy FCA in KDD... 39

7 Information retrieval Knowledge representation and browsing with FCA Knowledge browsing systems based on FCA Query result improvement with FCA Defining and processing complex queries with FCA Domain knowledge incorporation in query search results: contextual answers & ranking Fuzzy FCA in IR Scalability Iceberg lattices Reducing the size of concept lattices Handling complex data FCA algorithm scalability Ontologies Ontology construction Ontology quality management Linguistic applications using FCA and ontologies Ontology mapping and merging Fuzzy and rough FCA in ontology construction and mapping Extensions and related disciplines of FCA theory Fuzzy FCA Applications of fuzzy FCA Fuzzy concept lattice reduction Parameterized approaches for fuzzy concept lattices Multi-adjoint and generalized concept lattices Variations to traditional fuzzy FCA Fuzzy attribute logic, implications and inference Rough set theory Combining FCA with rough set theory Generalizations of rough set theory Attribute reduction Fuzzy FCA and rough set theory... 67

8 Monotone concept and AFS algebra Triadic FCA Temporal FCA Mathematical research and algorithmic innovations Conclusions CHAPTER CURBING DOMESTIC VIOLENCE: INSTANTIATING C-K THEORY WITH FORMAL CONCEPT ANALYSIS AND EMERGENT SELF ORGANIZING MAPS Introduction Intelligence Led Policing Domestic violence Motivation FCA, ESOM and C-K theory Formal Concept Analysis Emergent Self Organizing Map Emergent SOM ESOM parameter settings C-K theory Instantiating C-K theory with FCA and ESOM Dataset Data preprocessing and feature selection Initial classification performance Iterative knowledge discovery with FCA and ESOM Transforming existing knowledge into concepts Expanding the space of concepts Transforming concepts into knowledge Expanding the space of knowledge Actionable results Comparative study of ESOM and multi-dimensional scaling Conclusions CHAPTER

9 EXPRESSING OBJECT-ORIENTED CONCEPTUAL DOMAIN MODELS IN FORMAL CONCEPTS ANALYSIS Using Formal Concept Analysis for verification of process data matrices in conceptual domain models Introduction Research Question FCA essentials MERODE essentials FCA and the object-event matrices Object-event table with empty row column Propagation rule and subconcept-superconcept relation 135 FCA and the contract rule Verification process Conclusions Extending Formal concept Analysis with a second ordering relationship on concepts: a software engineering case study Introduction Expressing UML associations with an FCA lattice Expressing inheritance relationships with an FCA lattice 144 Merging the existence dependency and inheritance lattices 146 Properties of extended lattice Applications Accumulation rule Conclusions An Iterative Requirements Engineering Framework based on Formal Concept Analysis and C-K theory matrix Requirements engineering artifacts Use cases, conceptual domain model and quality UML class model normalization UML associations and the use case-entity interaction 165 C-K theory Iterative requirements engineering process using FCA

10 C-K theory in requirements engineering FCA lattice properties and relation with software artifacts Case study: Book Trader System Case study: Hotel Administration System Case Study: Ordering system Case study: Elevator repair system Validation experiment Participants Setup of experiment Results of experiment FCA in requirements engineering Conclusions CHAPTER FORMAL CONCEPT ANALYSIS OF TEMPORAL DATA Detecting and profiling human trafficking suspects with Temporal Concept Analysis Backgrounder Human trafficking Current situation Dataset Human trafficking indicators Temporal Concept Analysis FCA essentials TCA essentials Detecting and profiling human traffickers Detecting suspects with FCA Profiling suspects with TCA Network evolution analysis using TCA Conclusions Terrorist threat assessment with Temporal Concept Analysis Introduction Backgrounder

11 Home-grown terrorism The four phase model of radicalism Current situation Dataset Formal Concept Analysis Temporal Concept Analysis Research method Extracting potential jihadists with FCA Constructing Jihadism phases with FCA Build detailed TCA lattice profiles for subjects Conclusions Combining business process and data discovery techniques for analyzing and improving integrated care pathways Introduction Business process discovery Hidden Markov Models HMM-based process discovery Data discovery with Formal Concept Analysis Dataset Analysis and results Quality of care analysis Process variations Workforce intelligence Data entrance quality issues Discussion and conclusions CHAPTER CONCLUSION Thesis conclusion Future work APPENDIX A LITERATURE SURVEY THESAURUS APPENDIX B DOMESTIC VIOLENCE THESAURUS

12 APPENDIX C DOMESTIC VIOLENCE REVIEW REPORTS LIST OF FIGURES BIBLIOGRAPHY

13

14 Samenvatting Formele Concept Analyse (FCA) is een vrij nieuwe wiskundige techniek (Ganter 1999), die initieel gebruikt is voor de structurering van cluster analyses. De essentie van de techniek bestaat uit een reeks algoritmes om een associatie-matrix van entiteiten en attributen om te zetten naar een roosterstructuur ( lattice ) van concepten, die (groepen van) entiteiten voorstellen met gelijkaardige attributen, en omgekeerd. Belangrijk hierbij is dat een roosterstructuur een niet-hiërarchische, a-cyclische partiële orderelatie vormt tussen de concepten. In de eerste onderzoekslijn is een nieuwe originele toepassing van FCA ontwikkeld, waarbij FCA gebruikt wordt als een techniek om binnen grootschalige ongestructureerde gegevensbronnen concepten en patronen te herkennen. Er bestaat een grote verscheidenheid aan data mining technieken, vooral voor gestructureerde gegevens. Voor ongestructureerde gegevens zijn er niet alleen minder technieken beschikbaar, maar zijn er ook weinig methoden om de resultaten van de data mining analyse te interpreteren, en te valideren. In het kader van een leerstoel met de Politie Amsterdam- Amstelland is aangetoond dat FCA hier een bijzondere bijdrage kan leveren. In een proof-of-concept is een analyse doorgevoerd van politieverslagen rond huishoudelijk geweld, teneinde te komen tot preventieve indicatoren rond huiselijk geweld binnen gezinnen. FCA is toegepast op reële gegevensverzamelingen, waarbij de roosterstructuur met concepten een duidelijke aanwijzing geeft van de relevante indicatoren voor huiselijk geweld. Meer nog, hiaten in de roosterstructuur openen de vraag naar bijkomende informatie-attributen in de politieverslagen, zodat de agenten, die de basisgegevens verzorgen door hun observatie, bijkomende aandachtpunten krijgen voor hun verslagen. De toepassing van FCA heeft in de eerste plaats geleid tot een verbeterde definitie van het concept van huiselijk geweld. Uit het concept rooster dat voorvloeit uit FCA kunnen immers regels worden afgeleid, met de kansen dat de regel van toepassing is. Dit laat toe om zowel kennisgaten in de definitie te ontdekken, maar ook relevante uitzonderingen. Een belangrijke aandachtsfactor is de combinatorische explosie van het FCA algoritme bij grotere aantallen attributen, waarvoor in eerste instantie gewerkt is met clustering van die attributen, die op basis van de probabiliteiten een sterke vorm van samenhang vertonen. Om een fijnmazige analyse te doen van de

15 SAMENVATTING XIV uitzonderingen is FCA vervolgens uitgebreid en gecombineerd met een emergente (bottum-up) techniek gebaseerd op neurale netwerken Emergent Self Organising Maps. Uit de studies rond huiselijk geweld kwam duidelijk naar voor hoe beide technieken elkaar versterken, en leiden tot vrij sterke karakterisatietechnieken, waarbij tot 90% of meer van de gevallen van huiselijk geweld automatisch ontdekt kunnen worden. Het onderzoek heeft geleid tot een significante verbetering van de aanpak van huiselijk geweld. De Politie heeft in dit geval gezorgd voor een uitstekende basisverzameling van (geanonimiseerde) attributen (kenmerken) bij politieverslagen. De tweede onderzoekslijn, die mee ontwikkeld werd in de gevallenstudie van huiselijk geweld, positioneert FCA als een kennisvernieuwingstechniek. In de literatuur zijn niet zoveel technieken bekend voor de concrete ontwikkeling van innovatieve ideeën. De vroege publicaties over artificiële intelligentie wezen reeds op het belang van de symbolisatie en conceptualisatie van de aanwezige kennis, op een dermate wijze dat mogelijke kennisgaten naar voor komen in de conceptualisatie. Een belangrijke systematisering van kennisvernieuwing is vanuit de design science benadering in de voorbije jaren ontwikkeld in de C-K-theory (Concept-Knowledge-Theory). Daarbij werd echter tot nu toe gebruik gemaakt van hiërarchisch geordende conceptbomen voor de conceptualisatie van bestaande kennis. In deze onderzoekslijn is voor de eerste maal aangetoond hoe FCA binnen de C-K-theory kan gebruikt worden als een techniek voor de disjunctie van bestaande kennis in niet-hiërarchische (partieel geordende) conceptroosters. Bovendien zijn de praktijkervaringen rond de gevallenstudie van huiselijk geweld gepositioneerd als leerprocessen binnen de C-K-theory, met een belangrijke bijdrage voor de conjunctie van nieuwe concepten in kennisvernieuwing. In een derde onderzoekslijn is onderzocht hoe het gebruik van FCA uitgebreid en verfijnd kan worden voor gegevensbronnen met een inherente tijdsdimensie. In de literatuur is een temporele variant van FCA, Temporal Concept Analysis, bekend. Deze techniek is uitgediept in combinatie met FCA voor een gevallenstudie van politie rapporten voor de detectie en opsporing van mogelijke verdachten in de context van mensenhandel. Er is ondermeer vastgesteld hoe de FCA-roosters kunnen gebruikt worden als een voorstelling voor de verschillende mogelijkheden waaronder een verdachte kan evolueren, waarbij eveneens de probabiliteiten van de kenmerken een belangrijke rol spelen. Dit leidt tot een pro-actieve vroege opsporing met een hogere vaststellingsgraad, hetgeen verder is uitgewerkt in een breedschalig raamwerk voor de opsporing van potentiële terreurverdachten, dat vervolgens nu van de nodige software instrumenten wordt voorzien. Tijdens de uitwerking van de derde onderzoekslijn zijn beperkingen naar voor gekomen in de tijdsgerichte voorstelling van concepten, zeker waar het gaat om de structurele voorstelling van de evolutie van concepten in processen (op basis van sequentie, selectie en iteratie). In eerste instantie is daarom in een vierde onderzoekslijn FCA toegepast met events

16 XV SAMENVATTING (gebeurtenissen) in plaats van gegevensattributen. Hierbij worden niet zozeer de conceptroosters gebruikt voor de voorstelling van processen, maar wel Hidden Markov Modellen (HMM). FCA wordt dan wel gebruikt om de relevante (deel)verzamelingen van gebeurtenissen te filteren, vooral in de context van grootschalige gegevensbronnen. FCA helpt ook bij het in kaart brengen van eigenschappen van proces variaties en anomalieën, zoals kwaliteitsproblemen in de zorg verstrekt aan patiënten. Als gevallenstudie is voor deze onderzoekslijn een verzameling van behandelingsgegevens gebruikt conform aan Healthcare Level 7 binnen de context van zorgprocessen voor borstkanker in een daartoe gespecialiseerd ziekenhuis. 470 behandelingsstappen zijn samengebracht tot 65 relevante stappen, die resulteren in een belangrijke vereenvoudiging van het ontdekken van de zorgprocessen vanuit de behandelingsgegevens. De eerste resultaten van deze onderzoekslijn hebben tevens geresulteerd in incrementele verbeteringen in de behandelingsprocessen. Uit de gevallenstudie bleek tevens een belangrijke synergie tussen de ontdekking van data-aspecten versus proces-aspecten. De vijfde onderzoekslijn heeft de mogelijkheden van FCA in verband met event-gebaseerde modellen verder gesystematiseerd in de context van domein modellering binnen software engineering. Daarbij is ondermeer aangetoond hoe de Object-Event tabel binnen de MERODE-methodologie voor software engineering, zich automatisch via FCA laat vertalen is een objectrooster dat isomorf is met de Existence Dependency Graph uit de MERODE-methodologie. FCA voegt algoritme toe aan de MERODE consistentie controle. Het gebruik van meer-waarde-tabellen laat toe om alle consistentieregels rond klassendiagrammen in UML formeel wiskundig uit te drukken. Bovendien kunnen de algoritmes van FCA gebruikt worden om inconsistenties en onvolledigheden in de modellen op te sporen. Hiermee is eigenlijk voor de eerste keer aangetoond dat conceptuele modellen ook effectief het begrip concept in de wiskundige zin van het woord hanteren. Vervolgens werd een driedimensionale variant van FCA verder onderzocht op zijn toepasbaarheid voor het combineren van bestaansafhankelijkheid met generalisatie/specialisatie. Tevens wordt onderzocht hoe FCA probabilistische informatie kan toevoegen aan de cardinaliteiten binnen klassendiagrammen in UML. Tenslotte is in een zesde onderzoekslijn alle relevante literatuur rond FCA in kaart gebracht door middel van FCA. De concepten op basis van FCA zijn geïnventariseerd in een literatuur overzicht, dat een relevant beeld en inzicht geeft in de ongeveer 700 papers rond FCA die verschenen zijn in de periode Deze benadering toont eveneens aan hoe FCA als een meta-techniek kan aangewend worden.

17 SAMENVATTING XVI Het geleverde onderzoekswerk werd gepubliceerd in de volgende artikels: JOURNAL PAPERS GEPUBLICEERD/AANVAARD: 1. Poelmans, J., Elzinga, P., Viaene, S., Van Hulle, M. & Dedene G. (2009). Gaining insight in domestic violence with emergent self organizing maps, Expert systems with applications, 36, (9), [SCI 2009 = 2.908] 2. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010) Formally Analyzing the Concepts of Domestic Violence, Expert Systems with Applications, In Press, doi /j.eswa Vuylsteke A., Baesens B., Poelmans J. (2010). Consumers search for information on the internet: how and why China differs from Western Europe, Journal of interactive marketing, In Press, doi /j.intmar Poelmans, J., Elzinga, P., Viaene, S., Dedene, G., Van Hulle, M. (2009). Analyzing domestic violence with topographic maps: a comparative study, Lecture Notes in Computer Science, 5629, , Advances in Self-organizing Maps, 7th International Workshop on Self- Organizing Maps (WSOM). St. Augustine, Florida (USA), 8-10 June 2009, Springer. 5. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2009). A case of using formal concept analysis in combination with emergent self organizing maps for detecting domestic violence, Lecture Notes in Computer Science, 5633, , Advances in Data Mining. Applications and Theoretical Aspects, 9th Industrial Conference (ICDM), Leipzig, Germany, July 20-22, 2009, Springer. 6. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2008). An exploration into the power of formal concept analysis for domestic violence analysis, Lecture Notes in Computer Science, 5077, , Advances in Data Mining. Applications and Theoretical Aspects, 8th Industrial Conference (ICDM), Leipzig, Germany, July 16-18, 2008, Springer. 7. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010), Formal Concept Analysis in Knowledge Discovery: a Survey. Lecture Notes in Computer Science, 6208, , 18 th international conference on conceptual structures (ICCS): from information to intelligence July, Kuching, Sarawak, Malaysia. Springer.

18 XVII SAMENVATTING 8. Manyakov, N., Poelmans, J., Vogels, R., Van Hulle, M. (2010), Combining ESOMs trained on hierarchy of feature subsets for singletrial decoding of LFP responses in monkey area V4. Lecture Notes in Artificial Intelligence, 6114, , 10th International Conference on Artificial Intelligence and Soft Computing (ICAISC). June 13-17, Zakopane, Poland. Springer 9. Poelmans, J., Dedene, G., Verheyden, G., Van der Mussele, H., Viaene, S., Peters, E. (2010). Combining business process and data discovery techniques for analyzing and improving integrated care pathways. Lecture Notes in Computer Science, Advances in Data Mining. Applications and Theoretical Aspects, 10th Industrial Conference (ICDM), Leipzig, Germany, July 12-14, Springer CONFERENCE PROCEEDINGS GEPUBLICEERD/AANVAARD: 1. Poelmans, J., Dedene, G., Snoeck, M. Viaene, S. (2010). Using Formal Concept Analysis for the Verification of Process-Data matrices in Conceptual Domain Models, Proc. IASTED International Conference on Software Engineering (SE 2010), Feb 16-18, Innsbruck, Austria. Acta Press, pp.. 2. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010). A method based on Temporal Concept Analysis for detecting and profiling human trafficking suspects. Proc. IASTED International Conference on Artificial Intelligence (AIA 2010). Innsbruck, Austria, february. Acta Press ISBN , pp Elzinga, P., Poelmans, J., Viaene, S., Dedene, G. (2009), Detecting domestic violence Showcasing a Knowledge Browser based on Formal Concept Analysis and Emergent Self Organizing Maps, Proc. 11th International Conference on Enterprise Information Systems ICEIS, Volume AIDSS, pp , Milan, Italy, May 6-10, Poelmans, J, Elzinga, P., Van Hulle, M., Viaene, S., and Dedene, G. (2009). How Emergent Self Organizing Maps can help counter domestic violence, World Congress on Computer Science and Information Engineering (CSIE 2009), Los Angeles (USA), Vol. 4, IEEE Computer Society Press ISBN , Vuylsteke, A., Wen, Z., Baesens, B. and Poelmans J. (2009). Consumers Online Information Search: A Cross-Cultural Study between China and Western Europe. Paper presented at Academic And

19 SAMENVATTING XVIII Business Research Institute Conference 2009, Orlando, USA, available at 6. Elzinga, P., Poelmans, J., Viaene, S., Dedene, G., Morsing, S. (2010) Terrorist threat assessment with Formal Concept Analysis. Proc. IEEE International Conference on Intelligence and Security Informatics. May 23-26, 2010 Vancouver, Canada. ISBN /10, Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010). Concept Discovery Innovations in Law Enforcement: a Perspective, IEEE Computational Intelligence in Networks and Systems Workshop (INCos 2010), Thesalloniki, Greece. 8. Dejaeger, K., Hamers, B., Poelmans, J., Baesens, B. (2010) A novel approach to the evaluation and improvement of data quality in the financial sector, 15 th International Conference on Information Quality (ICIQ 2010) UALR, Little Rock, Arkansas USA, November Poelmans, J., Dedene, G., Verheyden, G., Peters, E., Van Der Mussele, H., Viaene, S. (2010) Concept mining and discovery in clinical pathways, INFORMS Data mining and health informatics (DM-HI) Workshop, Austin, Texas, November JOURNAL PAPERS INGEDIEND: 1. Verheyden, G., Poelmans, J., Viaene, S., Van der Mussele, H., Dedene, G., van Dam, P. (2010). Key Success Factors for significantly improving Patient Satisfaction on Breast Cancer care: a Case Study, submitted for The Breast 2. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010) Curbing domestic violence: Instantiating C-K theory with Formal Concept Analysis and Emergent Self Organizing Maps, submitted for Intelligent Systems in Accounting, Finance and Management. 3. Poelmans, J., Elzinga, P., Viaene, S., Van Hulle, M. & Dedene G. (2010) Text Mining with Emergent Self Organizing Maps and Multi- Dimensional Scaling: A comparitive study on domestic violence, submitted for Applied Soft Computing. 4. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010) Formal Concept Analysis in Information Engineering: a Survey, submitted for ACM Computing Surveys.

20 XIX SAMENVATTING 5. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010) Informatiegestuurd politiewerk: een slimme en nieuwe kijk op de data, submitted for Informatie. 6. Elzinga, P., Poelmans, J., Viaene, S., Dedene, G. (2010) Formele concept analyse: een nieuwe dimensie voor intelligence, submitted for Blauw. BOOK CHAPTER: 1. Poelmans J, Van Hulle M, Elzinga P, Viaene S, Dedene G (2008) Topographic maps for domestic violence analysis. Self-organizing maps and the related tools, pp NEDERLANDSTALIGE PUBLICATIES: 1. Vuylsteke A., Poelmans J., Baesens B. (2009) Online zoekgedrag van consumenten: China vs West-Europa, Business In-Zicht, December, Dejaeger K., Ruelens J., Van Gestel T., Jacobs J., Baesens B., Poelmans J., Hamers B. (2009) Evaluatie en verbetering van de datakwaliteit. Informatie, November, Jaargang 51/9, AWARDS: Nominated for best paper award at 8 th Industrial Conference on Data Mining (ICDM), Leipzig, Germany, July 16-18, 2008 Winner of young professionals best paper award at 9th Industrial Conference on Data Mining (ICDM), Leipzig, Germany, July 20-22, 2009 Winner of best paper award at 10 th Industrial Conference on Data Mining (ICDM), Berlin, Germany, July 12-14, 2010

21

22 CHAPTER 1 Introduction Formal Concept Analysis was originally introduced as a mathematical theory by Rudolf Wille in Between the beginning of 2003 and the end of 2009, over 700 papers have been published in which FCA was used by the authors. We performed a semantic text mining analysis of these papers. We downloaded these 702 pdf-files and built a thesaurus containing terms related to FCA research. We used Lucene to index the abstract, title and keywords of these papers with this thesaurus. After clustering the terms, we obtained several lattices summarizing the most notorious FCA-related research topics. While exploring the literature, we found FCA to be an interesting metatechnique for clustering and categorizing papers in different research topics. Over the years FCA has found its way from mathematics to computer science, resulting in numerous applications in knowledge discovery (20% of papers), information retrieval (15% of papers), ontology engineering (13 % of papers) and software engineering (15% of papers). 18 % of the papers described extensions of traditional FCA such as fuzzy FCA and rough FCA. In this thesis we filled in some of the gaps in the existing literature (Poelmans et al. 2010g). During the past 20 years, the amount of unstructured data available for analysis has been ever-increasing. Today, 90% of the information available to police organizations resides in textual form. We investigated the possibilities of FCA as a human-centered instrument for distilling new knowledge from these data. FCA was found to be particularly useful for exploring and refining the underlying concepts of the data. To cope with scalability issues, we combined its use with Emergent Self Organising Maps. This neural network technique helped us gain insight in the overall distribution of the data and the combination with FCA was found to have significant synergistic results. The knowledge extraction process was framed in the C-K design theory. At the basis of the method are multiple successive iterations through the design square consisting of a concept and knowledge space. The knowledge space consists of the information used to steer the

23 INTRODUCTION XXII action environment, while this information is put under scrutiny in the concept space. The first real life case study zoomed in on the problem of domestic violence at the Amsterdam-Amstelland police. We exposed multiple anomalies and inconsistencies in the data and were able to improve the employed definition of domestic violence. An important spin-off of this KDD exercise was the development of a highly accurate and comprehensible rulebased case labelling system. This system is currently used to automatically assign a label to 75% of incoming cases whereas in the past all cases had to be dealt with manually. Besides analysing data containing relations between objects and attributes, we analysed data containing objects and events. The binary relation of FCA changes its meaning from object o has attribute a to object o participates in event a. We studied FCA in relation to the software engineering methodology MERODE. We found that the lattice, automatically obtained from the Object-Event Table, is isomorphic to the Existence Dependency Graph (EDG) in MERODE. We also expanded traditional FCA theory with a third dimension. Traditional FCA only imposes one ordering relation on concepts, namely the subconcept-superconcept relation, whereas for the representation of models containing both ED and inheritance relationships, an extra ordering relationship or dimension is needed. Finally, we framed the requirements engineering process based on FCA in C-K theory. All steps in this process, namely elaboration, verification, modification and validation of the model, can be considered as the four operators of the design square. A case study with 20 students confirmed the method's practical usefulness. For the analysis of other phenomena such as human trafficking and terrorism threat, a complicating factor is the inherent time dimension in the data. We applied the temporal variant of FCA, namely Temporal Concept Analysis (TCA), to the unstructured text in a large set of police reports. The aim was to distill potential subjects for further investigation. In both case studies, TCA was found to give interesting insights into the evolution of subjects over time. Amongst other things, several (to the police unknown) persons involved in human trafficking or the recruitment of future potential jihadists were distilled from the data. The intuitive visual interface allowed for an effective interaction between the police officer who used to be numbed by the overload of information, and the data. In a subsequent step, we combined data discovery based on FCA with process discovery using Hidden Markov Models (HMM). This combination was used to gain insight into a breast cancer care process by analyzing patient treatment data. Multiple quality of care issues were exposed that will be resolved in the near future

Terrorist Threat Assessment with Formal Concept Analysis

Terrorist Threat Assessment with Formal Concept Analysis Terrorist Threat Assessment with Formal Concept Analysis Paul Elzinga Department of Business Information Police Amsterdam-Amstelland Amsterdam, The Netherlands Paul.Elzinga@amsterdam.politie.nl Jonas Poelmans

More information

Combining business process and data discovery techniques for analyzing and improving integrated care pathways

Combining business process and data discovery techniques for analyzing and improving integrated care pathways Combining business process and data discovery techniques for analyzing and improving integrated care pathways Jonas Poelmans 1, Guido Dedene 1,4, Gerda Verheyden 5, Herman Van der Mussele 5, Stijn Viaene

More information

About Universality and Flexibility of FCA-based Software Tools

About Universality and Flexibility of FCA-based Software Tools About Universality and Flexibility of FCA-based Software Tools A.A. Neznanov, A.A. Parinov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow, 101000, Russia ANeznanov@hse.ru,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Examen Software Engineering 2010-2011 05/09/2011

Examen Software Engineering 2010-2011 05/09/2011 Belangrijk: Schrijf je antwoorden kort en bondig in de daartoe voorziene velden. Elke theorie-vraag staat op 2 punten (totaal op 24). De oefening staan in totaal op 16 punten. Het geheel staat op 40 punten.

More information

Full-text Search in Intermediate Data Storage of FCART

Full-text Search in Intermediate Data Storage of FCART Full-text Search in Intermediate Data Storage of FCART Alexey Neznanov, Andrey Parinov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow, 101000, Russia ANeznanov@hse.ru,

More information

George J. Klir. State University of New York (SUNY) Binghamton, New York 13902, USA gklir@binghamton.edu

George J. Klir. State University of New York (SUNY) Binghamton, New York 13902, USA gklir@binghamton.edu POSSIBILISTIC INFORMATION: A Tutorial Combining business process & data discovery techniques for analyzing and improving integrated care pathways George J. Klir Dr. Jonas Poelmans (Katholieke Universiteit

More information

GMP-Z Annex 15: Kwalificatie en validatie

GMP-Z Annex 15: Kwalificatie en validatie -Z Annex 15: Kwalificatie en validatie item Gewijzigd richtsnoer -Z Toelichting Principle 1. This Annex describes the principles of qualification and validation which are applicable to the manufacture

More information

Risk-Based Monitoring

Risk-Based Monitoring Risk-Based Monitoring Evolutions in monitoring approaches Voorkomen is beter dan genezen! Roelf Zondag 1 wat is Risk-Based Monitoring? en waarom doen we het? en doen we het al? en wat is lastig hieraan?

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

Is het nodig risico s te beheersen op basis van een aanname..

Is het nodig risico s te beheersen op basis van een aanname.. Is het nodig risico s te beheersen op basis van een aanname.. De mens en IT in de Zorg Ngi 19 april 2011 René van Koppen Agenda Er zijn geen feiten, slechts interpretaties. Nietzsche Geen enkele interpretatie

More information

CO-BRANDING RICHTLIJNEN

CO-BRANDING RICHTLIJNEN A minimum margin surrounding the logo keeps CO-BRANDING RICHTLIJNEN 22 Last mei revised, 2013 30 April 2013 The preferred version of the co-branding logo is white on a Magenta background. Depending on

More information

PoliticalMashup. Make implicit structure and information explicit. Content

PoliticalMashup. Make implicit structure and information explicit. Content 1 2 Content Connecting promises and actions of politicians and how the society reacts on them Maarten Marx Universiteit van Amsterdam Overview project Zooming in on one cultural heritage dataset A few

More information

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results , pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Relationele Databases 2002/2003

Relationele Databases 2002/2003 1 Relationele Databases 2002/2003 Hoorcollege 5 22 mei 2003 Jaap Kamps & Maarten de Rijke April Juli 2003 Plan voor Vandaag Praktische dingen 3.8, 3.9, 3.10, 4.1, 4.4 en 4.5 SQL Aantekeningen 3 Meer Queries.

More information

The information in this report is confidential. So keep this report in a safe place!

The information in this report is confidential. So keep this report in a safe place! Bram Voorbeeld About this Bridge 360 report 2 CONTENT About this Bridge 360 report... 2 Introduction to the Bridge 360... 3 About the Bridge 360 Profile...4 Bridge Behaviour Profile-Directing...6 Bridge

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

VR technologies create an alternative reality in

VR technologies create an alternative reality in Dossier: REPAR: Design through exploration Getting started with Virtual Reality As a designer you might be familiar with various forms of concept representations, such as (animated) sketches, storyboards

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Inhoudsopgave. Vak: Web Analysis 1 Vak: Web Information 1 Vak: Web Science 2 Vak: Web Society 3 Vak: Web Technology 4

Inhoudsopgave. Vak: Web Analysis 1 Vak: Web Information 1 Vak: Web Science 2 Vak: Web Society 3 Vak: Web Technology 4 Minor Web Science I Inhoudsopgave Vak: Web Analysis 1 Vak: Web Information 1 Vak: Web Science 2 Vak: Web Society 3 Vak: Web Technology 4 II Web Analysis Vakcode X_401065 () dr. Z. Szlavik Docent(en) dr.

More information

CMMI version 1.3. How agile is CMMI?

CMMI version 1.3. How agile is CMMI? CMMI version 1.3 How agile is CMMI? A small poll Who uses CMMI without Agile? Who uses Agile without CMMI? Who combines both? Who is interested in SCAMPI? 2 Agenda Big Picture of CMMI changes Details for

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu

More information

Maximizer Synergy. info@adafi.be BE 0415.642.030. Houwaartstraat 200/1 BE 3270 Scherpenheuvel. Tel: +32 495 300612 Fax: +32 13 777372

Maximizer Synergy. info@adafi.be BE 0415.642.030. Houwaartstraat 200/1 BE 3270 Scherpenheuvel. Tel: +32 495 300612 Fax: +32 13 777372 Maximizer Synergy Adafi Software is een samenwerking gestart met Advoco Solutions, een Maximizer Business Partner uit Groot Brittannië. Advoco Solutions heeft een technologie ontwikkeld, genaamd Synergy,

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

De rol van requirements bij global development

De rol van requirements bij global development De rol van requirements bij global development 19 & 25 november 2008 Rini van Solingen Requirements zijn een noodzakelijk kwaad Immers, als wij elkaars gedachten konden lezen hadden we geen requirements

More information

David Martens. David.Martens@econ.kuleuven.be

David Martens. David.Martens@econ.kuleuven.be David Martens David.Martens@econ.kuleuven.be Professional experience Since 2008 Katholieke Universiteit Leuven Post-doctoral researcher (FWO) Substitute Lecturer Management Informatics Department Hogeschool

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Uw partner in system management oplossingen

Uw partner in system management oplossingen Uw partner in system management oplossingen User Centric IT Bring your Own - Corporate Owned Onderzoek Forrester Welke applicatie gebruik je het meest op mobiele devices? Email 76% SMS 67% IM / Chat 48%

More information

The Application of Association Rule Mining in CRM Based on Formal Concept Analysis

The Application of Association Rule Mining in CRM Based on Formal Concept Analysis The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com

More information

Corporate Security & Identity

Corporate Security & Identity ir. Yvan De Mesmaeker Secretary general ir. Yvan De Mesmaeker Secretary general of the European Corporate Security Association - ECSA Education: MSc in Engineering Professional responsibilities: Secretary

More information

Citrix Access Gateway: Implementing Enterprise Edition Feature 9.0

Citrix Access Gateway: Implementing Enterprise Edition Feature 9.0 coursemonstercom/uk Citrix Access Gateway: Implementing Enterprise Edition Feature 90 View training dates» Overview Nederlands Deze cursus behandelt informatie die beheerders en andere IT-professionals

More information

A Knowledge Management Framework Using Business Intelligence Solutions

A Knowledge Management Framework Using Business Intelligence Solutions www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For

More information

Information Need Assessment in Information Retrieval

Information Need Assessment in Information Retrieval Information Need Assessment in Information Retrieval Beyond Lists and Queries Frank Wissbrock Department of Computer Science Paderborn University, Germany frankw@upb.de Abstract. The goal of every information

More information

CyberDEW Een Distributed Early Warning Systeem ten behoeve van Cyber Security

CyberDEW Een Distributed Early Warning Systeem ten behoeve van Cyber Security THALES NEDERLAND B.V. AND/OR ITS SUPPLIERS. THIS INFORMATION CARRIER CONTAINS PROPRIETARY INFORMATION WHICH SHALL NOT BE USED, REPRODUCED OR DISCLOSED TO THIRD PARTIES WITHOUT PRIOR WRITTEN AUTHORIZATION

More information

Data Mining for Knowledge Management in Technology Enhanced Learning

Data Mining for Knowledge Management in Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

Influence Discovery in Semantic Networks: An Initial Approach

Influence Discovery in Semantic Networks: An Initial Approach 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

Assuring the Cloud. Hans Bootsma Deloitte Risk Services hbootsma@deloitte.nl +31 (0)6 1098 0182

Assuring the Cloud. Hans Bootsma Deloitte Risk Services hbootsma@deloitte.nl +31 (0)6 1098 0182 Assuring the Cloud Hans Bootsma Deloitte Risk Services hbootsma@deloitte.nl +31 (0)6 1098 0182 Need for Assurance in Cloud Computing Demand Fast go to market Support innovation Lower costs Access everywhere

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Format samenvatting aanvraag. Algemeen Soort aanvraag (kruis aan wat van toepassing is):

Format samenvatting aanvraag. Algemeen Soort aanvraag (kruis aan wat van toepassing is): Format samenvatting aanvraag Algemeen Soort aanvraag (kruis aan wat van toepassing is): Naam instelling Contactpersoon/contactpersonen Contactgegevens Opleiding Naam (Nederlands en evt. Engels) Graad Inhoud

More information

COOLS COOLS. Cools is nominated for the Brains Award! www.brainseindhoven.nl/nl/top_10/&id=507. www.cools-tools.nl. Coen Danckmer Voordouw

COOLS COOLS. Cools is nominated for the Brains Award! www.brainseindhoven.nl/nl/top_10/&id=507. www.cools-tools.nl. Coen Danckmer Voordouw Name Nationality Department Email Address Website Coen Danckmer Voordouw Dutch / Nederlands Man and Activity info@danckmer.nl www.danckmer.nl Project: Image: Photographer: Other images: COOLS CoenDVoordouw

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Cluster analysis and Association analysis for the same data

Cluster analysis and Association analysis for the same data Cluster analysis and Association analysis for the same data Huaiguo Fu Telecommunications Software & Systems Group Waterford Institute of Technology Waterford, Ireland hfu@tssg.org Abstract: Both cluster

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

More information

CitationBase: A social tagging management portal for references

CitationBase: A social tagging management portal for references CitationBase: A social tagging management portal for references Martin Hofmann Department of Computer Science, University of Innsbruck, Austria m_ho@aon.at Ying Ding School of Library and Information Science,

More information

Politecnico di Torino. Porto Institutional Repository

Politecnico di Torino. Porto Institutional Repository Politecnico di Torino Porto Institutional Repository [Proceeding] NEMICO: Mining network data through cloud-based data mining techniques Original Citation: Baralis E.; Cagliero L.; Cerquitelli T.; Chiusano

More information

Vergaderen in de toekomst Wilfried Post, Anita Cremers en Wessel Kraaij

Vergaderen in de toekomst Wilfried Post, Anita Cremers en Wessel Kraaij Vergaderen in de toekomst Wilfried Post, Anita Cremers en Wessel Kraaij Probleemstelling We vergaderen veel Iedere dag wel een keer Soms tot 50% van onze tijd En dat gaat toenemen Binnen en buiten de organisatie,

More information

Chapter ML:XI. XI. Cluster Analysis

Chapter ML:XI. XI. Cluster Analysis Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Managing and Tracing the Traversal of Process Clouds with Templates, Agendas and Artifacts

Managing and Tracing the Traversal of Process Clouds with Templates, Agendas and Artifacts Managing and Tracing the Traversal of Process Clouds with Templates, Agendas and Artifacts Marian Benner, Matthias Book, Tobias Brückmann, Volker Gruhn, Thomas Richter, Sema Seyhan paluno The Ruhr Institute

More information

An Introduction to Health Informatics for a Global Information Based Society

An Introduction to Health Informatics for a Global Information Based Society An Introduction to Health Informatics for a Global Information Based Society A Course proposal for 2010 Healthcare Industry Skills Innovation Award Sponsored by the IBM Academic Initiative submitted by

More information

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,

More information

IP-NBM. Copyright Capgemini 2012. All Rights Reserved

IP-NBM. Copyright Capgemini 2012. All Rights Reserved IP-NBM 1 De bescheidenheid van een schaker 2 Maar wat betekent dat nu 3 De drie elementen richting onsterfelijkheid Genomics Artifical Intelligence (nano)robotics 4 De impact van automatisering en robotisering

More information

A Framework of Personalized Intelligent Document and Information Management System

A Framework of Personalized Intelligent Document and Information Management System A Framework of Personalized Intelligent and Information Management System Xien Fan Department of Computer Science, College of Staten Island, City University of New York, Staten Island, NY 10314, USA Fang

More information

Sample test Secretaries/administrative. Secretarial Staff Administrative Staff

Sample test Secretaries/administrative. Secretarial Staff Administrative Staff English Language Assessment Paper 3: Writing Time: 1¼ hour Secretarial Staff Administrative Staff Questions 1 17 Text 1 (questions 1 8) Assessment Paper 3 Writing Part 1 For questions 1-8, read the text

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Specification by Example (methoden, technieken en tools) Remco Snelders Product owner & Business analyst

Specification by Example (methoden, technieken en tools) Remco Snelders Product owner & Business analyst Specification by Example (methoden, technieken en tools) Remco Snelders Product owner & Business analyst Terminologie Specification by Example (SBE) Acceptance Test Driven Development (ATDD) Behaviour

More information

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD 72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology Paulo.gottgtroy@aut.ac.nz Abstract This paper is

More information

International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Self-test SQL Workshop

Self-test SQL Workshop Self-test SQL Workshop Document: e0087test.fm 07/04/2010 ABIS Training & Consulting P.O. Box 220 B-3000 Leuven Belgium TRAINING & CONSULTING INTRODUCTION TO THE SELF-TEST SQL WORKSHOP Instructions The

More information

MAYORGAME (BURGEMEESTERGAME)

MAYORGAME (BURGEMEESTERGAME) GATE Pilot Safety MAYORGAME (BURGEMEESTERGAME) Twan Boerenkamp Who is it about? Local council Beleidsteam = GBT or Regional Beleidsteam = RBT Mayor = Chairman Advisors now = Voorlichting? Official context

More information

Hoe bestuurt u de cloud?

Hoe bestuurt u de cloud? Hoe bestuurt u de cloud? Over SDDC en cloud management met VMware Viktor van den Berg Senior Consultant 1 oktober 2013 Even voorstellen 2 Viktor van den Berg Senior Consultant @ PQR Focus op Server Virtualisatie

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

More information

APPLYING CASE BASED REASONING IN AGILE SOFTWARE DEVELOPMENT

APPLYING CASE BASED REASONING IN AGILE SOFTWARE DEVELOPMENT APPLYING CASE BASED REASONING IN AGILE SOFTWARE DEVELOPMENT AIMAN TURANI Associate Prof., Faculty of computer science and Engineering, TAIBAH University, Medina, KSA E-mail: aimanturani@hotmail.com ABSTRACT

More information

2 AIMS: an Agent-based Intelligent Tool for Informational Support

2 AIMS: an Agent-based Intelligent Tool for Informational Support Aroyo, L. & Dicheva, D. (2000). Domain and user knowledge in a web-based courseware engineering course, knowledge-based software engineering. In T. Hruska, M. Hashimoto (Eds.) Joint Conference knowledge-based

More information

COMPARING MATRIX-BASED AND GRAPH-BASED REPRESENTATIONS FOR PRODUCT DESIGN

COMPARING MATRIX-BASED AND GRAPH-BASED REPRESENTATIONS FOR PRODUCT DESIGN 12 TH INTERNATIONAL DEPENDENCY AND STRUCTURE MODELLING CONFERENCE, 22 23 JULY 2010, CAMBRIDGE, UK COMPARING MATRIX-BASED AND GRAPH-BASED REPRESENTATIONS FOR PRODUCT DESIGN Andrew H Tilstra 1, Matthew I

More information

ruimtelijk ontwikkelingsbeleid

ruimtelijk ontwikkelingsbeleid 38 S a n d e r O u d e E l b e r i n k Digitale bestemmingsplannen 3D MODELLING OF TOPOGRAPHIC Geo-informatie OBJECTS en BY FUSING 2D MAPS AND LIDAR DATA ruimtelijk ontwikkelingsbeleid in Nederland INTRODUCTION

More information

Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems. Balazs Balasko

Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems. Balazs Balasko Theses of the doctoral (PhD) dissertation Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems Balazs Balasko University of Pannonia PhD School

More information

FCART: A New FCA-based System for Data Analysis and Knowledge Discovery

FCART: A New FCA-based System for Data Analysis and Knowledge Discovery FCART: A New FCA-based System for Data Analysis and Knowledge Discovery A.A. Neznanov, D.A. Ilvovsky, S.O. Kuznetsov National Research University Higher School of Economics, Pokrovskiy bd., 11, 109028,

More information

Visualization of Breast Cancer Data by SOM Component Planes

Visualization of Breast Cancer Data by SOM Component Planes International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian

More information

Selbo 2 an Environment for Creating Electronic Content in Software Engineering

Selbo 2 an Environment for Creating Electronic Content in Software Engineering BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 9, No 3 Sofia 2009 Selbo 2 an Environment for Creating Electronic Content in Software Engineering Damyan Mitev 1, Stanimir

More information

IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals

More information

LDA Based Security in Personalized Web Search

LDA Based Security in Personalized Web Search LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology

More information

The University of Jordan

The University of Jordan The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S

More information

LIACS Fundamentals. Jetty Kleijn Informatica Bachelorklas 2015-12-01

LIACS Fundamentals. Jetty Kleijn Informatica Bachelorklas 2015-12-01 LIACS Fundamentals Jetty Kleijn Informatica Bachelorklas 2015-12-01 Discover the Discover world at the Leiden world University at Leiden University Research at LIACS Two clusters Algorithms and Software

More information

How to manage Business Apps - Case for a Mobile Access Strategy -

How to manage Business Apps - Case for a Mobile Access Strategy - How to manage Business Apps - Case for a Mobile Access Strategy - Hans Heising, Product Manager Gábor Vida, Manager Software Development RAM Mobile Data 2011 Content Introduction 2 Bring your own device

More information

Cleaned Data. Recommendations

Cleaned Data. Recommendations Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110

More information

Concept Design. Gert Landheer Mark van den Brink Koen van Boerdonk

Concept Design. Gert Landheer Mark van den Brink Koen van Boerdonk Concept Design Gert Landheer Mark van den Brink Koen van Boerdonk Content Richness of Data Concept Design Fast creation of rich data which eventually can be used to create a final model Creo Product Family

More information

Co-creation & Co-creativity

Co-creation & Co-creativity Co-creation & Co-creativity Innovating together with end-users off-line and on-line Liliane.Kuiper@TNO.nl +31 651085354 Co-creation & Co-creativity 1 What are we talking about? 2 Co-creativity steps (off-line)

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

Timeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

Timeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining? Text Mining 2004-2005 Master TKI Antal van den Bosch en Walter Daelemans http://ilk.uvt.nl/~antalb/textmining/ Dinsdag, 10.45-12.30, SZ33 Timeline (1) [1 februari 2005] Introductie (WD) [15 februari 2005]

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Principles of Fund Governance BNP Paribas Investment Partners Funds (Nederland) N.V.

Principles of Fund Governance BNP Paribas Investment Partners Funds (Nederland) N.V. Principles of Fund Governance BNP Paribas Investment Partners Funds (Nederland) N.V. Versie november 2012 Inleiding Het doel van de Principles of Fund Governance (verder Principles ) is het geven van nadere

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information