Text Mining and Qualitative Analysis of an IT History Interview Collection
|
|
- Lambert Wright
- 8 years ago
- Views:
Transcription
1 Text Mining and Qualitative Analysis of an IT History Interview Collection Petri Paju 1, Eric Malmi 2, and Timo Honkela 2 1 Cultural History Department, University of Turku, Finland petpaju@utu.fi 2 Department of Information and Computer Science Aalto University School of Science and Technology, Finland eric.malmi@gmail.com, timo.honkela@tkk.fi Abstract. In this paper, we explore the possibility of applying a text mining method on a large qualitative source material concerning the history of information technology in one nation. This data was collected in the Swedish documentation project From Computing Machines to IT. We apply text mining on the interview transcripts of this Swedish documentation project. Specifically, we seek to group the interviews according to their central themes and affinities and pinpoint the most relevant interviews for specific research questions. In addition, we search for interpersonal links between the interviews. We apply a method called the self-organizing map that can be used to create a similarity diagram of the interviews. We then discuss the results in several contexts including the possible future uses of text mining in researching history. Keywords: Documentation project, IBM, IT-history, methods development, self-organizing map (SOM), Sweden, text mining. 1 Introduction Research on the history of computing has, for a considerable time, involved developing methods, especially the application of oral history methods for conducting interviews, as Thomas Misa wrote in the book History of Nordic computing 2 [1]. However, until recently, the field has shown limited interest in developing methods of its own or in using computational methods. Notwithstanding, new experiences from a couple of projects has brought attention to using the internet and other electronic means in creating sources for research. This contrasts with the tradition of collecting source material from the archives or publications but is closer to the established practices of interviewing for research purposes. One of these recent projects is the Swedish documentation project Från matematikmaskin till IT ( in English From Computing Machines to IT, which during had produced a large database concerning the Swedish history of information technology. This collection of information includes among other things (more than) 160 interviews and 50 organized group discussions, or witness seminars, almost all available as transcripts in the Internet [2]. They are in Swedish language, and therefore understandable in most of the Nordic area. Nevertheless, because of its J. Impagliazzo, P. Lundin, and B. Wangler (Eds.): HiNC3, IFIP AICT 350, pp , IFIP International Federation for Information Processing 2011
2 434 P. Paju, E. Malmi, and T. Honkela formidable size, the data is difficult to handle as a whole and especially so to a non- Swede or non-expert who is unfamiliar with the many details of the subjects, people and (national) topics. In this paper, we explore the possibility of applying text mining on the interview transcripts of the Swedish documentation project. Specifically, we seek to group the interviews according to their central themes and affinities and pinpoint the most relevant interviews for specific research questions. In addition, we have searched for links between the interviews and we use those links to outline interpersonal connections of the group of interviewees. However, our study includes seventy-four interviews so it does not cover all of the interviews conducted in the project. The choice of the interviews was based on the selection of transcribed texts at the beginning of this study. We apply a method called the self-organizing map [3] that can be used to create a similarity diagram of the interviews [4]. In the following, we shortly introduce the method, present the analysis results and discuss them in the contexts of the history of computing, the Nordic countries and as a methodological tool for future challenges with the masses of historical electronic sources. 2 Methods The self-organizing map (SOM) algorithm [3] has been developed for the analysis and visualization of large masses of complex data. It projects data non-linearly on to a two-dimensional plane in such a way that the original structure of the data is retained as well as possible. The outcome of the SOM analysis is a map in which entities, such as people, words, sentences, or documents, are positioned on the map according to similarity with respect to some property. The SOM serves several analytical functions. First, it provides a mapping from a high dimensional space into a lowdimensional space, thus providing a suitable means for analysis and visualization of complex data. Second, the SOM reveals topological structure of the data. The topological distance between two points in the map is proportional to the distance between the points in the original input space. The SOM algorithm originally grew out of early neural network models, especially models of associative memory and adaptive learning [5]. The underlying motivation was to explain the spatial organization of the brain s functions, as observed especially in the cerebral cortex. Nonetheless, the SOM was not the first step in that direction. The spatially ordered line detectors of von der Malsburg [6] and the neural field model of Amari [7] preceded the development of the self-organizing map. However, the self-organizing power of these early models was rather weak. The crucial invention of Kohonen was to introduce a system model that is composed of at least two interacting subsystems of different nature. One of these subsystems is a competitive neural network that implements the winner-take-all function, but there is also another subsystem that is controlled by the neural network and which modifies the local synaptic plasticity of the neurons in learning [8]. The learning is restricted spatially to the local neighborhood of the most active neurons. Only by means of the separation of the neural signal transfer and the plasticity control has it become possible to implement an effective and robust self-organizing system [9].
3 Text Mining and Qualitative Analysis of an IT History Interview Collection 435 From a present-day point of view, the SOM, as an unsupervised statistical machine learning method, compares to classical unsupervised quantitative research methods such as multidimensional scaling or clustering. The SOM has been extensively used to analyze numerical data in many areas, including various branches of industry, medicine, and economics and the number of references to SOM-based research articles is currently over eight thousand [10 12]. This popularity of the SOM in a large area of scientific disciplines has raised Kohonen to be (one of) the most cited Finnish scientists in any field. The use of the SOM has also been extended into the analysis of text data (see, e.g., [4, 13]). It can be used for the study of large amounts of material such as s, web sites, interview transcripts, etc. Janasik et al. have shown that the SOM-based text mining process improves the quality of the inferences drawn by researchers doing qualitative research [14]. The SOM specifies a holistic conceptual space. The meaning of some item in an analysis is not based on a predefined definition but is the emergent result of a number of encounters in which the item is used in some context. Moreover, the emergent prototypes on the map are not isolated instances (as in many forms of cluster analysis), but they influence each other in the adaptive formation process [14]. In its emphasis on the grounded nature of knowledge, the SOM approach aligns well with the central epistemological presuppositions of traditional and revised grounded theory (see e.g. [15]). In grounded theory, the concepts and conclusions are drawn from the research material rather than determined beforehand as hypotheses to be tested. This is also the approach taken commonly in modern historical research. Fig. 1. The basic processes in creating map of interviews. In our analysis, the interviewees are automatically positioned on the SOM according to the central themes discussed in their interviews (see Fig. 1 as an illustration of the overall process). The themes are automatically extracted from the texts using language-independent keyphrase extraction (Likey) method [16]. Likey utilizes a reference corpus to highlight keyphrases that occur frequently in an interview compared with the reference corpus. Likey approach to extracting keyphrases is based on observations about the statistical properties of natural language. We use the distribution of phrase
4 436 P. Paju, E. Malmi, and T. Honkela frequencies in texts to determine the significance of a phrase in a document by comparing it to the corresponding frequency in a baseline reference. In a related method introduced by Damerau, terms are ranked according to the likelihood ratio and the top m terms are used as index terms [17]. Likey produces keyphrases using relative ranks of n-gram frequencies. It is a simple language-independent method: the only language-specific component is a reference corpus in the corresponding language. Likey does not utilize any commonly used language-dependent (pre)processing such as stemming, stop word lists, part-of-speech tagging or syntactic parsing [16]. The availability of the software implementing the method may be requested from one of the authors (TH). In the present analysis, we use a concatenation of all the interviews as the reference corpus for the Likey method. The effect of using Likey is that we are reasonably independent of researchers biases. On the other hand, the choice of the reference corpus is a subjective matter. Choosing the whole corpus of interviews could be a rather neutral alternative but it has some specific effects. For instance, it does not select the acronym IBM as a keyword by the method as it is common in this corpus in general. If some other corpus, such as Europarl (the extracted proceedings of the EU Parliament), were used a word like IBM would end up in the list of keyphrases. When the keyphrases have been selected (this can also be conducted manually instead of using the Likey method), a term-document matrix can be formed. In the matrix, each row corresponds to a document (interview), and each term (keyphrase) is represented by a column. The cells in the matrix then contain the number of occurrences of each term in each document. We may normalize the matrix by dividing the number in each cell by the row sum. This operation makes sure that documents of different length are considered in the analysis in a balanced manner. This matrix can then be given as input to some statistical software package that contains the SOM algorithm. In our case, we have used Matlab software, specifically the SOM Toolbox developed for Matlab (available for free at One can find more detailed information on using SOM-based text mining for qualitative research in [14]. In addition to the central themes, we also extract the references between the interviewees. Specifically, a directed link from interviewee A to interviewee B is drawn if A mentions B s surname in the interview. Thus, we get a social network of the interviewees drawn from top of the SOM. 3 Results The resulting SOM is presented in Fig. 2. The interviewees that have had common themes discussed in their interviews are located close to each other. In a color picture, the lighter the area between two people, the more in common their interviews have. To visualize different perspectives of this SOM, a terminology distribution diagram for each central theme is created (in short: theme map). The terminology distribution diagrams of the most popular themes are presented in Fig. 3 and Fig. 4. A dark area on a terminology distribution diagram implicates that on the SOM the interviewees within the same area have discussed the theme indicated in the title of
5 Text Mining and Qualitative Analysis of an IT History Interview Collection 437 the sub-diagram. For example, from the diagram on numerisk analysis (numerical analysis) one can see that the interviewees on the bottom-left corner of the SOM have mentioned numerical analysis in their interviews. Furthermore, some of these interviewees have also been discussing human-computer interaction (Fig. 4: människa-datorinteraktion ). Fig. 2. A self-organizing map of interviewees based on interviews produced in the Swedish documentation project Från matematikmaskin till IT. The interviewees that have had common themes discussed in their interviews are located close to each other. Also references between the interviewees are indicated: a directed link from a person to another is drawn if the first mentions the surname of the other in his or her interview.
6 438 P. Paju, E. Malmi, and T. Honkela Fig. 3. A terminology distribution diagram of twenty-four themes commonly discussed in the interviews. Areas with dark color in each sub-diagram indicate that the theme has been handled in the interviews that have been mapped in the corresponding area on the SOM. For instance, the lower right corner of the sub-diagram billerudprojektet is dark which coincides with the corner where the surnames Wedell and Frisk are located (see Fig. 2). To learn how to interpret the theme maps imagining them on top of the SOM map (Fig. 2), see Fig. 5 (on the Billerud project). Comparing the theme area of the human-computer interaction with the SOM map (Fig. 2), we can see that the interviewees (Lars) Kjelldahl and (Anita) Kollerbaur and perhaps their neighboring interviewees discussed this topic. After looking at other theme area maps, we notice these two have also talked about Simula (the programming language), vektorgrafik (vector graphics), and to a lesser extent färgbildsskrivaren (color printer) as well as the already mentioned numerical analysis. The terminology distribution diagrams thus allow us to find the interviewees who have mentioned certain themes. It has to be emphasized that each diagram illuminates a particular aspect of the same map. Thus, a particular area on each diagram, for instance the upper right corner, always refers to the same items, in this case particular persons and their interviews. The theme maps can also be used for finding correlations between the themes. There is, e.g., an intuitive connection between the planes patienterna (patients) and medicinsk information (medical information). In addition, some less obvious connections can also be found, such as:
7 Text Mining and Qualitative Analysis of an IT History Interview Collection 439 Fig. 4. A terminology distribution diagram of another seventeen themes commonly discussed in the interviews (see the caption of Fig. 3 for more details) försäkringskassan (social insurance office) kommunförbundet (association of local governments) and ekonomisystem (accounting system) datacentralerna (computer centers). A total of 188 interpersonal references were found from the interviews. These connections are visualized with arrows (ties) and they include 35 mutual ties. Majority of the ties are short suggesting that two interviewees, one of whom has mentioned the other, are likely to have discussed similar topics as well. In other words, the structure of the map based on the contents of the interviews coincides well with the structure of the social network. From yet another point of view, one can say that people who have similar interests also tend to know each other. If the same number, i.e. 188, interpersonal references were created randomly between the interviewees, the average length of the ties would be considerably longer than in this real case, and the network diagram would appear much more complex. The social network also brings out the interviewees with many references. Three interviewees with the most references are Börje Langefors (13 references), Björn Tell (7), and Gunnar Wedell (7). 4 Discussion and Conclusions In this section, the above results will be discussed in the contexts of the history of computing, especially in the Nordic countries and as a methodological tool for future challenges with the masses of historical electronic sources.
8 440 P. Paju, E. Malmi, and T. Honkela Fig. 5. An enlarged terminology distribution diagram or theme map shown on top of the SOM map indicates which interviewees talked about the Billerud project. They were, from right to left, (Karl Johan) Åström, (Torsten) Bohlin, (Tage) Frisk, and (Ingemar) Ringström. As a test case towards qualitative analysis, we have utilized the SOM maps especially in studying IBM in Sweden. With the help of, for instance, keywords provided by the editors of the interviews, we identified the bottom right hand corner as having a high concentration of IBM related people and topics. From the theme area maps we see that billerudprojektet (Billerud project), systemprogrammerare (systems programming), processreglering (process control), and some other themes come up in that section of the SOM. A closer look at the interviews indicated that this corner has gathered interviews about the IBM Nordic Laboratory, established in 1960 near Stockholm and clearly one focus point of the documentation project. Further, reading the close-by interviews we can quickly find other related interviews, such as
9 Text Mining and Qualitative Analysis of an IT History Interview Collection 441 the ones dealing with the ALGOL compiler project of the IBM laboratory. Especially Bengt Gällmo and Birgitta Frejhagen talked about the ALGOL project. Another job of the IBM laboratory was a process control project in Billerud Company, depicted in one of the theme maps (see also Fig. 5). We thus get a good chance of finding most of the relevant information on such details promptly. However, the big picture of, for instance IBM s influence in Sweden or effects on the interviewees careers, is much more complex and requires close reading of several of the interviews in the SOM s IBM corner. Since IBM was originally not chosen as a possible SOM theme because of its very high frequency in the data, we wanted to see how common it really is in this material. We decided to compare the words IBM and Ericsson to see their spread in the SOM data. Fig. 6. Word frequencies of IBM (light) and Ericsson (dark) in the interviews
10 442 P. Paju, E. Malmi, and T. Honkela This map (see Fig. 6) reveals that IBM, unsurprisingly perhaps, is very commonly referred to in the interviews of the documentation project. The mapping also indicates an Ericsson concentration (OEhvik, Hoff) that could be interesting. We hope to have shown that the SOM method can contribute to analyzing data also in history research and furthermore, can be useful for a special field such as history of computing in the Nordic countries. Especially when the Swedish language is applied, as in this documentation project under analysis, the data is understandable in most of the Nordic area. The SOM makes the data of even such a formidable size, and rich with countless details of subjects, people and (nationally specific) topics, accessible for a non-swede and/or for a non-expert user. In the future, we suggest using the SOM method for historical data (also in English) and as a new way of producing user guidelines for large information archives such as in the field of history of computing and those in the Charles Babbage Institute at the University of Minnesota. Moreover, with ever-increasing amount of electronic source materials becoming objects of historical scrutiny in the future, we foresee many uses for text mining with the SOM method in organizing the analysis of such electronic source collections. Many historians oriented towards qualitative research may find it challenging to use a method such as terminology extraction or the SOM, with which they potentially do not have any prior experience. For the moment, before easy-to-use implementations of the necessary software are available, we recommend that these kinds of studies be conducted in collaborative and interdisciplinary contexts because this would ensure that various aspects of the necessary methodological expertise are available. This is particularly true for automatic terminology extraction and some advanced uses of the SOM [14]. Acknowledgments. We warmly thank Prof. Reijo (Shosta) Sulonen who with his vision brought the authors of this paper together. His discussions with Prof. Yrjö Neuvo originally generated the basic ideas behind this study. We are also grateful to Academician Teuvo Kohonen whose work, including the development of the self-organizing map algorithm, has been foundational to a whole research area. References 1. Misa, T.J.: Organizing the History of Computing: Lessons Learned at the Charles Babbage Institute. In: Impagliazzo, J., Järvi, T., Paju, P. (eds.) HiNC2. IFIP AICT, vol. 303, pp Springer, Heidelberg (2009) 2. Lundin. P.: Documenting the Use of Computers in Swedish Society between 1950 and 1980: Final Report on the Project From Computing Machines to IT. Working Papers from the Division of History of Science and Technology, TRITA/HST 2009/1, Stockholm (2009) 3. Kohonen, T.: Self-Organizing Maps, 3rd edn. Springer, Berlin (2001) 4. Honkela, T., Kaski, S., Lagus, K., Kohonen, T.: WEBSOM Self-organizing maps of document collections. In: Proceedings of WSOM 1997, Workshop on self-organizing Maps, pp Helsinki University of Technology, Espoo (1997) 5. Kohonen, T.: Self-Organization and Associative Memory. Springer, Berlin (1984)
11 Text Mining and Qualitative Analysis of an IT History Interview Collection von der Malsburg, C.: Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14, (1973) 7. Amari, S.: Topographic organization of nerve fields. Bulletin of Mathematical Biology 42, (1980) 8. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, (1982) 9. Kohonen, T., Honkela, T.: Kohonen network. Scholarpedia 2(1), 1568 (2007) 10. Kaski, S., Kangas, J., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: Neural Computing Surveys 1, (1998) 11. Oja, M., Kaski, S., Kohonen, T.: Bibliography of Self-Organizing Map (SOM) Papers: Addendum. Neural Computing Surveys 3, (2003) 12. Pöllä, M., Honkela, T., Kohonen, T.: Bibliography of Self-Organizing Map (SOM) Papers: Addendum. Technical Report TKK-ICS-R23, Helsinki University of Technology (2009) 13. Lagus, K., Kaski, S., Kohonen, T.: Mining massive document collections by the WEBSOM method. Information Sciences 163(1-3), (2004) 14. Janasik, N., Honkela, T., Bruun, H.: Text mining in qualitative research: Application of an unsupervised learning method. Organizational Research Methods 12(3), (2009) 15. Castellani, B., Castellani, J., Spray, S.L.: Grounded neural networking: Modeling complex quantitative data. Symbolic Interaction 26(4), (2003) 16. Paukkeri, M.S., Nieminen, I.T., Pöllä, M., Honkela, T.: A Language-Independent Approach to Keyphrase Extraction and Evaluation. In: Scott, D., Uszkoreit, H. (eds.) COLING 2008, 22nd International Conference on Computational Linguistics, Posters Proceedings, Manchester, UK, August 18-22, pp (2008) 17. Damerau, F.: Generating and evaluating domain-oriented multi-word terms from text. Information Processing and Management 29(4), (1993)
ENERGY RESEARCH at the University of Oulu. 1 Introduction Air pollutants such as SO 2
ENERGY RESEARCH at the University of Oulu 129 Use of self-organizing maps in studying ordinary air emission measurements of a power plant Kimmo Leppäkoski* and Laura Lohiniva University of Oulu, Department
More informationVisualization of Breast Cancer Data by SOM Component Planes
International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian
More informationUsing Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps
Technical Report OeFAI-TR-2002-29, extended version published in Proceedings of the International Conference on Artificial Neural Networks, Springer Lecture Notes in Computer Science, Madrid, Spain, 2002.
More informationData Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it
More informationLanguage Technology based on Big Data: Current Situation and Future Perspectives
Language Technology based on Big Data: Current Situation and Future Perspectives Timo Honkela 30 October 2014 Centre for Preservation and Digitisation Department of Modern Languages KITES-symposium Introductory
More informationVisualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
More informationComparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
More informationFramework for Modeling Partial Conceptual Autonomy of Adaptive and Communicating Agents
Framework for Modeling Partial Conceptual Autonomy of Adaptive and Communicating Agents Timo Honkela (timo.honkela@hut.fi) Laboratory of Computer and Information Science Helsinki University of Technology
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationContent Based Analysis of Email Databases Using Self-Organizing Maps
A. Nürnberger and M. Detyniecki, "Content Based Analysis of Email Databases Using Self-Organizing Maps," Proceedings of the European Symposium on Intelligent Technologies, Hybrid Systems and their implementation
More informationVisualizing an Auto-Generated Topic Map
Visualizing an Auto-Generated Topic Map Nadine Amende 1, Stefan Groschupf 2 1 University Halle-Wittenberg, information manegement technology na@media-style.com 2 media style labs Halle Germany sg@media-style.com
More informationSelf Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationMonitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations
Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Christian W. Frey 2012 Monitoring of Complex Industrial Processes based on Self-Organizing Maps and
More informationUSING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).
More informationInformation Visualization with Self-Organizing Maps
Information Visualization with Self-Organizing Maps Jing Li Abstract: The Self-Organizing Map (SOM) is an unsupervised neural network algorithm that projects highdimensional data onto a two-dimensional
More informationSelf-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data
Self-Organizing g Maps (SOM) Ke Chen Outline Introduction ti Biological Motivation Kohonen SOM Learning Algorithm Visualization Method Examples Relevant Issues Conclusions 2 Introduction Self-organizing
More informationCITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
More informationMethodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph
More informationEFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
More informationA Computational Framework for Exploratory Data Analysis
A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.
More informationAnalecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
More informationCustomer Data Mining and Visualization by Generative Topographic Mapping Methods
Customer Data Mining and Visualization by Generative Topographic Mapping Methods Jinsan Yang and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National
More informationSelf Organizing Maps: Fundamentals
Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen
More informationINTERACTIVE DATA EXPLORATION USING MDS MAPPING
INTERACTIVE DATA EXPLORATION USING MDS MAPPING Antoine Naud and Włodzisław Duch 1 Department of Computer Methods Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract: Interactive
More informationTopic Maps Visualization
Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics
More informationReconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets
Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Aaditya Prakash, Infosys Limited aaadityaprakash@gmail.com Abstract--Self-Organizing
More informationData topology visualization for the Self-Organizing Map
Data topology visualization for the Self-Organizing Map Kadim Taşdemir and Erzsébet Merényi Rice University - Electrical & Computer Engineering 6100 Main Street, Houston, TX, 77005 - USA Abstract. The
More informationUSING SELF-ORGANISING MAPS FOR ANOMALOUS BEHAVIOUR DETECTION IN A COMPUTER FORENSIC INVESTIGATION
USING SELF-ORGANISING MAPS FOR ANOMALOUS BEHAVIOUR DETECTION IN A COMPUTER FORENSIC INVESTIGATION B.K.L. Fei, J.H.P. Eloff, M.S. Olivier, H.M. Tillwick and H.S. Venter Information and Computer Security
More information9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08
9. Text & Documents Visualizing and Searching Documents Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08 Slide 1 / 37 Outline Characteristics of text data Detecting patterns SeeSoft
More informationA Discussion on Visual Interactive Data Exploration using Self-Organizing Maps
A Discussion on Visual Interactive Data Exploration using Self-Organizing Maps Julia Moehrmann 1, Andre Burkovski 1, Evgeny Baranovskiy 2, Geoffrey-Alexeij Heinze 2, Andrej Rapoport 2, and Gunther Heidemann
More information2002 IEEE. Reprinted with permission.
Laiho J., Kylväjä M. and Höglund A., 2002, Utilization of Advanced Analysis Methods in UMTS Networks, Proceedings of the 55th IEEE Vehicular Technology Conference ( Spring), vol. 2, pp. 726-730. 2002 IEEE.
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationA Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationInformation Visualization WS 2013/14 11 Visual Analytics
1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and
More informationSpecific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
More informationFrom IP port numbers to ADSL customer segmentation
From IP port numbers to ADSL customer segmentation F. Clérot France Télécom R&D Overview ADSL customer segmentation: why? how? Technical approach and synopsis Data pre-processing The many faces of a Kohonen
More informationA Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad
More informationPRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
More informationVisual analysis of self-organizing maps
488 Nonlinear Analysis: Modelling and Control, 2011, Vol. 16, No. 4, 488 504 Visual analysis of self-organizing maps Pavel Stefanovič, Olga Kurasova Institute of Mathematics and Informatics, Vilnius University
More informationAdapting an Enterprise Architecture for Business Intelligence
Adapting an Enterprise Architecture for Business Intelligence Pascal von Bergen 1, Knut Hinkelmann 2, Hans Friedrich Witschel 2 1 IT-Logix, Schwarzenburgstr. 11, CH-3007 Bern 2 Fachhochschule Nordwestschweiz,
More informationIntelligent Analysis of User Interactions in a Collaborative Software Engineering Context
Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context Alejandro Corbellini 1,2, Silvia Schiaffino 1,2, Daniela Godoy 1,2 1 ISISTAN Research Institute, UNICEN University,
More informationA Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
More informationModels of Cortical Maps II
CN510: Principles and Methods of Cognitive and Neural Modeling Models of Cortical Maps II Lecture 19 Instructor: Anatoli Gorchetchnikov dy dt The Network of Grossberg (1976) Ay B y f (
More informationHow To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
More informationAnalysis of Big Data Survey 2015 on Skills, Training and Capacity Building
Analysis of Big Data Survey 2015 on Skills, Training and Capacity Building D R A F T Version 1.0 12 Oct 2015 By UN Global Working Group on Big Data for Official Statistics Task Team on Skills, Training
More informationUtilizing spatial information systems for non-spatial-data analysis
Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 51, No. 3 (2001) 563 571 Utilizing spatial information systems for non-spatial-data analysis
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationCommunication Through Community? The Effects of Structural Changes on High School Communication
Communication Through Community? The Effects of Structural Changes on High School Communication Michael J. Weiss Consortium for Policy Research in Education University of Pennsylvania weissmj@dolphin.upenn.edu
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationExternally Growing Self-Organizing Maps and Its Application to E-mail Database Visualization and Exploration
Externally Growing Self-Organizing Maps and Its Application to E-mail Database Visualization and Exploration Andreas Nürnberger University of Magdeburg, FIN IWS - IR Group Universitätsplatz 2 39106 Magdeburg,
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationVisualising Class Distribution on Self-Organising Maps
Visualising Class Distribution on Self-Organising Maps Rudolf Mayer, Taha Abdel Aziz, and Andreas Rauber Institute for Software Technology and Interactive Systems Vienna University of Technology Favoritenstrasse
More informationCross-Cultural Communication Training for Students in Multidisciplinary Research Area of Biomedical Engineering
Cross-Cultural Communication Training for Students in Multidisciplinary Research Area of Biomedical Engineering Shigehiro HASHIMOTO Biomedical Engineering, Department of Mechanical Engineering, Kogakuin
More informationApplication of self-organizing maps to clustering of high-frequency financial data
Application of self-organizing maps to clustering of high-frequency financial data Adam Blazejewski Richard Coggins School of Electrical and Information Engineering University of Sydney, Sydney, NSW 2,
More informationUSING SELF-ORGANIZED MAPS AND ANALYTIC HIERARCHY PROCESS FOR EVALUATING CUSTOMER PREFERENCES IN NETBOOK DESIGNS
International Journal of Electronic Business Management, Vol. 7, No. 4, pp. 297-303 (2009) 297 USING SELF-ORGANIZED MAPS AND ANALYTIC HIERARCHY PROCESS FOR EVALUATING CUSTOMER PREFERENCES IN NETBOOK DESIGNS
More informationBayesian probability theory
Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations
More informationIcon and Geometric Data Visualization with a Self-Organizing Map Grid
Icon and Geometric Data Visualization with a Self-Organizing Map Grid Alessandra Marli M. Morais 1, Marcos Gonçalves Quiles 2, and Rafael D. C. Santos 1 1 National Institute for Space Research Av dos Astronautas.
More informationA STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
More informationAnalysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps
WSEAS Transactions on Systems Issue 3, Volume 2, July 2003, ISSN 1109-2777 629 Analysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps Claudia L. Fernandez,
More informationORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationIn this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
More informationNeural network software tool development: exploring programming language options
INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira aao@fe.up.pt Supervisor: Professor Joaquim Marques de Sá June 2006
More informationOn the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data
On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data Jorge M. L. Gorricha and Victor J. A. S. Lobo CINAV-Naval Research Center, Portuguese Naval Academy,
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationANALYSIS OF MOBILE RADIO ACCESS NETWORK USING THE SELF-ORGANIZING MAP
ANALYSIS OF MOBILE RADIO ACCESS NETWORK USING THE SELF-ORGANIZING MAP Kimmo Raivio, Olli Simula, Jaana Laiho and Pasi Lehtimäki Helsinki University of Technology Laboratory of Computer and Information
More informationData Mining on Sequences with recursive Self-Organizing Maps
Data Mining on Sequences with recursive Self-Organizing Maps Sebastian Blohm Universität Osnabrück sebastian@blomega.de Bachelor's Thesis International Bachelor Program in Cognitive Science, Universität
More informationBio-inspired mechanisms for efficient and adaptive network security
Bio-inspired mechanisms for efficient and adaptive network security Falko Dressler Computer Networks and Communication Systems University of Erlangen-Nuremberg, Germany dressler@informatik.uni-erlangen.de
More informationLocal Anomaly Detection for Network System Log Monitoring
Local Anomaly Detection for Network System Log Monitoring Pekka Kumpulainen Kimmo Hätönen Tampere University of Technology Nokia Siemens Networks pekka.kumpulainen@tut.fi kimmo.hatonen@nsn.com Abstract
More informationQoS Mapping of VoIP Communication using Self-Organizing Neural Network
QoS Mapping of VoIP Communication using Self-Organizing Neural Network Masao MASUGI NTT Network Service System Laboratories, NTT Corporation -9- Midori-cho, Musashino-shi, Tokyo 80-88, Japan E-mail: masugi.masao@lab.ntt.co.jp
More informationThe Research of Data Mining Based on Neural Networks
2011 International Conference on Computer Science and Information Technology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V51.09 The Research of Data Mining
More informationVisualization of Topology Representing Networks
Visualization of Topology Representing Networks Agnes Vathy-Fogarassy 1, Agnes Werner-Stark 1, Balazs Gal 1 and Janos Abonyi 2 1 University of Pannonia, Department of Mathematics and Computing, P.O.Box
More informationNeural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
More information14TH INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN 19-21 AUGUST 2003
14TH INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN 19-21 AUGUST 2003 A CASE STUDY OF THE IMPACTS OF PRELIMINARY DESIGN DATA EXCHANGE ON NETWORKED PRODUCT DEVELOPMENT PROJECT CONTROLLABILITY Jukka Borgman,
More informationViSOM A Novel Method for Multivariate Data Projection and Structure Visualization
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 1, JANUARY 2002 237 ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization Hujun Yin Abstract When used for visualization of
More informationA user friendly toolbox for exploratory data analysis of underwater sound
061215-064 1 A user friendly toolbox for exploratory data analysis of underwater sound Fernando J. Pires 1 and Victor Lobo 2, Member, IEEE Abstract The underwater acoustics research group at the Portuguese
More informationA STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
More informationTerm extraction for user profiling: evaluation by the user
Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,
More informationVisual Cluster Analysis of Trajectory Data With Interactive Kohonen Maps
Visual Cluster Analysis of Trajectory Data With Interactive Kohonen Maps Tobias Schreck Interactive Graphics Systems Group TU Darmstadt, Germany Jürgen Bernard Interactive Graphics Systems Group TU Darmstadt,
More informationLVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico licinia.monteiro@tagus.ist.utl.pt I. Executive Summary In this Resume we describe a new functionality implemented
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationCredit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www. irphouse.com Credit Card Fraud
More informationTHREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
More informationDATA VERIFICATION IN ETL PROCESSES
KNOWLEDGE ENGINEERING: PRINCIPLES AND TECHNIQUES Proceedings of the International Conference on Knowledge Engineering, Principles and Techniques, KEPT2007 Cluj-Napoca (Romania), June 6 8, 2007, pp. 282
More informationSimultaneous Gamma Correction and Registration in the Frequency Domain
Simultaneous Gamma Correction and Registration in the Frequency Domain Alexander Wong a28wong@uwaterloo.ca William Bishop wdbishop@uwaterloo.ca Department of Electrical and Computer Engineering University
More informationA Review of Data Clustering Approaches
A Review of Data Clustering Approaches Vaishali Aggarwal, Anil Kumar Ahlawat, B.N Panday Abstract- Fast retrieval of the relevant information from the databases has always been a significant issue. Different
More informationURL: http://www.swedsoft.se/wp-content/uploads/2011/09/stew2011_submission_17.pdf!
This is an author-generated version. Bibliographic information: The final publication is available at swedsoft.se URL: http://www.swedsoft.se/wp-content/uploads/2011/09/stew2011_submission_17.pdf Vaibhavi
More informationVISUALIZATION OF GEOSPATIAL DATA BY COMPONENT PLANES AND U-MATRIX
VISUALIZATION OF GEOSPATIAL DATA BY COMPONENT PLANES AND U-MATRIX Marcos Aurélio Santos da Silva 1, Antônio Miguel Vieira Monteiro 2 and José Simeão Medeiros 2 1 Embrapa Tabuleiros Costeiros - Laboratory
More informationAppendix 4 Simulation software for neuronal network models
Appendix 4 Simulation software for neuronal network models D.1 Introduction This Appendix describes the Matlab software that has been made available with Cerebral Cortex: Principles of Operation (Rolls
More informationData Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
More informationTechnical Club: New Vision of Computing
1 Technical Club: New Vision of Computing Core Discipline : Mentor : Computer Science Engineering Dr. Shripal Vijayvergia, Associate Professor, CSE Co-Mentor : 1. Mr. Subhash Gupta, Assistant Professor,
More informationUsing Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
More informationSpatial Data Analysis
14 Spatial Data Analysis OVERVIEW This chapter is the first in a set of three dealing with geographic analysis and modeling methods. The chapter begins with a review of the relevant terms, and an outlines
More informationMachine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
More informationPersonalized Information Management for Web Intelligence
Personalized Information Management for Web Intelligence Ah-Hwee Tan Kent Ridge Digital Labs 21 Heng Mui Keng Terrace, Singapore 119613 Email: ahhwee@krdl.org.sg Abstract Web intelligence can be defined
More informationANN Based Fault Classifier and Fault Locator for Double Circuit Transmission Line
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Special Issue-2, April 2016 E-ISSN: 2347-2693 ANN Based Fault Classifier and Fault Locator for Double Circuit
More informationcategory field F 2 feature field feature field ART b ART a F b 1 F 1 a - x a x b b w
FOCI: A Personalized Web Intelligence System Ah-Hwee Tan, Hwee-Leng Ong, Hong Pan, Jamie Ng, Qiu-Xiang Li Kent Ridge Digital Labs 21 Heng Mui Keng Terrace, Singapore 119613, email: fahhwee, hweeleng, panhong,
More information