Assessing Italian Research in Statistics: Interdisciplinary or Multidisciplinary?
Sandra De Francisci Epifani*, Maria Gabriella Grassia**, Nicole Triunfo**, Emma Zavarrone*

* Università IULM, Via Carlo Bo 1, Milano
** Dipartimento di Matematica e Statistica, Università degli Studi Federico II, Via Cintia, Napoli

Abstract

In this paper, we assess the cross-disciplinarity of research produced by Italian Academic Statisticians (IAS) by combining text mining and bibliometric techniques. Textual and bibliometric approaches each have advantages and disadvantages, and they provide different views of the same interlinked corpus of scientific publications. In addition to the textual information in these documents, citations form huge networks that yield additional information. We incorporate both points of view and show how to improve on existing text-based and bibliometric methods. In particular, we propose a hybrid clustering procedure based on Fisher's inverse chi-square method as the preferred method for integrating textual content and citation information. Given the clustered papers, it is possible to evaluate ISI subject categories (SCs) as descriptive labels for statistical documents, and to address the interdisciplinarity of individual researchers.

Keywords: Bibliometrics, Text mining, Social network analysis, Hybrid clustering

1 Introduction

The increasing dissemination of scientific and technological publications via websites, and their availability in large-scale bibliographic databases, has opened massive opportunities for improving the classification and bibliometric cartography of science and technology. This metascience benefits from the continuous growth of computing power and the development of new algorithms. The purpose of mapping, charting, or cartography of scientific fields is to understand the structure and evolution of different research areas and their links to other fields, based on scientific publications. Research fields can be profiled along different dimensions, e.g. in terms of prolific authors, major concepts, important publications and journals, institutions, regions, and countries.

Knowledge about the amount of activity in various fields, and about new, emerging, and converging fields, is important to organizations, research institutions, and nations. Quantitative information can be used to evaluate research performance, interdisciplinarity, collaboration, and internationalization, and to support innovation management and science and technology policies (for example, which fields should be supported through funding?). Such policies are crucial to the competitive position of universities. We focus on cross-disciplinarity within the scientific research areas of Italian universities, using clustering algorithms and techniques from bibliometrics and text mining. The multidisciplinary context of statistics affords an excellent opportunity to examine the methods used to study interdisciplinarity and integration.

2 Background

Research that occurs at the intersection between disciplines is thought to lead to great advances in science (Porter and Rafols, 2009). Interdisciplinary research should therefore be supported and encouraged to address new statistical challenges. A cynical view of this problem is eloquently stated by Brewer (1999): "The world has problems, but universities have departments." The term interdisciplinary tends to be tacitly understood by researchers, without a shared definition. We adopt the definition suggested by Porter et al. (2007), taken from the National Academies (2005): interdisciplinary research requires an integration of concepts, theories, techniques, and/or data from two or more bodies of specialized knowledge. Multidisciplinary research may incorporate elements of other bodies of specialized knowledge, but without an interdisciplinary synthesis that amounts to more than its individual parts (Wagner et al., 2011). Analysis of cross-disciplinarity improves on traditional indicators for assessing and quantifying interdisciplinary research (Morillo et al., 2001) (Fig. 1).

Fig. 1: Interdisciplinary and multidisciplinary

Disciplinary diversity indicators describe the heterogeneity of a bibliometric set viewed from predefined categories, i.e. using a top-down approach that locates the set on the global map of science. Network coherence indicators measure the intensity of similarity relations within a bibliometric set, i.e. using a bottom-up approach that reveals the structural consistency of the publication network (Rafols and Meyer, 2010). Rather than exploring large-scale publication trends top-down, we need a large amount of data representing the research track of each statistician, analyzed bottom-up. We propose to measure the interdisciplinarity of one or more individual researchers versed in statistics. An unsupervised approach is therefore well suited, since such methods can find patterns in data without prior knowledge of its structure.

The text world and the graph world offer substantially different views of a collection of interlinked publications, such as the World Wide Web or bibliographic databases of written scientific communications. Beyond their textual information, the citations contained in documents form large networks that yield additional information. To group publications into clusters, we consider both complementary approaches. In an integrated, or hybrid, analysis we seek to improve existing text-based and graph-analytic (or bibliometric) methods by deeply merging textual content with the structure of the citation graph.
These documents contain textual information that can be mined for knowledge using text mining techniques. Moreover, each document refers to other documents that are related to it in some way. Most scientific papers cite the previous research on which they are based or which is considered relevant to the subject. These citations are collected in the bibliography of a publication. Although works are cited for various reasons, citations usually imply endorsement or recommendation of previous work. All citations among publications, or hyperlinks among Web pages, constitute extremely large networks, of which the World Wide Web is the biggest example. Unlike the Web, where each page can link to any other page, a citation network (or literature network) is similar to a directed acyclic graph (DAG). Both citations and hyperlinks have a direction (they point from one entity to another), but citations are not reciprocal and no directed cycles occur in the citation graph, because a scientific paper usually cites only documents that have already been published.

Both textual and graph-based approaches can be applied to the same dataset. For example, the similarity between documents or groups of documents can be described using different methods, and the dynamics of evolving databases can also be observed. We combine both viewpoints and claim that, jointly, they improve on existing text-based and graph-analytic (bibliometric) methods for mapping science and statistics. Indeed, textual information can reveal similarities that are invisible to bibliometric techniques. Conversely, when relying on text alone, true document similarity can be obscured by differences in vocabulary use, and spurious similarities can be introduced by textual pre-processing, by polysemous words (words with several meanings), or by words with little semantic value.

The widely used method of co-citation clustering was introduced independently by Small (1973, 1978) and Marshakova (1973). Cross-citation-based cluster analysis for science mapping is different: while co-citation clustering is usually based on links connecting individual documents, cross-citation analysis requires aggregating documents into units such as journals or subject fields, among which cross-citation links are established. Some advantages of cross-citation analysis (for instance, the ability to analyze directed information flows) are undermined by possible biases. For example, bias can be caused by the use of predefined units (journals, subject categories, etc.), which implies an initial level of structural classification. Journal cross-citation clustering has been used by Leydesdorff (2006), Leydesdorff and Rafols (2009), and Boyack, Klavans, and Börner (2005), while Moya-Anegón et al. (2007) applied subject co-citation analysis to visualize the structure of science and its dynamics. The integration of lexical similarities and citation links is also attractive in other fields, such as search engine design (e.g., Google combines text and links; Brin & Page, 1998). In the early 1990s, combining link-based clustering with a textual approach was suggested to improve the efficiency and applicability of co-citation and co-word analysis.
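The earlier remark that a citation network behaves like a directed acyclic graph can be checked directly on harvested data. The sketch below is a minimal illustration, not the authors' code. It assumes citations are stored as a plain Python dict from a paper identifier to the identifiers it cites (a hypothetical format), and it applies Kahn's topological-sort algorithm, which removes every paper exactly when the graph has no directed cycle.

```python
from collections import deque


def is_citation_dag(citations: dict[str, list[str]]) -> bool:
    """Return True if the directed citation graph (citing -> cited) has no cycle.

    Kahn's algorithm: repeatedly remove papers that no remaining paper cites;
    all papers get removed exactly when the graph is acyclic.
    """
    # Collect every paper, including cited papers that cite nothing themselves.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)

    # In-degree = number of citations each paper receives inside the set.
    indegree = {node: 0 for node in nodes}
    for cited in citations.values():
        for target in cited:
            indegree[target] += 1

    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        paper = queue.popleft()
        removed += 1
        for target in citations.get(paper, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    return removed == len(nodes)


# Hypothetical identifiers: each paper cites only older work.
refs = {"P2012": ["P2005", "P1999"], "P2005": ["P1999"], "P1999": []}
print(is_citation_dag(refs))  # True

# A reciprocal citation (e.g. between papers in press) creates a cycle.
refs_with_cycle = {**refs, "P1999": ["P2012"]}
print(is_citation_dag(refs_with_cycle))  # False
```

Because the text says citation networks are only "similar to" a DAG, a False result on real data is best read as a flag for in-press or erratum citations rather than as an error.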
A weighted hybrid clustering framework was proposed by Liu, Yu, Janssens, Glänzel, Moreau, and De Moor (2010), focusing on combining text mining with bibliometrics in journal set analysis. This framework integrates two different approaches: clustering ensemble and kernel-fusion clustering.

3 Aims, methods and data collection

To test the hypothesis that scientific papers can be accurately clustered and classified, we seek to answer the following question: is statistics interdisciplinary or multidisciplinary? In short, our methodological proposal is organized as follows:

1. We combine different text mining techniques for information retrieval and map the networks of the content of papers written by single or multiple statisticians.
2. We analyze the large networks that emerge from individual statisticians' (authors') papers citing other scientific works. These networks are analyzed with bibliometric and social network techniques in order to construct coherence indicators that measure the intensity of similarity relations within the bibliometric set, and to cluster the papers analyzed.
3. We propose a clustering procedure based on Fisher's inverse chi-square method for integrating textual content and citation information.
4. We evaluate ISI Web of Knowledge subject categories as descriptive labels for statistical documents by comparing the clusters obtained in the third step with the ISI classification of the statisticians' papers.

We collect papers by Italian statisticians monthly for the period and retrieve their SCs through Scopus and Web of Knowledge (WoK). To gather these data, we employed the following procedure:

1. Create a list of all papers authored by Italian statisticians (Scopus author search).
2. Create a list of all the papers in the references of Italian statisticians' papers (Scopus).
3. Create a list of all the papers that cite Italian statisticians' papers.

We then manually download one HTML file per paper from WoK, containing the following information: Authors, Title, Year, Source title, Volume, Abstract, Author Keywords, Index Keywords, References, Editors, ISSN, ISBN, CODEN, Language of Original Document, Document Type, SC. For each statistician's publication, the dataset contains the title, abstract, author keywords, and SCs, together with the publications it cites (references) and the publications that cite it (citations).

We modify the subject categories as follows:

1. Papers with a single WoK SC that appears 10 or more times in our dataset keep the assigned WoK SC name.
2. Papers with a single WoK SC that appears fewer than 10 times are reassigned to a broader WoK category.
3. Papers with two or more SCs of equivalent weight are assigned a new, conflated SC.
4. Papers with two or more SCs that have a clear primary SC have "Multidisciplinary" appended to the primary SC name.

Textual content is fully indexed and encoded in the Vector Space Model using the TF-IDF weighting scheme, and text-based similarities are computed as the cosine of the angle between the vectors of two papers. The dimension of the term-by-document matrix is reduced by Latent Semantic Indexing (LSI) (Deerwester et al., 1988). Citations among the selected publications are investigated from three different aspects:

a) Cross-citation (CRC): the cross-citation between two papers is defined as the frequency of citations between them. The direction of citations is ignored.
b) Co-citation (COC): co-citation refers to the number of times two papers are cited together in subsequent literature. The co-citation frequency of two papers equals the number of papers that cite both of them.
c) Bibliographic coupling (BGC): bibliographic coupling occurs when two papers refer to a common third paper in their bibliographies. The coupling frequency corresponds to the number of papers that both of them cite.

All textual and citation data sources were converted into kernels using a linear kernel function. In particular, for the textual data, the kernel matrices were normalized so that their elements correspond to the cosine between pairs of document vectors. We combine the documents in a matrix of dissimilarities based on textual information, network structure, or other bibliometric indicators, and the integrated document distances can then be passed to a learning algorithm. Both a weighted linear combination of distance matrices and Fisher's inverse chi-square method from statistical meta-analysis are applied.
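The four subject-category rules above leave two notions open: how "equivalent weight" is measured and what makes a primary SC "clear". The sketch below is one hypothetical reading, not the authors' implementation. It makes four assumptions. Each paper carries a dict of SC weights. An SC's frequency is counted over all papers. A primary SC is "clear" when its weight is at least `primary_ratio` (assumed 2.0) times the runner-up. The broader-category map is supplied by the caller. The SC names A, B, and C are placeholders.

```python
from collections import Counter


def relabel_subject_categories(
    papers: dict[str, dict[str, float]],
    broader: dict[str, str],
    min_count: int = 10,
    primary_ratio: float = 2.0,
) -> dict[str, str]:
    """Assign one label per paper following the four SC rules (hypothetical reading).

    papers:        paper id -> {SC name: weight}; the weights are an assumed encoding.
    broader:       hypothetical map from a rare SC to a broader WoK category.
    primary_ratio: hypothetical test for a "clear primary" SC (top weight at
                   least this many times the runner-up's weight).
    """
    # Assumption: "appears N times in our dataset" counts SC occurrences over all papers.
    counts = Counter(sc for scs in papers.values() for sc in scs)
    labels = {}
    for pid, scs in papers.items():
        if not scs:
            continue  # no SC recorded; leave the paper unlabeled
        if len(scs) == 1:
            (sc,) = scs
            # Rules 1 and 2: keep a frequent SC, otherwise move to a broader one.
            labels[pid] = sc if counts[sc] >= min_count else broader.get(sc, sc)
            continue
        ranked = sorted(scs.items(), key=lambda item: item[1], reverse=True)
        (top, w1), (_, w2) = ranked[0], ranked[1]
        if w1 >= primary_ratio * w2:
            labels[pid] = f"{top} Multidisciplinary"  # rule 4: clear primary SC
        else:
            labels[pid] = " / ".join(sorted(scs))  # rule 3: conflated SC
    return labels


# Placeholder data: ten papers in SC "A", one rare SC "B", two mixed papers.
papers = {f"p{i}": {"A": 1.0} for i in range(10)}
papers.update({"q": {"B": 1.0}, "r": {"A": 0.5, "C": 0.5}, "s": {"A": 0.8, "C": 0.2}})
labels = relabel_subject_categories(papers, broader={"B": "Broad B"})
print(labels["q"], "|", labels["r"], "|", labels["s"])
# Broad B | A / C | A Multidisciplinary
```

Any real use would need the study's own definitions of weight, threshold counting, and the broader WoK categories in place of these placeholders.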
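The text side of the procedure (TF-IDF vectors, cosine similarity, LSI reduction, normalized linear kernel) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions. It uses raw-count TF-IDF, which is one of several variants because the exact weighting is not given. Rows are papers, the transpose of the term-by-document matrix named in the text. The token lists are invented.

```python
import numpy as np


def tfidf(docs: list[list[str]]) -> tuple[np.ndarray, list[str]]:
    """Document-by-term matrix with raw term frequency times log(N / df)."""
    vocab = sorted({term for doc in docs for term in doc})
    col = {term: j for j, term in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for term in doc:
            tf[i, col[term]] += 1.0
    df = np.count_nonzero(tf, axis=0)  # number of documents containing each term
    return tf * np.log(len(docs) / df), vocab


def lsi(X: np.ndarray, k: int) -> np.ndarray:
    """Document coordinates in the k leading LSI dimensions (U_k * S_k)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine_kernel(X: np.ndarray) -> np.ndarray:
    """Normalized linear kernel: entry (i, j) is the cosine between rows i and j."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # an all-zero document keeps zero similarity everywhere
    Xn = X / norms
    return Xn @ Xn.T


# Hypothetical pre-processed abstracts (token lists).
docs = [
    ["cluster", "citation", "network"],
    ["cluster", "text", "mining"],
    ["citation", "network", "bibliometrics"],
]
X, vocab = tfidf(docs)
K = cosine_kernel(X)  # full TF-IDF space
K_lsi = cosine_kernel(lsi(X, 2))  # 2-dimensional LSI space
print(round(float(K[0, 2]), 3), round(float(K[0, 1]), 3))  # 0.378 0.146
```

When k equals the rank of X, the LSI cosines coincide with the full-space ones. A smaller k keeps only the dominant term co-occurrence patterns, which is what lets LSI smooth over vocabulary differences.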
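The three citation measures defined in a)-c) reduce to simple products of a citation matrix. The sketch below assumes a square 0/1 NumPy matrix `A` over one set of papers, with `A[i, j] = 1` when paper i cites paper j (an assumed encoding). Co-citation counts therefore only see citing papers that are inside this set.

```python
import numpy as np


def citation_similarities(A: np.ndarray) -> dict[str, np.ndarray]:
    """Cross-citation, co-citation and bibliographic-coupling counts.

    A[i, j] = 1 if paper i cites paper j (rows cite, columns are cited).
    """
    A = np.asarray(A, dtype=float)
    matrices = {
        "CRC": A + A.T,  # citations between i and j, direction ignored
        "COC": A.T @ A,  # number of papers citing both i and j
        "BGC": A @ A.T,  # number of references shared by i and j
    }
    for M in matrices.values():
        np.fill_diagonal(M, 0.0)  # self-similarity is not used
    return matrices


A = np.array([
    [0, 0, 1, 1],  # P0 cites P2 and P3
    [0, 0, 1, 1],  # P1 cites P2 and P3
    [0, 0, 0, 1],  # P2 cites P3
    [0, 0, 0, 0],  # P3 cites nothing in the set
])
S = citation_similarities(A)
print(S["COC"][2, 3], S["BGC"][0, 1], S["CRC"][2, 3])  # 2.0 2.0 1.0
```

`A.T @ A` and `A @ A.T` are exactly linear kernels on the columns and rows of the citation matrix, which is one way to read the text's statement that citation data were turned into kernels with a linear kernel function.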
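The text does not spell out how distances enter Fisher's method. One common reading, assumed here and not taken from the paper, turns each distance into a p-value-like score through its empirical distribution within its own matrix (a small distance gives a small score). It then combines the k scores of a pair as T = -2 * sum(ln p_i), which under independence follows a chi-square distribution with 2k degrees of freedom. The upper-tail probability of T serves as the integrated distance. The weighted linear combination is shown alongside; its weights are user-chosen, and the toy matrices are invented.

```python
import math

import numpy as np


def empirical_p(D: np.ndarray, iu: tuple[np.ndarray, np.ndarray]) -> np.ndarray:
    """Empirical CDF value of each pairwise distance within its own matrix, in (0, 1]."""
    d = np.asarray(D, dtype=float)[iu]
    return np.searchsorted(np.sort(d), d, side="right") / d.size


def fisher_combine(distances: list[np.ndarray]) -> np.ndarray:
    """Integrate several distance matrices with Fisher's inverse chi-square method.

    Each matrix is assumed symmetric with a zero diagonal. Under independence,
    T = -2 * sum(log p) is chi-square distributed with 2 * n_views degrees of
    freedom; its upper-tail probability is small when a pair is close in every view.
    """
    n, n_views = distances[0].shape[0], len(distances)
    iu = np.triu_indices(n, k=1)
    T = sum(-2.0 * np.log(empirical_p(D, iu)) for D in distances)
    half = T / 2.0
    # Closed-form chi-square survival function for an even number (2 * n_views) of dof.
    tail = np.exp(-half) * sum(half**j / math.factorial(j) for j in range(n_views))
    combined = np.zeros((n, n))
    combined[iu] = tail
    combined[iu[1], iu[0]] = tail
    return combined


def weighted_linear(distances: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Convex combination of distance matrices; weights are rescaled to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * np.asarray(D, dtype=float) for wi, D in zip(w, distances))


# Toy distances for three papers (hypothetical values).
D_text = np.array([[0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.9, 0.8, 0.0]])
D_cite = np.array([[0.0, 0.2, 0.7], [0.2, 0.0, 0.9], [0.7, 0.9, 0.0]])
F = fisher_combine([D_text, D_cite])
W = weighted_linear([D_text, D_cite], [1.0, 1.0])
print(round(float(F[0, 1]), 3), round(float(F[0, 2]), 3))  # 0.355 0.937
```

Text-based and citation-based distances are rarely independent, so the combined value is best treated as a heuristic integrated distance rather than a calibrated probability. Its appeal over the linear combination is that it needs no weights and is insensitive to the different scales of the input matrices.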
We label the clusters obtained by their most significant terms and most representative publications. Finally, we compare the cluster structure with the ISI classification scheme. We cluster the statistical abstract data to evaluate SCs as document labels, attempting to reconcile clustering (a bottom-up approach) with predefined categories (a top-down approach). If the clusters produced by the hybrid framework do not correspond well to the SCs, we can conclude that SCs are not well suited to the classification of statistical publications, and speculate that this may also be true for other interdisciplinary fields.

4 Conclusion

Disciplinary diversity indicators are developed to describe the heterogeneity of a bibliometric set viewed from predefined categories, i.e. using a top-down approach that locates the set on the global map of science. In this pilot study on Italian statisticians, we investigated the use of a hybrid clustering technique to aid in measuring researcher interdisciplinarity. Furthermore, we assessed whether journal subject categories from the Web of Knowledge database are sufficient for labeling statistics documents. Clustering and textual classification support interdisciplinary analysis in that they 1) describe collaboration and the integration of knowledge, and 2) lead to useful conclusions for statistical researchers by uncovering the underlying structure of their research tracks.

5 References

Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64.
Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1991). Mapping of science by combined co-citation and word analysis. 2. Dynamic aspects. Journal of the American Society for Information Science, 42(4).
Brewer, G. D. (1999). The challenges of interdisciplinarity. Policy Sciences, 32.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7).
Deerwester, S., et al. (1988). Improving information retrieval with Latent Semantic Indexing. Proceedings of the 51st Annual Meeting of the American Society for Information Science, 25.
Gowanlock, M., & Gazan, R. (2012). Assessing researcher interdisciplinarity: A case study of the University of Hawaii NASA Astrobiology Institute. ICS research documents.
Janssens, F., Zhang, L., De Moor, B., & Glänzel, W. (2009). Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing and Management, 45(6).
Leydesdorff, L. (2006). Can scientific journals be classified in terms of aggregated journal-journal citation relations using the Journal Citation Reports? Journal of the American Society for Information Science and Technology, 57(5).
Leydesdorff, L., & Rafols, I. (2009). A global map of science based on the ISI subject categories. Journal of the American Society for Information Science and Technology, 60(2).
Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., & De Moor, B. (2010). Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database. Journal of the American Society for Information Science and Technology, 61(6).
Marshakova, I. V. (2003). Journal co-citation analysis in the field of information science and library science. In P. Nowak & M. Gorny (Eds.), Language, information and communication studies. Adam Mickiewicz University, Poznan.
Morillo, F., Bordons, M., & Gómez, I. (2001). An approach to interdisciplinarity through bibliometric indicators. Scientometrics, 51.
Moya-Anegón, F., Vargas-Quesada, B., Herrero-Solana, V., Chinchilla-Rodriguez, Z., Corera-Alvarez, E., & Munoz-Fernandez, F. J. (2004). A new technique for building maps of large scientific domains based on the cocitation of classes and categories. Scientometrics, 61(1).
National Academies. (2005). Committee on Facilitating Interdisciplinary Research, of the Committee on Science, Engineering, and Public Policy. Facilitating Interdisciplinary Research. Washington, DC.
Porter, A., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81.
Porter, A., Cohen, A., Roessner, J. D., & Perreault, M. (2007). Measuring researcher interdisciplinarity. Scientometrics, 72.
Rafols, I., & Meyer, M. (2010). Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics, 82(2).
Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science, 50.
Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., Rafols, I., & Börner, K. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14-26.
Zhang, L., Liu, X., Janssens, F., Liang, L., & Glänzel, W. Subject clustering analysis based on ISI category classification. Journal of Informetrics, 4(2).
Zitt, M., & Bassecoulard, E. (1994). Development of a method for detection and trend analysis of research fronts built by lexical or cocitation analysis. Scientometrics, 30(1).
*Manuscript Click here to view linked References Diffusion of Latent Semantic Analysis as a Research Tool: A Social Network Analysis Approach Yaşar Tonta and Hamid R. Darvish tonta@hacettepe.edu.tr, darvish@cankaya.edu.tr
More informationPeters & Heinrich GFKL 2008 - An Introduction
Qualitative Citation Analysis Based on Formal Concept Analysis Wiebke Petersen & Petja Heinrich Institute of Language and Information University of Düsseldorf Overview aim: to present the FCA as an applicable
More informationImpact measures of interdisciplinary research in physics
Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 53, No. 2 (2002) 241 248 Impact measures of interdisciplinary research in physics ED J. RINIA,
More informationAN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS
Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-19-25 Available online at http://www.bioinfo.in/contents.php?id=32 AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS
More informationComparative Study of Features Space Reduction Techniques for Spam Detection
Comparative Study of Features Space Reduction Techniques for Spam Detection By Nouman Azam 1242 (MS-5) Supervised by Dr. Amir Hanif Dar Thesis committee Brig. Dr Muhammad Younas Javed Dr. Azad A Saddiqui
More informationData Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent
More information0: This manual describes how to use the Bibliometric Mapping Tools program located at http://www.lognostics.co.uk/tools/mappingtools/
Bibliometric Mapping Tools v1.0 The Manual Paul Meara January 2014 0: This manual describes how to use the Bibliometric Mapping Tools program located at http://www.lognostics.co.uk/tools/mappingtools/
More informationHow to Create an Overlay Map of Science Using the Web of Science
How to Create an Overlay Map of Science Using the Web of Science Contents Ken Riopelle 1, Loet Leydesdorff 2, Li Jie 3 Part.1 Overview... 2 Part.2 Detail steps to create overlay map... 6 Step1. Create
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationDATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION
DATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION Katalin Tóth, Vanda Nunes de Lima European Commission Joint Research Centre, Ispra, Italy ABSTRACT The proposal for the INSPIRE
More informationScience Navigation Map: An Interactive Data Mining Tool for Literature Analysis
Science Navigation Map: An Interactive Data Mining Tool for Literature Analysis Yu Liu School of Software yuliu@dlut.edu.cn Zhen Huang School of Software kobe_hz@163.com Yufeng Chen School of Computer
More informationText Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
More informationLibrary and information science research trends in India
Annals of Library and Studies Vol. 58, December 011, pp. 319-35 Library and information science research trends in India Rekha Mittal Senior Principal Scientist, CSIR-National Institute of Science Communication
More informationMake search become the internal function of Internet
Make search become the internal function of Internet Wang Liang 1, Guo Yi-Ping 2, Fang Ming 3 1, 3 (Department of Control Science and Control Engineer, Huazhong University of Science and Technology, WuHan,
More informationDoes it Matter Which Citation Tool is Used to Compare the h-index of a Group of Highly Cited Researchers?
Australian Journal of Basic and Applied Sciences, 7(4): 198-202, 2013 ISSN 1991-8178 Does it Matter Which Citation Tool is Used to Compare the h-index of a Group of Highly Cited Researchers? 1 Hadi Farhadi,
More informationSpam Filtering Based on Latent Semantic Indexing
Spam Filtering Based on Latent Semantic Indexing Wilfried N. Gansterer Andreas G. K. Janecek Robert Neumayer Abstract In this paper, a study on the classification performance of a vector space model (VSM)
More informationKnowledge base and research front of Information science 2006-2010: An author co-citation and bibliographic coupling analysis
Knowledge base and research front of Information science 2006-2010: An author co-citation and bibliographic coupling analysis Dangzhi Zhao 1 School of Library and Information Studies, University of Alberta,
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationKeyphrase Extraction for Scholarly Big Data
Keyphrase Extraction for Scholarly Big Data Cornelia Caragea Computer Science and Engineering University of North Texas July 10, 2015 Scholarly Big Data Large number of scholarly documents on the Web PubMed
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationMining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
More informationAdvanced Database Marketing Innovative Methodologies and Applications for Managing Customer Relationships
Advanced Database Marketing Innovative Methodologies and Applications for Managing Customer Relationships Edited by KRISTOF COUSSEMENT KOEN W. DE BOCK and SCOTT A. NESLIN GOWER Contents List of Figures
More informationThe Visualization Pipeline
The Visualization Pipeline Conceptual perspective Implementation considerations Algorithms used in the visualization Structure of the visualization applications Contents The focus is on presenting the
More informationData Pre-Processing in Spam Detection
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain
More informationSubordinating to the Majority: Factoid Question Answering over CQA Sites
Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationDATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
More informationExploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it
More informationEnhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects
Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com
More informationUsing bibliometric maps of science in a science policy context
Using bibliometric maps of science in a science policy context Ed Noyons ABSTRACT Within the context of science policy new softwares has been created for mapping, and for this reason it is necessary to
More informationText Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationFacilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
More informationCS 207 - Data Science and Visualization Spring 2016
CS 207 - Data Science and Visualization Spring 2016 Professor: Sorelle Friedler sorelle@cs.haverford.edu An introduction to techniques for the automated and human-assisted analysis of data sets. These
More informationSearch and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
More informationLatent Semantic Indexing with Selective Query Expansion Abstract Introduction
Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes
More informationNetwork Working Group
Network Working Group Request for Comments: 2413 Category: Informational S. Weibel OCLC Online Computer Library Center, Inc. J. Kunze University of California, San Francisco C. Lagoze Cornell University
More informationScience Overlay Maps: A New Tool for Research Policy and Library Management
Science Overlay Maps: A New Tool for Research Policy and Library Management Ismael Rafols SPRU Science and Technology Policy Research, University of Sussex, Brighton, BN1 9QE, England. E-mail: i.rafols@sussex.ac.uk.
More informationTHE USE OF THOMSON REUTERS RESEARCH ANALYTIC RESOURCES IN ACADEMIC PERFORMANCE EVALUATION DR. EVANGELIA A.E.C. LIPITAKIS OCTOBER 2014, PRAGUE
THE USE OF THOMSON REUTERS RESEARCH ANALYTIC RESOURCES IN ACADEMIC PERFORMANCE EVALUATION DR. EVANGELIA A.E.C. LIPITAKIS OCTOBER 2014, PRAGUE Thomson Reuters: Solutions Portfolio to Seamlessly Support
More informationDATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
More informationThe SCONUL Seven Pillars of Information Literacy. Core Model For Higher Education
The SCONUL Seven Pillars of Information Literacy Core Model For Higher Education SCONUL Working Group on Information Literacy April 2011 The SCONUL Seven Pillars of Information Literacy: Core Model 2 Introduction
More informationELPUB Digital Library v2.0. Application of semantic web technologies
ELPUB Digital Library v2.0 Application of semantic web technologies Anand BHATT a, and Bob MARTENS b a ABA-NET/Architexturez Imprints, New Delhi, India b Vienna University of Technology, Vienna, Austria
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More information