Boom and Bust Cycles in Scientific Literature A Toolbased Big-Data Analysis

Similar documents
Implementation requirements for knowledge management components into ERP Systems: Comparison of software producers and companies

A Process Model for Data Warehouses Integration to Enable Business Intelligence: An Applicability Check for the Airline Sector.

Comparing Social Media Sites: A Facebook Case Study about Employer Branding. Bachelorarbeit

Requirements and Challenges for the Migration from EDIFACT-Invoices to XML-Based Invoices. Master Thesis

Affiliate Marketing Technology, Opportunities and Challenges

Analysis of Business Models for Electric Vehicles Usage

E-Commerce Opportunities for a Commercial Vehicle Industry System Supplier. Bachelorarbeit

Evaluation of Selection Methods for Global Mobility Management Software

Thema: Hybrid IT-Project Management: Design, Application and Analysis

Quantifying the Influence of Volatile Renewable Electricity Generation on EEX Spotmarket Prices using Artificial Neural Networks.

Empirical Analysis of Leadership and Social Learning Effects on Employees' Information Security Behaviour. Masterarbeit

User Guidance in Business Process Modelling

Cost-Benefit Analysis of Videoconferencing and Telepresence Systems in Virtual Project Environments: a Holistic Approach. D i p l o m a r b e i t

Effort Estimation of Software Development Projects with Neural Networks

Customer Intimacy Analytics

for High Performance Computing

Smartphone Integration in the Car IT: User Acceptance of Phone-Centric Car Connectivity Solutions

E-Commerce Design and Implementation Tutorial

Guidelines for the practical study semester Faculty of Mechatronics and Electrical Engineering

An Enterprise Modeling Framework for Banks using. Algebraic Graph Transformation

COUNTERACTING PHISHING THROUGH HCI: DETECTING ATTACKS AND WARNING USERS

Satellite-UMTS - Specification of Protocols and Traffic Performance Analysis

Agreement on. Dual Degree Master Program in Computer Science KAIST. Technische Universität Berlin

Buyout and Distressed Private Equity: Performance and Value Creation

Study regulations for the master s degree program in Energy Engineering at Technische Universität Berlin in El Gouna, Egypt Date: March 27, 2009

Entwurf eines Lizenzmanagement-Systems als zentraler Dienst für das Plan S Chassis. Masterarbeit

Three Contributions to Experimental Economics

Artikel I Änderung der Studienordnung In der Anlage zur Studienordnung wird Punkt 2 wie folgt neu gefasst:

Content Analyst's Cerebrant Combines SaaS Discovery, Machine Learning, and Content to Perform Next-Generation Research

Taming SDN Controllers in Heterogeneous Hardware Environments

Faculty of Economics, Business Administration and Information Technology

Guidance to the Master and PhD Programmes in Computer Science

Master Seminar Cloud Computing

THE MASTER'S DEGREE IN ENGLISH

Privacy-preserving Infrastructure for. Social Identity Management

Molecular Biology. Master s degree program. Faculty of Science UNI INFO 2015

A Decision Support Tool for the Risk Management of Offshore Wind Energy Projects

Ontology based Recruitment Process

ELPUB Digital Library v2.0. Application of semantic web technologies

Resource Monitoring in Industrial Manufacturing Using Knowledge-Based Technologies Lisa Theresa Abele

BOFIRE A FEM-programm for non-linear analysis of structural members under fire conditions

Content management and protection using Trusted Computing and MPEG-21 technologies

SOCIAL NETWORKS AND STUDENTS' FUTURE JOBS

Relevé de notes pour le cursus intégré Saar-Lor-Lux en physique Licence / Bachelor

FACT SHEET. White Paper on Teacher Education The teacher the role and the education (Report to the Storting No. 11 ( )) Principal elements

of 16 th November 2006

Enhancing the Fusion Method to Fusion B Requirements Engineering and Formal Specification

On Applying Big Data and Cloud Computing for Quality Improvement in Higher Education

KNOWLEDGENT REPORT Big Data Survey: Current Implementation Challenges

THE ROLE OF SMALL MANUFACTURING ENTERPRISES IN SUSTAINABLE REGIONAL DEVELOPMENT

1 Business Modeling. 1.1 Event-driven Process Chain (EPC) Seite 2

UNIVERSITÄTSBIBLIOTHEK

MASTER OF SCIENCE (MSc) IN ENGINEERING (SOFTWARE ENGINEERING) (Civilingeniør, Cand. Polyt. i Software Engineering)

USE OF INFORMATION SOURCES AMONGST POSTGRADUATE STUDENTS IN COMPUTER SCIENCE AND SOFTWARE ENGINEERING A CITATION ANALYSIS YIP SUMIN

Semantic Search in Portals using Ontologies

Programme Specifications

Module compendium of the Master s degree course of Information Systems

Improving SAS Global Forum Papers

Big Data Collection Study for Providing Efficient Information

MSc Programme Intelligent Adaptive Systems (IAS)

Customized Efficient Collection of Big Data for Advertising Services

Characterization of Microemulsions using Small Angle Scattering Techniques

Formal Concept Analysis used for object-oriented software modelling Wolfgang Hesse FB Mathematik und Informatik, Univ. Marburg

How To Determine Your Level Of Competence

HOW TO SUCCESSFULLY USE SOFTWARE PROJECT SIMULATION FOR EDUCATING SOFTWARE PROJECT MANAGERS

Databases and Information Systems 1 Part 3: Storage Structures and Indices

INDIVIDUAL MASTERY for: St#: Test: CH 9 Acceleration Test on 29/07/2015 Grade: B Score: % (35.00 of 41.00)

INDIVIDUAL MASTERY for: St#: Test: CH 9 Acceleration Test on 09/06/2015 Grade: A Score: % (38.00 of 41.00)

APPLICATION FOR ADMISSION For further information please visit our website

This is an English translation only. The original German version is the legally binding document.

Business Intelligence Operational Structures: Towards the Design of a Reference Process Map

A HUMAN RESOURCE ONTOLOGY FOR RECRUITMENT PROCESS

D&B Data Manager Your Data Management process in the Cloud. Transparent, Complete & Up-To-Date Master Data

Tool-Based Business Process Modeling using the SOM Approach

Advanced Volume Rendering Techniques for Medical Applications

Bachelor Entrepreneurship and Corporate Succession

Natur und Umwelt. International Studies in Aquatic Tropical Ecology (ISATEC) Master

Multi-Channel Distribution Strategies in the Financial Services Industry

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects

UNIVERSITY OF BELGRADE FACULTY OF PHILOSOPHY. Part two: INFORMATION ON DEGREE PROGRAMS

Transcription:

Boom and Bust Cycles in Scientific Literature A Toolbased Big-Data Analysis Bachelorarbeit zur Erlangung des akademischen Grades Bachelor of Science (B.Sc.) im Studiengang Wirtschaftsingenieur der Fakultät für Elektrotechnik und Informatik, Fakultät für Maschinenbau und der Wirtschaftswissenschaftlichen Fakultät der Leibniz Universität Hannover vorgelegt von Name: Khoshandam Ghashang Geb. am. : 13.03.1992 Vorname: Mohammad Amin in: Paderborn Prüfer: Prof. Dr. M. H. Breitner Ort, den: Hannover, 01.09.2015

Contents Abstract ii 1. Introduction 1 1.1. Motivation................................ 1 1.2. Objectives................................ 2 1.3. Course of investigation......................... 3 2. Research background 4 2.1. Research design............................. 4 2.2. Process model.............................. 5 2.3. Definitions................................ 6 2.4. Literature review............................ 8 3. State of the art 15 3.1. Tools for research............................ 15 3.2. Natural Language Processing Methods............... 16 4. Case example 22 4.1. Literature database........................... 22 4.2. Query................................... 25 4.3. Results.................................. 26 4.4. Side findings............................... 30 4.5. Comparison of other search engines................. 36 5. Discussion, Limitations and Further Research 43 6. Conclusion and Outlook 47 A. Query text 49 B. Developer Manual 52 B.0.1. Components and Architecture................ 52 B.1. Classes and Methods.......................... 53 B.1.1. TSISQ-Class........................... 53 B.1.2. Tornado-Webserver Classes.................. 57 B.1.3. Development tools....................... 58 iii

Contents C. User Manual 60 C.1. Installation................................ 60 C.1.1. Requirements.......................... 60 C.1.2. Installation............................ 60 C.1.3. First Run............................. 61 C.2. How to use TSISQ............................ 63 C.2.1. Prerequisites for Documents................. 63 C.2.2. Adding Documents....................... 63 C.2.3. Building an dictionary..................... 63 C.2.4. Building an index........................ 64 C.2.5. Preform a Query........................ 64 C.2.6. Analyze Queries........................ 64 Bibliography 66 iv

1. Introduction 1.1. Motivation The growing number of research articles evokes to unstructured and complicated literature research and analysis for researchers (Parolo et al. (2015); Mabe and Amin (2001)). In times of the internet and electronic library s it is much easier to search for scientific articles as before, where researchers had to use books and magazines in paper form. As a consequence of this, the researcher publish even more articles and studies so that a structured overview of existing work becomes difficult (Parolo et al. (2015); J. Park and J.-N. Lee (2011)). It is particularly not possible to read any article in detail to select only these which are really relevant in the considered research area. However, a structured literature review is the basis for any research process and therefore it is indispensable (Webster and Watson (2002)). The published articles and studies are the foundation to identify a research gap and to ascertain an overview of the already existing knowledge in the research area (Levy and Ellis (2006)). Especially in the field of IS-research only a few articles are available, which describe the procedure of a structured literature review (Webster and Watson (2002)). A further option to identify relevant and high quality literature are the citations. Different calculations exist to evaluate the present article (Bar-Ilan (2008); Dong et al. (2005)). However, to read and study any article in detail and hence recognize the most relevant literature is still a manual, difficult and lengthy process. Oftentimes researchers are searching for the literature by using keywords which select not every relevant article since synonyms and periphrases are used by the authors (S. T. Dumais, Furnas, et al. (1988); Foltz (1990)). Thus, it is useful and more efficient to use latent semantic indexing to evaluate the present literature (Koukal et al. (2014); S. T. Dumais (1991)). Once the researcher selected the relevant articles and studies, it is useful to know from which country and of which source the literature is from. By knowing the source and country it can be evaluated if an article is on international high top level or if it is only a national or limited one. Additionally, to assess the identified literature in more detail it is beneficial to ascertain in which years these are published (Koukal et al. (2014)). Research should always be 1

1. Introduction connected to previous ones, since many research already exist, is published and is available in forms of articles and studies from conferences and journals. Hence, it should be analyzed in which years the literature is published and if this specific research topic is too old for further research or if it is still a hype and much potential is seen from the current researchers. However, it could be also possible, that the literature is elderly but the topic can be continued due to new developments e.g. around information system (IS) topics. Due to this reasons and the exponential rising number of published research articles, an automated analyzes may increase the process of studying the identified literature accurately. A further option is to examine if authors still search in the same field and only named their research different. Furthermore, diverse authors may use different keywords and synonyms for the same meaning. Hence, a latent semantic indexing supports to identify any relevant article. First, the articles can be selected quickly by an automated literature research within one or more databases. The next step is already to analyze these articles by having information about the source, hypes, trends, authors and published years before reading them to interpret them correctly. Due to the missing analytics options of current automated literature research tools, in this thesis an existing software tool from Koukal et al. (2014) named TSISQ (Tool for Semantic Indexing and Similarities Queries) is enhanced. It has its focus especially on the tasks to find and analyze scientific literature. 1.2. Objectives The core of this thesis is to further develop and improve the existing literature research tool from Koukal et al. (2014) to filter the identified literature e.g. for years or for sources and to analyze the articles automatically. To demonstrate the options and functions of the new tool, it is used in this thesis for a case example, the literature research and analyze for the topic of cloud computing. Furthermore, the present tool is limited to only 100 articles as result, so that one further aim is, to detect even more relevant literature. With the case example cloud computing, the relevant literature will be identified within the database AISeL, where articles from diverse conferences and journals 2

1. Introduction are available. These articles can be assorted e.g. of the similarity score to have first the most relevant and matching ones. The score during the case example is appointed to be at least 50%. The identified literature will be then used for further analyzes to ascertain boom and bust cycles within scientific literature for cloud computing. Using the tool, the research field of cloud computing will be presented and development over the years will be detected. The research tool TSISQ can support to interpret the found articles and gives an overview of the research field quickly. It may be used by any researcher as a basis for further research. Hence, this thesis seeks to answer the following research question: How can a LSI-based approach be used for an analysis of scientific literature? 1.3. Course of investigation Figure 1.1 shows the overview of the thesis. First, the research question (RQ) is defined. To derive the RQ a systematic literature review is necessary. This is the second step. After this step, the research tools are analyzed, regarding in which way they can support to answer the research question. In the third step, the past is analyzed in order to answer the RQ. In the fourth step, the discussion of results and a way to answer the research questions is presented. The thesis (sub)-tile is A tool-based big-data analysis: Tool-based is defined in step four whereas the big-data analysis is processed in step five. The last and sixth step is the discussion, which is in the fifth chapter. Figure 1.1.: Overview of the thesis 3

6. Conclusion and Outlook In this thesis the research objective was to enhance an existing literature research tool. The tool was developed to improve the analysis of the identified literature. Existing limits were identified and resolved. The tool displays now more than 100 results. For that purpose a new module was implemented to gain an indepth knowledge of the database and the query results. With this method, parts of the AISeL database were analyzed and a case example was preformed on the topic cloud computing. This topic was compared to another scientific search engine, the AISeL search engine. The comparison demonstrated the strength of this enhanced tool. In terms of quantity and quality more and better results could be realized. Additionally, the distribution over the time was evaluated. A trend was indicated, but not enough input data is available to determine this trend for sure. These results were compared also to public search trends. This comparison revealed that the science started later the discussion and analysis of the topic Cloud Computing as the public search interests for this specific topic. Moreover, a boom and bust cycle could be identified (at least in the public search trends). Furthermore, the same data revealed that the topic Big Data rises, when the topic cloud computing declines. The tool was developed to support researchers in their literature review process. An automated analyzes may increase the process of studying the identified literature accurately. The amount of research articles in the scientific literature is rising and therefore new methods to analyze them are necessary. The semantic similarity is a method, which can handle such rising data and is able to process reasonable queries and results automatically. A structured literature review is the basis for any research process. The developed tool increases and accelerate exact this indispensable process and can be hence utilized of researchers for their research field. 47