REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

Pilar Rey del Castillo
May 2013

Introduction

The exploitation of the vast amount of data originating from ICT tools, and covering a wide variety of economic and social activities, is both a challenge and an opportunity for official statistics. It remains to be discovered, however, how to extract significant value for the production of statistical figures from the diversity of data available. This paper proposes to start exploring some Big Data sources by following a straightforward path to results. The next section provides a preliminary insight into the issues at stake, and the following one presents some ideas for starting a road map from Eurostat.

Some defining features of Big Data

The most conspicuous feature of Big Data, compared with traditional statistical sources, is that they do not come from a prior design aimed at obtaining specific statistics, but become available as traces of human activity. This attribute makes it difficult to use traditional statistical methods and tools such as probabilistic sampling and statistical classifications, and renders the Generic Statistical Business Process Model inapplicable. The outcomes of a recent survey of executives from a wide range of industries around the world are illuminating: although a large share of the respondents agreed that data had become an important factor for their business, many companies were struggling with basic aspects of data management and were still attempting to exploit data effectively [1]. This confirms that extracting useful information from this kind of data is a non-obvious and rather difficult task that should be carefully planned.
Although the appearance of some statistical figures based on Big Data may suggest that the information is found in the data ready to be published, one should bear in mind the huge amounts of data that have previously been analysed and processed to achieve these results. Nevertheless, the attraction of a potential reduction in respondent burden and costs, together with the general drive to improve the productivity of the ESS, puts increasing pressure on the use of Big Data as sources of statistical information.

Experience with another source that shares with Big Data the feature of not being designed for statistical purposes, namely administrative data, may illuminate the road map by comparing the features the two have in common and those that differ. The main aspects of administrative data to consider are the following:

a) Methods to obtain statistical information from administrative data usually depend on the specific data, which makes it difficult to establish general rules or to use generic production process models.
b) Administrative data are not structured as statistical data are, that is, they do not use statistical classifications and definitions, but they still show a certain structure related to the purpose for which they were created. This means that some translating, linking or harmonizing of the structures (units, definitions, classifications...) always has to be done.
c) Sampling procedures are not used to obtain the reporting units, but there is frequently some idea of how well they represent the population of interest (sometimes all the population units are included).
d) The volume of administrative data is not usually a problem, and they can be treated with the statistical procedures used with other typical sources.
e) The ways in which they are increasingly being used to produce statistical figures can be classified as (i) totally replacing statistical sources, (ii) partially replacing statistical sources, completing the information by means of record linkage, matching or other procedures, and (iii) providing completely new statistical figures that may complement the available statistical information from other perspectives. The first two may in theory yield significant reductions in costs and respondent burden, but they frequently imply new tasks of translating, linking or harmonizing, which are not necessary when completely new statistical figures are produced. An example of this last case is the figures on registered unemployment.

As for Big Data, the corresponding features are:

a) Because of the heterogeneity of the Big Data available, methods to produce statistical information have to be developed ad hoc for each case, exactly as with administrative data.
b) Some Big Data have a certain structure related to the source of the information, and some are just unstructured text strings. Good metadata are not usually available, and in most cases the work needed to harmonize or translate the data into statistical structures appears to be enormous.
c) Apart from not using sampling procedures, Big Data frequently come from private companies, and their representativeness and coverage of the populations of interest for official statistics are difficult to assess.
d) The name Big Data refers precisely to the huge volume. This dimension has an impact on storage and processing, which frequently fall outside the scope of traditional statistical tools.
e) The way Big Data could be used to produce statistical figures is a crucial issue. For the reasons explained in the previous points, it does not seem easy to find Big Data able to totally or partially replace statistical sources in the short term, and following this path may be too expensive in time and resources. Thus, a sound approach would be to start searching for sources that could provide completely new and independent statistical figures, not adapted to traditional statistical structures but offering new perspectives. For example, instead of finding sources to substitute for the HBS, try to build indicators of its trends over time. When improvements in this area are achieved, the new set of statistics available will provide a valuable basis for re-designing the products and the production process of official statistics.

There may be opportunities to tackle the specific problems of Big Data by using suitable tools:

1. An apparently critical problem is the volume of the Big Data available: there is a need to move away from exclusive dependence on statistical methods that cannot handle this volume of information and to adopt a more diverse set of tools. This can be addressed through the use of algorithms specially developed for this purpose, such as data mining methods. These algorithms have the required computational efficiency and are scalable, that is, they have the ability to handle a growing amount of work in a capable manner, or to be enlarged to accommodate that growth [2]. The state of the art provides a great variety of data mining tools for different objectives: classification, clustering, regression, association, feature extraction... A first stage of exploration using data mining procedures should usually be carried out to learn about the unknown data structure and the possible outcomes, and later combined with traditional statistical procedures. The type of Big Data and its form determine the type of data mining tool to be used. Thus the first step of the statistical production process from Big Data should be an exploratory analysis. A combination of data mining and traditional statistical procedures may follow to produce the best results.

2. Another important concern is the representativeness and validity of the statistics produced. The use of probabilistic sampling in traditional statistics provides a theoretical framework that ensures confidence in the figures produced, with accuracy assessed through sampling errors. Most of the Big Data available cannot be adapted to this framework, and other procedures should be devised. This seems to be an important weakness of Big Data use, and efforts should be focused on it. Meanwhile, successful uses of Big Data could be investigated in order to follow a similar approach. Two well-known examples are briefly considered here. The first is the estimation of the incidence of flu in different countries and regions around the world from Google searches for flu-related topics [3]. These estimates have been found to match traditional flu activity indicators very closely.
Similarly, a recent BBC News article [4] reported that Google searches for finance-related terms may predict moves in markets, and that an investment strategy based on these search volume data between 2004 and 2011 would have made a profit of 326%.

These examples have two important features in common (apart from being Google products) that may help with the problem of representativeness, coverage and validity. The first is that both estimate changes or movements over time, not absolute figures. A well-established statistical principle is that it is more reliable to estimate changes (over time or space) than absolute figures, because some biases and errors cancel out when the change is computed. The first attempts to use Big Data should perhaps therefore be aimed at producing estimates of changes or evolutions.

The other relevant feature shared by both examples is the criterion used to evaluate the results. What is estimated are proxy variables that perform well in following the movements of a phenomenon of interest. That is, performance is assessed in terms of similarity to other available figures measuring the same or an analogous thing. In the same way, the performance of Big Data could be evaluated in the first instance by similarity or agreement with other available measures, and not by a sampling error criterion. This makes sense from a data mining perspective, where the equivalent of fitting a model is tuning an algorithm so that it fits the real world.
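The two ideas above, that a stable bias cancels when changes are computed and that a proxy can be judged by its agreement with an existing measure, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the benchmark values are invented, and the Big Data source is assumed to capture a constant, unknown share of the phenomenon (COVERAGE). Real sources will drift, so the cancellation is only approximate in practice.

```python
from math import sqrt


def growth_rates(series):
    """Period-on-period relative changes of a series."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical benchmark series (for example an official indicator); invented values.
benchmark = [100.0, 104.0, 101.0, 110.0, 121.0]

# Assumption: the Big Data source sees a constant 40% of the phenomenon.
COVERAGE = 0.4
proxy = [COVERAGE * value for value in benchmark]

# The levels are biased by the unknown coverage factor...
level_errors = [p - b for p, b in zip(proxy, benchmark)]

# ...but a constant factor cancels in relative changes.
proxy_changes = growth_rates(proxy)
benchmark_changes = growth_rates(benchmark)
max_change_gap = max(abs(p - b) for p, b in zip(proxy_changes, benchmark_changes))

# Agreement criterion: correlation between the two series of changes.
agreement = pearson(proxy_changes, benchmark_changes)

print("largest level error:", min(level_errors))
print("largest gap between change estimates:", max_change_gap)
print("correlation of changes:", agreement)
```

Under the constant-coverage assumption the level estimates are far from the benchmark, while the change estimates coincide up to rounding and their correlation is essentially 1. With a real source the correlation would be lower, and it is this kind of agreement figure, rather than a sampling error, that would be reported as the quality measure.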

When many different statistical figures are produced from different and independent Big Data sources following these principles, the coherence and agreement among them may be an argument in support of the validity and representativeness of the whole system.

3. Although Big Data may not be structured as statistical data are, they may have the same type of structure (or lack of structure) across countries. This would have the advantage of making harmonization between countries unnecessary, which is of special interest for transnational statistics.

4. Other concerns about Big Data seem similar to those raised by statistical sources, such as the appearance of diverse types of problems or errors: noise, incompleteness, missing data, reporting errors, outliers... Data editing (cleaning, checking, imputing...) is a time- and resource-consuming activity in traditional statistical processing, and similar methods could be used to deal with these problems. It is likely that some errors (reporting, incompleteness...) occur less often in Big Data because there is no human intervention at their origin, although machine or system failures may also happen, producing other errors. A new type of problem that does not occur with statistical sources but may emerge in Big Data is imprecision (for instance, vague or categorical measures such as high, medium, low...): it may be tackled using other data mining tools such as fuzzy and rough sets. Some data mining procedures are interesting because they are robust, in the sense of being tolerant of erroneous data or of departures from data assumptions. In any case, all these methods should be developed on an ad hoc basis.

A final remark is that the opportunity offered by Big Data rests on the reduction of the burden on respondents and on the fact that the data may sometimes be obtained quickly. Hence, before engaging in a complex process to make Big Data a reliable source for statistics, a careful analysis of the potential gains should be made.
Put similarly, the reductions in costs and burden and the gains in timeliness provided by the use of Big Data may offset a possible decrease in accuracy or in quality in general.

A possible road map to exploit Big Data

This section sketches a few actions that Eurostat may promote as a first step towards exploiting Big Data sources. These actions are to:

1. Identify possible Big Data sources. These may be private or public, internet or non-internet; the interest lies especially in sources that have international scope and are suitable for producing indicators of trends or changes in different economic and social activities. Access to these data and possible problems (confidentiality, ownership...) should be studied as well.

2. Gather information on practices in European countries regarding the use of Big Data for producing statistics, classifying the methods and tools used and the outcomes produced. This would provide information on alternative approaches.

3. Launch pilot research projects to produce statistical figures from the identified Big Data sources. An example of a possible research exercise using a non-internet Big Data source is the production of an indicator of the evolution of household budgets from the transaction records of a department store. It may be obtained by using association rules as a first step and then computing weighted indices. These can be checked by comparison with the outcomes of alternative sources such as the annual HBS.

References

[1] Big Data: Lessons from the leaders, The Economist Intelligence Unit Limited.
[2] André B. Bondi, Characteristics of scalability and their impact on performance, Proceedings of the 2nd international workshop on Software and performance, Ottawa, Ontario, Canada, 2000, ISBN X.
[3]
[4]
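As an illustration of the pilot exercise in item 3 of the road map, the weighted-index step might look like the sketch below. It is not a description of an actual Eurostat procedure. The transaction records, the category labels and the weights are all hypothetical, and the association-rule step is skipped by assuming that each transaction has already been assigned to an expenditure category. The weights are assumed to come from an external source rather than from the store itself, since the store's sales mix need not match household budgets.

```python
from collections import defaultdict

# Hypothetical transaction records: (period, category, amount spent).
transactions = [
    ("2012", "food", 50.0), ("2012", "food", 30.0), ("2012", "clothing", 40.0),
    ("2013", "food", 88.0), ("2013", "clothing", 30.0), ("2013", "clothing", 6.0),
]

# Assumed category weights (summing to 1), taken from an external source.
weights = {"food": 0.75, "clothing": 0.25}


def spend_by_period_and_category(records):
    """Total amount spent for each (period, category) pair."""
    totals = defaultdict(float)
    for period, category, amount in records:
        totals[(period, category)] += amount
    return totals


def weighted_index(records, weights, base_period, period):
    """Weighted average of category expenditure relatives, base period = 100."""
    totals = spend_by_period_and_category(records)
    return 100.0 * sum(
        weight * totals[(period, category)] / totals[(base_period, category)]
        for category, weight in weights.items()
    )


index_2013 = weighted_index(transactions, weights, "2012", "2013")
print(round(index_2013, 1))
```

With these invented figures, food spending rises by 10% and clothing spending falls by 10%, so the index for 2013 is about 105 against a base of 100 in 2012. A category with no base-period spending would raise a division error, which a real implementation would have to handle. The resulting series is the kind of trend indicator that could then be compared with the annual HBS.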


More information

ISSN: 2321-7782 (Online) Volume 3, Issue 7, July 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 3, Issue 7, July 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 7, July 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1

Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1 Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1 Introduction Electronic Commerce 2 is accelerating dramatically changes in the business process. Electronic

More information

Automatic Document Categorization A Hummingbird White Paper

Automatic Document Categorization A Hummingbird White Paper Automatic Document Categorization A Hummingbird White Paper Automatic Document Categorization While every attempt has been made to ensure the accuracy and completeness of the information in this document,

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining BUSINESS INTELLIGENCE Bogdan Mohor Dumitrita 1 Abstract A Business Intelligence (BI)-driven approach can be very effective in implementing business transformation programs within an enterprise framework.

More information

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational

More information

Intrusion Detection System using Log Files and Reinforcement Learning

Intrusion Detection System using Log Files and Reinforcement Learning Intrusion Detection System using Log Files and Reinforcement Learning Bhagyashree Deokar, Ambarish Hazarnis Department of Computer Engineering K. J. Somaiya College of Engineering, Mumbai, India ABSTRACT

More information

Delivering Smart Answers!

Delivering Smart Answers! Companion for SharePoint Topic Analyst Companion for SharePoint All Your Information Enterprise-ready Enrich SharePoint, your central place for document and workflow management, not only with an improved

More information

Higher Business ROI with Optimized Prediction

Higher Business ROI with Optimized Prediction Higher Business ROI with Optimized Prediction Yottamine s Unique and Powerful Solution Forward thinking businesses are starting to use predictive analytics to predict which future business events will

More information

Data Mining for Loyalty Based Management

Data Mining for Loyalty Based Management Data Mining for Loyalty Based Management Petra Hunziker, Andreas Maier, Alex Nippe, Markus Tresch, Douglas Weers, Peter Zemp Credit Suisse P.O. Box 100, CH - 8070 Zurich, Switzerland markus.tresch@credit-suisse.ch,

More information

Big Data-Challenges and Opportunities

Big Data-Challenges and Opportunities Big Data-Challenges and Opportunities White paper - August 2014 User Acceptance Tests Test Case Execution Quality Definition Test Design Test Plan Test Case Development Table of Contents Introduction 1

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Introduction to Quality Assessment

Introduction to Quality Assessment Introduction to Quality Assessment EU Twinning Project JO/13/ENP/ST/23 23-27 November 2014 Component 3: Quality and metadata Activity 3.9: Quality Audit I Mrs Giovanna Brancato, Senior Researcher, Head

More information

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013 ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION, Fuel Consulting, LLC May 2013 DATA AND ANALYSIS INTERACTION Understanding the content, accuracy, source, and completeness of data is critical to the

More information

The CPA Way 5 - Conclude and Advise

The CPA Way 5 - Conclude and Advise The CPA Way 5 - Conclude and Advise This document focuses on Conclude and Advise, the fourth part of The CPA Way, as shown in the following diagram. For an overview of Conclude and Advise, see the video

More information

Data Mining and Analytics in Realizeit

Data Mining and Analytics in Realizeit Data Mining and Analytics in Realizeit November 4, 2013 Dr. Colm P. Howlin Data mining is the process of discovering patterns in large data sets. It draws on a wide range of disciplines, including statistics,

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

Data Mining Applications in Manufacturing

Data Mining Applications in Manufacturing Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent

More information

The Future of Content Aggregation 2011

The Future of Content Aggregation 2011 The Future of Content Aggregation 2011 The Future of Content Aggregation 2011 "Traditional" print media is under pressure; fuelled by the perceived commoditisation of news on the internet, falling print

More information

DESKTOP BASED RECOMMENDATION SYSTEM FOR CAMPUS RECRUITMENT USING MAHOUT

DESKTOP BASED RECOMMENDATION SYSTEM FOR CAMPUS RECRUITMENT USING MAHOUT Journal homepage: www.mjret.in ISSN:2348-6953 DESKTOP BASED RECOMMENDATION SYSTEM FOR CAMPUS RECRUITMENT USING MAHOUT 1 Ronak V Patil, 2 Sneha R Gadekar, 3 Prashant P Chavan, 4 Vikas G Aher Department

More information

Fight fire with fire when protecting sensitive data

Fight fire with fire when protecting sensitive data Fight fire with fire when protecting sensitive data White paper by Yaniv Avidan published: January 2016 In an era when both routine and non-routine tasks are automated such as having a diagnostic capsule

More information

Chapter ML:XI. XI. Cluster Analysis

Chapter ML:XI. XI. Cluster Analysis Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster

More information

Data Mining Algorithms and Techniques Research in CRM Systems

Data Mining Algorithms and Techniques Research in CRM Systems Data Mining Algorithms and Techniques Research in CRM Systems ADELA TUDOR, ADELA BARA, IULIANA BOTHA The Bucharest Academy of Economic Studies Bucharest ROMANIA {Adela_Lungu}@yahoo.com {Bara.Adela, Iuliana.Botha}@ie.ase.ro

More information

Improving quality through regular reviews:

Improving quality through regular reviews: Implementing Regular Quality Reviews at the Office for National Statistics Ria Sanderson, Catherine Bremner Quality Centre 1, Office for National Statistics, UK Abstract There is a requirement under the

More information

Data Mining System, Functionalities and Applications: A Radical Review

Data Mining System, Functionalities and Applications: A Radical Review Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially

More information

Towards a Measurement Agenda for Innovation

Towards a Measurement Agenda for Innovation Towards a Measurement Agenda for Innovation Towards a Measurement Agenda for Innovation builds on the OECD s half-century of indicator development and the challenge presented by the broad horizontal focus

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

Connecting library content using data mining and text analytics on structured and unstructured data

Connecting library content using data mining and text analytics on structured and unstructured data Submitted on: May 5, 2013 Connecting library content using data mining and text analytics on structured and unstructured data Chee Kiam Lim Technology and Innovation, National Library Board, Singapore.

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Energy Industry Cybersecurity Report. July 2015

Energy Industry Cybersecurity Report. July 2015 Energy Industry Cybersecurity Report July 2015 Energy Industry Cybersecurity Report INTRODUCTION Due to information sharing concerns, energy industry cybersecurity information is not readily available.

More information

Quality Control of Web-Scraped and Transaction Data (Scanner Data)

Quality Control of Web-Scraped and Transaction Data (Scanner Data) Quality Control of Web-Scraped and Transaction Data (Scanner Data) Ingolf Boettcher 1 1 Statistics Austria, Vienna, Austria; ingolf.boettcher@statistik.gv.at Abstract New data sources such as web-scraped

More information

Tapping the benefits of business analytics and optimization

Tapping the benefits of business analytics and optimization IBM Sales and Distribution Chemicals and Petroleum White Paper Tapping the benefits of business analytics and optimization A rich source of intelligence for the chemicals and petroleum industries 2 Tapping

More information

Online Ensembles for Financial Trading

Online Ensembles for Financial Trading Online Ensembles for Financial Trading Jorge Barbosa 1 and Luis Torgo 2 1 MADSAD/FEP, University of Porto, R. Dr. Roberto Frias, 4200-464 Porto, Portugal jorgebarbosa@iol.pt 2 LIACC-FEP, University of

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

IMPLEMENTATION OF THE NORTH AMERICAN INDUSTRY CLASSIFICATION SYSTEM (NAICS) IN MEXICO

IMPLEMENTATION OF THE NORTH AMERICAN INDUSTRY CLASSIFICATION SYSTEM (NAICS) IN MEXICO IMPLEMENTATION OF THE NORTH AMERICAN INDUSTRY CLASSIFICATION SYSTEM (NAICS) IN MEXICO Eva Castillo Navarrete INSTITUTO NACIONAL DE ESTADÍSTICA Y GEOGRAFÍA (INEGI) Dirección General de Estadísticas Económicas

More information

2. Metadata update 2.1 Metadata last certified 07 August 2013 2.2 Metadata last posted 07 August 2013 2.3 Metadata last update 07 August 2013

2. Metadata update 2.1 Metadata last certified 07 August 2013 2.2 Metadata last posted 07 August 2013 2.3 Metadata last update 07 August 2013 1. Contact 1.1 Contact organisation STATEC 1.2 Contact organisation unit Unit SOC4: Price statistics 1.5 Contact mail address 13, rue Erasme L-1468 Luxembourg 2. Metadata update 2.1 Metadata last certified

More information

The University of Adelaide Business School

The University of Adelaide Business School The University of Adelaide Business School MBA Projects Introduction There are TWO types of project which may be undertaken by an individual student OR a team of up to 5 students. This outline presents

More information

Questionnaire Design. Outline. Introduction

Questionnaire Design. Outline. Introduction Questionnaire Design Jean S. Kutner, MD, MSPH University of Colorado Health Sciences Center Outline Introduction Defining and clarifying survey variables Planning analysis Data collection methods Formulating

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information