REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION
Pilar Rey del Castillo
May 2013

Introduction

The exploitation of the vast amount of data originating from ICT tools and referring to a wide variety of economic and social activities represents both a challenge and an opportunity for official statistics. It remains to be discovered, however, how to extract significant value for the production of statistical figures from the diversity of data available. This paper proposes to start exploring some Big Data sources by following a straightforward path to achieve results. The next section provides a preliminary insight into the issues at stake, and the following one presents some ideas for starting a road map from Eurostat.

Some defining features of Big Data

The most conspicuous feature of Big Data, as compared with traditional statistical sources, is that they do not come from a previous design aimed at obtaining specific statistics, but become available as traces of human activity. This attribute makes it difficult to use traditional statistical methods and tools such as probabilistic sampling, statistical classifications and so on, rendering the Generic Statistical Business Process Model useless and inapplicable. The outcomes of a recent survey conducted among executives from a wide range of industries around the world are illuminating: although a large share of the respondents agreed that data had become an important factor for their business, many companies were struggling with basic aspects of data management and were still attempting to exploit data effectively [1]. This confirms that extracting useful information from this kind of data is a non-obvious and rather difficult task that should be carefully planned.
Although the appearance of some statistical figures based on Big Data may suggest that the information is found directly in the data, ready to be published, one should bear in mind the huge amounts of data that have previously been analysed and processed to achieve these results. Nevertheless, the appeal of a potential reduction in respondent burden and costs, together with the general framework of improving the productivity of the European Statistical System (ESS), puts increasing pressure on the use of Big Data as sources of statistical information.

Experience with another source that shares with Big Data the feature of not being designed for statistical purposes, namely administrative data, may illuminate the road map by comparing the features the two have in common and those that set them apart. The main aspects of administrative data that we are interested in considering are the following:
a) Methods to obtain statistical information from administrative data usually depend on the specific data, so it is difficult to establish general rules or to use generic production process models.
b) Administrative data are not structured as statistical data are; that is, they do not use statistical classifications and definitions. They nevertheless show a certain structure related to the purpose for which they were created. This means that some tasks of translating, linking or harmonizing the structures (units, definitions, classifications...) always have to be carried out.
c) Sampling procedures are not used to obtain the reporting units, but there is frequently some idea of their representativeness with respect to the population of interest (sometimes all the population units are included).
d) The volume of administrative data is not usually a problem, and they can be treated with the statistical procedures used with other typical sources.
e) The ways in which they are increasingly being used to produce statistical figures can be classified as (i) totally replacing statistical sources, (ii) partially replacing statistical sources, completing the information by means of record linkage, matching or other procedures, and (iii) providing completely new statistical figures that may complement the available statistical information from other perspectives. The first two ways may in theory result in significant reductions in costs and respondent burden, but they frequently imply new tasks of translating, linking or harmonizing, which are not necessary when completely new statistical figures are produced. An example of this last case is the figures on registered unemployment.

As for Big Data, and concerning the same corresponding features:

a) Due to the heterogeneity of the Big Data available, methods to produce statistical information have to be developed ad hoc for each case, exactly as in the case of administrative data.
b) Some Big Data have a certain structure related to the source of information, and some are just unstructured text strings. Good metadata are not usually available, and in most cases the tasks needed to harmonize or translate the data to statistical structures would appear to be enormous.
c) Apart from not using sampling procedures, Big Data frequently come from private companies, and their representativeness and coverage of the populations of interest for official statistics are difficult to assess.
d) The name Big Data refers precisely to the huge volume. This dimension has an impact on storage and processing, which frequently fall outside the scope of traditional statistical tools.
e) The way Big Data could be used to produce statistical figures raises a crucial issue. For the reasons explained in the previous points, it does not seem easy to find Big Data able to totally or partially replace statistical sources in the short term, and following the path in this direction may be too expensive in time and resources. Thus, a sound approach would be to start searching for sources that could provide completely new and independent statistical figures, not adapted to traditional statistical structures but offering new perspectives. For example, instead of finding sources to substitute for the Household Budget Survey (HBS), one could try to build indicators of its trends over time.

When improvements in this area are achieved, the new set of statistics available will provide a valuable basis for re-designing the products and the production process of official statistics.

There may be opportunities to tackle the specific problems of Big Data by using suitable tools:

1. An apparently critical problem is the volume of the Big Data available: there is a need to move away from exclusive dependence on statistical methods that cannot handle this volume of information and to adopt a more
diverse set of tools. This can be addressed through the use of algorithms specially developed for this goal, such as data mining methods. These algorithms have the computational efficiency required and are scalable, that is, they have the ability to handle a growing amount of work in a capable manner, or to be enlarged to accommodate that growth [2]. The state of the art provides a great variety of data mining tools for different objectives: classification, clustering, regression, association, feature extraction, etc. A first stage of exploration using data mining procedures should usually be carried out to learn about the unknown data structure and the possible outcomes, and its results later combined with traditional statistical procedures. The type of Big Data and its form determine the type of data mining tool to be used. Thus the statistical production process from Big Data should have an exploratory analysis as its first step. A combination of data mining and traditional statistical procedures may follow to produce the best results.

2. Another important concern is the representativeness and validity of the statistics produced. The use of probabilistic sampling in traditional statistics provides a theoretical framework that ensures confidence in the figures produced, with accuracy expressed in terms of sampling errors. Most of the Big Data available cannot be adapted to this framework, and other procedures have to be devised. This seems to be an important weakness of Big Data use, and efforts should be focused on it. Meanwhile, experiences of successful uses of Big Data could be investigated in order to follow a similar approach. Two well-known examples are briefly considered here. The first refers to the estimates of the incidence of flu in different countries and regions around the world obtained from Google searches for flu-related topics [3]. These estimates have been found to match traditional flu activity indicators very closely.
Similarly, a recent article in BBC News [4] reported that Google searches for finance-related terms may predict moves in markets, and that an investment strategy based on these search volume data between 2004 and 2011 would have made a profit of 326%.

These examples have two important features in common (apart from being Google products) that may help with the problem of representativeness, coverage and validity. The first is that both of them estimate changes or movements over time, and not absolute figures. A well-established statistical principle is that it is more reliable to estimate changes (over time or space) than absolute figures, because some biases and errors cancel out when the change is computed. The first attempts to use Big Data should therefore perhaps be directed at producing estimates of changes or evolutions.

The other relevant feature shared by both examples is the criterion used to evaluate the results. What is estimated are proxy variables that perform well in following the movements of a phenomenon of interest. That is, performance is assessed in terms of similarity to other available figures measuring the same or an analogous thing. In the same way, the performance of Big Data could be evaluated in the first instance by its similarity or agreement with other available measures, and not by a sampling error criterion. This makes sense from a data mining perspective, where the equivalent of fitting a model is tuning an algorithm so that it fits the real world.
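The two features just described can be illustrated with a small numerical sketch. The figures below are invented for illustration only: a hypothetical benchmark series from a traditional source, and a hypothetical Big Data proxy that covers roughly 35% of the phenomenon and carries some noise. Under these assumptions the proxy is badly biased in levels, yet its period-on-period changes stay close to those of the benchmark, and their agreement can be summarised with a correlation coefficient rather than a sampling error.

```python
import math

def relative_changes(series):
    """Period-on-period relative changes of a series of positive levels."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical benchmark levels from a traditional statistical source.
benchmark = [100.0, 104.0, 101.0, 108.0, 112.0, 110.0]

# Hypothetical proxy: about 35% coverage (a multiplicative bias) plus small noise.
proxy = [35.2, 36.3, 35.45, 37.6, 39.3, 38.5]

# Relative gap in levels: large, because coverage is far from complete.
level_gaps = [abs(p - b) / b for p, b in zip(proxy, benchmark)]

# Relative changes: the constant coverage factor cancels out.
benchmark_changes = relative_changes(benchmark)
proxy_changes = relative_changes(proxy)
change_gaps = [abs(p - b) for p, b in zip(proxy_changes, benchmark_changes)]

# Agreement criterion: similarity of movements to the benchmark.
agreement = pearson(proxy_changes, benchmark_changes)

print("largest gap in levels:  ", round(max(level_gaps), 3))
print("largest gap in changes: ", round(max(change_gaps), 4))
print("correlation of changes: ", round(agreement, 3))
```

The cancellation is exact only for a constant multiplicative bias; if coverage drifts over time, the drift shows up in the estimated changes, which is one reason for monitoring agreement with a benchmark on a continuing basis.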
When many different statistical figures are produced from different and independent Big Data sources following these principles, the coherence and agreement among them may be an argument supporting the validity and representativeness of the whole system.

3. Although Big Data may not be structured as statistical data are, they may have the same type of structure (or lack of structure) across countries. This would have the advantage of making the process of harmonization between countries unnecessary, which is of special interest for transnational statistics.

4. There are other concerns about Big Data that seem similar to those arising with statistical sources, such as the appearance of diverse types of problems or errors: noise, incompleteness, missing data, reporting errors, outliers, etc. Data editing (cleaning, checking, imputing, etc.) is a time- and resource-consuming activity in traditional statistical processing, and similar methods could be used to deal with these problems. It is likely that some errors (reporting, incompleteness, etc.) occur less often in Big Data because there is no human intervention at the origin, although machine or system failures may also happen, producing other errors. A new type of problem that does not occur with statistical sources but may emerge in Big Data is imprecision (for instance, vague or categorical measures such as high, medium, low...): it may be tackled using other data mining tools such as fuzzy and rough sets. Some data mining procedures are interesting because they are robust, in the sense of being tolerant of erroneous data or departures from data assumptions. In any case, all these methods have to be developed on an ad hoc basis.

A final remark is that the opportunity offered by Big Data rests on the reduction of respondent burden and on the fact that the data may sometimes be obtained quickly. Hence, before engaging in a complex process to turn Big Data into a reliable source for statistics, a careful analysis of the potential gains should be made.
Or, put another way, the reductions in costs and burden and the gains in timeliness provided by the use of Big Data may offset a possible decrease in accuracy or in quality in general.

A possible road map to exploit Big Data

This section sketches out a few actions that Eurostat may promote as a first step to exploit Big Data sources. These actions are to:

1. Identify possible Big Data sources. These may be private or public, internet or non-internet, with particular interest in sources that have international scope and are appropriate for producing indicators of trends or changes in different economic and social activities. Access to these data and the possible problems (confidentiality, ownership, etc.) should be studied as well.

2. Gather information on practices in European countries regarding the use of Big Data for producing statistics, classifying the methods and tools and the outcomes produced. This would provide information on alternative approaches.
3. Launch pilot research projects to produce statistical figures from the identified Big Data sources. An example of a possible research exercise using a non-internet Big Data source is the production of an indicator of the evolution of household budgets from the transaction records of a department store. It may be obtained by using association rules as a first step and later computing weighted indices. These can be checked by comparing them with the outcomes of alternative sources such as the annual HBS.

References

[1] Big Data: Lessons from the leaders, The Economist Intelligence Unit Limited,
[2] André B. Bondi, Characteristics of scalability and their impact on performance, Proceedings of the 2nd international workshop on Software and performance, Ottawa, Ontario, Canada, 2000, ISBN X.
[3]
[4]
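As an illustration of the pilot exercise mentioned in action 3 of the road map, the following toy sketch shows the two named techniques side by side: simple pairwise association rules computed from transaction baskets, and a weighted index of expenditure relatives. The paper does not specify how the rules would feed into the index, so no such link is assumed here. All data, category names, weights and the `min_support` threshold are hypothetical, and a real exercise would need scalable tooling rather than this in-memory version.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))           # ignore repeated items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                   # share of baskets containing both items
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules

def weighted_index(base, current, weights):
    """Weighted mean of expenditure relatives (current / base), scaled so base = 100."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[k] * current[k] / base[k] for k in weights)
    return 100.0 * weighted_sum / total_weight

# Hypothetical baskets of product categories from store transaction records.
transactions = [
    ["food", "clothing"],
    ["food"],
    ["food", "clothing", "household"],
    ["clothing", "household"],
    ["food", "household"],
]
rules = pair_rules(transactions)
confidence = {(a, b): conf for a, b, _, conf in rules}

# Hypothetical expenditure per category in a base and a current period,
# with hypothetical category weights (for example, taken from an external source).
base = {"food": 200.0, "clothing": 100.0, "household": 50.0}
current = {"food": 210.0, "clothing": 90.0, "household": 55.0}
weights = {"food": 0.5, "clothing": 0.3, "household": 0.2}
index = weighted_index(base, current, weights)

print("confidence food -> clothing:", confidence[("food", "clothing")])
print("weighted index:", round(index, 1))
```

An index built this way could then be compared over time with the corresponding HBS figures, in line with the agreement criterion discussed earlier.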
More informationA Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia
More informationChapter 8: Quantitative Sampling
Chapter 8: Quantitative Sampling I. Introduction to Sampling a. The primary goal of sampling is to get a representative sample, or a small collection of units or cases from a much larger collection or
More informationWHITE PAPER. Five Steps to Better Application Monitoring and Troubleshooting
WHITE PAPER Five Steps to Better Application Monitoring and Troubleshooting There is no doubt that application monitoring and troubleshooting will evolve with the shift to modern applications. The only
More informationPlanning and Writing Essays
Planning and Writing Essays Many of your coursework assignments will take the form of an essay. This leaflet will give you an overview of the basic stages of planning and writing an academic essay but
More informationBig Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
More informationImproving quality through regular reviews:
Implementing Regular Quality Reviews at the Office for National Statistics Ria Sanderson, Catherine Bremner Quality Centre 1, Office for National Statistics, UK Abstract There is a requirement under the
More informationA Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationChapter ML:XI. XI. Cluster Analysis
Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster
More informationStatistical Challenges with Big Data in Management Science
Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision
More informationCHAPTER - 5 CONCLUSIONS / IMP. FINDINGS
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using
More informationUSES OF CONSUMER PRICE INDICES
USES OF CONSUMER PRICE INDICES 2 2.1 The consumer price index (CPI) is treated as a key indicator of economic performance in most countries. The purpose of this chapter is to explain why CPIs are compiled
More informationECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam
ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open
More informationHow to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
More informationPredict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationQuality Control of Web-Scraped and Transaction Data (Scanner Data)
Quality Control of Web-Scraped and Transaction Data (Scanner Data) Ingolf Boettcher 1 1 Statistics Austria, Vienna, Austria; ingolf.boettcher@statistik.gv.at Abstract New data sources such as web-scraped
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationETPL Extract, Transform, Predict and Load
ETPL Extract, Transform, Predict and Load An Oracle White Paper March 2006 ETPL Extract, Transform, Predict and Load. Executive summary... 2 Why Extract, transform, predict and load?... 4 Basic requirements
More informationIntroduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI
Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, University of Indonesia Objectives
More informationThe University of Adelaide Business School
The University of Adelaide Business School MBA Projects Introduction There are TWO types of project which may be undertaken by an individual student OR a team of up to 5 students. This outline presents
More informationWHERE TO START? A PRELIMINARY DATA QUALITY CHECKLIST FOR EMERGENCY MEDICAL SERVICES DATA (Practice - Oriented Paper)
WHERE TO START? A PRELIMINARY DATA QUALITY CHECKLIST FOR EMERGENCY MEDICAL SERVICES DATA (Practice - Oriented Paper) Jennifer Long Prehospital and Transport Medicine Research Program Sunnybrook and Women
More informationThe skill content of occupations across low and middle income countries: evidence from harmonized data
The skill content of occupations across low and middle income countries: evidence from harmonized data Emanuele Dicarlo, Salvatore Lo Bello, Sebastian Monroy, Ana Maria Oviedo, Maria Laura Sanchez Puerta
More informationEnhancing Sales and Operations Planning with Forecasting Analytics and Business Intelligence WHITE PAPER
Enhancing Sales and Operations Planning with Forecasting Analytics and Business Intelligence WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Analytics.... 1 Forecast Cycle Efficiencies...
More informationA Process-focused Approach to Improving Business Performance
A Process-focused Approach to Improving Business Performance Richard B Davis, BSc(Eng), CEng, MIEE, AKC Process Improvement Consultant, AXA AXA Centre, PO Box 1810, Bristol BS99 5SN Telephone: 0117 989
More informationStatistics on E-commerce and Information and Communication Technology Activity
Assessment of compliance with the Code of Practice for Official Statistics Statistics on E-commerce and Information and Communication Technology Activity (produced by the Office for National Statistics)
More informationManaging Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
More informationIndex Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
More informationSECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY
SECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY Shirley Radack, Editor Computer Security Division Information Technology Laboratory National Institute
More information