Formal Methods for Preserving Privacy for Big Data Extraction Software
|
|
|
- Sibyl Quinn
- 10 years ago
- Views:
Transcription
1 Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability of information storage media, businesses, government agencies, healthcare professionals, and individuals worldwide have exponentially increased their production and persistence of large amounts of data whether such data are captured as text, images, or sound. As such, it is no surprise that just coining the term Big Data has generated a viral level of energy in industry and the research community. In order to facilitate the benefits of Big Data, users must capture and store their information in systematic and predictable data stores using trusted software that extract, transform and load (ETL) the data. This position paper focuses on challenges associated with the protection of sensitive information that, when mishandled or correlated with other pieces of data, can jeopardize user s privacy. In this paper, we motivate the need for enhanced formal approaches that will enable (1) the creation of a customized, contemporary software lifecycle and framework for big data testing, (2) development of new approaches for automated test generation including approaches that leverage the efforts of crowds/social networks, and (3) development of algorithms for continuously ensuring privacy as ETL applications and data evolve. Introduction Businesses and government agencies are generating and continuously collecting huge amounts of data. The current increased focus on large amounts of data will undoubtedly create opportunities and avenues for understanding the processing of such data across many varying domains. In this position paper, we use the term, Big Data, to define many terabyte and larger repositories of interconnected (sometimes with unknown relationships) data that might be realized as logs, product information, transactions, medical or biological collections. Data can be structured and unstructured and can contain complex types such as text documents, graphs, images, audio and video files. Several examples of Big Data may include repositories containing data such as information captured from social media on the Internet, large databases of genomic information (a domain that we have leveraged via our ongoing projects), or consumer s buying history and logs. Analysis of these huge repositories of data introduces fascinating new opportunities for discovering new patterns, relationships and insights that contribute to different branches of science. Researchers, for example, have analyzed data from over 7 million electronic medical records and used them to illustrate in a powerful visualization the (sometimes surprising) relationships between medical conditions on the basis of the frequency of co occurrences [1]. The Figure 1. Health InfoScape : a visualization of visualized network is called the Health InfoScape (Figure 1). Such visualizations raise the relationships between medical conditions on significant HIPAA concerns [2]. the basis of the frequency of co-occurrences. 1
2 The potential of Big Data comes however with a price; the users privacy is often at risk. Guarantees of conformance to privacy terms and regulations are limited in current Big Data analytics and mining practices. Furthermore, current practices do not ensure that privacy terms are enforced as the data and underlying software evolve. In recent events, the government healthcare website exemplified the challenges faced by Big Data applications. The website requests enrollees to provide personally identifiable information such as date of birth, Social Security number and income information. According to the website's privacy policy, that information is shared among many government agencies, including the IRS, Social Security Administration and Department of Homeland Security. However, the website has been criticized that there have been no guarantees that this information is used solely for the purpose of verifying the enrollee's income, immigration status and healthcare eligibility. Users expect that the information should not be used, shared or stored beyond that purpose. This example and many others [17][18]show that there exists a distinct need to build trust into applications that extract, translate, and load (ETL) this scale of data. Furthermore, these processes must ensure the protection of sensitive information. Developers should be able to verify that their applications comply with privacy agreements and that sensitive information is kept private regardless of changes in the applications and/or privacy regulations. Overview of Big Data Loading and Testing To address these challenges, we have identified a need for new contributions in the areas of formal methods and testing procedures. A 2013 article on Big Data Testing [15] identifies the main testing areas, depicted in Figure 2, on a Hadoop architecture; the defacto standard for Big Data. In our work, we have discovered the need for new paradigms for privacy conformance testing to the four areas of the ETL process as follows: Figure 2. Big Data Architecture and Main Testing Areas [15]. (1) Pre Hadoop Process Validation: This step represents the data loading process. At this step, the privacy specifications define the sensitive pieces of data that can uniquely identify a user or an entity. Privacy terms can also specify which pieces of data can be stored and for how long. At this step, schema restrictions can take place as well. (2) Map Reduce Process Validation: This process manipulates big data assets to efficiently respond to a query. Privacy terms can dictate the minimum number of returned records required to conceal individual values, in addition to constraints on data sharing between different processes. (3) ETL Process Validation: Similar to step (2), warehousing logic need to be verified at this step for compliance with privacy terms. Some data values may be aggregated anonymously or not included in the warehouse if that implies high probability of identifying individuals. (4) Reports Testing: reports are another form of queries, potentially with higher visibility and wider audience. Privacy terms that define the notion of purpose are essential to verify that sensitive data is not reported except for specified uses. 2
3 Challenges for Formal Validation in Loading Big Data Big Data applications have evolved through commercial use but they are still lacking the theoretical rigor to provide confidence and repeatability. Formal approaches, when applied to Big Data and its applications, will ensure the development of trusted systems in terms of compliance with privacy regulations. Through our work, we introduce techniques that leverage the efforts of social networks. Crowdsourcing techniques can help with subjectivity of privacy rules while providing a solution to dealing with the size of data when verifying privacy rules [3][20]. There are several high level research challenges that must be addressed in order to enable our investigations and the implementation of our formalization framework, namely: Formal methods and specifications must be customized to capture the privacy requirements of Big Data. Any formal methods [4][10][12] [13] to address these problems must provide the necessary rigor to specify Big Data privacy regulations, given its unstructured and highly variable nature. The specification needs to capture the sensitivity level of different pieces of data, both individually and combined. It also needs to incorporate the notion of differential privacy where the risk of exposing personal information is probabilistically minimized. At the same time, the formalization language requires reasonable ease of use for application developers. Techniques and tools must be developed that reason [7][8][9] about the specification of rules in order to automatically generate compliance tests. Human input (i.e. crowdsourcing techniques) should be leveraged to efficiently and effectively guide automatic code and rules repair. With the huge amount of data and the risk involved in automatically correcting sensitive data, we believe that leveraging crowds (perhaps of socially linked groups [3][20]) can increase the trust in the data driven applications. We also believe that users input is valuable in defining and refining privacy rules considering its subjective nature. Current privacy preserving mining algorithms for Big Data must facilitate verification of privacy compliance. Given the high volume, and the mix of structured and unstructured data, we need a set of new algorithms for managing and analyzing Big Data privately. These algorithms may build on current privacy preserving data techniques [5][6][11] [14][19]. Leveraging the appropriate formal specification, we propose a family of algorithms that can (1) comply with the advertised privacy regulations and (2) automatically adapt to changes in the application logic and the privacy rules. New software lifecycle (or customizations to existing contemporary lifecycles) must be created to incorporate the proposed approaches for developing trusted software for Big Data. Privacy considerations must be integrated into popular software development cycle [16] to ensure wide adoption of these approaches. Privacy must be built in as a property to the applications rather than an afterthought. Consequently, there must be a redefinition of the phases of existing software development processed to integrate formalization that define privacy. Subsequently, innovations are required that extend current testing practices and the underlying processes for handling changes in privacy rules and the application logic. Conclusion As Big Data analytics continue to become prevalent in the operations of industry and government organizations, principle methods must ensure privacy at present time and as the data evolve. Aforementioned challenges require research coordination in the general areas of software engineering, data management, and machine learning. 3
4 References [1] At the Intersection of Big Data and Healthcare: What 7.2 Million Medical Records Can Tell Us» CCC Blog. [Online]. Available: the intersection of bigdata and healthcare what 7 2 million medical records can tell us/. [Accessed: 08 Jan 2014]. [2] Summary of the HIPAA Security Rule. [Online]. Available: [Accessed: 04 Jan 2014]. [3] D. Schall, M.B. Blake, and S. Dustdar Programming Human and Software Based Web Services, IEEE Computer, Vol 43, No. 7, pp 82 86, July 2010 [4] E. Sobel and M. R. Clarkson, Formal methods application: An empirical tale of software development, IEEE Transactions on Software Engineering, pp , [5] I. Saleh and M. Eltoweissy, P3ARM t: Privacy Preserving Protocol for Association Rule Mining with t Collusion Resistance, Journal of Computers, vol. 2, no. 2, Apr [6] I. Saleh, A. Mokhtar, A. Shoukry, and M. Eltoweissy, P3ARM: Privacy Preserving Protocol for Association Rule Mining, in 2006 IEEE Information Assurance Workshop, 2006, pp [7] I. Saleh, G. Kulczycki, and M. B. Blake, Demystifying Data Centric Web Services, Internet Computing, IEEE, vol. 13, no. 5, pp , [8] I. Saleh, G. Kulczycki, and M. B. Blake, Formal Specification and Verification of Transactional Service Composition, in Services (SERVICES), 2011 IEEE World Congress on, 2011, pp [9] I. Saleh, G. Kulczycki, M.B. Blake, Y. Wei Formal Methods for Data Centric Web Services: A Case Study, International Journal of Services Computing, Vol. No.1, December 2013 I. Saleh, G. Kulczycki, M. B. Blake, and Y. Wei, Formal Methods for Data Centric Web Services: From Model to Implementation, in 2013 IEEE International Conference on Web Services (ICWS), [10] J. M. Wing, A Specifier s Introduction to Formal Methods, Computer, vol. 23, no. 9, pp. 8, 10 22, 24, Sep [11] J. Paranthaman and T. A. A. Victoire, Hybrid techniques for Privacy preserving in Data Mining, Life Science Journal, vol. 10, no. 7s, [12] J. R. Abrial, Formal methods in industry: achievements, problems, future, Shanghai, China, 2006, pp [13] Luqi and J. A. Goguen, Formal Methods: Promises and Problems, IEEE Software, vol. 14, no. 1, pp , Feb [14] M. D. L. Kotak and M. S. Shukla, Protecting Sensitive Rules Based on Classification in Privacy Preserving Data Mining, International Journal of Engineering, vol. 2, no. 11, [15] M. Gudipati, S. Rao, N. D. Mohan, and N. K. Gajja, Big Data: Testing Approach to Overcome Quality Challenges. [16] M.B. Blake "Decomposing Composition: Service Oriented Software Engineers", Special Issue on Realizing Service Centric Software Systems, IEEE Software, Vol. 24, No. 6, pp 68 77, Nov/Dec
5 [17] Narayanan and V. Shmatikov, How to break anonymity of the netflix prize dataset, arxiv preprint cs/ , [18] R. C. Wong, Big Data Privacy, J Inform Tech Softw Eng, vol. 2, p. e114, [19] R. Nix, M. Kantarcioglu, and K. J. Han, Approximate privacy preserving data mining on vertically partitioned data, in Data and Applications Security and Privacy XXVI, Springer, 2012, pp [20] W. Tan, M.B. Blake, I. Saleh, S. Dustdar Social Network Sourced Big Data Analytics, IEEE Internet Computing, Vol. 17, No. 5, pp , Sept/Oct 2013 [21] X. Fu, Conformance verification of privacy policies, in Web Services and Formal Methods, Springer, 2011, pp
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Associate Prof. Dr. Victor Onomza Waziri
BIG DATA ANALYTICS AND DATA SECURITY IN THE CLOUD VIA FULLY HOMOMORPHIC ENCRYPTION Associate Prof. Dr. Victor Onomza Waziri Department of Cyber Security Science, School of ICT, Federal University of Technology,
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Healthcare, transportation,
Smart IT Argus456 Dreamstime.com From Data to Decisions: A Value Chain for Big Data H. Gilbert Miller and Peter Mork, Noblis Healthcare, transportation, finance, energy and resource conservation, environmental
How To Develop Software
Software Engineering Prof. N.L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture-4 Overview of Phases (Part - II) We studied the problem definition phase, with which
BIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya
Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data
Cray: Enabling Real-Time Discovery in Big Data
Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects
NTT DATA Big Data Reference Architecture Ver. 1.0
NTT DATA Big Data Reference Architecture Ver. 1.0 Big Data Reference Architecture is a joint work of NTT DATA and EVERIS SPAIN, S.L.U. Table of Contents Chap.1 Advance of Big Data Utilization... 2 Chap.2
Lightweight Data Integration using the WebComposition Data Grid Service
Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Outline. What is Big data and where they come from? How we deal with Big data?
What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
15.00 15.30 30 XML enabled databases. Non relational databases. Guido Rotondi
Programme of the ESTP training course on BIG DATA EFFECTIVE PROCESSING AND ANALYSIS OF VERY LARGE AND UNSTRUCTURED DATA FOR OFFICIAL STATISTICS Rome, 5 9 May 2014 Istat Piazza Indipendenza 4, Room Vanoni
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Realizing business flexibility through integrated SOA policy management.
SOA policy management White paper April 2009 Realizing business flexibility through integrated How integrated management supports business flexibility, consistency and accountability John Falkl, distinguished
Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules
Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules M.Sangeetha 1, P. Anishprabu 2, S. Shanmathi 3 Department of Computer Science and Engineering SriGuru Institute of Technology
Effective Data Integration - where to begin. Bryte Systems
Effective Data Integration - where to begin Bryte Systems making data work Bryte Systems specialises is providing innovative and cutting-edge data integration and data access solutions and products to
DATA MINING - 1DL105, 1DL025
DATA MINING - 1DL105, 1DL025 Fall 2009 An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht09 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
90% of your Big Data problem isn t Big Data.
White Paper 90% of your Big Data problem isn t Big Data. It s the ability to handle Big Data for better insight. By Arjuna Chala Risk Solutions HPCC Systems Introduction LexisNexis is a leader in providing
"BIG DATA A PROLIFIC USE OF INFORMATION"
Ojulari Moshood Cameron University - IT4444 Capstone 2013 "BIG DATA A PROLIFIC USE OF INFORMATION" Abstract: The idea of big data is to better use the information generated by individual to remake and
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper
Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)
Student Handbook 2015-2016. Master of Information Systems Management (MISM)
Student Handbook 2015-2016 Master of Information Systems Management (MISM) Table of Contents Contents 1 Masters of information systems Management (MISM) Curriculum... 3 1.1 Required Courses... 3 1.2 Specialization
Alcatel-Lucent 8920 Service Quality Manager for IPTV Business Intelligence
Alcatel-Lucent 8920 Service Quality Manager for IPTV Business Intelligence IPTV service intelligence solution for marketing, programming, internal ad sales and external partners The solution is based on
Big Data and Healthcare Payers WHITE PAPER
Knowledgent White Paper Series Big Data and Healthcare Payers WHITE PAPER Summary With the implementation of the Affordable Care Act, the transition to a more member-centric relationship model, and other
Integrated email archiving: streamlining compliance and discovery through content and business process management
Make better decisions, faster March 2008 Integrated email archiving: streamlining compliance and discovery through content and business process management 2 Table of Contents Executive summary.........
Qi Liu Rutgers Business School ISACA New York 2013
Qi Liu Rutgers Business School ISACA New York 2013 1 What is Audit Analytics The use of data analysis technology in Auditing. Audit analytics is the process of identifying, gathering, validating, analyzing,
A Knowledge Management Framework Using Business Intelligence Solutions
www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For
CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University
CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University Given today s business environment, at times a corporate executive
Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success
Developing an MDM Strategy Key Components for Success WHITE PAPER Table of Contents Introduction... 2 Process Considerations... 3 Architecture Considerations... 5 Conclusion... 9 About Knowledgent... 10
Beyond the Data Lake
WHITE PAPER Beyond the Data Lake Managing Big Data for Value Creation In this white paper 1 The Data Lake Fallacy 2 Moving Beyond Data Lakes 3 A Big Data Warehouse Supports Strategy, Value Creation Beyond
BUSINESS INTELLIGENCE AS SUPPORT TO KNOWLEDGE MANAGEMENT
ISSN 1804-0519 (Print), ISSN 1804-0527 (Online) www.academicpublishingplatforms.com BUSINESS INTELLIGENCE AS SUPPORT TO KNOWLEDGE MANAGEMENT JELICA TRNINIĆ, JOVICA ĐURKOVIĆ, LAZAR RAKOVIĆ Faculty of Economics
Reinventing Business Intelligence through Big Data
Reinventing Business Intelligence through Big Data Dr. Flavio Villanustre VP, Technology and lead of the Open Source HPCC Systems initiative LexisNexis Risk Solutions Reed Elsevier LEXISNEXIS From RISK
An Enterprise Framework for Business Intelligence
An Enterprise Framework for Business Intelligence Colin White BI Research May 2009 Sponsored by Oracle Corporation TABLE OF CONTENTS AN ENTERPRISE FRAMEWORK FOR BUSINESS INTELLIGENCE 1 THE BI PROCESSING
Data warehouse and Business Intelligence Collateral
Data warehouse and Business Intelligence Collateral Page 1 of 12 DATA WAREHOUSE AND BUSINESS INTELLIGENCE COLLATERAL Brains for the corporate brawn: In the current scenario of the business world, the competition
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
HOW THE DATA LAKE WORKS
HOW THE DATA LAKE WORKS by Mark Jacobsohn Senior Vice President Booz Allen Hamilton Michael Delurey, EngD Principal Booz Allen Hamilton As organizations rush to take advantage of large and diverse data
MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012
MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
Session 4 Cloud computing for future ICT Knowledge platforms
ITU Workshop on "Future Trust and Knowledge Infrastructure", Phase 1 Geneva, Switzerland, 24 April 2015 Session 4 Cloud computing for future ICT Knowledge platforms Olivier Le Grand, Senior Standardization
A Secure Model for Medical Data Sharing
International Journal of Database Theory and Application 45 A Secure Model for Medical Data Sharing Wong Kok Seng 1,1,Myung Ho Kim 1, Rosli Besar 2, Fazly Salleh 2 1 Department of Computer, Soongsil University,
How To Turn Big Data Into An Insight
mwd a d v i s o r s Turning Big Data into Big Insights Helena Schwenk A special report prepared for Actuate May 2013 This report is the fourth in a series and focuses principally on explaining what s needed
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Cúram Business Intelligence and Analytics Guide
IBM Cúram Social Program Management Cúram Business Intelligence and Analytics Guide Version 6.0.4 Note Before using this information and the product it supports, read the information in Notices at the
Enterprise Data Quality
Enterprise Data Quality An Approach to Improve the Trust Factor of Operational Data Sivaprakasam S.R. Given the poor quality of data, Communication Service Providers (CSPs) face challenges of order fallout,
Cleaned Data. Recommendations
Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110
Big Data: Study in Structured and Unstructured Data
Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 [email protected], [email protected] Abstract With the overlay of digital world, Information is available
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
WHITE PAPER. Social media analytics in the insurance industry
WHITE PAPER Social media analytics in the insurance industry Introduction Insurance is a high involvement product, as it is an expense. Consumers obtain information about insurance from advertisements,
BEYOND BI: Big Data Analytic Use Cases
BEYOND BI: Big Data Analytic Use Cases Big Data Analytics Use Cases This white paper discusses the types and characteristics of big data analytics use cases, how they differ from traditional business intelligence
Voice. listen, understand and respond. enherent. wish, choice, or opinion. openly or formally expressed. May 2010. - Merriam Webster. www.enherent.
Voice wish, choice, or opinion openly or formally expressed - Merriam Webster listen, understand and respond May 2010 2010 Corp. All rights reserved. www..com Overwhelming Dialog Consumers are leading
Research of Smart Distribution Network Big Data Model
Research of Smart Distribution Network Big Data Model Guangyi LIU Yang YU Feng GAO Wendong ZHU China Electric Power Stanford Smart Grid Research Institute Smart Grid Research Institute Research Institute
DATA MINING - 1DL360
DATA MINING - 1DL360 Fall 2013" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/per1ht13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.
Big Data Analytics 1 Priority Discussion Topics What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their
Big Data Analytics Best Practices
1 Big Data Analytics Best Practices Marshall Presser Federal Field CTO Greenplum 2 Big Data Makes the Mainstream 3 WHAT DOES IT TAKE? 4 1. New Applications MADlib 5 2. New Skill Sets -- Data Science 6
7 Best Practices for Speech Analytics. Autonomy White Paper
7 Best Practices for Speech Analytics Autonomy White Paper Index Executive Summary 1 Best Practice #1: Prioritize Efforts 1 Best Practice #2: Think Contextually to Get to the Root Cause 1 Best Practice
MDM and Data Warehousing Complement Each Other
Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There
IBM Software A Journey to Adaptive MDM
IBM Software A Journey to Adaptive MDM What is Master Data? Why is it Important? A Journey to Adaptive MDM Contents 2 MDM Business Drivers and Business Value 4 MDM is a Journey 7 IBM MDM Portfolio An Adaptive
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
JOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,
Top 10 Business Intelligence (BI) Requirements Analysis Questions
Top 10 Business Intelligence (BI) Requirements Analysis Questions Business data is growing exponentially in volume, velocity and variety! Customer requirements, competition and innovation are driving rapid
The Future of Business Analytics is Now! 2013 IBM Corporation
The Future of Business Analytics is Now! 1 The pressures on organizations are at a point where analytics has evolved from a business initiative to a BUSINESS IMPERATIVE More organization are using analytics
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
A Design Technique: Data Integration Modeling
C H A P T E R 3 A Design Technique: Integration ing This chapter focuses on a new design technique for the analysis and design of data integration processes. This technique uses a graphical process modeling
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
locuz.com Big Data Services
locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.
Data Outsourcing based on Secure Association Rule Mining Processes
, pp. 41-48 http://dx.doi.org/10.14257/ijsia.2015.9.3.05 Data Outsourcing based on Secure Association Rule Mining Processes V. Sujatha 1, Debnath Bhattacharyya 2, P. Silpa Chaitanya 3 and Tai-hoon Kim
Healthcare data analytics. Da-Wei Wang Institute of Information Science [email protected]
Healthcare data analytics Da-Wei Wang Institute of Information Science [email protected] Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
Data Mining and Database Systems: Where is the Intersection?
Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: [email protected] 1 Introduction The promise of decision support systems is to exploit enterprise
Agile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
DATAOPT SOLUTIONS. What Is Big Data?
DATAOPT SOLUTIONS What Is Big Data? WHAT IS BIG DATA? It s more than just large amounts of data, though that s definitely one component. The more interesting dimension is about the types of data. So Big
How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6
Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...
COMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Architected Blended Big Data with Pentaho
Architected Blended Big Data with Pentaho A Solution Brief Copyright 2013 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information,
Chapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
Big Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
How To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
Industry Models and Information Server
1 September 2013 Industry Models and Information Server Data Models, Metadata Management and Data Governance Gary Thompson ([email protected] ) Information Management Disclaimer. All rights reserved.
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Introduction to Service Oriented Architectures (SOA)
Introduction to Service Oriented Architectures (SOA) Responsible Institutions: ETHZ (Concept) ETHZ (Overall) ETHZ (Revision) http://www.eu-orchestra.org - Version from: 26.10.2007 1 Content 1. Introduction
The University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
North Highland Data and Analytics. Data Governance Considerations for Big Data Analytics
North Highland and Analytics Governance Considerations for Big Analytics Agenda Traditional BI/Analytics vs. Big Analytics Types of Requiring Governance Key Considerations Information Framework Organizational
