SEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS
Irwan Bastian, Lily Wulandari, I Wayan Simri Wicaksana
{bastian, lily, wayan}@staff.gunadarma.ac.id
Program Doktor Teknologi Informasi, Universitas Gunadarma

Abstract

In the business world, incorrect data can be costly. Many companies use customer information databases that record data such as contact information, addresses, and preferences. If, for instance, the addresses are inconsistent, the company will suffer the cost of resending mail or even losing customers. The main issue in this paper is how to enhance data cleaning by considering syntactic, structural and semantic problems, and how to find an appropriate approach to resolving heterogeneity in data cleaning for data integration, especially for data warehouses. In this paper, we propose an ontology-based data cleaning framework. In the framework, data cleaning requires a set of ontologies describing the domains, represented in an ontology representation language. Using the ontology-based approach, we are able to clean data of not only syntactic errors but also some classes of semantic errors.

Keywords: semantic heterogeneity, ontology, data cleaning

1. Introduction

Data cleaning is a process used to identify inaccurate, incomplete, or unreasonable data and then improve its quality by correcting the detected errors and omissions. The process may include format checks, completeness checks, reasonableness checks, limit checks, review of the data to identify outliers (geographic, statistical, temporal or environmental) or other errors, and assessment of the data by subject area experts (e.g. taxonomic specialists). These processes usually result in flagging, documenting and subsequent checking and correction of suspect records. Validation checks may also involve checking for compliance with applicable standards, rules, and conventions.

In the business world, incorrect data can be costly.
Many companies use customer information databases that record data like contact information, addresses, and preferences. If for instance the addresses are inconsistent, the company will suffer the cost of resending mail or even losing customers. No matter how efficient the process of data entry, errors will still occur and therefore data validation and correction cannot be ignored. Error detection, validation and cleaning do have key roles to play,
especially with legacy data (e.g. collected banking data), and thus both error prevention and data cleaning should be incorporated in an organisation's data management policy. One important product of data cleaning is the identification of the basic causes of the errors detected, and the use of that information to improve the data entry process so that those errors do not recur.

For data cleaning in general, three comprehensive systems have been developed: AJAX, Potter's Wheel and Intelliclean. Let us briefly describe the main features of each.

AJAX [1][3] provides a declarative framework for describing data cleaning as a series of transformations to data, including the removal of synonymous records. It extends SQL to describe concepts relevant to data transformation [4] and matching. The problem of synonymous records from multiple sources is referred to as the object identity problem [1]. For example, John Smith may be referred to as Smith John or J. Smith. The five atomic data transformations used in the AJAX framework are mapping, matching, clustering, merging, and SQL view.

Potter's Wheel [6] is an interactive data cleaning system with tightly integrated steps for performing transformations and detecting discrepancies. Using a spreadsheet-like interface, a user incrementally specifies a series of transformations to clean the data. The main components of the Potter's Wheel architecture are a data source, a transformation engine, an online reorderer to support interactive scrolling and sorting in the user interface, and an automatic discrepancy detector.

Intelliclean [6] applies three steps to clean data. First, it preprocesses the data to remove abbreviations and standardize data formats. Second, it applies two synonymous record identification rules based on a certainty factor (CF) to detect and match synonymous records. Finally, Intelliclean interacts with the human user to confirm the sets of synonymous records.
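To make the notion of synonymous records concrete, the following minimal Python sketch compares the name variants from the object identity example (John Smith, Smith John, J. Smith). It is an illustrative assumption only, not the matching logic of AJAX or Intelliclean; the function names `tokens_compatible` and `same_person` are hypothetical.

```python
import re


def tokens_compatible(a, b):
    """Two name tokens agree if they are equal, or if one is a
    single-letter initial of the other (e.g. "j" and "john")."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def same_person(name_a, name_b):
    """Order-insensitive comparison of two personal names.

    Hypothetical rule: both names must have the same number of tokens,
    and every token of one name must pair with a distinct compatible
    token of the other.
    """
    tokens_a = re.findall(r"[a-z]+", name_a.lower())
    tokens_b = re.findall(r"[a-z]+", name_b.lower())
    if len(tokens_a) != len(tokens_b):
        return False
    remaining = list(tokens_b)
    for tok in tokens_a:
        match = next((t for t in remaining if tokens_compatible(tok, t)), None)
        if match is None:
            return False
        remaining.remove(match)  # each token may be used only once
    return True


if __name__ == "__main__":
    for candidate in ("Smith John", "J. Smith", "Jane Smith"):
        print(candidate, same_person("John Smith", candidate))
```

A rule this simple will also pair different people who share a surname and an initial, which is one reason a confirmation step by a human user remains useful.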
To allow the user some control over the matching process, recall and precision are measured and presented to the user.

The main issue in this paper is how to enhance data cleaning by considering syntactic, structural and semantic problems, and how to find an appropriate approach to resolving heterogeneity in data cleaning for data integration, especially for data warehouses.

2. Methodology

Data cleaning problems can be roughly divided into single-source and multi-source problems, and into schema-related and instance-related problems. Schema-level problems are, of course, also reflected in
the instances; they can be addressed at the schema level by improved schema design (schema evolution), schema translation and schema integration. Instance-level problems, on the other hand, refer to errors and inconsistencies in the actual data contents which are not visible at the schema level. In this paper we discuss only the multi-source problems.

The problems that occur in single sources get worse when a variety of sources need to be integrated. Each source may contain dirty data, and the data in the sources may be represented differently, overlap or contradict each other. This is because the sources are typically developed, used and managed independently to serve specific needs. This results in a large degree of heterogeneity in data management systems, data models, schema designs and the actual data.

At the schema level, data model and schema design differences are to be addressed by the steps of schema translation and schema integration, respectively. The main problems with respect to schema design are naming and structural conflicts [5]. Naming conflicts arise when the same name is used for different objects (homonyms) or different names are used for the same object (synonyms).

Table 1. Examples of multi-source problems at schema and instance level [7]

Customer (source 1)
CID | Name            | Street      | City           | Sex
11  | Kristen Smith   | 2 Hurley Pl | South Fork, MN |
    | Christian Smith | Hurley St 2 | S Fork MN      | 1

Client (source 2)
Cno | LastName | FirstName | Gender | Address                        | Phone/Fax
24  | Smith    | Christop  | M      | 23 Harley St, Chicago IL,      | /
    | Smith    | Kris L    | F      | 2 Hurley Place, South Fork, MN |

Table 1 shows two data sources. The two sources conflict at both the schema level and the data level. At the schema level, there are the following conflicts:
- Semantic conflicts (synonyms), source 1 ~ source 2:
  Customer ~ Client, Cid ~ Cno, Sex ~ Gender
- Structural conflicts (different representations for names and addresses), source 1 ~ source 2:
  Name ~ {LastName, FirstName}, {Street, City} ~ Address
- Syntactic conflict (coding), source 1 ~ source 2:
  Sex/Gender: 0/1 ~ M/F

In this paper, we propose an ontology-based data cleaning framework. In the framework, data cleaning requires a set of ontologies describing the domains, represented in an ontology representation language. Using the ontology-based approach, we are able to clean data of not only syntactic errors but also some classes of semantic errors. An ontology represents the concepts and their relations that are relevant for a given domain of discourse. It consists of a representational vocabulary with precise definitions of the meanings of the terms of this vocabulary, plus a set of axioms.

The proposed solutions are:
- Semantic conflicts are resolved by implementing an ontology framework that provides a taxonomy, a glossary and similarity measures. The problem is how to develop an appropriate ontology framework for a domain.
- Structural conflicts can also be overcome with the ontology framework, by looking up the IsA (subclass) relations of a class.
- Syntactic conflicts are not yet addressed in this paper.

3. Discussion

Although the three systems (AJAX, Potter's Wheel and Intelliclean) integrate different methods and are effective on some datasets, they cannot clean some semantic errors because they lack domain knowledge support. We propose a method to perform semi-automatic data cleaning based on semantic heterogeneity and ontologies. We give illustrations to clarify our approach.

Our illustration is based on the example conflicts in the previous section. For data integration, Sex and Gender should be recognized as the same column. In this case semantics plays an important role. For illustration, an ontology framework is developed based on the WordNet [8] lexical database, which provides a taxonomy, a glossary and instances. For the problem in Table 1, a solution can be found by measuring similarity: the distance between concepts from the two sources can be calculated using the Wu & Palmer equation [2][9]. The results are as follows:
- Customer ~ Client is 1
- Cid ~ Cno is no answer
- Sex ~ Gender is 1

The similarity value ranges from 0 to 1, where 0 means not similar at all and 1 means identical in meaning. According to the results above, Customer and Client are similar, as are Sex and Gender, but Cid ~ Cno yields no answer because the concepts Cid and Cno do not exist in WordNet. The similarity of other concepts across the sources can be calculated in the same way to decide whether the data can be integrated.

4. Conclusion

A set of methods was presented which addresses the problem of automatic identification of errors in data sets. The methods were implemented, and the results showed that some of the methods could be successfully applied to real-world data, while others need fine-tuning and improvement. Semantic and structural conflicts can be addressed by using an ontology framework. Future work will seek a semi-automatic method for resolving syntactic conflicts. Some papers propose combining artificial intelligence with natural language processing.

References

[1] Bither, Y. Cleaning and Matching of Contact Lists Including Nicknames and Spouses, M.Sc. Thesis, Department of Computer Science, University of Regina, September
[2] Corley, C. and Mihalcea, R. Measuring the Semantic Similarity of Texts, Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 13-18, Ann Arbor, June
[3] Galhardas, H., Florescu, D., Shasha, D., Simon, E., and Saita, C.A. Declarative Data Cleaning: Language, Model, and Algorithms. In Proceedings of the 27th International Conference on Very Large Databases (VLDB 2001), pages , Rome, Italy.
[4] Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann,
[5] Kashyap, V.
and Sheth, A.P.: Semantic and Schematic Similarities between Database Objects: A Context-Based Approach. VLDB Journal 5(4): , 1996.
[6] Low, W.L., Lee, M.L., and Ling, T.W. A Knowledge-Based Approach for Duplicate Elimination in Data Cleaning, Information Systems, 26(8): , Dec
[7] Rahm, E. and Do, H.H. Data Cleaning: Problems and Current Approaches, University of Leipzig, Germany
[8] Sun, K.T., Huang, Y.M., and Liu, M.C. A WordNet-Based Near-Synonyms and Similar-Looking Word Learning System, Educational Technology & Society, 14 (1), , 2011
[9] Slimani, T., Ben Yaghlane, B., and Mellouli, K. A New Similarity Measure based on Edge Counting, World Academy of Science, Engineering and Technology 23, 2006
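As a supplementary illustration of the similarity calculation discussed in Section 3, the sketch below implements the Wu & Palmer measure, sim(c1, c2) = 2 * depth(lcs) / (depth(c1) + depth(c2)), where lcs is the deepest concept that subsumes both c1 and c2. It is a minimal sketch under stated assumptions: the taxonomy `PARENT` and the term table `CONCEPT_OF` are small hypothetical stand-ins, not WordNet, and synonymous terms are simply mapped to the same concept, so they score 1. Terms missing from the table return `None`, mirroring the "no answer" case for Cid and Cno.

```python
# Hypothetical toy taxonomy (NOT WordNet): concept -> parent concept.
PARENT = {
    "entity": None,
    "person": "entity",
    "customer": "person",
    "employee": "person",
    "attribute": "entity",
    "gender": "attribute",
    "address": "attribute",
}

# Hypothetical term table: synonymous column names share one concept.
CONCEPT_OF = {
    "customer": "customer",
    "client": "customer",
    "employee": "employee",
    "sex": "gender",
    "gender": "gender",
    "address": "address",
}


def path_to_root(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(term_a, term_b):
    """Wu & Palmer similarity in (0, 1], or None if a term is unknown."""
    a = CONCEPT_OF.get(term_a.lower())
    b = CONCEPT_OF.get(term_b.lower())
    if a is None or b is None:
        return None  # "no answer": the term is not in the ontology
    ancestors_b = set(path_to_root(b))
    # First shared node on the way up from a is the deepest common subsumer.
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in (("Customer", "Client"), ("Cid", "Cno"), ("Sex", "Gender")):
        print(pair, wu_palmer(*pair))
```

With a real lexical resource, each term may correspond to several senses, and the choice of sense would affect the score; the toy table above sidesteps that by assigning exactly one concept per term.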
