Towards a Semantic Extract-Transform-Load (ETL) framework for Big Data Integration
2014 IEEE International Congress on Big Data

Srividya K Bansal
Dept. of Engineering & Computing Systems, Arizona State University, Mesa, AZ, USA

Abstract: Big Data has become the ubiquitous term for massive collections of datasets that are difficult to process using traditional database and software techniques. Most of this data is inaccessible to users, as we need technology and tools to find, transform, analyze, and visualize data in order to make it consumable for decision-making. One aspect of Big Data research is dealing with the Variety of data, which includes formats such as structured, numeric, unstructured text data, video, audio, stock ticker, etc. Managing, merging, and governing a variety of data is the focus of this paper. This paper proposes a semantic Extract-Transform-Load (ETL) framework that uses semantic technologies to integrate and publish data from multiple sources as open linked data. This includes: creation of a semantic data model to provide a basis for integration and understanding of knowledge from multiple sources; creation of a distributed Web of data using Resource Description Framework (RDF) as the graph data model; and extraction of useful knowledge and information from the combined data using SPARQL as the semantic query language.

Keywords: Big data; Data integration; Ontology; Semantic technologies

I. INTRODUCTION

There has been exponential growth in the volume and availability of data, both structured and unstructured. Big Data has become the ubiquitous term for collections of datasets so large that they are difficult to process using traditional database and software techniques. Big Data may comprise petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records of millions of people - all from different sources (e.g.
Web, sales, customer contact center, social media, mobile data, etc.). The data is typically loosely structured and is often incomplete and inaccessible. Big Data is transforming science, engineering, medicine, healthcare, finance, business, and ultimately society itself. Massive amounts of data are available to be harvested for competitive business advantage, sound government policies, and new insights in a broad array of applications (including healthcare, biomedicine, energy, smart cities, genomics, transportation, etc.). Yet most of this data is inaccessible to users, as we need technology and tools to find, transform, analyze, and visualize data in order to make it consumable for decision-making [1]. The research community also agrees that it is important to engineer Big Data meaningfully [2]. Meaningful data integration in a schema-less and complex Big Data world of databases is a big open challenge.

Big Data research is usually discussed in terms of the three V's: Volume, Velocity, and Variety. Volume concerns the storage of massive amounts of data streaming in from social media, sensors, and machine-to-machine data collection; the determination of relevance within large data volumes; and the use of analytics to create value from relevant data. Velocity concerns the unprecedented speed at which data streams in and the need to react quickly enough to deal with it in near-real time. Variety deals with the various formats in which data arrives (structured, numeric, unstructured text data, video, audio, stock ticker, etc.). Big Data challenges lie not only in storing and managing this variety of data but also in extracting and analyzing consistent information from it. Researchers are working on creating a common conceptual model for the integrated data [3]. Managing, merging, and governing heterogeneous data is an open research challenge that this paper focuses on.

There has been a tremendous increase in published data on the Web.
The Linked Open Data community effort has led to a huge data space with 31 billion RDF triples, as shown in [4]. This data can be used in a number of interesting Web applications, mobile applications, and for analytics. An interesting example application would be a Smart City project that integrates and uses information from various sources such as transportation, weather, social media streams, maps, energy, real estate, policies, crime reports, etc. Government agencies are increasingly making their data accessible through initiatives such as data.gov to promote transparency and economic growth [5]. For example, a traffic jam that emerges due to an unplanned protest may be captured through a Twitter stream but missed when examining weather conditions, event databases, reported roadwork, etc. Additionally, weather sensors in the city tend to miss localized events such as flooding. Combined, however, these views can provide a richer and more complete picture of the state of the city by merging traditional data sources with messy and unreliable social media streams, thereby contributing to smart living, people, environment, economy, mobility, and governance.

We need ways to organize a variety of data such that common things are represented together, while the things that are distinct can be represented as well. This will allow effective and creative use of search/query engines and analytic tools for Big Data, which is absolutely essential to
create smart and sustainable environments. This project focuses on the Variety dimension of Big Data, specifically on creating a semantic extract-transform-load framework for data integration. This paper proposes the use of semantic technologies to connect, link, and load data into a data warehouse. This includes: (i) creation of a semantic data model via ontologies to provide a basis for integration and understanding of knowledge from multiple sources; (ii) creation of integrated semantic data using Resource Description Framework (RDF) as the graph data model [6]; (iii) extraction of useful knowledge and information from the combined web of data using SPARQL as the semantic query language [7]. This paper presents the preliminary implementation and results using a few sample public datasets that provide household travel data, vehicle data, and fuel economy data.

The rest of the paper is organized as follows: Section 2 presents related work in this area. Section 3 presents a motivating scenario showing an application of meaningful data integration. Section 4 presents the conceptual model for a semantic ETL framework. Section 5 presents the system architecture and prototype implementation, followed by conclusions and future work.

II. RELATED WORK

One of the popular approaches to data integration has been Extract-Transform-Load (ETL), as shown in [8], [9]. The authors of this work described a taxonomy of activities in ETL and a framework that uses a workflow approach to design ETL activities. They used a declarative database programming language called LDL to define the semantics of ETL activities. Similarly, other groups have used approaches such as UML and data mapping diagrams for representing ETL activities, quality-metrics-driven design for ETL, and scheduling of ETL activities [10]-[13]. The focus in all of these papers has been on the design of the ETL workflow rather than on generating meaningful/semantic data.
A semantic approach to ETL technologies was proposed in [14], [15]. In this approach semantic technologies are used to enhance the definitions of the ETL activities involved in the process rather than the data itself. A tool that allows semi-automatic definition of inter-attribute semantic mappings, by identifying the parts of the data source schemas that are related to the data warehouse schema, was proposed in [15]. This supported the extraction phase of ETL. The use of semantics there was to facilitate the extraction process and workflow generation with semantic mappings. In contrast, our approach uses ontologies to provide a common vocabulary for the integrated data and generates semantic data as part of the transformation phase of ETL. This semantic data is then loaded into the data store or warehouse for querying and analytics.

There is related work on creating a common conceptual model for data integration to handle heterogeneous data coming from multiple sources. Reference [3] uses process-mining techniques to handle the heterogeneity by computing the mismatch among the data sources being integrated. Work has been done on data abstraction and visualization tools for Big Data [16] that are applicable after the data integration phase. Numerous studies address the analysis of Big Data using algorithms such as the influence-based module mining (IBMM) algorithm [17], online association rule mining [18], graph analytics [19], and a provenance analysis support framework [20].

The method of publishing and linking structured data on the Web is called Linked Data. This data is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and it can be linked to from other data sets as well [4]. Reference [21] uses these interlinked RDF data stores from the Linked Open Data (LOD) cloud and queries them using SPARQL to perform analysis. They use statistical learning predictive models to learn classifiers from the data.

III. MOTIVATING SCENARIO

There has been growing demand for, and interest in, mobile applications for the automotive industry that enhance the driver's experience through personalization, feedback on optimizing vehicle performance, diagnosis and part failure detection, and driver assistance. Most of these applications rely on Big Data that is available to the public through the cloud or as linked open data.

Consider the following scenario. A typical driver, John, gets into his car in the morning and turns on the ignition. A built-in application in the car greets him and, based on the time of day, asks him if he is going to work. John says yes and the app replies that the vehicle performance has been optimized for the trip. The built-in system uses the GIS system, road grade data, and speed limit data to create an optimal velocity profile. As John starts driving and approaches Recker road to turn left, the app informs him about a road repair on a 1-mile stretch of Recker road on his route, lasting until 3pm that day. The app suggests that John continue driving and take the next left on Power road. John follows the suggestion and answers a text message that he receives from his collaborator. As he is answering the text message, his car drifts into the neighboring lane. The app immediately notifies John that he is departing from his lane and John quickly adjusts his driving. As John approaches his workplace he drives towards Green Lot 1, where he usually parks. The app informs John that there are only 2 parking spots open in Green Lot 1. As John is already running 5 minutes late for a meeting, he decides to drive directly to the next parking lot, Green Lot 2, and avoid spending time looking for the 2 empty spots in Lot 1. As John enters Lot 2 and drives towards one of the empty spots, he gets too close to one of the parked cars. The app immediately warns John of a collision.
John quickly adjusts his car away from the parked cars and parks in one of the empty spots. The app logs tracking data about John's driving style on the server for future use.

In order to build such apps for automobiles, access to a number of data sets from various sources is required. Some of this data is real-time data that is continuously being updated. Data related to traffic, road repairs, emergencies, accidents, driving habits, maps, parking, fuel economy, households, diagnosis, etc. would be required.
Various other apps could be built that focus on reducing energy consumption, reducing emissions, and using existing related data to analyze and provide useful feedback. Another useful app would offer fuel economy guidance based on actual vehicle data, road conditions, traffic, and, most importantly, personal driving habits. This app would show the car's fuel efficiency and provide feedback on how good or bad it is based on data from others in the community. It would also provide guidance on improving one's personal driving habits and thereby saving on fuel. It is important to integrate data effectively, such that the data is tied to a meaningful and rich data model that can be queried by these innovative applications.

IV. CONCEPTUAL MODEL

In this section we present our conceptual model by describing the existing ETL process and our proposed Semantic ETL process, and by describing the semantic technology stack used in our model.

A. Extract-Transform-Load (ETL) Process

The Extract-Transform-Load (ETL) process has been in use for the integration of data from multiple sources or applications, possibly from different domains. It refers to a process in data warehousing that extracts data from outside sources, transforms it to fit operational needs (which can include quality checks), and loads it into the end target database, more specifically an operational data store, data mart, or data warehouse. A number of tools facilitate the ETL process, namely IBM Infosphere [22], Oracle Warehouse Builder [23], Microsoft SQL Server Integration Services [24], and Informatica Powercenter for Enterprise Data Integration [25]. Talend Open Studio [26] and Pentaho Kettle [27] are two open source ETL products. The three phases of this process are described as follows:

Extract: this is the first phase of the process and involves data extraction from appropriate data sources.
Data is usually available in flat file formats such as csv, xls, and txt, or through a RESTful client.

Transform: this phase involves cleansing the data to comply with the target schema. Typical transformation activities include normalizing data, removing duplicates, checking for integrity constraint violations, filtering data based on regular expressions, sorting and grouping data, applying built-in functions where necessary, etc. [8].

Load: this phase involves the propagation of the data into a data mart or a data warehouse that serves Big Data.

Most ETL tools provide a graphical interface to create a workflow of ETL activities and automate their execution. Figure 1 shows an example workflow created by Talend to transform and load NASA Aviation Safety Reporting System data into a database.

Figure 1: Example ETL workflow

B. Semantic ETL

In this paper we introduce the use of semantic technologies in the Transform phase of an ETL process to create a semantic data model and generate semantic linked data (RDF triples) to be stored in a data mart or warehouse. The Transform phase will still perform other activities such as normalizing and cleansing data. The Extract and Load phases of the ETL process remain the same. Figure 2 shows an overview of the activities in semantic ETL. The Transform phase involves a manual process of analyzing the datasets, their schemas, and their purpose. Based on the findings, the schema has to be mapped to an existing domain-specific ontology, or an ontology has to be created from scratch. If the data sources belong to disparate domains, multiple ontologies are required and alignment rules have to be specified for any common or related data fields.

Figure 2: Overview of Semantic ETL Process

C. Technology Stack

Semantic web technologies facilitate: the organization of knowledge into conceptual spaces based on meaning; the extraction of new knowledge via querying; and the maintenance of knowledge by checking for inconsistencies. These technologies can therefore support the construction of an advanced knowledge management system [28], [29]. The following semantic technologies and tools are used as part of our Semantic ETL framework:

Uniform Resource Identifier (URI): a string of characters used to identify a name or a web resource. Such identification enables interaction with representations of the web resource over a network (typically the Web) using specific protocols.

Resource Description Framework (RDF) [6]: a general method for data interchange on the Web, which allows the sharing and mixing of structured and semi-structured data across various applications. As the name suggests, RDF is a language for describing web resources. It is used for representing information, especially metadata, about web resources. RDF is designed to be machine-readable so that it can be used in software applications for intelligent processing of information.

Web Ontology Language (OWL) [30]: a markup language used for publishing and sharing ontologies. OWL is built upon RDF, and an ontology created in OWL is actually an RDF graph. Individuals with common characteristics can be grouped together to form a class. OWL provides different types of class descriptions that can be used to describe an OWL class. OWL also provides two types of properties: object properties and data properties. Object properties link individuals to other individuals, while data properties link individuals to data values. OWL enables users to define concepts in a way that allows them to be mixed and matched with other concepts for various uses and applications.

SPARQL [7]: an RDF query language designed to query, retrieve, and manipulate data stored in RDF format. A SPARQL query can consist of triple patterns, conjunctions, disjunctions, and optional patterns. SPARQL allows users to write queries against data that can loosely be called "key-value" data, as it follows the RDF specification of the W3C. The entire database is thus a set of "subject-predicate-object" triples.

Protégé Ontology Editor [31]: an open-source ontology editor and framework for building intelligent systems. It allows users to create ontologies in W3C's Web Ontology Language. It is used in our framework to provide semantic metadata for the schemas of datasets from various sources.

V. IMPLEMENTATION

In this section we present our prototype implementation of the semantic ETL process using public datasets: the National Household Travel Survey data and EPA's Fuel Economy data. These data sets were chosen to build a proof-of-concept prototype of the semantic ETL framework.
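As an illustration of the subject-predicate-object model and the triple patterns mentioned in the technology stack, the following is a minimal sketch in plain Python rather than in an actual RDF store or SPARQL engine. The namespace, the property names (memberOf, ownsVehicle, hasMake, hasModel), and the sample individuals are hypothetical and are not taken from the ontology used in the prototype. The sketch evaluates only a conjunction of triple patterns; disjunctions and optional patterns are left out.

```python
# Hypothetical namespace and terms, invented for this sketch only.
EX = "http://example.org/vehicle#"

# A graph is simply a set of (subject, predicate, object) triples.
GRAPH = {
    (EX + "person1", EX + "memberOf", EX + "household1"),
    (EX + "household1", EX + "ownsVehicle", EX + "vehicle1"),
    (EX + "vehicle1", EX + "hasMake", "Toyota"),
    (EX + "vehicle1", EX + "hasModel", "Camry"),
    (EX + "household2", EX + "ownsVehicle", EX + "vehicle2"),
    (EX + "vehicle2", EX + "hasMake", "Ford"),
}


def is_var(term):
    """Terms that start with '?' are treated as variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Return an extended copy of `binding` if `pattern` matches `triple`,
    or None if a constant or an already-bound variable disagrees."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns against the graph."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# "Which make does each household-owned vehicle have?"
results = query(GRAPH, [
    ("?household", EX + "ownsVehicle", "?vehicle"),
    ("?vehicle", EX + "hasMake", "?make"),
])
makes = sorted(r["?make"] for r in results)
print(makes)
```

The shared variable ?vehicle is what joins the two patterns; a real SPARQL engine performs the same kind of join, with indexing and query planning that this sketch omits.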
Table 1: Sample data fields from National Household data set
A. Data sets

Our first data source is the National Household Travel Survey (NHTS) data for the year 2009, published by the U.S. Department of Transportation. This data was collected to assist transportation planners and policy makers who need information on transportation patterns in the United States. The dataset consists of daily travel data on trips taken in a 24-hour period, with information on the purpose of the trip (work, grocery shopping, school dropoff, etc.), the means of transportation used (bus, car, walk, etc.), how long the trip took, the day of the week when it took place, and additional information in case a private vehicle was used. For private vehicles, driver characteristics and vehicle attributes were also collected. This data was collected for all areas of the country, urban and rural. The dataset has been used by the research community to study relationships between demographics and travel, and the correlation of the mode, amount, or purpose of travel with the time of day and day of the week. We chose this dataset in order to integrate the private vehicles used by members of a household with the fuel economy of the corresponding vehicle models, obtaining that information from another dataset on fuel economy.

The second data source is the U.S. Environmental Protection Agency's (EPA) data on the fuel economy of vehicles. This dataset provides detailed information about vehicles produced by all manufacturers (make) and models. It provides a detailed vehicle description, mileage data for both city and highway driving, mileage for different fuel types, emissions information, and fuel prices that include regular, midgrade, and premium gasoline, diesel, electric, lpg, e85, and cng. This data was obtained as a result of vehicle testing conducted by EPA's National Vehicle and Fuel Emissions Lab and by the manufacturers with oversight by EPA. Tables 1 and 2 show sample data fields from the National Household Travel dataset.
Table 3 shows sample data fields from the Fuel Economy dataset.

Table 2: Sample fields from National Household Vehicle data
Table 3: Sample data fields for EPA Fuel Economy data

B. Semantic Data Model generation

We used the Protégé OWL editor [31] for ontology engineering and created a data model with the primary classes Person, Household, and Vehicle at the topmost level. All the fields in the datasets were modeled as classes, subclasses, or properties and connected to the primary classes via suitable relationships. The ontology has both object properties and data properties, depending on the type of data field being represented. Figure 3 shows a partial semantic data model with the primary classes and properties and their relationships.

C. Semantic Instance Data generation

The datasets were obtained from the sources in CSV format. An XML tool called Oxygen XML editor [32] was used to convert the CSV data into OWL instance data conforming to the vocabulary defined by the ontology. Table 4 shows a couple of sample instances from the Household dataset and Table 5 shows a couple of sample instances from the Fuel Economy dataset.

D. Semantic Querying

We tested the integrated data by querying it using the RDF query language SPARQL [7]. Apache Jena [33], an open source Java framework for building Semantic Web and Linked Data applications, was used for executing SPARQL queries. An application was built that loads the ontologies and data, checks the conformity of the data with the vocabulary, and then runs SPARQL queries against the data. A sample list of queries is provided in Table 6. This is an ongoing project, and this initial version was built to test the proposed concept of a semantic ETL framework. We will continue building on this data set by pulling information from a variety of sources and analyzing the data for interesting results.
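The extract, transform, load, and query steps described in this section can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the prototype itself: it uses plain Python instead of Oxygen, OWL, and Apache Jena, and the namespace, column names (house_id, vehicle_id, city_mpg, highway_mpg), property names, and sample rows are all hypothetical rather than taken from the NHTS or EPA files.

```python
import csv
import io

# Hypothetical namespace, column names, and sample rows for this sketch only.
EX = "http://example.org/semetl#"

HOUSEHOLD_VEHICLES_CSV = """house_id,vehicle_id,make,model,year
H1,V1,toyota,Camry,2008
H1,V2,Ford,Focus,2009
H2,V3,Honda,Civic,2007
"""

FUEL_ECONOMY_CSV = """make,model,year,city_mpg,highway_mpg
Toyota,Camry,2008,21,31
Ford,Focus,2009,24,35
"""


def extract(csv_text):
    """Extract: read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def model_uri(row):
    """Mint one URI per (make, model, year) so that both sources refer to
    the same resource. Normalizing the make is a small cleansing step."""
    parts = [row["make"].strip().title(), row["model"].strip(), row["year"].strip()]
    return EX + "model/" + "_".join(parts)


def transform_vehicles(rows):
    """Transform: map household vehicle rows to triples."""
    triples = set()
    for row in rows:
        household = EX + "household/" + row["house_id"]
        vehicle = EX + "vehicle/" + row["vehicle_id"]
        triples.add((household, EX + "ownsVehicle", vehicle))
        triples.add((vehicle, EX + "hasVehicleModel", model_uri(row)))
    return triples


def transform_fuel_economy(rows):
    """Transform: map fuel economy rows to triples about the shared model URI."""
    triples = set()
    for row in rows:
        model = model_uri(row)
        triples.add((model, EX + "cityMpg", row["city_mpg"]))
        triples.add((model, EX + "highwayMpg", row["highway_mpg"]))
    return triples


def load(*triple_sets):
    """Load: merge all triples into a single in-memory store."""
    store = set()
    for triples in triple_sets:
        store |= triples
    return store


def objects(store, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


def highway_mpg_by_household(store, house_id):
    """Follow household -> vehicle -> model -> highway mileage."""
    household = EX + "household/" + house_id
    result = {}
    for vehicle in objects(store, household, EX + "ownsVehicle"):
        for model in objects(store, vehicle, EX + "hasVehicleModel"):
            for mpg in objects(store, model, EX + "highwayMpg"):
                result[vehicle.rsplit("/", 1)[1]] = int(mpg)
    return result


store = load(
    transform_vehicles(extract(HOUSEHOLD_VEHICLES_CSV)),
    transform_fuel_economy(extract(FUEL_ECONOMY_CSV)),
)
print(highway_mpg_by_household(store, "H1"))
```

The integration happens in model_uri: because both transforms mint the same identifier for a given make, model, and year, the two datasets become linked without an explicit join table. In the sample rows, vehicle V3 has no matching fuel economy record, so a query for household H2 returns nothing.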
Figure 3: Semantic data model
Table 4: Sample OWL Household Instance data
Table 5: Sample OWL Fuel Economy Instance data
Table 6: Sample Queries
VI. CONCLUSIONS AND FUTURE WORK

Integration of data from various heterogeneous sources into a meaningful data model that allows intelligent querying is an important open issue in the area of Big Data. Traditionally, the Extract-Transform-Load (ETL) process has been used for data integration in industry. This paper proposes a semantic ETL framework that uses semantic technologies to produce rich and meaningful knowledge around data integration and produces semantic data that can possibly be published on the Web and contribute to the Web of data. Successful creation of such a framework will be of tremendous use to various innovative Big Data applications as well as analytics. To test the proposed technique, we used the semantic ETL process to integrate a few public data sets with information on vehicles, household transportation, and fuel economy. We created a simple Java application that ran SPARQL queries against the combined semantic data.

One of the challenges of this project is ontology engineering, which needs a fairly good understanding of the data from the different sources. A human expert has to perform this step and it is often a time-consuming process. This is an ongoing project with scope for numerous interesting web and mobile applications that will use a variety of data to enhance user experience. Future work also includes looking into real-time data and how semantic ETL can help with its integration. Apps can be built for various domains such as automotive, aerospace, healthcare, and education, to list a few.

REFERENCES

[1] E. Kandogan, M. Roth, C. Kieliszewski, F. Ozcan, B. Schloss, and M.-T. Schmidt, "Data for All: A Systems Approach to Accelerate the Path from Data to Insight," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[2] C. Bizer, P. Boncz, M. L. Brodie, and O. Erling, "The Meaningful Use of Big Data: Four Perspectives - Four Challenges," SIGMOD Rec., vol. 40, no. 4.
[3] A. Azzini and P. Ceravolo, "Consistent Process Mining over Big Data Triple Stores," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[4] C. Bizer, T. Heath, and T. Berners-Lee, "Linked data - the story so far," International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1-22.
[5] F. Lecue, S. Kotoulas, and P. Mac Aonghusa, "Capturing the Pulse of Cities: Opportunity and Research Challenges for Robust Stream Data Reasoning," in Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence.
[6] P. Hayes and B. McBride, "Resource description framework (RDF)." [Online]. [Accessed: 28-Feb-2014].
[7] "SPARQL Query Language for RDF." [Online].
[8] P. Vassiliadis, A. Simitsis, and E. Baikousi, "A Taxonomy of ETL Activities," in Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP, New York, NY, USA, 2009.
[9] P. Vassiliadis, A. Simitsis, P. Georgantas, M. Terrovitis, and S. Skiadopoulos, "A generic and customizable framework for the design of ETL scenarios," Information Systems, vol. 30, no. 7.
[10] S. Luján-Mora, P. Vassiliadis, and J. Trujillo, "Data mapping diagrams for data warehouse design with UML," in Conceptual Modeling - ER 2004, Springer, 2004.
[11] J. Trujillo and S. Luján-Mora, "A UML based approach for modeling ETL processes in data warehouses," in Conceptual Modeling - ER 2003, Springer, 2003.
[12] A. Simitsis, K. Wilkinson, M. Castellanos, and U. Dayal, "QoX-driven ETL design: reducing the cost of ETL consulting engagements," in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.
[13] A. Karagiannis, P. Vassiliadis, and A. Simitsis, "Macro-level Scheduling of ETL Workflows," submitted for publication.
[14] D. Skoutas and A. Simitsis, "Ontology-Based Conceptual Design of ETL Processes for Both Structured and Semi-Structured Data," International Journal on Semantic Web and Information Systems, vol. 3, no. 4, pp. 1-24.
[15] S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori, and M. Vincini, "A semantic approach to ETL technologies," Data & Knowledge Engineering, vol. 70, no. 8.
[16] S. K. Bista, S. Nepal, and C. Paris, "Data Abstraction and Visualisation in Next Step: Experiences from a Government Services Delivery Trial," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[17] Y. Guo, X. Shang, J. Li, and Z. Li, "Revealing the Causes of Dynamic Change in Protein-Protein Interaction Network," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[18] E. Olmezogullari and I. Ari, "Online Association Rule Mining over Fast Data," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[19] M. U. Nisar, A. Fard, and J. A. Miller, "Techniques for Graph Analytics on Big Data," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[20] Y.-W. Cheah, R. Canon, B. Plale, and L. Ramakrishnan, "Milieu: Lightweight and Configurable Big Data Provenance for Science," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[21] H. T. Lin and V. Honavar, "Learning Classifiers from Chains of Multiple Interlinked RDF Data Stores," in 2013 IEEE International Congress on Big Data (BigData Congress), 2013.
[22] "IBM InfoSphere Platform - big data, information integration, data warehousing, master data management, lifecycle management and data security." [Online]. Available: 01.ibm.com/software/data/infosphere/. [Accessed: 28-Feb-2014].
[23] "Warehouse Builder 11gR2: Home Page on OTN." [Online].
[24] "Integration Services - Microsoft SQL Server." [Online].
[25] "Enterprise Data Integration - Informatica." [Online].
[26] "Talend Open Studio - Talend." [Online].
[27] "Data Integration - Pentaho Business Analytics Platform." [Online].
[28] N. Shadbolt, W. Hall, and T. Berners-Lee, "The semantic web revisited," IEEE Intelligent Systems, vol. 21, no. 3.
[29] G. Antoniou and F. Van Harmelen, A Semantic Web Primer. The MIT Press.
[30] "OWL - Semantic Web Standards." [Online]. [Accessed: 23-Jan-2014].
[31] "Protégé Ontology Editor." [Online].
[32] "Oxygen XML Editor." [Online]. [Accessed: 28-Feb-2014].
[33] "Apache Jena - Home." [Online].
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment
DOI: 10.15415/jotitt.2014.22021 A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment Rupali Gill 1, Jaiteg Singh 2 1 Assistant Professor, School of Computer Sciences, 2 Associate
City Data Pipeline. A System for Making Open Data Useful for Cities. [email protected]
City Data Pipeline A System for Making Open Data Useful for Cities Stefan Bischof 1,2, Axel Polleres 1, and Simon Sperl 1 1 Siemens AG Österreich, Siemensstraße 90, 1211 Vienna, Austria {bischof.stefan,axel.polleres,simon.sperl}@siemens.com
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
Data Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second
Index Terms: Business Intelligence, Data warehouse, ETL tools, Enterprise data, Data Integration. I. INTRODUCTION
ETL Tools in Enterprise Data Warehouse *Amanpartap Singh Pall, **Dr. Jaiteg Singh E-mail: [email protected] * Assistant professor, School of Information Technology, APJIMTC, Jalandhar ** Associate Professor,
JOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
Optimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. [email protected] P Srinivasu
Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya
Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data
E6895 Advanced Big Data Analytics Lecture 4:! Data Store
E6895 Advanced Big Data Analytics Lecture 4:! Data Store Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data Analytics,
Talend Metadata Manager. Reduce Risk and Friction in your Information Supply Chain
Talend Metadata Manager Reduce Risk and Friction in your Information Supply Chain Talend Metadata Manager Talend Metadata Manager provides a comprehensive set of capabilities for all facets of metadata
Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance
Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
COMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Publishing Linked Data Requires More than Just Using a Tool
Publishing Linked Data Requires More than Just Using a Tool G. Atemezing 1, F. Gandon 2, G. Kepeklian 3, F. Scharffe 4, R. Troncy 1, B. Vatant 5, S. Villata 2 1 EURECOM, 2 Inria, 3 Atos Origin, 4 LIRMM,
Big Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
Sunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014
5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for
BUSINESS VALUE OF SEMANTIC TECHNOLOGY
BUSINESS VALUE OF SEMANTIC TECHNOLOGY Preliminary Findings Industry Advisory Council Emerging Technology (ET) SIG Information Sharing & Collaboration Committee July 15, 2005 Mills Davis Managing Director
How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6
Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...
The Ontological Approach for SIEM Data Repository
The Ontological Approach for SIEM Data Repository Igor Kotenko, Olga Polubelova, and Igor Saenko Laboratory of Computer Science Problems, Saint-Petersburg Institute for Information and Automation of Russian
BUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
A Big Data-driven Model for the Optimization of Healthcare Processes
Digital Healthcare Empowering Europeans R. Cornet et al. (Eds.) 2015 European Federation for Medical Informatics (EFMI). This article is published online with Open Access by IOS Press and distributed under
CitationBase: A social tagging management portal for references
CitationBase: A social tagging management portal for references Martin Hofmann Department of Computer Science, University of Innsbruck, Austria [email protected] Ying Ding School of Library and Information Science,
www.sryas.com Analance Data Integration Technical Whitepaper
Analance Data Integration Technical Whitepaper Executive Summary Business Intelligence is a thriving discipline in the marvelous era of computing in which we live. It's the process of analyzing and exploring
Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers
Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative
Secure Semantic Web Service Using SAML
Secure Semantic Web Service Using SAML JOO-YOUNG LEE and KI-YOUNG MOON Information Security Department Electronics and Telecommunications Research Institute 161 Gajeong-dong, Yuseong-gu, Daejeon KOREA
How To Build A Cloud Based Intelligence System
Semantic Technology and Cloud Computing Applied to Tactical Intelligence Domain Steve Hamby Chief Technology Officer Orbis Technologies, Inc. [email protected] 678.346.6386 1 Abstract The tactical
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many
White Paper. How Streaming Data Analytics Enables Real-Time Decisions
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
An Approach for Facilitating Knowledge Data Warehouse
International Journal of Soft Computing Applications ISSN: 1453-2277 Issue 4 (2009), pp.35-40 EuroJournals Publishing, Inc. 2009 http://www.eurojournals.com/ijsca.htm An Approach for Facilitating Knowledge
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
TopBraid Insight for Life Sciences
TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.
A generic approach for data integration using RDF, OWL and XML
A generic approach for data integration using RDF, OWL and XML Miguel A. Macias-Garcia, Victor J. Sosa-Sosa, and Ivan Lopez-Arevalo Laboratory of Information Technology (LTI) CINVESTAV-TAMAULIPAS Km 6
COURSE SYLLABUS COURSE TITLE:
COURSE SYLLABUS COURSE TITLE: 55043AC Microsoft End to End Business Intelligence Boot Camp FORMAT: Instructor-led CERTIFICATION EXAMS: None This course syllabus should be used to determine whether the
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
DATA MINING AND WAREHOUSING CONCEPTS
CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation
Bringing Business Objects into ETL Technology
Bringing Business Objects into ETL Technology Jing Shan Ryan Wisnesky Phay Lau Eugene Kawamoto Huong Morris Sriram Srinivasn Hui Liao 1. Northeastern University, [email protected] 2. Stanford University,
Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD
Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD Boris Mocialov (H00180016) MSc Software Engineering Heriot-Watt University, Edinburgh April 5, 2015 1 1 Introduction The purpose
Formal Methods for Preserving Privacy for Big Data Extraction Software
Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
Beyond the Single View with IBM InfoSphere
Ian Bowring MDM & Information Integration Sales Leader, NE Europe Beyond the Single View with IBM InfoSphere We are at a pivotal point with our information intensive projects 10-40% of each initiative
Getting Started Practical Input For Your Roadmap
Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
Scalable Enterprise Data Integration Your business agility depends on how fast you can access your complex data
Transforming Data into Intelligence Scalable Enterprise Data Integration Your business agility depends on how fast you can access your complex data Big Data Data Warehousing Data Governance and Quality
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
ONTOLOGY-BASED MULTIMEDIA AUTHORING AND INTERFACING TOOLS 3 rd Hellenic Conference on Artificial Intelligence, Samos, Greece, 5-8 May 2004
ONTOLOGY-BASED MULTIMEDIA AUTHORING AND INTERFACING TOOLS 3 rd Hellenic Conference on Artificial Intelligence, Samos, Greece, 5-8 May 2004 By Aristomenis Macris (e-mail: [email protected]), University of
DISCOVERING RESUME INFORMATION USING LINKED DATA
DISCOVERING RESUME INFORMATION USING LINKED DATA Ujjal Marjit 1, Kumar Sharma 2 and Utpal Biswas 3 1 C.I.R.M, University Kalyani, Kalyani (West Bengal) India [email protected] 2 Department of Computer
LDIF - Linked Data Integration Framework
LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany [email protected],
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd
www.ducenit.com Analance Data Integration Technical Whitepaper
Analance Data Integration Technical Whitepaper Executive Summary Business Intelligence is a thriving discipline in the marvelous era of computing in which we live. It's the process of analyzing and exploring
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization
Exploiting Data at Rest and Data in Motion with a Big Data Platform
Exploiting Data at Rest and Data in Motion with a Big Data Platform Sarah Brader, [email protected] What is Big Data? Where does it come from? 12+ TBs of tweet data every day 30 billion RFID tags
AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zettabytes (10^21 bytes) this
Semantically Enhanced Web Personalization Approaches and Techniques
Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljanić, Lidia Rovan, Mirta Baranović Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,
A Semantic web approach for e-learning platforms
A Semantic web approach for e-learning platforms Miguel B. Alves 1 1 Laboratório de Sistemas de Informação, ESTG-IPVC 4900-348 Viana do Castelo. [email protected] Abstract. When lecturers publish contents
MicroStrategy Course Catalog
MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY
Ontology for Home Energy Management Domain
Ontology for Home Energy Management Domain Nazaraf Shah 1,, Kuo-Ming Chao 1, 1 Faculty of Engineering and Computing Coventry University, Coventry, UK {nazaraf.shah, k.chao}@coventry.ac.uk Abstract. This
De la Business Intelligence aux Big Data. Marie-Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data
De la Business Intelligence aux Big Data Marie-Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris 22/01/14 Séminaire Big Data 1 Agenda Evolution of Business Intelligence Semantic Technologies
A Knowledge Management Framework Using Business Intelligence Solutions
www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For
A Survey on: Efficient and Customizable Data Partitioning for Distributed Big RDF Data Processing using hadoop in Cloud.
A Survey on: Efficient and Customizable Data Partitioning for Distributed Big RDF Data Processing using hadoop in Cloud. Tejas Bharat Thorat Prof.RanjanaR.Badre Computer Engineering Department Computer
Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. Springer
Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
ISSN:2321-1156 International Journal of Innovative Research in Technology & Science(IJIRTS)
Nguyễn Thị Thúy Hoài, College of technology _ Danang University Abstract The threading development of IT has been bringing more challenges for administrators to collect, store and analyze massive amounts
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
ETL as a Necessity for Business Architectures
Database Systems Journal vol. IV, no. 2/2013 3 ETL as a Necessity for Business Architectures Aurelian TITIRISCA University of Economic Studies, Bucharest, Romania [email protected] Today, the
Data Science & Big Data Practice
INSIGHTS ANALYTICS INNOVATIONS Data Science & Big Data Practice Manufacturing Internet of Things (IoT) Amplify Serviceability and Productivity by integrating machine /sensor data with Data Science What
Methods and Technologies for Business Process Monitoring
Methods and Technologies for Business Monitoring Josef Schiefer Vienna, June 2005 Agenda» Motivation/Introduction» Real-World Examples» Technology Perspective» Web-Service Based Business Monitoring» Adaptive
Cloud computing based big data ecosystem and requirements
Cloud computing based big data ecosystem and requirements Yongshun Cai (蔡永顺) Associate Rapporteur of ITU-T SG13 Q17 China Telecom Dong Wang (王东) Rapporteur of ITU-T SG13 Q18 ZTE Corporation Agenda
Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Big Data Executive Survey
Big Data Executive Full Questionnaire Big Data Executive Full Questionnaire Appendix B Questionnaire Welcome The survey has been designed to provide a benchmark for enterprises seeking to understand the
Chapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
SAP Data Services 4.X. An Enterprise Information management Solution
SAP Data Services 4.X An Enterprise Information management Solution Table of Contents I. SAP Data Services 4.X... 3 Highlights Training Objectives Audience Pre Requisites Keys to Success Certification
Decision Ready Data: Power Your Analytics with Great Data. Murthy Mathiprakasam
Decision Ready Data: Power Your Analytics with Great Data Murthy Mathiprakasam 2 Your Mission Repeatably deliver trusted and timely data for great analytics and great social impact 3 Great Data Powers
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology
International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D.
Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D. Most college courses in statistical analysis and data mining focus on the mathematical techniques for analyzing data structures, rather
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here
Data Virtualization for Agile Business Intelligence Systems and Virtual MDM To View This Presentation as a Video Click Here Agenda Data Virtualization New Capabilities New Challenges in Data Integration
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
Big Analytics: A Next Generation Roadmap
Big Analytics: A Next Generation Roadmap Cloud Developers Summit & Expo: October 1, 2014 Neil Fox, CTO: SoftServe, Inc. 2014 SoftServe, Inc. Remember Life Before The Web? 1994 Even Revolutions Take Time
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 OVERVIEW About this Course Data warehousing is a solution organizations use to centralize business data for reporting and analysis.
Department of Defense. Enterprise Information Warehouse/Web (EIW) Using standards to Federate and Integrate Domains at DOD
Department of Defense Human Resources - Enterprise Information Warehouse/Web (EIW) Using standards to Federate and Integrate Domains at DOD Federation Defined Members of a federation agree to certain standards
