On Building XML Data Warehouses


Laura Irina Rusu 1, Wenny Rahayu 1, David Taniar 2

1 LaTrobe University, Department of Computer Science and Computer Engineering, Bundoora, VIC 3086, Australia {lrusu,wenny}@cs.latrobe.edu.au
2 Monash University, School of Business Systems, Clayton, VIC 3800, Australia David.Taniar@infotech.monash.edu.au

Abstract. Developing a data warehouse for XML documents involves two major processes: creating it, by processing raw XML documents into a specified data warehouse repository; and querying it, by applying techniques that better answer users' queries. This paper focuses on the first part, that is, identifying a systematic approach for building a data warehouse of XML documents, specifically for transferring data from an underlying XML database into a defined XML data warehouse. The proposed methodology covers processes such as data cleaning and integration, summarisation, intermediate XML documents, and updating/linking existing documents and creating fact tables. We utilise XQuery technology in all of the above processes. We also present a case study on how to put this methodology into practice.

1. Introduction

In the last few years, building data warehouses for XML documents has become a very important issue, considering the continual growth in the amount and variety of data represented as XML documents [1,11]. This is one of the reasons why researchers have become interested in studying ways to optimise the processing of XML documents and to obtain a better data warehouse that stores optimised information for future reference. How to build a data warehouse and how to query it efficiently depend on the following two strongly related issues.
In the XML document repository case, we cannot obtain good answers to our queries, in terms of speed, quality and up-to-date information, if the data warehouse is not properly designed to contain all the necessary information; on the other hand, we cannot properly design a data warehouse if a reasonable set of possible future user queries is unknown and unanalysed. Many papers have analysed how to design a better data warehouse for XML data from different points of view (e.g. [1, 2, 5, 6]), and many others have focused on querying XML data warehouses or XML documents (e.g. [7, 8]), but almost all of them considered only the design and representation issues of XML data warehouses, or how to query them, and very few considered optimisation of data quality.

In this paper, we propose a practical methodology for building an XML document data warehouse. We also ensure that the resulting data warehouse is one where occurrences of dirty data, errors, duplications or inconsistencies are minimised as much as possible and a good summarisation exists. The steps cover data cleaning and integration, summarisation, intermediate XML documents, and updating/linking existing XML documents and creating fact tables. We use XQuery in all of the above processes. The main purpose of this paper is to show systematic steps to build an XML data warehouse, as opposed to developing a model for designing one. However, it is important to note that our proposed steps are generic enough to be applied to different XML data warehouse models. The rest of this paper is organised as follows. Section 2 gives a brief introduction to XML technology, including XML data warehouses. Section 3 discusses related work. Section 4 presents our proposed methodology for building XML data warehouses. Section 5 describes a case study, Section 6 discusses the benefits of the approach, and Section 7 concludes the paper.

2. Background

As the background to our work, we briefly discuss XML technology, covering XML, XML Schema and XQuery. We also briefly introduce XML data warehouses, including their schemas.

2.1. XML, XML Schema and XQuery

XML (eXtensible Markup Language) has increasingly been used for storing different kinds of semistructured data [11]. It has become a new standard for data representation and exchange on the Internet, from web servers and web application data to business data previously stored in relational databases. One new thing introduced by XML was the significance of tags: from a way to dictate how a web page will look (as presentation tags do in HTML) to specifying the context or meaning of the element they contain. The names of elements and attributes can easily be bound to their content in XML; the format is therefore self-describing.
The hierarchy can be chosen to accommodate grouping or categorisation, and elements/attributes can be added anywhere in the structure, because the format is extensible, as the name suggests. Due to this freedom in defining new elements, the XML document structure can change multiple times; in the case of very large documents, track of the changes can be lost, and the content of elements or the values of attributes can, by mistake, come to differ from what we intend them to be. To overcome this problem, an XML Schema is used to specify a valid structure for each XML document, in which elements and attributes, hierarchy, data types, possible values (restrictions), order indicators, occurrence indicators, etc. are clearly defined [11]. Strictly following the XML Schema reduces the probability of inconsistencies in the document, and querying that document, using XQuery for example, will give better results.

XQuery is a language developed by the W3C (World Wide Web Consortium) [9]; it is still under development but already functional, and is designed to provide query capabilities for the same range of data that XML stores [11]. It can be used for querying XML data that has no schema at all, or that is governed by an XML Schema or by a DTD (Document Type Definition). The XQuery data model is based on the concept that each document is represented as a tree of nodes (where each node can be a child of another node), as opposed to the traditional relational model, which does not utilise hierarchies. One of the most powerful expressions in XQuery is FLWOR, which stands for for-let-where-order by-return. It is quite similar to select-from-where in relational SQL, but a FLWOR expression binds variables to values in its for and let clauses and uses them to create a result in the return clause, eliminating tuples which do not satisfy a specific condition in the where clause and ordering the tuples in the order by clause.

2.2. XML Data Warehouses

Query processing can be facilitated by the existence of a data warehouse, that is, a repository where the necessary data are stored after being extracted, transformed and integrated from the initial documents [6]. Because disk space is no longer an issue these days, whereas query speed is very significant, one of the main purposes of a data warehouse is to optimise query processing speed, and this can be achieved using multiple levels of summarisation during processing. As the volume of data contained in XML documents is continually growing and changing, an XML data warehouse intends to solve the issue of providing users with up-to-date, correct and comprehensive information. A star-schema approach to the data warehouse contains facts, dimensions, attributes and attribute hierarchies. Facts contain cleaned and correct values representing the business activity (for example, sales).
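As an illustrative sketch of the FLWOR expression described above, the following query selects large orders and sorts them by date (the document name sales.xml and its elements are assumptions for illustration only, not part of the case study):

```xquery
(: bind each order, compute its income, keep large ones, sort by date :)
for $o in doc("sales.xml")//order
let $income := $o/price * $o/quantity
where $income > 1000
order by $o/orderdate
return
  <bigOrder>
    { $o/orderdate }
    <income>{ $income }</income>
  </bigOrder>
```

Unlike SQL, the result is itself XML, so the output of such a query can feed directly into another XML document of the warehouse.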
Dimensions are characteristics that slice the data into specific subsets, for example sales by region or sales by year. The details contained in dimension documents are attributes, and they can have a specific hierarchy, for example country => region => state => city.

3. Related Work

There is a large amount of work in the data warehouse field. Many researchers have studied how to construct a data warehouse, first for relational databases [2,3,4,10] and, in recent years, for XML documents [5,6], considering the spread of this kind of document across a vast range of activities. A concrete methodology for constructing an XML data warehouse by analysing frequent patterns in users' historical queries is provided in [6]. The authors start by determining which data sources are most frequently accessed by the users, transform those queries into Query Path Transactions and, after applying a rule mining technique, calculate the Frequent Query Paths which form the basis of the data warehouse schema. It is also mentioned that the final step in

building a data warehouse would be to acquire clean and consistent data to feed into it. However, there is not enough detail on how to ensure this. Although it seems a simple part of the whole process, this is where corrupted or inconsistent data can slip into the data warehouse. Another approach is proposed in [5], where an XML data warehouse is designed from XML Schemas through a semi-automated process. After preprocessing the XML Schema and creating and transforming the schema graph, the authors choose facts for the data warehouse and, for each fact, follow a few steps to obtain a star-schema: building the dependency graph from the schema graph, rearranging the dependency graph, defining dimensions and measures, and creating the logical schema. In their approach, XQuery is used to query XML documents in three different situations: examination of convergence and shared hierarchies, searching for many-to-many relationships between the descendants of the fact in the schema graph, and searching for to-many relationships towards the ancestors of the fact in the schema graph. The authors specify that, in the presence of a many-to-many relationship, one of the logical design solutions proposed by [10] is to be adopted. The authors of [3] consider the aspect of data correctness and propose a solution where the data-cleaning application is modelled as a directed acyclic flow of transformations applied to the source data. The framework consists of a platform offering three services: data transformation, multi-table matching and duplicate elimination, each service being supported by a shared kernel of four macro-operators consisting of high-level SQL-like commands. The framework addresses three main problems in data cleaning automation: object identity, data entry errors, and data inconsistencies across overlapping autonomous databases; however, the method covers data cleaning only in relational databases.
The authors of [4] survey various summarisation procedures for databases and provide a categorisation of the mechanisms by which information can be summarised. They consider information capacity, vertical reduction by attribute projection, horizontal reduction by tuple selection, and horizontal/vertical reduction by concept ascension to be a few very good methods of summarisation; but, as they analysed only implementations in relational database projects, future work is needed to implement specific summarisation techniques for XML documents. Our proposed method focuses on the practical aspects of building XML data warehouses through several practical steps, including data cleaning and integration, summarisation, intermediate XML documents, and updating/linking existing documents and creating fact tables. We have developed generic methods whereby the proposed approach can be applied to any collection of XML documents to be stored in an XML data warehouse. We believe that a step-by-step XML data warehouse creation from existing XML documents is essential; moreover, by employing these steps, the resulting XML data warehouse will offer better performance in processing future queries.

4. Proposed Method for Building XML Data Warehouses

Our paper proposes a systematic approach to feeding data from an underlying XML database into an XML data warehouse, in such a way that query optimisation is made possible. A methodology for building a data warehouse from an initial XML document is also provided, developing the necessary fact and dimensions.

4.1. Data cleaning and integration

The existence of an XML Schema is very important, especially for analysing data correctness, because its purpose is to define a valid way to specify the content of the XML documents considered for analysis. If the schema is correctly established and all documents comply with it, there is a lower probability of dirty data. We define four rules for performing data cleaning and integration.

Rule 1. If a schema exists, we should verify that all schema stipulations are respected:
- verifying the correct use of element and attribute names throughout the entire document (e.g. <name> vs. <Name> vs. <NAME>);
- observing whether data types and natural logic are respected (for example, a telephone number will probably have the xsd:string data type, but it can contain only digits, not letters);
- verifying that all elements and attributes are entered in their schema-specified hierarchy, for example:

  <name>John</name>
  <address>
    <street>15, New St.</street>
    <city>Melbourne</city>
    <state>VIC</state>
  </address>

  as opposed to:

  <name>John</name>
  <street>15, New St.</street>
  <city>Melbourne</city>
  <state>VIC</state>

- verifying that order indicators are respected. They can be: all, choice, sequence. All indicates that the child elements declared in it can appear in any order, and each child element must occur only once. Choice indicates that either one element or another can occur, and if a sequence indicator exists, the elements inside it should appear only in the specified order;
- identifying null values. In relational database theory, null values have two main meanings [4]: (1) the value is unknown, or (2) it is inapplicable. In XML Schema, the number of occurrences of an element can be specified; for example, minOccurs="1" and maxOccurs="20" means that the element should occur at least once and no more than 20 times;
- any other schema-related specification/restriction.
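To illustrate the order and occurrence indicators mentioned above, a minimal XML Schema fragment (element names are illustrative only, not from the case study) could be:

```xml
<!-- children must appear in the declared order; phone may occur 1 to 20 times -->
<xs:element name="address">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="phone" type="xs:string" minOccurs="1" maxOccurs="20"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

A validating parser can check documents against such constraints automatically, which is what makes Rule 1 the easiest of the four rules to automate.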

Rule 2. Eliminate duplicate records: for example, a customer name entered two or more times, by different departments in a store, in a different manner or order (surname & first name; first name & surname; surname & father's initial & first name; etc.).

Rule 3. Eliminate inconsistencies from element and attribute values: for example, the existence of two different dates of birth for the same person [3]. As there cannot be two, only the right one should be kept.

Rule 4. Eliminate errors in data caused by the data-entry process, for example mistyping.

Some of the data-cleaning processes have to be done manually, because they require user intervention and, occasionally, expert understanding of the domain. However, Rule 1 above can be automated to reduce the need for user intervention in this process.

4.2. Data summarisation

This is a very important step, as multiple issues must be taken into consideration:
- Because not the entire volume of data existing in the underlying database will be absorbed into the XML data warehouse, we should consider the data summarisation process very carefully. We must extract only useful and valuable information. At the same time, it should be diversified, as it will have to answer multiple queries in the future.
- Depending on how affordable disk space is, how often specific queries appear, and whether historical data are necessary, we can have multiple levels of summarisation, applying different mechanisms.
- On the other hand, multiple levels of summarisation, where they exist, can significantly improve querying speed. In most data warehouse projects, query speed outweighs disk space, so the levels of summarisation should be detailed as much as necessary.

Creating dimensions using summarisation. Depending on how many levels of summarisation we have for a specific dimension, we create and populate new documents, which will contain extractions from the initial data or specially constructed values.
For example (see Figure 1): a) if we need to create a part-of-the-day dimension (query: "What are the dynamics of sales during one day: morning, afternoon, evening?"), we create and populate it as an entirely new document; b) if we need country or region as a level of summarisation (query: "What are the sales of product X by country/region?"), we can find this information by querying directly into the primary document and searching for distinct values of the country and/or region elements.
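For case (b), a sketch of building such a country dimension with XQuery might look as follows (the document name sales.xml is an assumption; the positional variable at $i generates the dimension keys):

```xquery
(: one dimension entry per distinct country found in the source documents :)
document {
  for $c at $i in distinct-values(doc("sales.xml")//country)
  return
    <countrydim>
      <countryid>{ $i }</countryid>
      <country>{ $c }</country>
    </countrydim>
}
```

The same pattern applies to the region level; the two resulting documents are then connected through their keys.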

[Figure 1 shows dimensions created and populated as new XML documents: distinct values of country and region are picked from the XML documents storing sales details to build Countries (country_id, country, regionID) and Regions (region_id, region, countryID) dimensions, while a Part-of-the-day dimension (time_id, morning, afternoon, evening) is created and populated as an entirely new document.]

Fig. 1. Dimensions created and populated as new XML documents

Techniques to create the time dimension. The levels of summarisation used in time dimensions can be, for example: month, quarter, semester, year. For month and year, specific XQuery functions [12] can be applied to obtain them. Furthermore, we are interested only in those values related to our data, so only distinct values should be extracted. At the same time, a key should be constructed to act as a link to the fact document. The following shows how the time dimension is created using XQuery (since XQuery variables are immutable, the key counter is expressed with the positional variable at $a rather than by re-assignment):

document {
  for $b at $a in distinct-values(doc("doc_name.xml")/date_node)
  return
    <time_node>
      <timeid>{$a}</timeid>
      <date>{$b}</date>
      <month>{month-from-date($b)}</month>
      <year>{year-from-date($b)}</year>
    </time_node>
}

To exemplify a more specific case, a semester dimension should be created and populated with our own data:

document {
  <semester>
    <semid>1</semid>
    <start_month>1</start_month>
    <end_month>6</end_month>
  </semester>
  <semester>
    <semid>2</semid>
    <start_month>7</start_month>
    <end_month>12</end_month>
  </semester>
}

When we link this dimension to the fact data, we only determine which semester each date corresponds to. Functions to extract date and time components are available in XQuery [12], so a large range of time-level summarisations can be analysed and created.

Techniques to create other kinds of dimensions. After determining the dimensions, we should check whether the values of a specific dimension already exist in our document. If not, they should be created, as in the semester example above. If they exist, we are interested in the distinct values of the element involved:

document {
  for $t at $a in distinct-values(doc("doc_name.xml")//element)
  return
    <new_element>
      <elementid>{$a}</elementid>
      <element_name>{$t}</element_name>
    </new_element>
}

where <new_element> is a tag representing a newly created element in the dimension. It contains a key (<elementid>, which takes generated values), the actual value of interest (<element_name>, e.g. values of country) and any other elements which can be helpful in the dimension. If we have more levels of summarisation for the same dimension (e.g. country and region in a location dimension), we create a new document for each of those levels and connect them to each other using keys.

4.3. Creating intermediate XML documents

In the process of creating a data warehouse from a collection of documents, creating intermediate documents is a common way to extract valuable and necessary information (refer to Figure 2).

[Figure 2 shows important information being extracted into a new XML document: from XML documents storing sales details (invoice_id, invoicedate, price, quantity, customer_id, customer_name, customer_address, carrier, transport_fee, ...), an intermediate Orders document retains invoice_id, invoicedate, quantity, price and customerID.]

Fig. 2. Extracting important information as a new XML document

Which information in the initial documents is most important and necessary, and should therefore be kept in the data warehouse, is a very good question, and researchers have attempted to answer it with different complex techniques (detecting patterns in historical user queries [6], or detecting shared hierarchies and convergence of dependencies [7]). Still, for general users, analysing the possible queries in the domain remains the common way to do it. During this step, we are interested only in data representing the activity, i.e. data involved in queries, calculations, etc., from our initial document. At the same time, we bring into the intermediate document those elements of the initial document that are keys to dimensions, where they already exist (e.g. customerid in Fig. 2, which references the customer dimension). The actual fact document in the data warehouse will be this intermediate document, linked to the dimensions:

document {
  for $t in doc("doc_name.xml")
  return
    <temp_fact>
      <elem1_name>{$t//elem1_content}</elem1_name>
      <elem2_name>{$t//elem2_content}</elem2_name>
    </temp_fact>
}

where <temp_fact> is a tag representing a new element in the intermediate document, containing <elem1_name> (the name of the element) and <elem1_content> (the value of the element which is valuable for our fact document), etc. The actual correct path in the expression {$t//elem1_content} depends on the schema graph of the initial document and on where the element content exists in that schema.

4.4. Updating/linking existing documents; creating the fact table

At this step, all the intermediate XML documents created before should be linked in such a way that the relationships between keys are established (see Fig. 3).

[Figure 3 shows the linking of documents into the star-schema of the data warehouse: the intermediate Orders document (invoice_id, invoicedate, quantity, price, customer_id) becomes the Orders fact (invoice_id, time_id, quantity, price, value, customerID), linked to the Part-of-the-day dimension (time_id, morning, afternoon, evening), the Countries dimension (country_id, country, regionID) and other dimensions, e.g. customers.]

Fig. 3. Linking documents and creating the star-schema of the data warehouse

Techniques to link dimensions and to create the data warehouse star-schema. If linking the dimensions to the intermediate document and obtaining the fact are processed all together, the number of iterations through the initial document is lower, which subsequently reduces the processing time. A general way to do it is:

let $a := doc("dimension1.xml") (: e.g. time dimension :)
let $b := doc("dimension2.xml") (: e.g. customer dimension :)
return
document {
  for $t in doc("intermediate.xml")/node
  return
    <fact>
      <dim1_key>{ for $p in $a
                  where $p//element = $t//element
                  return $p//dim1_key }</dim1_key>
      <dim2_key>{ for $p in $b
                  where $p//element = $t//element
                  return $p//dim2_key }</dim2_key>
      (: ... for all dimensions ... :)
      <elem1>{$t//elem1_name}</elem1>
      <elem2>{$t//elem2_name}</elem2>
      <elem3>{$t//elem1 * $t//elem2}</elem3>
      (: ... for all extracted and calculated elements ... :)
    </fact>
}

In the example above we obtain the fact, where <dim1_key>, <dim2_key>, etc. represent the newly created key elements which link the fact to the dimensions, and <elem1>, <elem2>, <elem3>, etc. are elements of the fact, extracted from the intermediate document. As can be seen in the <elem3> declaration, a large range of

operators can be applied in order to obtain the desired values for analysis (e.g. price * quantity = income).

5. A Case Study

Because the main purpose of this paper is to show systematic steps to build an XML data warehouse, as opposed to developing a model for designing one, purely for the purpose of the case study we have adopted an existing model and schema mapping as proposed in [5]. However, it is important to note that our proposed steps are generic enough to be applied to different XML data warehouse models. In this section, we show how the star-schema of the data warehouse can be obtained from the initial XML document structure, following our proposal and the steps presented in the previous section. An example of a visual representation of an XML document (based on [5]) is shown in Figure 4.

[Figure 4 shows the schema-graph of a purchaseorder document, with children billto and shipto (name, street, city, country), orderdate, and item (name, quantity, price), where item contains product (name, brand, productcode, size).]

Fig. 4. Example of an XML document schema-graph

The following code shows the mapping from the visual representation in Figure 4 to the implementation in XML Schema:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="purchaseorder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="street" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="billto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="street" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="orderdate" type="xs:date"/>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
              <xs:element name="product">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="productcode" type="xs:string"/>
                    <xs:element name="size" type="xs:string"/>
                    <xs:element name="brand" type="xs:string"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Step 1. Data cleaning and integration

The first step is to apply Rules 1 to 4 described in Section 4.1, which include verifying the correctness of all schema stipulations and eliminating duplicate records, inconsistencies and data entry errors. These rules are applied to the data that will be transferred to our data warehouse. This step is normally performed by a user who has a good understanding of the domain, through manual techniques or otherwise. Once all the above rules have been applied to the data, Step 2 is performed.

Step 2. Data summarisation

By analysing the possible queries to the data warehouse for decision making, we find, for example, that we need to summarise our purchase details at a month level, so in our time dimension one of the attributes will be month (the same can be done for year, semester, quarter, etc.). At the same time, summarisation by customer and by product is necessary, and almost all queries ask for an income calculation, that is, price multiplied by quantity. For easy reference, recall the schema-graph in Figure 4.

a) Creating the time dimension. Because we need a month level of summarisation, we should link each particular date in our data with its month value, so we extract only the distinct values of orderdate from the initial document and apply the month-from-date function to find which month each date corresponds to. We name this new document timedim.xml. Each new document is created using the document constructor from XQuery:

document {
  for $t at $b in distinct-values(doc("purchaseorder.xml")//orderdate)
  return
    <ordtime>
      <timekey>{$b}</timekey>
      <orderdate>{$t}</orderdate>
      <month>{month-from-date($t)}</month>
    </ordtime>
}

b) Creating the customer dimension. A purchaseorder can have different customers for the shipto and billto destinations, or they can be the same; by creating a customer dimension we create something which shipto and billto will refer to in the future, without unnecessary duplication of data in the principal document. To do this, we extract only the distinct (unique) customers from our initial document, and name the result customerdim.xml. Note that distinct-values returns atomic values, so the remaining address details are looked up from the first shipto or billto element carrying each name, rather than navigated from the value itself:

let $c := doc("purchaseorder.xml")//purchaseorder
return
document {
  for $n at $b in distinct-values($c//name)
  let $cust := ($c/shipto[name = $n], $c/billto[name = $n])[1]
  return
    <customer>
      <customerkey>{$b}</customerkey>
      <name>{$n}</name>
      <street>{$cust/street/text()}</street>
      <city>{$cust/city/text()}</city>
      <country>{$cust/country/text()}</country>
    </customer>
}

c) Creating the product dimension. As for the customer dimension, we create a product dimension which contains only the distinct products extracted from our initial document; we consider the productcode element as the key, because it uniquely identifies a product. We name the result productdim.xml:

let $p := doc("purchaseorder.xml")//purchaseorder
return
document {
  for $t in distinct-values($p//productcode)
  let $prod := ($p//product[productcode = $t])[1]
  return
    <product>
      <productkey>{$t}</productkey>
      <productname>{$prod/name/text()}</productname>
      <size>{$prod/size/text()}</size>
      <brand>{$prod/brand/text()}</brand>
    </product>
}

Step 3. Creating intermediate XML documents

As our possible queries refer to month, customer and product as possible levels of summarisation, we need to extract from the initial purchaseorder document all records having order date, customer name (from shipto and billto) and product code and, alongside them, price and quantity as the principal figures when analysing this sales activity. This is all we need for now, and we name the result PurchaseTemp.xml:

document {
  for $t in doc("purchaseorder.xml")//purchaseorder
  return
    <purchase>
      <shiptoname>{$t/shipto/name/text()}</shiptoname>
      <billtoname>{$t/billto/name/text()}</billtoname>
      <orderdate>{$t/orderdate/text()}</orderdate>
      <productcode>{$t//productcode/text()}</productcode>
      <price>{$t//item/price/text()}</price>
      <quantity>{$t//item/quantity/text()}</quantity>
    </purchase>
}

Step 4. Linking the existing (newly created) documents; creating the data warehouse

It is straightforward that we now have to link orderdate from our intermediate document with the time dimension, name with the customer dimension and productcode with the product dimension. We now create the final document, which will be the fact of the data warehouse and will contain: keys linking to all three dimensions; price and quantity as raw activity data; and an income element as a commonly required calculation:

let $p := doc("productdim.xml")
let $c := doc("customerdim.xml")
let $t := doc("timedim.xml")
return
document {
  for $a in doc("PurchaseTemp.xml")/purchase
  return
    <purchase_fact>
      <shiptocustomerkey>{ for $b in $c//customer
                           where $b/name = $a/shiptoname
                           return $b/customerkey }</shiptocustomerkey>
      <billtocustomerkey>{ for $b in $c//customer
                           where $b/name = $a/billtoname
                           return $b/customerkey }</billtocustomerkey>
      <orderdatekey>{ for $b in $t//ordtime
                      where $b/orderdate = $a/orderdate
                      return $b/timekey }</orderdatekey>
      <productkey>{ for $b in $p//product
                    where $b/productkey = $a/productcode
                    return $b/productkey }</productkey>
      <USPrice>{$a/price/text()}</USPrice>
      <quantity>{$a/quantity/text()}</quantity>
      <income>{$a/price * $a/quantity}</income>
    </purchase_fact>
}

Following all the steps from 1 to 4, we have obtained the following star-schema for the purchaseorder XML data warehouse, as mentioned in [5].

[Figure 5 shows the star-schema of the purchaseorder XML data warehouse: the PURCHASE_ORDER fact (shiptocustomerkey, billtocustomerkey, orderdatekey, productkey, USPrice, quantity, income) is linked to the TIME dimension (timekey, orderdate, month), the PRODUCT dimension (productkey, productname, size, brand) and the CUSTOMER dimension (customerkey, name, street, city, country).]

Fig. 5. Star-schema of the purchaseorder XML data warehouse [5]

6. Discussions

As mentioned in the introduction to this paper, the data contained in a data warehouse should ideally be in-time, high-quality, up-to-date and easy to query. Our method helps these requirements to be accomplished. If the fact and dimensions of the data warehouse are correctly built following the steps, and if data cleaning and integration are performed every time new data are to be included in the data warehouse, a high-quality data warehouse will be achieved, which will subsequently support more efficient querying. Using the appropriate summarisation in developing dimensions can also optimise the volume of data added to the data warehouse over time. So we can say that the methodology's features bring optimisation in the long run. Another strong accomplishment is that the steps are presented in a very clear and easy manner, and people who do not have much knowledge of XQuery or XML Schema can follow them, with adequate and proper modifications, in order

to obtain a data warehouse corresponding to their necessities. Queries will be easy to formulate and process, as long as the dimensions correctly reflect all the levels of summarisation suitable for each specific case. For example, the query "Give all purchase orders which were billed to customers from the UK" will only look in the customer dimension, take the keys of the customers who have country = "UK" and, for each of these keys, take the order date, price, quantity and income from the fact document.

7. Conclusions and Future Work

The aim of this paper is to establish a framework for building an XML data warehouse, in terms of the quality of the data and the procedures to be followed by a working group. This framework has been built following a few important steps but, considering the specific situation of the XML documents involved in processing, more detailed analysis of data cleaning, summarisation and integration can be done. Because at this stage there are no automatic methods covering data cleaning, our aim is to study this concept and try to identify techniques which can be applied to XML document processing. Another interesting aspect to study is the transformation of XML Schemas when the data to be included in the data warehouse come from documents with different structures but equivalent content, and how to handle all aspects of data integration and of creating dimensions and facts in the data warehouse.

References

1. Widom, J., Data Management for XML: Research Directions, IEEE Data Engineering Bulletin, 22(3):44-52, Sept. 1999.
2. Golfarelli, M., Maio, D., Rizzi, S., Conceptual Design of Data Warehouses from E/R Schemes, Proc. HICSS-31, Kona, Hawaii, 1998.
3. Galhardas, H., Florescu, D., Shasha, D., Simon, E., An Extensible Framework for Data Cleaning, Proc.
of the International Conference on Data Engineering, San Diego, CA, Roddick, J.F., Mohania, M.K., Madria, S.K., Methods and Interpretation of Database Summarisation, Database and Expert Systems Application, Florence, Italy, Lecture Notes in Computer Science, Vol.1677, pp , Springer-Verlag, Vrdoljak, B., Banek M. and Rizzi S., Designing Web Warehouses from XML Schema, Data Warehousing and Knowledge Discovery, 5 th International Conference DaWak 2003, Prague, Czech Republic, Sept.3-5, Zhang J., Ling T.W., Bruckner R.M. and Tjoa A.M., Building XML Data Warehouse Based on Frequent Patterns in User Queries, Data Warehousing and Knowledge Discovery, 5 th International Conference DaWak 2003, Prague, Czech Republic, Sept.3-5, Fernandez M., Simeon J. and Wadler P., XML Query Languages : Experiences and Exemplars, Draft manuscris, September 1999, 8. Deutch A., Fernandez M., Florescu D., Levy A. and Suciu D., A Query Language for XML, Computer Networks, vol.31, pp , Amsterdam, Netherlands, 1999

9. Robie, J.: XQuery: A Guided Tour. Chapter 1, posted with permission from Addison-Wesley
10. Song, I.Y., Rowen, W., Medsker, C., Ewen, E.: An Analysis of Many-to-Many Relationships between Fact and Dimension Tables in Dimensional Modelling. Proc. DMDW, Interlaken, Switzerland, 2001
11. World Wide Web Consortium (W3C): XML Schema Part 0: Primer
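Appendix: an illustration of the dimension-then-fact lookup. The paper expresses all of its warehouse-building and querying steps in XQuery; as a cross-check of the two-step retrieval described in the discussion (filter the customer dimension, then join its keys against the fact document), the same logic can be sketched in Python with the standard library's ElementTree. This is a hypothetical illustration, not the paper's implementation: the element names follow the purchaseOrder star schema, while the inline sample documents and the function name are invented.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for the customer dimension and the fact document.
# Element names follow the purchaseOrder star schema; the data is invented.
customer_xml = """
<customers>
  <customer><customerkey>C1</customerkey><name>Ann</name><country>UK</country></customer>
  <customer><customerkey>C2</customerkey><name>Bob</name><country>France</country></customer>
</customers>"""

fact_xml = """
<facts>
  <order><billtocustomerkey>C1</billtocustomerkey><orderdatekey>T1</orderdatekey>
         <USPrice>10</USPrice><quantity>3</quantity><income>30</income></order>
  <order><billtocustomerkey>C2</billtocustomerkey><orderdatekey>T2</orderdatekey>
         <USPrice>5</USPrice><quantity>2</quantity><income>10</income></order>
</facts>"""

def orders_billed_to_country(customer_doc, fact_doc, country):
    customers = ET.fromstring(customer_doc)
    facts = ET.fromstring(fact_doc)
    # Step 1: take the keys of the customers from the requested country.
    keys = {c.findtext("customerkey")
            for c in customers.findall("customer")
            if c.findtext("country") == country}
    # Step 2: for each of these keys, take the measures from the fact document.
    return [{"key": o.findtext("billtocustomerkey"),
             "price": o.findtext("USPrice"),
             "quantity": o.findtext("quantity"),
             "income": o.findtext("income")}
            for o in facts.findall("order")
            if o.findtext("billtocustomerkey") in keys]

print(orders_billed_to_country(customer_xml, fact_xml, "UK"))
# [{'key': 'C1', 'price': '10', 'quantity': '3', 'income': '30'}]
```

As in the XQuery version, the fact document is never scanned for customer attributes: the country predicate is evaluated once against the (smaller) dimension, and only the resulting keys are matched against the facts.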


More information

Data Integration for XML based on Semantic Knowledge

Data Integration for XML based on Semantic Knowledge Data Integration for XML based on Semantic Knowledge Kamsuriah Ahmad a, Ali Mamat b, Hamidah Ibrahim c and Shahrul Azman Mohd Noah d a,d Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia,

More information

A SURVEY ON WEB MINING TOOLS

A SURVEY ON WEB MINING TOOLS IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 3, Issue 10, Oct 2015, 27-34 Impact Journals A SURVEY ON WEB MINING TOOLS

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

How To Model Data For Business Intelligence (Bi)

How To Model Data For Business Intelligence (Bi) WHITE PAPER: THE BENEFITS OF DATA MODELING IN BUSINESS INTELLIGENCE The Benefits of Data Modeling in Business Intelligence DECEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2

More information

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram Cognizant Technology Solutions, Newbury Park, CA Clinical Data Repository (CDR) Drug development lifecycle consumes a lot of time, money

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing Data Warehousing Outline Overview of data warehousing Dimensional Modeling Online Analytical Processing From OLTP to the Data Warehouse Traditionally, database systems stored data relevant to current business

More information

CHAPTER 4: BUSINESS ANALYTICS

CHAPTER 4: BUSINESS ANALYTICS Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the

More information