BIIIG: Enabling Business Intelligence with Integrated Instance Graphs

André Petermann #1, Martin Junghanns #1, Robert Müller 2, Erhard Rahm #1
# Department of Computer Science, University of Leipzig, Augustusplatz 10, Leipzig, Germany
1 {petermann,junghanns,rahm}@informatik.uni-leipzig.de
Faculty of Media, University of Applied Sciences Leipzig, Karl-Liebknecht-Str. 145, Leipzig, Germany
2 mueller@fbm.htwk-leipzig.de

Abstract: We propose a new graph-based framework for business intelligence called BIIIG that supports the flexible evaluation of relationships between data instances. It builds on the broad availability of interconnected objects in existing business information systems. Our approach extracts such interconnected data from multiple sources and combines them into an integrated instance graph. To support specific analytic goals, we extract subgraphs from this integrated instance graph representing executed business activities with all their data traces and involved master data. We provide an overview of the BIIIG approach and describe its main steps. We also present initial results from an evaluation with real ERP data.

I. INTRODUCTION

Traditional business intelligence based on data warehouses is limited in its flexibility by the underlying data warehouse schema, e.g., a star, snowflake or galaxy schema. Only the predefined kinds of facts, dimensional objects and relationships captured in the schema can be evaluated by the data analyst. While this is sufficient in many cases, it does not support a more elaborate analysis and mining of relationships between data objects to better understand the causes of certain results, e.g., in which way certain employees contribute to profit. Such relationships are frequently recorded in the underlying information systems but are not transferred to the data warehouse due to the typical focus on simple dimensional relationships. We therefore propose a new graph-based approach to business analysis that supports a more comprehensive analysis of relationships in addition to standard analysis techniques. For this purpose, we extract instances and their relationships from the relevant data sources and integrate them within a comprehensive instance graph that preserves existing relationships.

Enterprises typically use different heterogeneous but interrelated information systems for their daily business, such as ERP (enterprise resource planning) or CIT (customer issue tracking) systems. The data objects within these systems represent either transactional or master data. Transactional data refer to business activities such as quotations, invoices, e-mails or calendar entries, while master data (non-transactional data) refer to more static reference data such as customers, products or employees. Transactional data instances are typically related to other transactional data instances as well as to relevant master data objects. We capture such relationships referring to the same business activity within so-called business transaction graphs (BTG) in order to allow a fine-grained analysis of business activities. There are also frequent references between information systems, e.g., to link an activity to a transactional or master data object managed in another system. For example, tickets in the CIT system may be related to sales orders in the ERP system, or orders in the ERP system may refer to customer master data in other systems. Such links between systems are typically implemented by cross-system identifiers for customers, sales orders, etc.,
and can be maintained by dedicated middleware such as enterprise service bus [16] software. We utilize such existing links for integrating instances from different systems.

A. Motivating Example

For illustration, we consider a trade company whose business activities are supported by an ERP system and a separate CIT system, both keeping transactional and master data. The ERP system is used to process purchase orders and sales of products, while the CIT system manages customer complaints. A nontrivial analysis task is to evaluate employees considering not only how often but also in which way they contribute to the profit of the company. Our intended solution is to aggregate BTGs to determine profit and to analyze occurring patterns of employee involvement. Fig. 4 shows a simplified sample of a sales process containing transactional objects (white nodes) and involved Employee objects (gray nodes) from both the ERP and the CIT system. Master data about employee Alice is available in both sources and thus shown as a cross-system node. The calculation of profit has to consider all revenue-related objects (SalesInvoice; revenue 5,000) and expense-related objects (PurchaseInvoice and Ticket; total expense 4,000), resulting in a profit of 1,000. Information about employee involvement is available from relationship paths between transactional and Employee objects. In particular, the combination of master data (e.g., employee name) and transactional metadata (relationship types, transactional classes) provides meaningful patterns about who was involved, and in which way, in which kind of business activity.

(Fig. 1: BIIIG Overview)

For comprehensive analytics we have to aggregate profit-related measures and consider the involvement of all employees across relationship paths in not only one but all relevant BTGs.

B. Framework Overview

The motivating example illustrates only one possible kind of analytical question answerable by the evaluation of business transaction graphs. In general, the identification of frequent relationship patterns in BTGs of interest promises high analytical value. Relationship-driven evaluations are not only valuable for business applications but also for other domains and different kinds of information systems. For example, staff-related treatment success in hospitals could be evaluated with data from clinical information systems. This can be supported by connecting all clinical records associated with the treatment of a certain patient within a BTG and by analyzing relationship patterns across all such BTGs.

To support this kind of evaluation we propose a new graph-based framework for business analytics called BIIIG (Business Intelligence with Integrated Instance Graphs). BIIIG uses a native graph database (currently Neo4j [3]) to efficiently manage even large graphs and to utilize the advantages of graph databases for the integration of heterogeneous data and the flexible evaluation of relationships. While data integration happens mainly on the instance level, BIIIG also captures the metadata from the data sources to uniformly and semantically describe instances and relationships. As shown in Fig. 1, the BIIIG approach entails four main steps:

(1) Metadata Acquisition and Unification: We first extract the schema of every data source and translate it into a generic graph format. The schema is used not only to semantically describe the data; it also enables the automated extraction of instance objects and relationships in step 2.

(2) Automated Graph Integration: In our main data store, called the integrated instance graph (IIG), each data object (transactional or master data) is a node and each relationship is an edge. It can be very large and is stored in a graph database. The integration process is fully automated based on the unified metadata and requires no source-specific ETL design.

(3) Business Transaction Graphs: To support specific analytical tasks, we extract business transaction graphs (BTG) from the IIG. A BTG contains interrelated transactional data as well as the involved master data. BTGs typically represent business activities and are often the granularity of interest.

(4) Graph Analytics: Both the IIG and the set of BTGs can be the basis for graph-based business analytics. With an appropriate user interface, analysts can intuitively navigate in the graphs to access any piece of recorded data with its relationships. In addition, declarative query languages enable new kinds of OLAP queries involving relationship patterns. Furthermore, graph mining techniques can be applied to both the IIG and the set of BTGs, e.g., to determine frequent patterns or to predict the outcome of future business activities.

The main contribution of this paper is outlining the BIIIG framework for graph-based business intelligence with its dedicated support of business transaction graphs. The approach has already been implemented in an initial version and applied to a real use case. In the next section, we briefly discuss related work on graph databases and graph analytics.
After the introduction of our metadata and instance models (Section III), we present the main steps of the framework in Sections IV, V and VI. We then show the results of a first use case with data from a real ERP system (Section VII). Finally, we summarize and conclude with an outlook on future work.

II. RELATED WORK

Graph databases [21] and graph data models [5] have found increasing interest and adoption in recent years. Even business intelligence platform providers aim at providing graph data analysis, e.g., for SAP HANA [19] or Teradata [4]. However, these efforts are still under development, so their final capabilities are yet unknown. The property graph model [18] is applied in several systems and is also used in BIIIG. It is simple but flexible due to a uniform representation of objects and relationships with different properties. State-of-the-art graph query languages such as Neo4j Cypher [1] already provide useful analytical features for graph pattern matching and aggregation.

(Fig. 2: Sample database tables of two interlinked systems)
(Fig. 3: Sample Unified Metadata Graph (simplified); master data classes are gray and transactional classes are white)
(Fig. 4: Sample Business Transaction Graph with node classes, edge types and selected properties; master data is shown as gray and transactional data as white nodes)

Graph databases are also popular in the semantic web community, where RDF graphs [12] are stored in dedicated triple stores [10] and analyzed with the standardized query language SPARQL. However, RDF and SPARQL have not been widely used for business intelligence so far. RDF triples lead to an extremely fine-grained and voluminous data representation. Furthermore, SPARQL has limitations for graph analysis compared to languages such as Cypher, e.g., to find and return variable paths and graph patterns of interest [22].

While BIIIG aims at a general and comprehensive end-to-end solution for graph-based data integration and business intelligence, previous related studies mostly focused on specific aspects. Several graph-based frameworks focus on the transformation of source data into problem-specific [14][7] or more general [15] graph-based data warehouse models, but they do not aim at preserving all relationships from the source systems that might become of interest in unexpected ways. A notable exception is DB2SNA [23], which extracts social networks from relational databases and analyzes them, albeit without a focus on business analytics. The automated extraction of interrelated data objects from ERP systems is discussed in [17], but without using a graph model and only for the single analytical goal of process mining. The majority of analytical graph algorithms, e.g., for pattern mining [9], are based on graph models with homogeneous data objects and relationships, while we aim at analyzing graphs with heterogeneous data. Still, there are some approaches to find similar patterns in more complex graph models. PathSim [24] is a similarity measure for graph patterns in heterogeneous information networks. In [13], graph summarization is used for the visualization of similar patterns in multiple labelled graphs, and [20] discusses the aggregation of property graphs by rule-based summarization. Furthermore, there are different approaches to graph-based OLAP frameworks. GraphCube [26] enables the graph-based aggregation of graphs whose nodes provide a set of predefined dimensions. Graph OLAP [8] aims at aggregating multiple snapshots of homogeneous networks, and HMGraph OLAP [25] extends this approach with support for heterogeneous nodes and edges. While these approaches show the opportunities for OLAP on graph data, they either do not support heterogeneous data [8], [26] or rely on a static data warehouse schema [25].

III. DATA MODEL

BIIIG is based on simple but generic and flexible models to uniformly represent and integrate the metadata and instance data of heterogeneous data sources. In contrast to data warehouses, we do not try to define a specific global schema to which subsets of the data sources need to be mapped. Instead, we follow a bottom-up approach for data integration that preserves the sources' metadata as well as their instance data and relationships, but integrates them into uniform graph models. At the metadata level, we describe sources in terms of classes and associations to which instance objects and their relationships belong. In the following, we first introduce the so-called unified metadata graph (UMG) representing the metadata from all data sources.
We then specify the instance graph model.

A. Unified Metadata Graph

The UMG is used to describe the classes and associations of every source as well as the associations between sources. We can semi-automatically derive the UMG representation for diverse data sources, in particular for relational databases. Furthermore, we can derive mappings from the data sources to the UMG that support an automatic extraction and integration of instance data. The UMG also provides additional value for the data analyst, as it reveals the available classes and associations in a model of compact size compared to the instance graph.

The UMG is defined as a pair G_M = (V, E) with the set of classes V as nodes and the set of associations E as edges. Each class v ∈ V(G_M) is defined as v = (d, c, t, a_i, A, µ) with data source name d, class name c, class type t (transactional or master data), identity attribute a_i, the optional set of class attributes A and a schema translation (mapping) µ. An association e ∈ E(G_M) is defined as e = (r, v_s, v_e, a_s, a_e, i, A, µ) with relationship type r, start class v_s, end class v_e, start reference attribute a_s, end reference attribute a_e, direction indicator i, the optional set of association attributes A and a mapping µ. If the direction indicator is true, associations are directed from v_s to v_e. Based on this definition, each class of the UMG is associated with a single source. By contrast, associations can link either classes from the same source or from different sources and thus support data integration.

To simplify data integration for heterogeneous sources, we limit the mandatory attributes of classes and associations to the minimal information required for identity and reference. The attributes of classes and associations in A are thus optional and document which attributes have been defined within a source. For schema-less sources (e.g., XML documents), we can document the attributes found in the instance properties. This information is intended to provide the analyst with the attributes available for analysis.

The schema and instance translation between data sources and the UMG is described by class- and association-specific mappings µ. The implementation of these mappings depends on the kind of data source. For relational sources it can be based on SQL statements (see Section IV). To help generate the nodes and edges of the integrated instance graph, the mappings should specify the derivation of class and association instances from a source into a generic representation as pair sets {(a_1, b_1), ..., (a_n, b_n)} with attributes a (identity, reference or property attributes) and values b.

Fig. 3 illustrates an example UMG for the two interlinked relational sources shown in Fig. 2; it also covers the classes and relationship types of the instance example in Fig. 4. The three tables of the CIT source are represented by two similarly named classes and an undirected association representing the n:m relationship between tickets and employees. In the UMG, we have an Employee class for each source, since both sources have an employee table (with different attributes and ids). However, the UMG contains a sameas association between the two Employee classes as well as another cross-system association between the Ticket and SalesOrder classes.
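To make the tuple notation concrete, the following minimal sketch (not taken from the paper) shows how UMG classes and associations could be represented as Java records; all type and field names are our own, and the mapping µ is kept as an opaque string such as an SQL statement.

import java.util.List;

// Minimal sketch (illustrative only): UMG classes and associations as Java records (Java 16+).
public final class UmgModel {
    public enum ClassType { TRANSACTIONAL, MASTER }

    // class v = (d, c, t, a_i, A, mu)
    public record UmgClass(String source,            // d  - data source name
                           String name,              // c  - class name
                           ClassType type,           // t  - transactional or master data
                           String idAttribute,       // a_i - identity attribute
                           List<String> attributes,  // A  - optional class attributes
                           String mapping) {}        // mu - schema translation, e.g. an SQL statement

    // association e = (r, v_s, v_e, a_s, a_e, i, A, mu)
    public record UmgAssociation(String relationshipType,  // r
                                 UmgClass startClass,      // v_s
                                 UmgClass endClass,        // v_e
                                 String startRefAttribute, // a_s
                                 String endRefAttribute,   // a_e
                                 boolean directed,         // i
                                 List<String> attributes,  // A
                                 String mapping) {}        // mu
}
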
B. Integrated Instance Graph

Data objects and relationships from all sources are uniformly represented and integrated within the integrated instance graph (IIG). With this central data structure, BIIIG utilizes the versatility of graphs to represent heterogeneous data and to support the flexible analysis of data and relationships. For this purpose, the IIG should meet the following requirements:
- uniform representation of transactional and master data instances of different classes from different sources
- relationships of different semantic kinds
- directed relationships between any two objects
- multiple relationships between any two objects
- an unrestricted number of named properties for objects and relationships

The IIG as well as the business transaction graphs (Section V) are instance graphs, which are defined as follows. An instance graph G_I = (V, E) consists of a set of nodes V and a set of edges E. A node representing a data object v ∈ V(G_I) is defined as v = (u, S, c, t, P) with a unique identifier u, a set of source identifiers S, the class name c, the node type t (master or transactional data) and a set of named properties P. Class names refer to classes of the UMG. Corresponding objects of sameas-connected classes from different sources are combined within one node. In this case, S contains multiple source identifiers referring to the original data objects to provide provenance information. An edge e ∈ E(G_I) is defined as e = (f, r, v_s, v_e, P) with edge identifier f, relationship type r, the connected nodes v_s and v_e, and a set of properties P. Relationship types refer to associations of the UMG. The identifier f differentiates multiple relationships between the same two nodes, and the sequence of v_s and v_e expresses the relationship direction. For undirected associations we maintain edges in both directions. For both nodes and edges, the set of properties may contain arbitrarily many pairs P = {(a_1, b_1), ..., (a_n, b_n)} of attribute a and value b. For integrated objects we can merge the properties from the sources. For the example in Fig. 2, we can combine employee objects with CIT.employees.erp_empl_number = ERP.EmployeeTable.number and merge their properties from both sources (name, degree, dob, address, phone).

It is easy to see that instance graphs meet all mentioned requirements. Nodes and edges can be heterogeneous as they may belong to different classes and relationship types and consist of different sets of properties. The support of node classes and edge types also ensures the semantic expressiveness of the graph structure for business-oriented analysis tasks. This graph model can be implemented with graph database systems supporting the property graph model.
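As an illustrative sketch only (not the authors' implementation), the node and edge tuples above can be mirrored by simple Java classes; the example constructs a fused Employee node for Alice carrying source identifiers from both systems, where the identifier values and the phone number are invented for illustration.

import java.util.*;

// Illustrative sketch of IIG nodes and edges following v = (u, S, c, t, P) and e = (f, r, v_s, v_e, P).
public class IigSketch {
    enum NodeType { TRANSACTIONAL, MASTER }

    static class Node {
        long id;                                           // u - unique identifier
        List<String> sourceIds = new ArrayList<>();        // S - provenance identifiers
        String className;                                  // c - class name (refers to a UMG class)
        NodeType type;                                     // t - transactional or master data
        Map<String, Object> properties = new HashMap<>();  // P - named properties
    }

    static class Edge {
        long id;                                           // f - edge identifier
        String relationshipType;                           // r - refers to a UMG association
        Node start, end;                                   // v_s, v_e - direction given by the sequence
        Map<String, Object> properties = new HashMap<>();  // P - named properties
    }

    public static void main(String[] args) {
        // Fused Employee node for Alice with source identifiers from both systems
        // (identifier format follows source.class.id; concrete ids and the phone value are made up).
        Node alice = new Node();
        alice.id = 1L;
        alice.className = "Employee";
        alice.type = NodeType.MASTER;
        alice.sourceIds.add("ERP.EmployeeTable.4711");
        alice.sourceIds.add("CIT.employees.23");
        alice.properties.put("name", "Alice");       // property originating from the ERP source
        alice.properties.put("phone", "555-0100");   // property originating from the CIT source
        System.out.println(alice.className + " " + alice.sourceIds);
    }
}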

IV. METADATA AND INSTANCE INTEGRATION

The generation of the UMG can partially be automated but needs support by an expert who knows the data sources. By contrast, the integration of data objects and relationships into the IIG is fully automated based on the UMG and its schema translations.

A. Metadata Acquisition

Metadata acquisition requires that each data source is expressed by classes and associations as defined in the UMG model. Furthermore, we need to link data sources at the metadata level, either by utilizing existing references across sources or by specifying additional ones. The source-specific metadata acquisition is inherently dependent on the underlying data model and the chosen modeling decisions. While our approach can accommodate different kinds of data sources, we have so far focused on the integration of relational databases due to their dominant role in enterprise information systems.

For UMG generation we need information about all tables including their primary key and foreign key constraints. Ideally, we can query the tables T, primary keys P_T, foreign keys F_n^{T1,T2} and all further columns C_n^T from the standardized SQL Information Schema of a database D. We then try to automatically derive classes and associations, assuming that tables either represent classes or complex (n:m) associations, while n:1 or 1:1 relationships are expressed by foreign key constraints within class-like tables. Deviations from this common modeling approach need to be dealt with manually after the automatic translation steps.

For a class-like table T with primary key P_T we derive a class description v = (d, c, t, a_i, A, µ) where we set the source name d = D, the class name c = T and the identity attribute a_i = P_T. All further column names C_n^T except the foreign keys are included in A. The class type t has to be specified manually. The mapping µ is expressed by an SQL statement selecting all class instances with their primary key a_i and the values of all further attributes in A. For the example class Ticket the mapping is thus simply: SELECT id, description FROM CIT.tickets.

For a foreign key F_n^{T1,T2} in a class-like table T1 referencing a table T2, we derive a directed n:1 (or 1:1) association e = (r, v_s, v_e, a_s, a_e, i, A, µ) where v_s and v_e are the classes mapped to T1 and T2, the start reference attribute is the primary key a_s = P_T1, and the relationship type r = F as well as the end reference attribute a_e = F are set to the foreign key column. We further assume no association-specific attributes (A = ∅) and directed associations (direction indicator i = true) from the referencing to the referenced class. The mapping µ is expressed by an SQL statement that selects all association instances with the id values of the interrelated objects. For the example association processedby between SalesOrder and Employee in the ERP system this is achieved by the following SQL statement: SELECT number, processedby FROM ERP.SalesOrderTable WHERE processedby IS NOT NULL.

For an association table T1 representing a binary n:m relationship via foreign keys F_1^{T1,T2} and F_2^{T1,T3} without a further primary key, we derive an undirected association (i = false) and use the table name as the relationship type (r = T1), the foreign keys as reference attributes (a_s = F_1, a_e = F_2), the classes corresponding to the two referenced tables T2 and T3 as v_s and v_e, and include all further column names in A. The SQL statement for mapping µ thus selects the foreign keys to specify the two interrelated objects and all further association attributes. For the association processedby between Ticket and Employee in the CIT system the SQL mapping is: SELECT ticket_id, empl_id, responsible FROM CIT.ticket_employee.

The automatically generated classes and associations need to be enhanced by an expert, in particular to categorize classes as master or transactional data and to deal with cases not covered by the sketched translation approach. Furthermore, the expert can rename classes, associations and attributes to improve understandability for business analysts. An expert can also specify associations between different sources. Existing cross-system links need to be specified by dedicated associations in the UMG, such as the sameas and openedfor associations in Fig. 3. In the example, these associations can be derived from foreign keys in the CIT tables (erp_empl_number, erp_so_number) referring to the ERP system. An example mapping for the latter is an SQL statement analogous to the other n:1 associations: SELECT id, erp_so_number FROM CIT.tickets WHERE erp_so_number IS NOT NULL. Associations between sources can also be determined with the help of a link discovery or schema matching tool [6].
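The table, primary-key and foreign-key information used above corresponds to what standard JDBC database metadata exposes. The following is only an illustrative sketch (not part of the paper's implementation) of how such catalog information could be listed for a relational source; the connection URL, credentials and schema name "CIT" are placeholders.

import java.sql.*;

// Sketch: list tables, primary keys and foreign keys via JDBC metadata,
// the raw input for deriving UMG classes and associations (illustrative only).
public class SchemaDiscovery {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost/CIT"; // placeholder connection URL
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            DatabaseMetaData md = con.getMetaData();
            try (ResultSet tables = md.getTables(null, "CIT", "%", new String[]{"TABLE"})) {
                while (tables.next()) {
                    String table = tables.getString("TABLE_NAME");
                    System.out.println("table: " + table);
                    try (ResultSet pk = md.getPrimaryKeys(null, "CIT", table)) {
                        while (pk.next()) {
                            System.out.println("  primary key: " + pk.getString("COLUMN_NAME"));
                        }
                    }
                    try (ResultSet fk = md.getImportedKeys(null, "CIT", table)) {
                        while (fk.next()) {
                            System.out.println("  foreign key: " + fk.getString("FKCOLUMN_NAME")
                                    + " -> " + fk.getString("PKTABLE_NAME"));
                        }
                    }
                }
            }
        }
    }
}

As the evaluation in Section VII notes, foreign keys may be missing from a source's catalog entirely, in which case candidates have to be inferred, e.g., from column names and data types.
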
B. Instance Graph Integration

The generation of the integrated instance graph is an automatic process with three main steps. First, class mappings are used to transform each source data object into a node of the IIG. Second, we use association mappings to generate edges representing the relationships between objects. Finally, we combine corresponding data objects that are interconnected by a sameas edge. Algorithm 1 describes these steps in more detail. The input of the algorithm is the UMG including the class and association mappings; the result is the IIG.

Algorithm 1 Automated Instance Graph Integration
Input: unified metadata graph (umg)
Output: integrated instance graph (iig)
 1: iig = new Graph
 2: for all cls in umg.classes() do
 3:   for all obj in cls.instances() /* µ */ do
 4:     node = new Node
 5:     node.type = cls.type
 6:     node.class = cls.name
 7:     node.sids = [concat(cls.source, cls.name, obj[cls.id_attr])]
 8:     node.properties = obj.where(value != NULL and attr != cls.id_attr)
 9:     iig.add(node)
10:   end for
11: end for
12: for all asn in umg.associations() do
13:   for all rel in asn.instances() /* µ */ do
14:     edge = new Edge
15:     start_sid = concat(asn.start_class.source, asn.start_class.name, rel[asn.start_attr])
16:     end_sid = concat(asn.end_class.source, asn.end_class.name, rel[asn.end_attr])
17:     if edge.start_node = iig.get_node(start_sid) and edge.end_node = iig.get_node(end_sid) then
18:       edge.type = asn.type
19:       edge.properties = rel.where(value != NULL and attr != asn.start_attr and attr != asn.end_attr)
20:       iig.add(edge)
21:       if not asn.directed then
22:         edge2 = edge.clone()
23:         edge2.start_node = edge.end_node
24:         edge2.end_node = edge.start_node
25:         iig.add(edge2)
26:       end if
27:     end if
28:   end for
29: end for
30: while edge = iig.first_edge_of_type(sameas) do
31:   edge.end_node.sids += edge.start_node.sids
32:   edge.end_node.properties += edge.start_node.properties
33:   edge.start_node.edges.redirect_to(edge.end_node)
34:   iig.delete(edge)
35:   iig.delete(edge.start_node)
36: end while
37: return iig

First, all source data objects are added as nodes to the integrated instance graph within a loop over all classes (lines 2-11). The class-specific mapping µ (e.g., an SQL statement) is used to query or extract the instances from a data source in order to represent them uniformly as data objects consisting of sets of attribute-value pairs (line 3). Every new node (line 4) obtains the class name and type from the currently processed class (lines 5, 6). The set of source identifiers S is implemented as an array. Initially, it contains only one identifier, represented as the concatenation of the data source name, the class name and the object identity (line 7). The attribute name of the object identifier, cls.id_attr, corresponds to a_i of the class definition. The chosen identifier format allows tracing any node back to its originating data source (provenance). In particular, it enables querying nodes by source identifiers as used in our edge integration. We further include all nonempty properties of the data object in the node, except the one already included in the identity (line 8), and add the constructed node to the IIG (line 9).

Second, all relationships are added as edges to the IIG within a loop over all associations (lines 12-29). Analogous to nodes, the association-specific mapping µ is used to query relationships, represented as sets of attribute-value pairs (line 13). The connected nodes are queried from the IIG by their source identifiers, after those identifiers have been concatenated from the respective source and class names (lines 15, 16). We only continue with the current edge if we find a node for both identifiers (line 17). The edge obtains its relationship type from the currently processed association (line 18) and receives all nonempty property values of the relationship except the ones already included in the reference attributes (line 19). At the end, the constructed edge is added to the IIG (line 20). In the case of an undirected relationship we construct a second edge with the same relationship type and properties (line 22) but switched start and end nodes (lines 23, 24) and add it to the IIG (line 25).

In the final step, we fuse nodes representing the same logical object (lines 30-37). For that, all sameas edges are processed (line 30) and the start node is merged into the end node. First, the identifier array and the set of properties of the end node are merged with those of the start node (lines 31, 32), and second, all incoming and outgoing edges of the start node are redirected to the end node (line 33). Finally, the sameas edge as well as the initial start node are deleted (lines 34, 35).
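For illustration only, the node-fusion step (lines 30-36 of Algorithm 1) could be written against plain in-memory node and edge lists as follows; this is a sketch and not the authors' Neo4j-based implementation, and all type and method names are made up.

import java.util.*;

// Sketch of the sameas node fusion on plain in-memory structures (illustrative only).
// Expects modifiable lists; the end node's existing property values win on conflicts.
public class SameAsFusion {
    static class Node {
        List<String> sids = new ArrayList<>();
        Map<String, Object> properties = new HashMap<>();
    }
    static class Edge {
        String type;
        Node start, end;
    }

    /** Merges the start node of every sameas edge into its end node. */
    static void fuse(List<Node> nodes, List<Edge> edges) {
        for (Iterator<Edge> it = edges.iterator(); it.hasNext(); ) {
            Edge sameAs = it.next();
            if (!"sameas".equals(sameAs.type)) continue;
            Node source = sameAs.start, target = sameAs.end;
            target.sids.addAll(source.sids);                            // merge provenance identifiers (line 31)
            source.properties.forEach(target.properties::putIfAbsent);  // merge properties (line 32)
            for (Edge e : edges) {                                      // redirect remaining edges (line 33)
                if (e == sameAs) continue;
                if (e.start == source) e.start = target;
                if (e.end == source) e.end = target;
            }
            it.remove();                                                // delete the sameas edge (line 34)
            nodes.remove(source);                                       // delete the fused start node (line 35)
        }
    }
}
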
V. BUSINESS TRANSACTION GRAPHS

A main feature of BIIIG is the analysis of business activities represented by subgraphs of the IIG called business transaction graphs (BTG). The notion of BTGs is based on the observation that transactional data objects represent steps within business activities, e.g., the provision of a product quotation is represented by a corresponding quotation object. Such activity steps cause further actions, for example a sales order represented by its own transactional data object, and so on. Assuming that a data object has to exist before it can be linked, edge directions provide information about the (inverse) sequence of the corresponding business activities without the need for timestamps. Thus, we consider such relationships between transactional objects as causal connections that are represented by edges or paths between transactional nodes in the IIG. Transactional data objects also have relationships with master data objects (e.g., the employee providing the quotation), with corresponding master data nodes and edges in the IIG. However, a path between transactional nodes via master data nodes is in general no hint of a causal connection. For example, if two quotations involve the same product, these quotations can be completely independent. Hence, we define that two transactional nodes are considered causally connected if they have at least one connection through an edge or a path involving only transactional data objects. We therefore define a business transaction graph as a subgraph of the integrated instance graph in which all transactional nodes are causally connected. Because of their fundamental analytical value, a business transaction graph also contains all master data nodes connected to its transactional nodes. Based on this BTG definition, the IIG can be transformed into a set of BTGs, which may overlap in master data nodes but not in transactional nodes or edges. Algorithm 2 shows how BTGs can be derived from the IIG.

Algorithm 2 Business Transaction Graph Isolation
Input: integrated instance graph (iig)
Output: set of business transaction graphs (btgs)
 1: btgs = new GraphSet
 2: trans_nodes = iig.get_nodes_by_type(transactional)
 3: while trans_nodes.size() > 0 do
 4:   btg = new Graph // business transaction graph
 5:   seed_node = trans_nodes.random()
 6:   btg.add(seed_node)
 7:   cc_nodes = new NodeSet // causally connected nodes
 8:   cc_nodes.add(seed_node)
 9:   while cc_nodes.size() > 0 do
10:     node = cc_nodes.random()
11:     for all edge in node.edges() do
12:       if not btg.contains(edge) then
13:         if node == edge.start_node then
14:           next_node = edge.end_node
15:         else
16:           next_node = edge.start_node
17:         end if
18:         if not btg.contains(next_node) then
19:           btg.add(next_node)
20:           if next_node.type == Transactional then
21:             cc_nodes.add(next_node)
22:           end if
23:         end if
24:         btg.add(edge)
25:       end if
26:       cc_nodes.remove(node) // all edges traversed
27:       trans_nodes.remove(node) // allocated to single btg
28:     end for
29:   end while
30:   btgs.add(btg)
31: end while
32: return btgs

The algorithm starts with a candidate set of all transactional nodes contained in the IIG (line 2). The generation of a single BTG starts with an arbitrary transactional seed node from this set (line 5) and is followed by the traversal of all connecting paths, where all passed nodes and edges are stored in the BTG (lines 9-29). The traversal is stopped as soon as a master data node is reached (line 20). Every traversed node is removed from the candidate set to ensure that every transactional node as well as its connected edges are processed exactly once (line 27). This leads to a linear time complexity of O(|V(G)| + |E(G)|).

VI. INSTANCE GRAPH ANALYTICS

BIIIG aims at supporting comprehensive graph-based business analytics, including graph mining and the evaluation of relationship patterns (as in our motivating example). The design of the analysis capabilities of BIIIG has just begun and will be described in future publications. To give a first impression of our approach, we sketch a few basic operators for selection, projection and aggregation. Projection and aggregation generate tabular results supporting flexible and powerful OLAP post-processing with standard relational technology. Since BIIIG currently uses the graph database Neo4j to manage the IIG as well as the set of BTGs, we can also use its query functionality for analysis, in particular the query language Cypher. At the end of this section we briefly discuss how well our requirements are already met by Neo4j and Cypher.

A. Operators

As the basis for analytical operations, we consider the set of BTGs as an indexed set of n graphs G = {G_i}, 1 ≤ i ≤ n.

a) Selection: The selection operator σ_{H,θ}(G_i) returns a set of m subgraphs {G_i^j | 1 ≤ j ≤ m; G_i^j ⊆ G_i}, each matching a specified search pattern. A search pattern consists of a search predicate θ referring to node and edge variables in a variable definition H = (V, E). The simplest search pattern for employee involvement describes a direct edge from a transactional node to an employee node, expressed by H_ex = ({v_1, v_2}, {e_1}) and θ_ex: t(v_1) = Transactional ∧ c(v_2) = Employee ∧ v_s(e_1) = v_1 ∧ v_e(e_1) = v_2. The result of σ_{H_ex,θ_ex}(G_i) is a set of subgraphs, each containing exactly two nodes and one edge, or an empty set if the pattern does not occur.

b) Projection: The projection operator returns specified metadata and property values of graphs as a table. Projection is typically applied to the result of a selection operation. The projection π_{a_1(x_1),...,a_k(x_l),m_1(x_1),...,m_n(x_l)}(σ_{H,θ}(G_i)) extracts the values of k properties a_i and n metadata elements m_j of node or edge variables x ∈ V(H) ∪ E(H). The result is a table T_i = {(b_1, ..., b_{k+n})_j} with one row of k+n values per subgraph G_i^j of the selection result. Returning to our example, the projection π_{c(v_1), r(e_1), name(v_2)}(σ_{H_ex,θ_ex}(G_i)) determines for every employee involvement its kind of business activity (transactional class c(v_1)), relationship type r(e_1) and employee name. An example result row would be (SalesQuotation, sentby, Alice).

c) Aggregation: In business analytics we frequently want to derive aggregated values such as the profit per BTG. Similar to projection, aggregation is typically applied to the result of a selection operation, in particular with a search pattern locating the base values to aggregate. The aggregation γ_{a(x)}(σ_{H,θ}(G_i)) applies an aggregation function γ (e.g., SUM or AVG) to the values of a specified property a of a node or edge variable x ∈ V(H) ∪ E(H). Aggregation extracts the values of a from all subgraphs of the selection result and aggregates them into a single measure value.
We can use separate search patterns for revenue-related nodes, H_r = ({v_1}, ∅) with θ_r: c(v_1) = SalesInvoice, and expense-related nodes, H_e = ({v_2}, ∅) with θ_e: c(v_2) = PurchaseInvoice ∨ c(v_2) = Ticket, to calculate the profit for BTG G_i as profit_i = SUM_{Revenue(v_1)}(σ_{H_r,θ_r}(G_i)) - SUM_{Expense(v_2)}(σ_{H_e,θ_e}(G_i)).

B. Relational Postprocessing

Since projections determine relational tables and aggregations determine scalar values, we can use these results for relational postprocessing to support comprehensive analytics across all BTGs G = {G_i}. For this purpose, we extend the projection result tables T = {T_i} with a BTG index column and unite all records into a single table T_p = {(i, b_1, ..., b_{k+n})}. Similarly, we can maintain all aggregation results in a combined table T_m = {(i, m_1, m_2, ...)} with a column i for the BTG index and one additional column per BTG measure m, such as profit or the number of customer complaints. To relate measures and projection results we can apply a natural join T_p ⋈ T_m via the BTG index i. Since projections often express relationship patterns of interest, e.g., on employee involvement, the resulting join table supports comprehensive, multidimensional grouping and aggregation across BTGs. For our example, we can thus determine the employees, business activities or relationship types that are most frequently involved in high-profit BTGs or in those with customer complaints.

C. Capabilities of Neo4j and Cypher

A fundamental limitation of Neo4j is the missing support for managing, and thus also querying, sets of graphs. Furthermore, the projection to non-graph structures (e.g., tables) in the RETURN clause is mandatory. While this satisfies our current approach, it might be limiting for analytics involving chained graph operations. It is also not possible to apply further relational operations on multiple selections that have been projected to tables. On the positive side, our selection and projection operators on single graphs are well covered. Cypher supports the specification of search patterns, where the MATCH and WHERE clauses correspond to H and θ. The following example query implements our example projection for employee involvements:

MATCH (v1)-[e1]->(v2)
WHERE v1.type = "Transactional" AND v2.class = "Employee"
RETURN v1.class, type(e1), v2.name;

Cypher also supports the aggregation of property values. The following example query corresponds to the revenue part of our profit example:

MATCH (v1)
WHERE v1.class = "SalesInvoice"
RETURN SUM(v1.Revenue);
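As a small illustration of the relational postprocessing described in Section VI-B (a sketch only, not part of the BIIIG implementation), projection rows T_p and per-BTG measures T_m can be joined on the BTG index and then grouped; the following uses plain Java collections, and all data values are invented.

import java.util.*;

// Sketch: join projection rows (T_p) with per-BTG measures (T_m) on the BTG index i
// and aggregate profit per employee (illustrative data only).
public class RelationalPostprocessing {
    record ProjectionRow(int btgIndex, String activityClass, String relationshipType, String employee) {}

    public static void main(String[] args) {
        List<ProjectionRow> tp = List.of(
                new ProjectionRow(1, "SalesQuotation", "sentby", "Alice"),
                new ProjectionRow(2, "SalesOrder", "processedby", "Bob"));
        Map<Integer, Double> tm = Map.of(1, 1000.0, 2, 250.0); // BTG index -> profit measure

        Map<String, Double> profitPerEmployee = new HashMap<>();
        for (ProjectionRow row : tp) {                          // natural join T_p with T_m via the index i
            Double profit = tm.get(row.btgIndex());
            if (profit != null) {
                profitPerEmployee.merge(row.employee(), profit, Double::sum); // group by employee
            }
        }
        System.out.println(profitPerEmployee); // e.g. {Alice=1000.0, Bob=250.0}
    }
}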

VII. EXPERIMENTAL EVALUATION

For an initial evaluation of the BIIIG framework, we used data from a real installation of the open source ERP system ERPNext [2]. The data is kept in a MySQL database and covers fictive business activities for application development and testing. While the database size is relatively small, we found the data realistic and useful for an initial proof of concept.

We could automatically determine the tables, columns and primary keys following the approach of Section IV-A. However, the metadata description of the ERP database was missing foreign keys, and we had to support the generation of the UMG with a custom script. In particular, foreign key candidates were chosen based on names and data types and validated by table joins. As a result, we derived 102 classes and 583 associations. Initially, class names were chosen corresponding to table names and association names corresponding to foreign key column names or (for n:m relationships) the name of the reference table. These names turned out to be meaningful for a person familiar with ERP systems, but we translated class and association names to more common business terms and also typified classes as transactional or master data.

We implemented Algorithm 1 in Java and used it together with the generated UMG to create the IIG. The IIG for the used ERP data has 8,358 nodes, 38,892 edges and 87,746 nonempty properties. The integration process took about 10 seconds on commodity hardware, and the IIG size for the graph database Neo4j is about 30 MB on disk. Algorithm 2 was implemented using the native API of the graph database. In our current implementation we use a second database instance with isolated subgraphs (including redundant master data), where every node provides a dedicated property btg_id representing its BTG index. We generated 1,983 BTGs in about 2 seconds. The BTG size ranges from 2 to 221 nodes and from 2 to 883 edges. Spot tests of BTGs yielded only reasonable results in the context of business activities, for example interrelated transactional data objects from first sales activities over product purchase and invoicing up to general ledger accounting, as well as the involved master data such as employees, customers and products. We expect real-world BTGs containing such data to have at least the size of the biggest ones resulting from the experimental dataset. Finally, we could successfully execute queries corresponding to the examples in Section VI-C.

VIII. CONCLUSION AND FUTURE WORK

We proposed the BIIIG framework for graph-based data integration and business intelligence. It utilizes simple but flexible graph models to uniformly represent metadata and instance data of diverse sources such as ERP and related systems. Metadata acquisition and instance integration are largely automatic and retain valuable relationships represented in the source data for later business analysis. BIIIG also includes the automatic generation of so-called business transaction graphs to enable the focused analysis of business activities with all their steps and involved people as well as other resources. BIIIG will provide comprehensive query and data mining facilities based on graph patterns, although the analysis component is still under development. An initial version of BIIIG for relational data sources based on an existing graph database has already been implemented and was successfully applied to a real ERP use case. In future work, we will complete the specification and implementation of the BIIIG analysis capabilities.
We will also investigate the need and implications of more general BTGs in which transactional objects can be used in more than one BTG to represent cross-BTG dependencies, as already addressed in [11]. We will further apply and evaluate BIIIG on more and larger use cases.

ACKNOWLEDGMENTS

This project is partly funded within the EU program Europa fördert Sachsen of the European Social Fund. We are grateful to Web Notes Technologies for providing us with the data for the ERPNext use case.

REFERENCES

[1] Cypher query language, cypher-query-lang.html.
[2] ERPNext, ERP system.
[3] Neo4j, graph database.
[4] Teradata Aster SQL-GR.
[5] R. Angles, "A comparison of current graph database models," in Data Engineering Workshops (ICDEW), 2012 IEEE 28th Int. Conf. on.
[6] Z. Bellahsene, A. Bonifati, and E. Rahm, Eds., Schema Matching and Mapping. Springer.
[7] D. Bleco and Y. Kotidis, "Business intelligence on complex graph data," in Proceedings of the 2012 Joint EDBT/ICDT Workshops.
[8] C. Chen et al., "Graph OLAP: Towards online analytical processing on graphs," in Data Mining, ICDM '08, Eighth IEEE Int. Conf. on.
[9] H. Cheng, X. Yan, and J. Han, "Mining graph patterns," in Managing and Mining Graph Data. Springer, 2010.
[10] C. David, C. Olivier, and B. Guillaume, "A survey of RDF storage approaches," ARIMA Journal, vol. 15.
[11] D. Fahland et al., "Many-to-many: Some observations on interactions in artifact choreographies," in ZEUS, 2011.
[12] G. Klyne, J. J. Carroll, and B. McBride, "Resource description framework (RDF): Concepts and abstract syntax," W3C rec., vol. 10.
[13] D. Koop, J. Freire, and C. T. Silva, "Visual summaries for graph collections."
[14] Y. Kotidis, "Extending the data warehouse for service provisioning data," Data & Knowledge Engineering, vol. 59, no. 3.
[15] D. Lam et al., "Graph-based data warehousing using the core-facets model," in Advances in Data Mining. Appl. and Theor. Aspects.
[16] F. Menge, "Enterprise service bus," in Free and open s. softw. c.
[17] E. Nooijen et al., "Automatic discovery of data-centric and artifact-centric processes," in Business Process Management Workshops.
[18] M. A. Rodriguez and P. Neubauer, "Constructions from dots and lines," Bulletin of the Amer. Society for Inf. Sci. and Tec., vol. 36, no. 6.
[19] M. Rudolf, M. Paradies, C. Bornhövd, and W. Lehner, "The graph story of the SAP HANA database," in BTW, 2013.
[20] M. Rudolf et al., "SynopSys: Large graph analytics in the SAP HANA database through summarization," in GRADES.
[21] S. Sakr and E. Pardede, Graph Data Management: Techniques and Applications. Information Science Reference.
[22] A. Seaborne, SPARQL 1.1 property paths, sparql11-property-paths/.
[23] R. Soussi et al., "DB2SNA: an all-in-one tool for extraction and aggregation of underlying social networks from relational databases," in The inf. of tec. on social network analysis and Mining.
[24] Y. Sun, J. Han, C. C. Aggarwal, and N. V. Chawla, "When will it happen?: relationship prediction in heterogeneous information networks," in Proc. of the 5th ACM int. conf. on Web search and data mining.
[25] M. Yin, B. Wu, and Z. Zeng, "HMGraph OLAP: a novel framework for multi-dimensional heterogeneous network analysis," in Proceedings of the 15th int. workshop on Data warehousing and OLAP.
[26] P. Zhao, X. Li, D. Xin, and J. Han, "Graph cube: on warehousing and OLAP multidimensional networks," in SIGMOD Conference, 2011.

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database

More information

A Critical Review of Data Warehouse

A Critical Review of Data Warehouse Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 95-103 Research India Publications http://www.ripublication.com A Critical Review of Data Warehouse Sachin

More information

Jet Data Manager 2012 User Guide

Jet Data Manager 2012 User Guide Jet Data Manager 2012 User Guide Welcome This documentation provides descriptions of the concepts and features of the Jet Data Manager and how to use with them. With the Jet Data Manager you can transform

More information

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim Hybrid Storage Scheme for RDF Data Management in Semantic Web Sung Wan Kim Department of Computer Information, Sahmyook College Chungryang P.O. Box118, Seoul 139-742, Korea swkim@syu.ac.kr ABSTRACT: With

More information

XML DATA INTEGRATION SYSTEM

XML DATA INTEGRATION SYSTEM XML DATA INTEGRATION SYSTEM Abdelsalam Almarimi The Higher Institute of Electronics Engineering Baniwalid, Libya Belgasem_2000@Yahoo.com ABSRACT This paper describes a proposal for a system for XML data

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach

More information

Designing internal control points in partially managed processes by using business vocabulary

Designing internal control points in partially managed processes by using business vocabulary Designing internal control points in partially managed processes by using business vocabulary Yurdaer N. Doganata, IBM T. J. Watson Research Center Hawthorne, NY 10532 yurdaer@us.ibm.com Abstract The challenges

More information

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance Data Sheet IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance Overview Multidimensional analysis is a powerful means of extracting maximum value from your corporate

More information

TopBraid Insight for Life Sciences

TopBraid Insight for Life Sciences TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.

More information

The Data Warehouse Design Problem through a Schema Transformation Approach

The Data Warehouse Design Problem through a Schema Transformation Approach The Data Warehouse Design Problem through a Schema Transformation Approach V. Mani Sharma, P.Premchand Associate Professor, Nalla Malla Reddy Engineering College Hyderabad, Andhra Pradesh, India Dean,

More information

CHAPTER-6 DATA WAREHOUSE

CHAPTER-6 DATA WAREHOUSE CHAPTER-6 DATA WAREHOUSE 1 CHAPTER-6 DATA WAREHOUSE 6.1 INTRODUCTION Data warehousing is gaining in popularity as organizations realize the benefits of being able to perform sophisticated analyses of their

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

investment is a key issue for the success of BI+EDW system. The typical BI application deployment process is not flexible enough to deal with a fast

investment is a key issue for the success of BI+EDW system. The typical BI application deployment process is not flexible enough to deal with a fast EIAW: Towards a Business-friendly Data Warehouse Using Semantic Web Technologies Guotong Xie 1, Yang Yang 1, Shengping Liu 1, Zhaoming Qiu 1, Yue Pan 1, and Xiongzhi Zhou 2 1 IBM China Research Laboratory

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Reverse Engineering in Data Integration Software

Reverse Engineering in Data Integration Software Database Systems Journal vol. IV, no. 1/2013 11 Reverse Engineering in Data Integration Software Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro Integrated applications

More information

An Approach for Facilating Knowledge Data Warehouse

An Approach for Facilating Knowledge Data Warehouse International Journal of Soft Computing Applications ISSN: 1453-2277 Issue 4 (2009), pp.35-40 EuroJournals Publishing, Inc. 2009 http://www.eurojournals.com/ijsca.htm An Approach for Facilating Knowledge

More information

Fuzzy Spatial Data Warehouse: A Multidimensional Model

Fuzzy Spatial Data Warehouse: A Multidimensional Model 4 Fuzzy Spatial Data Warehouse: A Multidimensional Model Pérez David, Somodevilla María J. and Pineda Ivo H. Facultad de Ciencias de la Computación, BUAP, Mexico 1. Introduction A data warehouse is defined

More information

Databases in Organizations

Databases in Organizations The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron

More information

Data warehousing with PostgreSQL

Data warehousing with PostgreSQL Data warehousing with PostgreSQL Gabriele Bartolini http://www.2ndquadrant.it/ European PostgreSQL Day 2009 6 November, ParisTech Telecom, Paris, France Audience

More information

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts Dimodelo Solutions Data Warehousing and Business Intelligence Concepts Copyright Dimodelo Solutions 2010. All Rights Reserved. No part of this document may be reproduced without written consent from the

More information

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,

More information

<no narration for this slide>

<no narration for this slide> 1 2 The standard narration text is : After completing this lesson, you will be able to: < > SAP Visual Intelligence is our latest innovation

More information

Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques

Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Sean Thorpe 1, Indrajit Ray 2, and Tyrone Grandison 3 1 Faculty of Engineering and Computing,

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

More information

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India sk.obaidullah@gmail.com

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP

More information

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING M.Gnanavel 1 & Dr.E.R.Naganathan 2 1. Research Scholar, SCSVMV University, Kanchipuram,Tamil Nadu,India. 2. Professor

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution Warehouse and Business Intelligence : Challenges, Best Practices & the Solution Prepared by datagaps http://www.datagaps.com http://www.youtube.com/datagaps http://www.twitter.com/datagaps Contact contact@datagaps.com

More information

Graphical Web based Tool for Generating Query from Star Schema

Graphical Web based Tool for Generating Query from Star Schema Graphical Web based Tool for Generating Query from Star Schema Mohammed Anbar a, Ku Ruhana Ku-Mahamud b a College of Arts and Sciences Universiti Utara Malaysia, 0600 Sintok, Kedah, Malaysia Tel: 604-2449604

More information

IMPLEMENTATION OF DATA WAREHOUSE SAP BW IN THE PRODUCTION COMPANY. Maria Kowal, Galina Setlak

IMPLEMENTATION OF DATA WAREHOUSE SAP BW IN THE PRODUCTION COMPANY. Maria Kowal, Galina Setlak 174 No:13 Intelligent Information and Engineering Systems IMPLEMENTATION OF DATA WAREHOUSE SAP BW IN THE PRODUCTION COMPANY Maria Kowal, Galina Setlak Abstract: in this paper the implementation of Data

More information

Efficient Data Access and Data Integration Using Information Objects Mica J. Block

Efficient Data Access and Data Integration Using Information Objects Mica J. Block Efficient Data Access and Data Integration Using Information Objects Mica J. Block Director, ACES Actuate Corporation mblock@actuate.com Agenda Information Objects Overview Best practices Modeling Security

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information

New Matrix Approach to Improve Apriori Algorithm

New Matrix Approach to Improve Apriori Algorithm New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate

More information

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Introduction Increasingly, organizations are analyzing historical data to identify useful patterns and

More information

AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014

AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014 AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014 Career Details Duration 105 hours Prerequisites This career requires that you meet the following prerequisites: Working knowledge

More information

Tracking System for GPS Devices and Mining of Spatial Data

Tracking System for GPS Devices and Mining of Spatial Data Tracking System for GPS Devices and Mining of Spatial Data AIDA ALISPAHIC, DZENANA DONKO Department for Computer Science and Informatics Faculty of Electrical Engineering, University of Sarajevo Zmaja

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

How To Improve Performance In A Database

How To Improve Performance In A Database Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed

More information

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial

More information

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy Satish Krishnaswamy VP MDM Solutions - Teradata 2 Agenda MDM and its importance Linking to the Active Data Warehousing

More information

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

More information

Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis

Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and

More information

Indexing Techniques for Data Warehouses Queries. Abstract

Indexing Techniques for Data Warehouses Queries. Abstract Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 sirirut@cs.ou.edu gruenwal@cs.ou.edu Abstract Recently,

More information

Improve Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations

Improve Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations www.ijcsi.org 202 Improve Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations Muhammad Saqib 1, Muhammad Arshad 2, Mumtaz Ali 3, Nafees Ur Rehman 4, Zahid

More information

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015 E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing

More information

www.dotnetsparkles.wordpress.com

www.dotnetsparkles.wordpress.com Database Design Considerations Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions.

More information

A Design and implementation of a data warehouse for research administration universities

A Design and implementation of a data warehouse for research administration universities A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced

More information

Distance Learning and Examining Systems

Distance Learning and Examining Systems Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed

More information