Data Mining Standards


Arati Kadav, Jaya Kawale, Pabitra Mitra

Abstract

In this survey paper we consolidate the current data mining standards. We categorize them into process standards, XML standards, standard APIs, web standards and grid standards, and discuss each in considerable detail. We also design an application using these standards. We then analyze the standards and their influence on data mining application development, and point out areas of data mining application development that still need to be standardized. Finally, we discuss the trend in the focus areas addressed by these standards.

Contents

Introduction
Data Mining Standards
  Process Standards
    CRISP-DM
  XML Standards / Model Defining Standards
    PMML
    CWM-DM
  Web Standards
    XMLA
    Semantic Web
    Data Space
  Application Programming Interfaces (APIs)
    SQL/MM DM
    Java APIs
    Microsoft OLEDB-DM
  Grid Services
    OGSA and data mining
Developing Data Mining Application Using Data Mining Standards
  Application Requirement Specification
  Design and Deployment
Analysis
Conclusion
Appendix
  [A1] PMML example
  [A2] XMLA example
  [A3] OLEDB
  [A4] OLEDB-DM example
  [A5] SQL/MM example
  [A6] Java Data Mining Model example

1 Introduction

Researchers in data mining and knowledge discovery are creating new, more automated methods for discovering knowledge to meet the needs of the 21st century. This need for analysis will keep growing, driven by the business trends of one-to-one marketing, customer-relationship management, enterprise resource planning, risk management, intrusion detection and Web personalization, all of which require customer-information analysis and customer-preferences prediction. [GrePia]

Deploying a data mining solution requires collecting the data to be mined, and cleaning and transforming its attributes to provide the inputs for data mining models. These models must also be built, used and integrated with different applications. Moreover, currently deployed data management software must be able to interact with the data mining models through standard APIs. Scalability further calls for collecting the data to be mined from distributed and remote locations. Employing common data mining standards greatly simplifies the integration, updating, and maintenance of the applications and systems containing the models. [stdhb]

Over the past several years, various data mining standards have matured and today are used by many of the data mining vendors, as well as by others building data mining applications. With the maturity of data mining standards, a variety of standards-based data mining services and platforms can now be much more easily developed and deployed. Related fields such as data grids, web services, and the semantic web have also developed standards-based infrastructures and services relevant to KDD. These new standards and standards-based services and platforms have the potential to change the way data mining is used. [kdd03]

The data mining standards are concerned with one or more of the following issues [stdhb]:

1. The overall process by which data mining models are produced, used, and deployed: This includes, for example, a description of the business interpretation of the output of a classification tree.
2. A standard representation for data mining and statistical models: This includes, for example, the parameters defining a classification tree.
3. A standard representation for cleaning, transforming, and aggregating attributes to provide the inputs for data mining models: This includes, for example, the parameters defining how zip codes are mapped to three-digit codes prior to their use as a categorical variable in a classification tree.
4. A standard representation for specifying the settings required to build models and to use the outputs of models in other systems: This includes, for example, specifying the name of the training set used to build a classification tree.
5. Interfaces and Application Programming Interfaces (APIs) to other languages and systems: There are standard data mining APIs for Java and SQL. This includes, for example, a description of the API so that a classification tree can be built on data in a SQL database.
6. Standards for viewing, analyzing, and mining remote and distributed data: This includes, for example, standards for the format of the data and metadata so that a classification tree can be built on distributed web-based data.

The current established standards address these different aspects or dimensions of data mining application development. They are summarized in Table 1.

Table 1: Summary of Data Mining Standards

Area: Process Standards
  Standard: Cross Industry Standard Process for Data Mining (CRISP-DM)
  Description: Captures the data mining process: begins with the business problem and ends with the deployment of the knowledge gained in the process.

Area: XML Standards
  Standard: Predictive Model Markup Language (PMML)
  Description: Model for representing data mining and statistical data.

  Standard: Common Warehouse Model for Data Mining (CWM-DM)
  Description: Model for metadata that specifies metadata for build settings, model representations, and results from model operations. Models are defined through the Unified Modeling Language.

Area: Standard APIs
  Standard: SQL/MM, Java API (JSR-73), Microsoft OLE-DB
  Description: APIs for data mining applications.

Area: Protocol for transport of remote and distributed data
  Standard: Data Space Transfer Protocol (DSTP)
  Description: DSTP is used for distribution, enquiry and retrieval of data in a data space.

Area: Model Scoring Standard
  Standard: Predictive Scoring and Update Protocol (PSUP)
  Description: PSUP can be used for online real-time scoring and updates as well as for scoring in an offline batch environment. (Scoring is the process of using statistical models to make decisions.)

Area: Web Standards
  Standard: XML for Analysis (XMLA)
  Description: Standard web service interface designed specifically for online analytical processing and data mining functions (uses the Simple Object Access Protocol (SOAP)).

  Standard: Semantic Web
  Description: Provides a framework to represent information in machine-processable form and can be used to extract knowledge from data mining systems.

  Standard: Data Space
  Description: Provides an infrastructure for creating a web of data. Is built around standards like XML, DSTP and PSUP. Helps handle large data sets which are present at remote and distributed locations.

Area: Grid Standards
  Standard: Open Grid Service Architecture
  Description: Developed by Globus, this standard describes a service-based open architecture for distributed virtual organizations. It will provide the data mining engine with secure, reliable and scalable high-bandwidth access to the various distributed data sources and formats across various administrative domains.

Section 2 describes the above standards in detail. In Section 3 we design and develop a data mining application using these standards. Section 4 analyzes the standards and their relationship with each other and proposes areas where standards are needed.

2. Data Mining Standards

2.1 Process Standards

CRISP-DM

CRISP-DM stands for CRoss Industry Standard Process for Data Mining. It is an industry-, tool- and application-neutral standard for defining and validating the data mining process. It was conceived in late 1996 by DaimlerChrysler, SPSS and NCR. The latest version is CRISP-DM 1.0.

Motivation: As market interest in data mining led to its widespread uptake, every new adopter was required to come up with their own approach to incorporating data mining into their current setup. There was also a need to demonstrate that data mining was sufficiently mature to be adopted as a key part of any customer's business process. CRISP-DM provides a standard process model for conceiving, developing and deploying a data mining project; it is non-proprietary and freely distributed.

Standard Description: CRISP-DM organizes the process model hierarchically. At the top level the process is divided into phases. Each phase consists of several second-level generic tasks. These tasks are complete (covering the phase and all possible data mining applications) and stable (valid for yet unforeseen developments). The generic tasks are mapped to specialized tasks. Finally, the specialized tasks contain several process instances, which are records of the actions, decisions and results of an actual data mining engagement. This is depicted in Figure 1.

The mapping of generic tasks (e.g. the task of cleaning data) to specialized tasks (e.g. cleaning numerical or categorical values) depends on the data mining context. CRISP-DM distinguishes between four different dimensions of data mining contexts:

- Application domain (area of the project, e.g. response modeling)
- Data mining problem type (e.g. clustering or segmentation problem)
- Technical aspect (issues like outliers or missing values)
- Tool and technique (e.g. Clementine or decision trees)
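The four-level breakdown and the context-dependent mapping can be pictured as a small data structure. The following is a minimal sketch in Python, not something CRISP-DM itself defines: the class names, the `specialize` function and its naming scheme are hypothetical, and only the level names and the four context dimensions come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessInstance:
    """Level 4: record of the actions, decisions and results of one engagement."""
    action: str
    decision: str
    result: str


@dataclass
class SpecializedTask:
    """Level 3: a generic task made concrete for one data mining context."""
    name: str
    instances: List[ProcessInstance] = field(default_factory=list)


@dataclass
class GenericTask:
    """Level 2: a task intended to hold for any data mining application."""
    name: str
    specialized: List[SpecializedTask] = field(default_factory=list)


@dataclass
class Phase:
    """Level 1: one of the top-level phases."""
    name: str
    generic_tasks: List[GenericTask] = field(default_factory=list)


@dataclass(frozen=True)
class Context:
    """The four context dimensions named in the text."""
    application_domain: str
    problem_type: str
    technical_aspect: str
    tool_and_technique: str


def specialize(task: GenericTask, context: Context) -> SpecializedTask:
    """Map a generic task to a specialized one (the naming scheme is hypothetical)."""
    name = f"{task.name} ({context.technical_aspect}, {context.tool_and_technique})"
    specialized = SpecializedTask(name)
    task.specialized.append(specialized)
    return specialized


clean = GenericTask("Clean data")
preparation = Phase("Data preparation", [clean])
ctx = Context("response modeling", "segmentation", "missing values", "decision trees")
task = specialize(clean, ctx)
task.instances.append(
    ProcessInstance("imputed missing ages", "use the median", "no missing values remain")
)
print(preparation.generic_tasks[0].specialized[0].name)
# prints: Clean data (missing values, decision trees)
```

The same generic task yields a different specialized task whenever a context dimension changes, which is the sense in which the mapping depends on the data mining context.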

The more values of these context dimensions are fixed, the more concrete the data mining context becomes. The mappings can be done for the single data mining project at hand or for future projects. The process reference model consists of the phases shown in Figure 1 and summarized in Table 2. The sequence of the phases is not rigid: the outcome of each phase determines which phase, or which particular task of a phase, is performed next. [CRSP]

[Figure 1: CRISP-DM process model. Four-level breakdown of the CRISP-DM methodology: phases (business understanding, data understanding, data preparation, modeling, evaluation, deployment), generic tasks for each phase, specialized tasks reached through mapping, and process instances.]

Interoperability with other standards: CRISP-DM provides a reference model which is completely neutral to other tools, vendors, applications or existing standards.

Table 2: Phases in CRISP-DM Process Reference Model

Business understanding
  Focuses on assessing and understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

Data understanding
  - Starts with an initial data collection.
  - The data collected is then described and explored (e.g. the target attribute of a prediction task is identified).
  - Then the data quality is verified (e.g. noise or missing values).

Data preparation
  Covers all activities to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. The data to be used for analysis is:
  - Selected
  - Cleaned (its quality is raised to the level required by the analysis technique)
  - Constructed (e.g. derived attributes like area = length * breadth are created)
  - Integrated (information from multiple tables is combined to create new labels) and formatted

Modeling
  - Specialized modeling techniques are selected (e.g. a decision tree with the C4.5 algorithm).
  - A test design is generated to test the model's quality and validity.
  - The modeling tool is run on the created data set.
  - The model is assessed and evaluated (its accuracy is tested).

Evaluation
  - The degree to which the model meets the business objectives is assessed.
  - The model undergoes a review process identifying the objectives missed or accomplished; based on this, it is determined whether the project should be deployed.

Deployment
  Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. A deployment plan is drawn up before actually carrying out the deployment.

2.2 XML Standards / Model Defining Standards

PMML

PMML stands for the Predictive Model Markup Language. It is being developed by the Data Mining Group [dmg], a vendor-led consortium which currently includes over a dozen vendors including Angoss, IBM, Magnify, MINEit, Microsoft, National Center for Data Mining at the University of Illinois (Chicago), Oracle, NCR, Salford Systems, SPSS, SAS, and Xchange. PMML is used to specify the models. The latest version, PMML Version 2.1, was released in March. There have been 6 releases so far.

Motivation: A standard representation for data mining and statistical models was required. In addition, the representation needed to be relatively narrow so that it could serve as common ground for several subsequent standards, allowing those standards to interoperate.

Standard Description:

PMML is an XML markup language which provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications. It allows users to develop models within one vendor's application, and use other vendors' applications to visualize, analyze, evaluate or otherwise use the models. It describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. [PMMSche] [stdhb] PMML consists of the components summarized in Table 3.

Table 3: PMML Components of Data Mining Model

Data Dictionary
  Contains data definitions that do not vary with the model.
  - Defines the attributes input to models.
  - Specifies the type and value range for each attribute.

Mining Schema
  Contains information that is specific to a certain model and varies with the model. Each model contains one mining schema that lists the fields used in the model. These fields are a subset of the fields in the Data Dictionary. For example, the Mining Schema specifies the usage type of an attribute, which may be active (an input of the model), predicted (an output of the model), or supplementary (holding descriptive information and ignored by the model).

Transformation Dictionary
  Defines derived fields. Derived fields may be defined by:
  - Normalization, which maps continuous or discrete values to numbers
  - Discretization, which maps continuous values to discrete values
  - Value mapping, which maps discrete values to discrete values
  - Aggregation, which summarizes or collects groups of values, e.g. by computing averages

Model Statistics
  Contains basic univariate statistics about the model, such as the minimum, maximum, mean, standard deviation, median, etc. of numerical attributes.

Model Parameters
  PMML also specifies the actual parameters defining the statistical and data mining models per se. The models supported in Version 2.1 are: regression models, cluster models, trees, neural networks, Bayesian models, association rules and sequence models.

Mining Functions
  Different models, such as neural networks and logistic regression, can be used for different purposes: some instances predict numeric values, while others are used for classification. PMML Version 2.1 therefore defines five mining functions: association rules, sequences, classification, regression and clustering.

Since PMML is an XML-based standard, the specification comes in the form of an XML Document Type Definition (DTD). A PMML document can contain more than one model. If the application system provides a means of selecting models by name and the PMML consumer specifies a model name, then that model is used; otherwise the first model is used. Please see Appendix A1 for an example of PMML. [stdhb]

Interoperability with other standards: PMML is complementary to many other data mining standards. Its XML interchange format is supported by several other standards, such as XML for Analysis, JSR 73, and SQL/MM Part 6: Data Mining. PMML provides applications with a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications.

CWM-DM

CWM-DM stands for Common Warehouse Model for Data Mining. It was specified by members of the JDM expert group and has many elements in common with JDM. It is a new specification for data mining metadata and has recently been defined using the Common Warehouse Metamodel (CWM) specification from the Object Management Group.

Motivation: Different data warehousing solutions, including data mining solutions, should be provided transparently to applications through a unified metadata management environment. Metadata not only links individual software components provided by one software vendor, but also has the potential to open a data warehousing platform from one provider to third-party analytic tools and applications. The Common Warehouse Metamodel is a specification that describes metadata interchange among data warehousing, business intelligence, knowledge management and portal technologies. The OMG Meta-Object Facility (MOF) bridges the gap between dissimilar metamodels by providing a common basis for metamodels. If two different metamodels are both MOF-conformant, then models based on them can reside in the same repository.

Standard Description: CWM-DM consists of the conceptual areas summarized in Table 4. CWM-DM also defines tasks that associate the inputs to mining operations, such as build, test, and apply (score). [CurrPaYa]

Table 4: CWM-DM conceptual areas

Model description
  Consists of:
  - MiningModel, a representation of the mining model itself
  - MiningSettings, which drive the construction of the model
  - ApplicationInputSpecification, which specifies the set of input attributes for the model
  - MiningModelResult, which represents the result set produced by the testing or application of a generated model

Settings
  MiningSettings has four subclasses representing settings for:
  - StatisticsSettings
  - ClusteringSettings
  - SupervisedMiningSettings
  - AssociationRulesSettings
  The Settings area represents the mining settings of the data mining algorithms at the function level, including specific mining attributes.

Attributes
  Defines the data mining attributes and has MiningAttribute as its basic class.

Interoperability with other standards: CWM supports interoperability among data warehouse vendors by defining Document Type Definitions (DTDs) that standardize the XML metadata interchanged between data warehouses. The CWM standard generates the DTDs in three steps. First, a model is created using the Unified Modeling Language. Second, the UML model is used to generate a CWM interchange format called the Meta-Object Facility / XML Metadata Interchange. Third, the MOF/XML is converted automatically to DTDs.

2.3 Web Standards

With the expansion of the World Wide Web, it has become one of the largest repositories of data. Hence the data to be mined may be distributed and may need to be accessed via the web.

XMLA

Microsoft and Hyperion introduced XML for Analysis, a Simple Object Access Protocol (SOAP)-based XML API designed for standardizing data access between a web client application and an analytic data provider, such as an OLAP or data mining application. The XMLA APIs support the exchange of analytical data between clients and servers on any platform and with any language. [xmla]

Motivation: Under traditional data access techniques, such as OLE DB and ODBC, a client component that is tightly coupled to the data provider server must be installed on the client machine in order for an application to be able to access data from a data provider.
Tightly coupled client components can create dependencies on a specific hardware platform, a specific operating system, a specific interface model, a specific programming language, and a specific match between versions of client and server components. The requirement to install client components and the dependencies associated with tightly coupled architectures are unsuitable for the loosely coupled, stateless, cross-platform, and language-independent environment of the Internet. To provide reliable data access to Web applications, the Internet, mobile devices, and cross-platform desktops need a standard methodology that does not require component downloads to the client. Extensible Markup Language (XML) is generic and can be universally accessed. XML for Analysis advances the concepts of OLE DB by providing standardized universal data access to any standard data source residing over the Web without the need to deploy a client component that exposes COM interfaces. XML for Analysis is optimized for the Web by minimizing roundtrips to the server and targeting stateless client requests to maximize the scalability and robustness of a data source. [kddxml]

Standard Description: XMLA, an XML-based communication API, defines two methods, Discover and Execute, which consume and send XML for stateless data discovery and manipulation. The two methods are summarized in Table 5.

Table 5: XMLA APIs

Discover
  Used to obtain information (e.g. a list of available data sources) and metadata from Web Services. The data retrieved with the Discover method depends on the values of the parameters passed to it.
  Syntax:
    Discover (
      [in] RequestType As EnumString,
      [in] Restrictions As Restrictions,
      [in] Properties As Properties,
      [out] Resultset As Rowset)
  RequestType: Determines the type of information to be returned.
  Restrictions: Enables the user to restrict the data returned in Resultset.
  Properties: Enables the user to control some aspect of the Discover method, such as defining the connection string, specifying the return format of the result set, and specifying the locale in which the data should be formatted. The available properties and their values can be obtained by using the DISCOVER_PROPERTIES request type with the Discover method.
  Resultset: This required parameter contains the result set returned by the provider as a Rowset object.

Execute
  Used for sending action requests to the server. This includes requests involving data transfer, such as retrieving or updating data on the server.
  Syntax:
    Execute (
      [in] Command As Command,
      [in] Properties As Properties,
      [out] ResultSet As ResultSet)
  Command: Consists of a provider-specific statement to be executed. For example, this parameter contains a <Statement> tag that contains an SQL command or query.
  Properties: Each property allows the user to control some aspect of the Execute method, such as defining the connection string, specifying the return format of the result set, or specifying the locale in which the data should be formatted.
  ResultSet: This required parameter contains the result set returned by the provider as a Rowset object.
  The Discover and Execute methods enable users to determine what can be queried on a particular server and, based on this, submit commands to be executed.

An Example
  The client, having the URL of a server hosting a Web service, sends Discover and Execute calls to the server using the SOAP and HTTP protocols. The server instantiates the XMLA provider, which handles the Discover and Execute calls. The XMLA provider fetches the data, packages it into XML, and then sends the requested data as XML to the client.

See Appendix A2 for a detailed example of XMLA.

Interoperability with other standards: The XMLA specification is built upon the open Internet standards of HTTP, XML, and SOAP, and is not bound to any specific language or technology.

Semantic Web

The World Wide Web Consortium (W3C) standards for the semantic web define a general structure for knowledge using XML, RDF, and ontologies [W3C SW]. The semantic web approach develops languages for expressing information in machine-processable form. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners and is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming. This infrastructure can in principle be used to store the knowledge extracted from data using data mining systems, although at present one could argue that this is more of a goal than an achievement. As an example of the type of knowledge that can be stored in the semantic web, RDF can be used to code assertions such as "credit transactions with a dollar amount of $1 at merchants with a MCC code of 542 have a 30% likelihood of being fraudulent." [stdhb]

Data Space

Data Space is an infrastructure for creating a web of data, or data webs. General web operations involve browsing remote pages or documents, whereas the main purpose of a data space is to explore and mine remote columns of distributed data. Data webs are similar to semantic webs except that they house data instead of documents.

Motivation:

The web today contains a large amount of data. Although the amount of scientific, health care and business data is exploding, we do not have the technology today to casually explore remote data nor to mine distributed data. [stdhb] The size of individual data sets has also increased. Several issues arise in analyzing such data. Multimedia documents on the web cannot be used directly for mining and analysis. Another issue is that the current web structure does not optimally support the handling of large data sets and is best suited only for browsing hypertext documents. [rdsw] Hence there is a need for standard support for this data. The concept of a data space helps explore, analyze and mine such data.

Standard Description: The DataSpace project is supported by the National Science Foundation and has Robert Grossman as its director. DataSpace is built around standards developed by the Data Mining Group and W3C. The concept of a Data Space is based upon XML and web services, which are W3C-maintained standards. Data Space defines a protocol, DSTP (DataSpace Transfer Protocol), for distribution, enquiry and retrieval of data in a DataSpace. It also works with the real-time scoring standard PSUP (Predictive Scoring and Update Protocol). [Dsw]

The DataSpace consists of the components shown in Figure 2.

[Figure 2: DataSpace Architecture. Components: data web, DSTP, PSUP for real-time scoring, open source server, open source client, data mining engine (access remote data, view and mine data), PMML.]

DSTP is the protocol for the distribution, enquiry and retrieval of data in a DataSpace. The data could be stored in files, databases or distributed databases. It has a corresponding XML file, which contains Universal Correlation Key (UCK) tags that act as identification keys.
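Because UCKs identify records across servers, columns retrieved from different DSTP servers can be combined by matching on a shared UCK. The following is a minimal sketch in Python of such a merge, assuming the rows have already been fetched and parsed into dictionaries. The function name, the column names and the sample data are hypothetical and are not part of DSTP.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_on_uck(left: List[Row], right: List[Row], uck: str) -> List[Row]:
    """Merge rows from two sources that carry the same UCK value (inner join)."""
    # Index the right-hand rows by their UCK value for constant-time lookup.
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[uck], []).append(row)

    merged: List[Row] = []
    for row in left:
        for match in index.get(row[uck], []):
            merged.append({**row, **match})
    return merged


# Hypothetical columns served by two different servers, keyed by a made-up UCK.
server_a = [
    {"station_id": 1, "temperature": 21.5},
    {"station_id": 2, "temperature": 19.0},
]
server_b = [
    {"station_id": 2, "rainfall": 3.2},
    {"station_id": 3, "rainfall": 0.0},
]

joined = join_on_uck(server_a, server_b, "station_id")
print(joined)
# prints: [{'station_id': 2, 'temperature': 19.0, 'rainfall': 3.2}]
```

Only rows whose key appears on both servers survive, which is why a key shared across data sets is needed before distributed columns can be mined together.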

The UCK is similar to a primary key in a database. A join can be performed by merging data from different servers on the basis of UCKs. [DSTP]

The Predictive Scoring and Update Protocol (PSUP) is a protocol for event-driven, real-time scoring. Real-time applications are becoming increasingly important in business, e-business, and health care. PSUP provides the ability to use PMML applications in real-time and near-real-time applications.

For the purpose of data mining, a DSTP client is used to access the remote data. The data is retrieved from the required sites. DataSpace is designed to interoperate with proprietary and open source data mining tools. In particular, the open source statistical package R has been integrated into Version 1.1 of DataSpace and is currently being integrated into Version 2.0. DataSpace also works with predictive models in PMML, the XML markup language for statistical and data mining models.

Table 6: Summary of Data Space Standards

DSTP
  Provides direct support for attributes, keys and metadata. Also supports:
  - Attribute selection
  - Range queries
  - Sampling
  - Other functions for accessing and analyzing remote data

PSUP
  A protocol for event-driven, real-time scoring. PSUP provides the ability to use PMML in real-time applications.

2.4 Application Programming Interfaces (APIs)

Earlier, application developers wrote their own data mining algorithms for applications, or used sophisticated end-user GUIs. The GUI packages for data mining included a complete range of methods for data transformation, model building, testing and scoring. But it remained challenging to integrate data mining with application code because of the lack of proper APIs for the task. APIs were vendor-specific and hence proprietary. A product built on them thus became vendor-dependent and hence risky to market. To switch to a different vendor's solution, the entire code had to be rewritten, which made the process costly. In short, it was realized that data mining solutions must co-exist. Hence the need arose for a common standard for the APIs.

The ability to leverage data mining functionality via a standard API greatly reduces risk and potential cost. With a standard API, customers can use multiple products for solving business problems by applying the most appropriate algorithm implementation without investing resources to learn each vendor's proprietary API. Moreover, a standard API makes data mining more accessible to developers while making developer skills more transferable. Vendors can now differentiate themselves on price, performance, accuracy, and features. [JDM]

2.4.1 SQL/MM DM

SQL/MM is an ISO/IEC international standardization project. The SQL/MM suite of standards includes parts used to manage full-text data, spatial data, and still images. Part 6 of the standard addresses data mining.

Motivation: Database systems should be able to integrate data mining applications in a standard way so as to enable the end user to perform data mining with ease. Data mining has become a part of modern data management and can be seen as a sophisticated tool to extract information from, or to aggregate, the original data. SQL is widely used by database users today and provides basic operations such as aggregation. Data mining can thus be seen as a natural extension of the primitive functionality provided by SQL, which makes it natural to standardize data mining through SQL.

Standard Description: The SQL/MM Part 6: Data Mining standard provides an API for data mining applications to access data from SQL/MM-compliant relational databases. It defines structured user-defined types, including associated methods, to support data mining. It attempts to provide a standardized interface to data mining algorithms that can be layered atop any object-relational database system and even deployed as middleware when required. [Sqlm]

The table below provides a brief description of the standard: [Sqlm][Cti]

Table 7: Summary of SQL/MM DM Standard

Data Mining Techniques (four different data mining techniques are supported)
  Rule Model: Allows searching for patterns and relationships between different parts of the data.
  Clustering Model: Helps group the data into clusters.
  Regression Model: Helps predict the ranking of new data based upon the analysis of existing data.
  Classification Model: Helps predict the grouping or class of new data.

Data Mining Stages (three distinct stages through which data can be mined)
  Train: Choose the most appropriate technique, set parameters to orient the model, and train by applying reasonably sized data.
  Test: For classification and regression, test with known data and compare against the model's predictions.
  Apply: Apply the model to the business data.

Supporting Data Types
  DM_*Model: Defines the model that you want to use when mining your data.
  DM_*Settings: Stores various parameters of the data mining model, e.g. the depth of a decision tree or the maximum number of clusters.
  DM_*Result: Created by running the data mining model against real data.
  DM_*TestResult: Holds the results of testing during the training phase of the data mining models.
  DM_*Task: Stores the metadata that describe the process and control of the testing and of the actual runs.
  where * could be:
  - Clas: Classification Model
  - Rule: Rule Model
  - Clustering: Clustering Model
  - Regression: Regression Model

Java APIs

Java Specification Request 73 (JSR-73), also known as Java Data Mining (JDM), defines a pure Java API to support data mining operations. The JDM development team was led by Oracle and included other members such as Hyperion, IBM and Sun Microsystems.

Motivation: Java has become a language that is widely used by application developers. The Java 2 Platform, Enterprise Edition (J2EE) provides a standard development and deployment environment for enterprise applications. It reduces the cost and complexity of developing multi-tier enterprise services by defining a standard, platform-independent architecture for building enterprise components. JSR-73 provides a standard way to create, store, access and maintain data and metadata supporting data mining models, data scoring and data mining results serving J2EE-compliant application servers. It provides a single standard API for data mining systems that will be understood by a wide variety of client applications and components running on the J2EE platform. This specification does not preclude, however, the use of JDM services outside of the J2EE environment.

Standard Description: Defining compliance for vendor implementations requires addressing several issues. In JDM, data mining includes the functional areas of classification, regression, attribute importance, clustering and association. These are supported by supervised and unsupervised algorithms such as decision trees, neural networks, Naïve Bayes, Support Vector Machines, and K-means on structured data. A particular implementation of this specification may not necessarily support all interfaces and services provided by JDM.

JDM is based on a generalized, object-oriented, data mining conceptual model leveraging emerging data mining standards such as the Object Management Group's Common Warehouse Metamodel (CWM), ISO's SQL/MM for Data Mining, and the Data Mining Group's Predictive Model Markup Language (PMML), as appropriate. Implementation details of JDM are delegated to each vendor. A vendor may decide to implement JDM as a native API of its data mining product. Others may opt to develop a driver/adapter that mediates between a core JDM layer and multiple vendor products. The JDM specification does not prescribe a particular implementation strategy, nor does it prescribe performance or accuracy of a given capability or algorithm. To ensure J2EE compatibility and eliminate duplication of effort, JDM leverages existing specifications. In particular, JDM leverages the Java Connection Architecture to provide communication and resource management between applications and the services that implement the JDM API. JDM also reflects aspects of the Java Metadata Interface. [JDM]

Architectural Components: JDM has three logical components.
- Application Programming Interface: the end-user visible component of a JDM implementation that allows access to the services provided by the data mining engine. An application developer requires knowledge of only this library.
- Data Mining Engine: provides the infrastructure that offers a set of data mining services to the API clients.
- Metadata repository: serves to persist data mining objects. The repository can be based on the CWM framework.

Data Mining Functions: JDM specifies the following data mining functions.
- Classification: analyzes the input or build data and predicts to which class a given case belongs.
- Regression: predicts a continuous, numerical valued target attribute given a set of predictors.
- Attribute Importance: determines which attributes are most important for building a model. This helps users reduce the model build time, scoring time, etc. It is similar to feature selection.
- Clustering: finds clusters embedded in the data, where a cluster is a collection of data objects similar to one another.
- Association: discovers relationships or correlations among a set of items. It has been used in market basket analysis and analysis of customer behavior.

Data Mining Tasks: data mining revolves around a few common tasks.
- Building a Model: users define input tasks specifying the parameters model name, mining data and mining settings. JDM enables users to build models in the functional areas of classification, regression, attribute importance, clustering and association.
- Testing a Model: gives an estimate of the accuracy a model has in predicting the target. It follows model building and computes the accuracy of a model's predictions when the model is applied to a previously unseen data set. The input consists of the model and the data for testing it. Test results could be a confusion matrix, error estimates, etc. Lift is a measure of the effectiveness of a predictive model; a user may request that lift be computed.
- Applying a Model: the model is finally applied to a case, producing one or more predictions or assignments.
- Object Import and Export: JDM enables object import and export, which is useful for interchange with other DMEs, persistent storage outside the DME, and object inspection or manipulation. To enable import and export of system metadata, JDM specifies two standards for defining metadata in XML: PMML for mining models, and CWM.
- Computing statistics on data: provides the ability to compute various statistics on a given physical data set.
- Verifying task correctness.

Extension Packages:
- javax.datamining
- javax.datamining.settings
- javax.datamining.models
- javax.datamining.transformations
- javax.datamining.results

Conformance Statement: the JDM API standard is flexible and allows vendors to implement only the specific functions that they want their product to support. Packages are divided into two categories.
- Required: vendors must provide an implementation for these.
- Optional: a vendor may or may not implement these.

Table 8: Summary of Java Data Model Standards

Microsoft OLEDB-DM

In July 2001 Microsoft released the specification document [3] for the first real industrial standard for data mining, called OLE DB for Data Mining. This API is supported by Microsoft and is part of the release of Microsoft SQL Server 2000 (Analysis Server component). See Appendix A3 for an overview of OLEDB.

Motivation: An industry standard was required for data mining so that different data mining algorithms from various data mining ISVs can be easily plugged into user applications.
OLEDB-DM addressed the problem of deploying models: once the model is generated, how to store, maintain, and refresh it as data in the warehouse is updated, how to programmatically use the model to do predictions on other data sets, and how to browse models over the life cycle of an enterprise. Another motivation for introducing OLE DB DM was to enable enterprise application developers to participate in building data mining solutions. This required that the infrastructure supporting data mining solutions be aligned with the traditional database development environment and with APIs for database access.

Standard Description: OLE DB for DM is an OLE DB extension that supports data mining operations over OLE DB data providers. It distinguishes between:
- Data mining providers: software packages that provide data mining algorithms.
- Data mining consumers: applications that use data mining features.
OLE DB for DM specifies the API between data mining consumers and data mining providers. It introduces two new concepts, cases and models, into the existing semantics of OLEDB.
- CaseSets: input data is in the form of a set of cases (caseset). A case captures the traditional view of an observation by machine learning algorithms as consisting of all information known about a basic entity being analyzed for mining, as opposed to the normalized tables stored in databases. It makes use of the concept of nested tables for this.
- Data mining model (DMM): it is treated as if it were a special type of table. A caseset and additional meta-information are associated with a DMM when it is created (defined). When data (in the form of cases) is inserted into the data mining model, a mining algorithm processes it and the resulting abstraction (or DMM) is saved instead of the data itself. Once a DMM is populated, it can be used for prediction, or its content can be browsed for reporting.
The key operations to support on a data mining model are shown in Table 9. This model has the advantage of a low cost of deployment. See Appendix A4 for an example.
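The difference between normalized tables and a caseset with nested tables can be illustrated with a minimal Python sketch. It is not OLE DB for DM syntax; the table contents, the column names and the `build_caseset` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical normalized tables, as they might be stored in a database.
customers = [
    {"customer_id": 1, "age": 34, "country": "India"},
    {"customer_id": 2, "age": 27, "country": "USA"},
]
purchases = [
    {"customer_id": 1, "item": "burger", "price": 3.5},
    {"customer_id": 1, "item": "fries", "price": 1.5},
    {"customer_id": 2, "item": "salad", "price": 4.0},
]


def build_caseset(customers, purchases):
    """Fold each customer's purchase rows into a nested table on the customer row."""
    by_customer = defaultdict(list)
    for row in purchases:
        by_customer[row["customer_id"]].append(
            {"item": row["item"], "price": row["price"]}
        )
    # One case per customer: everything known about the entity in a single record.
    return [
        {**customer, "purchases": by_customer.get(customer["customer_id"], [])}
        for customer in customers
    ]


caseset = build_caseset(customers, purchases)
```

Each element of `caseset` is one case: the customer attributes plus a nested table of that customer's purchases, which is the shape a mining algorithm expects as a single observation.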
Operations on DMM:
- Define
  Description: identifying the set of attributes of data to be predicted, the attributes to be used for prediction, and the algorithm used to build the mining model.
  Syntax: CREATE statement.
- Populate
  Description: populating a mining model from training data using the algorithm specified in its definition above.
  Syntax: repeatedly via the INSERT INTO statement (used to add rows in a SQL table), and emptied (reset) via the DELETE statement.
- Predict
  Description: predicting attributes for new data using a mining model that has been populated.
  Syntax: prediction on a dataset is made by a PREDICTION JOIN between the mining model and the data set.
- Browse
  Description: browsing a mining model for reporting and visualization applications.
  Syntax: using the SELECT statement.

Table 9: DMM Operations
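The four operations in Table 9 can be mimicked with a small Python sketch. This is only an analogy for the define, populate, predict and browse life cycle, not the OLE DB for DM API: the class `ToyMiningModel`, its method names and the frequency-count "algorithm" are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Stand-in for a DMM: it keeps learned counts, never the training cases."""

    def __init__(self, input_column, predict_column):
        # Define: name the input attribute and the attribute to be predicted.
        self.input_column = input_column
        self.predict_column = predict_column
        self.counts = defaultdict(Counter)

    def insert_into(self, cases):
        # Populate: consume cases and update the stored abstraction.
        for case in cases:
            self.counts[case[self.input_column]][case[self.predict_column]] += 1

    def delete(self):
        # Reset: empty the model so it can be repopulated.
        self.counts.clear()

    def prediction_join(self, new_cases):
        # Predict: attach the most frequent outcome seen for each input value.
        results = []
        for case in new_cases:
            seen = self.counts.get(case[self.input_column])
            predicted = seen.most_common(1)[0][0] if seen else None
            results.append({**case, self.predict_column: predicted})
        return results

    def browse(self):
        # Browse: expose the model content for reporting.
        return {key: dict(counter) for key, counter in self.counts.items()}


model = ToyMiningModel("season", "favorite_item")
model.insert_into([
    {"season": "summer", "favorite_item": "ice cream"},
    {"season": "summer", "favorite_item": "ice cream"},
    {"season": "summer", "favorite_item": "soup"},
    {"season": "winter", "favorite_item": "soup"},
])
predictions = model.prediction_join([{"season": "summer"}, {"season": "spring"}])
```

Note that the model holds only the aggregated counts after population, which mirrors the statement that the resulting abstraction is saved instead of the data itself.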

Interoperability with other standards: OLE DB for DM is independent of any particular provider or software and is meant to establish a uniform API. It is not specialized to any specific mining model but is structured to cater to all well-known mining models. [MSOLE]

2.5 Grid Services

Grids are collections of computers or computer networks, connected in a way that allows for sharing of processing power and storage as well as applications and data. Grid technologies and infrastructures are hence defined as supporting the sharing and coordinated use of diverse resources in dynamic, distributed virtual organizations. [grid]

OGSA and data mining

The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid architecture based on Web services concepts and technologies. It consists of a well-defined set of basic interfaces and is characterized by extensibility, vendor neutrality, and a commitment to a community standardization process. It uses the Web Services Description Language (WSDL) to achieve self-describing, discoverable services and interoperable protocols, with extensions to support multiple coordinated interfaces and change management.

Motivation: In a distributed environment, it is important to employ mechanisms that help in communicating interoperably. A service-oriented view partitions this interoperability problem into two subproblems:
- Definition of service interfaces and identification of the protocol(s) that can be used to invoke a particular interface.
- Agreement on a standard set of such protocols.
A service-oriented view allows local/remote transparency, adaptation to local OS services, and uniform service semantics. It also simplifies the encapsulation of diverse implementations behind a common interface. This allows consistent resource access across multiple heterogeneous platforms with local or remote location transparency, and enables the mapping of multiple logical resource instances onto the same physical resource as well as the management of resources. Thus service definition is decoupled from service invocation. OGSA describes and defines a service-oriented architecture composed of a set of interfaces and their corresponding behaviors to facilitate distributed resource sharing and access in heterogeneous dynamic environments.

Data is inherently distributed, and hence the data mining task needs to be performed with this distributed environment in mind. It is also required to provide data mining as a service. Grid technology provides secure, reliable and scalable high-bandwidth access to distributed data sources across various administrative domains, which can be exploited.

Standard Description:

[Figure 3: Service oriented architecture. A Service Provider publishes to a Service Directory; a Service Requester finds a service in the directory and binds to the provider over a transport medium.]

Figure 3 shows the individual components of the service-oriented architecture (SOA). The service directory is the location where information about all available grid services is maintained. A service provider that wants to offer services publishes them by putting appropriate entries into the service directory. A service requester uses the service directory to find an appropriate service that matches its requirements.

An example of a data mining scenario using this architecture is as follows. When a service requester locates a suitable data mining service, it binds to the service provider, using binding information maintained in the service directory. The binding information contains the specification of the protocol that the service requester must use as well as the structure of the request messages and the resulting responses. The communication between the various agents occurs via an appropriate transport mechanism.

The Grid offers basic services that include resource allocation and process management, unicast and multicast communication services, security services, status monitoring, remote data access, etc. Apart from this, there is the Data Grid, which provides GridFTP (a secure, robust and efficient data transfer protocol) and a metadata information management system. Hence, the grid-provided functions do not have to be re-implemented for each new mining system, e.g. single sign-on security, the ability to execute jobs at multiple remote sites, the ability to securely move data between sites, a broker to determine the best place to execute a mining job, a job manager to control mining jobs, etc. Therefore, mining system developers can focus on the mining applications and not on the issues associated with distributed processing. However, the standards for these are yet to be developed.

Interoperability with other standards:

The standard for Grid Services is yet to emerge.

3. Developing Data Mining Application Using Data Mining Standards

In this section we describe a data mining application. We then describe its architecture using data mining standards. However, not all the architecture constructs can be standardized, as no standards are available for some of them. We point this out in more detail in Section 4 below.

3.1 Application Requirement Specification

A multinational food chain has outlets in several countries, i.e. India, USA and China. The outlets in each of these countries want information regarding:
- Combinations of food items that constitute their happy meal.
- The most preferred food items that they need to target in their advertisements in the respective country.
- Preferred seasonal food items.
- The food items, their prices and their popularity, including patterns that reveal the relationship between pricing and popularity.
The above information must be obtained solely from customer transactions, as the food chain company does not want to conduct any surveys. All the customer transactions of each outlet are recorded. Each transaction contains the customer id, the food items, their prices and the time at which the order was placed. However, each outlet could store its transactions in a different database, such as Oracle or Sybase.

As we see, this is a typical data mining application. In the next section we describe the run-time architecture of the data mining system. We also see how the application of standards makes the components of this architecture independent of each other as well as of the underlying platform or technology.

3.2 Design and Deployment

Architecture Overview: In the architecture shown in Figure 4, the outlets (data sources) are spread over multiple locations (Location A, B, C), henceforth referred to as remote data sources. Before being mined, the data has to be aggregated in a single location. For this we use a client-server architecture. Each of the remote data sources has a data server, which might connect to the respective database using any standard driver. A client is deployed in the location where the data to be mined is collected. This client contacts these servers for browsing or retrieving data. As indicated in the figure, we need a standard for data transport over the web so that this entire client-server architecture can be independently developed and deployed.

[Figure 4: Architecture of Data Mining Application. Data Servers at Locations A, B and C feed a Client and Data Warehouse at the location where the data mining task is performed; Data Mining Engines, each with a Mining Model, read the warehouse through a Driver and serve the Application, which produces the Output. The numbered points where a standard is needed are: 1) standard for data transport over the web, 2) data connectivity, 3) standard model representation, 4) standard API, 5) standard for data cleaning and transformation, 6) standard for representing decisions.]

The client stores the data in a data warehouse so that data mining operations can be performed on it. Before the data is mined, it needs to be cleaned and transformed. Some standards should be present for this purpose.

The data mining engine accesses the data in the warehouse with the help of standard data connectivity mechanisms. It produces a mining model, such as a decision tree. This model is then used to discover patterns in the data. The model produced must be represented in a standard format so as to allow interoperability across vendors as well as across different data mining engines. Hence a standard is required for model representation.
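The value of a shared model representation can be sketched in Python: one function plays the engine that exports a model as XML, and another plays a different engine that imports it and scores new cases. The element and attribute names (`ToyModel`, `Rule`, `default`) are hypothetical and deliberately much simpler than any real interchange schema such as PMML; only the idea of exchanging a model through an agreed text format is being illustrated.

```python
import xml.etree.ElementTree as ET


def export_model(rules, default):
    """Producer side: serialize a list of (attribute, value, outcome) rules to XML."""
    root = ET.Element("ToyModel", {"default": default})
    for attribute, value, outcome in rules:
        ET.SubElement(
            root, "Rule", {"attribute": attribute, "value": value, "outcome": outcome}
        )
    return ET.tostring(root, encoding="unicode")


def import_model(xml_text):
    """Consumer side: rebuild a scoring function from the XML alone."""
    root = ET.fromstring(xml_text)
    rules = [
        (rule.get("attribute"), rule.get("value"), rule.get("outcome"))
        for rule in root.findall("Rule")
    ]
    default = root.get("default")

    def score(case):
        # Return the outcome of the first matching rule, else the default.
        for attribute, value, outcome in rules:
            if case.get(attribute) == value:
                return outcome
        return default

    return score


xml_text = export_model(
    [("season", "summer", "ice cream"), ("season", "winter", "soup")],
    default="burger",
)
score = import_model(xml_text)
```

The consumer never sees the producer's internal data structures, only the XML text, so either side could be replaced by another vendor's engine as long as both agree on the format.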

The data mining engine is accessible to the end user via an application programming interface. The application requiring data mining contains the calls to this API. This set of APIs should be standardized so that the application can switch to a different vendor's solution without its code having to be rewritten. Also, once the data mining task is performed, the output produced needs to be incorporated into the existing business model. Hence the decisions or suggestions recommended by the data mining model need to be stored. For this, a standardized decision model is required that integrates these decisions with the current business model.

Standards employed in the architecture: For data transport over the web, the standard DSTP [Section 2.3.3] is employed. The mining model produced by the data mining engine is PMML [Section 2.2.1] compliant so as to enable interoperability. If not PMML, then any model that conforms to the metamodel specifications of CWM-DM [Section 2.2.2] must be used. However, the most widely used model currently is PMML. The data mining engine connects to the data warehouse using either JDBC or ODBC drivers. Here we use a JDBC driver. The application uses the data mining services with the help of the standard API JSR-73 [Section 2.4.2]. The entire system should be developed using the process standard CRISP-DM [Section 2.1.1]. If we want this data mining application to be deployed as a web service, then we can use a provider server at this end that supports XMLA's Execute and Discover APIs [Section 2.3.1]. Thus any third party can fire queries without having any software installed at its end.

Standards not yet defined: As we see, there are no current standards that can be used for data transformation. Also, there is no standard decision model that could incorporate the output of a data mining task into our engine. We discuss this further in Section 4. Scoring should also be integrated with the mining applications via published standard APIs and run-time-library scoring engines. Automation of the scoring process will reduce processing time, allow the most up-to-date data to be used, and reduce error.

4. Analysis

Earlier, data mining consisted of algorithms working on flat files with no standards. Industry interest led to the development of standards that enabled representation of these algorithms in a model and separation of online development of these models from their deployment. These


More information

Fast and Easy Delivery of Data Mining Insights to Reporting Systems

Fast and Easy Delivery of Data Mining Insights to Reporting Systems Fast and Easy Delivery of Data Mining Insights to Reporting Systems Ruben Pulido, Christoph Sieb rpulido@de.ibm.com, christoph.sieb@de.ibm.com Abstract: During the last decade data mining and predictive

More information

JOURNAL OF OBJECT TECHNOLOGY

JOURNAL OF OBJECT TECHNOLOGY JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,

More information

PMML and UIMA Based Frameworks for Deploying Analytic Applications and Services

PMML and UIMA Based Frameworks for Deploying Analytic Applications and Services PMML and UIMA Based Frameworks for Deploying Analytic Applications and Services David Ferrucci 1, Robert L. Grossman 2 and Anthony Levas 1 1. Introduction - The Challenges of Deploying Analytic Applications

More information

Service Computing: Basics Monica Scannapieco

Service Computing: Basics Monica Scannapieco Service Computing: Basics Monica Scannapieco Generalities: Defining a Service Services are self-describing, open components that support rapid, low-cost composition of distributed applications. Since services

More information

An Oracle White Paper June 2009. Integration Technologies for Primavera Solutions

An Oracle White Paper June 2009. Integration Technologies for Primavera Solutions An Oracle White Paper June 2009 Integration Technologies for Primavera Solutions Introduction... 1 The Integration Challenge... 2 Integration Methods for Primavera Solutions... 2 Integration Application

More information

SOACertifiedProfessional.Braindumps.S90-03A.v2014-06-03.by.JANET.100q. Exam Code: S90-03A. Exam Name: SOA Design & Architecture

SOACertifiedProfessional.Braindumps.S90-03A.v2014-06-03.by.JANET.100q. Exam Code: S90-03A. Exam Name: SOA Design & Architecture SOACertifiedProfessional.Braindumps.S90-03A.v2014-06-03.by.JANET.100q Number: S90-03A Passing Score: 800 Time Limit: 120 min File Version: 14.5 http://www.gratisexam.com/ Exam Code: S90-03A Exam Name:

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Principles and Foundations of Web Services: An Holistic View (Technologies, Business Drivers, Models, Architectures and Standards)

Principles and Foundations of Web Services: An Holistic View (Technologies, Business Drivers, Models, Architectures and Standards) Principles and Foundations of Web Services: An Holistic View (Technologies, Business Drivers, Models, Architectures and Standards) Michael P. Papazoglou (INFOLAB/CRISM, Tilburg University, The Netherlands)

More information

Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus

Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 4.0.3 Unit objectives

More information

What You Need to Know About Transitioning to SOA

What You Need to Know About Transitioning to SOA What You Need to Know About Transitioning to SOA written by: David A. Kelly, ebizq Analyst What You Need to Know About Transitioning to SOA Organizations are increasingly turning to service-oriented architectures

More information

PIE. Internal Structure

PIE. Internal Structure PIE Internal Structure PIE Composition PIE (Processware Integration Environment) is a set of programs for integration of heterogeneous applications. The final set depends on the purposes of a solution

More information

Service Oriented Architecture 1 COMPILED BY BJ

Service Oriented Architecture 1 COMPILED BY BJ Service Oriented Architecture 1 COMPILED BY BJ CHAPTER 9 Service Oriented architecture(soa) Defining SOA. Business value of SOA SOA characteristics. Concept of a service, Enterprise Service Bus (ESB) SOA

More information

Guideline for Implementing the Universal Data Element Framework (UDEF)

Guideline for Implementing the Universal Data Element Framework (UDEF) Guideline for Implementing the Universal Data Element Framework (UDEF) Version 1.0 November 14, 2007 Developed By: Electronic Enterprise Integration Committee Aerospace Industries Association, Inc. Important

More information

Service-Oriented Architecture: Analysis, the Keys to Success!

Service-Oriented Architecture: Analysis, the Keys to Success! Service-Oriented Architecture: Analysis, the Keys to Success! Presented by: William F. Nazzaro CTO, Inc. bill@iconatg.com www.iconatg.com Introduction Service-Oriented Architecture is hot, but we seem

More information

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities April, 2013 gaddsoftware.com Table of content 1. Introduction... 3 2. Vendor briefings questions and answers... 3 2.1.

More information

Interacting the Edutella/JXTA Peer-to-Peer Network with Web Services

Interacting the Edutella/JXTA Peer-to-Peer Network with Web Services Interacting the Edutella/JXTA Peer-to-Peer Network with Web Services Changtao Qu Learning Lab Lower Saxony University of Hannover Expo Plaza 1, D-30539, Hannover, Germany qu @learninglab.de Wolfgang Nejdl

More information

UNLOCKING XBRL CONTENT

UNLOCKING XBRL CONTENT UNLOCKING XBRL CONTENT An effective database solution for storing and accessing XBRL documents An Oracle & UBmatrix Whitepaper September 2009 Oracle Disclaimer The following is intended to outline our

More information

University Data Warehouse Design Issues: A Case Study

University Data Warehouse Design Issues: A Case Study Session 2358 University Data Warehouse Design Issues: A Case Study Melissa C. Lin Chief Information Office, University of Florida Abstract A discussion of the design and modeling issues associated with

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Gradient An EII Solution From Infosys

Gradient An EII Solution From Infosys Gradient An EII Solution From Infosys Keywords: Grid, Enterprise Integration, EII Introduction New arrays of business are emerging that require cross-functional data in near real-time. Examples of such

More information

Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute

Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute JMP provides a variety of mechanisms for interfacing to other products and getting data into JMP. The connection

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

KnowledgeSEEKER Marketing Edition

KnowledgeSEEKER Marketing Edition KnowledgeSEEKER Marketing Edition Predictive Analytics for Marketing The Easiest to Use Marketing Analytics Tool KnowledgeSEEKER Marketing Edition is a predictive analytics tool designed for marketers

More information

A Quick Introduction to SOA

A Quick Introduction to SOA Software Engineering Competence Center TUTORIAL A Quick Introduction to SOA Mahmoud Mohamed AbdAllah Senior R&D Engineer-SECC mmabdallah@itida.gov.eg Waseim Hashem Mahjoub Senior R&D Engineer-SECC Copyright

More information

CUSTOMER Presentation of SAP Predictive Analytics

CUSTOMER Presentation of SAP Predictive Analytics SAP Predictive Analytics 2.0 2015-02-09 CUSTOMER Presentation of SAP Predictive Analytics Content 1 SAP Predictive Analytics Overview....3 2 Deployment Configurations....4 3 SAP Predictive Analytics Desktop

More information

Introduction to UDDI: Important Features and Functional Concepts

Introduction to UDDI: Important Features and Functional Concepts : October 2004 Organization for the Advancement of Structured Information Standards www.oasis-open.org TABLE OF CONTENTS OVERVIEW... 4 TYPICAL APPLICATIONS OF A UDDI REGISTRY... 4 A BRIEF HISTORY OF UDDI...

More information

Pentaho Reporting Overview

Pentaho Reporting Overview Pentaho Reporting Copyright 2006 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Data Grids. Lidan Wang April 5, 2007

Data Grids. Lidan Wang April 5, 2007 Data Grids Lidan Wang April 5, 2007 Outline Data-intensive applications Challenges in data access, integration and management in Grid setting Grid services for these data-intensive application Architectural

More information

IBM WebSphere application integration software: A faster way to respond to new business-driven opportunities.

IBM WebSphere application integration software: A faster way to respond to new business-driven opportunities. Application integration solutions To support your IT objectives IBM WebSphere application integration software: A faster way to respond to new business-driven opportunities. Market conditions and business

More information

Combining Service-Oriented Architecture and Event-Driven Architecture using an Enterprise Service Bus

Combining Service-Oriented Architecture and Event-Driven Architecture using an Enterprise Service Bus Combining Service-Oriented Architecture and Event-Driven Architecture using an Enterprise Service Bus Level: Advanced Jean-Louis Maréchaux (jlmarech@ca.ibm.com), IT Architect, IBM 28 Mar 2006 Today's business

More information

Oracle Identity Analytics Architecture. An Oracle White Paper July 2010

Oracle Identity Analytics Architecture. An Oracle White Paper July 2010 Oracle Identity Analytics Architecture An Oracle White Paper July 2010 Disclaimer The following is intended to outline our general product direction. It is intended for information purposes only, and may

More information

Nagarjuna College Of

Nagarjuna College Of Nagarjuna College Of Information Technology (Bachelor in Information Management) TRIBHUVAN UNIVERSITY Project Report on World s successful data mining and data warehousing projects Submitted By: Submitted

More information

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications Gaël de Chalendar CEA LIST F-92265 Fontenay aux Roses Gael.de-Chalendar@cea.fr 1 Introduction The main data sources

More information

MicroStrategy Course Catalog

MicroStrategy Course Catalog MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Model Driven Interoperability through Semantic Annotations using SoaML and ODM

Model Driven Interoperability through Semantic Annotations using SoaML and ODM Model Driven Interoperability through Semantic Annotations using SoaML and ODM JiuCheng Xu*, ZhaoYang Bai*, Arne J.Berre*, Odd Christer Brovig** *SINTEF, Pb. 124 Blindern, NO-0314 Oslo, Norway (e-mail:

More information

Service-Oriented Architecture and its Implications for Software Life Cycle Activities

Service-Oriented Architecture and its Implications for Software Life Cycle Activities Service-Oriented Architecture and its Implications for Software Life Cycle Activities Grace A. Lewis Software Engineering Institute Integration of Software-Intensive Systems (ISIS) Initiative Agenda SOA:

More information

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2 Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on

More information

Introduction to Web services architecture

Introduction to Web services architecture Introduction to Web services architecture by K. Gottschalk S. Graham H. Kreger J. Snell This paper introduces the major components of, and standards associated with, the Web services architecture. The

More information

Oracle Warehouse Builder 10g

Oracle Warehouse Builder 10g Oracle Warehouse Builder 10g Architectural White paper February 2004 Table of contents INTRODUCTION... 3 OVERVIEW... 4 THE DESIGN COMPONENT... 4 THE RUNTIME COMPONENT... 5 THE DESIGN ARCHITECTURE... 6

More information

Course 103402 MIS. Foundations of Business Intelligence

Course 103402 MIS. Foundations of Business Intelligence Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:

More information

Enhancing A Software Testing Tool to Validate the Web Services

Enhancing A Software Testing Tool to Validate the Web Services Enhancing A Software Testing Tool to Validate the Web Services Tanuj Wala 1, Aman Kumar Sharma 2 1 Research Scholar, Department of Computer Science, Himachal Pradesh University Shimla, India 2 Associate

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Chapter 2: Cloud Basics Chapter 3: Cloud Architecture

Chapter 2: Cloud Basics Chapter 3: Cloud Architecture Chapter 2: Cloud Basics Chapter 3: Cloud Architecture Service provider s job is supplying abstraction layer Users and developers are isolated from complexity of IT technology: Virtualization Service-oriented

More information

Introduction into Web Services (WS)

Introduction into Web Services (WS) (WS) Adomas Svirskas Agenda Background and the need for WS SOAP the first Internet-ready RPC Basic Web Services Advanced Web Services Case Studies The ebxml framework How do I use/develop Web Services?

More information

Business Intelligence and Service Oriented Architectures. An Oracle White Paper May 2007

Business Intelligence and Service Oriented Architectures. An Oracle White Paper May 2007 Business Intelligence and Service Oriented Architectures An Oracle White Paper May 2007 Note: The following is intended to outline our general product direction. It is intended for information purposes

More information

JOURNAL OF OBJECT TECHNOLOGY

JOURNAL OF OBJECT TECHNOLOGY JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7 No. 7, September-October 2008 Applications At Your Service Mahesh H. Dodani, IBM,

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

More information

Business Process Management with @enterprise

Business Process Management with @enterprise Business Process Management with @enterprise March 2014 Groiss Informatics GmbH 1 Introduction Process orientation enables modern organizations to focus on the valueadding core processes and increase

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION KEY FEATURES Out-of-box integration with databases, ERPs, CRMs, B2B systems, flat files, XML data, LDAP, JDBC, ODBC Knowledge

More information

Oracle Service Bus Examples and Tutorials

Oracle Service Bus Examples and Tutorials March 2011 Contents 1 Oracle Service Bus Examples... 2 2 Introduction to the Oracle Service Bus Tutorials... 5 3 Getting Started with the Oracle Service Bus Tutorials... 12 4 Tutorial 1. Routing a Loan

More information

The Prophecy-Prototype of Prediction modeling tool

The Prophecy-Prototype of Prediction modeling tool The Prophecy-Prototype of Prediction modeling tool Ms. Ashwini Dalvi 1, Ms. Dhvni K.Shah 2, Ms. Rujul B.Desai 3, Ms. Shraddha M.Vora 4, Mr. Vaibhav G.Tailor 5 Department of Information Technology, Mumbai

More information

Introduction: Database management system

Introduction: Database management system Introduction Databases vs. files Basic concepts Brief history of databases Architectures & languages Introduction: Database management system User / Programmer Database System Application program Software

More information

Introduction to Web Services

Introduction to Web Services Department of Computer Science Imperial College London CERN School of Computing (icsc), 2005 Geneva, Switzerland 1 Fundamental Concepts Architectures & escience example 2 Distributed Computing Technologies

More information

Leveraging Service Oriented Architecture (SOA) to integrate Oracle Applications with SalesForce.com

Leveraging Service Oriented Architecture (SOA) to integrate Oracle Applications with SalesForce.com Leveraging Service Oriented Architecture (SOA) to integrate Oracle Applications with SalesForce.com Presented by: Shashi Mamidibathula, CPIM, PMP Principal Pramaan Systems shashi.mamidi@pramaan.com www.pramaan.com

More information