RS MDM 2009 Integration Guide

This document describes the RS MDMCenter integration module, along with the overall architecture and the principles of integrating with the system.

Riversand
Copyright 2001-2009 Riversand Technologies, Inc. All rights reserved.

Any technical documentation made available by Riversand Technologies, Inc is the copyright work of Riversand Technologies, Inc and owned by Riversand Technologies, Inc.

TRADEMARKS

Riversand, the Riversand logo, and Riversand MDMCenter and ProductCenter are U.S. trademarks or registered trademarks of Riversand Technologies, Inc. Other brands and product names mentioned in this guide are trademarks or registered trademarks of their respective owners and hereby acknowledged.

DISCLAIMER

NO WARRANTY. The technical documentation is being delivered to you AS IS, and Riversand Technologies, Inc makes no warranty as to its accuracy or use. Any use of the technical documentation or the information contained therein is at the risk of the user. Documentation may include technical or other inaccuracies or typographical errors. Riversand Technologies, Inc reserves the right to make changes without prior notice. No part of this publication may be copied without the express written permission of Riversand Technologies Inc., 9800 Richmond Ave, Suite #140, Houston, TX 77042, U.S.A.

TECHNICAL SUPPORT

The Technical Support group's primary role is to respond to specific questions on product features/functions, installation, and configuration. Our support provides rapid response and up-to-the-minute information. When contacting the Technical Support group, please have the following information:

- Product release/version level
- Hardware information
- Available memory, disk space, NIC information
- Problem description
  - Error messages/log files
  - Troubleshooting performed prior to contacting Riversand Technologies, Inc
  - Recent software configuration changes and/or network changes
Support from Riversand Technologies, Inc is available by telephone, fax or mail in a variety of languages.

Mail: Riversand Technologies Inc., 9800 Richmond Ave, Suite #140, Houston, TX 77042, U.S.A.
Telephone: 713.934.8899
Fax: 713.934.8845
Email: support@riversand.com

CUSTOMER SERVICE

To contact Riversand Technologies, Inc. Customer Service, please call 713.934.8899 or mail to Riversand Technologies, Inc, 9800 Richmond Ave, Suite #140, Houston, TX 77042, U.S.A. Customer Service is available to assist with the following types of issues:

- Questions regarding product licensing
- Product registration updates such as address or name changes
- General product information (features, language availability, local sales personnel)
- Latest information on product updates and upgrades
- Advice on Riversand's technical support options
- Non-technical pre-sales questions

SEND YOUR COMMENTS

Riversand welcomes your comments and suggestions on the quality and usefulness of this document. Your input is an important part of the information used for revision. If you find any errors or have any other suggestions for improvement, please indicate the chapter, section, and page number (if available). Send comments using any of the technical support options. If you would like a reply, please provide your name, address, and telephone number.

COMPANY INFORMATION

For more information on Riversand Technologies, Inc., visit www.riversand.com
Contents

MDMCenter Architecture
  Riversand MDM Framework
  MDM Services
    Metamodel Services
    Integration Services
    Data Quality and Analysis Services
    Administration Services
  UI Services
  WCF Services
MDMCenter Integration Module Overview
  Integration Channels
    Application Programming Interface (API)
    File system
    Message Queues
    File Transfer Protocol (FTP)
    User Interface
  Scope Definition
  Mapping and Transformation Definition
  Change Management Process
    Change Management for Aggregation
    Change Management for Syndication
  Orchestration
    Schedulers
    Listeners
MDMCenter Architecture

Riversand's flagship product, MDMCenter, is a state-of-the-art application that helps enterprises create and manage their overall Master Data Management strategy. MDMCenter is based entirely on Service Oriented Architecture principles and is an extremely modular system built on a robust, flexible, and extensible data model. A comprehensive set of MDM services is built on this data model and provides all the functionality required to manage master data. These MDM services are autonomous services that accept one or more requests and return one or more responses through a set of published, well-defined interfaces. They are loosely coupled, independent modules that communicate with each other and with external systems using Windows Communication Foundation (WCF), and collectively they provide the extensive functionality required for master data management.

The application follows a three-tier model with a database tier, an application layer that runs on IIS, and a web-browser-based client. In addition, the application exposes all its services through WCF so that external systems can access the functionality of the MDM system.

This section provides an overview of the technical architecture of the system and of the interactions between the different layers of the system and external systems.

Riversand MDMCenter is developed on a proprietary base MDM framework, and all functionality in the system is encapsulated as a service that is consumed either internally by the application or externally by other systems. The application is built using .NET Framework 3.5, and services are developed using WCF. WCF is the next generation of distributed application development technology; it provides a communication infrastructure based on the notion of services that have well-defined interfaces. Figure 1 below shows a schematic of the architecture of the system and its different layers.
The next section describes each of the layers in detail and elaborates on the interactions between them.

Riversand MDM Framework

The Riversand MDM framework represents the core object models of the application and provides the foundation for building all the services of the application. The framework contains the core objects of the application and defines the structure and interfaces for these objects. These objects serve as the building blocks of the application and provide a comprehensive object model to represent the various complex business entities that need to be created in the MDM system. The MDM framework is the result of extensive research and development in the MDM domain and consists of a set of powerful, flexible, and extensible objects and their base-level interactions.

MDM Services

The Riversand MDM Services layer provides all the functionality required to accomplish complex MDM business scenarios. The MDM services are built on top of the MDM framework and form a composite library of the services required to meet business needs. The MDM services are composed of four sets of services:

1. Metamodel Services
2. Integration Services
3. Data Quality and Analysis Services
4. Administration Services

Each set independently provides the services required for a key component of the MDM application, and the sets are designed to work seamlessly with each other to provide a comprehensive suite of services that can accomplish even the most complex MDM task.

Metamodel Services

These services are the building-block services that enable the creation and maintenance of the core MDM objects:

- Entities, for creation and management of entities such as Products, Vendors, and Customers.
- Hierarchies, for categorizing the entities and providing views of them.
- Attributes, for creating attributes, associating rules, and managing data.
- Relationships, for maintaining interrelationships between the entities.

Integration Services

These services provide the functionality needed to control the flow of data in the application and serve as the interface for transferring data to and from the database. These services enable:

- Creation and management of integrations with external systems.
- Initiation and orchestration of the data flow within the application and with external systems.
- Data access functionality to communicate with the database.

Data Quality and Analysis Services

These services provide the functionality to perform data quality functions and analysis services on the MDM application:

- Data Quality services, to ensure enforcement of data governance rules and data quality standards and to provide data transformation services.
- Data Analysis services, to enable reporting, analysis, and investigation of data stored in the MDM application.
- Search services, to enable searching for data within the MDM application.
Administration Services

These services provide the administrative functions required to ensure proper usage of the other services and proper segmentation of their features among users. The administrative services cover:

- Authorization services, to ensure that accurate authorization is applied at each level to determine what rights and permissions users have while using the system.
- Eventing services, to capture events as they occur within the system and enable external systems to consume these events.
- Policy management services, to ensure that appropriate business policies are enforced in the MDM system.

UI Services

These services provide the functionality to render and display the data, as well as functionality for business reporting and analysis. The UI services module consists of:

- Human workflow services, to provide functionality for creating and executing workflows that manage the process of master data management.
- Business Intelligence services, to provide reporting and analysis functions.
- Scheduling services, to provide management of the various events occurring in the application.
- Security services, to provide authentication capabilities that ensure proper access to the system.

WCF Services

These services serve as the core SOA enablers and provide the functionality to establish communication between MDMCenter and other systems. The WCF services provide the interfaces and establish the contracts, security, and transaction context for the communication. They expose either a web services interface or a native API interface that can be used to communicate with MDMCenter.

The following section describes the integration module in detail and provides an overview of the processes involved in integrating with MDMCenter.
MDMCenter Integration Module Overview

The MDMCenter Integration module provides functionality for bringing data from multiple external sources into the system (Aggregation) and for providing the data to multiple systems that require the master data (Syndication). The figure below shows a high-level schematic of the steps involved in the integration process. Each of these steps is described in detail in the following sections.

1. Identify integration channel
   - Direct: using Web Services or Native APIs (SOAP/RFC)
   - Indirect: file system, FTP, MSMQ, etc.
2. Define Scope
   - Identify the master records, attributes, categories, and relationships to be imported/exported.
3. Create/Use existing Map/Transform
   - Select source data formats (imports)
   - Identify target data in MDM
   - Map to MDM entities
   - Define transformations/masks
4. Configure Change Management
   - Select Full or Delta mode
   - Select entity-level or attribute-level delta
5. Create Orchestration
   - Scheduling: full loads, deltas

Integration Channels

All integrations of the RS MDMCenter platform with external systems occur through the supported integration channels. RS MDMCenter supports an extensive range of integration channels, providing the capability to integrate with all major ERP, CRM, EAI/ETL, DW, and other enterprise systems. The figure below shows all the integration channels supported by the RS MDMCenter.
Fig. 1: Integration Channels (diagram labels: Message Queues, Application Programming Interface (API), File Transfer Protocol (FTP), File System, User Interface, WCF Services, RS MDM Services)

Application Programming Interface (API)

APIs are predefined methods and interfaces exposed by the external systems and the RS MDMCenter to provide the ability to transfer data between the RS MDMCenter and the external system. These interfaces and methods can be invoked over HTTP(S) using a SOAP communication channel or on the local network using RFCs. The two methods of invoking APIs supported by RS MDMCenter are:

- Web services
- Native APIs using Remote Function Calls (RFC)

File system

Data can be imported into and exported from the RS MDMCenter by using physical files that are accessed through the Windows file system. Data is imported by dropping physical files into a Windows file system folder. Similarly, files can be exported from the MDMCenter into a specific file system folder. The supported files include XML, comma-separated (CSV), Excel (XLS), and other text-based delimited files.

Message Queues

Message queues are an effective means of transferring data asynchronously between enterprise systems. RS MDMCenter provides the ability to transfer data from external systems using message queues. The supported message queues include MSMQ, MQ Series, and other compatible queuing services. Message queues are supported for both inbound and outbound data transfer.

File Transfer Protocol (FTP)

The RS MDMCenter can access files using the FTP protocol. External systems can set up FTP sites to send data files to the RS MDM system. To export data, RS MDMCenter can write data to files and copy them to an FTP site.

User Interface

Users can directly import or export data from the RS MDMCenter using the user interface. Files can be uploaded directly using the RS MDMCenter web application. Similarly, files can be downloaded from the web application interface.

Scope Definition

This step defines the target data that is affected by the integrations. For imports, elements such as entity type, relationship type, attributes, category, and locale can be specified. MDMCenter limits the range of affected entity instances based on the specified scope. Specific attributes can be loaded via an inbound integration.

For exports, the target data that needs to be sent to the external system can be specified. Only data that is selected in the scope is sent to the external system. Additional business rules can be applied, for example to send data for an entity only when the price is greater than 0 or when the state of an Item is active. Each external system could potentially have its own requirements, and business rules can be configured accordingly.
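An export scope with business rules can be pictured as a filter followed by an attribute projection. The Python sketch below is illustrative only: the `Scope` class, its field names, and the sample records are assumptions made for this example and are not part of the MDMCenter API. Requiring both example rules at once is also a choice made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical export scope: entity type, attributes, and business rules."""
    entity_type: str
    attributes: list
    rules: list = field(default_factory=list)  # predicates every entity must pass

    def select(self, entities):
        """Return only in-scope entities, reduced to the in-scope attributes."""
        selected = []
        for entity in entities:
            if entity.get("entity_type") != self.entity_type:
                continue  # wrong entity type for this integration
            if not all(rule(entity) for rule in self.rules):
                continue  # fails at least one business rule
            selected.append({name: entity.get(name) for name in self.attributes})
        return selected


# The two example rules from the text: price greater than 0, and active state.
item_scope = Scope(
    entity_type="Item",
    attributes=["sku", "price"],
    rules=[
        lambda e: e.get("price", 0) > 0,
        lambda e: e.get("state") == "active",
    ],
)

records = [
    {"entity_type": "Item", "sku": "A1", "price": 9.5, "state": "active", "color": "red"},
    {"entity_type": "Item", "sku": "A2", "price": 0, "state": "active"},
    {"entity_type": "Item", "sku": "A3", "price": 4.0, "state": "retired"},
    {"entity_type": "Vendor", "sku": None, "price": 1.0, "state": "active"},
]
exported = item_scope.select(records)
```

In this sketch only item A1 passes both rules, and its `color` attribute is dropped because it is not part of the scope. A separate `Scope` instance per external system would reflect the point that each system can have its own requirements.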
Mapping and Transformation Definition

Mapping is the process of defining the correspondences between the records and fields in the source specification and the records and fields in the destination specification. Transformation is the process of changing the data based on specific business rules and transformation maps. The Integration mapping module shows a graphical representation of a map that can include simple value-copy translations, commonly referred to as links, and complex structural manipulations. Using these elements, data can be mapped between source and target containers, catalogs, locales, entities, entity types, relationships, relationship types, and attributes.

To mitigate errors while creating mapping specifications, the Integration mapping module includes several features that simplify the mapping process: required columns, automated mapping, schema validation, and error handling.

Required Columns: Source columns in a file can be made mandatory or optional. For instance, if a trading partner provides product data with five attributes (name, UPC, color, size, and shape), the attributes name and UPC may be mandatory while the other three are optional. If the name or UPC column is missing from the file, the inbound integration intentionally fails and prevents the bad data from being loaded. Conversely, because color, size, and shape are optional, if the source data lacks one or even all three of these columns, the integration still successfully loads name, UPC, and whichever optional attributes are provided.

Automated Mapping: Automated mapping expedites the process by linking inbound attributes to existing target attributes with the same name, or to target attributes via synonyms defined within the Riversand MDM application. Users can easily remove the association between source and target and break the link, if necessary.

  - For instance, an inbound file contains four columns: customer name, address, state, and zip code. The Riversand MDM Suite has existing attributes named customer name, address, state, and zip code. Because the inbound columns and the MDM attributes have the same names, customer name, address, state, and zip code automatically map to each other.
  - Alternatively, through synonyms, a file containing a field named part number automatically maps to sku as well as to part number.
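The required-column check and the name/synonym matching just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the case-insensitive comparison, and the shape of the `synonyms` dictionary are hypothetical and do not describe how the Integration mapping module is implemented.

```python
def check_required_columns(source_columns, required):
    """Return the required columns missing from the source file (case-insensitive)."""
    present = {column.strip().lower() for column in source_columns}
    return [column for column in required if column.lower() not in present]


def auto_map(source_columns, target_attributes, synonyms):
    """Link each source column to target attributes by name or by synonym.

    synonyms maps a target attribute name to alternative source names.
    Returns {source column: [target attributes]}; unmatched columns are omitted.
    """
    mapping = {}
    for column in source_columns:
        key = column.strip().lower()
        targets = [
            attribute
            for attribute in target_attributes
            if attribute.lower() == key
            or key in (name.lower() for name in synonyms.get(attribute, []))
        ]
        if targets:
            mapping[column] = targets
    return mapping


# A file with both mandatory columns passes; one without them should be rejected.
missing_ok = check_required_columns(["name", "upc", "color"], ["name", "upc"])
missing_bad = check_required_columns(["color", "size"], ["name", "upc"])

# "part number" links to "sku" through a synonym and to "part number" by name.
links = auto_map(
    ["customer name", "part number"],
    ["customer name", "sku", "part number"],
    {"sku": ["part number"]},
)
```

An inbound integration built this way would stop before loading any rows whenever the list of missing required columns is non-empty, and a user could break an unwanted link simply by removing its entry from the returned mapping.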
Schema Validation: Riversand's Integration module can enforce the target attribute's metadata properties on the inbound source attribute as part of a data validation process. Schema validation supports strongly typed data types, allowed values, and minimum and maximum lengths. If schema validation is enabled, data that does not match the metadata for the attribute is flagged as an error. For example, a source attribute that contains alphanumeric characters but is mapped to an attribute defined as an integer is flagged as an error. Likewise, if a source attribute value is not in the allowed values list, the value is flagged as an error. Attribute length can also be enforced: if an attribute holds a UPC, both the minimum and the maximum length could be defined as 12, and any source value that does not contain 12 digits would be identified as an error.

Error Handling: The integration module provides detailed error handling and reports all the errors that occur in the integration process. All errors are logged to the database and also to event logs.

Inbound integrations support mapping of the following objects:

1. Entity Type Mapping: Integrations support import and export of any combination of defined entity types. On inbound integrations, entity types can be provided via the data source or specified for a specific integration profile. Common examples of MDM entity types are Customer, Vendor, Products, and Plants.

2. Entity Mapping: Entity instances can be defined by a single unique identifier, multiple unique identifiers, a combination of single unique identifiers, or a combination of multiple unique identifiers. Unique identifiers can be attributes, a parent category, attribute instances, and internal Riversand unique identifiers.

   a. For instance, a book could be uniquely identified by any one of the following attributes: UPC, ISBN, or EAN. Suppose three suppliers provide data regarding this book, but each supplier uses its own identifier. For example, Supplier 1 provides data using only a UPC, and Supplier 2 provides data using only the ISBN. Depending on the supplier of the inbound data, the entity instance is identified using the appropriate unique identifier.

   b. Multiple unique identifiers may be required to uniquely identify an instance of an entity. Using the book example, a particular supplier may uniquely identify a book by title, publisher location, and publication date.

   c. Within the Riversand MDM Suite, the same book could also be identified by an internal Riversand unique identifier.
   On entity instances, properties such as Create or Update can also be defined. For instance, a data provider may supply only auxiliary data for a given product; in that case, the integration can be configured so that the provided data may update existing products but is never permitted to create a new product.
3. Relationship Type Mapping: Integrations support import and export of any combination of defined relationship types. On inbound integrations, relationship types can be provided via the data source or specified for a specific integration profile.

4. Attribute Mapping: Attribute values can be loaded to an assortment of attribute types. On inbound integrations, attribute permissions can be controlled at the most granular level through attribute-specific properties. Attribute values can have Add, Update, and Delete properties. Keywords are supported for deleted values. For instance, on an inbound integration, data can be provided for multiple attributes: Attribute 1 is only permitted to add, Attribute 2 is only permitted to update, and Attribute 3 is only deleted when the Delete keyword is provided. Furthermore, these properties can be mixed and matched.

   a. Simple attributes and collection attributes can be flushed and filled.

   b. Comparing existing target data to source data is supported. The comparison is enabled by default but can be disabled. When attribute values match, no update is recorded, which prevents unnecessary history.

   c. Leading zeros on decimal and integer values can be automatically trimmed during inbound integrations. For instance, source data may provide 000001 for the integer value 1. Rather than loading 000001, the leading zeros are trimmed and 1 is loaded.

5. Category Mapping: Integrations support import and export of any combination of defined categories. On inbound integrations, categories can be provided via the data source or specified for a specific integration profile.

Change Management Process

MDMCenter supports three different change management modes:

- Full Flush and Fill Mode
- Full Update Mode
- Delta Update Mode

The first two modes expect a full view of the data from the source system. The Full Flush and Fill Mode performs a full refresh of the data by flushing the existing entity attributes and replacing them with the source data.
The Full Update Mode updates the current data with the source data without replacing the existing entity attributes. Both of these modes are designed as batch processes. The Delta Update Mode is designed to consume a partial view of the data from the source system and updates only the specific list of attributes that have been mapped.

The delta processing mode supports both batch and real-time processing. In the batch design, delta processing can leverage the same technology components and services used for the full data load. The architecture enables the initial load and batch delta processing to support high-performance and scalable ETL functionality. Real-time delta processing, on the other hand, takes full advantage of the service-oriented architecture and implements a set of transactional APIs to consume and deliver data.

MDMCenter stores and manages a system's complete interaction history across all channels (Web, FTP, queue, API, and so on). Other integrating systems can populate MDMCenter with interaction history information using data and application services. By providing a central source of interaction information for the customer, organizations can ensure more consistent service delivery across their existing communication channels. This history information is used to determine the incremental changes for each of the integrating systems.

Change Management for Aggregation

Data aggregation into MDMCenter can be performed in Full mode and in Delta mode. The Full Load and Delta Load scenarios entail ETL processing of data from the source systems, which normally occurs at a regular interval (hourly, daily, weekly, or monthly). Some source systems can only produce a fully refreshed file and are incapable of producing delta updates that reflect only the changes. The next two sections describe how data from these source systems can be loaded into MDMCenter. The third section describes the scenario in which the source system is able to send only delta changes for processing.

In all three scenarios, after the data in the source and target systems is compared, the system provides an option to update the existing attribute values and create version histories even if the attribute values are the same.
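The difference between the three modes comes down to what happens to the attributes of a record that already exists in the target. The Python sketch below illustrates this under explicit assumptions: records are plain dictionaries keyed by record key, the `apply_changes` function and its mode names are hypothetical, and deletes are inferred from absence only for the two full modes (how deletes are signalled in a delta feed is not specified here, so the sketch leaves them out).

```python
import copy


def apply_changes(target, source, mode, process_deletes=False):
    """Apply source records to target in place and return a change summary.

    target, source: {record key: {attribute name: value}}
    mode: "flush_and_fill", "full_update", or "delta_update"
    """
    if mode not in ("flush_and_fill", "full_update", "delta_update"):
        raise ValueError(f"unknown mode: {mode}")
    summary = {"inserted": [], "updated": [], "deleted": []}
    for key, attributes in source.items():
        if key not in target:
            target[key] = dict(attributes)  # new entity with its attributes
            summary["inserted"].append(key)
        elif mode == "flush_and_fill":
            target[key] = dict(attributes)  # flush existing attributes, then fill
            summary["updated"].append(key)
        else:
            target[key].update(attributes)  # keep attributes the source omits
            summary["updated"].append(key)
    # Assumption: only a full view of the source can imply deletes by absence.
    if process_deletes and mode != "delta_update":
        for key in [k for k in target if k not in source]:
            del target[key]
            summary["deleted"].append(key)
    return summary


existing = {"A": {"color": "red", "size": "M"}, "B": {"color": "blue"}}
incoming = {"A": {"color": "green"}, "C": {"color": "black"}}

flushed = copy.deepcopy(existing)
flush_summary = apply_changes(flushed, incoming, "flush_and_fill", process_deletes=True)

updated = copy.deepcopy(existing)
update_summary = apply_changes(updated, incoming, "full_update")
```

With the sample data, flush and fill leaves record A with only the `color` attribute supplied by the source, whereas full update keeps A's existing `size` and changes only `color`. The sketch does not model version histories or the option to record an update when values are unchanged.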
Full Flush and Fill Mode

In this scenario, the data coming from the source system is compared with the data in the MDM system based on the record keys to determine inserts, updates, and deletes. The diagram below describes how inserts, updates, and deletes are handled in the system.
Source Data -> Comparison with Target:
- Inserts: Creation of new entities and attributes
- Updates: Flush existing attributes and replace with new attributes
- Deletes (optional): Flush entities and attributes

In the case of updates, all the attributes are fully flushed and replaced with the attributes from the source system.

Full Update Mode

In this scenario, the data from the source system is updated in the MDM system without replacing the already existing attributes. The behavior of inserts and deletes is exactly the same as in the Flush and Fill Mode.

Source Data -> Comparison with Target:
- Inserts: Creation of new entities and attributes
- Updates: Add/update attributes without flushing the existing attributes
- Deletes (optional): Flush entities and attributes

Delta Update Mode

In Delta Update Mode, the data coming from the source system is already filtered and contains only the changes in the source. The data from the source is compared with the data already existing in the target to identify the insert, update, and delete changes that have occurred. Only the changes are processed, and the data in the target system is updated.

Source (Delta) Data -> Comparison with Target:
- Inserts: Creation of new entities and attributes
- Updates: Add/update attributes
- Deletes (optional): Flush entities and attributes

Change Management for Syndication

Data syndication from MDMCenter can be performed in Full mode and in Delta mode. In both modes, only data that has been selected in the scope specification is sent to the target system. For example, one target system may be interested only in the category-specific attributes, and another system may be interested only in entity relationships. Based on the profile, the syndication server sends only the data to which the target has subscribed.

Full Syndication Mode

The Full Syndication Mode exports data based on the scope and filters defined in the contract between the source and target systems. The profile can be orchestrated to deliver data to the target system at a specific time interval or can be triggered manually as needed. Every time the profile is invoked, the full view of the data in the source system is syndicated to the target. It is the responsibility of the target system to identify the changes and update its data.

Delta Syndication Mode

The Delta Syndication Mode limits both the amount of data that is transferred to the external systems and the frequency of the data transfer. To limit the amount of data transferred, the Delta Syndication Mode exports only the changes made in the source system since the last export to the target system. To limit the frequency of data transfer, the delta mode provides the ability to specify attribute triggers that trigger the syndication.
For example, if a system is interested only in price changes, the delta for that system is triggered only when the price changes, whereas another system may be interested in all changes, so its delta is triggered by any change in the system. The system keeps a record of all the syndications between the source and the target system and calculates the differential changes that happened between two export cycles. The Delta Syndication profiles are usually orchestrated to deliver data to the target system at specific time intervals.

The system can be configured to guarantee data delivery by waiting for an acknowledgement from the target system. If the acknowledgement is not received, the source system marks the transaction as incomplete and sends the differential changes since the last successful delivery in the next syndication cycle.

Orchestration

Orchestration defines how data flows from the external system into the RS MDMCenter or vice versa. The application supports either a push or a pull of data from the specified integration channel. RS MDM services can also listen to specific channels for data transfer.

Schedulers

The integration services can be scheduled to run once or at periodic intervals. Scheduling intervals can be hourly, daily, weekly, monthly, or a specific interval specified during profile creation.

Listeners

For integration channels such as the file system and message queues, the RS MDMCenter integrations can be set up to run in a listener mode. As soon as a message is received in the message queue, it is picked up by the integration services and processed. Similarly, once a file is dropped into the designated file system folder, it is picked up by the integration services and processed.
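The attribute triggers and acknowledgement handling described under Delta Syndication Mode can be sketched as a small state tracker. This Python example is a sketch under stated assumptions, not the product's implementation: the `DeltaSyndicator` class, its method names, and the `send` callback (which returns True when the target acknowledges receipt) are all hypothetical. It also assumes that changes which do not fire a trigger are held back and delivered together with the next triggered delta.

```python
import copy


class DeltaSyndicator:
    """Hypothetical tracker of what one target system has acknowledged."""

    def __init__(self, trigger_attributes=None):
        # None means that any attribute change triggers a syndication.
        self.trigger_attributes = set(trigger_attributes) if trigger_attributes else None
        self.acknowledged = {}  # last state the target confirmed receiving

    def compute_delta(self, current):
        """Return {key: changed attributes} relative to the acknowledged state."""
        delta = {}
        for key, attributes in current.items():
            previous = self.acknowledged.get(key, {})
            changed = {
                name: value
                for name, value in attributes.items()
                if name not in previous or previous[name] != value
            }
            if changed:
                delta[key] = changed
        return delta

    def should_send(self, delta):
        """Apply the attribute triggers to decide whether this cycle sends data."""
        if not delta:
            return False
        if self.trigger_attributes is None:
            return True
        return any(self.trigger_attributes & set(changed) for changed in delta.values())

    def syndicate(self, current, send):
        """Send the delta; advance the baseline only if send() reports an ack."""
        delta = self.compute_delta(current)
        if not self.should_send(delta):
            return {}
        if send(delta):
            self.acknowledged = copy.deepcopy(current)
        return delta  # without an ack, the same changes are re-sent next cycle


syndicator = DeltaSyndicator(trigger_attributes=["price"])
state = {"sku-1": {"price": 10, "color": "red"}}
first = syndicator.syndicate(state, send=lambda delta: True)

state["sku-1"]["color"] = "blue"  # not a trigger attribute
second = syndicator.syndicate(state, send=lambda delta: True)

state["sku-1"]["price"] = 12  # trigger attribute changes
third = syndicator.syndicate(state, send=lambda delta: False)  # no acknowledgement
fourth = syndicator.syndicate(state, send=lambda delta: True)  # retried and acknowledged
fifth = syndicator.syndicate(state, send=lambda delta: True)
```

Because the baseline advances only on acknowledgement, the unacknowledged third cycle and the fourth cycle carry the same differential changes, which mirrors the guaranteed-delivery behavior described above. A scheduler or listener would call `syndicate` once per cycle for each target profile.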