Robert Takoushian, CVS/Caremark Data Architect Session Code DM04
Speaker Bio
Twenty-five years of data architecture/modeling experience in:
- Manufacturing
- Financial Services
- Entertainment
- Medical Information
Education: MBA Finance; BA Economics
Articles published:
- The Naming Game. Database Programming & Design, March 1992
- Domain Classification: A Scalar Approach to Data Names. Research Institute of America, March 1993
- Guerilla Tactics for the Stealth DA. Database Programming & Design, April 1992
- The Dear John DA. Database Programming & Design, May 1994
- Winning the Data Race. Database Programming & Design, April 1998
- Paying Respect to the Data Architect. Intelligent Enterprise - Closed Loop, May 2001
Premise: Well-formed domain names are the foundation on which the data model and the file system are built.
Definition: A domain is a value constraint, usually implemented as a distinct set of values (discrete, range, derived/combination): all the unique values that a data element may contain. (Wikipedia)
Objectives:
- To quickly build a domain dictionary around our domains.
- To enable a dictionary-based data model.
Why? Because it's empowering.
Master of Your Domain
Session Agenda
- Capture legacy element metadata
- Distill domains from elements
- Load domains into ERwin data dictionary (DD)
- Attach naming standards / name macros to DD entries
- Explore the power of modeling with the DD
Master of Your Domain
ASSUMPTIONS
- You want to re-engineer or refactor a set of legacy data structures (RDB or flat file).
- You need to jump-start bottom-up analysis.
- You want to focus mainly on business data (vs. system control data).
- You want to add value to minimal legacy schema metadata (column name, data type / length, description).
- You want to develop a set of well-formed candidate domain names.
- You lack redundancy auditing tool$.
- You're not motivated enough to manually key hundreds of domains into the ERwin DD. The good news is, for an upfront fee, you don't have to.
"Every block of stone has a statue inside it, and it is the task of the sculptor to discover it."
Michelangelo
JOB 1: Extract/capture minimal legacy metadata from the DBMS catalog, Meta Integration Bridge for COBOL, etc.
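As a rough illustration of JOB 1, the sketch below pulls table name, column name, and declared datatype from a database catalog. It uses SQLite (via Python's standard `sqlite3` module) purely as a stand-in for whatever DBMS holds the legacy structures. The table and column names are invented for the example, and a real catalog (for instance an `INFORMATION_SCHEMA` view) would be queried differently.

```python
import sqlite3


def capture_column_metadata(conn):
    """Return (table, column, declared_type) rows read from the SQLite catalog."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, column, declared_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            rows.append((table, column, declared_type))
    return rows


# Hypothetical legacy tables, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MBR (MBR_BIRTH_DT DATE, SEX_CD CHAR(1))")
conn.execute("CREATE TABLE CLM (CLM_CMNTS VARCHAR(255), GENDER_CD CHAR(1))")

legacy_metadata = capture_column_metadata(conn)
for row in legacy_metadata:
    print(row)
```

The resulting list of tuples is the "minimal legacy metadata" that the later jobs work from.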
JOB 2: Distill raw schema metadata
Raw table/column metadata is distilled to a distinct set of column names + datatypes (+ definitions).
Perform a redundancy audit on names, attributes, and definitions. Use whatever clues you can find to resolve synonyms and homonyms.
- For element names, identify your key words:
  - Key word in context (KWIC) = element name variations using the same key words: Date of Birth vs. Birth date
  - Key word out of context (KWOC) = element name variations using different key words: SEX_CD vs. GENDER_CD
- For attributes, rely on length, format, significant digits, etc. for clues: CMNTS char(1) vs. CMNTS varchar(255)
- For definitions, get accustomed to reading between the lines. Definitions should be the most reliable source of audit information; in practice, they are often the least reliable: "A boat is a ship; a ship is a boat."
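The distillation and part of the redundancy audit can be sketched in a few lines of Python. This is only one possible approach, not the tooling used in the session. The sample rows, the stop-word list, and the function names are assumptions made for illustration. Names that appear with more than one datatype are flagged as homonym candidates, and names are compared by their set of key words to catch KWIC variations. KWOC synonyms such as SEX_CD vs. GENDER_CD share no key words, so they still need a human reviewer.

```python
import re
from collections import defaultdict

# Hypothetical raw (table, column, datatype) rows.
raw = [
    ("MBR", "BIRTH_DT", "date"),
    ("EMP", "BIRTH_DT", "date"),
    ("MBR", "CMNTS", "char(1)"),
    ("CLM", "CMNTS", "varchar(255)"),
    ("MBR", "SEX_CD", "char(1)"),
    ("EMP", "GENDER_CD", "char(1)"),
]

STOP_WORDS = {"OF", "THE"}  # assumed noise words, ignored when comparing names


def distill(rows):
    """Collapse table/column rows to a sorted, distinct set of (name, datatype)."""
    return sorted({(name.upper(), dtype.lower()) for _table, name, dtype in rows})


def homonym_candidates(pairs):
    """Names that occur with more than one datatype are possible homonyms."""
    by_name = defaultdict(set)
    for name, dtype in pairs:
        by_name[name].add(dtype)
    return {name: sorted(types) for name, types in by_name.items() if len(types) > 1}


def key_words(name):
    """Order-insensitive set of key words in an element name (KWIC comparison)."""
    words = re.split(r"[\s_]+", name.upper())
    return frozenset(w for w in words if w and w not in STOP_WORDS)


distinct = distill(raw)
print(distinct)
print(homonym_candidates(distinct))
print(key_words("Date of Birth") == key_words("Birth date"))
```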
JOB 3: Set up the WAS-IS list: a high value-add process.
A valuable cross-reference tool for data conversion / migration.
- Assign a well-formed domain name to each column name; include a class word suffix.
- Resolve homonyms: add qualifiers where necessary.
- Resolve synonyms: strip column name prefixes; rationalize to a common domain name.
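A first-cut WAS-IS list can be drafted mechanically and then corrected by hand. The sketch below is one hypothetical way to do that. The class word table, the synonym table, and the table prefixes are all invented for the example and would come from your own naming standard. Columns whose suffix is not a recognized class word map to `None` so that they stand out for manual review.

```python
# All three lookup tables are assumptions for illustration only.
CLASS_WORDS = {"CD": "Code", "DT": "Date", "NM": "Name", "AMT": "Amount"}
SYNONYMS = {"SEX": "Gender"}        # rationalize synonyms to one key word
TABLE_PREFIXES = ("MBR_", "EMP_")   # column name prefixes to strip


def to_domain_name(column_name):
    """Suggest a domain name (IS) for a legacy column name (WAS)."""
    name = column_name.upper()
    for prefix in TABLE_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    *qualifiers, suffix = name.split("_")
    class_word = CLASS_WORDS.get(suffix)
    if class_word is None:
        return None  # no class word suffix recognized: resolve manually
    words = [SYNONYMS.get(q, q.capitalize()) for q in qualifiers]
    return " ".join(words + [class_word])


columns = ["MBR_SEX_CD", "EMP_GENDER_CD", "BIRTH_DT", "CMNTS"]
was_is = {column: to_domain_name(column) for column in columns}
for was, is_ in was_is.items():
    print(f"{was:15} -> {is_}")
```

Here the two synonyms collapse to a single candidate domain, while CMNTS is left unresolved because a homonym like that needs a qualifier chosen by the analyst.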
Populate domain staging table: prepare / format entries in MS-Access for the bulk domain load utility (Excel).
- Use an Append query to copy domains from WAS-IS into the staging table; column order is positionally dependent.
- Use default parent domains only. If necessary, re-assign to a new parent after import.
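In SQL terms, an Access Append query is an `INSERT INTO ... SELECT` statement. The sketch below shows the same idea using SQLite from Python as a stand-in for MS-Access. The table names, column names, and column order are hypothetical, since the actual layout is dictated by the load utility's worksheet, and the parent domain values are assumed to match ERwin's default domains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE was_is (
        was_column TEXT, is_domain TEXT, datatype TEXT, parent_domain TEXT
    );
    -- Hypothetical staging layout; the real column order must follow the utility.
    CREATE TABLE domain_staging (
        domain_name TEXT, parent_domain TEXT, datatype TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO was_is VALUES (?, ?, ?, ?)",
    [
        ("MBR_SEX_CD", "Gender Code", "char(1)", "String"),
        ("EMP_GENDER_CD", "Gender Code", "char(1)", "String"),
        ("BIRTH_DT", "Birth Date", "date", "Datetime"),
    ],
)

# Append query: one staging row per distinct domain, columns listed in the
# exact order the target expects.
conn.execute(
    """
    INSERT INTO domain_staging (domain_name, parent_domain, datatype)
    SELECT DISTINCT is_domain, parent_domain, datatype
    FROM was_is
    ORDER BY is_domain
    """
)

staged = conn.execute(
    "SELECT domain_name, parent_domain, datatype "
    "FROM domain_staging ORDER BY domain_name"
).fetchall()
print(staged)
```

Using `SELECT DISTINCT` keeps synonyms that were rationalized to the same domain name from being staged twice.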
ERwin bulk domain load utility (Excel-based API) L-P Domains data tab
ERwin bulk domain load utility (Excel-based API) Update tab
ERwin bulk domain load utility Update Complete message
Retain the WAS-IS list A valuable cross-reference or data mapping tool for data conversion/migration.
Thank You
Questions?
Robert Takoushian
480-707-6585
Robert.Takoushian@Caremark.com
http://www.thedatarat.com
LinkedIn: http://www.linkedin.com/pub/robert-takoushian/2/a68/54
"He who doesn't lay his foundations beforehand may, by great abilities, do so afterward, although with great trouble to the architect and danger to the building."
Niccolò Machiavelli
Legal notice
Copyright 2012 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies.