Implementing Data Governance at Grifols: Best Practices and Lessons Learned
Praneeth Padmanabhuni, Grifols Inc.
Richard Hauser, Decision First Technologies
SESSION CODE: 0204
LEARNING POINTS
- Discuss how SAP Information Steward can assist in establishing a Data Governance program
- Enable power users in the business to own data processing and be responsible for data quality
- Remove manual steps to automate data processing as much as possible
- Extend the out-of-the-box visualizations available in Information Steward scorecards by utilizing repository metadata
- Involve data stewards directly in de-duplication efforts via Match Review tasks in Information Steward
Who Is Grifols?
- International healthcare company based in Barcelona, Spain, with offices in Raleigh, NC and Los Angeles
- Develops and distributes life-saving protein therapies derived from human plasma
- Has experienced rapid growth over the past few years as a result of mergers and acquisitions
Challenges as a Growing Company
- 70+ files to collect data from on a monthly basis
- 70+ varying degrees of data quality!
- Only want to count sales at the closest point to an actual consumer
- Data warehouse had previously been outsourced, but volumes had reached a point where insourcing became a more attractive option
- Data cleansing was being performed manually via Excel files, but using a tool to process large volumes became a necessity
Decision First Technologies
Who we are
- Atlanta-based SAP BusinessObjects specialists
- Partnered with SAP
- 7x BusinessObjects Partner of the Year
- SAP BusinessObjects, SAP EIM, and SAP HANA experts
What we do
- Strategize and implement Data Governance solutions
- BI Nirvana: 90-day Business Intelligence on HANA
- Full-lifecycle data warehouse implementations
- Data visualizations and standard reporting
Data Governance Defined
- Core business process that ensures data is treated as a corporate asset and is formally managed throughout the enterprise
- Marriage of the following programs:
  - Data Quality
  - Information Management policies
  - Security
  - Business process management
  - Risk management
Information Steward
- Information Steward was chosen as the tool to help implement initial DG policies
- Integrates nicely with Data Services (DS), which was already in use
- Gives visibility to data quality issues
- Easy for business users to pick up and run with
- Not a full-blown master data solution; more of an MDM-lite
Challenges at Time of Enlisting DFT
- Cluttered ETL environment
- Many manual steps needed for weekly processes
- Data issues popping up weeks after flat files were loaded
- Users did not trust the account master data
Solutions Put Forward
- Implement best practices in the ETL environment
  - Multiple developer repositories, central repositories, and best-practice naming conventions
- Combine and automate common ETL jobs to the fullest extent possible
- Give visibility to data quality by developing an Information Steward scorecard
- Improve the customer account matching process and utilize DS cleansing transforms to build user trust in the data warehouse
ETL Coding Best Practices
- Multiple repos and landscapes
  - Previously just PRD
  - One repository per developer
  - Fully fleshed-out DEV, QA, and PRD environments to properly test
- Central repo for each environment
  - Allows for versioning and rollback in the event of unintended consequences
  - Moving objects to central forces developers to fully understand the impact they have on all objects
- Naming standards
  - Object names indicate the data being sourced from or written to, initial/delta load, and a number in sequence if applicable
  - E.g. DF_ACCOUNT_MASTER_INT_D
ETL Automation
- Combine objects into jobs, workflows, etc.
  - Went from 15 steps down to 3-4, depending on the data
- Code objects for reusability, not one-off executions
- Standardize variables across all jobs and conform to a template job format
  - Job Execution Table, Job Start Script (see the sketch below)
- Give power users authority to process data when ready by allowing them to run certain DS jobs that they are responsible for
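The deck doesn't show the Job Execution Table itself; the sketch below is one way such a control table could look. All table and column names here are hypothetical, not the actual Grifols schema.

  -- Hypothetical control table a template DS job could maintain; a job start
  -- script inserts a RUNNING row and an end script updates STATUS and END_TS.
  CREATE TABLE JOB_EXECUTION (
      JOB_RUN_ID  INTEGER      NOT NULL,  -- surrogate key for this run
      JOB_NAME    VARCHAR(64)  NOT NULL,  -- e.g. 'JOB_ACCOUNT_MASTER_D'
      LOAD_TYPE   CHAR(1)      NOT NULL,  -- 'I' = initial, 'D' = delta
      START_TS    TIMESTAMP    NOT NULL,
      END_TS      TIMESTAMP,
      STATUS      VARCHAR(10)  NOT NULL   -- 'RUNNING', 'SUCCESS', 'FAILED'
  );

  -- What a job start script might record at kickoff:
  INSERT INTO JOB_EXECUTION (JOB_RUN_ID, JOB_NAME, LOAD_TYPE, START_TS, STATUS)
  VALUES (1001, 'JOB_ACCOUNT_MASTER_D', 'D', CURRENT_TIMESTAMP, 'RUNNING');

Standardizing on one such table lets every job share the same start and end scripts, which is part of what makes consolidating 15 steps into 3-4 practical.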
DQ Visualizations
- Needed a way to assess DQ before it became an issue
- IS Data Insight was the best solution for our purposes
- The same data validation rules could be applied to all distributors
- Limit the data being analyzed to only the most recent month (one way to do this is sketched below)
- Built an event-based process chain in the CMC to seamlessly integrate this step into the normal weekly ETL jobs
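The deck doesn't say how the most-recent-month restriction was implemented; one common approach, sketched below with hypothetical table and column names, is to bind the Information Steward rules to a view that filters the staging table down to the latest load period:

  -- Hypothetical view limiting DQ profiling to the latest month's data;
  -- SALES_STAGING and LOAD_MONTH are illustrative names only.
  CREATE VIEW V_SALES_STAGING_CURRENT AS
  SELECT *
  FROM SALES_STAGING
  WHERE LOAD_MONTH = (SELECT MAX(LOAD_MONTH) FROM SALES_STAGING);

Binding the rules to the view rather than the base table means the same rule definitions keep working month over month without edits.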
Original Sales Staging Process
New Sales Staging Process with DQ
DQ Reporting Enhancements
- Extract data from the appropriate tables/views in the IS repository database every time new DQ data is available
- Historical scores are readily available from the following database views:
  - MMB_DATA_GROUP: contains project names, among many other things
  - MMB_KEY_DATA_DOMAIN: Key Data Domain descriptions
  - MMB_KEY_DATA_DOMAIN_SCORE: historical scores for every active quality object
  - MMB_DOMAIN_VALUE: quality dimension descriptions
MMB_KEY_DATA_DOMAIN_SCORE
- Contains scores for KDDs, Quality Dimensions, Rules, and Bindings, by key data domain, which is attached to a scorecard
- The column to select the score type is KEY_DATA_DOMAIN_SCORE_TYPE_CD:
  - TOTL = Key Data Domain score
  - KDDQ = Quality Dimension score
  - KDDR = Rule score
  - KDDB = Rule Binding score
Information Steward Repo Joins
- MMB_DATA_GROUP.DATA_GROUP_ID = MMB_KEY_DATA_DOMAIN.PROJECT_ID (project description)
- MMB_KEY_DATA_DOMAIN.KEY_DATA_DOMAIN_ID = MMB_KEY_DATA_DOMAIN_SCORE.KEY_DATA_DOMAIN_ID where KEY_DATA_DOMAIN_SCORE_TYPE_CD = 'TOTL' (for KDD scores)
- MMB_KEY_DATA_DOMAIN_SCORE.SCORE_ID = MMB_DOMAIN_VALUE.DOMAIN_VALUE_ID where KEY_DATA_DOMAIN_SCORE_TYPE_CD = 'KDDQ' (for Quality Dimension scores)
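Putting the joins above together, a query along these lines pulls historical Key Data Domain scores out of the repository. The view names and join columns come straight from the slides; the description and score column names are assumptions and may differ by Information Steward version.

  -- Historical KDD scores per project, using the documented repository joins.
  SELECT
      dg.NAME  AS project_name,     -- assumed description column
      kdd.NAME AS key_data_domain,  -- assumed description column
      s.SCORE  AS kdd_score         -- assumed score column
  FROM MMB_DATA_GROUP dg
  JOIN MMB_KEY_DATA_DOMAIN kdd
      ON dg.DATA_GROUP_ID = kdd.PROJECT_ID
  JOIN MMB_KEY_DATA_DOMAIN_SCORE s
      ON kdd.KEY_DATA_DOMAIN_ID = s.KEY_DATA_DOMAIN_ID
  WHERE s.KEY_DATA_DOMAIN_SCORE_TYPE_CD = 'TOTL';  -- overall KDD score

  -- For Quality Dimension scores, add the description lookup and switch the type:
  --   JOIN MMB_DOMAIN_VALUE dv ON s.SCORE_ID = dv.DOMAIN_VALUE_ID
  --   ... WHERE s.KEY_DATA_DOMAIN_SCORE_TYPE_CD = 'KDDQ'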
Automated DQ Chain
Scorecard
Scorecard Drilldown
DQ Webi Report
DQ Webi Report Drilldown
Account Master Cleanup Requirements
- Needed to prove to the business that account master data was trustworthy
- Too many overmatch and undermatch scenarios existed in the old account master
- Could not start from scratch because internal data had been matched to an external data source by a third party
- Needed the cleanup effort to have data steward input for uncertain matches
- As little impact as possible on all current processes
Account Master Cleanup, Step 1
- Identify overmatch scenarios, i.e. accounts that had been incorrectly matched together
- Run all current accounts with their children through a data quality Match transform
  - Break key is on Data Warehouse ID, so a child can only match to its parent, not to other parent accounts (illustrated in the sketch below)
- Pass all potential overmatches to a review task in Information Steward for data steward input
- Use the data steward's input to determine how to handle each record: leave it alone or create a new account master
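The actual comparison runs through the Data Services Match transform, which SQL cannot reproduce; purely to illustrate the break-key idea (records are only compared within their own Data Warehouse ID group, never across parents), here is a simplified sketch with hypothetical table and column names that flags groups worth routing to review:

  -- Illustrative only: a crude SOUNDEX heuristic standing in for the DS Match
  -- transform. Groups whose children carry dissimilar names are flagged as
  -- potential overmatches for an IS Match Review task.
  SELECT DW_ID,
         COUNT(DISTINCT SOUNDEX(ACCOUNT_NAME)) AS name_variants
  FROM ACCOUNT_MASTER
  GROUP BY DW_ID                                     -- break key: Data Warehouse ID
  HAVING COUNT(DISTINCT SOUNDEX(ACCOUNT_NAME)) > 1;  -- potential overmatch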
Account Master Overmatch Cleanup
Account Master Cleanup, Step 2
- Improve the delta matching logic that was part of the weekly sales data warehouse load
  - Should see a gradual decrease in the number of new accounts created over time (~3K per week initially)
- New child accounts must be matched first against existing account masters; only after that can they be considered a match with each other
- Account master data was frozen for one month to accomplish this task
  - A short enough window to avoid a critical impact on business decisions
Account Delta Process
Account Master Cleanup, Step 3
- Identify undermatched accounts: accounts that should be merged together but haven't been, for whatever reason
- Run all existing account master records through a DS match dataflow to determine if they should be merged into one
- If a potential match is found between two or more accounts, pass the match group along to an IS Match Review task for data steward review
- Utilize the data stewardship results to determine a winning account master and deprecate the others in the group
Account Master Undermatch Cleanup
VISION FOR THE FUTURE
- Ultimately would like to associate Salesforce.com CRM data with actual sales data coming from distributors
  - Provides backward-looking analysis of sales rep performance
- Capability to start performing some predictive analysis
  - Find more ideal customers
  - Identify prototypical customers
  - Focus on these accounts to grow the business
- Foundation is now in place to be in compliance with the Sunshine Act when it goes into effect
RETURN ON INVESTMENT THUS FAR
- Yearly savings resulting from the initial DW project: $441.5K
- Savings resulting from reduced time to process weekly records: $13,000/month, or $156,000/year
- Customer targeting and predictive analytics are next; no upper bound on revenue potential
BEST PRACTICES
- Involve the business often to showcase improvements and ask for further suggestions
  - Necessary for all DG/DQ projects
- Keep a history of IS Match Review results (a sketch of one way to archive them follows this list)
  - OK to leave results in the same table in 4.1; issues have been found in early versions of 4.2
  - Just fine to move results to another table if it gets too confusing
- Have separate Reviewer and Approver roles for Match Review tasks
  - Easy to get fatigued when going through hundreds or thousands of records
  - Also a good idea to allow a few days to pass between review and approval
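The deck doesn't specify how Match Review history was kept; a minimal sketch (hypothetical table names, standing in for whatever staging table the review task writes back to) is to copy each completed task's results into a history table before the next review cycle reuses the staging table:

  -- Hypothetical archive of completed Match Review decisions.
  INSERT INTO MATCH_REVIEW_HISTORY (MATCH_GROUP_ID, RECORD_ID, DECISION, ARCHIVED_TS)
  SELECT MATCH_GROUP_ID, RECORD_ID, DECISION, CURRENT_TIMESTAMP
  FROM MATCH_REVIEW_RESULT
  WHERE STATUS = 'COMPLETED';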
KEY LEARNING
- SAP Information Steward can assist in establishing a Data Governance program and gaining momentum within your organization
- Empower your power users to own data processing and be responsible for data quality; actively involve business users in all steps of the process
- Eliminate manual intervention to automate data processing as much as possible; this is where a large portion of the ROI can be found
Questions?
Praneeth Padmanabhuni: praneeth.padmanabhuni@grifols.com
Rich Hauser: richard.hauser@decisionfirst.com
FOLLOW US
Follow the ASUGNews team for all things SAP:
Tom Wailgum: @twailgum
Courtney Bjorlin: @cbjorlin
THANK YOU FOR PARTICIPATING
Please provide feedback on this session by completing a short survey via the event mobile application.
SESSION CODE: 0204
For ongoing education on this area of focus, visit www.asug.com