DATA MANAGEMENT USER GROUP MANCHESTER 9TH MARCH 2016
AGENDA: SAS DATA MANAGEMENT USER GROUP
9th March 2016, SAS Manchester
Time            Topic                   Speaker
9-9.30am        Coffee and networking
9.30-9.35am     Introductions           All
9.35-10.15am    SAS Data Integration    Janice Newell
10.15-11.00am   SAS Data Quality        Rajeeve Narula
11.00am         Coffee Break
11.20-11.40am   SAS Data Governance     Dave Smith
11.40-12.20pm   SAS Data Federation     Dave Smith
12.20-12.30pm   Wrap up                 Sophie Ainley
12.30-2pm       Lunch and networking    All
THE DATA PROCESS: A DAY IN THE LIFE
Stages: Data Access, Data Assessment, Data Cleansing, Data Monitoring, Data Transformation, Data Governance
Tools shown: SAS DI, DM Studio, Fed Server, Lineage, BDN
DATA INTEGRATION
DATA INTEGRATION
Transactional Systems Reshape Adjust Time Dim Decision Making External Data Feeds Standardise Clean Regulatory Data Reporting Requirements Spreadsheets Analyse
How did you get there?
Copyright 2013, SAS Institute Inc. All rights reserved.
SAS DATA INTEGRATION SERVER
- Generates code through GUI
  - SAS code
  - Database code
  - Library of transformations and functions
- Manages metadata
  - Records steps taken to build output data
  - Able to trace any table or column forwards and backwards through the process
- Much more efficient than hand coding
  - Usually 50% faster through clarity, reusability, coding speed
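The idea of tracing a table or column forwards and backwards through a process can be sketched as a walk over a directed graph of "source feeds target" steps. This is a minimal Python illustration only, not how SAS Data Integration Server stores or queries its metadata; the class name `LineageGraph` and the column names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical store of 'source feeds target' steps between data items."""

    def __init__(self):
        self._downstream = defaultdict(set)  # item -> items it feeds
        self._upstream = defaultdict(set)    # item -> items it is built from

    def add_step(self, source, target):
        """Record that `target` is derived from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first walk returning every item reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_forwards(self, item):
        """Everything that would be affected by a change to `item`."""
        return self._walk(item, self._downstream)

    def trace_backwards(self, item):
        """Everything that `item` was built from."""
        return self._walk(item, self._upstream)


# Illustrative (made-up) column-level steps.
graph = LineageGraph()
graph.add_step("crm.customer.dob", "staging.customer.dob")
graph.add_step("staging.customer.dob", "mart.customer.age_band")
graph.add_step("crm.customer.postcode", "mart.customer.region")

impacted = graph.trace_forwards("crm.customer.dob")
origins = graph.trace_backwards("mart.customer.age_band")
```

Tracing forwards answers the impact-analysis question (what breaks if this column changes), while tracing backwards answers "how did you get there?".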
ANALYTICAL PROCESS DATA REQUIREMENT Source Data Analysis Ready Data Dashboards Deployed Models Stakeholders Productionised Process Analysts Source Data Analysis Ready Data Visualisation Modelling Data Managers Granular Security Business Repeatability Assured Quality IT Support Governance and Clarity
ANALYTICAL DATA INTEGRATION
- Preparing data for analysis
  - Can pass data mining metadata to Enterprise Miner (Target etc.)
- Embedding analytical procedures
  - Summarisation, esp. medians (on all data)
  - Time series preparation
  - Multi-row data operations
  - Creating correlation indexes
- Scoring data
  - Rapid model deployment
  - Including managing in-database scoring
  - Model monitoring data creation
DI DEMO
DEMO DAY IN THE LIFE OF DI ANALYST SAS data External files Join tables Create new fields Map data Check errors Control order Tables Fields Impact analysis Define data Test job
ANALYTICAL DATA MANAGEMENT DETAIL
What SAS DMA provides, with the corresponding benefit listed beneath each item:
- Development framework to create SAS job flows, including a documentation framework
  Clear evidence of process and development ownership
- Inbuilt versioning framework
  Historical traceability of DI changes
- Inbuilt custom transformation framework to provide re-use of complex processing
  Simplify DI flows and speed up development
- Metadata impact analysis and search facility
  Importance/usage of data items within a SAS data flow
- Deployment of SAS data flows to a scheduling tool
  Integration with production processes
- SAS analytical modelling code integration (uses Enterprise Miner)
  Support predictive model factory concept
- Data quality integration framework to bring in DQ processing (uses DMA)
  Ensure trust in results, build defensive process controls
- Data governance framework to share lineage through a browser (DMA 9.4M2)
  Build business-driven definitions of data items
DATA QUALITY
THE DATA PROCESS: A DAY IN THE LIFE
Stages (each in DM Studio): Data Access, Data Assessment, Data Cleansing, Data Monitoring
Activities: Data Connection, Profiling, Standardise, Business Rules, Data Job, Dashboard
DATA GOVERNANCE
WHY GOVERN DATA? Regulation Risk Efficiency Opportunity Financial organisation
CHALLENGE: MAP EVERYTHING
Customers are under increasing pressure to link data in disparate systems at a logical level, that is, to show how metadata is connected.
At the same time, with the advent of Big Data systems and the concept of the Data Lake, it is ever more important, from a practical, user-driven point of view, to have a system that tells data users where the data resides.
Business Term? Data Item? Where in the Lake is my data?
ETL Data Quality Database Physical Data Model Logical Data Model Database Database Mainframe Data via COBOL Analytics and Reporting
TYPICAL REQUIREMENTS
Business Glossary
- Search / Discover data & metadata
- Business terms & technical data attributes
- Ownership, Structure, Usage Context
- Secured, governed, and workflow enabled
Data Monitoring
- Users specify data quality controls/checks
- Proactively monitor data
- Validate data
- Enforce policy
- Alerts
Metadata Lineage
- Automated collection of metadata
- Services to manage the metadata
- Maintain relationships
- Provide context
- Metadata analysis
Consensus, Collaboration, Transparency
METADATA LINEAGE SAS RELATIONSHIP SERVICE AND LINEAGE VIEWER
SAS RELATIONSHIP (LINEAGE) REPOSITORY One repository to store metadata from multiple environments. SAS Relationship Repository
A ROBUST SET OF SERVICES TO MANAGE THE REPOSITORY Automated metadata collection Easy to access Many ways to analyze Metabridge Loader Lineage Viewer Relationship Loader REST Services Relationship Repository Relationship Reporter REST Services
RESULT: DATASTAGE ETL JOB
CLEAR GOVERNANCE AND OWNERSHIP SAS BUSINESS DATA NETWORK
DATA GOVERNANCE PEOPLE, PROCESS, TECHNOLOGY Business User Business Term Technical User Rule Business Data Network Data Stewards Alerts
DEMO
SAS FEDERATION SERVER
DATA FEDERATION WHAT IS IT? Federation Server Source 1 Web Administration Logging Source 2 Federated Views Row and Column Access Control Applications Source 3 Caching views Scheduling
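As a rough illustration of a federated view with row and column access control, the sketch below reads from several in-memory sources and exposes only permitted rows and columns. It is plain Python, not the Federation Server API; the function name `federated_view`, the `_source` field, and the sample data are all hypothetical.

```python
def federated_view(sources, allowed_columns, row_filter):
    """Yield rows from every source, keeping only permitted rows and columns.

    sources         -- dict of source name -> list of row dicts (assumed shape)
    allowed_columns -- column names the caller is allowed to see
    row_filter      -- predicate deciding which rows the caller may see
    """
    for source_name, rows in sources.items():
        for row in rows:
            if row_filter(row):
                visible = {col: row[col] for col in allowed_columns if col in row}
                visible["_source"] = source_name  # record where the row came from
                yield visible


# Made-up data standing in for Source 1 and Source 2 on the slide.
sources = {
    "source_1": [{"id": 1, "region": "North", "salary": 30000}],
    "source_2": [
        {"id": 2, "region": "South", "salary": 45000},
        {"id": 3, "region": "North", "salary": 52000},
    ],
}

# A user who may see only North rows and may not see the salary column.
north_rows = list(
    federated_view(sources, ["id", "region"], lambda r: r["region"] == "North")
)
```

The consuming application sees one view and never touches the underlying sources directly, which is what makes a single point of access control, logging, and caching possible.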
DATA FEDERATION ENABLING COLLABORATION Collaboration Environment Secure data filter Organisation 1 Organisation 2 Organisation 3
DATA FEDERATION MASKING SENSITIVE INFORMATION Data Lake Data Masking Analysis Environment
DATA FEDERATION AUDITING ANALYTICAL USAGE EDW Logging Analysis Environment Notifiable Queries Workflow
SAS FEDERATION SERVER: WHAT'S NEW IN 4.2?
WHAT'S NEW? SUMMARY
- SAS Metadata Server and Web Infrastructure Platform (WIP) integration
- SAS Metadata Server replaces DataFlux Authentication Server for authentication and persistence of users, groups, logins (for example, personal, group, and shared) and domains
WHAT'S NEW? SUMMARY
- Read/Write access to Hadoop (HIVE) using the new SAS Federation Server Driver for Apache Hive
- Access to SAS data sets secured with metadata-bound libraries
- Access to shared data sources across multiple SAS Federation Servers using a new Federation Server Driver
- Enhanced data masking and encryption support
WHAT'S NEW? SUMMARY
- Embedded data quality and cleansing functions in data views
- Support for SAS DS2
- Cache enhancements that include in-memory data cache
- A new migration guide is available for SAS Federation Server 4.2.
- Proc ASExport
USES SAS METADATA SERVER
- This refresh icon can be used to show newly created Authentication Domains
- SAS Metadata Server replaces Authentication Server for authentication and other permission-based functions
- SAS Metadata Server provides access for user and group objects and other permission-based functions such as shared logins and trusted users.
ENHANCED DATA MASKING
- Enhanced data masking and encryption support
- New data masking features include TRANC, which transliterates characters from the input string to characters in the output string.
  - Transliterate: to change (letters, words, etc.) into corresponding characters of another alphabet or language
- A series of random data masking rules is also available.
The current set of available Data Masking Functions
ENHANCED DATA MASKING
The current set of available Data Masking Functions
- TRANC, which transliterates characters
- RANDOM rules are also available
Example of masking a numeric column
Example of masking a character column
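To make the two masking ideas concrete, here is a small Python sketch of character transliteration and of random masking of a numeric value. It only illustrates the concepts; it does not reproduce the Federation Server masking functions or their signatures, and the helper names `transliterate` and `random_mask_number` and the `spread` parameter are assumptions made for the example.

```python
import random


def transliterate(value, from_chars, to_chars):
    """Replace each character in from_chars with the one at the same position in to_chars."""
    if len(from_chars) != len(to_chars):
        raise ValueError("from_chars and to_chars must be the same length")
    return value.translate(str.maketrans(from_chars, to_chars))


def random_mask_number(value, rng, spread=0.1):
    """Return value shifted by a random factor within +/- spread (assumed rule)."""
    return round(value * (1 + rng.uniform(-spread, spread)), 2)


# Masking a character column value: every letter maps to a substitute letter.
masked_name = transliterate("SMITH", "SMITH", "QZXWK")

# Masking digits inside a mixed string, leaving other characters untouched.
masked_code = transliterate("abc-123", "0123456789", "5678901234")

# Masking a numeric column value; a seeded generator keeps the example repeatable.
rng = random.Random(42)
masked_salary = random_mask_number(30000.0, rng)
```

Transliteration keeps the shape of the data (length, separators), which is often why it is preferred for values that downstream code still needs to parse.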
CACHE ENHANCEMENTS
- Cache enhancements that include cache refresh for data held in memory
- We can now cache queries to the MDS (Memory Data Store) = FAST PERFORMANCE
- Federation Server can now refresh cached data, including MDS, after a server restart. In previous releases, cached data that was held in memory was deleted if the server was restarted or shut down.
FED. SERVER 4.2: CACHE ENHANCEMENTS
- After a Fed. Server restart, the views are re-run in the background
- We can now cache queries to the MDS (Memory Data Store) = FAST PERFORMANCE
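The restart behaviour can be pictured as keeping the view definitions somewhere durable and rebuilding the in-memory results from them. The Python sketch below is an illustration under that assumption, not a description of how Federation Server implements it; the class `ViewCache` and the `run_query` callable are hypothetical, and the refresh here runs synchronously rather than in the background.

```python
class ViewCache:
    """Hypothetical in-memory cache that can rebuild itself after a restart."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that executes a view's query
        self._definitions = {}       # stands in for persisted view definitions
        self._memory = {}            # in-memory results, lost on restart

    def cache_view(self, name, query):
        """Remember the view's query and hold its result in memory."""
        self._definitions[name] = query
        self._memory[name] = self._run_query(query)

    def read(self, name):
        """Serve the cached result without re-running the query."""
        return self._memory[name]

    def restart(self):
        """Simulate a restart: memory is wiped, then every cached view is re-run."""
        self._memory = {}
        for name, query in self._definitions.items():
            self._memory[name] = self._run_query(query)


# A stand-in query runner that records each execution.
executed = []


def fake_run_query(query):
    executed.append(query)
    return f"rows for {query}"


cache = ViewCache(fake_run_query)
cache.cache_view("sales_view", "SELECT * FROM sales")
cache.restart()
result_after_restart = cache.read("sales_view")
```

In the earlier behaviour described on the slide, the equivalent of `restart` would simply empty the cache, and the first reader after the restart would find nothing there.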
FED. SERVER 4.2: EMBEDDED DATA QUALITY
Data Quality (where the DQ functions live)
- Parsing: Mr. Roy G Biv Jr
- Extraction: Blue mens long-sleeved buttondown collar denim shirt
- Pattern Analysis: 999-999-9999
- Identification Analysis: John Smith = Name / SAS = Organization
- Gender Analysis: Jane Smith = F / Sam Adams = M
- Standardization, Casing: 919.6778000 = (919) 677-8000
- Matching: John Smith / J. Smith / Mr. Jon Smith
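Two of the data quality operations named on this slide, pattern analysis and standardisation, are easy to sketch in a few lines of Python. This is only an illustration of the idea, not the SAS Quality Knowledge Base logic: the function names are hypothetical, and mapping letters to `A` is an assumed convention (the slide shows only digits mapped to `9`).

```python
import re


def pattern_of(value):
    """Describe the shape of a value: digits become 9, letters become A (assumed)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def standardise_us_phone(value):
    """Rewrite a 10-digit phone number as (999) 999-9999; leave anything else alone."""
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        return value
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


phone_pattern = pattern_of("919-677-8000")
standard_phone = standardise_us_phone("919.6778000")
```

Profiling a column by counting distinct patterns is a quick way to spot values that do not fit the expected shape before any standardisation rule is applied.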
EMBEDDED DATA QUALITY
- Embedded data quality and cleansing functions in data views
- Implemented using SAS Quality Knowledge Base (QKB) with FedSQL and DS2.
- The data quality methods use data quality rules from the SAS QKB in order to cleanse data.
The standardized primary_state_code
FED. SERVER 4.2 EMBEDDED DATA QUALITY
DATA STEP 2 (DS2) LANGUAGE
- SAS Federation Server now supports the DATA Step 2 (DS2) language.
- DS2 includes additional data types, ANSI SQL types, programming structure elements, and user-defined methods and packages.
- To invoke DS2, you must configure a DSN that uses the DS2 dialect.
- Processing gets automatically pushed down:
  - if Code Accelerator is present in the corresponding data platform
  - if the DS2 code conforms to a pushable format (e.g. threads defined, etc.)
FED. SERVER 4.2: DATA STEP 2 (DS2) LANGUAGE
- Actually, our DQ functions are DS2 methods invoked from SQL
- DS2 equivalent
- Customers can write any DS2 code with if/then/else logic, iterating over column data and producing programmatic results
- This integrates nicely with SQL and is a very useful way to use DS2
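The pattern of procedural methods invoked from SQL can be shown by analogy in Python, using the standard library `sqlite3` module to register a user-defined function and call it in a query. This is not DS2 or FedSQL, and it says nothing about their syntax; the function `risk_band`, its thresholds, and the `accounts` table are invented for the illustration.

```python
import sqlite3


def risk_band(balance):
    """Procedural if/then/else logic applied to one column value."""
    if balance is None:
        return "UNKNOWN"
    elif balance < 0:
        return "OVERDRAWN"
    elif balance < 1000:
        return "LOW"
    else:
        return "HIGH"


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
conn.create_function("risk_band", 1, risk_band)

conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, -50.0), (2, 250.0), (3, 5000.0), (4, None)],
)

# The SQL stays declarative; the row-by-row logic lives in the method.
rows = conn.execute(
    "SELECT id, risk_band(balance) FROM accounts ORDER BY id"
).fetchall()
conn.close()
```

The appeal of the approach is the division of labour: set-based selection and joins stay in SQL, while conditional or iterative logic sits in a method that SQL calls per value.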
READ/WRITE ACCESS TO HADOOP (HIVE)
- Read/Write access to Hadoop (HIVE) using the SAS Federation Server Driver for Apache Hive
- The Driver for Hive:
  - uses FedSQL and also provides limited support for HiveQL
  - supports multiple versions of Hadoop
  - can be used with Kerberos
  - does not support Write operations such as insert, update, and delete
FED. SERVER 4.2: READ/WRITE ACCESS TO HADOOP (HIVE)
- Access Hadoop using SAS Studio to Federation Server
- Create a table in Hadoop
- The configuration of the Hadoop Data Service using the native Apache HIVE driver
QUESTIONS?
CUSTOMER LOYALTY UK USER GROUPS To register: www.sas.com/uk/usergroups
USEFUL INFORMATION
THANK YOU FOR YOUR TIME