Peter Aiken, Ph.D. Data Blueprint Session Code CL01
Speaker Bio The author is widely acclaimed as one of the top ten data management authorities in the world. In addition to examining the data management practices of more than 500 organizations, he has spent multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. As President of DAMA International, his expertise in the practice is unquestioned. He has been a member of the Information Systems Department at Virginia Commonwealth University's School of Business since 1993 and jointly owns, with the University, Data Blueprint(.com), an award-winning data management/IT consulting firm. September 20, 2011
Session Abstract This presentation provides guidance to organizations preparing for cloud computing initiatives. We will show that data quality initiatives are essential when reengineering data for the cloud and provide examples for structural and data quality-related considerations and benefits. Showing how data quality can be engineered provides a useful framework in which to develop an organizational approach for cloud computing. Participants will also learn about the importance of understanding the role of various data quality tools and techniques that can be used to ensure the success of your organizational cloud computing initiatives.
Famous words? Question: Why haven't organizations taken a more proactive approach to data quality? Answer: Fixing data quality problems is not easy. It is dangerous: they'll come after you. Your efforts are likely to be misunderstood. You could make things worse. Now you get to fix it. A single data quality issue can grow into a significant, unexpected investment.
High cost of poor quality government information
Examples of poor data governance Mizuho Securities example: Wanted to sell 1 share for 600,000 yen. Sold 600,000 shares for 1 yen. $347 million loss. The in-house system did not have limit checking. The Tokyo Stock Exchange system did not have limit checking and doesn't allow order cancellations. "CLUMSY typing cost a Japanese bank at least £128 million and staff their Christmas bonuses yesterday, after a trader mistakenly sold 600,000 more shares than he should have. The trader at Mizuho Securities, who has not been named, fell foul of what is known in financial circles as 'fat finger syndrome', where a dealer types incorrect details into his computer. He wanted to sell one share in a new telecoms company called J Com, for 600,000 yen (about £3,000)."
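The missing limit checking in the Mizuho example can be illustrated with a minimal sketch. It is not a description of any real trading system: the `SellOrder` type, the `limit_check` function, the 50% price band, the 1,000-share cap, and the use of the intended price as the reference price are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SellOrder:
    shares: int
    price_yen: float


def limit_check(order, reference_price_yen, max_shares, price_band=0.5):
    """Return a list of violations; an empty list means the order passes.

    All thresholds are hypothetical. A real system would take them from
    exchange rules and instrument data.
    """
    problems = []
    if order.shares <= 0:
        problems.append("share count must be positive")
    if order.shares > max_shares:
        problems.append(f"{order.shares} shares exceeds the limit of {max_shares}")
    # Accept only prices within +/- price_band of the reference price.
    low = reference_price_yen * (1 - price_band)
    high = reference_price_yen * (1 + price_band)
    if not low <= order.price_yen <= high:
        problems.append(f"price {order.price_yen} yen is outside {low:.0f}-{high:.0f} yen")
    return problems


# The order the trader meant to enter, and the one actually entered.
intended = SellOrder(shares=1, price_yen=600_000)
mistyped = SellOrder(shares=600_000, price_yen=1)

for order in (intended, mistyped):
    violations = limit_check(order, reference_price_yen=600_000, max_shares=1_000)
    print(order, "->", violations or "accepted")
```

Under these assumed thresholds the intended order passes and the mistyped order is rejected on both quantity and price, which is the kind of check neither system in the example performed.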
State of data quality: information indifference. 4% excellent data quality; 51% good data quality; 32% poor data quality; 49% actively measuring.
Insight into the cost of poor data is lacking. Only 1 in 3 companies are very confident in the quality of their own data. Only 15% of companies are very confident of the data received from other organizations. Cost of poor data calculated: 37%; not calculated: 63%.
The blind men and the elephant
It was six men of Indostan, To learning much inclined, Who went to see the Elephant (Though all of them were blind), That each by observation Might satisfy his mind.
The First approached the Elephant, And happening to fall Against his broad and sturdy side, At once began to bawl: "God bless me! but the Elephant Is very like a wall!"
The Second, feeling of the tusk, Cried, "Ho! what have we here, So very round and smooth and sharp? To me 'tis mighty clear This wonder of an Elephant Is very like a spear!"
The Third approached the animal, And happening to take The squirming trunk within his hands, Thus boldly up he spake: "I see," quoth he, "the Elephant Is very like a snake!"
The Fourth reached out an eager hand, And felt about the knee: "What most this wondrous beast is like Is mighty plain," quoth he; "'Tis clear enough the Elephant Is very like a tree!"
The Fifth, who chanced to touch the ear, Said: "E'en the blindest man Can tell what this resembles most; Deny the fact who can, This marvel of an Elephant Is very like a fan!"
The Sixth no sooner had begun About the beast to grope, Than, seizing on the swinging tail That fell within his scope, "I see," quoth he, "the Elephant Is very like a rope!"
And so these men of Indostan Disputed loud and long, Each in his own opinion Exceeding stiff and strong, Though each was partly in the right, And all were in the wrong!
(Source: John Godfrey Saxe's (1816-1887) version of the famous Indian legend)
The blind men and the elephant, cont'd
No universal conception of data quality exists; instead, many differing perspectives compete. Problem: Most organizations approach data quality problems in the same way that the blind men approached the elephant: people tend to see only the data that is in front of them. There is little cooperation across boundaries, just as the blind men were unable to share their impressions of the elephant and so recognize the entire entity. This leads to confusion, disputes, and narrow views. Solution: Data quality engineering can help achieve a more complete picture and facilitate cross-boundary communications.
Traditional quality life cycle: data acquisition activities, data storage, data usage activities. Levitin and Redman's data acquisition and usage cycles (Levitin and Redman 1993).
Extended data life cycle model with metadata sources and uses
Definition: cloud computing Cloud computing is location-independent computing, whereby shared servers provide resources, software, and data to computers and other devices on demand, as with the electricity grid. Cloud computing is a natural evolution of the widespread adoption of virtualization, service-oriented architecture and utility computing. Details are abstracted from consumers, who no longer have need for expertise in, or control over, the technology infrastructure "in the cloud" that supports them.
Cloud rendering
From the office of our federal chief information officer, Vivek Kundra
From the office of our federal chief information officer, Vivek Kundra, cont'd IT Infrastructure. Your submission should include funding for the timely execution of agency plans to consolidate data centers developed in FY 2010 (reference FY 2011 passback guidance). In coordination with the data center consolidations, agencies should evaluate the potential to adopt cloud computing solutions by analyzing computing alternatives for IT investments in FY 2012. Agencies will be expected to adopt cloud computing solutions where they represent the best value at an acceptable level of risk. Adopt Light Technologies and Shared Solutions. We are reducing our data center footprint by 40 percent by 2015 and shifting the agency default approach to IT to a cloud-first policy as part of the 2012 budget process. Consolidating more than 2,000 government data centers will save money, increase security and improve performance.
Similar opportunity Today we have with us Dr. Peter Aiken, founder and CEO of Data Blueprint. Data Blueprint helps public and private organizations improve the value of their data and the effectiveness of data management practices. Today we are going to talk about a current federal government challenge, the cloud-first imperative, and how data-centric development is a necessary prerequisite to benefiting from it. Question: The administration has recently issued a cloud-first mandate that requires a massive data consolidation, eliminating 800 data centers by 2012. What are some potential pitfalls that could impact data quality in agencies?
Similar opportunity, cont'd Answer: Well, Traci, 20 years ago I was a DISA CIM team member, consolidating more than 20,000 DoD legacy systems, and the data quality pitfalls now are very similar to the way they were then: too few in the IT and business communities understand the operational and strategic impact of data quality, and too few understand how to prevent and correct these data quality problems.
Data cloud infrastructure Gartner defines cloud computing as the set of disciplines, technologies, and business models used to deliver IT capabilities (software, platforms, hardware) as an on-demand, scalable, elastic service. 5 essential characteristics of cloud computing: it uses shared infrastructure; it provides on-demand self-service; it is elastic and scalable; it is priced by consumption; it is dynamic and virtualized.
Getting into the cloud Transform: less, cleaner, and more shareable data
Similar opportunity, cont'd Question: It has been said that, "If done properly, data center consolidation can transform every aspect of an IT organization." What can an agency do to ensure the success of such a large-scale data initiative? Answer: Organizations can ensure successful data cloud-based initiatives by focusing on three key goals: 1. Incorrect operational data must be corrected before cloud loading; 2. Architectural quality problems must also be corrected before cloud loading; and 3. Data must be re-architected to maximize effective organization-wide sharing and minimize organizational data ROT before cloud loading.
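A rough sketch of a pre-load gate in the spirit of goals 1 and 3 follows. It covers only record-level checks; architectural problems (goal 2) need modeling work that a filter like this cannot do. The function `prepare_for_cloud_load`, the field names, the cutoff date, and the sample records are all hypothetical, and ROT is read here as redundant or obsolete data, an interpretation the slides do not spell out.

```python
from datetime import date


def prepare_for_cloud_load(records, obsolete_before,
                           key_field="customer_id",
                           date_field="last_updated",
                           required_fields=("name",)):
    """Split records into (loadable, rejected) before any cloud load.

    Field names and rules are illustrative assumptions only.
    """
    must_have = (key_field, date_field, *required_fields)
    loadable, rejected, seen_keys = [], [], set()
    for record in records:
        missing = [f for f in must_have if record.get(f) in (None, "")]
        if missing:
            # Incorrect operational data: fix at the source, do not load.
            rejected.append((record, "missing fields: " + ", ".join(missing)))
        elif record[date_field] < obsolete_before:
            rejected.append((record, "obsolete"))
        elif record[key_field] in seen_keys:
            rejected.append((record, "redundant duplicate"))
        else:
            seen_keys.add(record[key_field])
            loadable.append(record)
    return loadable, rejected


sample = [
    {"customer_id": 1, "name": "Acme", "last_updated": date(2011, 3, 1)},
    {"customer_id": 1, "name": "Acme", "last_updated": date(2011, 3, 1)},
    {"customer_id": 2, "name": "", "last_updated": date(2011, 5, 9)},
    {"customer_id": 3, "name": "Old Co", "last_updated": date(2003, 1, 15)},
]

loadable, rejected = prepare_for_cloud_load(sample, obsolete_before=date(2006, 1, 1))
print(len(loadable), "record(s) ready to load")
for record, reason in rejected:
    print("rejected:", record[("customer_id")], reason)
```

Keeping the rejected records together with a reason, rather than silently dropping them, leaves a worklist for correcting the data before the load is attempted again.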
The cloud as a data quality tool
Fixing data in the cloud using a glovebox
Data.gov monthly visitor statistics
Cloud security Question: We are living in a world where IT security is one of the greatest concerns for our national security. What is the relationship between data strategy, data governance, and ultimately, the security of data? Answer: The relationship between data strategy, data governance, and ultimately, the security of data is one of dependency. Please picture an inverted three-layer pyramid where, from bottom to top, your organizational data strategy is the bottom third, governance is the middle layer, and security is the top, most comprehensive layer. Your data strategy specifies how data is engaged to help achieve organizational objectives, and you use data governance to guide ongoing data security planning and implementation. So, good data security is dependent upon good governance, and good governance is dependent on good strategy: an inverted 3-layer pyramid. Pyramid layers, bottom to top: Data Strategy, Governance, Security.
Effective cloud transformation Transformation into cloud computing cannot be done in a manner that benefits organizations unless data is re-architected formally with two goals: 1. Maximizing effective, organization-wide data sharing; and 2. Minimizing organizational data ROT. Our research indicates that the resulting data volume should be 1/5 of what it currently is. This is a significant economic motivator. All existing federal agencies have data collections that possess unique strengths and weaknesses. The strengths should be leveraged and the weaknesses must be corrected. Neither of these can be accomplished without formal data re-architecting prior to cloud loading. There are very few who work in this area for a living, but my team has achieved some remarkable successes.
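A back-of-the-envelope calculation shows why a reduction to one fifth of current volume matters when cloud services are priced by consumption. The current volume and the unit price below are made-up figures, and the claim is read as "volume falls to 1/5 of its present size"; only the 1/5 ratio comes from the slide.

```python
def monthly_storage_cost(volume_tb, price_per_tb_month):
    """Consumption-based cost: pay only for the volume actually stored."""
    return volume_tb * price_per_tb_month


current_tb = 500.0          # hypothetical current data volume
price_per_tb_month = 100.0  # hypothetical price in USD per TB per month

reduced_tb = current_tb / 5  # the slide's 1/5 figure

cost_before = monthly_storage_cost(current_tb, price_per_tb_month)
cost_after = monthly_storage_cost(reduced_tb, price_per_tb_month)
savings = cost_before - cost_after

print(f"before: ${cost_before:,.0f}/month, after: ${cost_after:,.0f}/month")
print(f"savings: ${savings:,.0f}/month ({savings / cost_before:.0%})")
```

Whatever the actual figures, under linear consumption pricing a fivefold volume reduction removes 80% of the storage bill, before counting any savings in transfer, backup, or processing.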
Data assurance stumbling blocks Many organizations do attempt some semblance of a data assurance effort. However, several recurring themes stymie these efforts, including: lack of a cross-functional, dedicated team for data assurance; not involving the business community; failure to establish a formal data stewardship organization and data quality program; lack of an enterprise metadata management program; believing that maintaining the status quo is acceptable; believing that you do not need data standards to achieve accountability; poor planning, lack of upfront understanding, and poor execution; ignoring the enterprise ramifications by creating redundant data silos.
The upside of data quality Cost savings from the removal of redundant customer, product, and materials data; savings in operational costs; cost savings and greater business efficiency from a more integrated supply chain; better strategic planning based on more accurate analysis and forecasting; increased revenue from identifying and targeting first-time customers; enhanced revenue from higher customer satisfaction and retention; all the advantages associated with better regulatory compliance, fraud prevention, and loss control.
Summary This presentation provides guidance to organizations preparing for cloud computing initiatives. We will show that data quality initiatives are essential when reengineering data for the cloud and provide examples for structural and data quality-related considerations and benefits. Showing how data quality can be engineered provides a useful framework in which to develop an organizational approach for cloud computing. Participants will also learn about the importance of understanding the role of various data quality tools and techniques that can be used to ensure the success of your organizational cloud computing initiatives.
Questions?
Thank You Contact: Name: Peter Aiken, Ph.D. Phone: 804.521.4056 Email: paiken@datablueprint.com http://twitter.com/#!/paiken http://peteraiken.net Copyright CA 2011. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. No unauthorized use, copying or distribution permitted.
Legal notice THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits, lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the possibility of such damages. Certain information in this presentation may outline CA's general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future written license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. The development, release and timing of any features or functionality described in this presentation remain at CA's sole discretion. Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available (i) for sale to new licensees of such product; and (ii) in the form of a regularly scheduled major product release. Such releases may be made available to current licensees of such product who are current subscribers to CA maintenance and support on a when-and-if-available basis.