What happens when Big Data and Master Data come together?
Jeremy Pritchard
Master Data Management
fgdd 1
What is Master Data?
"Master data is data that is shared by multiple computer systems." (The Information Difference)
"Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise, including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts." (Gartner)
"Master data is information that is key to the operation of a business: persistent, non-transactional data that defines a business entity for which there is, or should be, an agreed-upon view across the organisation." (Wikipedia)
"Master data is often one of the key assets of a company." (Microsoft)
What is Master Data Management?
"Master data management is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets." (Gartner)
"Master Data Management comprises a set of processes, governance, policies, standards and tools that consistently defines and manages the master data." (Wikipedia)
The creation of:
The Golden Record
A Single Version of the Truth
Types of data in an organisation
Unstructured: found in e-mail, white papers, magazine articles, corporate intranet portals, product specifications, marketing collateral and PDF files.
Transactional: related to sales, deliveries, invoices, trouble tickets, claims and other monetary and non-monetary interactions.
Metadata: data about other data, including report definitions, column descriptions in a database, log files, connections and configuration files.
Hierarchical: stores the relationships between other data, such as company organisational structures or product lines.
Master: the critical nouns of a business, which generally fall into the groupings of people, places and things.
(The What, Why, and How of Master Data Management, Microsoft, November 2006)
Understanding Master Data
Think of nouns and verbs: "Bob Smith buys a widget (SKU #A1234) and ships it to his home address."
The master data elements are the nouns: people, things and places.
The transactional data elements are the verbs that describe what happens to those people, places and things.
[Diagram: CRM, Marketing, ERP, WMS, Financial]
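The nouns-and-verbs distinction maps naturally onto a data model. A minimal Python sketch, assuming hypothetical class and field names (nothing here comes from a specific MDM product): the master entities are the nouns, and the purchase transaction is the verb that refers to them by key.

```python
from dataclasses import dataclass

# Master data: the nouns (people, things, places). Class and field names are hypothetical.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str

@dataclass(frozen=True)
class Product:
    sku: str
    description: str

@dataclass(frozen=True)
class Address:
    address_id: str
    description: str

# Transactional data: the verb. It records what happened and points at the
# master data by key instead of copying names, descriptions or addresses.
@dataclass(frozen=True)
class Purchase:
    customer_id: str
    sku: str
    ship_to_address_id: str

bob = Customer("C001", "Bob Smith")
widget = Product("A1234", "Widget")
home = Address("ADDR-001", "Bob Smith's home address")

# "Bob Smith buys a widget (SKU #A1234) and ships it to his home address."
purchase = Purchase(bob.customer_id, widget.sku, home.address_id)
print(purchase)  # Purchase(customer_id='C001', sku='A1234', ship_to_address_id='ADDR-001')
```

Because the transaction holds only keys, a correction made once to Bob's master record is seen by every transaction that references it.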
Deciding what Master Data should be Managed
Generally speaking, candidate master data should be assessed against the following criteria: cardinality, volatility, lifetime, value and reuse.
Volatility: on any given day,
21,994 will change their address
3,112 will change their name
1,920 will change their address
32 will change their name
46,152 will change jobs
896 directorship changes will occur
1,200 will change their telephone number
96 new businesses will start
(Better Information through Master Data Management: MDM as a Foundation for BI, Oracle, September 2011)
Master Data Management
Records for the same customer held across the CRM, Marketing, ERP, WMS and Financial systems:
Name: Bob Smith; Tel: 01323 456842; DOB: 23/10/71; Gender: M
Name: Bob Smith; Tel: 01323 456842; DOB: (blank); Gender: M
Name: Bob Smith; Tel: 01323-456842; DOB: (blank); Gender: Male
Name: B Smith; Tel: 01323 456842; DOB: 23/10/71; Gender: M
Name: B Smith; Tel: (0)1323456842; DOB: 23-Oct-71; Gender: M
Name: Bob Smith; Tel: 01283 56982; DOB: 23/10/71; Gender: (blank)
Name: Smith, Bob; Tel: (01283)56982; DOB: 23/10/1971; Gender: (blank)
Master Data Management Architectures
Consolidated: Master is Single Version of Truth; Data Quality at Master; Updates occur at Sources; Updates propagated to Master.
Coexistence: Master is Single Version of Truth; Data Quality is ongoing; Updates occur at Sources or Master; Updates propagated to other Sources.
Registry: Multiple Versions of Truth; Data Quality is ongoing; Updates occur at Sources; Keys and Metadata in Registry; Updates optionally propagated to other Sources.
Centralised: Master is Single Version of Truth; Data Quality at Master; Updates occur at Master; Updates propagated to Sources.
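Turning the seven versions of Bob Smith into a golden record involves two steps: normalise each field to one format, then choose a surviving value per field. A minimal Python sketch, assuming the seven records have already been matched to the same person (matching is not shown) and using a hypothetical "most frequent normalised value wins" survivorship rule. MDM tools typically offer other rules as well, such as most recent or most trusted source. All function names are illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# The seven source records from the slide (None = value missing in that system).
records = [
    {"name": "Bob Smith", "tel": "01323 456842", "dob": "23/10/71", "gender": "M"},
    {"name": "Bob Smith", "tel": "01323 456842", "dob": None, "gender": "M"},
    {"name": "Bob Smith", "tel": "01323-456842", "dob": None, "gender": "Male"},
    {"name": "B Smith", "tel": "01323 456842", "dob": "23/10/71", "gender": "M"},
    {"name": "B Smith", "tel": "(0)1323456842", "dob": "23-Oct-71", "gender": "M"},
    {"name": "Bob Smith", "tel": "01283 56982", "dob": "23/10/71", "gender": None},
    {"name": "Smith, Bob", "tel": "(01283)56982", "dob": "23/10/1971", "gender": None},
]

def normalise_name(name):
    # "Smith, Bob" -> "Bob Smith"; other names are left unchanged.
    if name and "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name

def normalise_tel(tel):
    # Keep digits only: "(0)1323456842" and "01323-456842" both become "01323456842".
    return re.sub(r"\D", "", tel) if tel else None

DOB_FORMATS = ("%d/%m/%y", "%d-%b-%y", "%d/%m/%Y")

def normalise_dob(dob):
    # Try each known source format; return an ISO date string or None.
    if not dob:
        return None
    for fmt in DOB_FORMATS:
        try:
            return datetime.strptime(dob, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognised format: treat as missing

GENDER_CODES = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}

def normalise_gender(gender):
    return GENDER_CODES.get(gender.strip().upper()) if gender else None

NORMALISERS = {"name": normalise_name, "tel": normalise_tel,
               "dob": normalise_dob, "gender": normalise_gender}

def golden_record(recs):
    # Hypothetical survivorship rule: the most frequent non-missing normalised value wins.
    golden = {}
    for field, normalise in NORMALISERS.items():
        values = [v for v in (normalise(r[field]) for r in recs) if v]
        golden[field] = Counter(values).most_common(1)[0][0] if values else None
    return golden

print(golden_record(records))
# {'name': 'Bob Smith', 'tel': '01323456842', 'dob': '1971-10-23', 'gender': 'M'}
```

Note that the two 01283 numbers survive only as minority values here. A production rule set would decide whether to keep them as secondary contact details rather than discard them.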
The Current Landscape of MDM Systems
45% of survey respondents have no formal MDM system. (Aberdeen Group, April 2012)
Reported Success of MDM Programs
45% of survey respondents said their projects were successful or very successful. (Information Difference, July 2012)
Key Domains to be Managed
The top two domains were customers and products. (Information Difference, July 2012)
What about you?
Do you have a Master Data Management solution running in your organisation?
Big Data
What is Big Data?
"High-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Gartner)
"'Big Data' describes data sets so large and complex they are impractical to manage with traditional software tools." (Wikipedia)
"A term applied to voluminous data objects that are variety in nature: structured, unstructured or a semi-structured, including sources internal or external to an organisation, and generated at a high degree of velocity with some level uncertainty pattern, that does not fit neatly into traditional, structured, relational data stores and requires strong sophisticated information ecosystem with high performance computing platform and analytical capabilities to capture, process, transform, discover and derive insights with some level of confidence and accuracy to provide business value within a reasonable elapsed time." (The Big Data Institute, TBDI)
What is Big Data?
63% of IT and business executives are not familiar with the phrase. (Worldwide Big Data Ecosystem, IDC, 30th July 2013)
Moore's Law
"The number of transistors on integrated circuits doubles approximately every two years."
Gordon Moore's 1965 paper noted that the number of components in integrated circuits had doubled every year from the invention of the integrated circuit in 1958 until 1965, and predicted that the trend would continue "for at least ten years".
The capabilities of many digital electronic devices are strongly linked to Moore's law: processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras.
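Moore's law describes exponential growth with a fixed doubling period: after t years with doubling period T, the count has grown by a factor of 2^(t/T). A small Python helper (the function name is our own) makes the two rates on this slide concrete.

```python
def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years` if the count doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Doubling every year from 1958 to 1965, as noted in the 1965 paper: 2**7
print(moores_law_factor(1965 - 1958, doubling_period_years=1.0))  # 128.0

# Doubling approximately every two years, over one decade: 2**5
print(moores_law_factor(10))  # 32.0
```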
Moore's Law [chart]
Three Dimensions of Big Data: Volume, Velocity, Variety
Volume: Data Size (in bytes)
Megabyte     10^6     1,000,000
Gigabyte     10^9     1,000,000,000
Terabyte     10^12    1,000,000,000,000
Petabyte     10^15    1,000,000,000,000,000
Exabyte      10^18    1,000,000,000,000,000,000
Zettabyte    10^21    1,000,000,000,000,000,000,000
Yottabyte    10^24    1,000,000,000,000,000,000,000,000
Brontobyte   10^27    1,000,000,000,000,000,000,000,000,000 (informal name, not an SI unit)
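Because the table uses decimal powers of ten, converting between units only means shifting the exponent. A minimal sketch; the unit abbreviations and the `convert` helper are our own, and "brontobyte" appears only because the table lists it.

```python
# Exponents of ten from the table above (decimal units, not binary KiB/MiB-style units).
UNIT_EXPONENT = {
    "B": 0, "MB": 6, "GB": 9, "TB": 12, "PB": 15,
    "EB": 18, "ZB": 21, "YB": 24, "brontobyte": 27,  # brontobyte: informal, not SI
}

def convert(value, from_unit, to_unit):
    """Convert a size between decimal byte units by shifting the power of ten."""
    return value * 10 ** (UNIT_EXPONENT[from_unit] - UNIT_EXPONENT[to_unit])

print(convert(2.5, "EB", "B"))  # 2.5e+18, i.e. 2,500,000,000,000,000,000 bytes
print(convert(1, "ZB", "GB"))   # 1000000000000 (one trillion gigabytes)
print(convert(1, "ZB", "TB"))   # 1000000000 (one billion terabytes)
```

These checks line up with the byte count for 2.5 exabytes and the zettabyte equivalences quoted on the Global Data slides.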
Global Data
2.5 exabytes (2,500,000,000,000,000,000 bytes) are created in the digital universe every day.
2.3 trillion gigabytes
(The Big Vs of Big Data, PROs, 24th July 2013; The Four Vs of Big Data, IBM, 25th July 2013)
… has been created in the last two years. (The Big Vs of Big Data, PROs, 24th July 2013)
Global Data Volume (in zettabytes)
2005: 0.13
2011: 1.4
2012: 2.7
2015: 8
2020: 40
1 zettabyte = 1 trillion gigabytes = 1 billion terabytes
(The Big Vs of Big Data, PROs, 24th July 2013; The Four Vs of Big Data, IBM, 25th July 2013)
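The chart's endpoints imply a compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. A worked calculation using only the slide's own figures follows. The result is derived arithmetic, not a number quoted by the cited sources, and the `cagr` helper name is our own.

```python
# Global data volume in zettabytes, as listed on the slide.
volume_zb = {2005: 0.13, 2011: 1.4, 2012: 2.7, 2015: 8, 2020: 40}

def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

rate = cagr(volume_zb[2005], volume_zb[2020], 2020 - 2005)
print(f"{rate:.1%}")  # 46.5%
```

Read this way, the forecast implies the global data volume growing by roughly 46.5% a year between 2005 and 2020.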
Data Velocity
The New York Stock Exchange captures 1 TB of trade information during each trading session.
Wal-Mart handles more than 1 million customer transactions every hour.
By 2020, business transactions on the internet (B2B and B2C) will reach 450 billion per day.
(The Economist, 25th February 2010; IDC; The Four Vs of Big Data, IBM, 25th July 2013)
Social Media Data Velocity
950 million users generate 2.7 billion likes on Facebook per day
400 million new tweets are created by users each day
2 million Google search queries per minute
24 petabytes of data processed per day
(The Big Vs of Big Data, PROs, 24th July 2013)
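Rates quoted per minute, per hour and per day are easier to compare in a common unit. A sketch that averages the slide's figures to events per second, with a hypothetical `per_second` helper. Averaging assumes events are spread evenly over the period, so real peak rates will be higher. The Wal-Mart figure is also a lower bound ("more than").

```python
SECONDS_IN = {"minute": 60, "hour": 3_600, "day": 86_400}

def per_second(count, period):
    """Average events per second for a count reported per minute, hour or day."""
    return count / SECONDS_IN[period]

rates = {
    "Wal-Mart customer transactions": per_second(1_000_000, "hour"),    # ~278
    "Facebook likes": per_second(2_700_000_000, "day"),                 # 31,250
    "New tweets": per_second(400_000_000, "day"),                       # ~4,630
    "Google search queries": per_second(2_000_000, "minute"),           # ~33,333
}
for name, rate in rates.items():
    print(f"{name}: ~{rate:,.0f} per second")
```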
Variety
Purchase transactions, website traffic, clickstreams, rewards programs, Twitter, Facebook, blog content, videos, email, business reports, log files, mobile data, location data, metering data, personal health monitors, sensor data
Sensor Data
The Large Hadron Collider has 150 million sensors delivering data 40 million times per second.
A Boeing 737 generates 240 terabytes of flight data during a single transatlantic flight.
Modern cars have close to 100 sensors that monitor items such as fuel level and tyre pressure.
(Sverre Jarp, CTO at CERN, 6th June 2013; The Four Vs of Big Data, IBM, 25th July 2013)
"Within the next five years, sensor data will hit the crossover point with unstructured data generated by social media. From there, the sensor data will dominate by factors 10-to-20 times that of social media." (Stephen Brobst, CTO, Teradata, 2010)
Big Data Technology Challenge
More data, more data sources: real-time data, multiple databases, external sources
Limited resources and budget
More kinds of output needed by more users, more quickly
Traditional Data Warehousing
Labour intensive: heavy indexing, aggregations and partitioning
Hardware intensive: massive storage; big servers
Expensive and complex
Top Big Data Challenges
The biggest challenge for survey respondents was determining how to get value from big data. (n = 720; Big Data Adoption in 2013, Gartner, 12 September 2013)
Big Data Investments on the Rise
64% of survey respondents are investing or planning to invest in Big Data. (Big Data Adoption in 2013, Gartner, 12 September 2013)
Types of Data Analysed
The top three types of data were transactions, log data and machine or sensor data. (Big Data Adoption in 2013, Gartner, 12 September 2013)
What about you?
Do you have a Big Data project running in your organisation?
Master Data & Big Data
How can they work together?
A Mismatched Pair?
Big Data: large volumes; potentially unstructured; high velocity; varied; generally transactional; questionable trustworthiness
Master Data: relatively small; highly structured; domain specific; non-transactional; trusted
Don't try putting Big Data through your MDM solution.
MDM as a search index for Big Data
Big Data sources may contain new insights, but they are often hard to identify and locate quickly and cost-efficiently.
If you want to perform targeted analysis on Big Data, you need to know what you're looking for: MDM is used to guide Big Data analysis.
Example: Understanding Customer Interactions
[Diagram: Customers, Customer Services, Social Media]
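The "search index" idea reverses the usual direction: the trusted master records supply the list of things to look for in the Big Data stream. A minimal sketch, assuming each golden customer record carries a set of known identifiers (e-mail address, social media handle) and using simple exact token matching. The class, functions, handles and messages are all hypothetical; a real deployment would use proper entity resolution and a search engine rather than this loop.

```python
import re
from dataclasses import dataclass, field

@dataclass
class MasterCustomer:
    """A golden record reduced to the identifiers we search Big Data for (hypothetical shape)."""
    customer_id: str
    name: str
    identifiers: set = field(default_factory=set)

def build_index(customers):
    """Map each known identifier (lower-cased) to its master customer ID."""
    index = {}
    for customer in customers:
        for ident in customer.identifiers:
            index[ident.lower()] = customer.customer_id
    return index

# Tokens may contain @, word characters, dots, plus and hyphen (covers handles and e-mails).
TOKEN = re.compile(r"[@\w.+-]+")

def tag_interactions(messages, index):
    """Yield (customer_id, message) for each message that mentions a known identifier."""
    for message in messages:
        for token in TOKEN.findall(message.lower()):
            customer_id = index.get(token.strip("."))  # drop sentence-ending full stops
            if customer_id:
                yield customer_id, message
                break  # one match is enough to route the message

customers = [MasterCustomer("C001", "Bob Smith", {"@bobsmith_uk", "bob.smith@example.com"})]
messages = [
    "Loving the new widget! @bobsmith_uk",                               # social media
    "Anyone else had a late delivery this week?",                        # no known customer
    "Complaint logged by bob.smith@example.com about a late delivery.",  # customer services
]
index = build_index(customers)
for customer_id, message in tag_interactions(messages, index):
    print(customer_id, "->", message)
```

Only the messages that mention a known customer are passed on for targeted analysis. The rest of the stream is never loaded into the MDM hub.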
Extracting Master Data from Big Data
Augment traditional information with data dynamically derived from Big Data sources.
Distil the data down to what is meaningful.
Enhance the 360-degree view provided by MDM.
Example: Is this customer a safe driver?
[Diagram: Hire Car, Customer, Sensor Data; rating 7/10]
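The key step is distilling. The master record receives one small derived attribute rather than the raw telemetry, in keeping with "Don't try putting Big Data through your MDM solution". A minimal sketch, assuming hypothetical per-trip summaries (distance, harsh-braking and speeding counts) already aggregated from the hire-car sensors, and an invented linear penalty per event per 100 km. A real scoring model would need to be designed and validated by the business. The trip data are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Trip:
    """Per-trip summary already aggregated from raw hire-car sensor readings (hypothetical)."""
    km: float
    harsh_brakes: int
    speeding_events: int

def safe_driver_score(trips, max_score=10.0, penalty_per_event_per_100km=1.0):
    """Distil trip telemetry into a 0..max_score rating using a hypothetical linear penalty."""
    total_km = sum(trip.km for trip in trips)
    if total_km == 0:
        return None  # no driving data, so no rating
    events = sum(trip.harsh_brakes + trip.speeding_events for trip in trips)
    events_per_100km = events * 100 / total_km
    score = max_score - penalty_per_event_per_100km * events_per_100km
    return round(min(max_score, max(0.0, score)), 1)  # clamp to 0..max_score

# Only the distilled attribute is written to the master record; raw telemetry stays in the Big Data store.
customer = {"customer_id": "C001", "name": "Bob Smith"}
trips = [Trip(120, 2, 1), Trip(80, 1, 1), Trip(100, 3, 1)]  # invented example data
customer["safe_driver_score"] = safe_driver_score(trips)
print(customer)  # {'customer_id': 'C001', 'name': 'Bob Smith', 'safe_driver_score': 7.0}
```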
Is there a connection between Big Data and MDM?
44% of survey respondents believe there is a significant connection between MDM and Big Data. (The Information Difference, September 2012)
Link Specifics
67% of survey respondents believe the link is from MDM to Big Data; 17% believe the link is from Big Data to MDM. (The Information Difference, September 2012)
What about you?
Do you consider social media to be the most important source of Big Data for your organisation?
What about you?
Are you currently linking, or do you have any plans to link, Master Data Management and Big Data in your organisation?
Information Builders
Our Mission: To provide the best software and services for business intelligence, analytics and information management.
Transform data into business value
Allow every stakeholder to make better decisions
Inject valuable insight throughout your business
38 years of expertise
1,350 dedicated professionals
60 offices worldwide
Tens of thousands of customers
Millions of users
Information Builders: The Information Stack
Business Intelligence; Advanced Analytics; Performance Management; Data Quality Management; Master Data Management; Data Governance; Integration Infrastructure; Data Integration; Universal Adapter Suite
Information Builders: A Unique and Complete Solution