Data Modeling in the Age of Big Data
Pete Stiglich

Pete Stiglich is a principal at Clarity Solution Group. [email protected]

Abstract

With big data adoption accelerating and strong interest in NoSQL technologies (such as Hadoop and Cassandra) and the flexible schemas they offer, the need for data modeling is sometimes questioned. However, data in NoSQL may still require some structure so data scientists can mine it effectively, even though NoSQL does have powerful mechanisms for working with unstructured data to extract structured data. Whenever data has some structure, a data model can be useful for describing it and fostering communication among data scientists, developers, data analysts, and others. The need for conceptual (business/domain) and logical data modeling doesn't change significantly, although physical data modeling practices may need to be adjusted. The concept of a data lake, where all your source system data is stored on the massively parallel processing (MPP) cluster or grid, requires effective metadata management (and possibly data modeling) to help you understand the data when source system documentation is inadequate. Determining how a data modeling component fits into your big data projects requires collaboration among management, project managers, and application and data architects.

Data Modeling in the Age of Big Data

With the advent of NoSQL databases and big data platforms such as Hadoop, Cassandra, and MongoDB, the need for formalized data modeling is sometimes questioned. Some believe that formalized data modeling (in the sense of defining data structures, not in the sense of data mining/statistical models) can and should be circumvented in NoSQL and big data environments. After all, these people argue, data modeling slows down development and models aren't necessary given many of the schema-less features of these new platforms.

BUSINESS INTELLIGENCE JOURNAL VOL. 19, NO. 4 17

However, whenever you are designing data structures,
you are modeling data, whether formally (e.g., using diagrams) or informally (e.g., data definition language). In this article, I will describe how data modeling is still highly relevant in the big data world. I will explain how data modeling will continue to evolve, as well as how data modeling for NoSQL big data technologies changes or has similarities with traditional data modeling. This will include:

- Determining when and how data modelers should be involved in big data projects
- Differences and similarities between relational database management systems (RDBMSs) and NoSQL (because this affects data modeling)
- Types of data models to develop
- How data modeling tools can be used in a NoSQL environment

What Does Schema-less Mean?

Even when you work with unstructured data (video, audio, freeform text), your data must have some structure to provide meaning or be useful. Unstructured data usually has associated metadata (e.g., tags, creation or modification dates, Dublin Core elements) to provide context, so that the metadata and the data together have a structure. Many of the new big data technologies leverage non-relational data stores based on key-value, wide-column, graph, or other data representation paradigms.

Schema-less typically refers to the absence of the predefined physical schemas that are typical of relational databases, where, for example, tables, columns, indexes, and views are predefined in system catalogs and are often relatively hard to change. However, unless you are doing something similar to using MapReduce to create key-value pairs by interrogating a file (e.g., to get word counts from unstructured text), your data typically has some kind of predefined structure. For example:

- Pig and Hive have similarities with RDBMS tables, such as tables, columns, data types, and partitioning. There are also differences, of course (such as no foreign key constraints and no unique constraints).
- HBase and Cassandra have tables (completely denormalized, because joins aren't possible natively) and column families that are predefined, although the actual columns (aka column qualifiers) can be completely variable (e.g., one record can have 10 columns and another record can have a million completely distinct columns).
- MongoDB stores data in JSON documents, which have a structure similar to XML.
- Semantic models, although perhaps using an RDF triple store for persistence, have taxonomies and ontologies that require modeling expertise to develop. Making sense of unstructured data often requires such taxonomies and ontologies, e.g., to categorize unstructured data such as documents.
- The need for maintaining a system catalog for Hadoop-based tables has been recognized, and HCatalog has been developed to address it.

In the big data world, your data often has a schema or structure, although there can be a high degree of flexibility.

Role of Data Modelers in Big Data

Given the flexible schemas of many big data technologies, where (and when) does the data modeler fit in? The answer becomes more difficult with NoSQL environments, and involvement will vary by type of project, industry, and sensitivity or criticality of the data (e.g., financial data, data required for compliance, PHI data). An Internet company may require less formal modeling than a healthcare provider, but multiple development paradigms may be needed even within a single company. The more critical the data, the more quality required, the more structured the data is, or the more it is exposed to end users or customers (via a BI tool, for example), the greater the level of modeling formalism typically required. However, there is no one-size-fits-all approach.

BUSINESS INTELLIGENCE JOURNAL VOL. 19, NO. 4 18

Data architects and modelers should generally be a part of most big data development teams for:

- Modeling business objects and relationships as a precursor to a design phase and to help data scientists understand how to tie the data together from a business perspective
- Translating business data requirements into target data designs (logical and/or physical), depending upon project type
- Providing semantic disambiguation and helping data scientists and developers make sense of the data (e.g., as a data archaeologist)
- Modeling source data (e.g., reverse engineering) and developing artifacts to help formalize communication about what source data is contained in the data lake and the business rules associated with the use of individual data components
- Identifying, collecting, and providing source metadata (e.g., source models, definitions, data dictionary)
- Enabling alignment with data governance, as applicable

Management, project leaders, and architects (application and data) need to collaborate to determine the best approach for roles and responsibilities on a particular big data project. Application development and data modeling require different skill sets and have different perspectives, and projects benefit from reconciling diverse opinions.

When application developers design a database without proper training and experience, they are generally focused on the short-term needs of specific tactical application development goals. They may have limited views toward enterprise use of data/information, and will likely create a database that meets the needs of the immediate application, although over time the system may:

- Not align with corporate/enterprise goals
- Degrade data quality
- Become difficult to integrate with other applications and databases
- Require extensive rework
- Perform or scale poorly
- Fail to align with data governance and stewardship decisions

Consider the case of application developers (and sometimes inexperienced data modelers) who might proceed immediately to physical data modeling, bypassing conceptual and logical modeling, to create the schema. As a result, business and functional relationships might be modeled incorrectly (e.g., a relationship that is many-to-many from a business perspective modeled as one-to-many). The results can be missing data, data duplication required to fit the data into the model, mischaracterization of data, time spent by data scientists waiting for clean data (and they're not inexpensive), and loss of confidence in the ability to deliver quality information.

Data scientists need a sandbox where they are free to create their own tables, manipulate data, and formulate and test hypotheses. In this case, the data modeler's involvement should be to support the data scientist's understanding of the data. Similarly, application developers may need similar flexibility to prototype or rapidly develop solutions. For applications destined to be put into production, however, the data model provides an architectural foundation that supports the overall organizational goals and standards and should be a key component.
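The MapReduce interrogation scenario mentioned earlier, creating key-value pairs from unstructured text to get word counts, can be sketched in a few lines of plain Python. This is a simplified, single-process stand-in for a real MapReduce job (the sample documents are invented, and a real job would distribute the map and reduce phases across the cluster):

```python
from collections import Counter
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower().strip(".,"), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word (the key)."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = [
    "Big data needs data modeling",
    "Data modeling describes data structure",
]

# The shuffle step is collapsed here: gather all mapper output, then reduce.
word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
print(word_counts["data"])  # "data" appears four times across both documents
```

The result is structured key-value data (word, count) extracted from unstructured input, which is exactly the point: even "schema-less" processing produces data with a describable structure.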
As an example, when developing a reporting environment, a data modeler would design a model (e.g., a dimensional model with facts and dimensions) for an RDBMS. The data modeler could develop a similar dimensional model for implementation on Hive. Where there are differences, data modelers must learn the new NoSQL technologies to effectively support big data initiatives, much as they transitioned from traditional RDBMSs to data warehousing (DW) appliances.

Data architects and modelers need to be concerned about enabling integration and effective data management without slowing down the development process. They must work to standardize the naming, definitions (business description, logical data type, length, etc.), and relationships of the data whenever possible to prevent future integration nightmares and to manage the data effectively. Different technologies have different ways of physically naming items; data modelers should ensure that the same concepts/business terminology is leveraged. For example, if a customer is called a customer and not an account in the enterprise data model, the same term should be used in new data stores. This practice reduces future integration problems, even if the physical representation of the name needs to be different due to technical constraints.

Data is an enterprise asset that must be identified (including resolving and tracking synonyms and homonyms); understood and defined; tracked for effective stewardship, data lineage, and impact analysis (if you don't know where your assets are, you're not managing assets well) in a metadata repository or business glossary; and have its quality, security, and usage identified and measured. Data architects and modelers help ensure the data is aligned with enterprise standards whenever possible and keep an eye on the bigger picture, looking beyond the current project scope.
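To make the dimensional model mentioned above concrete, here is a minimal fact/dimension sketch in Python. The table contents and names are invented for illustration; the same logical design could be implemented as RDBMS tables or Hive tables:

```python
# Minimal star schema: one fact table referencing one dimension table.
dim_customer = {
    1: {"name": "Acme Corp", "region": "West"},
    2: {"name": "Globex", "region": "East"},
}

fact_sales = [
    {"customer_key": 1, "sale_date": "2014-06-01", "amount": 120.0},
    {"customer_key": 2, "sale_date": "2014-06-02", "amount": 200.0},
    {"customer_key": 1, "sale_date": "2014-06-15", "amount": 90.0},
]

# Aggregate a measure (amount) by a dimension attribute (region) --
# the join a BI tool would express as fact JOIN dimension GROUP BY region.
sales_by_region = {}
for row in fact_sales:
    region = dim_customer[row["customer_key"]]["region"]
    sales_by_region[region] = sales_by_region.get(region, 0.0) + row["amount"]

print(sales_by_region)  # {'West': 210.0, 'East': 200.0}
```

The logical design (facts keyed to dimensions) is the same either way; only the physical implementation details change between platforms.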
Data architects and modelers understand how to translate business requirements into the appropriate design for a data store. Indeed, conceptual and logical data models can and should inform the choice of database platform (relational, wide column, graph, etc.). Not every application works equally well on all platforms. For example, if a generic data model type such as entity-attribute-value (EAV) seems indicated (where the entity and attribute names are determined at the time of insert, rather than having a table per entity, each with predefined columns) and the volume is significant, a wide-column store such as HBase or Cassandra might be better. If there are many relationships in the data (such as with social networking), perhaps a graph data store is called for.

How Data Modeling Changes with Big Data

The standard modeling processes are the same. The first data model to develop is a conceptual data model (CDM), which identifies key business objects and their relationships. Even when the system must handle unstructured data, there are still business objects and relationships that must be identified, even if just for the metadata providing context to the unstructured data. The CDM (using ERD, UML class diagram, ORM, or another modeling notation) provides a framework for an information system. It is roughly equivalent to an architect's conceptual design of a building. Of course, we would never allow someone to build a house without seeing a picture of it beforehand; yet, strangely, many in IT don't have a problem with going straight to schema development. This would be like a builder drawing a layout of the electrical or plumbing system without knowing what the building is going to look like. Obviously, conceptual data modeling is still required for any information system intended for production: the CDM is the framework for an information system. Nothing is worse than building a data store where the data and relationships don't align with business requirements.
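The EAV-versus-wide-column trade-off described above can be sketched with plain Python dictionaries standing in for database rows (all entity, attribute, and key names are invented for illustration):

```python
# EAV: one row per (entity, attribute, value) triple. Attribute names
# arrive at insert time, so no table-per-entity schema is needed.
eav_rows = [
    ("cust-1", "name", "Acme Corp"),
    ("cust-1", "region", "West"),
    ("cust-2", "name", "Globex"),
    ("cust-2", "credit_limit", "50000"),  # an attribute cust-1 doesn't have
]

# Wide-column view: group by row key; each row may carry a completely
# different set of column qualifiers, as in HBase or Cassandra.
wide = {}
for entity, attribute, value in eav_rows:
    wide.setdefault(entity, {})[attribute] = value

print(wide["cust-2"])
# {'name': 'Globex', 'credit_limit': '50000'}
```

At significant volume, the wide-column layout avoids the row explosion of the EAV pattern, which is why the article suggests HBase or Cassandra for this shape of data.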
Some may argue that in a big data environment you should just put all source data onto your cluster and let the data scientists hammer away and manipulate the data as desired. There is definitely real value to enabling this scenario, and you don't want your data scientists waiting for all the data to get integrated into the EDW before they can start data mining. Keep in mind two key points:

- Data scientists must understand the business and, of course, the data and relationships. The conceptual data model will allow the data scientist to better understand the business objects, how they relate, and the cardinality of the relationships. This will help data scientists understand how to navigate the data and identify data quality issues that may skew results.
- Integration and data quality assurance are difficult and time consuming. Most of a data scientist's time is spent gathering, integrating, and cleansing data before even starting analysis. Did I mention that data scientists are expensive?

Initially, all the data can be stored in the data lake and models and other metadata collected and provided to make sense of the data. Eventually, more data can be integrated, cleansed, structured, and pulled into the EDW or retained in the big data environment as appropriate. By cleansing, standardizing, and integrating this data once and persisting it, it will be available for future analyses, which can then be performed more rapidly. Providing a semantic layer (schema on read) can help make column names more business oriented to improve understandability.

One notion about data science is that it is important to be able to fail fast to minimize the impact and risk when a data mining algorithm doesn't produce usable or valuable results. Storing all the data in a big data environment and allowing data scientists to analyze and manipulate it is a worthwhile goal. However, without the enterprise data model (which may include conceptual and logical models), source data models and dictionaries, mappings from the source data to the enterprise data model, and data profiling results from the source data, your data scientists will be less effective. When data from multiple systems must be integrated for analysis, more time must be allocated, even if all the data is stored on the big data platform.

The Logical Data Model

The next model that should be created in most cases is the logical data model (LDM), though it is not always required.
For example, if you are developing a graph database, you might convert your CDM into an ontology (e.g., using RDF/OWL) as the foundation for your semantic model/graph database.

The conceptual data model should only use business terminology; abstractions can cause confusion for business managers, analysts, and data stewards who need to review and approve the model. The CDM might be lightly attributized: some attributes (such as sales region, country, or address) may be considered a business object and have relationships that should be expressed, while other attributes (e.g., first name, sale date, or sale amount) would probably not be considered a discrete entity. An LDM often includes abstractions of entities, attributes, and relationships to increase reuse. For example, in a CDM, entities such as patient, professional provider, employee, and representative might be abstracted into a generic entity called person in the LDM so you're not storing duplicate name, demographic, and contact information. An LDM often begins to represent a data structure that can be implemented in a data store, and thus is the start of your actual design. The CDM is not part of design; instead, it describes the business from a business object perspective and is developed in the requirements-gathering phase.

A critical component in the LDM is the identification of keys, primarily natural keys. Although the NoSQL technology might not employ indexes (unique or otherwise) or have key constraints, it is still very important to understand what attribute(s) uniquely identify a record. Even when partitioning relational tables, it can be difficult to enable a unique index because the partition key might need to be a subset of the primary key (and the natural key is not always the primary key, such as in the case of surrogate keys).
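Because the target store may not enforce uniqueness, it is worth profiling candidate natural keys against actual data. A minimal sketch (the records and column names, a healthcare-style medical record number, are invented for illustration):

```python
from collections import Counter

def natural_key_violations(records, key_columns):
    """Return key values that identify more than one record;
    an empty result means the candidate columns form a natural key."""
    keys = Counter(tuple(r[c] for c in key_columns) for r in records)
    return {k: n for k, n in keys.items() if n > 1}

records = [
    {"mrn": "A1", "visit_date": "2014-06-01", "amount": 120},
    {"mrn": "A1", "visit_date": "2014-06-15", "amount": 90},
    {"mrn": "B7", "visit_date": "2014-06-01", "amount": 200},
]

print(natural_key_violations(records, ["mrn"]))
# {('A1',): 2} -- mrn alone does not uniquely identify a record
print(natural_key_violations(records, ["mrn", "visit_date"]))
# {} -- the composite key is unique in this sample
```

A check like this documents the granularity of an entity empirically, even when the database itself enforces nothing.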
In any case, in the LDM you must be sure to identify natural keys (as either primary or alternate keys) so that the granularity and meaning of the entity can be more accurately understood and serve as an aid for the use of, and communication about, the data. An enterprise logical data model is constructed based on a normalized pattern, independent of application (even enterprise applications such as MDM or EDW), so that entities can be defined at the atomic level to enable future data integration. An enterprise logical data model may retain many-to-many and subtype relationships (which
aren't implementable without transformation, such as using associative entities to resolve an M:N relationship) in order to allow the applications to transform these as appropriate.

An application logical data model may be normalized or denormalized as applicable. For an application leveraging a NoSQL wide-column store such as HBase or Cassandra, which doesn't allow joins natively, the application LDM might be a single denormalized table with the row key and commonly used columns or column types defined. Alternatively, column families could be defined as separate logical entities when multiple column families are envisioned, to aid communication and understanding. At this time, I am unaware of any support in commonly used data modeling tools for NoSQL constructs such as column families, so the data modeler will need to adjust usage of the modeling tool as necessary. Column families may also be defined based on physical characteristics such as encoding or memory pinning. Column families defined to support performance or other physical characteristics should be defined in the physical data model or in the table schema, not in the LDM.

The Physical Data Model

The last step in the data modeling process is creating the physical data model (PDM), on which the schema will be based and where physical characteristics of the tables (encoding, format, partitioning) are determined. Unfortunately, the traditional data modeling tools provide little support for NoSQL, so forward engineering the PDM to create DDL appears to be out of the picture for the time being. The NoSQL DDL will probably need to be developed outside of the modeling tool, either manually or using a tool such as Aqua Data Studio. Developing a PDM becomes less important than the CDM and LDM, but it may still be useful for demonstrating denormalization (e.g., for HBase or Cassandra), for physical naming (instead of using the business names in the CDM and LDM), and for metadata retention. Even if DDL can't be generated, describing the physical data structures and showing how they relate can still yield benefits.

Summary

The age of NoSQL big data is clearly pushing a paradigm shift away from the familiar relational database world. However, much remains the same about the need to model and manage data. Effectively translating business requirements into data structures intended for a production environment still requires the skills of a data modeler. Data governance and stewardship decisions and standards need to be followed in NoSQL data stores because data is an enterprise asset. For data scientists to answer the really tough questions, they must understand what the data means and how it ties together. Data models and modelers provide invaluable assistance to data scientists. Data modelers need to learn NoSQL technologies so they can remain relevant and provide the best service possible. Data modelers and application developers both need to be adaptable and work well together to build optimal NoSQL solutions. Data modeling tool vendors need to support NoSQL constructs because NoSQL big data technology adoption is accelerating.
Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments Anuraj Gupta Department of Electronics and Communication Oriental Institute
BUSINESS RULES AND GAP ANALYSIS
Leading the Evolution WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Discovery and management of business rules avoids business disruptions WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Business Situation More
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions A Technical Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Data Warehouse Modeling.....................................4
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
Master Data Management. Zahra Mansoori
Master Data Management Zahra Mansoori 1 1. Preference 2 A critical question arises How do you get from a thousand points of data entry to a single view of the business? We are going to answer this question
Data Warehousing Systems: Foundations and Architectures
Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
A Design Technique: Data Integration Modeling
C H A P T E R 3 A Design Technique: Integration ing This chapter focuses on a new design technique for the analysis and design of data integration processes. This technique uses a graphical process modeling
Enterprise Data Quality
Enterprise Data Quality An Approach to Improve the Trust Factor of Operational Data Sivaprakasam S.R. Given the poor quality of data, Communication Service Providers (CSPs) face challenges of order fallout,
Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014
Increase Agility and Reduce Costs with a Logical Data Warehouse February 2014 Table of Contents Summary... 3 Data Virtualization & the Logical Data Warehouse... 4 What is a Logical Data Warehouse?... 4
Applications for Big Data Analytics
Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:
Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff
Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff The Challenge IT Executives are challenged with issues around data, compliancy, regulation and making confident decisions on their business
Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Bringing Big Data into the Enterprise
Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?
Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov
Unlock your data for fast insights: dimensionless modeling with in-memory column store By Vadim Orlov I. DIMENSIONAL MODEL Dimensional modeling (also known as star or snowflake schema) was pioneered by
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
Preparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
Design and Implementation
Pro SQL Server 2012 Relational Database Design and Implementation Louis Davidson with Jessica M. Moss Apress- Contents Foreword About the Author About the Technical Reviewer Acknowledgments Introduction
Structured Content: the Key to Agile. Web Experience Management. Introduction
Structured Content: the Key to Agile CONTENTS Introduction....................... 1 Structured Content Defined...2 Structured Content is Intelligent...2 Structured Content and Customer Experience...3 Structured
Three Fundamental Techniques To Maximize the Value of Your Enterprise Data
Three Fundamental Techniques To Maximize the Value of Your Enterprise Data Prepared for Talend by: David Loshin Knowledge Integrity, Inc. October, 2010 2010 Knowledge Integrity, Inc. 1 Introduction Organizations
Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software
SAP Technology Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software Table of Contents 4 Seeing the Big Picture with a 360-Degree View Gaining Efficiencies
Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
Structured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
WHITE PAPER. Four Key Pillars To A Big Data Management Solution
WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Making Sense of Big Data in Insurance
Making Sense of Big Data in Insurance Amir Halfon, CTO, Financial Services, MarkLogic Corporation BIG DATA?.. SLIDE: 2 The Evolution of Data Management For your application data! Application- and hardware-specific
Data Vault and The Truth about the Enterprise Data Warehouse
Data Vault and The Truth about the Enterprise Data Warehouse Roelant Vos 04-05-2012 Brisbane, Australia Introduction More often than not, when discussion about data modeling and information architecture
BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE. Prepared by:
BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE Cerulium Corporation has provided quality education and consulting expertise for over six years. We offer customized solutions to
POLAR IT SERVICES. Business Intelligence Project Methodology
POLAR IT SERVICES Business Intelligence Project Methodology Table of Contents 1. Overview... 2 2. Visualize... 3 3. Planning and Architecture... 4 3.1 Define Requirements... 4 3.1.1 Define Attributes...
How To Learn To Use Big Data
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design
COURSE OUTLINE Track 1 Advanced Data Modeling, Analysis and Design TDWI Advanced Data Modeling Techniques Module One Data Modeling Concepts Data Models in Context Zachman Framework Overview Levels of Data
BIG DATA-AS-A-SERVICE
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
Artur Borycki. Director International Solutions Marketing
Artur Borycki Director International Solutions Agenda! Evolution of Teradata s Unified Architecture Analytical and Workloads! Teradata s Reference Information Architecture Evolution of Teradata s" Unified
Big Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
Patterns of Information Management
PATTERNS OF MANAGEMENT Patterns of Information Management Making the right choices for your organization s information Summary of Patterns Mandy Chessell and Harald Smith Copyright 2011, 2012 by Mandy
Navigating the Big Data infrastructure layer Helena Schwenk
mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining
Big data for the Masses The Unique Challenge of Big Data Integration
Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...
Information Management Metamodel
ISO/IEC JTC1/SC32/WG2 N1527 Information Management Metamodel Pete Rivett, CTO Adaptive OMG Architecture Board [email protected] 2011-05-11 1 The Information Management Conundrum We all have Data
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
TE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
Cloud Integration and the Big Data Journey - Common Use-Case Patterns
Cloud Integration and the Big Data Journey - Common Use-Case Patterns A White Paper August, 2014 Corporate Technologies Business Intelligence Group OVERVIEW The advent of cloud and hybrid architectures
Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM
Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that
ETL-EXTRACT, TRANSFORM & LOAD TESTING
ETL-EXTRACT, TRANSFORM & LOAD TESTING Rajesh Popli Manager (Quality), Nagarro Software Pvt. Ltd., Gurgaon, INDIA [email protected] ABSTRACT Data is most important part in any organization. Data
