Building Governance into Big Data

A metadata-based approach for ensuring visibility and control for your Hadoop data architecture

A Hortonworks White Paper, September 2015
© 2015 Hortonworks

Contents

Overview
Why data governance matters
Four essential elements of Hadoop data governance
Why metadata and taxonomy hold the key to comprehensive data governance
The Data Governance Initiative (DGI): building cross-industry metadata services in Hadoop
Addressing cross-industry use cases
DGI becomes Apache Atlas
Supporting data governance across industries through a flexible type system
Key characteristics and capabilities of Atlas
Competitive analysis
Summary

Overview

As organizations pursue Hadoop initiatives to capture new opportunities for data-driven insight, data governance requirements can pose a key challenge. The management of information to identify its value and enable effective control, security and compliance for customer and enterprise data is a core requirement for both traditional and Modern Data Architectures. However, it hasn't been clear how to easily address these requirements with Hadoop. Traditional data governance tools either treat Hadoop as a black box, with no visibility into or access to internal data manipulation such as ETL, or impose significant restrictions in order to meet these requirements, such as requiring every job to be authored within a single tool, undermining the value of the breadth of tooling across the Hadoop Modern Data Architecture. While Hadoop produces a large amount of operational and application-related data that can be used for auditing purposes, attempting to discern meaning from this information through a forensic, "rear-view mirror" approach can yield inconsistent and inaccurate results. As a result of these challenges, a Data Lake can easily become a data swamp as users lose track of what data it contains, where it came from and the processes used to shape it.

Hortonworks, committed to innovation at the core, has been a leader in industry efforts to weave data governance into the fabric of the Modern Data Architecture. Recognizing that Hadoop isn't an island of data, our approach has been to ensure that everything we build is open and can integrate within the context of the Modern Data Architecture. This approach provides our customers with a comprehensive view of data as it moves between systems and is transformed and accessed. The realization of this approach has revolved around a common set of metadata services: information that describes and provides context about other data.
Through an open, collaborative initiative with a small number of industry thought leaders, Hortonworks has helped develop capabilities and frameworks that can be applied across industries to ensure effective management and governance of Big Data environments. Working with these thought leaders, Hortonworks launched Apache Atlas to apply consistent metadata and taxonomy across the data ecosystem, empowering data managers to ensure the transparency, reproducibility, auditability and consistency of the Data Lake and the assets it contains. Hadoop-centric information can be leveraged in this broader context using third-party products to form a comprehensive view. In this way, Apache Atlas sits at the core of data governance for Hadoop and makes it possible for enterprises to capitalize on the power of Big Data to drive growth, differentiation and competitive advantage while maintaining full control and oversight.

Why data governance matters

Data governance is a matter of critical importance for every organization that relies on data to drive business value; in other words, virtually every organization today. Businesses in highly regulated industries such as finance and healthcare must maintain effective control and visibility over data to ensure auditability and compliance. For other companies, data governance is crucial for securing sensitive information and protecting customer privacy while helping employees leverage the full value of information to drive growth and differentiation. And every company, as it grows and expands its Data Lake beyond the first few use cases and applications, needs an easy way to explore the data sets that exist within the lake.

At the same time, data governance needs to be built in and automated as much as possible. The approach should support the process of bringing data into Hadoop and be applied consistently across every subsequent access point to the data itself. What enterprises need is an approach to data governance for Big Data that creates value by:

- Enabling rapid discovery of datasets already contained within Hadoop, eliminating requests for duplicate data to be curated or ingested
- Addressing compliance reporting requirements for Hadoop related to data access and lineage, to reduce both cost and regulatory risk
- Supporting comprehensive data governance initiatives that span Hadoop and traditional data systems

As Hadoop enables enterprises to grow the volume, velocity and variety of data that can be leveraged for insight, the importance of governance grows in tandem with the scale of the Data Lake. By building effective data governance into the architecture that powers Big Data, businesses can realize the full value of their information assets while ensuring effective risk management.
Four essential elements of Hadoop data governance

Critics of the Data Lake approach have characterized it as "throw all the data into the cluster now, and worry about cleansing, reconciliation and enrichment later." Hadoop's schema-on-read functionality allows users to forgo the definition and organization of data as it enters the system, while its distributed architecture facilitates the persistence of data. As a result, organizations have unchecked permission to store virtually any type of data while delegating data management and governance to application layers operating on top of the platform. This approach is all too likely to transform an organization's Data Lake into a data swamp while fostering additional governance risks. To realize the full value of Hadoop, enterprises must reconcile data management realities when they bring existing and new data from disparate sources into the Hadoop platform. Metadata and its use in the context of data governance are vital parts of any enterprise-ready Data Lake, and must be built into the ecosystem from the outset to prevent increasingly complex data management challenges down the road.

The Hortonworks philosophy for data governance in the enterprise revolves around four tenets:

- Auditability: All relevant events and assets must be traceable with appropriate lineage
- Transparency: Governance standards and protocols must be clearly defined, consistently applied and available to all
- Reproducibility: Relevant data landscapes should be reproducible at any given point in time
- Consistency: Compliance programs must be policy-driven

Why metadata and taxonomy hold the key to comprehensive data governance

The success of data governance fundamentally revolves around capturing metadata and defining meaningful taxonomies for data. A definition of these concepts can provide useful context for understanding their value within Hadoop and the broader data ecosystem.

Metadata is information that describes and provides information about other data. This may include data models, schemas and administrative information in addition to attributes such as title, author, subject, tags, date created and description. Once defined and documented, these attributes can be used to search, link, aggregate and grant access to the associated dataset. Metadata falls into three broad categories:

- Technical metadata: database name, table name, column name, data type
- Business metadata: business names, business definitions, business classifications, sensitivity tags
- Operational metadata: who (security access), what (job information), when (logs/audit trails), where (location)

Taxonomy refers to any structure that is used to organize and classify information.
Taxonomies are used as part of metadata fields to support consistent and accurate indexing of data structures, and to define the relationships among them. A taxonomy may include a standardized list of terms (a vocabulary) that can be used to consistently order data classification structures and/or hierarchies into parent-child relationships. One can think of metadata as a framework or filing cabinet for data, and taxonomy as a mechanism for organizing it into folders. This approach makes it possible to organize even vast amounts of information consistently, just as a similar hierarchical approach is used to categorize the millions of different life forms on earth into a rational and manageable structure of families, genera and species. This can be contrasted with the simple name-value pairs used elsewhere, which are really free-form labels with no hierarchical structure and a vulnerability to error and duplication. It should also be noted that taxonomies can and do change over time, so accounting for those changes is critical to the success of any system that leverages them.
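The contrast between flat name-value labels and a hierarchical taxonomy can be sketched in a few lines of Python. This is an illustrative toy (the term names and structure are invented for the example), not Atlas code:

```python
# Illustrative sketch, not Atlas internals: contrast flat name-value labels
# with a hierarchical taxonomy whose terms carry parent-child relationships.

# Flat name-value pairs are free-form, so near-duplicates creep in easily.
flat_labels = {"sensitivity": "PII", "Sensitivity": "pii"}  # two labels, one meaning

# A simple taxonomy: each term records its parent, so classification is consistent.
taxonomy = {
    "Customer": None,
    "Customer.PII": "Customer",
    "Customer.PII.SSN": "Customer.PII",
}

def ancestors(term):
    """Walk from a term up to the root of the taxonomy."""
    chain = []
    while term is not None:
        chain.append(term)
        term = taxonomy[term]
    return chain

# A dataset tagged with the most specific term is automatically classified
# under every broader term as well.
print(ancestors("Customer.PII.SSN"))
# ['Customer.PII.SSN', 'Customer.PII', 'Customer']
```

The parent links are what a flat labeling scheme lacks: with them, a search for everything classified under "Customer.PII" finds the SSN column without relying on exact string matches.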

Combining technical and business taxonomical metadata is the key to consistent data governance within Hadoop and the broader data ecosystem. A common metadata and classification framework ensures that all applications operating on top of Hadoop infrastructure will relate to and treat data in the same way.

DATA + METADATA + BUSINESS TAXONOMY = AUDIT & GOVERNANCE

- Data: HDFS files, HCatalog definitions, Falcon pipelines, Ranger sets of users
- Metadata: title, description, author, subject, date created, date modified, data sensitivity
- Business taxonomy: organizational hierarchy, customer/industry vocabulary, industry compliance standards
- Audit & governance: who did what, where, when and how

The Data Governance Initiative (DGI): building cross-industry metadata services in Hadoop

The application of data governance best practices for Hadoop is complicated by its current lack of a comprehensive approach to delivering visibility and control for workflows that require audit, lineage and security. While a number of vendor solutions seek to fill this gap, they are not integrated into the broader Hadoop ecosystem and require a siloed, monolithic workflow. Governance vendors' support for multi-tenancy and concurrency is less than ideal, as current offerings have no visibility into activity outside their own narrow focus. Hortonworks has been a leader in industry efforts to address these challenges for Open Enterprise Hadoop. As part of our promise to drive enterprise readiness for Hadoop, Hortonworks established the Data Governance Initiative (DGI) in collaboration with Aetna, Merck, Target and SAS. The charter of this initiative was to introduce a common, metadata-powered approach to data governance into the open source community, and to establish a framework with the flexibility to be applied across industries. Since its inception, this co-development effort has grown to include Schlumberger and a global financial institution.
DGI members set forth two guiding principles:

- The Hadoop data governance framework must integrate seamlessly with existing frameworks and exchange metadata with them
- The framework must also address governance across all the components or data engines that operate on top of the Hadoop platform

Figure 1: The Data Governance Initiative (DGI) laid the foundation for a common, metadata-powered approach to data governance.

DGI members worked on this shared framework to determine how users access data within Hadoop while interoperating with, and extending its capability to, existing third-party data governance and management tools.

Addressing cross-industry use cases

By bringing together leading companies with deep expertise across a range of industries, DGI made it possible to develop a truly cross-industry, extensible framework. DGI members actively worked to materialize real industry data governance solutions through the open source community at an unprecedented rate. The expertise the members brought to the DGI manifested itself in addressing the following use cases across financial services, healthcare, pharmaceuticals and telecommunications.

Chain of custody (compliance): The financial services sector operates under strict regulations that require detailed audit tracking of every event's origin, access and transformation in order to comply with customer and governmental inquiries. This involves tracking every copy, backup and derivation of each dataset, in addition to every data access grant or denial. Financial services companies must be able to recreate the narrative for every dataset, from its creation through its disposition, at any given time.

Healthcare ad hoc reporting (30-day measures): Reimbursements by the Centers for Medicare & Medicaid Services (CMS) represent a significant portion of healthcare provider revenues. A healthcare institution's bottom line can be adversely affected if it is penalized by CMS for high patient readmission rates, making it essential to be able to assess and track patient outcomes over their entire history. This involves analyzing a wide set of sensitive patient data from disparate data sources on an ad hoc basis for timely remediation.
The work that was done as part of the DGI can be used to discover, catalog and score patient data rapidly and accurately and present it in the relevant context.

Licensing of research data (data masking): To optimize return on investment for product development cycles that can stretch over years, pharmaceutical companies often license research data to other companies or partners. Each licensing agreement has specific requirements, often requiring data to be shared in its entirety with licensing customers or partners. To complicate matters, this data may contain sensitive personally identifiable information (PII), protected health information (PHI) or both. To prevent regulatory violations, the licensing company must mask this sensitive information while still making the entire dataset available to users based on their roles or data attributes. All these factors must be managed and coordinated efficiently. Energy companies often rely on similar licensing deals to monetize their own research data; while the regulatory environment differs, some of the same challenges come into play.

Log analysis (customer experience): Data from telephony, networked devices, set-top boxes and websites holds vast quantities of information about the experience of individual telecommunications customers. This information is highly valuable to telecom companies, as inconsistent customer service can easily increase customer attrition and lower service margins. However, current data technologies make it extremely difficult to correlate customer events spread across a number of years and petabytes of data, making insights harder to expose. Opt-in customer data is specific to device, subscribed product, time and geography, and the lineage of all these attributes must be tracked to enable effective analysis. To mitigate subpar customer experience, providers must perform both real-time and predictive analysis of live streaming data, correlated with deep historical analysis. This analysis must be performed using compliant methods grounded in established data governance practices.
DGI laid the foundation to provide true visibility into Hadoop business processes such as these and other key use cases across industries.

DGI becomes Apache Atlas

Building on the success of DGI, Hortonworks, Aetna, Merck, SAS, Schlumberger, Target and others carried their groundbreaking co-development efforts into a new Apache project. In April 2015, they submitted a proposal for a new incubator project called Apache Atlas to the Apache Software Foundation. The founding members of the project include all the members of the DGI and others from the Hadoop community. Apache Atlas was proposed to provide governance capabilities in Hadoop. At its core, Atlas is designed to exchange metadata both within and outside of the Hadoop stack. By reconciling logical data models and forensic events, enriched by business taxonomy metadata, Atlas enables a scalable set of core governance services. These services enable enterprises to effectively and efficiently address their compliance requirements by providing:

- Search and lineage for datasets
- Metadata-driven data access control
- Indexed and searchable centralized audit for operational events
- Comprehensive data lifecycle management, from ingestion to disposition
- Metadata interchange with other metadata tools

In this way, Atlas allows organizations to establish reliable and safe information products and better utilize information assets to generate revenue. By helping to eliminate duplicate data, along with its associated cost, Atlas makes it easier for IT to support data exploration and compliance. As Hadoop enables enterprises to grow the volume, velocity and variety of data they can leverage for insight, the importance of governance grows with the Data Lake. A common metadata store provides the foundation for addressing these requirements and delivering a broad range of data governance capabilities for Hadoop. It also provides a focal point for interoperability with any metadata consumer within the ecosystem and the Modern Data Architecture, rather than requiring each project or component within the Hadoop stack to provide its own unique interface. This further reduces cost and complexity for IT while enabling a holistic approach to data governance across the Data Lake. Rather than requiring each third-party product (ETL tools, broader data governance tools, etc.) to understand which projects and components are within the Hadoop ecosystem, Atlas provides a focal point for interoperability and information exchange. Of course, this isn't delivered in a "big bang" approach, but rather as a sustained open source effort. The community has decided to take a gradual approach to delivering comprehensive interoperability and has come together to define and build the core of Apache Atlas. The community has also outlined a clear roadmap to integrate a number of Hadoop ecosystem components with the common metadata store. Hive was chosen as the starting point due to its maturity, its existing footprint among current Hadoop users, and the fact that it is similar in concept to existing enterprise data warehouse technologies that are subject to these same data governance challenges.
Figure 2: Atlas delivers out-of-the-box integration with Apache Hive as its starting point, with plans to expand from there.

Supporting data governance across industries through a flexible type system

The Apache community built Atlas with the realization that when it comes to data governance, one size doesn't fit all. It would be impractical for the community to attempt to create a super data model that would satisfy the unique requirements of all the diverse industries and business processes. Such an approach would also result in duplicate data models, given that enterprises across industries have already invested significant resources in building and refining the data models that reflect the unique ways in which they do business. A much more effective and efficient approach is to provide enterprises with the ability to import and export metadata as it currently exists in non-Hadoop systems such as ETL tools, ERP systems or data warehouses. The Atlas adaptive model streamlines compliance efforts by allowing companies to import existing metadata structures from other sources via REST-based APIs to leverage legacy investments, or to pre-load a taxonomy-rule combination for a specific industry or line of business. This approach is especially relevant for companies in the payment card industry (PCI), where a consistent metadata vocabulary ensures that downstream audit and compliance processes will match metadata tags and access rules exactly. With Atlas, data stewards also have the ability to define, annotate and automate the capture of relationships between data sets and underlying elements, including source, target and derivation processes. Atlas ensures downstream metadata consistency across the ecosystem by enabling enterprises to easily export metadata to third-party systems. The advantages of the flexible type system can be seen in its day-to-day use. Atlas empowers IT to model business organizations as well as technical metadata about enterprise data.
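To make the import idea concrete, the following Python sketch builds a metadata entity payload of the kind a REST-based metadata API might accept. The type name, attribute fields and the endpoint shown in the comment are illustrative assumptions loosely inspired by Atlas's type system, not its exact wire format:

```python
import json

# Illustrative sketch: describe a dataset as a typed metadata entity and tag it
# with a business classification. The typeName, attribute names and the URL in
# the comment below are assumptions for illustration, not the exact Atlas API.
entity = {
    "typeName": "hive_table",                        # technical type of the asset
    "attributes": {
        "name": "customer_accounts",
        "qualifiedName": "finance.customer_accounts@prod",
        "owner": "data_steward",
    },
    "classifications": ["PII"],                      # business tag from the taxonomy
}

payload = json.dumps(entity)

# In a live cluster this payload would be POSTed to the metadata service, e.g.
#   requests.post("http://<atlas-host>:21000/api/atlas/...", data=payload, ...)
# Here we only build the payload and confirm it round-trips cleanly.
print(json.loads(payload)["classifications"])  # ['PII']
```

The key point is the separation of concerns: the technical shape of the asset lives in `attributes`, while the governance meaning lives in `classifications`, so downstream tools can act on the tag without understanding the source system.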
Administrators can create ad hoc or bulk structures that allow users to assign a business tag (taxonomy) to physical data structures, including databases, tables or columns. For example, a data steward can assign a PII (personally identifiable information) tag to a column in a Hive table that contains employees' Social Security numbers. Whenever that column is used as part of a business workflow or queried for analysis, it carries the PII tag with it and the user is notified of its appropriate use. Since Atlas is aware of how and when a tagged data structure was accessed, copied or modified, it can construct its lineage at any given time based on actual data events. This approach gives enterprises confidence that their data governance processes are comprehensive enough to pass independent audit. The approach is also applicable to logical data structures (business taxonomy) such as hierarchies of departments or products. A data administrator can tag a data structure once at the parent level, and all the associated child elements automatically inherit that tag. For example, a human resources data asset group can be tagged sensitive or PII, and all child groups inside that parent group, such as Drivers or Timesheets, will inherit this attribute.

Figure 3: Apache Atlas enables business tags applied to the parent entity to be automatically inherited by child entities.
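The parent-to-child inheritance described above can be sketched as a small Python example. The hierarchy and tag names mirror the text's human resources example; the traversal logic is a toy illustration, not Atlas internals:

```python
# Illustrative sketch, not Atlas code: tags applied to a parent asset group
# are inherited by every child group beneath it.
hierarchy = {
    "HumanResources": ["Drivers", "Timesheets"],
    "Drivers": [],
    "Timesheets": [],
}
# Only the parent group is tagged explicitly.
explicit_tags = {"HumanResources": {"PII", "sensitive"}}

# Invert the hierarchy to get child -> parent links for upward traversal.
parent_of = {child: parent for parent, kids in hierarchy.items() for child in kids}

def effective_tags(asset):
    """Union of the tags on an asset and on all of its ancestors."""
    tags = set(explicit_tags.get(asset, set()))
    while asset in parent_of:
        asset = parent_of[asset]
        tags |= explicit_tags.get(asset, set())
    return tags

# Timesheets carries no explicit tags, yet inherits both from HumanResources.
print(effective_tags("Timesheets"))
```

Tagging once at the parent and computing effective tags on demand is what keeps the scheme manageable: adding a new child group requires no extra tagging work.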

Key characteristics and capabilities of Atlas

As a result of the collaborative approach to its development, Atlas provides a robust and comprehensive framework for addressing governance for Big Data. The following attributes contribute to its unique effectiveness in this regard.

Prescriptive lineage

Lineage typically refers to the steps a dataset took to arrive at its current state, as well as any copies that may have been created. However, simply looking at audit or log correlations to reconstruct lineage is not enough, because it is not possible to determine with certainty whether the route a data workflow took was correct or in compliance. Data governance approaches based on time-based correlation algorithms are especially problematic, as this inaccurate process can lead to misplaced confidence in a method that would never pass serious compliance scrutiny. Without a more comprehensive understanding, it is impossible to take any action that might be warranted. The correct approach is to combine logical models of workflow with log events for validation and completeness, an approach called prescriptive lineage. This is the path that Atlas takes.

Dynamic, metadata-based access policies for real-time policy enforcement

Governance control cannot be passive or simply forensic; reports on who did what, when, are not enough. Apache Ranger is an open source project that provides authorization and authentication for the Hadoop ecosystem. By integrating with Ranger, Atlas empowers enterprises to rationalize compliance policy at runtime based on Atlas's data classification schemes, leveraging Ranger to enforce flexible attribute-based policies that prevent violations from occurring. Ranger's centralized platform empowers data administrators to define security policy once, based on Atlas metadata tags or attributes defined by a data steward or administrator, and apply that policy in real time to an entire hierarchy of assets.
Data stewards can focus on discovery and tagging while another group manages compliance policy. This decoupling of explicit policy offers two important benefits:

- Dynamic policy enforcement: data analysis-driven tags can be enforced immediately
- Reusability: one policy can be applied to many assets, simplifying management

Apache Ranger enforces both role-based (RBAC) and attribute-based (ABAC) access control to create a flexible security profile that meets the needs of data-driven enterprises. The initial set of policies being constructed within the community is defined as:

1. Attribute-based access controls: For example, a column in a particular Hive table is marked with the metadata tag PII. This tag is then used to assign multiple entitlements to a group. This is an evolution from role-based entitlements, which require discrete and static one-to-one mappings.
2. Prohibition against dataset combinations: It's possible for two data sets (for example, one consisting of account numbers and the other of customer names) to be in compliance individually, but pose a violation if combined. Administrators can apply a metadata tag to both sets to prevent them from being combined, helping avoid such a violation.
3. Time-based access policies: Administrators can use metadata to define access according to time windows in order to enforce compliance with regulations such as SOX 90-day reporting rules.
4. Location-specific access policies: Similar to time-based access policies, administrators can define entitlements differently by geography. For example, a U.S.-based user might be granted access to data while in a domestic office and then travel to Switzerland. Although the same user may be trying to access the same data, the different geographical context would trigger a different set of privacy rules to be evaluated.
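The first three policy types above can be illustrated with a toy Python evaluator. The function names, tags and entitlements here are invented for the example; Ranger's actual policy engine is far richer than this sketch:

```python
from datetime import date, timedelta

# Toy sketches of tag-driven policy checks; invented names, not Ranger's engine.

def abac_allows(user_entitlements, asset_tags):
    """Attribute-based control: PII-tagged assets require a 'pii_reader' entitlement."""
    return "PII" not in asset_tags or "pii_reader" in user_entitlements

def combination_allowed(tags_a, tags_b):
    """Prohibit joining two datasets that each carry a 'no_combine' marker."""
    return not ("no_combine" in tags_a and "no_combine" in tags_b)

def within_reporting_window(access_date, period_end, window_days=90):
    """Time-based control in the spirit of a SOX-style 90-day reporting window."""
    return period_end <= access_date <= period_end + timedelta(days=window_days)

# A user holding the right entitlement may read a PII-tagged column...
print(abac_allows({"pii_reader"}, {"PII"}))                           # True
# ...but two datasets carrying the no-combine marker may not be joined.
print(combination_allowed({"no_combine"}, {"no_combine"}))            # False
# Access on 2015-10-01 falls inside a 90-day window for a period ending 2015-09-30.
print(within_reporting_window(date(2015, 10, 1), date(2015, 9, 30)))  # True
```

Because each check keys off tags rather than individual users or assets, one policy covers every asset that carries the tag, which is the reusability benefit described above.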

These policies can be used in combination to create a sophisticated security access policy for each user at a given point in time and location. Moreover, the reach that Apache Ranger provides in terms of authorization for an ever-growing number of Hadoop ecosystem components (eight at the time of this writing) allows organizations to consistently define and apply metadata-based data access policies regardless of the route by which a user or application attempts to access the data.

Audit and reporting

Atlas leverages a common metadata store and policy rules, and the community plans to build on this with centralized log data for advanced reporting and analysis. Customers can recreate the data landscape at any given time by capturing security access information for every application, process and interaction with data, thereby gaining insight into operational information for completed tasks as well as intermediate steps and activities. In the future, by combining the capabilities of HDP log search with a cross-component globally unique identifier (GUID), Atlas will strive to provide greater visibility across the entire HDP stack.

RESTful APIs

Atlas facilitates exploration of audit information by providing pre-defined navigation paths to data classification and audit information. Text-based search features in Atlas locate relevant data and audit events across the Data Lake quickly and accurately. Data stewards have the power to visualize a data set's lineage and then drill down into operational, security and provenance-related details.

Native connector for Hive integration

HDP 2.3 saw the initial release of Atlas, which includes a native connector that automatically captures all SQL activity on HiveServer2. All activity through HiveServer2 is tracked, providing lineage of both the data and the schema. This is then combined with business taxonomy to provide an enriched search and discovery capability.
Governance-ready certification

Atlas strives to foster a vibrant ecosystem to address Hadoop application integration requirements based on a centralized metadata store. A certification program aims to create a curated group of partners that contribute a rich set of data management features encompassing data preparation, integration, cleansing, tagging, ETL visualization and collaboration. Certified partners will define a set of metadata standards for exchanging data and contribute conforming data integration features to the metadata store. Customers can then subscribe to the features they want to deploy, with low switching costs and faster ramp-up times. Smaller firms can differentiate themselves by contributing innovative features to the program, and benefit from other features to devise end-to-end workflow processing.

Competitive analysis

As a result of the collaborative development of Atlas following the principles of Open Enterprise Hadoop, HDP offers key advantages over solutions developed through a proprietary approach to Hadoop.

Metadata services

- Hortonworks Data Platform: Metadata built around a core flexible type system that can model any organizational and data structure, with support for hierarchies and inheritance of attributes (parent-to-child elements).
  Proprietary Hadoop: Flat modeling using name-value pairs; coarse and inelegant data modeling with no hierarchy or inheritance support.
- Hortonworks Data Platform: Open, platform-wide metadata integration providing cross-component lineage and dependencies.
  Proprietary Hadoop: Lineage support for HCatalog, Hive and HDFS only; no support for Kafka or Storm.
- Hortonworks Data Platform: Open metadata services coordinate and support the entire platform, including complete SQL lineage, tag-based real-time policy protection and a common taxonomy for data pipelines; custom connections supported through a rich REST API set.
  Proprietary Hadoop: Limited proprietary point integrations for certain components only (HCatalog, Hive and HDFS).

Prescriptive lineage

- Hortonworks Data Platform: Business and operational: combines logical models of workflow and log events for validation and completeness.
  Proprietary Hadoop: Operational event data lineage assembled by algorithm; backward-looking only, with no validation for missing elements.
- Hortonworks Data Platform: Taxonomy: lineage searchable by both hierarchical business taxonomy (classification) and tags (traits) such as PII, as well as by data type (Hive table, column, etc.).
  Proprietary Hadoop: Search only on operational data and flat labels; no validation against the taxonomy for duplications or typos.
- Hortonworks Data Platform: Advanced search: a domain-specific language (DSL, SQL-like search) that supports keyword and full-text search.
  Proprietary Hadoop: Full-text search only.

Data lifecycle

- Hortonworks Data Platform: Reusable: a logical model to create reusable and repeatable workflows.
  Proprietary Hadoop: Manually create each job and schedule.
- Hortonworks Data Platform: Built-in data management policies: late data handling, replication (both HDFS and Hive) and eviction (disposition).
  Proprietary Hadoop: Manually create each job and schedule.

Third-party support

- Hortonworks Data Platform: Governance-ready certification: certification that partners are being good citizens; a common metadata store, no proprietary formats, required use of open APIs and SLAs for lineage commits.
  Proprietary Hadoop: Not available.
- Hortonworks Data Platform: Low cost and no vendor lock-in: common metadata stores allow HDP users to change vendors with minimal switching cost; customers retain metadata control and ownership.
  Proprietary Hadoop: Not available; vendor lock-in with the typical cycle of configuration and migration.
- Hortonworks Data Platform: Agility and rapid customization: common metadata allows rapid deployment of new vendors or features with minimal downtime and risk; data management tools available a la carte instead of only in rigid suites.
  Proprietary Hadoop: Vendor-specific proprietary point solutions; no shared metadata; not open.

Summary

The transformative value of Big Data has driven the rapid adoption of Hadoop across businesses and industries of all kinds, but to be a truly enterprise-ready technology, its implications for data governance must be recognized and addressed. To manage risk, organizations need a comprehensive and effective way to ensure full visibility, control and compliance for the corporate and customer information in the Data Lake.

Recognizing data governance as an essential element of Open Enterprise Hadoop, Hortonworks has collaborated with industry partners to create a flexible, open framework based on metadata and taxonomy to ensure the auditability, transparency, reproducibility and consistency of the Data Lake and the information it contains. This metadata-based approach is embodied in Apache Atlas, a project developed collaboratively by Hortonworks and a diverse group of large enterprises. Atlas will provide a single gateway to interface with all the diverse components in the HDP stack and harmonize them with the rest of the enterprise data ecosystem. A core flexible type system allows modeling of any organizational or data structure, with built-in support for hierarchies and inheritance of attributes or tags (parent-to-child elements). Administrators also benefit from rich capabilities to define and enforce policies flexibly to support a wide range of industry use cases, and to take action quickly when data governance policies are violated.

The power and versatility of Hadoop is the direct result of its open and collaborative development. By continuing this approach to address key enterprise requirements for data governance, Hortonworks helps companies leverage the strength of the open source community to manage risk without compromising productivity or data accessibility. In this way, customers can be confident that their Big Data strategy is built on a foundation of visibility, control and compliance.
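The flexible type system with inheritance can be made concrete as a type-definition payload. The sketch below builds one in Python, assuming the general shape of Atlas's v2 typedefs payload (`entityDefs` with `superTypes`); the type names, attributes and endpoint mentioned in the comment are invented for illustration and vary by Atlas version.

```python
import json

# Minimal sketch of Atlas-style type definitions with inheritance.
# Payload shape assumes the v2 typedefs API; names are illustrative.
def entity_def(name, super_types, attributes):
    """Build one entity type definition with string attributes."""
    return {
        "name": name,
        "superTypes": super_types,
        "attributeDefs": [
            {"name": a, "typeName": "string",
             "isOptional": True, "cardinality": "SINGLE"}
            for a in attributes
        ],
    }

# Parent type carries common governance attributes; the child type
# inherits "owner" and "retentionPolicy" and adds its own "region".
dataset = entity_def("gov_dataset", [], ["owner", "retentionPolicy"])
customer_table = entity_def("gov_customer_table", ["gov_dataset"], ["region"])

payload = json.dumps({"entityDefs": [dataset, customer_table]}, indent=2)
print(payload)  # would be POSTed to the Atlas typedefs endpoint
```

Because `gov_customer_table` names `gov_dataset` as a supertype, any policy or search expressed against the parent automatically covers the child: the parent-to-child inheritance the summary describes.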
About Hortonworks Hortonworks develops, distributes and supports the only 100% open source Apache Hadoop data platform. Our team comprises the largest contingent of builders and architects within the Hadoop ecosystem who represent and lead the broader enterprise requirements within these communities. Hortonworks Data Platform deeply integrates with existing IT investments upon which enterprises can build and deploy Hadoop-based applications. Hortonworks has deep relationships with the key strategic data center partners that enable our customers to unlock the broadest opportunities from Hadoop. For more information, visit


More information

API Management: Powered by SOA Software Dedicated Cloud

API Management: Powered by SOA Software Dedicated Cloud Software Dedicated Cloud The Challenge Smartphones, mobility and the IoT are changing the way users consume digital information. They re changing the expectations and experience of customers interacting

More information

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation

More information

A Modern Data Architecture with Apache Hadoop

A Modern Data Architecture with Apache Hadoop Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn t disrupt the datacenter, the data did. Shortly after Corporate IT functions

More information

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture

More information

Microsoft Big Data. Solution Brief

Microsoft Big Data. Solution Brief Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,

More information

Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group. Tuesday June 12 1:00-2:15

Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group. Tuesday June 12 1:00-2:15 Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group Tuesday June 12 1:00-2:15 Service Oriented Architecture and the DBA What is Service Oriented Architecture (SOA)

More information

How To Make Data Streaming A Real Time Intelligence

How To Make Data Streaming A Real Time Intelligence REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log

More information

How To Turn Big Data Into An Insight

How To Turn Big Data Into An Insight mwd a d v i s o r s Turning Big Data into Big Insights Helena Schwenk A special report prepared for Actuate May 2013 This report is the fourth in a series and focuses principally on explaining what s needed

More information

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security

More information