On the Radar: Tamr. Applying machine learning to integrating Big Data. Publication Date: Sept. 2014 Product code: IT0014-002934.




Applying machine learning to integrating Big Data Publication Date: Sept. 2014 Product code: IT0014-002934 Tony Baer

Summary

Catalyst

Traditional data integration approaches may not scale for Big Data. The new norm is that, as data volumes grow, so do the number and diversity of data sources. Tamr is applying innovations in machine learning to automate and scale the process of integrating and reconciling data from multiple sources. It complements, rather than replaces, tools and practices for data transformation and master data management.

Key messages

- Data integration challenges are magnified by Big Data.
- Tamr leverages machine learning, an increasingly popular approach for data transformation, and is applying it at the upstream step of reconciling and integrating data from multiple sources.
- Tamr is carving a foothold in a new portion of an emerging market for Big Data integration, an area where there will be significant opportunity for OEM and "coopetition" strategies.

Ovum view

Tamr is carving a new foothold into the Big Data end of an established market. While a number of data preparation tools leveraging machine learning are emerging, Tamr is unique in applying similar approaches to the upstream step of consolidating data. It will complement data preparation and master data management tools in deriving the big picture from Big Data.

Recommendations for enterprises

Why put Tamr on your radar?

Data integration, a perennial challenge for data warehousing, is magnified with Big Data. Not only are the data sets larger, but in all likelihood, so are the number and variety of sources. Traditional data integration approaches will not scale because of the sheer number, variety, and increasingly dynamic nature of data sources. With machine learning already changing the data transformation process, Tamr is applying similar approaches to make the inevitable task of data integration and consolidation feasible for Big Data.

Highlights

Background

Data management issues associated with data warehousing are compounded by Big Data. Beyond the issue of scale is the likelihood that more data sets will be involved and that, in many cases, they will come from external sources where the provenance of the data is less well known. And, unlike traditional data warehousing, which used relatively static internal data sets, Big Data analytics are likely to consume data sets where the content and structure of the data are constantly morphing.

The Ovum report, Data Quality and Big Data: From Discovery to Precision, called for new approaches to be applied to data cleansing. Some of our recommendations included:

- determining whether the goal is getting "the big picture" (which does not require as rigorous a strategy for cleansing data) or "the exact picture"
- assigning confidence levels regarding data validity because, unlike with internal sources, it is virtually impossible to be 100% certain regarding data quality or consistency
- leveraging new approaches such as crowdsourcing, machine learning, and data science techniques to vouch for data.

Tamr applies many of these approaches to a similar task that is conducted upstream of data cleansing: integrating and consolidating data from multiple sources. It characterizes its approach as "data curation at scale." As a technology that employs probabilistic matching, it is best suited for use cases aimed at deriving the big picture.

During data ingestion, Tamr extracts whatever metadata exists, makes rough guesses regarding matching columns from multiple data sources, and displays histograms showing the relative levels of certainty of the matches. A workflow manager is available for organizing human expert input to help refine the matching logic; the same process is then repeated for matching individual records. The machine learning, tweaked with human intelligence, steadily improves over time.

Current position

The company was founded by the same team, Andy Palmer and Michael Stonebraker, who previously started Vertica. In May 2014, the company released the first version of its product and received $16m in Series-A funding from Google Ventures and New Enterprise Associates. Tamr is part of a wave of data management start-ups filling the vacuum in the Big Data third-party tooling ecosystem.
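As a rough illustration of the column-matching step described in the Background section (rough guesses at matching columns, a histogram of match certainty, and human review of uncertain matches), the following Python sketch scores candidate column pairs across two sources. It is not Tamr's algorithm, which this note does not detail: the scoring formula, the function names, the 0.6 review threshold, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity of two column names, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(xs, ys):
    """Jaccard overlap of two columns' values after light normalization."""
    sx = {str(x).strip().lower() for x in xs}
    sy = {str(y).strip().lower() for y in ys}
    if not sx or not sy:
        return 0.0
    return len(sx & sy) / len(sx | sy)


def match_columns(source_a, source_b, name_weight=0.5):
    """For each column in source_a, pick the best candidate in source_b.

    Hypothetical scoring: a weighted blend of name similarity and value
    overlap. Returns a list of (col_a, col_b, confidence) tuples.
    """
    matches = []
    for col_a, vals_a in source_a.items():
        scored = [
            (col_b,
             name_weight * name_similarity(col_a, col_b)
             + (1 - name_weight) * value_overlap(vals_a, vals_b))
            for col_b, vals_b in source_b.items()
        ]
        best_col, best_score = max(scored, key=lambda pair: pair[1])
        matches.append((col_a, best_col, round(best_score, 2)))
    return matches


def confidence_histogram(matches, bins=5):
    """Count matches per equal-width confidence bucket (lowest first)."""
    counts = Counter(min(int(conf * bins), bins - 1) for _, _, conf in matches)
    return [counts.get(i, 0) for i in range(bins)]


def needs_review(matches, threshold=0.6):
    """Matches uncertain enough to route to a human expert (assumed cutoff)."""
    return [m for m in matches if m[2] < threshold]


# Made-up sample data: an internal CRM extract and an external feed.
crm = {"cust_name": ["Acme Corp", "Globex"], "zip": ["02139", "10001"]}
feed = {"customer_name": ["acme corp", "Initech"],
        "postal_code": ["02139", "94105"]}

matches = match_columns(crm, feed)
for col_a, col_b, conf in matches:
    print(f"{col_a} -> {col_b} (confidence {conf})")
print("histogram:", confidence_histogram(matches))
print("for review:", needs_review(matches))
```

In a setup like this, expert decisions on the reviewed pairs could be fed back as labeled examples to adjust the scoring, which is the sense in which the note says matching improves over time.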
The emergence of third-party tooling is an essential development for making Big Data and platforms such as Hadoop accessible to the mainstream enterprise market, much as it was for data warehousing and business intelligence nearly 20 years ago. For data integration-related tasks, much of the start-up activity has heavily leveraged machine learning; it is useful not only because of the scale of data involved, but also for helping to overcome uncertainty. Start-ups such as Trifacta and Paxata emerged, applying such techniques to data preparation, an approach subsequently embraced by incumbents Informatica and IBM. Tamr has adopted a similar approach but applied it to a different, upstream problem: curating data from multiple sources. It has identified use cases with customer data, product parts catalogs, and health claims reconciliation, among others.

Tamr's opportunity and challenge is being one of the first to make a stab at the data integration stage of the process. IBM has publicly stated its direction to develop a "Big Match" capability for Big Data that would complement its MDM (master data management) tools, and Ovum expects more players to surface. The most promising initial use case is reconciling identities from internal customer relationship management (CRM) and related systems with social data feeds, as Ovum has found customer-related applications to be among the most popular with early Big Data adopters. There are further opportunities for Tamr to integrate with data preparation tools applying similar approaches. Ultimately, Ovum believes that data curation and data preparation should be integrated as a single workflow.

Data sheet

Key facts

Table 1: Data sheet: Tamr

Product name: Tamr
Product classification: Data integration
Version number: 1.0
Release date: May 2014
Industries covered: All
Geographies covered: North America
Relevant company sizes: Midsized to large
Licensing options: Subscription
URL: www.tamr.com
Routes to market: Direct
Company headquarters: Cambridge, Massachusetts, US
Number of employees: 25

Source: Ovum

Appendix

On the Radar

On the Radar is a series of research notes about vendors bringing innovative ideas, products, or business models to their markets. Although On the Radar vendors may not be ready for prime time, they bear watching for their potential impact on markets and could be suitable for certain enterprise and public sector IT organizations.

Further reading

Data Quality and Big Data: From Discovery to Precision, IT014-002596 (May 2012)

Author

Tony Baer, Principal Analyst, Software Information Management

Ovum Consulting

We hope that this analysis will help you make informed and imaginative business decisions. If you have further requirements, Ovum's consulting team may be able to help you. For more information about Ovum's consulting capabilities, please contact your Ovum representative.

Copyright notice and disclaimer

The contents of this product are protected by international copyright laws, database rights and other intellectual property rights. The owner of these rights is Informa Telecoms and Media Limited, our affiliates or other third party licensors. All product and company names and logos contained within or appearing on this product are the trademarks, service marks or trading names of their respective owners, including Informa Telecoms and Media Limited. This product may not be copied, reproduced, distributed or transmitted in any form or by any means without the prior permission of Informa Telecoms and Media Limited.

Whilst reasonable efforts have been made to ensure that the information and content of this product was correct as at the date of first publication, neither Informa Telecoms and Media Limited nor any person engaged or employed by Informa Telecoms and Media Limited accepts any liability for any errors, omissions or other inaccuracies. Readers should independently verify any facts and figures, as no liability can be accepted in this regard; readers assume full responsibility and risk accordingly for their use of such information and content.

Any views and/or opinions expressed in this product by individual authors or contributors are their personal views and/or opinions and do not necessarily reflect the views and/or opinions of Informa Telecoms and Media Limited.

© 2014 Ovum. All rights reserved. Unauthorized reproduction prohibited.

CONTACT US
www.ovum.com
(212) 652-2647

INTERNATIONAL OFFICES
Beijing, Dubai, Hong Kong, Hyderabad, Johannesburg, London, Melbourne, New York, San Francisco, Sao Paulo, Tokyo