On the Radar: Alation harnesses crowdsourcing and machine learning to speed data access
Summary

Catalyst

As organizations widen their net and analyze more data sources, it becomes all too easy for business end users to lose their way through the data. Part of an emerging breed of tools targeted at helping business end users make sense of big data, Alation combines machine learning with crowdsourcing and collaborative approaches to help end users ask the right questions of the right data. Emerging from stealth nearly a year ago, the company has just scored its first major data platform alliance, with Teradata, making Alation the preferred partner for data cataloging in Teradata's Unified Data Architecture (UDA).

Key messages

Alation combines machine learning and collaboration approaches to catalog big data and help business end users query it.

Alation is part of an emerging breed of tools providers that are offering self-service approaches to make big data accessible.

Thanks to a new resale agreement, Alation has become Teradata's strategic partner for providing data-cataloging solutions.

Ovum view

Alation is part of an emerging breed of tools providers that are offering machine-learning and collaboration approaches to big data. It offers a collaborative cataloging approach that picks up where data-wrangling and data-matching approaches leave off: cataloging data and helping business end users query it.

Recommendations for enterprises

Why put Alation on your radar?

Alation offers a self-service approach to discovering, cataloging, and helping business end users query big data. It fits alongside a growing arsenal of self-service tools that apply machine-learning and crowdsourcing-style collaboration approaches to transforming and reconciling data from multiple sources. While data cataloging is hardly unique, Alation offers a different spin by combining machine learning and collaboration to proactively help business end users query the data.

Highlights

Background

Few, if any, organizations can rely on a single source of "the truth" for all their data.
In most cases, data is spread across multiple data warehouses, data marts, spreadsheets, and other sources. Organizations with extensive data warehousing and reporting infrastructure may find themselves with tens of thousands of reports. All too often, the problem is not a lack of data, but where to start to get the answer to a specific query.

Increasingly, organizations are accumulating or creating diverse, heterogeneous lakes of data. But as the data universe widens, end users typically stick with what they know, running queries against the same data mart using the same reporting tool, with its limited catalog of reports. With all the new opportunities to gain insights from new sources of data, end users are increasingly finding themselves in scenarios where they can't see the forest for the trees.

Alation has developed tooling that utilizes machine-learning, natural language processing, and crowdsourcing techniques to cut through the growing clutter of diverse data. Its tool crawls enterprise databases, harvesting metadata and cataloging information without intervention from data engineers. It combines the accessibility of a search engine with the familiarity of SQL-based querying and the opportunity to crowdsource wisdom about data. It helps business end users find data, form queries, and share insights on data sets, and with its data lineage capabilities, provides a trail of breadcrumbs on which governance can be applied. It allows end users to search by keyword, or simply enter a business term in natural language, to find tables on specific topics, and it provides assistance to users writing SQL.

Data entities and data sets can be linked with a built-in business glossary. Using both historical and real-time queries, the system builds catalogs and tracks usage that can show end users which data sources are the most popular. From query history, the system can make query recommendations. And end users can share their insights as they comment on, tag, and rate different data sources. The system employs machine learning to detect patterns in queries and rank usage of data sets, enabling it to "auto-suggest" relevant data sources or query snippets based on similar queries from other colleagues.
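To make the idea of usage-based ranking concrete, the following is a minimal sketch of how popularity and "often used together" suggestions could be derived from a query log. It is not Alation's implementation, which has not been published at this level of detail: the query log, the table names, and the functions `tables_in`, `popularity`, `co_usage`, and `suggest` are all hypothetical, and simple counting stands in for the machine-learning techniques the product applies.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical query log of (user, SQL text) pairs; illustrative data only.
QUERY_LOG = [
    ("ana", "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cust_id = c.id"),
    ("ben", "SELECT region, SUM(total) FROM sales.orders GROUP BY region"),
    ("ana", "SELECT * FROM sales.orders o JOIN finance.invoices i ON o.id = i.order_id"),
    ("cy", "SELECT name FROM sales.customers WHERE country = 'US'"),
    ("cy", "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cust_id = c.id"),
]

# Naive assumption: a table name is the identifier that follows FROM or JOIN.
# Production SQL (subqueries, CTEs, quoted names) would need a real parser.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(sql):
    """Return the distinct, lower-cased table names referenced by one query."""
    return sorted({name.lower() for name in TABLE_RE.findall(sql)})


def popularity(log):
    """Count how many logged queries touch each table."""
    counts = Counter()
    for _user, sql in log:
        counts.update(tables_in(sql))
    return counts


def co_usage(log):
    """For each table, count the other tables that appear in the same query."""
    pairs = defaultdict(Counter)
    for _user, sql in log:
        for a, b in combinations(tables_in(sql), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def suggest(table, log, limit=3):
    """Suggest tables that colleagues most often queried alongside `table`."""
    return [name for name, _ in co_usage(log)[table.lower()].most_common(limit)]


if __name__ == "__main__":
    print(popularity(QUERY_LOG).most_common())
    print(suggest("sales.orders", QUERY_LOG))
```

On this sample log, `sales.orders` ranks as the most popular table (four queries), and a user working with it would be pointed to `sales.customers` first and `finance.invoices` second. A production catalog would also weight recency, user role, and ratings, none of which this sketch attempts.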
As the user is writing a query, they can get a thumbnail profile of the data, enabling them to check that they are targeting the right data. And Alation can validate queries to check for correctness and compliance with resource management policies set by database administrators (e.g., limiting queries with excess joins). As users exercise the system, Alation tracks usage by recording who, and in which part of the organization, used which data sets. And it offers housecleaning functions that can identify duplicate data sets and schemas across the different data warehouses run by the organization.

Current position

Alation has received $9m in a single round of funding concluded in early 2015. The company's mantra to "centralize knowledge of an organization's data, making it accessible" overlaps with multiple data integration tooling categories. Examples include search-based analytics, which encompasses open source search providers such as Lucidworks and Elastic; providers, such as Oracle, that have reinvented search-based analytics as data discovery; data-cataloging offerings from providers such as IBM, Tamr, and Waterline; and data-wrangling providers such as Paxata, Trifacta, Zaloni, and more recently, IBM, Informatica, and Oracle.

So there's no question that the data integration field is growing crowded; Ovum believes that ultimately, data preparation, matching, and cataloging will consolidate into a single suite. But for now, Alation has created a unique spin on cataloging; it isn't simply a list of APIs or a metadata repository, but an approach that combines machine learning to categorize the data, collaboration to enable end users and domain experts to share their insights, and capabilities to help business end users form queries. In the short run, Alation's alliance with Teradata will provide the most logical onramp to the market.
Although the company intends to keep its tooling data source-neutral, there is little question that, with advance access to Teradata's UDA pipeline, Teradata (and its supported Hadoop distributions) will become first among equals. Nonetheless, even being Teradata's preferred partner still requires the ability to integrate with other data platforms, as most of the target customer base are not going to be single-database shops.

Data sheet

Key facts

Table 1: Data sheet: Alation

Product name             Alation
Product classification   Data management
Version number           3.1
Release date             March 2015
Industries covered       All
Geographies covered      North America, EMEA
Relevant company sizes   All
Licensing options        Subscription
URL                      www.alation.com
Routes to market         Direct
Company headquarters     Redwood City, California, US
Number of employees      30

Source: Ovum

Appendix

On the Radar

On the Radar is a series of research notes about vendors bringing innovative ideas, products, or business models to their markets. Although On the Radar vendors may not be ready for prime time, they bear watching for their potential impact on markets and could be suitable for certain enterprise and public sector IT organizations.

Further reading

"Teradata announcements hint at new directions for Unified Data Architecture," IT0014-003072 (November 2015)

"Teradata extends its Big Data reach," IT0014-002956 (October 2014)

Author

Tony Baer, Principal Analyst, Information Management
tony.baer@ovum.com

Ovum Consulting

We hope that this analysis will help you make informed and imaginative business decisions. If you have further requirements, Ovum's consulting team may be able to help you. For more information about Ovum's consulting capabilities, please contact us directly at consulting@ovum.com.

Copyright notice and disclaimer

The contents of this product are protected by international copyright laws, database rights and other intellectual property rights. The owner of these rights is Informa Telecoms and Media Limited, our affiliates or other third party licensors. All product and company names and logos contained within or appearing on this product are the trademarks, service marks or trading names of their respective owners, including Informa Telecoms and Media Limited. This product may not be copied, reproduced, distributed or transmitted in any form or by any means without the prior permission of Informa Telecoms and Media Limited.

Whilst reasonable efforts have been made to ensure that the information and content of this product was correct as at the date of first publication, neither Informa Telecoms and Media Limited nor any person engaged or employed by Informa Telecoms and Media Limited accepts any liability for any errors, omissions or other inaccuracies. Readers should independently verify any facts and figures as no liability can be accepted in this regard - readers assume full responsibility and risk accordingly for their use of such information and content.

Any views and/or opinions expressed in this product by individual authors or contributors are their personal views and/or opinions and do not necessarily reflect the views and/or opinions of Informa Telecoms and Media Limited.