CTOlabs.com
White Paper: Datameer's User-Focused Big Data Solutions
May 2012

A White Paper providing context and guidance you can use

Inside:
Overview of the Big Data Framework
Datameer's Approach
Consideration for Deployment
Datameer: Bringing Big Data To All
A White Paper for the Government IT Community

This paper, produced by the analysts and researchers of CTOlabs.com, provides an overview of one of the pioneers of the Big Data movement, Datameer. Datameer provides end-user-focused capabilities that enable self-service and real-time interaction with data.

Executive Summary

Datameer provides a complete analytics platform built around its users. Users see familiar interfaces and easy-to-manipulate interactive visualizations, supported by a back end that integrates all of the enterprise's data resources. The result: powerful Big Data solutions reach users in a way that lets them interact directly with data instead of being forced to work through development teams.

Background

Apache Hadoop is the leading open-source Big Data platform, with an ecosystem of software to inexpensively store, process, and analyze almost any type of information from any source. Hadoop is renowned for its ability to work on commodity hardware, and it enables fast, distributed analysis running in parallel on multiple servers in a cluster. Hadoop is reliable, managing and healing itself; scalable, scaling linearly and working as well with one terabyte of data across three nodes as it does with petabytes of data across thousands; affordable, costing much less per terabyte to store and process data than traditional alternatives; and agile, allowing users to load raw data into the system and apply a schema-on-read approach, which structures the data when it is read, based on how it is requested.

The Challenges of Hadoop

Enterprises face many common challenges when implementing Hadoop. Until the arrival of end-user-focused solutions like Datameer, Hadoop clusters had to be integrated, programmed, and queried by programming specialists or data scientists. This works for firms like Google, LinkedIn, Facebook, Twitter and others that can hire scores of computer scientists and data engineers, but in most firms the analyst needs the help of additional programmers to build a query. Datameer's approach has changed this by bringing powerful graphical user interface tools and easy-to-implement Big Data solutions directly to the user. Businesses and government agencies can rely on Datameer to tap into all data sources while presenting users unfamiliar with Hadoop programming paradigms with familiar tools. The analysts who can benefit the most from Big Data now have a tool purpose-built for their needs.

The Datameer Approach

Datameer was designed to let analysts and other Big Data end-users benefit from Hadoop. Datameer is the first business intelligence and analytics platform built natively on Hadoop to allow for end-user analysis and correlation of structured, semi-structured and unstructured data of any size. Datameer runs on all major Hadoop distributions and integrates easily into existing IT infrastructure with point-and-click deployment. Datameer can be easily deployed over any Hadoop cluster, including those in-house or on public cloud environments like those at Amazon or Rackspace. Datameer easily integrates with all legacy technologies and datastreams, including existing business intelligence data warehouses, transactional databases and other analytic stores. It also works with newer NoSQL technologies.

Datameer and Zvents

Zvents is the leading online platform for the discovery and promotion of local entertainment, including concerts, movies, restaurants, theaters, and more. We connect 35 million monthly uniques with over 140,000 local promoters, via a network of over 300 branded media partners, creating the largest local entertainment audience on the Web. Data is critical to any web business, and gaining rapid insights in the fast-moving world of live events is even more critical. Datameer has enabled us to leverage our considerable investments in Big Data technology, including Hadoop and Hypertable, to rapidly discover actionable business insights that enable us to better serve our users and our advertisers. Datameer has given us a scalable, flexible, cost-effective way to structure and analyze terabytes of behavioral click data, driving new product initiatives like Top 40, our new trending list of hot events at top40.zvents.com.
Ethan Stock, CEO and Founder, Zvents Inc.

With Datameer, you can integrate, analyze, and visualize data of any volume, variety, and velocity, enabling numerous Big Data use cases. It works well for large-scale data mining and text analysis because it can import massive amounts of data in parallel. Datameer can find correlations across structured and unstructured data such as phone records, social media, and text for pattern detection by joining any type of data at any size. Datameer can also be helpful in cyber security monitoring, for example by importing a large number of log files from disparate servers and analyzing them together for anomalous behavior.
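The log-monitoring use case can be made concrete with a small sketch. The Python below is not Datameer code and does not use any Datameer API; it only illustrates, on a few in-memory lines, why analyzing logs from several servers together can reveal behavior that no single server's log shows. The server names, the "ip status" line layout, and the median-based threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical log lines from three servers. The "ip status" layout is an
# assumption for illustration, not a format described in this paper.
LOGS = {
    "web-1": ["10.0.0.5 200", "10.0.0.7 401", "10.0.0.9 401", "10.0.0.9 401"],
    "web-2": ["10.0.0.5 200", "10.0.0.9 401", "10.0.0.9 401"],
    "mail-1": ["10.0.0.7 200", "10.0.0.9 401", "10.0.0.5 401"],
}


def failed_logins_by_ip(logs):
    """Count 401 responses per client IP across every server's log."""
    counts = Counter()
    for lines in logs.values():
        for line in lines:
            ip, status = line.split()
            if status == "401":
                counts[ip] += 1
    return counts


def anomalous_ips(counts, factor=3):
    """Return IPs whose failure count is at least `factor` times the median."""
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(ip for ip, n in counts.items() if n >= factor * typical)


counts = failed_logins_by_ip(LOGS)
print(anomalous_ips(counts))  # ['10.0.0.9']
```

In this toy data no single server records more than two failures from 10.0.0.9, but the combined count of five stands out against a median of one. A production system would run the same kind of aggregation in parallel over far larger inputs.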
Datameer can do all of this because it is a complete analytics platform that supports data integration, analysis, visualization, and security while focusing on the data analysts and scientists who turn raw information into intelligence. To bring data of all sorts together, Datameer provides wizard-based data integration with over 20 prebuilt connectors. These provide immediate access to all common data sources, including relational databases such as Teradata, Greenplum, Vertica, Oracle, DB2, Microsoft SQL Server, and MySQL, along with file formats and sources such as CSV, Fixed Length, JSON, XML, Mbox, Apache Log Files and Twitter. Datameer also has connectors for the Hadoop Distributed File System and the Hadoop database systems Hive and HBase.

Datameer and Nurago

As a result of using Datameer, nurago is better able to help our clients identify and analyze patterns in behavioral data of panel members. Datameer helps us as a market research vendor to scale for our most granular data requirements and greatly simplifies the integration of multiple sources. In addition, Datameer makes reporting on big data analytics directly accessible to our analysts so that they don't need to turn to developers for their requirements.
Nikolaus Pohle, CTO of nurago

For The Analyst

For analysis, Datameer provides a familiar spreadsheet user interface that requires no programming to design end-to-end data processing pipelines. Datameer provides over 200 pre-built functions for exploring and discovering complex relationships. These include basics such as aggregation as well as more advanced capabilities, with functions for text analysis, mathematical assessments, bioinformatics, engineering and statistics. Once users integrate and analyze their data, they can visualize the results using simple drag-and-drop wizards for creating visualizations and dashboards.
An extensive library of widgets including tables, charts, graphs, and maps gives users the ability to choose the visualization that will best help them understand the results. With Datameer, analysts and data scientists can focus on what they do best, getting insights from data, instead of writing code. Datameer automatically compiles a workbook of spreadsheets into efficient Hadoop MapReduce execution plans; it then monitors their progress, status, and throughput to detect problems. If users want to go deeper, it offers open APIs for custom data integration, analytics and visualizations.
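For readers unfamiliar with MapReduce, the sketch below shows the general shape of the computation that a spreadsheet-style "group and sum" turns into. It is a single-process Python illustration of the map, shuffle, and reduce phases, not Datameer's compiler or the Hadoop API; the click records and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical click records; the column names are assumptions for illustration.
ROWS = [
    {"event": "concert", "clicks": 3},
    {"event": "movie", "clicks": 5},
    {"event": "concert", "clicks": 4},
]


def map_phase(rows, key_col, value_col):
    """Emit one (key, value) pair per input row."""
    for row in rows:
        yield row[key_col], row[value_col]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single sum."""
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map_phase(ROWS, "event", "clicks")))
print(totals)  # {'concert': 7, 'movie': 5}
```

On a real cluster the map and reduce steps run in parallel on many servers and the framework performs the shuffle over the network. The point of a tool like Datameer is that the analyst specifies only the grouping and the sum in a spreadsheet and never writes these phases by hand.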
As a result, Datameer provides powerful, agile analytics to support your organization's mission. Adding new data sources is quick and easy. By using Hadoop, Datameer has no limitations on storage and computation and does not require pre-defined data models, so usage is never constrained by up-front system design. By focusing on the end-user, Datameer also eliminates Hadoop's need for deep technical expertise on the user's part. This lets any analyst, from any domain, at any site, and with any skill set, contribute by providing Big Data analytics in spreadsheets that can be accessed and updated instantly worldwide. Lastly, by simplifying the process and removing the IT bottleneck, Datameer removes limitations in time-to-trigger, letting users develop and run Hadoop-based analytics jobs in minutes.

Datameer and Attributor

Attributor's selection of Datameer was driven by our need to quickly provide analytics to our clients. Datameer's ease of use, seamless integration with Cloudera's CDH, HBase and MySQL, and ability to correlate structured and unstructured data on day one have already saved us both time and money in running thousands of analytics jobs for our users.
Matt Robinson, President and COO, Attributor

Datameer in Government

Government agencies have been leveraging the Big Data movement to directly support many government missions. Agencies have been using Big Data approaches in missions supporting Healthcare, Education, Environmental Research, Law Enforcement, Defense, Intelligence and numerous other activities. Early adopters in these communities have been leading the way in open source solutions and contributions back to the broader community. Solutions have been deployed throughout the federal space, including on most publicly facing government web properties. The initial government foray into Big Data has in many ways mirrored the Big Data movement in industry.
Still today, most government missions can be served only by leveraging teams of data scientists and engineers. Few if any user-centered Big Data approaches are in use in the government. We believe that is about to change. With Datameer easily accessible to every government knowledge worker through a browser, citizen service and mission support will be delivered in new, highly efficient and effective ways.
Concluding Thoughts

Since government agencies have already established visions and goals for big data approaches to serve their missions, and since Datameer has a proven user-focused approach to leveraging all organizational data for analysis, we believe Datameer is poised for rapid growth in the federal sector. A logical step for most government agencies and the systems integrators, architects and engineers that support them is to begin a proof-of-concept activity to see firsthand how Datameer can work in their environment.

More Reading

For more on federal Big Data technology and policy issues visit:
CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.
CTOlabs.com - A reference for research and reporting on all IT issues.
Carahsoft.com - Offering Big Data solutions for Government.
GovernmentBigDataForum.com - Join the Government Big Data Forum.
J.mp/ctonews - Sign up for the Government Big Data Newsletter.
Datameer.com - Visit for more on how Datameer works and to arrange a proof of concept.
About the Authors

Bob Gourley is CTO and founder of Crucial Point LLC and editor in chief of CTOvision.com. He is a former federal CTO. His career included service in operational intelligence centers around the globe, where his focus was operational all-source intelligence analysis. He was the first director of intelligence at DoD's Joint Task Force for Computer Network Defense, served as director of technology for a division of Northrop Grumman and spent three years as the CTO of the Defense Intelligence Agency. Bob serves on numerous government and industry advisory boards. Contact Bob at bob@crucialpointllc.com

Alexander Olesker is a technology research analyst at Crucial Point LLC, focusing on disruptive technologies of interest to enterprise technologists. He writes at http://ctovision.com. Alex is a graduate of the Edmund A. Walsh School of Foreign Service at Georgetown University with a degree in Science, Technology, and International Affairs. He researches and writes on developments in technology and government best practices for CTOvision.com and CTOlabs.com, and has written numerous whitepapers on these subjects. Alex has worked or interned in early childhood education, private intelligence, law enforcement, and academia, contributing to numerous publications on technology, international affairs, and security, and has lectured at Georgetown and in the Netherlands. Alex is also the founder and primary contributor of an international security blog that has been quoted and featured by numerous pundits and the War Studies blog of King's College London. Alex is a fluent Russian speaker and proficient in French. Contact Alex at AOlesker@crucialpointllc.com
For More Information

If you have questions or would like to discuss this report, please contact me. As an advocate for better IT in government, I am committed to keeping the dialogue open on technologies, processes and best practices that will keep us moving forward.

Contact:
Bob Gourley
bob@crucialpointllc.com
703-994-0549

All information/data ©2011 CTOLabs.com.