TDWI RESEARCH
TDWI CHECKLIST REPORT

HADOOP BEST PRACTICES
For Data Warehousing, Data Integration, and Analytics

By Philip Russom

Sponsored by SAP and Syncsort

tdwi.org

OCTOBER 2013

TABLE OF CONTENTS
FOREWORD
NUMBER ONE Plan how you will use Hadoop for business advantage.
NUMBER TWO Get to know the extended Hadoop ecosystem and what it can do for you.
NUMBER THREE Interface with HDFS data through Hadoop tools.
NUMBER FOUR Extend your data warehouse architecture with Hadoop.
NUMBER FIVE Embrace new best practices in data management as enabled by Hadoop.
NUMBER SIX Leverage Hadoop for relatively static, regularly repeated queries against massive data sets.
NUMBER SEVEN Get many uses from a few HDFS clusters.
NUMBER EIGHT Augment your Hadoop environment with special technologies for real-time data.
ABOUT OUR SPONSORS
ABOUT THE AUTHOR
ABOUT TDWI RESEARCH
ABOUT THE TDWI CHECKLIST REPORT SERIES

555 S Renton Village Place, Ste. 700, Renton, WA
E: info@tdwi.org | tdwi.org

© 2013 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to info@tdwi.org. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

FOREWORD

According to a recent TDWI survey about Hadoop, only 10% of respondents report having the Hadoop Distributed File System (HDFS) in production today, while a whopping 63% expect to deploy HDFS within three years.[1] The survey shows that user organizations are aggressively adopting HDFS and other Hadoop technologies for data warehousing (DW), data integration, and analytics. A number of trends are driving Hadoop adoption.

Organizations want more business value from big data. Seventy percent of survey respondents say that Hadoop is a business opportunity, but only if big data is leveraged through analytics. In other words, business value takes the form of insights gained from analyzing big data managed on Hadoop. Other forms of value come from Hadoop's scalable data management, handling of diverse data types, and low cost compared to other data platforms.

Hadoop complements data warehouse platforms, data integration, and analytic tools. It handles massive data volumes and diverse, multi-structured data in scalable and cost-effective ways that traditional platforms cannot. Yet Hadoop lacks the SQL support, low latency, and ACID properties (atomicity, consistency, isolation, and durability) of traditional platforms. This is why Hadoop is a practical complement to traditional platforms but cannot replace them.

Users are moving toward multi-platform environments for DW, data integration, and analytics. This lets users choose the best platform for a given data workload or analytic goal, plus offload certain workloads from the data warehouse. Hadoop is a welcome addition to extended multi-platform architectures in data warehousing, data integration, and analytics because it excels with workloads for massive data, ETL, and new analytic algorithms.

Most of the organizations adopting Hadoop are completely new to it, so they need to educate themselves quickly about emerging best practices. This TDWI Checklist Report will assist with that education, beginning with an overview of the rapidly evolving Hadoop ecosystem. The checklist of best practices presented here can help users make sustainable decisions as they plan their first Hadoop deployments.

[1] See Figure 1 in the 2013 TDWI Best Practices Report Integrating Hadoop into Business Intelligence and Data Warehousing, available at tdwi.org/bpreports.

NUMBER ONE
PLAN HOW YOU WILL USE HADOOP FOR BUSINESS ADVANTAGE.

Look for a problem to solve or an opportunity to leverage. For example, some organizations need to keep many years of data on spinning disk, ready for analytics and reporting, but many of the largest data warehouse environments have an upper limit for data volumes, whether the barrier is technical or economic. HDFS can cost-effectively extend analytic data sets. As another example, many organizations have collected some big data but have not yet leveraged it; implementing Hadoop can enable broad exploration of big data as a first step toward reaping business value. In yet another example, the data staging areas of many data warehouses (DWs) are stretched to the limit; Hadoop can handle the growing volumes of detailed source data that many DW staging areas manage and process, plus offload heavy transformation workloads (ETL) to free up DW capacity.

Involve business people in defining applications for Hadoop. If your organization has an existing program for data stewardship or governance, stewards and board members can tell you multiple ways that managing big data could be beneficial.
Other people to involve include BI/DW sponsors, the chief information officer (CIO), and the heads of departments that can benefit most from big data (marketing, sales, Web, customer service, and operations).

Consider how others in your industry are leveraging big data and Hadoop. For example, supply-chain-focused industries such as manufacturing and retail have big data in the form of RFID and XML; via analytics, these provide valuable information about products and their movement through a market. Text-laden industries (insurance and healthcare) can use text analytics to get value from large volumes of human-language text. Machine data streaming from robots in manufacturing, and from sensors on vehicles or packages in logistics firms, can lead to improvements in product quality and operational efficiency. Learn the forms of big data your organization has and propose Hadoop-based solutions accordingly.

Identify how Hadoop data can integrate with other enterprise data for broader insights. Many organizations have built complete views of customers based on available enterprise data. Hadoop can extend such views by introducing additional information about customers, customer service, and new customer channels (such as social media or a mobile app). Customers aside, big data managed in Hadoop can also broaden views of products, suppliers, and employees, as well as low-level business operations and activities. Likewise, big data extends analytic applications that depend on large data samples (fraud, risk, customer segmentation).

NUMBER TWO
GET TO KNOW THE EXTENDED HADOOP ECOSYSTEM AND WHAT IT CAN DO FOR YOU.

Apache Hadoop is an open source software project administered by the Apache Software Foundation (ASF). The Apache Hadoop software library is a framework that enables the distributed processing of large data sets across clusters of computers, each offering local computation and storage. A TDWI survey asked users with hands-on Hadoop experience which Hadoop products they are using today for applications in data warehousing, data integration, and analytics.[2] Their responses identify the five most common Hadoop products in use today:

The Hadoop Distributed File System (HDFS) is a file system (not a database) and therefore lacks capabilities associated with a database management system (DBMS), such as random access to data, support for standard SQL, and query optimization. However, HDFS does things DBMSs don't do as well, such as managing and processing massive volumes of file-based, multi-structured data.

MapReduce is a general-purpose execution engine that handles the complexities of parallel programming for a wide variety of hand-coded logic and other applications, including (but not restricted to) analytics and ETL.

Hive projects structure onto Hadoop data as it scans, so the data can be queried using a SQL-like language called HiveQL.

HBase provides a few database functions for HDFS data. HBase is a simple record store, not a full-blown DBMS.

Pig provides an additional layer of abstraction that enables developers to design logic (specifically for execution by MapReduce) without having to hand code.

Besides the Hadoop products listed above, users will adopt others in coming years (according to the TDWI survey), especially Mahout, ZooKeeper, and HCatalog.

Hadoop is an ecosystem of products and technologies. Hadoop products are available as open source from ASF. All can be downloaded at no cost from the ASF website, and ASF encourages contributions from developers to the source code of all the open source products it manages.

Hadoop products are also available from software vendors. Several provide a distribution of HDFS, which is usually bundled with other Hadoop tools. A few vendors add value to HDFS and other Hadoop products by providing patches (for high availability or security) or additional tools (for administering HDFS clusters). Apache does not provide support and maintenance for Hadoop, but a few software vendors do. Vendor support and vendor-developed functionality help Hadoop achieve the enterprise readiness that many user organizations demand.

The Hadoop ecosystem is rounded out by software vendors that support interfaces to Hadoop. Thanks to this effort, Hadoop now integrates with a growing number of analytic tools, database management systems, reporting tools, and tools for data integration and extract, transform, and load (ETL). This support has made it much easier for Hadoop to play a beneficial role in established technology stacks for data warehousing, data management, analytics, reporting, and enterprise applications.

You can see that many parties are making substantive contributions to the development of the Hadoop product family and its surrounding ecosystem. As a result, Hadoop improves almost daily, making it an even more viable choice for enterprise use.

Hadoop products are almost always deployed in combinations. Some combinations are purely open source. For minimal DBMS functionality, users can layer HBase over HDFS. They can also layer a query framework such as Hive over HDFS or HBase. Note that some implementations of MapReduce require HDFS and others don't. The earliest analytic applications for Hadoop data (by early adopters such as Internet firms) were developed using purely open source Hadoop products. However, as Hadoop goes mainstream across many industries, an emerging best practice is to interface open source Hadoop with a growing variety of enterprise applications and data platforms (as explained in the next section of this report). Adding Hadoop to the extended tool and platform ecosystem adds value to these applications and extends their lives by providing a larger and more diverse store for all data, as well as new analytic functionality.

[2] Ibid., Figure 2.
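To make the hand-coded MapReduce style concrete, here is a minimal sketch of a job that counts hits per page across many web logs, written against the standard org.apache.hadoop.mapreduce API. The tab-delimited record layout, the field position of the page URL, and the input/output paths are assumptions for illustration, not part of any particular product.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

  // Mapper: emits (pageUrl, 1) for every click record in the logs
  public static class HitMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Hypothetical layout: tab-delimited, page URL in the second field
      String[] fields = line.toString().split("\t");
      if (fields.length > 1) {
        page.set(fields[1]);
        ctx.write(page, ONE);
      }
    }
  }

  // Reducer: sums the 1s emitted for each page
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text page, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) sum += c.get();
      ctx.write(page, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "page-hit-count");
    job.setJarByClass(PageHitCount.class);
    job.setMapperClass(HitMapper.class);
    job.setCombinerClass(SumReducer.class); // pre-aggregate to cut shuffle volume
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g., /weblogs/raw
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g., /weblogs/hits
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Even this small job shows why hand coding carries the costs described in Number Three: record parsing, key selection, and job wiring are all the developer's responsibility, which is exactly the burden that Hive, Pig, and vendor tools remove.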

NUMBER THREE
INTERFACE WITH HDFS DATA THROUGH HADOOP TOOLS.

Integrating Hadoop into a multi-platform environment for warehousing, analytics, BI, data integration, and applications requires knowledge of the available interfaces. Today, most access to Hadoop data is through Hadoop tools that are layered over HDFS, namely MapReduce and tools that run atop MapReduce (Hive and Pig). In a practice that's common at early adopter Internet firms, developers access Hadoop data by hand coding data processing logic (to be executed by MapReduce) or routines in the Hive Query Language (executed by Hive). Note that Hadoop Pig is a high-level tool that enables developers to design data access logic for MapReduce that would otherwise require code written in Java, C++, C#, Python, R, and so on.

Best practices for accessing Hadoop data are evolving away from hand-coded routines and toward solutions developed with vendor products. In fact, a wide variety of vendor products already support Hadoop by interfacing with MapReduce and Hive, including databases, analytic tools, applications, and tools for ETL and other forms of data integration. The best practice of interfacing with HDFS from a vendor tool or platform has advantages over open source Hadoop tools and hand-coded routines:

Hand-coded solutions are inherently slow to develop, time-consuming to test and debug, difficult to update or reuse, and costly (due to the high payroll of programmers). In Hadoop, coding involves languages that most data professionals don't know.

Compared to spartan open source Hadoop tools, mature vendor tools, platforms, and applications are feature-rich, with modern GUIs that foster collaboration, reuse, standards, and productivity for developers.

Today, most interfaces from vendor-built software to HDFS generate code that a Hadoop tool can execute. For example, for straightforward access to Hadoop data, a data visualization tool might generate HiveQL (or SQL, which is translated to HiveQL), then pass that to Hive for execution, as sketched below. If more extensive processing is needed (say, for analytic algorithms or ETL transformational processing), a tool might generate Java code that is optimized for MapReduce. Although code generation works for Hadoop interfaces, there is performance overhead involved in parsing and compiling the code. To avoid that overhead, the emerging best practice in vendor interfaces is to run natively in Hadoop tools (especially MapReduce) without generating code.
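To illustrate the code-generation path, the following sketch passes HiveQL to Hive through the HiveServer2 JDBC driver that ships with Hive. The host name, port, credentials, and web_clicks table are assumptions for the example; the query stands in for the kind of statement a visualization tool might generate on a user's behalf.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Register the HiveServer2 JDBC driver
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hadoop-gateway:10000/default", "analyst", "");
         Statement stmt = conn.createStatement()) {
      // HiveQL resembling generated SQL; Hive compiles this to MapReduce jobs
      ResultSet rs = stmt.executeQuery(
          "SELECT page_url, COUNT(*) AS hits " +
          "FROM web_clicks " +
          "GROUP BY page_url " +
          "ORDER BY hits DESC LIMIT 20");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}

Note that every statement submitted this way is parsed, planned, and compiled into MapReduce jobs before any data is read, which is the overhead that natively integrated vendor tools avoid.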

NUMBER FOUR
EXTEND YOUR DATA WAREHOUSE ARCHITECTURE WITH HADOOP.

Hadoop can be the powerful platform that enables scalability and handles diverse data types for certain components of your DW architecture:

Data staging area. Much data processing occurs in a DW's staging area to prepare source data for specific uses (reporting, analytics, OLAP) and for loading into specific databases (DWs, marts, appliances). Much of this processing is done by homegrown or tool-based solutions for extract, transform, and load (ETL). Consider extending your ETL infrastructure by staging and processing a wide variety of big data on HDFS, as in the sketch at the end of this section.

Operational data stores (ODSs). To free up capacity on a data warehouse, many organizations manage detailed source data on an ODS consisting of a standalone hardware server running a DBMS instance. There's a need for ODS platforms that cost-effectively handle massive data volumes and more diverse data, which Hadoop can do.

Online data archive. The source data stored long term in ODSs can approach petabyte scale. Examples include call detail records in telco, sessionized clickstreams in e-commerce, and customer data in financial services. To cope with large volumes, some data is archived offline, which puts it beyond the reach of analytics. Hadoop can keep data online for constant access.

Analytic sandbox. Essentially, this is a 21st-century, terabyte-scale data mart. Data analysts, data scientists, and other users need a place to collect large data sets for ad hoc analytics. This work can degrade DW performance and lead to an analytic silo, so it's best done with data in an analytic sandbox, which can be a governed, virtual area within HDFS.

Data integration sandbox. This resembles the analytic sandbox except that it's a work area for data integration specialists designing and testing large joins, aggregations, and transformational logic with big data.

Data lake. This is a pool of most of the data the business has collected for analytics. Instead of categorizing data by type and structure before storing it (which alters the usable content of data), data is left in the form in which it arrived at the lake, so all the source material is there for unforeseen analytic applications. Whether you call it a data lake, a logical data warehouse, or a virtual data warehouse, this is the direction many best practices in big data analytics are headed.

Note that the data warehouse components just listed are logical components that can be physically deployed to the central warehouse or to a variety of other data platforms. Imagine deploying all of them on HDFS, but with each logical component as a virtual data structure over a data lake. Because it is virtual, each data warehouse component is easily created and altered, and all virtual data structures can share data without replicating it.

A long tradition exists of transforming, remodeling, and tweaking data to optimize it for performance. Today, massively parallel processing (MPP) platforms (such as Hadoop and many relational DBMSs) scale and perform so well that on-the-fly aggregations and transformations return results in reasonable time frames. Likewise, virtual views of data in the lake perform well, making best practices in virtual warehousing more pragmatic than in the past.
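As a minimal sketch of HDFS as a staging area or data lake landing zone, the following uses Hadoop's FileSystem API to land a raw source file unchanged. The NameNode address and the /datalake/landing directory layout are assumptions for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StageToDataLake {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Normally read from core-site.xml; set here so the sketch is self-contained
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    FileSystem hdfs = FileSystem.get(conf);

    // Land the file as it arrived, organized by source and arrival date,
    // so all source material stays available for unforeseen analytics
    Path local = new Path(args[0]); // e.g., /var/log/web/access.log
    Path target = new Path("/datalake/landing/weblogs/2013-10-01/access.log");
    hdfs.copyFromLocalFile(false, true, local, target); // keep local copy; overwrite target
    hdfs.close();
  }
}

Because the data is stored exactly as it arrived, the same landed files can later feed ETL offload jobs, the analytic sandbox, or virtual data lake views without being re-extracted from the source system.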

NUMBER FIVE
EMBRACE NEW BEST PRACTICES IN DATA MANAGEMENT AS ENABLED BY HADOOP.

When possible, take analytic logic to the data, not vice versa. For decades, most analytic tools required that data be transformed to a special model and moved to a special database or file prior to analysis. Given the volumes of today's big data, this is no longer feasible. However, Hadoop was designed for processing data in place. Think of how MapReduce and Hive access and process Hadoop data without moving or remodeling it first.

Once an information system is identified as a source for analytics, pre-load its data into Hadoop. In the long run, this is faster than retrieving the data prior to each run of an analytic process, especially if the data is voluminous.

Keep Hadoop data synchronized with other systems. Pre-loading data into HDFS means you must devise processes that keep Hadoop data up to date. Look for changed data capture functionality in data integration tools that interface with HDFS.

Operationalize the discoveries of analytics as permanent data structures in a data warehouse. It's ironic that data analysts, data scientists, and similar users scan gigantic volumes of data to understand a business problem (what's the root cause of the latest form of churn?) or opportunity (what new customer attributes are emerging?), then typically boil it all down to a relatively small data set expressed in a model that represents their epiphany. Too often, analysts share the epiphany with a few peers and managers, then move on to the next analytic assignment. Instead, analysts should always take the outcome of analytics to the BI and DW team in case the team sees the need to operationalize in reports what was initially discovered via analysis. For example, when analysis reveals a new form of churn, metrics for it should be added to the dashboards in which managers track churn. Likewise, when a new customer attribute is discovered, metrics and reports about customers should be updated so managers have complete information about customers.

Offload the data and processes for which DWs are not well suited. This includes data types that few DWs were designed for, such as detailed source data and any file-based data (logs, XML, text documents, unstructured data). It includes most ETL and data integration processes, especially those that must run at massive scale (e.g., aggregating tens of terabytes, sorting millions of call detail records). Hadoop is designed for these data types and operations, and Hadoop capacity is far less expensive than DW capacity. The real point, however, is that offloading allows the DW to do what it does best: provide squeaky-clean, well-modeled data with a well-documented audit trail for standard reports, dashboards, performance management, and OLAP. In turn, this preserves the business's investment in the warehouse, reduces the cost of future expansion, and redirects funds from doomed tasks (managing data that DWs were not designed for) to successful ones (providing quality data for report consumers and OLAP users).

Onboard new data sources quickly and without fear. DW professionals are often hesitant to integrate data from a new source into the warehouse because it takes time to model new data structures and design ETL jobs. In addition, disaggregating poor-quality or untrustworthy data from the DW's calculated values, time series, and dimensional structures is so difficult as to be impossible. With a data lake approach to HDFS, modeling and ETL are not required, and disaggregation can be as simple as altering virtual views or analytic algorithms so they ignore files containing questionable data (see the sketch at the end of this section).

Don't forget mainframe and legacy systems. These, too, have big data that could be pre-loaded to a data lake on Hadoop. Although new sources such as Web logs, machine data, and social media are key to capturing what happens before and after a transaction, in many organizations detailed transactional data is still captured and processed on mainframes. However, capacity on these systems is often so expensive as to preclude analytics, whereas cheap capacity on Hadoop makes analytics economically feasible. Hadoop does not support mainframe data natively, but some commercial data integration tools support both mainframes and Hadoop.
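The following sketch shows, through Hive's JDBC interface, how onboarding and disaggregation can remain metadata-only operations in a data lake: an external table projects structure onto files already in HDFS, a partition registers a new day's files, and a view is redefined to ignore a partition of questionable data. The table, column, and path names are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VirtualViewsOverLake {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hadoop-gateway:10000/default", "etl_dev", "");
         Statement stmt = conn.createStatement()) {

      // Project structure onto files already in the lake: no ETL, no data movement
      stmt.execute(
          "CREATE EXTERNAL TABLE IF NOT EXISTS clicks_raw ("
        + "  ts STRING, user_id STRING, page_url STRING) "
        + "PARTITIONED BY (load_date STRING) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        + "LOCATION '/datalake/landing/weblogs'");

      // Onboarding a new day's files is a metadata operation
      stmt.execute(
          "ALTER TABLE clicks_raw ADD IF NOT EXISTS "
        + "PARTITION (load_date='2013-10-01') "
        + "LOCATION '/datalake/landing/weblogs/2013-10-01'");

      // Disaggregating questionable data = redefining the view to ignore it
      stmt.execute(
          "CREATE OR REPLACE VIEW clicks_trusted AS "
        + "SELECT * FROM clicks_raw WHERE load_date <> '2013-09-15'");
    }
  }
}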

NUMBER SIX
LEVERAGE HADOOP FOR RELATIVELY STATIC, REGULARLY REPEATED QUERIES AND OTHER WORKLOADS WITH MASSIVE DATA SETS.

Data management professionals depend heavily on Structured Query Language (SQL). Some of them can write optimal SQL code, and they want to leverage this valuable skill. However, most users depend on optimized SQL that is generated by vendor-built tools for reporting, OLAP, data integration, databases, and some forms of analytics. Among the many forms of advanced analytics, query-based analytics is one of the most popular right now; it involves large, complex SQL routines, sometimes consisting of hundreds of lines of standard SQL.

Although SQL is more important than ever, Hadoop is currently weak in its support for standard SQL. The Hive Query Language (HiveQL) resembles SQL, so data professionals can learn it easily. The catch is that HiveQL is not as feature-rich as SQL. A bigger challenge is tool compatibility, because most of the SQL that users need is generated by vendor tools. Many tools can now translate generated SQL into HiveQL and pass it to Hive, but this entails overhead. A better approach is for a tool to interoperate natively with the MapReduce framework, thereby avoiding code generation and the additional Hadoop tool layers needed to translate HiveQL, Pig, or Java into MapReduce calls.

In a related issue, MapReduce is inherently a batch process, which amounts to high data latency for queries, ETL processing, and analytics executed via MapReduce. Data latency is exacerbated by the gargantuan volumes managed in HDFS. Furthermore, HDFS is a file system, so it scans large amounts of data when queried, instead of providing the selective random access to data we expect from a query run in a relational DBMS.

Despite high latency and weak SQL support, queries are very powerful on Hadoop, whether working solely with Hive or also involving SQL-based vendor tools. TDWI has interviewed users who have complex queries in production and who also perform ad hoc queries against Hadoop data. The problem is that Hadoop queries tend to run slowly, some taking hours or days. The development process can also be slow, especially when it involves hand coding and the coordination of multiple tools.

Speed issues aside, some queries are ideally suited to Hadoop, especially those that sum instances of entities at scale. For example, some of the first production queries on Hadoop were at Internet firms that needed to count hits on Web pages, as seen in thousands of Web logs, each with thousands of click records. A leading telco scans millions of file-based call detail records in HDFS, summarizing traffic and switch activities. A smartphone manufacturer queries a few billion quality assurance records to correlate suppliers with bad supplies.

Some queries are not such a good fit with HDFS and Hadoop tools, particularly those that are iterative. For example, a data analyst practicing query-based analytics starts with an ad hoc query, looks at the results, then revises the query and runs it again. Many iterative revisions later, the analyst has a query result set that summarizes the thing he or she hoped to discover, such as the attributes of profitable customers, a list of problematic suppliers, or a leak in bottom-line costs. Ad hoc and iterative queries are possible in Hadoop, but the data analyst must expect long waits between result sets.

To avoid these delays, an emerging best practice is to manage source data in Hadoop (similar to the data lake discussed earlier) but extract subsets of potentially relevant data and load them into a relational DBMS. That DBMS may be in the core warehouse, a data warehouse appliance, or a standalone system (such as a columnar database). Moving a subset of Hadoop data to a relational platform enables the data analyst to leverage existing tools and skills for SQL, and it makes the analyst more agile and productive by greatly reducing the waiting periods between iterative runs of queries. A minimal version of this extract-and-load pattern is sketched below.
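The sketch below assumes Hive's JDBC driver and a relational DBMS driver (PostgreSQL here) are on the classpath; the tables, columns, and connection details are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class OffloadSubsetToRdbms {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection hive = DriverManager.getConnection(
             "jdbc:hive2://hadoop-gateway:10000/default", "analyst", "");
         Connection dbms = DriverManager.getConnection(
             "jdbc:postgresql://dwhost:5432/sandbox", "analyst", args[0])) {

      // One slow, batch-style Hive scan narrows billions of records
      // down to a subset worth iterating over
      Statement scan = hive.createStatement();
      ResultSet rs = scan.executeQuery(
          "SELECT customer_id, SUM(amount) AS total "
        + "FROM call_detail_records "
        + "WHERE call_month = '2013-09' "
        + "GROUP BY customer_id");

      // Load the subset into a relational sandbox for fast, iterative SQL
      PreparedStatement ins = dbms.prepareStatement(
          "INSERT INTO cdr_summary (customer_id, total) VALUES (?, ?)");
      while (rs.next()) {
        ins.setString(1, rs.getString(1));
        ins.setDouble(2, rs.getDouble(2));
        ins.addBatch();
      }
      ins.executeBatch();
    }
  }
}

After the one-time Hive scan, every iterative revision of the analyst's query runs against the relational sandbox in seconds rather than hours.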

NUMBER SEVEN
GET MANY USES FROM A FEW HDFS CLUSTERS.

Beware of analytic silos. Analytic applications tend to be departmental by nature. Sales and marketing need to own and control customer analytics, procurement owns supply chain analytics, and so on. As analytic applications flourish and department requirements become more important, TDWI is seeing HDFS clusters deployed per department. This results in the age-old data silo problem, but with new big data. Consolidating most analytic data into one HDFS cluster can reduce costs for redundant clusters and nodes. It also pools data for a data lake approach and provides the single version of the truth on which credible decision-making is based.

Big data is too big to move. Leading goals for a Hadoop implementation should be to reduce the number of places data is stored and to establish methods that minimally transform or replicate data, as with the data lake, data virtualization, and the logical data warehouse. Moving big data across multiple clusters works against these goals and is not likely to succeed once data volumes swell.

YARN will improve Hadoop's concurrency. Version 1.0 of Hadoop offers limited concurrency in the sense of multiple processes running simultaneously (from MapReduce, Hive, and vendor tools). Version 2.0 is now available, and it includes a new layer called YARN, which provides greater concurrency and administration for multiple tools. In turn, YARN fosters a strategy of more simultaneous processes on fewer HDFS clusters.

Imagine Hadoop as shared infrastructure. In many organizations, IT has evolved into an infrastructure provider, and other teams deploy applications atop that infrastructure. In that spirit, central IT could supply HDFS, similar to the way it provides networks, storage subsystems, and racks of servers. HDFS would then be shared by teams for applications, warehousing, integration, analytics, and so on.

Hadoop isn't free, despite its open source origins. Setting up and maintaining an HDFS cluster is complex and has administrative payroll costs. Although HDFS runs well on commodity-priced hardware, acquisition and maintenance costs for hardware go up as the cluster grows. Organizations can control costs by consolidating redundant clusters. Similarly, they can reduce the number of nodes by using tools that are optimized for Hadoop, getting more out of individual nodes and therefore performing well with fewer of them.

NUMBER EIGHT
AUGMENT YOUR HADOOP ENVIRONMENT WITH SPECIAL TECHNOLOGIES FOR REAL-TIME DATA.

Manage streaming data for new business insights. One of the toughest types of big data to process is streaming data, because it comes at you relentlessly from sensors, machines, devices, and applications. Yet streaming data is very promising for businesses, because it represents new, untapped data sources that can be analyzed to understand and improve operational efficiencies, Web behaviors, logistics, machine maintenance, and more.

Correlate real-time data with Hadoop data and enterprise data. Real-time data represents now: an event that just happened or the state of an entity that just changed, such as a customer touch point, the current location of a delivery truck, or a machine that suddenly needs maintenance. To understand the full relevance of an event that happened a moment ago, it's best to correlate data about the event with historical or seasonal data about that entity, as found in a warehouse, Hadoop, or an operational application. Batch-oriented, high-latency HDFS can barely capture data that streams in real time, much less process it in real time. However, special technologies for complex event processing (CEP), operational intelligence (OI), or in-memory analytics are known to make such correlations in true real time, in seconds or milliseconds.

Capture and store streaming data for offline analytics later. Streaming data should be captured and stored en masse for offline analytics later. Most events, messages, transactions, alerts, clicks, and so on in a stream have a record structure, and these can be captured and appended to a flat file. Hadoop excels in the management and analysis of such file-based data, as the sketch at the end of this section suggests.

Enable many right-time speeds and frequencies. Most organizations today need fresher data that is collected, processed, and delivered more frequently than in the past, but that doesn't always mean true real time. For example, moving the refresh of reports and analytic models from overnight only to three times daily provides executives with data at a freshness level that's just right for the processes they manage. Hadoop can be adapted to some of the data techniques in use today for various right-time speeds, namely microbatches, data virtualization, data federation, and changed data capture.
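As a sketch of the capture-and-append pattern, the following writes one hypothetical record to an hourly flat file in HDFS. It assumes an HDFS version and configuration where append is enabled; a production collector would loop over the device feed and batch its writes rather than open the file per record.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamCapture {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address
    FileSystem hdfs = FileSystem.get(conf);

    // One flat file per hour keeps appends cheap and produces files large
    // enough for efficient batch scans later
    Path hourly = new Path("/datalake/streams/trucks/2013-10-01/14.log");
    FSDataOutputStream out = hdfs.exists(hourly)
        ? hdfs.append(hourly) // requires append support in the cluster
        : hdfs.create(hourly);

    // A single hypothetical record stands in for the stream
    String record = "2013-10-01T14:03:22\ttruck-0042\tlat=47.48\tlon=-122.21\n";
    out.write(record.getBytes("UTF-8"));
    out.hsync(); // flush to the DataNodes so the record survives a collector crash
    out.close();
    hdfs.close();
  }
}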

ABOUT OUR SPONSORS

As market leader in enterprise application software, SAP (NYSE: SAP) helps companies of all sizes and industries run better. From back office to boardroom, warehouse to storefront, desktop to mobile device, SAP empowers people and organizations to work together more efficiently and use business insight more effectively to stay ahead of the competition. SAP applications and services enable more than 248,500 customers to operate profitably, adapt continuously, and grow sustainably.

Syncsort provides data-intensive organizations across the big data continuum with a smarter way to collect, process, and distribute the ever-expanding data avalanche. With thousands of deployments across all major platforms, including mainframe, Syncsort helps customers around the world overcome the architectural limits of today's ETL and Hadoop environments, empowering their organizations to drive better business outcomes in less time, with fewer resources and lower TCO. For decades, Syncsort has been the undisputed leader in high-performance data processing technology for the mainframe and the fastest ETL software for Windows, Unix, and Linux. Thanks to breakthrough innovations and ongoing contributions to the Apache Hadoop open source community, organizations can now run the same technology natively within the MapReduce framework. The result is Syncsort DMX-h, high-performance software to collect, transform, and distribute all your data with Hadoop. DMX-h turns Hadoop into a more robust and feature-rich ETL solution, enabling users to maximize the benefits of MapReduce without compromising on the capabilities, ease of use, and typical use cases of conventional ETL tools. Accelerate your data integration initiatives and unleash Hadoop's potential with the only architecture that runs ETL processes natively within Hadoop.

ABOUT THE AUTHOR

Philip Russom is the research director for data management at The Data Warehousing Institute (TDWI), where he oversees many of TDWI's research-oriented publications, services, and events. He's been an industry analyst at Forrester Research and Giga Information Group, where he researched, wrote, spoke, and consulted about BI issues. Before that, Russom worked in technical and marketing positions for various database vendors. Over the years, Russom has produced more than 500 publications and speeches. You can reach him at prussom@tdwi.org.

ABOUT TDWI RESEARCH

TDWI Research provides research and advice for business intelligence and data warehousing professionals worldwide. TDWI Research focuses exclusively on BI/DW issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence and data warehousing solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences, as well as strategic planning services to user and vendor organizations.

ABOUT THE TDWI CHECKLIST REPORT SERIES

TDWI Checklist Reports provide an overview of success factors for a specific project in business intelligence, data warehousing, or a related data management discipline. Companies may use this overview to get organized before beginning a project or to identify goals and areas of improvement for current projects.


More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Big Data Can Drive the Business and IT to Evolve and Adapt

Big Data Can Drive the Business and IT to Evolve and Adapt Big Data Can Drive the Business and IT to Evolve and Adapt Ralph Kimball Associates 2013 Ralph Kimball Brussels 2013 Big Data Itself is Being Monetized Executives see the short path from data insights

More information

Five Best Practices for Maximizing Big Data ROI

Five Best Practices for Maximizing Big Data ROI E-PAPER FEBRUARY 2014 Five Best Practices for Maximizing Big Data ROI Lessons from early adopters show how IT can deliver better business results at less cost. TW_1401138 Organizations of all kinds have

More information

The IBM Cognos Platform

The IBM Cognos Platform The IBM Cognos Platform Deliver complete, consistent, timely information to all your users, with cost-effective scale Highlights Reach all your information reliably and quickly Deliver a complete, consistent

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

More information

UNIFY YOUR (BIG) DATA

UNIFY YOUR (BIG) DATA UNIFY YOUR (BIG) DATA ANALYTIC STRATEGY GIVE ANY USER ANY ANALYTIC ON ANY DATA Scott Gnau President, Teradata Labs scott.gnau@teradata.com t Unify Your (Big) Data Analytic Strategy Technology excitement:

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

MDM and Data Warehousing Complement Each Other

MDM and Data Warehousing Complement Each Other Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Accelerate Business Advantage with Dynamic Warehousing

Accelerate Business Advantage with Dynamic Warehousing Accelerate Business Advantage with Dynamic Warehousing Mark McConnell Marketing Executive, Information Management IBM Asia Pacific 2007 IBM Corporation Is Information Technology delivering? Source: IBM

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information