Unleashing the Power of Hadoop for Big Data Analysis. Best Practices Series. Sponsors: MarkLogic (page 14), Cisco (page 17), Attunity (page 18), Couchbase (page 19).


Best Practices Series: Unleashing the Power of Hadoop for Big Data Analysis

MarkLogic (page 14): Making Hadoop Better With MarkLogic
Cisco (page 17): Creating Solutions to Meet Our Customers' Data and Analytics Challenges
Attunity (page 18): Hadoop Data Lakes: Incorporating an ODS With Data Integration to Ensure a Successful Initiative
Couchbase (page 19): Hadoop in the Wild

8 Steps to Unleashing the Power of Hadoop (Best Practices Series, DBTA, October/November 2014)

According to a recent DBTA survey, 30% of respondents report having the Hadoop framework in production today, and another 26% plan to acquire or implement the technology over the coming year. The survey shows that enterprises are adopting Hadoop for analytics/business intelligence, IT operational data requirements (logs, systems monitoring), and special projects. While the open source framework is popular and widely deployed, there is no shortage of anticipation, and confusion, about it.

Hadoop is well-designed for big data jobs, as it supports a variety of data types within its file system. However, it is not a panacea for organizations struggling to be more data-driven, and it often adds more of a burden than it resolves. For example, Hadoop environments often require highly specialized data science and development skills, which are in short supply. Governance is also an issue, since Hadoop projects often fall outside the established order of metadata management, data mappings, and data quality processes painstakingly put in place for relational database management and data warehouse sites. Plus, being a widely available open source product, Hadoop often gets implemented and run in various places across organizations, meaning that there may be multiple projects underway that potentially duplicate each other. The challenge is to bring Hadoop activities into existing IT environments; otherwise, the result is bottlenecks or even an inability to move data into and out of Hadoop environments. Data integration across new environments or application areas may also be a challenge, since Hadoop-based data may be managed within its own special silos. Real-time data is an issue as well.
While there are tools and updates that move Hadoop from batch to real-time mode, there may still be issues with complex hybrid queries requiring both historical and real-time data. Here are some pointers for successful Hadoop implementations:

Focus on business challenges that existing data platforms aren't adequately handling. Many decision makers are seeking insights from data that is beyond the reach of traditional data warehouse environments or relational databases, and that is expensive to incorporate. For example, weblog data may point to important trends in digital engagement, but there may be too much of it for the existing data infrastructure to support. Many departments have such requirements; meeting these requests is where Hadoop can deliver to the business.
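The weblog scenario above can be made concrete with a minimal, single-machine sketch of the map-and-reduce pattern Hadoop applies to such data at scale. The log format and values here are invented for illustration; a real job would run the same two phases distributed across a cluster.

```python
from collections import defaultdict

# Hypothetical weblog lines: "timestamp url visitor_id" (illustrative format).
LOG_LINES = [
    "2014-10-01T09:00 /products v1",
    "2014-10-01T09:01 /products v2",
    "2014-10-01T09:02 /checkout v1",
]

def map_phase(line):
    """Map: emit a (url, 1) pair for each log line."""
    _, url, _ = line.split()
    yield url, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each url key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

hits = reduce_phase(kv for line in LOG_LINES for kv in map_phase(line))
print(hits)  # {'/products': 2, '/checkout': 1}
```

The same mapper and reducer, handed to a framework such as Hadoop Streaming, would scale to log volumes a single warehouse server could not absorb.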

Make Hadoop real for the business. In recent years, data governance efforts have succeeded not only at ensuring that data moving through organizations is trustworthy but also that it serves the needs of its business owners. Hadoop, typically spun up in data centers as a pilot project or for specialized jobs, has been outside the data governance sphere. It's time to incorporate Hadoop into existing governance processes for data management, to establish an organizational mission for the framework along with business unit sponsorship. Departments that will work most closely with big data, such as marketing, sales, and IT, will then have a forum to collaborate and provide guidance. Such a framework also sets up a consistent process and methodology for vetting data and new projects. Another practice from the database and data warehouse world, centers of excellence, can shield growing Hadoop projects from intraorganizational squabbles. Remember that not every project is right for Hadoop.

Build out your Hadoop architecture incrementally. Start with smaller, "low-hanging fruit"-type projects that can be quickly turned around to show a win. While the framework is very effective at processing analytical jobs across various data formats, it may not be suitable for mixed workloads, such as combined historical and real-time queries. There are many places where relational databases still fill the bill, for example, where SQL statements or more complex queries are needed, or where the business case for unstructured data is still being defined.

Determine the types and sources of data to be flowed into Hadoop. There are many types of data, particularly in unstructured formats, that don't fit well into data warehouse or relational environments and can be offloaded into Hadoop; forcing them to fit may be prohibitively expensive. Ultimately, Hadoop offers a way to quickly access well-packaged files for rapid analysis.
Develop an architectural approach that fully incorporates Hadoop. Until now, vendors have been positioning Hadoop as a data source that feeds into a relational database management system or data warehouse. New thinking puts Hadoop at the core of such architectures, pulling in data from warehouses and existing databases.

Evaluate data integration strategies that can incorporate data managed within Hadoop. The framework needs to be designed into a data process flow as part of data's movement from original sources to analytical environments.

Get involved with Hadoop user groups or communities, and connect with other users. This is an opportunity to learn the latest thinking and evolving best practices. Talk to others within your industry to find out what works and what the potential pitfalls may be. Be sure to explore these issues with enterprises managing the same types of data. For example, manufacturers may be focused on data generated by machines, robots, and sensors, while publishers and media companies will be focused on content. Find out from others how Hadoop is being used to manage those types of files.

Look inside and outside for the right skills. Hadoop requires new sets of skills, both technical and problem-solving. There is a need for data scientists, analysts, and developers. The rise of Hadoop introduces new tools and languages, such as the R language, employed for statistical problem-solving, as well as tools such as MapReduce, YARN, and Apache Spark. Along with new technical skills, Hadoop requires looking at data differently: identifying analysis opportunities, uncovering hidden nuggets, and then communicating those findings to the business.
These skills may be hard to find on the open job market, but a fundamental understanding may be readily available within existing IT or data management departments. Many current data professionals are already performing the rudimentary tasks of data scientists and can be brought up to speed with further training and education.

Have Hadoop complement, not replace, existing data warehouses and data platforms. There's a role for Hadoop, and there's a role for existing data environments. Much of the data now coming in, due to cost or structure, may be better suited for storage within non-relational environments as the first option. The bottom line is that most enterprises are accumulating vast stores of both unstructured and structured data, which ultimately need to be integrated. The key challenge is to move away from point-to-point integration, which is not sustainable within big data environments and is often welded to specific applications, toward a well-architected data environment that can readily ingest and analyze massive and varied datasets. Hadoop makes such approaches possible, acting as a data staging area, operational data store, and even analytic sandbox.

Joe McKendrick

Sponsored Content

Making Hadoop Better With MarkLogic

When someone says "Hadoop," they typically mean an entire ecosystem of projects, all focused on dealing with big data through a framework of distributed processing over large groups of commodity machines. There's a lot of activity, a lot of choice, and quite a bit of confusion. With a large number of moving parts, and a large number of vendors providing Hadoop support and customized distributions, how do you decide where to begin?

The initial development of Hadoop, inspired by work at organizations like Google and Yahoo!, was a response to the inability to handle what we would now call big data in legacy RDBMSs or data warehouses. To date, there aren't many Hadoop-based applications in production to learn from, so many organizations are struggling to figure out how to get real benefit from it without hiring an army of IT staff and spending significant amounts of time and money. In this editorial, we'll explain the ways the MarkLogic platform can help you use Hadoop to deliver real-time big data applications, improve data governance, and save money.

HADOOP & MARKLOGIC: AN OVERVIEW

MarkLogic can be deployed against any of the leading commercial Hadoop distributions, allowing administrators to leverage existing infrastructure. Though the MarkLogic Enterprise NoSQL platform is not dependent on Hadoop, MarkLogic and Hadoop function in a complementary manner in a big data ecosystem. MarkLogic works with the two core elements of Hadoop, the Hadoop Distributed File System (HDFS) and MapReduce, which are the most mature parts of the ecosystem and the foundation for all the other pieces. HDFS provides storage for data that is too large or unpredictable for traditional databases or data warehouses. While it's not designed for real-time data access that requires indexes and interactive query capabilities, HDFS is a cost-effective way to keep data that may otherwise have been discarded or archived to tape.
MapReduce performs distributed computation on the data stored in HDFS. It's useful for batch processing where you need to perform analytics or enrichment on massive datasets, but what if you need to give users the ability to quickly find specific pieces of data and make granular updates to the data in real time? If you need to do near-instantaneous analysis and alerting for fraud detection, emergency crisis management, or risk mitigation and assessment, can you afford the time it would take for a MapReduce job to complete?

Hadoop has three primary use cases in the enterprise:
Staging: accommodate any shape of data relatively cheaply
Persistence: keep the raw input for analytics without losing the original context
Analytics: perform large-scale analytics on raw or prepared data

However, Hadoop alone cannot provide the real-time applications or the governance around data that enterprises require today. MarkLogic makes Hadoop better by bringing the power of Enterprise NoSQL to address these limitations. MarkLogic is unique in the marketplace in providing the best of NoSQL while also being a hardened and proven enterprise-class database technology. Created in 2001 to fill the need within enterprise organizations and government entities to store, manage, query, and search data, no matter the format or structure, MarkLogic has these NoSQL characteristics:
Flexible, with a schema-agnostic document data model (JSON, XML, text, binary, RDF triples)
Fast, implemented in C++ and optimized for today's I/O systems
Scalable, leveraging a shared-nothing distributed architecture and lock-free reads

MarkLogic is also highly available, with transactional consistency, automatic failover, and replication.
As an Enterprise NoSQL database platform, MarkLogic was designed from the start to support enterprise-class and enterprise-scale application requirements, including:
ACID (atomic, consistent, isolated, and durable) transactions, just like you get from a relational DBMS
Government-grade security features, including fine-grained privileges, role-based security, document-level permissions, and HTTPS access
Real-time indexing, full-text search, geospatial search, semantic search, and alerting
Proven reliability and uptime, with over 500 deployed mission-critical and enterprise projects in government, media, financial services, energy, and other industries

REAL-TIME APPLICATIONS: MARKLOGIC IS THE BEST DATABASE FOR HADOOP

First things first: HDFS is a cost-effective file system, but it has no indexes, so finding an individual record typically involves scanning through every record in a large file. That might be okay for large-scale analytics, where the computation might need to read every record, but it can't support the low-latency queries and granular updates required for real-time workloads and end-user applications. For that, you need a database. Hadoop alone is not equipped for this type of workload.
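The access-pattern argument above, a full file scan versus an indexed lookup, can be shown in a few lines. The records are invented; this illustrates only the general contrast, not MarkLogic or HDFS internals.

```python
# A pile of records, as they might sit in a flat file on HDFS.
records = [{"id": i, "value": f"rec-{i}"} for i in range(100_000)]

def full_scan(records, target_id):
    """Find one record the way a raw file scan must: read until it appears."""
    for rec in records:
        if rec["id"] == target_id:
            return rec
    return None

# A database builds an index once at ingest, then answers lookups in O(1).
index = {rec["id"]: rec for rec in records}

# Both find the same record; only the index avoids touching every row.
assert full_scan(records, 99_999) == index[99_999]
```

For a batch analytics job that reads every record anyway, the scan costs nothing extra; for a user-facing query, the difference is the whole point of putting a database in front of the file system.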

Sponsored Content

[Figure: MarkLogic & Hadoop: Complementary Big Data Capabilities. MarkLogic: online applications, decision-making, real-time, distributed indexes. Hadoop: offline analytics, model-building, long-haul batch, distributed file system.]

The popular tech press would have you think it's a stark trade-off between legacy relational databases, which provide indexes, transactions, security, and enterprise operations, and open source NoSQL databases like HBase, which offer a flexible data model and commodity scale-out while being distributed and fault-tolerant, but are less mature in their enterprise roadmap. What if you could have the best of both worlds? With MarkLogic, you get all of the scalability on commodity hardware that has come to define the NoSQL space. However, you don't have to sacrifice the enterprise capabilities, such as ACID transactions, security, high availability, and disaster recovery, that your mission-critical applications require. This is why we believe MarkLogic is the best database for Hadoop.

TIERED STORAGE: COST-EFFECTIVE SUPPORT FOR A VARIETY OF SLAS

Next, what if you could segregate your data, aligning how it's stored with its value, and still make it available whenever it's needed? All data is valuable, but the value of data may vary based on business need at a given time. In a typical organization, a small amount of data accounts for most of the value, for example, current transactions or the latest news. This is the data that requires high availability and interactive response times. However, as data ages along the long tail, its access patterns change. Historical data is typically not the data you are running your business on. You may need to keep it around for regulatory compliance or reporting, but it's likely not something that needs millisecond interactivity or high availability. Economically, it makes sense to pack this data more densely on cheaper storage.
Finally, the economics of storage and compute have allowed organizations to keep the long tail around. Much of this data may not need to be online or immediately queryable, but it should be accessible to quickly spin up for analysis and then spin down again to conserve compute resources. MarkLogic allows you to store data across different types of storage. This, in itself, is not a new capability; database and storage vendors have been offering hierarchical storage and information lifecycle management for years. What differentiates our offering is the ability to easily and consistently move data between tiers without complicated and expensive ETL and data duplication: index data once when it's first ingested, and leverage those indexes for search and analytics no matter where the data is stored. By allowing your data to live in the most appropriate tier of infrastructure, you can save money while still providing appropriate performance and availability for applications. Aligning a storage strategy with the value and use of your data allows you to make smarter tradeoffs among cost, performance, and availability. You can implement a data governance policy and deploy MarkLogic using a fluid mix of SSD, local disk, shared disk, and HDFS, as well as Amazon EBS and S3. MarkLogic is unique in that you can run a database on a mix of locally attached storage and shared storage. For example, you can benefit from less expensive Hadoop storage for archive data, with high density for efficiency and shared-disk failover, while using another tier of more expensive storage for active data, with low density for ingest performance and replication for high availability. In MarkLogic, moving data between local and shared storage is an online operation. There is no downtime, and all of the guarantees of ACID transactions hold.
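An age-based tiering policy like the one described can be sketched as a simple placement rule. The tier names and cutoffs below are invented for illustration; a real deployment would set them per SLA and configure them in the database, not in application code.

```python
from datetime import date, timedelta

def choose_tier(last_accessed, today):
    """Pick a storage tier from how recently the data was touched (toy rule)."""
    age = today - last_accessed
    if age <= timedelta(days=30):
        return "ssd"          # active data: fast, expensive, replicated
    if age <= timedelta(days=365):
        return "local-disk"   # historical data: cheaper, still online
    return "hdfs"             # long tail / archive: densest, cheapest

today = date(2014, 10, 1)
print(choose_tier(date(2014, 9, 20), today))  # ssd
print(choose_tier(date(2014, 3, 1), today))   # local-disk
print(choose_tier(date(2012, 1, 1), today))   # hdfs
```

The point of the article's approach is that a rule like this can run as policy inside the database, with data moving between tiers online rather than through an ETL job.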
An administrator can easily move data around to the most appropriate infrastructure without having to ETL the data between two environments or have developers change any downstream application code. It's the same executable with the same APIs on all these tiers, so you can write one app that runs across

them seamlessly and transparently. An administrator could arrange the database to query, for example, just the local disk for the latest transactions, just the shared storage for the long tail, or both together, with the database handling all of the nitty-gritty details of query federation and transactions. A tiered storage infrastructure with MarkLogic lets you fluidly and consistently switch among active, historical, and archive data without expensive ETL or dedicated infrastructure. You can perform mixed batch and real-time workloads with Hadoop MapReduce and the MarkLogic Enterprise NoSQL database.

CONNECTOR FOR COMPUTE INFRASTRUCTURE: MARKLOGIC & MAPREDUCE

MarkLogic can also use Hadoop as a compute layer. The MarkLogic Connector for Hadoop is a drop-in extension that integrates MarkLogic with MapReduce for ETL, analytics, or enrichment. For example, you can use the large ecosystem of Hadoop libraries to transform and aggregate data before loading it into MarkLogic. And the MarkLogic bulk loading tool, mlcp, schedules MapReduce jobs under the covers to load gigabytes, terabytes, or even petabytes in parallel. You can use Hadoop's powerful batch processing capabilities to enrich datasets or develop models before delivering them to real-time applications powered by MarkLogic, or even use MarkLogic to mark up content and then move it back into Hadoop. Finally, once data is indexed by MarkLogic and stored in its on-disk format, it's not locked away. Using a feature we call direct access, a Hadoop application can read the data in a MarkLogic data file without having to first mount it to a database. The implementation is very similar to formats like Parquet and ORC that are coming out of the Hadoop Hive community. Of course, if you want to leverage MarkLogic's sophisticated indexes and security model, you'll have to come in through the front door.
However, with direct access, a MapReduce job can efficiently read all of the data in a MarkLogic data file. This means you're able to index the data once for real-time queries and updates and leverage that same data format for large-scale batch processing. By using the same data format, you'll have fewer representations floating around and maintain a single version of the truth. You'll also reduce the amount of ETL required to translate data between operational and analytic environments.

SUMMARY

With MarkLogic, your Hadoop ecosystem has:
Less ETL
Data governance already built in
ACID compliance as part of the design
Schema agnosticism: no upfront data modeling
Elasticity: scale out everything when you need it, as you need it

Using Hadoop with MarkLogic's real-time Enterprise NoSQL database and tiered storage capabilities, you can build out automated business rules that move your data to the right place for storage. You can search, query, and use that data no matter where it is, whether in a disk array, in distributed commodity hardware in a Hadoop cluster, or even in the cloud, without having to move it to a data mart or reconstruct your applications, queries, security, or data governance. MarkLogic and Hadoop are complementary technologies that work well together for today's big data challenges. By combining MarkLogic and Hadoop:
You can build real-time enterprise applications for Hadoop-based data
You can leverage existing (or upcoming) infrastructure investments to save time and money
You will require less data movement and/or duplication over the data's lifecycle
You can support mixed workloads: index once, then serve real-time or batch
You will save money by using cost-effective long-term and long-tail storage

Adding MarkLogic to your Hadoop stack makes it better, helping you to deliver real-time big data applications, improve data governance, and save money.

MARKLOGIC

Sponsored Content

Creating Solutions to Meet Our Customers' Data and Analytics Challenges

THE INTERNET OF EVERYTHING (IOE)

People. Process. Data. Things. Yesterday, they functioned independently. Today, they need to function together through a combination of machine-to-machine, person-to-machine, and person-to-person connections. Creating new capabilities, richer experiences, and incredible economic opportunity, Cisco calls this the Internet of Everything (IoE). The IoE is creating more data, more types of data, in more places. While the IoE is making us all smarter, this wealth of data comes with two major challenges: 1) effective management of massive amounts and types of data in multiple locations, and 2) analyzing data quickly enough to respond to opportunities and threats. At Cisco, we designed our data and analytics solutions to meet these two challenges and are working to bring together data and analytics securely in a way no other company can. Not only do we connect more people, processes, data, and things than any other company, we can also bring analytics to data wherever it is, no matter how remote, to turn information into insights almost instantly. The first step begins with our agile data integration software, Cisco Data Virtualization. Our Data Virtualization technology abstracts data from multiple sources and transparently brings it together to give users a unified, friendly view of the data they need. By leveraging this technology with additional solutions, we help our customers access data across the IoE and use that data to respond quickly to change, gain competitive advantage, and drive better outcomes.

OFFLOADING DATA FOR EFFECTIVE MANAGEMENT

Driven by the massive amounts of data in today's IT environment, customers face huge expenses to add capacity to their existing enterprise data warehouses, the place in which data is traditionally stored.
Investments regularly reach into the millions of dollars for large deployments.[1] We help customers tackle the challenge of rising enterprise data warehouse costs with Cisco Big Data Warehouse Expansion (BDWE), a solution that assists customers with strategy, tools, and processes to extend the value of their traditional data warehouse investment. BDWE analyzes the warehouse, identifies infrequently used data, and provides a methodology and tools to offload that data onto Hadoop, avoiding additional capacity costs and extending the life of the data warehouse. By implementing an ongoing strategy to offload data from the primary system to Hadoop, our solution frees up resources, providing better overall system performance. Additionally, we deploy our Data Virtualization technology, which provides a layer of abstraction and simplified access spanning the original warehouse and the new Hadoop data store.

ENHANCED ANALYTICS WITH CURRENT AND HISTORICAL DATA

Many companies are forced by the economics of data management to implement aggressive information lifecycle management (ILM) policies, removing data from critical systems to avoid costs. With BDWE, we can help customers keep more data online and available for deeper and more insightful analytics, thereby adding value to the overall environment. Our strategy is to empower customers to store data in places that make sense for their business model and yet provide the ability to access that data, and abstract insights, from anywhere and in real time, in order to make key business decisions. BDWE enriches analytics with expanded data breadth, effectively allowing users to analyze not only recent data but also to get unprecedented access to all historical data. It improves analytics and data warehouse performance with its blend of best-in-class data, computing, and network infrastructures to drive accelerated performance and scalability.

[1] Source: documents/ema-cisco_composite-0614-ib.pdf
By leveraging massive data assets, our customers gain competitive advantage and achieve better business outcomes. Lastly, risk is reduced while advancing the company's data strategy with an end-to-end solution that uses proven software, network, and computing infrastructure to achieve stated data and business goals.

THE BIG PICTURE

The true value from big data and analytics comes from acting on the insights found when connecting the unconnected. Our ability to connect data across the network and bring analytics to the edge of the network allows our customers to take advantage of all their data assets and create unique business insights. This creates an eye-opening experience for customers, painting a full picture of their data assets and giving them the opportunity to run their business more efficiently.

CISCO

Interested in hearing more? Get the latest news from our Cisco Data Virtualization blog.

Sponsored Content

Hadoop Data Lakes: Incorporating an ODS With Data Integration to Ensure a Successful Initiative

Hadoop data lakes are a new and promising option for enterprise-wide analytics and business intelligence. The potential benefits are clear for lines of business, data scientists, and IT professionals alike. Data from disparate sources throughout the organization is proactively placed in the data lake; whenever a team or data scientist wants to run analysis, the information is ready and waiting. As analyst firm Gartner recently noted, data lakes eliminate the need to deal with dozens of independently managed collections of data. Instead, information is combined into a single data lake. From an IT perspective, Hadoop is an ideal platform to support data lakes, given its scalability and low cost. At first glance, data lakes seem like they could be nirvana for data scientists. From an implementation standpoint, incorporating an operational data store (ODS) within a data lake environment is a surefire way to deliver on the promise of increased agility through improved data accessibility. In a data supply chain that feeds a data lake architecture, an ODS holds a real-time copy of the organization's production data. Production data is the primary ingredient required in most business analytics projects, so aggregating three to twelve months' worth of this information in the ODS makes perfect sense. This gives data scientists free rein to explore production data as they see fit, test hypotheses, and embrace a fail-fast philosophy. Once data in the ODS exceeds the desired age, it can be moved into the Hadoop data lake for long-term archiving. In addition to the analytics benefits, capturing production data in the ODS ensures that organizations maintain access to production data without taxing production systems.
The key to implementing a successful data lake is simplifying its creation and maintenance by using an automated, high-performance data integration tool. Following are four data integration tips for implementing an ODS as part of a larger Hadoop data lake initiative:

1. Find a replication tool that will keep production data up to date. Including production data in a data lake supply chain is only useful if that information is kept as current as the systems that generate it. The best way to keep an ODS containing production data current is to use a replication tool that captures changes in the source systems as they occur and sends them to the ODS. This ensures not only that data scientists have problem-free access to the information they need, but also that they can access data reflecting the same version of the truth that the lines of business are working from.

2. Look for solutions that offer heterogeneous data support. Production data invariably comes from many different source systems. An automation tool with heterogeneous data support ensures that a wide range of production systems can be used as sources for the ODS.

3. Seek out tools with a simple, intuitive user interface. GUI-driven designs that simplify and virtualize operations are ideal. Ease of use means that operational data stores can be created in days or hours, rather than months. That translates into rapid return on investment (ROI).

4. Consider a solution like Attunity Replicate that makes it easy for teams to create an ODS and Hadoop data lake. Attunity Replicate supports homogeneous and heterogeneous IT environments. It also provides IT teams with a way to distribute information that is intuitive, high-performance, and cost-effective. Attunity Replicate's multi-server, multi-task, multi-threaded architecture is designed to scale and to support large-scale data replication and loading, ideal for supporting a successful modern data architecture.
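The change-data-capture pattern in tip 1 can be sketched as a toy apply loop. The event shapes here are invented and stand in for what a replication tool captures from source-system logs; they are not Attunity's API.

```python
# The operational data store, keyed by primary key.
ods = {}

def apply_change(store, event):
    """Apply one captured change event (hypothetical format) to the ODS."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        store[key] = event["row"]
    elif op == "delete":
        store.pop(key, None)

# A stream of changes as they occurred in the source system.
changes = [
    {"op": "insert", "key": 1, "row": {"status": "new"}},
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 2},
]
for event in changes:
    apply_change(ods, event)

print(ods)  # {1: {'status': 'shipped'}}
```

Because only the changes flow, the ODS stays current without repeatedly re-extracting full tables from the production systems, which is the article's argument for CDC over bulk reloads.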
To learn more, download the Attunity whitepaper "Making an Operational Data Store (ODS) the Center of Your Data Strategy."

ATTUNITY

For more information, visit or call (866) (toll free) / + 1 (781)

Sponsored Content

Hadoop in the Wild

AOL ADVERTISING

AOL Advertising is powered by one of the largest online ad serving platforms in the world, driving digital advertising campaigns that generate billions of impressions a month from hundreds of millions of visitors. It identifies the ad to serve by analyzing the visitor, his or her behavior, and the advertiser campaigns presently available. However, analyzing billions of data points to serve digital ads in real time is a challenge. How do you analyze billions of data points to identify ads for visitors? How do you access hundreds of millions of visitor profiles in real time? How do you ensure data is up to date? AOL Advertising integrated a high-performance database, Couchbase Server, with Hadoop. Couchbase Server provides real-time access to visitor profiles, while Hadoop provides offline analysis of clickstream data. The data is imported into Hadoop, where it is analyzed with MapReduce jobs to generate visitor profiles. The profiles are then imported into Couchbase Server to support current campaigns, and the ad server platform queries Couchbase Server for the data necessary to optimize ad placement in real time. With this solution, AOL Advertising solved the challenge of extracting information from clickstream data to support real-time personalization.

LIVEPERSON

LivePerson is a global leader in intelligent, online customer engagement. The LivePerson platform enables organizations to engage their customers via chat, voice, content, or video. The challenge is supporting over 8,500 customers, 2 billion sessions per month, and 22 million engagements per month while creating meaningful engagements. LivePerson integrated a high-performance messaging platform, a stream processing platform, a high-performance database, Couchbase Server, and Hadoop. Clickstream and interaction data is ingested via messaging with Apache Kafka.
The data is imported into Hadoop for business intelligence, reporting, and analysis. At the same time, the data is streamed through Apache Storm for real-time analysis, and the results are written to Couchbase Server for real-time access. As a result, LivePerson agents can monitor and engage customers based on real-time information; for example, an agent may engage a customer who is unable to get through the checkout process. Hadoop is the single source of truth: all data is imported into Hadoop. In addition, a predictive analytics engine accesses that data to improve future customer engagement; for example, agents can better understand when and how to engage customers based on previous engagements and behavior. With this solution, LivePerson solved the challenge of extracting information from previous customer engagements and behavior to improve real-time customer engagements.

PAYPAL
PayPal is a leader in online payments with a focus on multi-channel payments, financial flexibility separating purchase from payment, and a digital wallet for credit cards, loyalty cards, coupons, and more. The PayPal Media Network is a hyperlocal, geo-fenced ad network for delivering targeted offers to mobile platforms to help businesses increase in-person engagement. PayPal integrated a high-performance database, Couchbase Server, with Hadoop and more. Couchbase Server enables real-time access to relevant data, redemption processing, identity mapping, and the customer segmentation used to create profiles for targeted offers. All data is stored in Hadoop, where it is used for reporting, analysis, and more. In fact, PayPal relies on Hadoop for event analysis, sentiment analysis, customer segmentation, scoring, and recommendation in addition to real-time, location-based offers. PayPal leverages MapReduce jobs to preprocess, aggregate, and summarize data.
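The preprocess-aggregate-summarize step mentioned above follows the standard MapReduce shape: map each raw event to a (key, value) pair, group by key, and reduce each group to a summary. The sketch below simulates that shape in plain Python with hypothetical field names; in production the same mapper and reducer logic would run distributed on Hadoop (for example, via Hadoop Streaming).

```python
import itertools

def mapper(event):
    # emit one (key, value) pair per event: count events
    # per (visitor, offer category) combination
    yield (event["visitor"], event["category"]), 1

def reducer(key, values):
    # collapse all values for one key into a single summary
    return key, sum(values)

def map_reduce(events):
    # the "shuffle": sort mapped pairs so equal keys become adjacent,
    # then hand each group of values to the reducer
    mapped = sorted(kv for e in events for kv in mapper(e))
    return dict(
        reducer(key, (v for _, v in group))
        for key, group in itertools.groupby(mapped, key=lambda kv: kv[0])
    )

events = [
    {"visitor": "v1", "category": "dining"},
    {"visitor": "v1", "category": "dining"},
    {"visitor": "v1", "category": "retail"},
]
summary = map_reduce(events)
print(summary[("v1", "dining")])  # 2
```

Because the mapper and reducer are pure functions over their inputs, the same code scales from a single process to a cluster: Hadoop supplies the distributed sort-and-group, and the per-key logic is unchanged.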
With this solution, PayPal solved the challenge of extracting information from visitor behavior to improve real-time placement of location-based offers on mobile platforms.

COUCHBASE SERVER + HADOOP
AOL Advertising, LivePerson, and PayPal implemented real-time big data architectures with Couchbase Server and Hadoop to solve cloud, mobile, social, and big data challenges. Hadoop solves the challenge of analyzing large volumes of data; Couchbase Server solves the challenge of delivering real-time access to big data. Together, they serve as the foundation for real-time big data architectures.

COUCHBASE
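The serving-side read path is the same in all three case studies: profiles computed offline in Hadoop are bulk-loaded into a key-value store (Couchbase Server's role), and each request is answered with a single key lookup rather than an analytical query. The sketch below illustrates that pattern with an in-memory stand-in and hypothetical profile fields; it is not the Couchbase SDK API.

```python
class ProfileStore:
    """In-memory stand-in for a document store keyed by visitor ID."""

    def __init__(self):
        self._docs = {}

    def bulk_load(self, profiles):
        # periodic import of batch-computed profiles from Hadoop
        self._docs.update(profiles)

    def get(self, visitor_id):
        # constant-time read on the request path
        return self._docs.get(visitor_id)

store = ProfileStore()
store.bulk_load({
    "v1": {"segments": ["travel", "dining"], "score": 0.82},
    "v2": {"segments": ["retail"], "score": 0.41},
})

def choose_offer(visitor_id):
    # serving tier: one lookup, with a fallback for unknown visitors
    profile = store.get(visitor_id)
    if profile is None:
        return "default-offer"
    return f"offer-for-{profile['segments'][0]}"

print(choose_offer("v1"))  # offer-for-travel
print(choose_offer("v9"))  # default-offer
```

The division of labor is what makes the latency budget work: all expensive computation happens in the batch layer, so the request path never touches Hadoop at all.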


More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued

Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued 2 8 10 Issue 1 Welcome From the Gartner Files: Blueprint for Architecting Sensor Data for Big Data Analytics About OSIsoft,

More information

Master big data to optimize the oil and gas lifecycle

Master big data to optimize the oil and gas lifecycle Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

Tap into Big Data at the Speed of Business

Tap into Big Data at the Speed of Business SAP Brief SAP Technology SAP Sybase IQ Objectives Tap into Big Data at the Speed of Business A simpler, more affordable approach to Big Data analytics A simpler, more affordable approach to Big Data analytics

More information

Big Data, Big Banks and Unleashing Big Opportunities

Big Data, Big Banks and Unleashing Big Opportunities Big, Big Banks and Unleashing Big Opportunities Big, Big Banks and Unleashing Big Opportunities Big, Big Banks and Unleashing Big Opportunities A retailer using Big to the full could increase its operating

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information