Unleashing the Power of Hadoop for Big Data Analysis. Best Practices Series. Sponsors: MarkLogic (page 14), Cisco (page 17), Attunity (page 18), Couchbase (page 19).


Best Practices Series: Unleashing the Power of Hadoop for Big Data Analysis

MarkLogic (page 14): Making Hadoop Better With MarkLogic
Cisco (page 17): Creating Solutions to Meet Our Customers' Data and Analytics Challenges
Attunity (page 18): Hadoop Data Lakes: Incorporating an ODS With Data Integration to Ensure a Successful Initiative
Couchbase (page 19): Hadoop in the Wild

8 Steps to Unleashing the Power of Hadoop (Best Practices Series, DBTA, October/November 2014)

According to a recent DBTA survey, 30% of respondents report having the Hadoop framework in production today, and another 26% plan to acquire or implement the technology over the coming year. The survey shows that enterprises are adopting Hadoop for analytics/business intelligence, IT operational data requirements (logs, systems monitoring), and special projects. While the open source framework is popular and widely deployed, there is no shortage of anticipation, and confusion, about it.

Hadoop is well-designed for big data jobs, as it supports a variety of data types within its file system. However, it is not a panacea for organizations struggling to be more data-driven, and it often adds more of a burden than it resolves. For example, Hadoop environments often require highly specialized data science and development skills, which are in short supply. Governance is also an issue, since Hadoop projects often fall outside the established order of metadata management, data mappings, and data quality processes painstakingly put in place for relational database management and data warehouse sites. Plus, being a widely available open source product, Hadoop often gets implemented and run in various places across organizations, meaning that there may be multiple projects underway that potentially duplicate each other. The challenge is to bring Hadoop activities into existing IT environments; otherwise, the result is bottlenecks or even an inability to move data into and out of Hadoop environments. Data integration across new environments or application areas may also be a challenge, since Hadoop-based data may be managed within its own special silos. Real-time data is an issue as well.
While there are tools and updates that move Hadoop from batch to real-time mode, there may still be issues with complex hybrid queries requiring both historical and real-time data. Here are some pointers for successful Hadoop implementations:

Focus on business challenges that existing data platforms aren't adequately handling. Many decision makers are seeking insights from data that is beyond the reach of traditional data warehouse environments or relational databases, and that is expensive to incorporate. For example, weblog data may point to important trends in digital engagement, but there may be too much of it for the existing data infrastructure to support. Many departments have such requirements; meeting these requests is where Hadoop can deliver to the business.
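The weblog scenario above can be made concrete with a minimal, single-machine sketch of the map-and-reduce pattern Hadoop applies to such data at scale. The log format and values here are invented for illustration; a real job would run the same two phases distributed across a cluster.

```python
from collections import defaultdict

# Hypothetical weblog lines: "timestamp url visitor_id" (illustrative format).
LOG_LINES = [
    "2014-10-01T09:00 /products v1",
    "2014-10-01T09:01 /products v2",
    "2014-10-01T09:02 /checkout v1",
]

def map_phase(line):
    """Map: emit a (url, 1) pair for each log line."""
    _, url, _ = line.split()
    yield url, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each url key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

hits = reduce_phase(kv for line in LOG_LINES for kv in map_phase(line))
print(hits)  # {'/products': 2, '/checkout': 1}
```

The same mapper and reducer, handed to a framework such as Hadoop Streaming, would scale to log volumes a single warehouse server could not absorb.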

Make Hadoop real for the business. In recent years, data governance efforts have succeeded not only at ensuring that data moving through organizations is trustworthy but also that it serves the needs of its business owners. Hadoop, typically spun up in data centers as a pilot project or for specialized jobs, has been outside the data governance sphere. It's time to incorporate Hadoop into existing governance processes for data management, to establish an organizational mission for the framework along with business unit sponsorship. Departments that will work most closely with big data, such as marketing, sales, and IT, will then have a forum to collaborate and provide guidance. Such a framework also sets up a consistent process and methodology for vetting data and new projects. Another practice from the database and data warehouse world, centers of excellence, can shield growing Hadoop projects from intraorganizational squabbles. Remember that not every project is right for Hadoop.

Build out your Hadoop architecture incrementally. Start with smaller, "low-hanging fruit"-type projects that can be quickly turned around to show a win. While the framework is very effective at processing analytical jobs across various data formats, it may not be suitable for mixed workloads, such as combined historical and real-time queries. There are many places where relational databases still fill the bill, for example, where SQL statements or more complex queries are needed, or where the business case for unstructured data is still being defined.

Determine the types and sources of data to be flowed into Hadoop. There are many types of data, particularly in unstructured formats, that don't fit well into data warehouse or relational environments and can be offloaded into Hadoop; forcing them to fit may be prohibitively expensive. Ultimately, Hadoop offers a way to quickly access well-packaged files for rapid analysis.
Develop an architectural approach that fully incorporates Hadoop. Until now, vendors have been positioning Hadoop as a data source that feeds into a relational database management system or data warehouse. New thinking puts Hadoop at the core of such architectures, pulling in data from warehouses and existing databases.

Evaluate data integration strategies that can incorporate data managed within Hadoop. The framework needs to be designed into a data process flow as part of data's movement from original sources to analytical environments.

Get involved with Hadoop user groups or communities, and connect with other users. This is an opportunity to learn the latest thinking and evolving best practices. Talk to others within your industry to find out what works and what the potential pitfalls may be. Be sure to explore these issues with enterprises managing the same types of data. For example, manufacturers may be focused on data generated by machines, robots, and sensors, while publishers and media companies will be focused on content. Find out from others how Hadoop is being used to manage those types of files.

Look inside and outside for the right skills. Hadoop requires new sets of skills, both technical and problem-solving. There is a need for data scientists, analysts, and developers. The rise of Hadoop introduces new tools and languages, such as the R language, employed for statistical problem-solving, as well as tools such as MapReduce, YARN, and Apache Spark. Along with new technical skills, Hadoop requires looking at data differently: identifying analysis opportunities, uncovering hidden nuggets, and then communicating those findings to the business.
These skills may be hard to find on the open job market, but a fundamental understanding may be readily available within existing IT or data management departments. Many current data professionals are already performing the rudimentary tasks of data scientists and can be brought up to speed with further training and education.

Have Hadoop complement, not replace, existing data warehouses and data platforms. There's a role for Hadoop, and there's a role for existing data environments. Much of the data now coming in, due to cost or structure, may be better suited for storage within non-relational environments as the first option. The bottom line is that most enterprises are accumulating vast stores of both unstructured and structured data, which ultimately need to be integrated. The key challenge is to move away from point-to-point integration, which is not sustainable within big data environments and is often welded to specific applications, toward a well-architected data environment that can readily ingest and analyze massive and varied datasets. Hadoop makes such approaches possible, acting as a data staging area, operational data store, and even analytic sandbox.

Joe McKendrick

Sponsored Content

Making Hadoop Better With MarkLogic

When someone says "Hadoop," they typically mean an entire ecosystem of projects, all focused on dealing with big data through a framework of distributed processing over large groups of commodity machines. There's a lot of activity, a lot of choice, and quite a bit of confusion. With a large number of moving parts, and a large number of vendors providing Hadoop support and customized distributions, how do you decide where to begin?

The initial development of Hadoop, inspired by work at organizations like Google and Yahoo!, was a response to the inability to handle what we would now call big data in legacy RDBMSs or data warehouses. To date, there aren't many Hadoop-based applications in production to learn from, so many organizations are struggling to figure out how to get real benefit from it without hiring an army of IT staff and spending significant amounts of time and money. In this editorial, we'll explain the ways the MarkLogic platform can help you use Hadoop to deliver real-time big data applications, improve data governance, and save money.

HADOOP & MARKLOGIC: AN OVERVIEW

MarkLogic can be deployed against any of the leading commercial Hadoop distributions, allowing administrators to leverage existing infrastructure. Though the MarkLogic Enterprise NoSQL platform is not dependent on Hadoop, MarkLogic and Hadoop function in a complementary manner in a big data ecosystem. MarkLogic works with the two core elements of Hadoop, the Hadoop Distributed File System (HDFS) and MapReduce, which are the most mature parts of the ecosystem and the foundation for all the other pieces. HDFS provides storage for data that is too large or unpredictable for traditional databases or data warehouses. While it's not designed for real-time data access that requires indexes and interactive query capabilities, HDFS is a cost-effective way to keep data that may otherwise have been discarded or archived to tape.
MapReduce performs distributed computation on the data stored in HDFS. It's useful for batch processing where you need to perform analytics or enrichment on massive datasets, but what if you need to give users the ability to quickly find specific pieces of data and make granular updates to the data in real time? If you need to do near-instantaneous analysis and alerting for fraud detection, emergency crisis management, or risk mitigation and assessment, can you afford the time it would take for a MapReduce job to complete?

Hadoop has three primary use cases in the enterprise:
Staging: accommodate any shape of data relatively cheaply
Persistence: keep the raw input for analytics without losing the original context
Analytics: perform large-scale analytics on raw or prepared data

However, Hadoop alone cannot provide the real-time applications or the governance around data that enterprises require today. MarkLogic makes Hadoop better by bringing the power of Enterprise NoSQL to address these limitations. MarkLogic is unique in the marketplace in providing the best of NoSQL while also being a hardened and proven enterprise-class database technology. Created in 2001 to fill the need within enterprise organizations and government entities to store, manage, query, and search data, no matter the format or structure, MarkLogic has these NoSQL characteristics:
Flexible, with a schema-agnostic document data model (JSON, XML, text, binary, RDF triples)
Fast, implemented in C++ and optimized for today's I/O systems
Scalable, leveraging a shared-nothing distributed architecture and lock-free reads

MarkLogic is also highly available, with transactional consistency, automatic failover, and replication.
As an Enterprise NoSQL database platform, MarkLogic was designed from the start to support enterprise-class and enterprise-scale application requirements, including:
ACID (atomic, consistent, isolated, and durable) transactions, just like you get from a relational DBMS
Government-grade security features, including fine-grained privileges, role-based security, document-level permissions, and HTTPS access
Real-time indexing, full-text search, geospatial search, semantic search, and alerting
Proven reliability and uptime, with over 500 deployed mission-critical and enterprise projects in government, media, financial services, energy, and other industries

REAL-TIME APPLICATIONS: MARKLOGIC IS THE BEST DATABASE FOR HADOOP

First things first: HDFS is a cost-effective file system, but it has no indexes, so finding an individual record typically involves scanning through every record in a large file. That might be okay for large-scale analytics, where the computation might need to read every record, but it can't support the low-latency queries and granular updates required for real-time workloads and end-user applications. For that, you need a database. Hadoop alone is not equipped for this type of workload.
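The access-pattern argument above, a full file scan versus an indexed lookup, can be shown in a few lines. The records are invented; this illustrates only the general contrast, not MarkLogic or HDFS internals.

```python
# A pile of records, as they might sit in a flat file on HDFS.
records = [{"id": i, "value": f"rec-{i}"} for i in range(100_000)]

def full_scan(records, target_id):
    """Find one record the way a raw file scan must: read until it appears."""
    for rec in records:
        if rec["id"] == target_id:
            return rec
    return None

# A database builds an index once at ingest, then answers lookups in O(1).
index = {rec["id"]: rec for rec in records}

# Both find the same record; only the index avoids touching every row.
assert full_scan(records, 99_999) == index[99_999]
```

For a batch analytics job that reads every record anyway, the scan costs nothing extra; for a user-facing query, the difference is the whole point of putting a database in front of the file system.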

Sponsored Content

[Figure: MarkLogic & Hadoop: Complementary Big Data Capabilities. MarkLogic: online applications, decision-making, real-time, distributed indexes. Hadoop: offline analytics, model-building, long-haul batch, distributed file system.]

The popular tech press would have you think it's a stark trade-off between legacy relational databases, which provide indexes, transactions, security, and enterprise operations, and open source NoSQL databases like HBase, which offer a flexible data model and commodity scale-out while being distributed and fault-tolerant, but are less mature in their enterprise roadmap. What if you could have the best of both worlds? With MarkLogic, you get all of the scalability on commodity hardware that has come to define the NoSQL space. However, you don't have to sacrifice the enterprise capabilities, such as ACID transactions, security, high availability, and disaster recovery, that your mission-critical applications require. This is why we believe MarkLogic is the best database for Hadoop.

TIERED STORAGE: COST-EFFECTIVE SUPPORT FOR A VARIETY OF SLAS

Next, what if you could segregate your data, aligning how it's stored with its value, and still make it available whenever it's needed? All data is valuable, but the value of data may vary based on business need at a given time. In a typical organization, a small amount of data accounts for most of the value, for example, current transactions or the latest news. This is the data that requires high availability and interactive response times. However, as data ages along the long tail, its access patterns change. Historical data is typically not the data you are running your business on. You may need to keep it around for regulatory compliance or reporting, but it's likely not something that needs millisecond interactivity or high availability. Economically, it makes sense to pack this data more densely on cheaper storage.
Finally, the economics of storage and compute have allowed organizations to keep the long tail around. Much of this data may not need to be online or immediately queryable, but it should be accessible to quickly spin up for analysis and then spin down again to conserve compute resources. MarkLogic allows you to store data across different types of storage. This, in itself, is not a new capability; database and storage vendors have been offering hierarchical storage and information lifecycle management for years. What differentiates our offering is the ability to easily and consistently move data between tiers without complicated and expensive ETL and data duplication: index data once when it's first ingested, and leverage those indexes for search and analytics no matter where the data is stored. By allowing your data to live in the most appropriate tier of infrastructure, you can save money while still providing appropriate performance and availability for applications. Aligning a storage strategy with the value and use of your data allows you to make smarter tradeoffs among cost, performance, and availability. You can implement a data governance policy and deploy MarkLogic using a fluid mix of SSD, local disk, shared disk, and HDFS, as well as Amazon EBS and S3. MarkLogic is unique in that you can run a database on a mix of locally attached storage and shared storage. For example, you can benefit from less expensive Hadoop storage for archive data, with high density for efficiency and shared-disk failover, while using another tier of more expensive storage for active data, with low density for ingest performance and replication for high availability. In MarkLogic, moving data between local and shared storage is an online operation. There is no downtime, and all of the guarantees of ACID transactions hold.
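An age-based tiering policy like the one described can be sketched as a simple placement rule. The tier names and cutoffs below are invented for illustration; a real deployment would set them per SLA and configure them in the database, not in application code.

```python
from datetime import date, timedelta

def choose_tier(last_accessed, today):
    """Pick a storage tier from how recently the data was touched (toy rule)."""
    age = today - last_accessed
    if age <= timedelta(days=30):
        return "ssd"          # active data: fast, expensive, replicated
    if age <= timedelta(days=365):
        return "local-disk"   # historical data: cheaper, still online
    return "hdfs"             # long tail / archive: densest, cheapest

today = date(2014, 10, 1)
print(choose_tier(date(2014, 9, 20), today))  # ssd
print(choose_tier(date(2014, 3, 1), today))   # local-disk
print(choose_tier(date(2012, 1, 1), today))   # hdfs
```

The point of the article's approach is that a rule like this can run as policy inside the database, with data moving between tiers online rather than through an ETL job.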
An administrator can easily move data around to the most appropriate infrastructure without having to ETL the data between two environments or have developers change any downstream application code. It's the same executable with the same APIs on all these tiers, so you can write one app that runs across

them seamlessly and transparently. An administrator could arrange the database to query, for example, just the local disk for the latest transactions, just the shared storage for the long tail, or both together, with the database handling all of the nitty-gritty details of query federation and transactions. A tiered storage infrastructure with MarkLogic lets you fluidly and consistently switch among active, historical, and archive data without expensive ETL or dedicated infrastructure. You can perform mixed batch and real-time workloads with Hadoop MapReduce and the MarkLogic Enterprise NoSQL database.

CONNECTOR FOR COMPUTE INFRASTRUCTURE: MARKLOGIC & MAPREDUCE

MarkLogic can also use Hadoop as a compute layer. The MarkLogic Connector for Hadoop is a drop-in extension that integrates MarkLogic with MapReduce for ETL, analytics, or enrichment. For example, you can use the large ecosystem of Hadoop libraries to transform and aggregate data before loading it into MarkLogic. And the MarkLogic bulk loading tool, mlcp, schedules MapReduce jobs under the covers to load gigabytes, terabytes, or even petabytes in parallel. You can use Hadoop's powerful batch processing capabilities to enrich datasets or develop models before delivering them to real-time applications powered by MarkLogic, or even use MarkLogic to mark up content and then move it back into Hadoop. Finally, once data is indexed by MarkLogic and stored in its on-disk format, it's not locked away. Using a feature we call direct access, a Hadoop application can read the data in a MarkLogic data file without having to first mount it to a database. The implementation is very similar to formats like Parquet and ORC that are coming out of the Hadoop Hive community. Of course, if you want to leverage MarkLogic's sophisticated indexes and security model, you'll have to come in through the front door.
However, with direct access, a MapReduce job can efficiently read all of the data in a MarkLogic data file. This means you're able to index the data once for real-time queries and updates and leverage that same data format for large-scale batch processing. By using the same data format, you'll have fewer representations floating around and maintain a single version of the truth. You'll also reduce the amount of ETL required to translate data between operational and analytic environments.

SUMMARY

With MarkLogic, your Hadoop ecosystem has:
Less ETL
Data governance already built in
ACID compliance as part of the design
Schema agnosticism: no upfront data modeling
Elasticity: scale out everything when you need it, as you need it

Using Hadoop with MarkLogic's real-time Enterprise NoSQL database and tiered storage capabilities, you can build out automated business rules that move your data to the right place for storage. You can search, query, and use that data no matter where it is, whether in a disk array, in distributed commodity hardware in a Hadoop cluster, or even in the cloud, without having to move it to a data mart or reconstruct your applications, queries, security, or data governance. MarkLogic and Hadoop are complementary technologies that work well together for today's big data challenges. By combining MarkLogic and Hadoop:
You can build real-time enterprise applications for Hadoop-based data
You can leverage existing (or upcoming) infrastructure investments to save time and money
You will require less data movement and/or duplication over the data's lifecycle
You can support mixed workloads: index once, then serve real-time or batch
You will save money by using cost-effective long-term and long-tail storage

Adding MarkLogic to your Hadoop stack makes it better, helping you to deliver real-time big data applications, improve data governance, and save money.

MARKLOGIC

Sponsored Content

Creating Solutions to Meet Our Customers' Data and Analytics Challenges

THE INTERNET OF EVERYTHING (IOE)

People. Process. Data. Things. Yesterday, they functioned independently. Today, they need to function together through a combination of machine-to-machine, person-to-machine, and person-to-person connections. Creating new capabilities, richer experiences, and incredible economic opportunity, Cisco calls this the Internet of Everything (IoE). The IoE is creating more data, more types of data, in more places. While the IoE is making us all smarter, this wealth of data comes with two major challenges: 1) effective management of massive amounts and types of data in multiple locations, and 2) analyzing data quickly enough to respond to opportunities and threats. At Cisco, we designed our data and analytics solutions to meet these two challenges and are working to bring together data and analytics securely in a way no other company can. Not only do we connect more people, processes, data, and things than any other company, we can also bring analytics to data wherever it is, no matter how remote, to turn information into insights almost instantly. The first step begins with our agile data integration software, Cisco Data Virtualization. Our Data Virtualization technology abstracts data from multiple sources and transparently brings it together to give users a unified, friendly view of the data they need. By leveraging this technology with additional solutions, we help our customers access data across the IoE and use that data to respond quickly to change, gain competitive advantage, and drive better outcomes.

OFFLOADING DATA FOR EFFECTIVE MANAGEMENT

Driven by the massive amounts of data in today's IT environment, customers face huge expenses to add capacity to their existing enterprise data warehouses, the place in which data is traditionally stored.
Investments regularly reach into the millions of dollars for large deployments.[1] We help customers tackle the challenge of rising enterprise data warehouse costs with Cisco Big Data Warehouse Expansion (BDWE), a solution that assists customers with strategy, tools, and processes to extend the value of their traditional data warehouse investment. BDWE analyzes the warehouse, identifies infrequently used data, and provides a methodology and tools to offload that data onto Hadoop, avoiding additional capacity costs and extending the life of the data warehouse. By implementing an ongoing strategy to offload data from the primary system to Hadoop, our solution frees up resources, providing better overall system performance. Additionally, we deploy our Data Virtualization technology, which provides a layer of abstraction and simplified access spanning the original warehouse and the new Hadoop data store.

ENHANCED ANALYTICS WITH CURRENT AND HISTORICAL DATA

Many companies are forced by the economics of data management to implement aggressive information lifecycle management (ILM) policies, removing data from critical systems to avoid costs. With BDWE, we can help customers keep more data online and available for deeper and more insightful analytics, thereby adding value to the overall environment. Our strategy is to empower customers to store data in places that make sense for their business model and yet provide the ability to access that data, and abstract insights, from anywhere and in real time, in order to make key business decisions. BDWE enriches analytics with expanded data breadth, effectively allowing users to analyze not only recent data but also to get unprecedented access to all historical data. It improves analytics and data warehouse performance with its blend of best-in-class data, computing, and network infrastructures to drive accelerated performance and scalability.

[1] Source: documents/ema-cisco_composite-0614-ib.pdf
By leveraging massive data assets, our customers gain competitive advantage and achieve better business outcomes. Lastly, risk is reduced while advancing the company's data strategy with an end-to-end solution that uses proven software, network, and computing infrastructure to achieve stated data and business goals.

THE BIG PICTURE

The true value from big data and analytics comes from acting on the insights found when connecting the unconnected. Our ability to connect data across the network and bring analytics to the edge of the network allows our customers to take advantage of all their data assets and create unique business insights. This creates an eye-opening experience for customers, painting a full picture of their data assets and giving them the opportunity to run their business more efficiently.

CISCO

Interested in hearing more? Get the latest news from our Cisco Data Virtualization blog.

Sponsored Content

Hadoop Data Lakes: Incorporating an ODS With Data Integration to Ensure a Successful Initiative

Hadoop data lakes are a new and promising option for enterprise-wide analytics and business intelligence. The potential benefits are clear for lines of business, data scientists, and IT professionals alike. Data from disparate sources throughout the organization is proactively placed in the data lake; whenever a team or data scientist wants to run analysis, the information is ready and waiting. As analyst firm Gartner recently noted, data lakes eliminate the need to deal with dozens of independently managed collections of data. Instead, information is combined into a single data lake. From an IT perspective, Hadoop is an ideal platform to support data lakes, given its scalability and low cost. At first glance, data lakes seem like they could be nirvana for data scientists. From an implementation standpoint, incorporating an operational data store (ODS) within a data lake environment is a surefire way to deliver on the promise of increased agility through improved data accessibility. In a data supply chain that feeds a data lake architecture, an ODS holds a real-time copy of the organization's production data. Production data is the primary ingredient required in most business analytics projects, so aggregating three to twelve months' worth of this information in the ODS makes perfect sense. This gives data scientists free rein to explore production data as they see fit, test hypotheses, and embrace a fail-fast philosophy. Once data in the ODS exceeds the desired age, it can be moved into the Hadoop data lake for long-term archiving. In addition to the analytics benefits, capturing production data in the ODS ensures that organizations maintain access to production data without taxing production systems.
The key to implementing a successful data lake is simplifying its creation and maintenance by using an automated, high-performance data integration tool. Following are four data integration tips for implementing an ODS as part of a larger Hadoop data lake initiative:

1. Find a replication tool that will keep production data up to date. Including production data in a data lake supply chain is only useful if that information is kept as current as the systems that generate it. The best way to keep an ODS containing production data current is to use a replication tool that captures changes in the source systems as they occur and sends them to the ODS. This ensures not only that data scientists have problem-free access to the information they need, but also that they can access data reflecting the same version of the truth that the lines of business are working from.

2. Look for solutions that offer heterogeneous data support. Production data invariably comes from many different source systems. An automation tool with heterogeneous data support ensures that a wide range of production systems can be used as sources for the ODS.

3. Seek out tools with a simple, intuitive user interface. GUI-driven designs that simplify and virtualize operations are ideal. Ease of use means that operational data stores can be created in days or hours, rather than months. That translates into rapid return on investment (ROI).

4. Consider a solution like Attunity Replicate that makes it easy for teams to create an ODS and Hadoop data lake. Attunity Replicate supports homogeneous and heterogeneous IT environments. It also provides IT teams with a way to distribute information that is intuitive, high-performance, and cost-effective. Attunity Replicate's multi-server, multi-task, multi-threaded architecture is designed to scale and to support large-scale data replication and loading, ideal for supporting a successful modern data architecture.
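The change-data-capture pattern in tip 1 can be sketched as a toy apply loop. The event shapes here are invented and stand in for what a replication tool captures from source-system logs; they are not Attunity's API.

```python
# The operational data store, keyed by primary key.
ods = {}

def apply_change(store, event):
    """Apply one captured change event (hypothetical format) to the ODS."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        store[key] = event["row"]
    elif op == "delete":
        store.pop(key, None)

# A stream of changes as they occurred in the source system.
changes = [
    {"op": "insert", "key": 1, "row": {"status": "new"}},
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 2},
]
for event in changes:
    apply_change(ods, event)

print(ods)  # {1: {'status': 'shipped'}}
```

Because only the changes flow, the ODS stays current without repeatedly re-extracting full tables from the production systems, which is the article's argument for CDC over bulk reloads.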
To learn more, download the Attunity whitepaper "Making an Operational Data Store (ODS) the Center of Your Data Strategy."

ATTUNITY

For more information, visit or call (866) (toll free) / + 1 (781)

Sponsored Content

Hadoop in the Wild

AOL ADVERTISING

AOL Advertising is powered by one of the largest online ad serving platforms in the world, driving digital advertising campaigns that generate billions of impressions a month from hundreds of millions of visitors. It identifies the ad to serve by analyzing the visitor, his or her behavior, and the advertiser campaigns presently available. However, analyzing billions of data points to serve digital ads in real time is a challenge. How do you analyze billions of data points to identify ads for visitors? How do you access hundreds of millions of visitor profiles in real time? How do you ensure data is up to date? AOL Advertising integrated a high-performance database, Couchbase Server, with Hadoop. Couchbase Server provides real-time access to visitor profiles, while Hadoop provides offline analysis of clickstream data. The data is imported into Hadoop, where it is analyzed with MapReduce jobs to generate visitor profiles. The profiles are then imported into Couchbase Server to support current campaigns, and the ad server platform queries Couchbase Server for the data necessary to optimize ad placement in real time. With this solution, AOL Advertising solved the challenge of extracting information from clickstream data to support real-time personalization.

LIVEPERSON

LivePerson is a global leader in intelligent, online customer engagement. The LivePerson platform enables organizations to engage their customers via chat, voice, content, or video. The challenge is supporting over 8,500 customers, 2 billion sessions per month, and 22 million engagements per month while creating meaningful engagements. LivePerson integrated a high-performance messaging platform, a stream processing platform, a high-performance database, Couchbase Server, and Hadoop. Clickstream and interaction data is ingested via messaging with Apache Kafka.
The data is imported into Hadoop for business intelligence, reporting, and analysis. At the same time, the data is streamed through Apache Storm for real-time analysis, and the results are written to Couchbase Server for real-time access. As a result, LivePerson agents can monitor and engage customers based on real-time information; for example, an agent may engage a customer who is unable to get through the checkout process. Hadoop is the single source of truth: all data is imported into Hadoop. In addition, a predictive analytics engine accesses that data to improve future customer engagement; for example, agents can better understand when and how to engage customers based on previous engagements and behavior. With this solution, LivePerson solved the challenge of extracting information from previous customer engagements and behavior to improve real-time customer engagements.

PAYPAL
PayPal is a leader in online payments with a focus on multi-channel payments, financial flexibility separating purchase from payment, and a digital wallet for credit cards, loyalty cards, coupons, and more. The PayPal Media Network is a hyperlocal, geo-fenced ad network for delivering targeted offers to mobile platforms to help businesses increase in-person engagement. PayPal integrated a high-performance database, Couchbase Server, with Hadoop and more. Couchbase Server enables real-time access to relevant data, redemption processing, identity mapping, and the customer segmentation used to create profiles for targeted offers. All data is stored in Hadoop, where it is used for reporting, analysis, and more. In fact, PayPal relies on Hadoop for event analysis, sentiment analysis, customer segmentation, scoring, and recommendation in addition to real-time, location-based offers. PayPal leverages MapReduce jobs to preprocess, aggregate, and summarize data.
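The preprocess-aggregate-summarize step mentioned above follows the standard MapReduce shape: map each raw event to a (key, value) pair, group by key, and reduce each group to a summary. The sketch below simulates that shape in plain Python with hypothetical field names; in production the same mapper and reducer logic would run distributed on Hadoop (for example, via Hadoop Streaming).

```python
import itertools

def mapper(event):
    # emit one (key, value) pair per event: count events
    # per (visitor, offer category) combination
    yield (event["visitor"], event["category"]), 1

def reducer(key, values):
    # collapse all values for one key into a single summary
    return key, sum(values)

def map_reduce(events):
    # the "shuffle": sort mapped pairs so equal keys become adjacent,
    # then hand each group of values to the reducer
    mapped = sorted(kv for e in events for kv in mapper(e))
    return dict(
        reducer(key, (v for _, v in group))
        for key, group in itertools.groupby(mapped, key=lambda kv: kv[0])
    )

events = [
    {"visitor": "v1", "category": "dining"},
    {"visitor": "v1", "category": "dining"},
    {"visitor": "v1", "category": "retail"},
]
summary = map_reduce(events)
print(summary[("v1", "dining")])  # 2
```

Because the mapper and reducer are pure functions over their inputs, the same code scales from a single process to a cluster: Hadoop supplies the distributed sort-and-group, and the per-key logic is unchanged.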
With this solution, PayPal solved the challenge of extracting information from visitor behavior to improve real-time placement of location-based offers on mobile platforms.

COUCHBASE SERVER + HADOOP
AOL Advertising, LivePerson, and PayPal implemented real-time big data architectures with Couchbase Server and Hadoop to solve cloud, mobile, social, and big data challenges. Hadoop solves the challenge of analyzing large volumes of data; Couchbase Server solves the challenge of delivering real-time access to big data. Together, they serve as the foundation for real-time big data architectures.

COUCHBASE
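The serving-side read path is the same in all three case studies: profiles computed offline in Hadoop are bulk-loaded into a key-value store (Couchbase Server's role), and each request is answered with a single key lookup rather than an analytical query. The sketch below illustrates that pattern with an in-memory stand-in and hypothetical profile fields; it is not the Couchbase SDK API.

```python
class ProfileStore:
    """In-memory stand-in for a document store keyed by visitor ID."""

    def __init__(self):
        self._docs = {}

    def bulk_load(self, profiles):
        # periodic import of batch-computed profiles from Hadoop
        self._docs.update(profiles)

    def get(self, visitor_id):
        # constant-time read on the request path
        return self._docs.get(visitor_id)

store = ProfileStore()
store.bulk_load({
    "v1": {"segments": ["travel", "dining"], "score": 0.82},
    "v2": {"segments": ["retail"], "score": 0.41},
})

def choose_offer(visitor_id):
    # serving tier: one lookup, with a fallback for unknown visitors
    profile = store.get(visitor_id)
    if profile is None:
        return "default-offer"
    return f"offer-for-{profile['segments'][0]}"

print(choose_offer("v1"))  # offer-for-travel
print(choose_offer("v9"))  # default-offer
```

The division of labor is what makes the latency budget work: all expensive computation happens in the batch layer, so the request path never touches Hadoop at all.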


More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued

Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued 2 8 10 Issue 1 Welcome From the Gartner Files: Blueprint for Architecting Sensor Data for Big Data Analytics About OSIsoft,

More information

Master big data to optimize the oil and gas lifecycle

Master big data to optimize the oil and gas lifecycle Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

Tap into Big Data at the Speed of Business

Tap into Big Data at the Speed of Business SAP Brief SAP Technology SAP Sybase IQ Objectives Tap into Big Data at the Speed of Business A simpler, more affordable approach to Big Data analytics A simpler, more affordable approach to Big Data analytics

More information

Big Data, Big Banks and Unleashing Big Opportunities

Big Data, Big Banks and Unleashing Big Opportunities Big, Big Banks and Unleashing Big Opportunities Big, Big Banks and Unleashing Big Opportunities Big, Big Banks and Unleashing Big Opportunities A retailer using Big to the full could increase its operating

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information