White Paper

Crowd Sourcing Reflected Intelligence Using Search and Big Data

How LucidWorks and MapR can reflect crowd-sourced intelligence by leveraging Lucene/Solr

A white paper by Grant Ingersoll, Chief Scientist for LucidWorks, and Ted Dunning, Chief Application Architect for MapR
LucidWorks & MapR: Crowd-Sourced Intelligence

Abstract

This white paper explores how search has evolved in recent years beyond keyword search into a more broadly applicable information discovery tool by using principles of reflected intelligence. The paper then demonstrates how several organizations combine big data, search, and reflected intelligence to improve search results and decision-making. It concludes with a discussion of how LucidWorks and MapR work together to make this possible and how organizations can get started using reflected intelligence in their search applications.

The Evolution of Search

Search has become a mainstream and integral part of our daily lives. It is helpful to remember, however, that it wasn't always this way. In the early days of the Internet, tools like Archie, Veronica, and Jughead emerged to search for particular file names stored on FTP servers and Gopher listings. Once the World Wide Web was established with the release of the first browser and server code from CERN in 1992, search engines like WebCrawler, Lycos, Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista emerged to help unlock the information stored across what was, at the time, thousands of Web servers with perhaps hundreds of thousands of pages of information. As the Internet continued to grow exponentially, Google's innovative PageRank algorithm allowed the company to break away from the pack and eventually become synonymous with search on the Web. While this was all developing on the public Web, companies began to realize that they also had vast stores of information, both structured (enterprise applications, databases, and spreadsheets) and semi-structured (emails, documents, presentations, multimedia, etc.), and that search technology could provide an effective means to uncover insights and correlations about things like customers, products, and markets.
Enter Lucene/Solr and LucidWorks

Lucene was accepted into the Apache Software Foundation in 2001 and became its own top-level project in 2005. By 2007, companies such as Netflix, eHarmony, and Cisco, among others, began adopting Apache Lucene as an open source alternative to proprietary search engines for both internal and customer-facing applications. Apache Solr began as an internal project at CNET to provide a better search experience for their web visitors. Leveraging Lucene's search algorithms as its internals, Solr added a search server and significant additional capabilities. Solr was donated to the Apache Software Foundation in 2006 and over the last several years has become tightly coupled with Lucene. Lucene/Solr has become the de facto standard in open source search, in use at tens of thousands of companies and supported by a thriving developer community that numbers in the thousands.

Along Comes Hadoop and MapR

In 2006, the Lucene project spawned a sub-project by the name of Hadoop. By early 2008, it had become a top-level Apache project, and it is now a de facto standard for big-data analysis. What began as a way to make the Nutch web crawler scale to handle larger and larger crawling jobs has since morphed into a general-purpose, distributed file system and computation framework used in a wide variety of large-scale applications such as log processing, data warehousing, and much, much more.

Improving the Search Experience: The Next Frontier

Products such as Apache Lucene/Solr and LucidWorks Search have commoditized enterprise search, making it extraordinarily easy to deploy and to obtain highly relevant search results. As a result of LucidWorks' substantial investment, it no longer requires a search engine expert or developer to stand up an enterprise search server and tune it for optimal performance. Search professionals no longer need to focus on the basics of search, like how to index and search content.
Instead, implementers can focus on the real challenge of improving the user's application experience by exploiting the intersection of content, content relationships, user interactions, and access. In most cases, exploiting this information requires a big data solution, due to the large volumes of data and user interactions seen in many applications. Even if there isn't a large amount of content, distributed computation is often useful for speeding up computationally intensive tasks like natural language processing.
Search Abuse: An Evolutionary Step Forward

As search has evolved beyond simple text retrieval, it has emerged as a building block for addressing tougher challenges like fuzziness, relevance ranking, and probabilities, all across data stores that include structured and semi-structured information. Search abuse is the notion that software intended for one thing (text retrieval) is re-purposed as a building block for another (non-textual analytics). Specifically, all kinds of data, including structured data or records of user behavior, can be analyzed by a search engine as a component of a larger system. These kinds of solutions are possible because the underlying algorithms and data structures that power a search engine can effectively be seen as a sparse matrix multiplication, and it just so happens that sparse matrix multiplication is often what is needed to power many of these next-generation, data-driven applications. It turns out that there are many places to use non-textual information in search applications: to perform NoSQL types of data retrieval, to do large-scale machine learning, recommendations, and much, much more.

The Intersection of Search and Big Data

Things begin to get really interesting when a search engine is used to analyze the behavior of users who are themselves using a search engine. Search experts compare keyword search to dehydrated food: the basic nutrition is there, but it is not easily accessible until water is added. In this case, the water is behavioral information. Armed with other data sources, such as clicks, mouse tracks, ratings, and reviews, keyword search can be augmented to enable information discovery and, ultimately, better decision-making. This is where tools like MapR's distribution for Hadoop enter into the search equation.
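The sparse-matrix view of a search engine can be made concrete with a minimal sketch (illustrative only, not Lucene/Solr internals): an inverted index is a sparse term-by-document matrix, and scoring a query is a sparse matrix-vector product, accumulating term weights into document scores.

```python
# An inverted index viewed as a sparse term-by-document matrix: each term
# (row) maps only to the documents (columns) where its weight is non-zero.
index = {
    "brakes": {"doc1": 2.0, "doc3": 1.0},
    "failed": {"doc1": 1.5},
    "sedan":  {"doc2": 1.0, "doc3": 0.5},
}

def score(index, query_weights):
    """Scoring as a sparse matrix-vector product: for each query term,
    accumulate query_weight * doc_weight into the matching documents."""
    scores = {}
    for term, q_w in query_weights.items():
        for doc, d_w in index.get(term, {}).items():
            scores[doc] = scores.get(doc, 0.0) + q_w * d_w
    # Rank documents by descending score, as a search engine would.
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(score(index, {"brakes": 1.0, "failed": 1.0}))
# → [('doc1', 3.5), ('doc3', 1.0)]
```

Nothing in this loop is specific to text: if the "terms" are user IDs, products, or events, the same machinery computes behavioral correlations, which is precisely the re-purposing that search abuse describes.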
By including all of this meta-information about user behavior, search engines can find interesting patterns and correlations that feed directly back into search results. The end result is that the search system can reflect the behavior of subject matter experts back at other users who lack some of that training or experience. The system appears to users to act intelligently because it is reflecting intelligent action back at them. Big data plays a role in large-scale analysis as well, by producing clusters, identifying trends and topics, and finding statistically interesting phrases, similar documents, and many other things that require an aggregate view of the data. These large-scale discovery components can encourage system users to experiment with the data and can lead to a virtuous cycle: more people do search and discovery, and their behavior contributes to improving the search results and the insights that can be derived. Similar to the Agile method of software engineering, where organizations are always in development cycles, search can always be refining results based on user behavior. And, with potentially millions of users hitting a system, some subset will behave in clever ways that can be reflected back to general users and make them more productive. This makes the system appear intelligent, although it is simply reflecting back the intelligence of others.

Customer Examples and Use Cases

Reflected intelligence has utility in a wide variety of situations, as reflected in examples from industries such as telecom, advertising, banking and insurance, education, government, and entertainment. Most of these use cases are not typical search applications; there are no users entering search terms into text boxes. Instead, they largely use search components as pivotal elements in adding value to content that already exists.
Social Media in Telecom

Social media has evolved to become a key component of marketing for many types of companies and organizations. The first use of search as it relates to social media has been to find mentions of the company across social media sources such as Twitter or Instagram. The true power of search is revealed in cases where a company can make operational decisions based on insights derived from search. An example is a major telecom provider that mines social media and correlates it with cell tower data to predict additional capacity demands for sporting events, music festivals, emergencies, etc.
Social Media Analysis for Advertising

Typical television advertising is done using a scattershot approach where targeting is based on demographic data that paints a broad picture of an audience, e.g., affluent women in a particular age range. As a result, ad placement pricing is based on reaching a portion of this very broad segment. It is estimated that up to a 5x pricing multiple could be achieved if the ads were backed up by good analysis of who was actually watching, when they were watching, and what they thought about the advertising and brand. By combining insights from social media, advertisers can get as much as 80% of the total value of the ad from this analysis, as compared to the ad itself.

Insurance Claims Processing and Analysis

Insurance companies always want a better understanding of the claims they are processing, whether it is to detect fraud or to identify new trends or patterns that emerge from the pool of claims they see. Typical auto insurance claims include both tabular, attributed data, such as make, model, year, and price, and semi-structured data such as police reports, eyewitness reports, and victim reports. In the traditional data warehouse approach, analysts could ask questions about the attributed data but had no means to combine, rank, or facet on the complete picture. In this example, a large insurance company loaded both the structured and semi-structured data into their search application and then enriched it with behavioral data. Specifically, they looked at what the analysts were working on and performed low-level text analysis to identify trends and patterns. It turned out that they could identify trends such as seeing that in a particular make/model of vehicle, just before a crash, people reported that their brakes failed. This data could be fed back to the NTSB and to manufacturers, as well as to their own claims adjusters.
Virginia Tech - Help the World in Crisis

Virginia Tech's Crisis, Tragedy, and Recovery Network (CTRN) serves as a resource to victims and their relatives as well as first responders and policy makers. Anytime there is a large national or global crisis, natural or man-made, the CTRN harvests content from the web, social media, news outlets, etc., and makes it immediately searchable as well as archived for future access. Over time, they employ large-scale natural language processing to identify trends, topics, themes, and relationships, both inside an event and across multiple events, to help policy makers and first responders develop systems and processes to improve response.

Bright Planet - Catch the Bad Guys

Bright Planet is in the business of harvesting intelligence from the web, beyond the reach of traditional search engines, for use by governments, businesses, and organizations. Bright Planet's client in this case is a large pharmaceutical manufacturer looking for evidence of the sale of counterfeit drugs. While search can provide some answers, more analysis is needed, since counterfeiters often carefully disguise their wares. Bright Planet looks for certain types of language and other indicators that it feeds into its search algorithms, along with enrichment data on how analysts are performing their analysis and what questions they are asking of the data. This surfaces new patterns and continuously refines and improves the analysis.

Veoh - Cross Recommendations

Veoh is a video content network that allows subscribers to watch, follow, share, and comment on aggregated video content from around the web. Their innovative recommendation engine leverages user behavior (videos searched, watched, and recommended, items clicked, words typed in, mouse tracks, etc.) to influence recommendations and search results.
They use behavior across the entire subscriber population to influence an individual's search results, coalescing all of these various signals into a single query system with what appears to the user as magical results.
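One common way to turn population-wide behavior like Veoh's into recommendations is item co-occurrence: videos frequently watched by the same users recommend each other. The sketch below is illustrative, not Veoh's actual implementation; at scale, a significance test such as the log-likelihood ratio is typically added to filter out spurious co-occurrences.

```python
from collections import Counter
from itertools import combinations

# Per-user watch histories (illustrative data).
histories = [
    {"video_a", "video_b", "video_c"},
    {"video_a", "video_b"},
    {"video_b", "video_d"},
]

# Count how often each ordered pair of videos co-occurs in one history.
cooccur = Counter()
for watched in histories:
    for x, y in combinations(sorted(watched), 2):
        cooccur[(x, y)] += 1
        cooccur[(y, x)] += 1

def recommend(item, top_n=2):
    """Recommend the videos most often co-watched with `item`."""
    pairs = [(other, n) for (a, other), n in cooccur.items() if a == item]
    return [other for other, n in sorted(pairs, key=lambda kv: -kv[1])[:top_n]]

print(recommend("video_a"))  # → ['video_b', 'video_c']
```

The co-occurrence counts themselves are sparse, which is why a search index is a natural place to store and query them: indicator items become "terms" on a video's document, and recommendation becomes an ordinary search.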
Getting Started with Reflected Intelligence

There are several critical components needed to get started building applications that leverage reflected intelligence:

- Fast, efficient, scalable search. Lucene/Solr powers some of the world's largest websites and search applications with sub-second response against billions of records, so it makes a good choice for this fundamental component.
- Bulk and near-real-time indexing.
- A distributed computing platform for performance and scale.
- Storage capacity to store and work with raw data, transforming it to address the kinds of questions that will be asked.
- NLP and machine learning tools that scale to address semi-structured and unstructured data.

The natural language processing and machine learning tools are what power the discovery and analysis. They provide the ability to crunch through all of the feedback and user behavior data to understand what people are clicking. To make this work at scale, the feedback must flow seamlessly inside the system, with the appropriate workflows in place to eliminate the need for administrators to chase down log files from disparate systems.

Reference Architecture for Reflected Intelligence

This reference architecture handles a wide variety of data types, both textual and behavioral. It can also handle an array of enrichment systems that elaborate and annotate documents for useful actions across a broad spectrum of business purposes. The enrichment systems can be batch-oriented and large-scale offline, or near-real-time. Discovery and enrichment can be done as a rough cut at the time of content acquisition, and content can be re-clustered at a later date when more is known. The heart of this architecture is the document store, represented by the grey cylinder in the middle of the diagram. Inside this store are multiple shards that make up the document store and retrieval index.
It contains text and semi-structured information, as well as structured information processed by ETL systems.
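The behavioral side of this store can start simple. As an illustrative sketch (names are hypothetical, not a LucidWorks or MapR API), raw click logs can be aggregated into per-document boost factors that the search tier applies at query time; at scale, the same aggregation would run as a distributed job over logs in the cluster's file system.

```python
import math
from collections import Counter

# Raw behavioral events as they might arrive from disparate log files.
click_log = [
    {"query": "brake failure", "doc": "claim-17", "clicked": True},
    {"query": "brake failure", "doc": "claim-17", "clicked": True},
    {"query": "brake failure", "doc": "claim-02", "clicked": False},
    {"query": "engine noise",  "doc": "claim-17", "clicked": True},
]

# Aggregate clicks per document.
clicks = Counter(e["doc"] for e in click_log if e["clicked"])

# Dampen with log1p so heavily clicked documents don't dominate entirely;
# unclicked documents keep an implicit neutral boost of 1.0.
boosts = {doc: 1.0 + math.log1p(n) for doc, n in clicks.items()}

print(boosts)
```

The resulting boost values would be written back into the index (for example, as a numeric field on each document) so that expert behavior is reflected in everyone's rankings.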
Discovery and enrichment processes run against recently added documents and look for patterns and enrichment opportunities that can improve search results. Enrichment can include classifiers and recommenders that create special tags and indicators on documents to improve correlations. Analytic services are accessed via general APIs that can query the system; queries may be explicit, or implicit, where they are derived from behavior or formed from other data sources. Query processes don't necessarily have to return results. Instead, they may be used to structure a website or notify an analyst when particular conditions are met.

MapR Extends Hadoop for Reflected Intelligence

MapR provides a technology-leading, complete distribution for Hadoop with enhancements that make Hadoop easy, dependable, and fast. The MapR distribution includes the various Apache projects from the Hadoop ecosystem, such as Hive, Pig, HBase, and Mahout, on a platform that provides enterprise-grade features such as direct-access NFS, snapshots, mirroring, and instant node recovery.

Easy: MapR innovation allows users to access the Hadoop cluster through industry-standard APIs. Some of the standards built in and supported by MapR include full POSIX compliance, Network File System (NFS), ODBC, Linux PAM, and REST. Beyond the standards, MapR also provides multi-tenancy, data placement control, and hardware-level monitoring of the cluster.

Dependable: MapR provides some of the best features for running mission-critical applications. Features include self-healing of the critical services that maintain the nodes and the jobs, snapshots that allow for point-in-time recovery of data, mirroring that allows for inter-cluster replication over WAN, and rolling upgrades that prevent service disruptions.

Fast: MapR is twice as fast as any other distribution.
It leverages an optimized shuffle algorithm, direct access to disk, built-in compression, and code written in advanced C++ to provide superior performance over standard Hadoop. MapR is particularly well suited for reflected intelligence applications. It provides an integrated data platform that can store file-like objects accessible through HDFS or NFS, and table objects that expose the HBase API. MapR supports real-time ingestion and processing for objects that store user behavior, which change in real time. MapR's snapshot and mirroring capabilities are critical for reflected intelligence applications, as they support the evolution of large data objects over time. With these tools, new data can be layered on old data in what-if scenarios to assess the impact on an application. As search experts will attest, tuning a result set in one area can have unanticipated consequences in other areas, and this sort of impact analysis is crucial to good search hygiene. These snapshots support the always-testing model of enrichment, where the search application continues to improve simply through the act of more people using the application over time. In addition, snapshots allow search professionals to play back what might have happened over a particular period of time and recreate situations for further troubleshooting. These capabilities go beyond ordinary Hadoop and make reflected intelligence applications possible.

LucidWorks Extends Lucene/Solr for Reflected Intelligence

LucidWorks is the leading provider of packaging, support, training, and knowledge about Apache Lucene/Solr. LucidWorks employs about a third of the committers to the open source project and was founded by a group of those committers to promote the adoption of Lucene/Solr. The company continues to contribute a considerable body of work back to the open source project each year. In the past year, the LucidWorks team worked to ensure Lucene/Solr can scale to handle Hadoop workloads.
LucidWorks offers LucidWorks Search, which adds a user interface for management and operations to Lucene/Solr, along with a connector framework for integrating with tools like MapR and with common enterprise repositories such as SharePoint and file systems; it also integrates with an organization's security access control lists. LucidWorks Big Data offers big data as a service. It is constructed very similarly to the reference architecture described earlier in this document. It incorporates LucidWorks Search and adds Hadoop and machine learning, along with pre-built workflows that eliminate the pain of moving data around to be processed.

The LucidWorks Big Data Marketecture

The Big Data Operating System at the heart of this diagram is the reference architecture discussed earlier, where LucidWorks Search is combined with Hadoop, HBase, etc., and ensures that the data is in the right place at the right time. On top of this substrate, search, discovery, and analytics applications are built that leverage machine learning tools, natural language processing, and the tools needed to scale, with pre-defined workflows. This is all accessible through a set of REST APIs, so a non-expert can interact with the services using common web standards like REST and JSON. The right side of the diagram is the system management layer, with glue like ZooKeeper and provisioning tools. To get content into the system, LucidWorks provides a variety of connectors to a range of enterprise data sources, databases, and S3 buckets, and the system also supports pushed data.
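To give a flavor of how behavioral boosts surface at query time, the sketch below constructs a Solr query URL. The `edismax` query parser and its `qf` and `boost` parameters are standard Solr features; the host, collection, and field names here are illustrative assumptions, not LucidWorks-specific endpoints.

```python
from urllib.parse import urlencode

# Build a Solr query that layers a behavioral boost on top of text relevance.
# `boost` multiplies each document's text score by the value of a per-document
# function -- here, a numeric field holding aggregated click boosts.
params = {
    "q": "brake failure",
    "defType": "edismax",
    "qf": "title^2 body",       # search title (weighted higher) and body
    "boost": "click_boost_f",   # hypothetical per-doc boost field
    "rows": 10,
    "wt": "json",
}
url = "http://localhost:8983/solr/claims/select?" + urlencode(params)
print(url)
```

Because the boost is just a field on each document, the reflected-intelligence pipeline only has to keep that field up to date; no query-side code changes are needed as the behavioral model improves.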
The LucidWorks/MapR Advantage

The goal of the partnership between LucidWorks and MapR is to enable a rapid path to the next generation of search, using reflected intelligence, along with other methods, to unlock correlations and insights from large data sets and ultimately drive better decisions for individuals and organizations. By using LucidWorks and MapR, organizations can quickly build reflected intelligence search applications where:

- Data can be ingested into MapR by a variety of methods, through Hadoop ecosystem components, or by storing data directly and transparently via NFS (for legacy components).
- Search indices can be stored in MapR and fed into a MapReduce setting, into tools like Pig and Mahout, or can be deployed using mirrors or NFS.
- MapR snapshots make backups very simple.
- Snapshots also allow scenarios to be replayed and support experiment management: correlate scoring factors, config files, log analysis, etc., to see what users saw at the time.
- LucidWorks connects transparently with MapR.
- No unnatural acts are required: logs live in NFS or file systems that MapR presents, and MapReduce jobs can run over them without concern for where they reside.

Learn More and Get Started Today

To learn more about using crowd-sourced reflected intelligence for search and big data, please visit the LucidWorks and MapR websites. A webinar with Grant Ingersoll, Chief Scientist for LucidWorks, and Ted Dunning, Chief Application Architect for MapR, can be found on either site. For a direct response, please contact either company.

MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. MapR brings unprecedented dependability, ease-of-use, and world-record speed to Hadoop, NoSQL, database, and streaming applications in one unified Big Data platform.
MapR is used across financial services, retail, media, healthcare, manufacturing, telecommunications, and government organizations, as well as by leading Fortune 100 and Web 2.0 companies. Amazon, Cisco, EMC, and Google are part of MapR's broad partner ecosystem. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures. MapR Technologies. All rights reserved. Apache Hadoop and Hadoop are trademarks of the Apache Software Foundation and are not affiliated with MapR Technologies.
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
Self-service BI for big data applications using Apache Drill 2015 MapR Technologies 2015 MapR Technologies 1 Data Is Doubling Every Two Years Unstructured data will account for more than 80% of the data
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team email@example.com @rob1lancaster Organizer of Chicago
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) White Paper BY DATASTAX CORPORATION August 2013 1 Table of Contents Abstract 3 Introduction 3 Overview of HDFS 4
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data
Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses
Increase Agility and Reduce Costs with a Logical Data Warehouse February 2014 Table of Contents Summary... 3 Data Virtualization & the Logical Data Warehouse... 4 What is a Logical Data Warehouse?... 4
Making Sense of Big Data in Insurance Amir Halfon, CTO, Financial Services, MarkLogic Corporation BIG DATA?.. SLIDE: 2 The Evolution of Data Management For your application data! Application- and hardware-specific
1 Agenda Big Data & Hadoop ViPR HDFS Pivotal Big Data Suite & ViPR HDFS ViON Customer Feedback 2 A World of Connected Devices Need a new data management architecture for Internet of Things 21% the % of
5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK CUSTOMER JOURNEY Technology is radically transforming the customer journey. Today s customers are more empowered and connected
VIEWPOINT High Performance Analytics Industry Context and Trends In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine, to discover hidden correlations
Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the
The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media