CASE STUDIES OF SUCCESSFUL APPLICATIONS OF BIG DATA IN THE INDUSTRY
SAI HARINYA TURAGA, RAMYA MERUVA, TEJASWINI KANTHETI
Introduction
Big data is the name given to data sets so large and complex that they cannot be processed by traditional database management systems. Such data sets arise because analyzing a single large set of related data yields more insight than analyzing multiple smaller sets holding the same total amount of data. This analysis helps in identifying business trends, improves the quality of research, and allows business organizations to make better decisions. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization [1].

Definition [2]
In 2012, Gartner defined big data as "high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

Big Data in Industry
As the amount of big data generated grew, so did the need for information management. Companies such as Microsoft, IBM, HP, Dell, and Oracle Corporation have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry on its own was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole [3].

Big data software: Apache Hadoop, MongoDB, Splunk

APACHE HADOOP
Definition [4]
Apache Hadoop is open-source software for reliable, scalable, distributed computing. It is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. Apache Hadoop has the following modules [5]:
Hadoop Common - contains the libraries and utilities needed by the other Hadoop modules
Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and scheduling users' applications on them
Hadoop MapReduce - a programming model for large-scale data processing

As Hadoop evolved, many companies started using it for both research and production purposes. More than half of the Fortune 50 companies use Hadoop, including Yahoo! and Facebook.

Case Study: Yahoo!
Yahoo is not only the first large-scale user of Hadoop but also a tester and a contributor. Fig 1 [6] represents the volumes of data generated each year since 2006 as well as internal Hadoop usage.

Figure 1

Initially, Yahoo started using Hadoop to speed up the indexing of web crawl results for its search engine, though it now relies entirely on Microsoft's Bing for search results. Yahoo's relationship with Hadoop today, according to Scott Burke, senior VP of advertising and data platforms, is that the company is still betting on it. "Yahoo remains committed to Hadoop, more than ever before. It's the only platform we use globally. We have the largest Hadoop footprint in the world," said Burke in his keynote address at the Hadoop Summit 2012 [4]. Fig 2 [6] represents statistics before and after Yahoo started using Hadoop.
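The MapReduce module listed above is easiest to see in miniature. The sketch below is plain Python rather than the Hadoop API, but it runs the same three phases (map, shuffle, reduce) that Hadoop distributes across a cluster, here on a single-machine word count:

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word, as a Hadoop mapper would.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key; Hadoop performs this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one key, as a Hadoop reducer would.
    return (key, sum(values))

documents = ["big data big clusters", "big data"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {'big': 3, 'data': 2, 'clusters': 1}
```

In real Hadoop, each mapper and reducer runs on a different node against a slice of an HDFS file; the logic per phase is the same.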
Architecture
Currently, Yahoo runs Hadoop on 42,000 servers in 1,200 racks across four data centers. The largest cluster has 4,000 nodes; after the release of Apache Hadoop 2.0, Yahoo plans to grow it to 10,000 nodes. Fig 3 [6] represents the architecture of Hadoop.

Figure 3

Usage of Hadoop in Yahoo
Today, Hadoop has become an integral part of how Yahoo does business. Yahoo uses Hadoop to block spam messages that try to get into its servers, which has elevated its spam detection capabilities: on average, Yahoo blocks around 20.5 billion spam messages per day. Hadoop sits at the bottom of the powerful software stack inside Yahoo that powers the personalization Yahoo aims to give its end users. Hadoop already has information about end users' web history from previous visits, so if you search for sports or finance news, you'll get a high-value package that reflects your interests. To achieve this kind of personalization, Yahoo uses a combination of automated analysis and human editors to define packages. Without any manual involvement, everyone would get packages about celebrities, which draw the highest traffic on the site. [7]

Instead of using Hadoop as a stand-alone application, Yahoo uses it as an information foundation for an Oracle database system that pulls presorted, indexed data out and feeds it into a Microsoft SQL Server cube for highly detailed analysis. The resulting data is displayed in
either Tableau or MicroStrategy visualization systems for Yahoo business analysts, who in turn use it to advise advertisers on how their campaigns are faring soon after launch. [8] Yahoo can tell advertisers what kind and what percentage of people respond best to their campaigns, so that advertisers can adjust their ads to appeal to a wider audience. This leads to improvements in search results and response time, and also lifts the business. Yahoo provides advertisers with significant data such as who is viewing their advertisement, how much time viewers spend on it, and what they do on the site after reading it. This kind of data is very valuable and helps advertisers plan and improve their future advertising strategies; the better decisions that result ultimately mean better business. Fig 4 [6] depicts the Yahoo webpage and shows where Hadoop is actually being utilized.

Figure 4

In addition to all of this, Yahoo developed a set of open-source code written in JavaScript called Cocktails. Cocktails increase data availability to advertisers by providing them with tools that extract data directly from Yahoo's Hadoop. The server-side cocktail is Manhattan, whereas the client-side one is Mojito. To get the information an advertiser is seeking, a cocktail can run the same reporting code on either a server or an end user's laptop. The reporting system runs on a Hadoop-Oracle-Pentaho Mondrian (open-source code for building multi-view cubes) stack [9]. Advertisers can see all the valuable data that Yahoo gathered about their customers, and if that holds, there is a strong possibility that Cocktails could replace PHP in the business. Apart from all this, Yahoo also uses Hadoop for internal analysis of the data collected from user interactions, which amounts to 140 petabytes in Hadoop.
Since Hadoop maintains triplicate copies of all data sets, over 400 petabytes of storage are required to maintain the system. Yahoo adopted Hadoop in its crude early form, nurtured it, made it open source, benefited from it, and is still betting its business on it by continuing to develop it. The company is confident that its commitment to Hadoop will keep Yahoo at the top of its business initiatives.
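The 400-plus-petabyte figure follows directly from the replication factor. A quick arithmetic check (140 PB is the figure from the text; 3 is HDFS's default replication factor):

```python
data_pb = 140           # user data held in Hadoop, in petabytes (from the text)
replication_factor = 3  # HDFS default: each block is stored in triplicate
raw_storage_pb = data_pb * replication_factor
# raw_storage_pb == 420, consistent with the "over 400 petabytes" above
```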
In an interview [9] with InformationWeek, Scott Burke said that Yahoo is becoming an expert at capitalizing on the big shift from the off-line analysis model to something much closer to a predictive model: "Get a response in front of the customer with the right offer at the right time." To achieve this goal, Yahoo has had to implement "science at scale," that is, invest in an open-source system that seemed to have broad potential but wasn't yet proven in prime time. "We've bet the business on this platform on a global scale," Burke says, "and there's no turning back."

MongoDB [10]
MongoDB is a cross-platform document-oriented database system. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas, making the integration of data in certain types of applications easier and faster. MongoDB is free and open-source software, and it falls under the online form of big data. The main features of MongoDB are:
1. Ad-hoc queries: MongoDB supports regular-expression searches, range queries, and more. Queries return the matching documents and can also run JavaScript functions.
2. Replication: MongoDB provides replication, which increases throughput and availability. A replica set keeps at least two copies of the data: the primary replica handles the read and write operations, while secondary replicas maintain copies of the data in the background. If the primary becomes unavailable, the replica set automatically holds an election to decide which secondary should become the new primary, which then takes over the read and write operations.
3. Indexing: In MongoDB, documents can be indexed, with indices similar to those in a relational database management system.
4. File storage: MongoDB can also be used for file storage, reusing the replica mechanism to store files across multiple machines with load balancing. GridFS is the feature, supported across the language drivers, that helps developers manipulate files; distributing files over multiple machines also provides fault tolerance.
5. Aggregation: MongoDB provides a MapReduce facility for batch processing of data and aggregation operations. Aggregation is used where a GROUP BY clause would be used in SQL to obtain results.
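To make the ad-hoc query and aggregation features concrete, here is a toy sketch in plain Python. It deliberately mimics the ideas rather than the real MongoDB query API, and the collection and field names are invented for illustration: documents are schema-less dicts, a query is a field-match filter, and aggregation plays the role of SQL's GROUP BY.

```python
# A "collection" of schema-less documents, as MongoDB stores them.
orders = [
    {"item": "pizza", "qty": 2, "city": "NYC"},
    {"item": "pasta", "qty": 1, "city": "NYC"},
    {"item": "pizza", "qty": 3, "city": "LA"},
]

def find(collection, query):
    # Ad-hoc query: return documents whose fields match the query dict.
    return [d for d in collection if all(d.get(k) == v for k, v in query.items())]

def aggregate_sum(collection, group_key, sum_key):
    # Aggregation: total of sum_key per group_key, like SQL's GROUP BY.
    totals = {}
    for d in collection:
        totals[d[group_key]] = totals.get(d[group_key], 0) + d[sum_key]
    return totals

nyc_orders = find(orders, {"city": "NYC"})          # matches 2 documents
qty_by_item = aggregate_sum(orders, "item", "qty")  # {'pizza': 5, 'pasta': 1}
```

Note that neither operation required declaring a schema up front; that is the flexibility the dynamic-schema model buys.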
How do MongoDB and Hadoop work together for typical big data? [11]
There are several ways in which Hadoop and MongoDB can work together on typical big data problems. Two of them are described briefly here:
1. Batch aggregation
2. Data warehousing

Batch aggregation: In many cases, data analysis can be done with the built-in aggregation functionality of MongoDB; for more complex analysis, Hadoop substitutes for MongoDB and provides a powerful framework for it. In this scenario, the data to be processed is taken from MongoDB and sent to Hadoop, and after processing in Hadoop it is sent back to MongoDB for ad-hoc analysis. Applications can then use the information found back in MongoDB and present it to end users.

Data warehousing: Hadoop acts as a data warehouse, storing large amounts of data taken from different data sources.
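The batch-aggregation round trip described above (read from MongoDB, aggregate in Hadoop, write results back for ad-hoc analysis) can be sketched as three stages. This is plain Python with lists standing in for the two systems, and the field names are invented; a real pipeline would move the data with MongoDB-Hadoop connector tooling.

```python
# Stage 1: extract raw event documents from the operational store
# (this list stands in for a MongoDB collection).
mongo_collection = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 5},
    {"user": "a", "clicks": 2},
]

def hadoop_job(docs):
    # Stage 2: the heavy aggregation that would run as a Hadoop job:
    # total clicks per user across all input documents.
    totals = {}
    for d in docs:
        totals[d["user"]] = totals.get(d["user"], 0) + d["clicks"]
    return [{"user": u, "total_clicks": t} for u, t in sorted(totals.items())]

# Stage 3: load the results back for ad-hoc analysis by applications.
results_collection = hadoop_job(mongo_collection)
# results_collection == [{'user': 'a', 'total_clicks': 5},
#                        {'user': 'b', 'total_clicks': 5}]
```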
In this scenario, the data stored in MongoDB is pulled via MapReduce jobs and stored in Hadoop, where data from other sources is also available. Data analysts mostly use Pig and MapReduce to create jobs that query these larger data sets, incorporating the data that came from MongoDB.

Case study [12]
MongoDB is an open-source, document-oriented NoSQL database written in C++. It has a great number of clients, including:
1. MTV
2. Forbes
3. Craigslist
4. Intuit
5. Rangespan
6. Foursquare
7. CustomInk

Let us take a deeper look at one of the above, MTV Networks.

MTV Networks
MTV Networks operates different web properties such as thedailyshow.com, gametrailers.com, and nick.com. MTV chose MongoDB as a data repository for its document data model and flexible schema.

The problem faced by MTV Networks
MTV's web properties were built on a Java-based content management system that suited their data model. When MTV planned to migrate to a new relational model for the content management system, the variety of web properties made this a roadblock. After struggling with it, MTV began looking for a database solution that offered data flexibility.

Solution
The problem was overcome by using MongoDB, which allows MTV to store hierarchical data without having to run queries to build pages. A MongoDB schema can be designed around the structures and data elements of each brand. MongoDB's advantage lies in its queries and indexing: documents carry indices that are helpful for fast access of
data, and MTV also found that MongoDB's nested-data query feature helped solve the problem above.

Results
MTV used MongoDB to solve its database problem and meet the needs of its web properties. In this way MongoDB enables online big data, removing processing delays so that applications can serve all end users as new data is created continuously, with modern application performance.

SPLUNK
Splunk is an American multinational corporation headquartered in San Francisco, California. Splunk began as a tool for troubleshooting IT issues by poring through log files, but now it produces software for monitoring, searching, and analyzing machine-generated big data through a web-style interface [13]. Splunk gives IT professionals ways to log any piece of information and quickly index it, find it, and run a number of analytics functions on it. [14] Splunk co-founder and CTO Erik Swan says Splunk is even used as a replacement for, or at least a complement to, Hadoop in certain instances. Hadoop might be great for social-networking data and building a social graph; Splunk is ideal for time-related tasks like monitoring profile changes. Even web sites such as Facebook, Myspace, and Zynga use Splunk to analyze operational data. [15]

Customers [16]
Some of the customers of Splunk are:
Domino's Pizza: uses Splunk to strengthen its online business through customer interaction and understanding
T-Mobile: uses Splunk to search across all its log sources, run ad-hoc reports, visualize all its IT data, secure the infrastructure, and more
Hortonworks: uses both Hadoop and Splunk, but mainly uses Splunk because it gets at data easily and fast and is easy to use
Apollo: uses Splunk for security information and a faster search engine
Cricket Communications: uses Splunk to monitor the performance of an in-house middleware environment and to gain end-to-end visibility into various critical infrastructures
Case Study: Domino's Pizza transforms e-commerce with Splunk
Domino's is a world leader in pizza delivery. Domino's was founded in 1960, and by 2012 its sales were $7.5 billion. It consists of more than 10,000 corporate and franchised stores in US and international markets, with an online ordering system and apps, including Kindle and iPhone. Splunk was initially adopted by Domino's because it needed a solution to analyze and aggregate logging data from its operating systems (Linux and Solaris) and middleware in a timely manner. The information security team used HP ArcSight for log aggregation, but Splunk offered the following advantages: [17]
Faster and easier searches in Splunk
Real-time insights
Better reporting with Apache access logs
Much faster alerting in Splunk
Cost and scalability
Ease of deployment

Splunk usage at Domino's [17]
The range of uses, from monitoring to business insights, includes:
What is being sold, orders per minute, coupon usage, etc.
Online ordering trends and the efficiency of marketing promotions
Answers 24-48 hours before they would arrive from data-warehousing tools
Significant reduction in troubleshooting time
Streamlined developer insight into debugging development code

Architecture
Splunk provides a high-performance, scalable software server written in C/C++ and Python. It indexes and searches logs and other IT data in real time, and deals with data generated by any application, server, or device. The Splunk developer API is accessible through REST, SOAP, or the command line. After downloading, installing, and starting Splunk, you'll find two Splunk server processes running on your host: splunkd and splunkweb. splunkd is a distributed C/C++ server that accesses, processes, and indexes streaming IT data; it also handles search requests. splunkd processes and indexes your data by streaming it through a series of pipelines.
Pipelines are single threads inside the splunkd process, each configured with a single snippet of XML. Processors are individual, reusable C/C++ or Python functions that act on the stream of IT data passing through a
pipeline. Pipelines can pass data to one another via queues. splunkd supports a command-line interface for searching and viewing results. splunkweb is a Python-based application server providing the Splunk Web user interface. It allows users to search and navigate IT data stored by Splunk servers and to manage their Splunk deployment through the browser interface. splunkweb communicates with your web browser via REST and communicates with splunkd via SOAP. [18]

Splunk at Domino's today
Splunk is deployed across two data centers (live and failover). It has four different production environments; gigabytes of data are indexed per day, and there are dozens of unique users per month.
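The splunkd design described above, single-threaded pipelines of reusable processors connected by queues, can be sketched with Python's standard threading and queue modules. The processors here are invented stand-ins, not real Splunk components, and parse a made-up "host message" log format:

```python
import queue
import threading

def processor_parse(line):
    # Stand-in processor: split a "host message" log line into fields.
    host, _, message = line.partition(" ")
    return {"host": host, "message": message}

def processor_index(event, index):
    # Stand-in processor: append the parsed event to an in-memory index.
    index.append(event)

def pipeline(in_q, index):
    # One pipeline = one thread running its processors over each item,
    # as splunkd runs processors on the data streaming through it.
    while True:
        line = in_q.get()
        if line is None:  # sentinel value: shut the pipeline down
            break
        processor_index(processor_parse(line), index)

in_q = queue.Queue()  # the queue that feeds data into this pipeline
index = []
t = threading.Thread(target=pipeline, args=(in_q, index))
t.start()
for line in ["web01 GET /order", "web02 GET /menu"]:
    in_q.put(line)
in_q.put(None)
t.join()
# index now holds two parsed events, e.g. index[0]["host"] == "web01"
```

Chaining several such threads, each reading from the previous one's output queue, gives the pipelines-passing-data-via-queues structure the text describes.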
The teams using Splunk are the site reliability team, InfoSec, and web developers. The Splunk apps in use are Deployment Monitor and Google Maps. [19]

Results with Splunk
Domino's Splunk environment:
Before Splunk:
Gathering logs manually
Sifting through aggregated Java messages from middleware (grep)
Reactive
After Splunk:
A million times easier
Proactive alarm alerts on dips in sales
Baselining and trending

Splunk for operational analysis of payment processing:
Measuring response time for various order channels
Instant analysis of cash vs. credit card ordering performance
Troubleshooting card processor issues

Splunk for geo sales tracking:
Splunk's APIs integrate with Domino's Java-based geo sales-tracking applications to monitor sales by region. Domino's has been able to identify ISP outages in certain regions.

Splunk for Domino's marketing:
Before Splunk: someone struggling with data and numbers daily
After Splunk: automated information, reports submitted to the leadership team including the CIO and CEO, and monitoring of promotion success in real time

Splunk at Domino's in the future:
Create real-time dashboards for any department to view
Use Splunk for more key performance analysis
Expand the Splunk apps deployment: Linux and Unix monitoring, the VMware app, F5 integration
Optimize middleware application logs for Splunk consumption
Start to leverage Splunk to monitor corporate applications built on the stack [19]
References
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] getsinstalled#splunk_and_windows_in_safe_mode [19]
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook
Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the
How To Make Data Streaming A Real Time Intelligence
REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT
WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve
Cisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights. 2013 Copyright Metric insights, Inc.
JAVASCRIPT CHARTING Scaling for the Enterprise with Metric Insights 2013 Copyright Metric insights, Inc. A REVOLUTION IS HAPPENING... 3! Challenges... 3! Borrowing From The Enterprise BI Stack... 4! Visualization
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
BIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
Senior Business Intelligence/Engineering Analyst
We are very interested in urgently hiring 3-4 current or recently graduated Computer Science graduate and/or undergraduate students and/or double majors. NetworkofOne is an online video content fund. We
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
Big Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led
Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led Course Description This three day course prepares IT Professionals to administer enterprise search solutions using
Next-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK OVERVIEW ON BIG DATA SYSTEMATIC TOOLS MR. SACHIN D. CHAVHAN 1, PROF. S. A. BHURA
IBM Global Business Services Microsoft Dynamics CRM solutions from IBM
IBM Global Business Services Microsoft Dynamics CRM solutions from IBM Power your productivity 2 Microsoft Dynamics CRM solutions from IBM Highlights Win more deals by spending more time on selling and
<Insert Picture Here> Big Data
Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
PRTG NETWORK MONITOR. Installed in Seconds. Configured in Minutes. Master Your Network for Years to Come.
PRTG NETWORK MONITOR Installed in Seconds. Configured in Minutes. Master Your Network for Years to Come. PRTG Network Monitor is... NETWORK MONITORING Network monitoring continuously collects current status
PRTG NETWORK MONITOR. Installed in Seconds. Configured in Minutes. Masters Your Network for Years to Come.
PRTG NETWORK MONITOR Installed in Seconds. Configured in Minutes. Masters Your Network for Years to Come. PRTG Network Monitor is... NETWORK MONITORING Network monitoring continuously collects current
Big Data Analytics OverOnline Transactional Data Set
Big Data Analytics OverOnline Transactional Data Set Rohit Vaswani 1, Rahul Vaswani 2, Manish Shahani 3, Lifna Jos(Mentor) 4 1 B.E. Computer Engg. VES Institute of Technology, Mumbai -400074, Maharashtra,
Networking in the Hadoop Cluster
Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop
GigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
