Data Pipeline with Kafka
|
|
- Garry Adams
- 8 years ago
- Views:
Transcription
1 Data Pipeline with Kafka Peerapat Asoktummarungsri AGODA
2 Senior Software Engineer Agoda.com Contributor Thai Java User Group (THJUG.com) Contributor Agile66
3 AGENDA Big Data & Data Pipeline Kafka Introduction Quick Start Monitoring Data Pipeline for Search API Hadoop integration with Camus
4 Big Data Hadoop Information MapReduce + HDFS
5 Pipeline Website hadoop log
6 Growth Website log hadoop Mobile
7 Complex Website log hadoop Mobile message realtime monitoring
8 Features becomes the problem Website hadoop New Mobile realtime monitoring New API NEW Data Warehouse
9 Data Pipeline Website Produce hadoop Mobile Data Pipeline realtime monitoring Consume API Warehouse
10 1 Consumer Queue 2 Consumer 3 Consumer compare 1 Consumer Topic 1 Consumer 1 Consumer
11 General Topic Implement 2 Consumer 1 Topic Consumer 2 2 Consumer 3 This consumer will lose a message.
12 Distributed by Design Fast Scalable - It can be elastically and transparently expanded without downtime. Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss.
13 msg gid = hadoop msg 4 1 Topic Consumer 1 msg 7 2 Consumer Consumer 3 gid = Group ID
14 msg gid = hadoop msg Topic msg hadoop gid = rtmon realtime monitoring gid = warehouse data warehouse gid = Group ID
15 msg msg 9 Topic 3 gid = hadoop hadoop gid = Group ID New Consumer gid = newconsumer gid = rtmon realtime monitoring gid = warehouse data warehouse
16
17 Vagrant Install Vagrant Install Virtual Box Clone vagrant up
18 BREW brew update brew install zookeeper kafka -y
19 Some Kafka Config # The id of the broker. This must be set to a unique integer for each broker. broker.id=0 # The port the socket server listens on port=9092 # Zookeeper connection string (see zookeeper docs for details). zookeeper.connect=localhost:2181 # Timeout in ms for connecting to zookeeper zookeeper.connection.timeout.ms=6000 # The minimum age of a log file to be eligible for deletion log.retention.hours=168
20 Linkedin (2013) 10 billion message writes per day 55 billion messages delivered to real-time consumers 367 topics that cover both user activity topics and operational data the largest of which adds an average of 92GB per day of batch-compressed messages Messages are kept for 7 days, and these average at about 9.5 TB of compressed messages across all topics.
21 KafkaOffsetMonitor 1 Jar file, KafkaOffsetMonitor-assembly jar java -cp KafkaOffsetMonitor-assembly jar \ com.quantifind.kafka.offsetapp.offsetgetterweb \ --zk localhost \ --port 8080 \ --refresh 10.seconds \ --retain 2.days Download KafkaOffsetMonitor from Github
22
23
24
25
26
27
28 Produce Change CHANGE API Kafka Price & Inventory Consumer Calculate Price Hotel Manager HTTP Search API Hotels Cassandra
29 API CHANGE Kafka Price & Inventory Consumer Hotel Manager A Consumer Hotels B Consumer
30 Camus
31 Slide available here Sourcecode available here
32 REFERENCES developingwithapachekafka
33 Q & A
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA Processing billions of events every day Neha Narkhede Co-founder and Head of Engineering @ Stealth Startup Prior to this Lead, Streams Infrastructure
More informationNear Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationBuilding your Big Data Architecture on Amazon Web Services
Building your Big Data Architecture on Amazon Web Services Abhishek Sinha @abysinha sinhaar@amazon.com AWS Services Deployment & Administration Application Services Compute Storage Database Networking
More informationArchitectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
More informationX4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
More informationA Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
More informationBig Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure June 12, 2013 2 Agenda Let s talk about Data Infrastructure, how we did it, what we learned and how we ve failed Some Context
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationOpenbus Documentation
Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationKafka & Redis for Big Data Solutions
Kafka & Redis for Big Data Solutions Christopher Curtin Head of Technical Research @ChrisCurtin About Me 25+ years in technology Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)
More informationBuilding a logging pipeline with Open Source tools. Iñigo Ortiz de Urbina Cazenave
Building a logging pipeline with Open Source tools Iñigo Ortiz de Urbina Cazenave NLUUG Utrecht - Netherlands 28 May 2015 whoami; 2 Iñigo Ortiz de Urbina Cazenave Systems Engineer whoami; groups; 3 Iñigo
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
More informationUsing Kafka to Optimize Data Movement and System Integration. Alex Holmes @
Using Kafka to Optimize Data Movement and System Integration Alex Holmes @ https://www.flickr.com/photos/tom_bennett/7095600611 THIS SUCKS E T (circa 2560 B.C.E.) L a few years later... 2,014 C.E. i need
More informationdocs.hortonworks.com
docs.hortonworks.com Hortonworks Data Platform: Configuring Kafka for Kerberos Over Ambari Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform, powered by Apache Hadoop,
More informationSearch and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
More informationComparing Scalable NOSQL Databases
Comparing Scalable NOSQL Databases Functionalities and Measurements Dory Thibault UCL Contact : thibault.dory@student.uclouvain.be Sponsor : Euranova Website : nosqlbenchmarking.com February 15, 2011 Clarications
More informationAgenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:
Cloud (data) management Ahmed Ali-Eldin First part: ZooKeeper (Yahoo!) Agenda A highly available, scalable, distributed, configuration, consensus, group membership, leader election, naming, and coordination
More informationThe Hadoop Distributed File System
The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS
More informationIntroduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
More informationKatta & Hadoop. Katta - Distributed Lucene Index in Production. Stefan Groschupf Scale Unlimited, 101tec. sg{at}101tec.com
1 Katta & Hadoop Katta - Distributed Lucene Index in Production Stefan Groschupf Scale Unlimited, 101tec. sg{at}101tec.com foto by: belgianchocolate@flickr.com 2 Intro Business intelligence reports from
More informationWisdom from Crowds of Machines
Wisdom from Crowds of Machines Analytics and Big Data Summit September 19, 2013 Chetan Conikee Irfan Ahmad About Us CloudPhysics' mission is to discover the underlying principles that govern systems behavior
More informationKafka: a Distributed Messaging System for Log Processing
Kafka: a Distributed Messaging System for Log Processing Jay Kreps LinkedIn Corp jkreps@linkedincom Neha Narkhede LinkedIn Corp nnarkhede@linkedincom Jun Rao LinkedIn Corp jrao@linkedincom ABSTRACT Log
More informationThe Game of Big Data! Analytics Infrastructure at KIXEYE
The Game of Big Data! Analytics Infrastructure at KIXEYE Randy Shoup @randyshoup linkedin.com/in/randyshoup QCon New York, June 13 2014 Free-to-Play Real-time Strategy Games Web and mobile Strategy and
More informationRealtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at
More informationFacebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election
More informationJeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
More informationElasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationBig Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
More informationIntroduction to Apache Kafka And Real-Time ETL. for Oracle DBAs and Data Analysts
Introduction to Apache Kafka And Real-Time ETL for Oracle DBAs and Data Analysts 1 About Myself Gwen Shapira System Architect @Confluent Committer @ Apache Kafka, Apache Sqoop Author of Hadoop Application
More informationTesting Spark: Best Practices
Testing Spark: Best Practices Anupama Shetty Neil Marshall Senior SDET, Analytics, Ooyala Inc SDET, Analytics, Ooyala Inc Spark Summit 2014 Agenda - Anu 1. Application 2. Test 3. Best Overview Batch mode
More informationDepartment of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul
More informationAn Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
More informationPutting Apache Kafka to Use!
Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable
More informationHADOOP MOCK TEST HADOOP MOCK TEST II
http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at
More informationHadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
More informationMapReduce, Hadoop and Amazon AWS
MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables
More informationFast Data in the Era of Big Data: Twitter s Real-
Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationBookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011
BookKeeper Flavio Junqueira Yahoo! Research, Barcelona Hadoop in China 2011 What s BookKeeper? Shared storage for writing fast sequences of byte arrays Data is replicated Writes are striped Many processes
More informationThe Cloud to the rescue!
The Cloud to the rescue! What the Google Cloud Platform can make for you Aja Hammerly, Developer Advocate twitter.com/thagomizer_rb So what is the cloud? The Google Cloud Platform The Google Cloud Platform
More informationIntegrating VoltDB with Hadoop
The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.
More informationSAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES
SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT
More informationBig Data Spatial Analytics An Introduction
2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data Spatial Analytics An Introduction Marwa Mabrouk Mansour Raad Esri iu UC2013. Technical Workshop
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationxpaaerns on Spark, Shark, Tachyon and Mesos
xpaaerns on Spark, Shark, Tachyon and Mesos Spark Summit 2014 Claudiu Barbura Sr. Director of Engineering A>geo Agenda xpa&erns Architecture From Hadoop to BDAS & our contribu
More informationBest Practices for Sharing Imagery using Amazon Web Services. Peter Becker
Best Practices for Sharing Imagery using Amazon Web Services Peter Becker Objectives Making Imagery Accessible Store massive volumes of imagery on inexpensive cloud storage Use elastic compute for image
More informationIntroduction to Hbase Gkavresis Giorgos 1470
Introduction to Hbase Gkavresis Giorgos 1470 Agenda What is Hbase Installation About RDBMS Overview of Hbase Why Hbase instead of RDBMS Architecture of Hbase Hbase interface Summarise What is Hbase Hbase
More informationYARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing
YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing Eric Charles [http://echarles.net] @echarles Datalayer [http://datalayer.io] @datalayerio FOSDEM 02 Feb 2014 NoSQL DevRoom
More informationCS 378 Big Data Programming. Lecture 2 Map- Reduce
CS 378 Big Data Programming Lecture 2 Map- Reduce MapReduce Large data sets are not new What characterizes a problem suitable for MR? Most or all of the data is processed But viewed in small increments
More informationPepper: An Elastic Web Server Farm for Cloud based on Hadoop. Subramaniam Krishnan, Jean Christophe Counio Yahoo! Inc. MAPRED 1 st December 2010
Pepper: An Elastic Web Server Farm for Cloud based on Hadoop Subramaniam Krishnan, Jean Christophe Counio. MAPRED 1 st December 2010 Agenda Motivation Design Features Applications Evaluation Conclusion
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationHDFS Cluster Installation Automation for TupleWare
HDFS Cluster Installation Automation for TupleWare Xinyi Lu Department of Computer Science Brown University Providence, RI 02912 xinyi_lu@brown.edu March 26, 2014 Abstract TupleWare[1] is a C++ Framework
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationMoving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
More informationDATA MINING WITH HADOOP AND HIVE Introduction to Architecture
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationCS 378 Big Data Programming
CS 378 Big Data Programming Lecture 2 Map- Reduce CS 378 - Fall 2015 Big Data Programming 1 MapReduce Large data sets are not new What characterizes a problem suitable for MR? Most or all of the data is
More informationDeveloping Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
More informationProductionizing a 24/7 Spark Streaming Service on YARN
Productionizing a 24/7 Spark Streaming Service on YARN Issac Buenrostro, Arup Malakar Spark Summit 2014 July 1, 2014 About Ooyala Cross-device video analytics and monetization products and services Founded
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More informationHadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com
Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since Feb
More information[Hadoop, Storm and Couchbase: Faster Big Data]
[Hadoop, Storm and Couchbase: Faster Big Data] With over 8,500 clients, LivePerson is the global leader in intelligent online customer engagement. With an increasing amount of agent/customer engagements,
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationLarge Scale Text Analysis Using the Map/Reduce
Large Scale Text Analysis Using the Map/Reduce Hierarchy David Buttler This work is performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract
More informationChase Wu New Jersey Ins0tute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
More informationBig Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.
Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology
More informationHadoop. History and Introduction. Explained By Vaibhav Agarwal
Hadoop History and Introduction Explained By Vaibhav Agarwal Agenda Architecture HDFS Data Flow Map Reduce Data Flow Hadoop Versions History Hadoop version 2 Hadoop Architecture HADOOP (HDFS) Data Flow
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationIntroduction to HDFS. Prasanth Kothuri, CERN
Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. Hadoop
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationMedia Upload and Sharing Website using HBASE
A-PDF Merger DEMO : Purchase from www.a-pdf.com to remove the watermark Media Upload and Sharing Website using HBASE Tushar Mahajan Santosh Mukherjee Shubham Mathur Agenda Motivation for the project Introduction
More informationScalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationApplication and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10
Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationWorkshop: From Zero. Budapest DW Forum 2014
Workshop: From Zero to _ Budapest DW Forum 2014 Agenda today 1. Some setup before we start 2. (Back to the) introduction 3. Our workshop today 4. Part 2: a simple Scalding job on EMR Some setup before
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationApache Hadoop FileSystem and its Usage in Facebook
Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs
More informationCSE 344 Introduction to Data Management. Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei
CSE 344 Introduction to Data Management Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei Homework 8 Big Data analysis on billion triple dataset using Amazon Web Service (AWS) Billion Triple Set: contains
More informationPerformance Testing of Big Data Applications
Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies mbatterywala@impetus.co.in Shirish Bhale: Director of Engineering
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationFast Data in the Era of Big Data: Tiwtter s Real-Time Related Query Suggestion Architecture
Fast Data in the Era of Big Data: Tiwtter s Real-Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Adeniyi Abdul 2522715 Agenda Abstract Introduction
More informationCreating Big Data Applications with Spring XD
Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these
More informationEfficient Network Marketing - Fabien Hermenier A.M.a.a.a.C.
the road to cloud native applications Fabien Hermenier 1 cloud ready applications single-tiered monolithic hardware specific cloud native applications leverage cloud services scalable reliable 2 Agenda
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationHow To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
More informationBIG DATA USING HADOOP
+ Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data
More informationHadoop Distributed File System (HDFS) Overview
2012 coreservlets.com and Dima May Hadoop Distributed File System (HDFS) Overview Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized
More information