Media Upload and Sharing Website using HBASE
|
|
- Preston Atkinson
- 8 years ago
- Views:
Transcription
1 A-PDF Merger DEMO : Purchase from to remove the watermark Media Upload and Sharing Website using HBASE Tushar Mahajan Santosh Mukherjee Shubham Mathur
2 Agenda Motivation for the project Introduction Summary of how we used Hadoop Why HBASE not RDBMS? Current Status Challenges Future Work
3 Motivation Facebook, Stumble Upon HBASE
4 Motivation Cont. "Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I've elsewhere called 'harnessing collective intelligence.')" - Tim O'Reilly, Grand Poobah2.0
5 Why HBASE and not RDBMS RDBMS Powerful For small scale ideal to use What if someday, my site ranks top in google search. How do i scale my performance? Although you can run several instances of mysql on different machines.
6 But will it help?
7 No!
8 Scaling MySQL hard, Oracle Expensive(and hard). Machine cost goes up faster speed Turns off all relational feature to scale (!!) Turns off secondary(!) indexes too That is not the power of RBDMS, its power is to build indexes, scale no of rows.
9 Prob Cont. Tables are harder to scale at sizes as low as 500GB Hard to read data at these sizes
10 In case of Schema change? What about schema change or migrations? Mysql is not your friend Only gets harder with more data
11 HBASE Mostly schema-less Dynamic distribution Motivation for HBASE? Google Bigtables
12 HBASE is an Apache open source project whose goal is to provide Bigtable like storage for the Hadoop Distributed computing Environment.
13 Data Model Similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. A column name has the form <family>:<label> where <family> and <label> can be arbitrary byte arrays.
14 HBASE storage model Column oriented database Column name is arbitary data, can have, variable, number of column per row. Can random read and write Tables are split into roughly equal size regions Region split as they grow, thus dynamically adjusting your data set.
15 ( Language(HQL Hbase Query ${HBASE_HOME}/bin/hbase shell [--help] Usage:./bin/hbase shell [-- master:ip_address:port] [--html] Running the above command on command line presents before you the following prompt: hql>
16 Sample Hbase Query To create a table: CREATE TABLE table_name(column_family_definition [,column_family_definition]... ) Column_family_definition: column_family_name [MAX_VERSIONS=n] [MAX_LENGTH=n] [COMPRESSION=NONE RECORD BLOCK] [IN_MEMORY] [BLOOMFILTER=NONE BLOOMFILTER COUNTING_BL OOMFILTER RETOUCHED_BLOOMFILTER VECTOR_SIZE=n NUM_HASH=n]
17 (.. Contd ) Sample HBASE Queries SELECT Syntax: SELECT { column_name, [, column_name]... expr[alias] *} FROM table_name [WHERE row='row_key' STARTING FROM 'row-key' [UNTIL 'stop-key']] [NUM_VERSIONS = version_count] [TIMESTAMP 'timestamp'] [LIMIT = row_count] [INTO FILE 'file_name']
18 (.. contd ) Sample HBASE Queries Insert data into table Syntax: INSERT INTO table_name (... ('value', (colmn_name,...) VALUES WHERErow='row_key'[TIMESTAMP'timestamp']; column_name: column_family_name column_family_name:column_label_name
19 HQL FACTS The hql shell prompt has now been depreciated. It has been moved to a newer shell version. PS: Never bother to mention hql in IRC.
20 Sample php to communicate with Hbase // open a new connection to rest server. Hbase Master default port is $hbase = new hbase_rest($ip, $port); // get list of tables $tables = $hbase->list_tables(); // get table column family names and compression stuff $table_info=$hbase>table_schema("search_index");
21 Sample and end row keys of each region Php File (Cont) // get start $regions = $hbase->regions($table); // select data from hbase $results = $hbase->select($table,$row_key); // insert data into hbase the $column and $data can be arrays with more then one column inserted in one request $hbase->insert($table,$row,$column(s),$data(s));
22 Scaling HBASE Add more machines to scale Base model(bigtables) scale past 1000TB
23 No Inherent reason why HBASE couldn't
24
25 How to store data in HBASE? Maybe not your raw log data... Results, processing it with hadoop By storing the defined version in HBASE, can keep up with huge data demands and serve to your website
26 Website access Using thrift gateway, php code accesses HBASE No additional caching other than what Hbase provides
27 Large data Storage Over 9 billion rows and 1300 GB in Hbase Can map reduce a 700GB table in ~20 min This is about 6 million rows/sec
28 Challenges Lack of Documentation Its new hard to find any document library or tutorial. Hostel Wireless Issues Need atleast 2 computer to test. Thrift is still in early stage. Lot of php issues :(, no help nearby IRC Freenode #hbase channel was very helpful ( slow (but process is
29 Alternatives Cassandra Hypertable
30 References Home Page Wiki Freenode IRC #hbase html
31
32
33 Thank You For the Patience
34 HBASE is an Apache open source project whose goal is to provide Bigtable like storage for the Hadoop Distributed computing Environment.
35 Data Model Similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. A column name has the form <family>:<label> where <family> and <label> can be arbitrary byte arrays.
36 HBASE QUERY LANGUAGE (HQL) ${HBASE_HOME}/bin/hbase shell [--help] Usage:./bin/hbase shell [--master:ip_address:port] [-- html] Running the above command on command line presents before you the following prompt: hql>
37 Sample HBASE Queries To create a table: Syntax: CREATE TABLE table_name(column_family_definition [, column_family_definition] ) Column_family_definition: column_family_name [MAX_VERSIONS=n] [MAX_LENGTH=n] [COMPRESSION=NONE RECORD BLOCK] [IN_MEMORY] [BLOOMFILTER=NONE BLOOMFILTER COUNTING_BL OOMFILTER RETOUCHED_BLOOMFILTER VECTOR_SIZE=n NUM_HASH=n]
38 Sample HBASE Queries (Contd..) SELECT Syntax: SELECT { column_name, [, column_name]... expr[alias] *} FROM table_name [WHERE row='row_key' STARTING FROM 'row-key' [UNTIL 'stop-key']] [NUM_VERSIONS = version_count] [TIMESTAMP 'timestamp'] [LIMIT = row_count] [INTO FILE 'file_name']
39 Sample HBASE Queries (contd..) Insert data into table Syntax: INSERT INTO table_name (colmn_name,...) VALUES ('value',...) WHERErow='row_key'[TIMESTAMP'timestamp']; column_name: column_family_name column_family_name:column_label_name
40 HQL FACTS The hql shell prompt has now been depreciated. It has been moved to a newer shell version. PS: Never bother to mention hql in IRC.
41 Sample Php File To Communicate With HBASE // open a new connection to rest server. Hbase Master default port is $hbase = new hbase_rest($ip, $port); // get list of tables $tables = $hbase->list_tables(); // get table column family names and compression stuff $table_info=$hbase>table_schema("search_index");
42 Sample Php File (Contd..) // get start and end row keys of each region $regions = $hbase->regions($table); // select data from hbase $results = $hbase->select($table,$row_key); // insert data into hbase the $column and $data can be arrays with more then one column inserted in one request $hbase->insert($table,$row,$column(s),$data(s));
43 Sample Php File (Contd..) // start a scanner on a set range of table $handle = $hbase- >scanner_start($table,$cols,$start_row,$end_row); // pull the next row of data for a scanner handle $results = $hbase->scanner_get($handle); // delete a scanner handle $hbase->scanner_delete($handle);
44 Overview Basically the file uploaded through the web page is inserted into the HBASE in the form of its byte representation. On being requested the file, depending upon the key, we select a region of HBASE and output the user the corresponding file.
45 Table Schema and a Lot More!
46 The Hbase Table The table consists of a row key which Is unique. Associated with the row key it has column family. The column family comprises of two columns. One stores the file name whereas the other stores the actual file data.
47 HBASE Schema Row Post TempAddress+timestamp Name Data (in bytes) Hdfs://Downloads :12:07 DiaryofJane.mp
48 HBASE Schema (Contd..) Give a unique row key corresponding to a column family. Associate a time-stamp with the temporary download location of each file. The time-stamp associated includes the time of upload+the date of upload to nullify the clashes.
49 Backend Associated: The file available in the temporary download location is copied into the HBASE Php used as a framework. The Thrift API acts as a bridge for php to communicate with the HBASE. The thrift API enables socket connection The php code runs the HBASE code written in Java in the hbase directory.
50 Backend Associated-Thrift A software library and set of code generation tools. Developed by Facebook. Used for implementing efficient and scalable backend services. Goal: To enable efficient and reliable communication across programming languages.
51 Backend Associated (contd..) The java code takes as argument the download location and the actual file along with the file name. A time-stamp is then associated with the download location. The download location being fixed for every user, we are able to generate a unique key using time-stamp.
52 Backend Associated (contd..) Associate a file stream to read the file Change file into its corresponding byte representation using JAVA methods. Create a put object associated with the table using the row key. The byte representation of the file and the file name is then fed into this put object. The put object then inserts the data in HBASE.
53 Code Snippet Public class HbaseClient{ Public static void main (String args[]) throws IO Exception { String rowkey=time+. +temp; Put p=new Put(Bytes.toBytes(rowkey)); } } p.add(bytes.tobytes( post ),Bytes.toBytes( name ),Byt es.tobytes(temp));
54 Program Execution The url associated with the file is then returned to the user The url when clicked is passed as an argument to another java file interacting with HBASE. This java file creates a get object. Associated with the url, which is also a unique row key for the table, it returns to the user the data.
55 Conclusion The Thrift API enables the php code to communicate with the java code written in Hbase directory. There still remains an isssue with the scalablity of the project. That we could handle by storing the files in distributed HDFS cluster as the temporary location.
56 Conclusion (contd..) We could then scale the Hbase tables across multiple nodes in case the size of the table grows large. This scaling of the HBASE table could be done on the basis of the regions associated with the table.
57 Problems Faced Lack of documentation on the Thrift API Lack of sample codes or tutorials. Even the thrift home page contains links to the tutorial that doesn t work. Lots of scenarios to stumble upon and explore.
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationXiaoming Gao Hui Li Thilina Gunarathne
Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal
More informationLecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationIntroduction to Hbase Gkavresis Giorgos 1470
Introduction to Hbase Gkavresis Giorgos 1470 Agenda What is Hbase Installation About RDBMS Overview of Hbase Why Hbase instead of RDBMS Architecture of Hbase Hbase interface Summarise What is Hbase Hbase
More informationHBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367
HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive
More informationHareDB HBase Client Web Version USER MANUAL HAREDB TEAM
2013 HareDB HBase Client Web Version USER MANUAL HAREDB TEAM Connect to HBase... 2 Connection... 3 Connection Manager... 3 Add a new Connection... 4 Alter Connection... 6 Delete Connection... 6 Clone Connection...
More informationA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional
More informationOpen source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
More informationHypertable Architecture Overview
WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for
More informationScaling Up 2 CSE 6242 / CX 4242. Duen Horng (Polo) Chau Georgia Tech. HBase, Hive
CSE 6242 / CX 4242 Scaling Up 2 HBase, Hive Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le
More informationData storing and data access
Data storing and data access Plan Basic Java API for HBase demo Bulk data loading Hands-on Distributed storage for user files SQL on nosql Summary Basic Java API for HBase import org.apache.hadoop.hbase.*
More informationGetting Started with SandStorm NoSQL Benchmark
Getting Started with SandStorm NoSQL Benchmark SandStorm is an enterprise performance testing tool for web, mobile, cloud and big data applications. It provides a framework for benchmarking NoSQL, Hadoop,
More informationBig Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform interactive execution of map reduce jobs Pig is the name of the system Pig Latin is the
More informationIntegration of Apache Hive and HBase
Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org @enissoz Page 1 About Me User and committer of Hadoop since 2007 Contributor to Apache Hadoop, HBase, Hive and Gora Joined
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationProgramming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationComparison of the Frontier Distributed Database Caching System with NoSQL Databases
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationApache HBase: the Hadoop Database
Apache HBase: the Hadoop Database Yuanru Qian, Andrew Sharp, Jiuling Wang Today we will discuss Apache HBase, the Hadoop Database. HBase is designed specifically for use by Hadoop, and we will define Hadoop
More informationIntegrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
More informationBigtable is a proven design Underpins 100+ Google services:
Mastering Massive Data Volumes with Hypertable Doug Judd Talk Outline Overview Architecture Performance Evaluation Case Studies Hypertable Overview Massively Scalable Database Modeled after Google s Bigtable
More informationCloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
More informationCloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
More informationRealtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationBig Data with Component Based Software
Big Data with Component Based Software Who am I Erik who? Erik Forsberg Linköping University, 1998-2003. Computer Science programme + lot's of time at Lysator ACS At Opera Software
More informationBig Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
More informationDevelopment of nosql data storage for the ATLAS PanDA Monitoring System
Development of nosql data storage for the ATLAS PanDA Monitoring System M.Potekhin Brookhaven National Laboratory, Upton, NY11973, USA E-mail: potekhin@bnl.gov Abstract. For several years the PanDA Workload
More informationUsing MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com
Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A
More informationIntroduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
More informationFacebook s Petabyte Scale Data Warehouse using Hive and Hadoop
Facebook s Petabyte Scale Data Warehouse using Hive and Hadoop Why Another Data Warehousing System? Data, data and more data 200GB per day in March 2008 12+TB(compressed) raw data per day today Trends
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationStorage of Structured Data: BigTable and HBase. New Trends In Distributed Systems MSc Software and Systems
Storage of Structured Data: BigTable and HBase 1 HBase and BigTable HBase is Hadoop's counterpart of Google's BigTable BigTable meets the need for a highly scalable storage system for structured data Provides
More informationHadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com
Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since Feb
More informationHadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers
More informationThe Hadoop Eco System Shanghai Data Science Meetup
The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related
More informationBig Table A Distributed Storage System For Data
Big Table A Distributed Storage System For Data OSDI 2006 Fay Chang, Jeffrey Dean, Sanjay Ghemawat et.al. Presented by Rahul Malviya Why BigTable? Lots of (semi-)structured data at Google - - URLs: Contents,
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More informationFacebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election
More informationHypertable Goes Realtime at Baidu. Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843)
Hypertable Goes Realtime at Baidu Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843) Agenda Motivation Related Work Model Design Evaluation Conclusion 2 Agenda Motivation Related
More informationBig Data: Tools and Technologies in Big Data
Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can
More informationFile S1: Supplementary Information of CloudDOE
File S1: Supplementary Information of CloudDOE Table of Contents 1. Prerequisites of CloudDOE... 2 2. An In-depth Discussion of Deploying a Hadoop Cloud... 2 Prerequisites of deployment... 2 Table S1.
More informationMADOCA II Data Logging System Using NoSQL Database for SPring-8
MADOCA II Data Logging System Using NoSQL Database for SPring-8 A.Yamashita and M.Kago SPring-8/JASRI, Japan NoSQL WED3O03 OR: How I Learned to Stop Worrying and Love Cassandra Outline SPring-8 logging
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationВовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН
Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database
More informationReal-Time Handling of Network Monitoring Data Using a Data-Intensive Framework
Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework Aryan TaheriMonfared Tomasz Wiktor Wlodarczyk Chunming Rong Department of Electrical Engineering and Computer Science University
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationProcessing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems
Processing of massive data: MapReduce 2. Hadoop 1 MapReduce Implementations Google were the first that applied MapReduce for big data analysis Their idea was introduced in their seminal paper MapReduce:
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationHow To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
More informationCOSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
More informationproject collects data from national events, both natural and manmade, to be stored and evaluated by
Joseph Sebastian CS 2994 Spring 2014 Undergraduate Research Final Paper GOALS The goal of my research was to assist the Integrated Digital Event Archive (IDEAL) team in transferring their Twitter data
More informationUQC103S1 UFCE47-20-1. Systems Development. uqc103s/ufce47-20-1 PHP-mySQL 1
UQC103S1 UFCE47-20-1 Systems Development uqc103s/ufce47-20-1 PHP-mySQL 1 Who? Email: uqc103s1@uwe.ac.uk Web Site www.cems.uwe.ac.uk/~jedawson www.cems.uwe.ac.uk/~jtwebb/uqc103s1/ uqc103s/ufce47-20-1 PHP-mySQL
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationScaling Up HBase, Hive, Pegasus
CSE 6242 A / CS 4803 DVA Mar 7, 2013 Scaling Up HBase, Hive, Pegasus Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,
More informationData processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
More informationMySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
More informationHadoop Distributed File System (HDFS) Overview
2012 coreservlets.com and Dima May Hadoop Distributed File System (HDFS) Overview Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized
More informationthe missing log collector Treasure Data, Inc. Muga Nishizawa
the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days
More informationData-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationSpark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1
Spark ΕΡΓΑΣΤΗΡΙΟ 10 Prepared by George Nikolaides 4/19/2015 1 Introduction to Apache Spark Another cluster computing framework Developed in the AMPLab at UC Berkeley Started in 2009 Open-sourced in 2010
More informationOpen source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationOpen Source Technologies on Microsoft Azure
Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions
More informationWhy NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
More informationAgenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases
Postgres & Hadoop Agenda! Strengths of PostgreSQL! Strengths of Hadoop! Hadoop Community! Use Cases Best of Both World Postgres Hadoop World s most advanced open source database solution Enterprise class
More informationBig Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform Pig is the name of the system Pig Latin is the provided programming language Pig Latin is
More informationAssignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
More informationCSCI110 Exercise 4: Database - MySQL
CSCI110 Exercise 4: Database - MySQL The exercise This exercise is to be completed in the laboratory and your completed work is to be shown to the laboratory tutor. The work should be done in week-8 but
More informationMySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)
MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) Erdélyi Ernő, Component Soft Kft. erno@component.hu www.component.hu 2013 (c) Component Soft Ltd Leading Hadoop Vendor Copyright 2013,
More informationSo What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
More informationQuerying Massive Data Sets in the Cloud with Google BigQuery and Java. Kon Soulianidis JavaOne 2014
Querying Massive Data Sets in the Cloud with Google BigQuery and Java Kon Soulianidis JavaOne 2014 Agenda Massive Data Qualification A Big Data Problem What is BigQuery? BigQuery Java API Getting data
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationCost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
More informationData Pipeline with Kafka
Data Pipeline with Kafka Peerapat Asoktummarungsri AGODA Senior Software Engineer Agoda.com Contributor Thai Java User Group (THJUG.com) Contributor Agile66 AGENDA Big Data & Data Pipeline Kafka Introduction
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationIntroduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationAdvanced Java Client API
2012 coreservlets.com and Dima May Advanced Java Client API Advanced Topics Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized Hadoop
More informationCDH installation & Application Test Report
CDH installation & Application Test Report He Shouchun (SCUID: 00001008350, Email: she@scu.edu) Chapter 1. Prepare the virtual machine... 2 1.1 Download virtual machine software... 2 1.2 Plan the guest
More informationReal Time Micro-Blog Summarization based on Hadoop/HBase
Real Time Micro-Blog Summarization based on Hadoop/HBase Sanghoon Lee and Sunny Shakya Abstract Twitter is an immensely popular micro-blogging site where people post short messages of 140 characters called
More informationDepartment of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris
More informationApache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
More informationInternals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
More informationHadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
Hadoop for MySQL DBAs + 1 About me Sarah Sproehnle, Director of Educational Services @ Cloudera Spent 5 years at MySQL At Cloudera for the past 2 years sarah@cloudera.com 2 What is Hadoop? An open-source
More informationBig Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
More informationORACLE GOLDENGATE BIG DATA ADAPTER FOR HIVE
ORACLE GOLDENGATE BIG DATA ADAPTER FOR HIVE Version 1.0 Oracle Corporation i Table of Contents TABLE OF CONTENTS... 2 1. INTRODUCTION... 3 1.1. FUNCTIONALITY... 3 1.2. SUPPORTED OPERATIONS... 4 1.3. UNSUPPORTED
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationComplete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationForensic Clusters: Advanced Processing with Open Source Software. Jon Stewart Geoff Black
Forensic Clusters: Advanced Processing with Open Source Software Jon Stewart Geoff Black Who We Are Mac Lightbox Guidance alum Mr. EnScript C++ & Java Developer Fortune 100 Financial NCIS (DDK/ManTech)
More informationNot Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)
Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure
More information