A Distributed Storage Schema for Cloud Computing based Raster GIS Systems. Presented by Cao Kang, Ph.D. Geography Department, Clark University



Similar documents
INTRODUCTION TO CASSANDRA

How To Scale Out Of A Nosql Database

Cloud Computing at Google. Architecture

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

The Quest for Extreme Scalability

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Open source large scale distributed data management with Google s MapReduce and Bigtable

Comparing SQL and NOSQL databases

Preparing Your Data For Cloud

Cloud Scale Distributed Data Storage. Jürmo Mehine

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Slave. Master. Research Scholar, Bharathiar University

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

So What s the Big Deal?

Big Systems, Big Data

Introduction to Big Data Training

Challenges for Data Driven Systems

Infrastructures for big data

Benchmarking and Analysis of NoSQL Technologies

Lecture Data Warehouse Systems

NoSQL Data Base Basics

Dominik Wagenknecht Accenture

Database Scalability {Patterns} / Robert Treat

How To Use Big Data For Telco (For A Telco)

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Criteria to Compare Cloud Computing with Current Database Technology

Basics on Geodatabases

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce

bigdata Managing Scale in Ontological Systems

BIG DATA What it is and how to use?

Xiaoming Gao Hui Li Thilina Gunarathne

NoSQL Database Systems and their Security Challenges

Scaling Up 2 CSE 6242 / CX Duen Horng (Polo) Chau Georgia Tech. HBase, Hive

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

NoSQL and Hadoop Technologies On Oracle Cloud

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Cassandra A Decentralized Structured Storage System

Hypertable Architecture Overview

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

Big Data and Apache Hadoop s MapReduce

Big Data With Hadoop

Structured Data Storage

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Practical Cassandra. Vitalii

Ubuntu and Hadoop: the perfect match

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

2.1.5 Storing your application s structured data in a cloud database

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Market Basket Analysis Algorithm on Map/Reduce in AWS EC2

Introduction to Apache Cassandra

Architectures for Big Data Analytics A database perspective

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Apache HBase. Crazy dances on the elephant back

Scalable Forensics with TSK and Hadoop. Jon Stewart

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Similarity Search in a Very Large Scale Using Hadoop and HBase

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Scaling Up HBase, Hive, Pegasus

Hadoop IST 734 SS CHUNG

Implementing GIS in Optical Fiber. Communication

OPTIMIZED AND INTEGRATED TECHNIQUE FOR NETWORK LOAD BALANCING WITH NOSQL SYSTEMS

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

No-SQL Databases for High Volume Data

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Joining Cassandra. Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece

Databases 2 (VU) ( )

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

CloudDB: A Data Store for all Sizes in the Cloud

Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Workshop on Hadoop with Big Data

Cloud Computing Training

A programming model in Cloud: MapReduce

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

Introduction to Hbase Gkavresis Giorgos 1470

A Survey of Distributed Database Management Systems

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Making Sense of NoSQL Dan McCreary Ann Kelly

Transcription:

A Distributed Storage Schema for Cloud Computing based Raster GIS Systems Presented by Cao Kang, Ph.D. Geography Department, Clark University

Cloud Computing and Distributed Database Management System Why distributed database in Cloud? Better scalability, availability, reliability Match each other: GFS/Bigtable, Hadoop/HBase Categories: NoSQL Distributed Database Management System (NDDBMS) Relational Database Management System (RDBMS) sharding

What is NoSQL NoSQL is the term used to designate database management systems that differ from classic RDBMS in some way. * Data stores may not require fixed table schemas Usually avoid join operations Typically scale horizontally * Wikipedia: http://en.wikipedia.org/wiki/nosql_%28concept%29

Why NoSQL Distributed DBMS? Scalabilities: Scale Out vs. Scale Up Keep adding more CPUs and memories into one expensive giant server or buy two smaller servers Capacity and cost do not go up in a linear way RDBMS (non-sharding) can scale up, but not scale out Cost: can use commodity computers Easy to work with: More friendly than RDBMS sharding

Current NDDBMS on the Market Proprietary NDDBMS: Google Bigtable Amazon Dynamo Opensource NDDBMS: HBase Cassandra

Why HBase? HBase offers good scalability. HBase is built to use commodity hardware. HBase can host huge volumes of data. HBase offers high availabilities and reliabilities. HBase offers strong consistency through its data operations. vs. RDBMS vs. Cassandra

HBase Data Model Conceptual Conceptual View: View Column based (vs. RDBMS row based) Each column family may contain multiple qualifiers Each cell is identified by: {row, column, version} column family : qualifier = value submerged_village_area : village_a = 20000 m 2

HBase Data Model Physical Storage Conceptual view Physical storage No null storage necessary

HBase Data Model Each column family is stored in a separate unit. Physically, all column family members are stored together on the file system. Tables are split into pieces based on row ranges, these pieces of tables are called Regions.

HBase Architecture ROOT Region Server META Region Server Data Region Server In theory, a three-tier HBase can host up to 2 34 regions with each region hosting 256M of data, which equals up to 4096 petabytes of data.

Limits of HBase for GIS Systems 1. The database cell size should be kept small, which in general should not be larger than 20M. 2. The number of column families cannot be infinite.

Storage Schema for Raster GIS Systems Two Different Data Types High 3rd Dimensional Data (H3D data) Such as time series data (100+ bands) Low 3rd Dimensional Data (L3D data) Such as ETM+ RGB data (3 bands)

Two Pixel Storage Modes Store H3D Data in S-mode (band interleaved by pixel) Store L3D Data in T-mode (band sequential)

Data Models in HBase High 3rd Dimensional Data Time locality has higher priority than space locality. Low 3rd Dimensional Data Space locality is better preserved.

Sub-Image Block Generation and Indexing A Q-tree alike splitting to sub-divide images Keys are sorted lexicographically Multi-level indexing based on Q-tree structure can better preserve locality

A Working Example Google Maps: Data in Bigtable * Load balancing http://mt0.google.com/mt?n=404&v=&x=0&y=0&zoom=16 http://mt1.google.com/mt?n=404&v=&x=1&y=0&zoom=16 http://mt2.google.com/mt?n=404&v=&x=0&y=1&zoom=16 http://mt3.google.com/mt?n=404&v=&x=1&y=1&zoom=16 http://mt1.google.com/vt/lyrs=h@149&hl=en&x=19651&s=&y=24321&z=16&s=ga http://mt0.google.com/vt/lyrs=h@149&hl=en&x=19646&s=&y=24323&z=16&s=galil Before Current *From Chang, et al, Bigtable: A Distributed Storage System for Structured Data, OSDI, 2006.

Current Development Status

Next Step Co-processing, which could greatly increase real-time spatial operation speed HBase Server (data &(data) processing ) HBase Client (controlling) (processing) Vector data support HBase Server (data &(data) processing ) HBase Server (data &(data) processing )

Thank you! Any questions?