Using Big Data and GIS to Model Aviation Fuel Burn

Gary M. Baker, USDOT Volpe Center
2015 Transportation DataPalooza, June 17, 2015

U.S. Department of Transportation, Office of the Secretary of Transportation
John A. Volpe National Transportation Systems Center
The National Transportation Systems Center: Advancing transportation innovation for the public good

Presentation Outline
- Big Data Processing
- Input Data and Fuel Burn Model
- Analysis

Background: MapReduce
- 2004 publication from Google: "MapReduce: Simplified Data Processing on Large Clusters"
- MapReduce is a programming model and an associated implementation for processing large data sets.
- Map function and Reduce function
- Many real-world tasks are expressible in this model.
- The MapReduce system orchestrates distributed servers, runs various tasks in parallel, manages all communications and data transfers between the various parts of the system, and provides for redundancy and fault tolerance.
Source: http://en.wikipedia.org/wiki/mapreduce
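To make the Map and Reduce roles concrete, here is a minimal single-machine sketch in Python. It is not the talk's program and not Hadoop: `map_reduce`, `mapper`, `reducer`, and the sample `segments` are hypothetical names and values chosen for illustration, and a real system would spread the same three phases across many servers.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a map -> shuffle -> reduce pipeline in memory on one machine."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical input: (grid_cell_id, fuel_burn) pairs, one per flight segment.
segments = [(104028, 1.0), (104029, 20.5), (104028, 2.0)]

def mapper(segment):
    cell_id, fuel = segment
    yield cell_id, fuel

def reducer(cell_id, fuels):
    return sum(fuels)

totals = map_reduce(segments, mapper, reducer)
```

Because the reducer only ever sees the values for one key, the per-key work is independent, which is what lets a framework run it in parallel.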

Background: Hadoop
- Apache Hadoop is open-source software for reliable, scalable, distributed computing.
- A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models (i.e., MapReduce).
- Designed to scale up from single servers to thousands of machines, each offering local computation and storage.
- Designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
Source: http://hadoop.apache.org

Traditional Databases vs. Hadoop
- Sports car: traditional databases
- Freight train: Hadoop
From: http://www.basas.com/newsletters/20110705/presentation/kerem%20tomak.ppt

Input Flight Data: Flight Track
- National Airspace System (NAS): radar-based
  o latitude, longitude, altitude, time
- Other parts of the world
  o airways, waypoints
  o great circle
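Where no radar track is available, a great-circle path is the simplest stand-in for a route. The sketch below computes a great-circle distance with the haversine formula on a spherical Earth. The function name, the mean-radius constant, and the approximate airport coordinates are assumptions added for illustration, not values from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a perfect sphere

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate airport coordinates (assumed, for illustration only).
LAX = (33.9425, -118.4081)
BOS = (42.3643, -71.0052)

distance_km = great_circle_km(*LAX, *BOS)
```

A flown route is normally longer than this lower bound, which is one reason radar tracks are preferred where they exist.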

Input Flight Data: Altitude Profile
- Altitude varies throughout the flight, affecting fuel burn.
[Figure: altitude profile of a flight from LAX to BOS]

Fuel Burn Model
- The model outputs many segments for each flight.
- Fuel burn is uniform over each segment.
- Average of over 100 segments per flight
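One way to picture the model output is as a list of segments per flight, each carrying a single fuel total that is spread evenly along its length. The sketch below is an assumed representation, not the actual model's data format: the `Segment` fields, the helper names, and the sample numbers are all hypothetical, and fuel units are left unspecified because the talk does not state them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """One piece of a flight track with its modeled fuel burn."""
    start: tuple  # (latitude, longitude, altitude) at the segment start
    end: tuple    # (latitude, longitude, altitude) at the segment end
    fuel: float   # total fuel burned over the segment (units not given in the talk)

    def fuel_for_fraction(self, fraction):
        """Fuel burned over a fraction of the segment, assuming uniform burn."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        return self.fuel * fraction

def flight_fuel(segments):
    """Total fuel for a flight is the sum over its segments."""
    return sum(segment.fuel for segment in segments)

# Two made-up segments of one flight.
climb = Segment((33.94, -118.41, 0.0), (34.10, -117.90, 10000.0), 120.0)
cruise = Segment((34.10, -117.90, 10000.0), (34.60, -116.80, 35000.0), 80.0)
```

The uniform-burn assumption is what makes it valid to split a segment's fuel in proportion to length when the segment crosses a boundary.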

Route Variations
One year of flights from LAX to BOS
- Winds
- Weather
- Traffic
- Restrictions
- Procedures
This is only one of over 850,000 origin-destination pairs!

Altitude Variations
- Winds
- Weather
- Traffic
- Restrictions
- Procedures
- Load
- Equipment

Objective
- Intersect fuel burn segments with the 36 km Equal-Area Scalable Earth Grid (EASE-Grid).
- Determine total fuel burn for each grid cell.
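The sketch below illustrates the idea of allocating a segment's fuel to grid cells, under strong simplifying assumptions. It uses a plain latitude/longitude grid with a hypothetical 0.5 degree cell size as a stand-in, not the 36 km EASE-Grid projection. It approximates the intersection by sampling points along a straight line in latitude/longitude rather than computing exact crossings, and it ignores the antimeridian. All names are invented for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.5  # hypothetical cell size; the talk uses the 36 km EASE-Grid instead

def cell_id(lat, lon):
    """Map a point to a (row, column) cell of the stand-in lat/lon grid."""
    row = math.floor((lat + 90.0) / CELL_DEG)
    col = math.floor((lon + 180.0) / CELL_DEG)
    return (row, col)

def allocate(segment, totals, samples=100):
    """Spread a segment's fuel evenly over sample points and bin them by cell."""
    lat1, lon1, lat2, lon2, fuel = segment
    share = fuel / samples  # uniform burn: every sub-piece gets an equal share
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoint of the i-th sub-piece
        lat = lat1 + t * (lat2 - lat1)
        lon = lon1 + t * (lon2 - lon1)
        totals[cell_id(lat, lon)] += share

totals = defaultdict(float)
# A made-up segment that straddles the boundary between two cells at its midpoint.
allocate((10.25, 20.1, 10.25, 20.9, 100.0), totals)
```

More samples give a closer approximation to a true geometric intersection, at the cost of more work per segment.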

Sounds Like a Big Data Problem
  ~30 million flights globally per year
  * average of over 100 segments per flight
  * multiple analysis years
  * different scenarios
We could use traditional statistical approaches to model flight tracks and altitudes, but that's not the big data way!
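The product above can be worked through directly. In this small sketch the flight and segment counts are the rounded figures from the slide; the function name and the example of 3 years and 2 scenarios are hypothetical, since the talk does not say how many years or scenarios were analyzed.

```python
FLIGHTS_PER_YEAR = 30_000_000  # "~30 million flights globally per year"
SEGMENTS_PER_FLIGHT = 100      # "average of over 100 segments", rounded down

def segments_to_process(years=1, scenarios=1):
    """Rough lower bound on the number of segments for a given study size."""
    return FLIGHTS_PER_YEAR * SEGMENTS_PER_FLIGHT * years * scenarios

one_year = segments_to_process()
# Hypothetical study: 3 analysis years under 2 scenarios.
example_study = segments_to_process(years=3, scenarios=2)
```

A single year already comes to about 3 billion segments, and each added year or scenario multiplies that figure.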

Steps
- Preprocess data: reformat, subset, project
- Write and debug program locally (Java)
- Adapt program to MapReduce model and Hadoop environment
- Overcome cloud learning curve
  o connect, execute, debug, automate, etc.
- Upload data to the cloud
- Execute and retrieve results
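The "write locally, then adapt" steps can be sketched as a mapper and a reducer that exchange tab-separated key/value lines, the shape that Hadoop Streaming jobs use. This is an assumption for illustration: the talk's program was written in Java and does not say Streaming was used, and the `cell_id,fuel` input format and the function names are hypothetical. `run_local` imitates the framework's sort between the two phases so the logic can be debugged on one machine.

```python
from itertools import groupby

def map_lines(lines):
    """Mapper: parse 'cell_id,fuel' records and emit tab-separated key/value lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cell_id, fuel = line.split(",")
        yield f"{cell_id}\t{fuel}"

def reduce_lines(sorted_lines):
    """Reducer: sum fuel per key; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for cell_id, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(float(fuel) for _, fuel in group)
        yield f"{cell_id}\t{total:.1f}"

def run_local(lines):
    """Local stand-in for the cluster: map, sort by key, then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))

result = run_local(["104029,10.5", "104028,1.0", "104029,2.0", ""])
```

Keeping the mapper and reducer as plain functions over lines means the same logic can be tested locally before any data is uploaded.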

Results

GRID CELL ID    FUEL BURN
104028          3.0
104029          46489.4
104030          4795.5
104031          985.1
104032          84491.3
104033          622.9
104034          42.9
104035          7736.3
104036          37620
104037          68749.3
104038          12308.8
104039          361.5
104040          9902.7
104041          263496.4
104042          88256.5
104043          14688.3
104044          114664.9
104045          17361.4
104046          1082.5
104047          3.0

- The 36 km EASE-Grid has 391,384 cells.
- The final result is a text file that contains the total fuel burn for each cell.
- Over 1.5 terabytes/year reduced to a single 5 MB result
- Data -> Information -> Knowledge
- 3 billion segments (30 million flights) can be processed in just over 1 hour on an average 5-computer cluster.
- On a good local server the same was done in about 4 hours.
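A result file of this kind is small enough to load in full before it goes into a GIS. The sketch below assumes whitespace-separated `cell_id fuel_burn` lines with an optional header; the actual file layout is not given in the talk, and the function names are hypothetical. The sample rows are copied from the table above.

```python
def parse_results(text):
    """Read 'cell_id fuel_burn' lines into a dict, skipping headers and blanks."""
    totals = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # not a data row
        totals[int(parts[0])] = float(parts[1])
    return totals

def top_cells(totals, n=3):
    """Return the n cells with the highest fuel burn, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

sample = """GRID CELL ID FUEL BURN
104028 3.0
104029 46489.4
104041 263496.4
"""
cells = parse_results(sample)
highest = top_cells(cells, n=1)
```

Keying the values by cell ID is what would allow a join against the grid geometry for mapping.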

Results and Further Analysis in GIS
Beyond a Simple Map
- Total fuel burn by region
- Annual comparisons
- Scenario comparisons
- Overlay with other climate datasets
- By aircraft type or class
- Altitude slices
- Time animations

Thank You