Data Science in the Travel Industry




Data Science in the Travel Industry
Real-world experience using current leading frameworks
Paul Balm
September 2015

Hello!
_ I am from Amsterdam, since 2005 in Madrid
_ 1999-2004: Ph.D. in particle physics at Fermilab in Chicago
_ 2005-2014: Data Processing projects for the European Space Agency
_ 2014-present: Data Scientist at Amadeus Travel Intelligence

Agenda
_ Travel Industry and Amadeus
_ A User Story
_ Experiences with Different Technologies
_ Conclusions & Final Remarks
All views expressed in this presentation are entirely my own

1 The Travel Industry and Amadeus

Amadeus in a few words
1 _ Introduction
Amadeus is a technology company dedicated to the global travel industry. We are present in 195 countries with a worldwide team of more than 12,000 people. Our solutions help improve the business performance of travel agencies, corporations, airlines, airports, hotels, railways and more.
2015 Amadeus IT Group SA

We started our journey in 1987
2 _ Amadeus history

A history of shaping the future of travel
2 _ Amadeus history
_ 1987: Amadeus is founded
_ 1992: First booking made through Amadeus
_ 1995: World leader in number of travel agency locations
_ 1996: e-commerce division launched
_ 1998: 1 million bookings made on a single day for the first time
_ 2000: Partnership with BA and Qantas to launch Amadeus Altéa, our core Airline IT offering
_ 2010: Amadeus diversifies into IT for hotel, rail and airport
_ 2014: Contracts with Ryanair and Southwest Airlines and a strategic technology relationship with IHG

Robust global operations
3 _ Amadeus today
_ 1.6+ billion data requests processed per day
_ 525+ million travel agency bookings processed in 2014
_ 695+ million Passengers Boarded (PBs) in 2014
_ 95% of the world's scheduled network airline seats

Connecting the entire travel ecosystem
Cruise lines, airlines, hotels, insurance companies, car rental, travel agencies, ground handlers, airports, ground transportation, ferry operators

Supporting the entire traveller life cycle
Inspire, Search, Buy/Purchase, Pre-trip, On trip, Post-trip

2 A User Story

A Network Carrier
[Map labels: AMS, BER, FRA]
_ An airline operating a network with hubs
_ Any risks or opportunities for this operation?

Schedule Analysis
Analyse global schedules to raise alerts on any changes in capacity
Schedule information may come from
_ Commercially provided feeds
_ Airline inventory data
_ Publicly available databases (Eurostat, US DoT)
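A capacity alert of the kind described on this slide could be sketched as a comparison of two schedule snapshots. This is a minimal illustration only, not the method used at Amadeus: the record fields (`origin`, `destination`, `seats`), the 10% threshold and the sample flights are all made-up assumptions.

```python
from collections import defaultdict


def seats_per_route(schedule):
    """Sum the offered seats per (origin, destination) pair."""
    totals = defaultdict(int)
    for flight in schedule:
        totals[(flight["origin"], flight["destination"])] += flight["seats"]
    return dict(totals)


def capacity_alerts(old_schedule, new_schedule, threshold=0.10):
    """Return routes whose seat capacity changed by more than `threshold`.

    The threshold is relative to the old capacity; routes that are new
    (no capacity before) are always reported.
    """
    old = seats_per_route(old_schedule)
    new = seats_per_route(new_schedule)
    alerts = {}
    for route in sorted(set(old) | set(new)):
        before, after = old.get(route, 0), new.get(route, 0)
        if before == after:
            continue
        if before == 0 or abs(after - before) / before > threshold:
            alerts[route] = (before, after)
    return alerts


# Hypothetical snapshots: a second AMS-FRA rotation appears this week.
last_week = [
    {"origin": "AMS", "destination": "FRA", "seats": 180},
    {"origin": "AMS", "destination": "BER", "seats": 150},
]
this_week = [
    {"origin": "AMS", "destination": "FRA", "seats": 180},
    {"origin": "AMS", "destination": "FRA", "seats": 180},
    {"origin": "AMS", "destination": "BER", "seats": 150},
]
alerts = capacity_alerts(last_week, this_week)
print(alerts)
```

In a real pipeline the snapshots would come from the feeds listed above and would first have to be normalised to one common format.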


Schedule Analysis
Batch processing of different formats of input data
_ Different formats, different sources, and a priori unclear which sources will be useful or when
_ Store everything, always within legal and moral boundaries
_ Central data site with large NAS (network-attached storage)
_ HDFS-based storage at hosted processing clusters
_ Maintain separation of different customer projects

A Network Carrier
[Map labels: AMS, BER, FRA]
_ An airline operating a network with hubs
_ Schedule analysis shows competitor entering the market
_ What's the risk?

Quality of Service
QSI: Quality of Service Index
_ The prime quality metric: time to get from A to B
_ In practice, many rules are needed to build the QSI, and they require tuning and iteration
_ Agility as a requirement
_ Need for scalable processing capacity
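To make the QSI idea concrete, here is a toy sketch in which each itinerary gets a score and a carrier's estimated market share is its share of the total score. The scoring rule (frequency divided by elapsed time, halved per stop), the field names and the two itineraries are illustrative assumptions; production QSI models use many more rules, as the slide notes.

```python
from collections import defaultdict


def qsi_score(elapsed_minutes, weekly_frequency, stops, stop_penalty=0.5):
    """Toy quality score: shorter trips and more frequencies score higher.

    Each stop multiplies the score by `stop_penalty` (an assumed tuning knob).
    """
    return weekly_frequency * (stop_penalty ** stops) / elapsed_minutes


def market_shares(itineraries):
    """Estimate each carrier's share as its fraction of the total score."""
    scores = defaultdict(float)
    for it in itineraries:
        scores[it["carrier"]] += qsi_score(
            it["elapsed_minutes"], it["weekly_frequency"], it["stops"]
        )
    total = sum(scores.values())
    return {carrier: score / total for carrier, score in scores.items()}


# Hypothetical market: an incumbent connecting via a hub versus a new
# entrant flying nonstop with fewer weekly frequencies.
itineraries = [
    {"carrier": "Incumbent", "elapsed_minutes": 180, "weekly_frequency": 14, "stops": 1},
    {"carrier": "Entrant", "elapsed_minutes": 75, "weekly_frequency": 7, "stops": 0},
]
shares = market_shares(itineraries)
for carrier, share in shares.items():
    print(f"{carrier}: {share:.1%}")
```

Because the rules live in one small function, they can be tuned and re-run over all markets, which is where the need for agility and scalable processing comes from.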

A Network Carrier
[Map labels: AMS, BER, FRA]
_ An airline operating a network with hubs
_ Schedule analysis shows competitor entering the market
_ Risk of loss of market share assessed via QSI analysis
_ How to react?

Future revenue streams
Information about future revenue comes in many forms
_ Tickets issued
_ Reservations (foreseen revenue)
_ Searches
_ Capacity changes
_ Sentiments on social networks, weather forecasts and the like
_ How are average fares developing?
_ How are passenger counts developing?
Requires fast updates, live if possible
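The two questions on this slide (average fares and passenger counts over time) boil down to a grouped aggregation. The sketch below assumes a hypothetical ticket record with `issue_date` (ISO format), `fare` (per passenger) and `passengers` (at least 1); the sample values are invented for illustration.

```python
from collections import defaultdict


def monthly_trends(tickets):
    """Aggregate passenger counts and average fare per issue month."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [revenue, passengers]
    for ticket in tickets:
        month = ticket["issue_date"][:7]  # "YYYY-MM" prefix of an ISO date
        totals[month][0] += ticket["fare"] * ticket["passengers"]
        totals[month][1] += ticket["passengers"]
    return {
        month: {"passengers": pax, "average_fare": revenue / pax}
        for month, (revenue, pax) in sorted(totals.items())
    }


# Hypothetical ticket records.
tickets = [
    {"issue_date": "2015-08-03", "fare": 100.0, "passengers": 2},
    {"issue_date": "2015-08-20", "fare": 200.0, "passengers": 1},
    {"issue_date": "2015-09-01", "fare": 150.0, "passengers": 2},
]
trends = monthly_trends(tickets)
for month, values in trends.items():
    print(month, values)
```

The same aggregation could be fed with reservations or searches instead of issued tickets to look further ahead.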

Streaming?
No hard truths here, considerations of options only!
_ Response times of the order of seconds are not required in the presented use case
_ We still need to process the large amount of historical data
_ We think processing the historical data is better done in batch: a mature and reliable approach
Can we reuse the streaming code for batch processing?
_ Flink: suitability for batch processing to be demonstrated
_ Storm: using Summingbird, combine with Scalding?
_ Spark Streaming: combine with Spark for batches. Maturity?
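The question of reusing streaming code for batch processing can be illustrated independently of any framework: keep the per-event logic in one function and drive it either from a complete dataset or from events as they arrive. The sketch below is plain Python and does not use the Flink, Storm or Spark APIs; the event fields and sample data are assumptions.

```python
from collections import Counter


def update_counts(counts, event):
    """Shared core logic: fold one booking event into running route counts."""
    counts[(event["origin"], event["destination"])] += 1
    return counts


def run_batch(events):
    """Batch mode: process the full historical dataset in one go."""
    counts = Counter()
    for event in events:
        update_counts(counts, event)
    return dict(counts)


def run_streaming(event_source):
    """Streaming mode: yield an updated snapshot after each incoming event."""
    counts = Counter()
    for event in event_source:
        update_counts(counts, event)
        yield dict(counts)


# Hypothetical events; in streaming mode these would arrive one by one.
events = [
    {"origin": "AMS", "destination": "FRA"},
    {"origin": "AMS", "destination": "BER"},
    {"origin": "AMS", "destination": "FRA"},
]
batch_result = run_batch(events)
snapshots = list(run_streaming(iter(events)))
print(batch_result)
print(snapshots[-1])
```

Once the stream has delivered the same events, the last snapshot matches the batch result, which is the property one hopes a combined framework preserves.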

A Network Carrier
[Map labels: AMS, BER, FRA]
_ An airline operating a network with hubs
_ Schedule analysis shows competitor entering the market
_ Risk of loss of market share assessed via QSI analysis
_ How to react?

3 Technologically

The Travel Intelligence Engine
Scalable cloud-based Hadoop platform
_ Amadeus Data (Altéa and other), Customer Private Data, Industry, Weather, Demographics, Social Media, etc.
_ Raw Data Storage *
_ Batch and streaming processing
_ Output storage: Data Warehouse
_ Applications
_ Visualization layer
(* All the raw data is kept after processing to allow for re-processing when required)

Lambda architecture
Image reproduced from http://lambda-architecture.net/
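As a rough illustration of the lambda architecture, the sketch below precomputes a batch view from the full master dataset, keeps a realtime view for events the last batch run has not yet covered, and merges the two at query time. The function names, the `route` field and the sample events are hypothetical; a real deployment would place these layers on separate systems.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute the view from all stored raw events."""
    return Counter(event["route"] for event in master_dataset)


def build_realtime_view(recent_events):
    """Speed layer: cover only events that arrived after the last batch run."""
    return Counter(event["route"] for event in recent_events)


def query(route, batch_view, realtime_view):
    """Serving layer: merge both views to answer a query."""
    return batch_view[route] + realtime_view[route]


# Hypothetical data: two events already in the batch view, two newer ones.
master_dataset = [{"route": "AMS-FRA"}, {"route": "AMS-BER"}]
recent_events = [{"route": "AMS-FRA"}, {"route": "AMS-FRA"}]

batch_view = build_batch_view(master_dataset)
realtime_view = build_realtime_view(recent_events)
print(query("AMS-FRA", batch_view, realtime_view))
```

When the next batch run absorbs the recent events, the realtime view for that period can be discarded, and keeping the raw data makes the batch view reproducible.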


[Technology diagram, built up slide by slide] Scoobi, Scalding

Benchmarking Spark vs. Scalding by @BoxHQ
Source: Box Engineering Blog

[Technology diagram, continued] Scoobi, Scalding, Impala, Storm, Kafka; Web Services (Python: Flask, Scala: Spray); Visualization tools (HTML5/JS)

Recap
_ Data science opportunities in the travel sector
_ How we are dealing with these opportunities at Amadeus Travel Intelligence
_ An overview of our experience with different tools
_ What's on our horizon for evaluation and why

Some final remarks
_ Different technologies currently under evaluation
_ In all areas (batch, streaming, visualizations) there are too many options to evaluate them all carefully
_ Also, development effort from the community is diluted
_ Consolidation would be welcome!
_ Even so, tools are advancing rapidly and, within their limitations, many are of great value

Thank you
@paul_balm
https://es.linkedin.com/in/paulbalm