Leveraging Big Data. A case study from Thomson Reuters

Transcription:

Leveraging Big Data. A case study from Thomson Reuters

About the speakers
- Chawapong Suriyajan, Development Group Leader
- Sakol Suwinaitrakool, Senior Solution Architect

FOLLOW US: facebook.com/thomsonreutersthailand

What's the problem we want to solve?
"Behavioral finance is an area of increasing interest in financial markets, but it's been difficult for human traders to keep pace due to the sheer volume and detail of data and the need to interpret it and spot trends immediately."
Philip Brittan, Chief Technology Officer & Global Head of Platform, Thomson Reuters

Introducing Social Media Monitor
The tool that helps overcome the challenges of analyzing social data and provides insights for investors.

Awards
- Corporate Entrepreneur Awards 2014: Best New Service
- The Technical Analyst Awards 2014: Best Specialist Product
- FStech Awards 2015: Financial Sector Innovation of the Year

What does SMM do?
- Perform sentiment analysis
- Visualize

Why Social Media?
Fast!!!

Social Media is Fast
On June 10, 2014, as Iraqi militants seized the Baiji oil refinery, the news broke on Twitter, six hours in advance of other media outlets covering the story.

Social Media is Fast
In November 2013, The Globe and Mail tweeted that BlackBerry's $4.7 billion buyout had been scuttled. The tweet went out at 8:12 a.m., and by 8:19 a.m. BlackBerry stock had fallen 20%.

Why Social Media?
Provides collective sentiment indicators

Social Media

The Growth of Social Data
Source: http://www.searchenginejournal.com/growth-social-media-2-0-infographic/77055/

Challenges in leveraging social media
- The data are extremely large.
- How can we make a machine analyze sentiment correctly?
- How can we deal with noisy data?
- How do we present a huge amount of data in a way that a human can easily understand?

Emerging Technology Trends: Big Data
Tool categories:
- File system
- Document store
- Wide-column store
- Key-value store

How big is our data?
- Millions of tweets with a cashtag (e.g. $AAPL) per quarter
- More than 1 terabyte of compressed data
- Around 50 GB of data flowing into our system daily

Social Media Monitor
- Data sources: 45,000 entries/day and 45,000,000 entries/day
- Social Media Ingestor, after the filter: 215,000 entries/day
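
The filtering step can be illustrated with a minimal Python sketch that keeps only entries mentioning a watched cashtag. This is not the ingestor's actual filter: the regular expression, the function names and the watchlist are all assumptions made for illustration.

```python
import re

# A cashtag is "$" followed by a ticker symbol. The pattern below (1-6 letters,
# optional ".X" share-class suffix) is an illustrative assumption only.
CASHTAG_RE = re.compile(r"(?<![A-Za-z0-9_$])\$([A-Za-z]{1,6}(?:\.[A-Za-z]{1,2})?)\b")


def extract_cashtags(text):
    """Return the upper-cased ticker symbols mentioned as cashtags in text."""
    return [symbol.upper() for symbol in CASHTAG_RE.findall(text)]


def filter_entries(entries, watchlist):
    """Keep only entries that mention at least one watched ticker."""
    watched = {ticker.upper() for ticker in watchlist}
    return [e for e in entries if watched.intersection(extract_cashtags(e))]


if __name__ == "__main__":
    sample = ["Buy $AAPL now", "nice weather today", "$goog is up"]
    print(filter_entries(sample, ["aapl"]))
```

Dollar amounts such as "$100" are ignored because the pattern requires letters after the "$" sign.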

What is used to handle such big data?
- Distributed
- High availability
- Full-text search
- Document oriented
- Schema free
- RESTful API
- Apache 2 open source license
- Low cost
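
As a rough illustration of what "document oriented" and "RESTful API" mean in practice, the sketch below builds a JSON search body in the Elasticsearch Query DSL: a full-text match combined with a time-range filter. The field names `text` and `created_at` are hypothetical, as is the function name; the body would be sent to an index's `_search` endpoint.

```python
import json


def build_search_body(ticker, since="now-1h", size=10):
    """Build a Query DSL body: full-text match plus a time filter.

    The field names "text" and "created_at" are assumptions for this sketch.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": ticker}}],
                "filter": [{"range": {"created_at": {"gte": since}}}],
            }
        },
        "sort": [{"created_at": {"order": "desc"}}],
    }


if __name__ == "__main__":
    # The printed JSON is what a client would send as the request body.
    print(json.dumps(build_search_body("AAPL"), indent=2))
```

Because the request is plain JSON over HTTP, any language with an HTTP client can query the cluster without a dedicated driver.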

Our experience using Elasticsearch & Hadoop
Strengths:
- Clean distributed deployment and prior in-house testing done
Challenges:
- Determining the size of the cluster
- Resource contention / resource sharing
- Large dataset

Analyzing Sentiments: Natural Language Processing
Example: "Apples are red. They are very delicious."
- Tokenize: Apples, are, red, ...
- Part-of-speech tagging: Apples = subject, are = verb
- Lemmatization: Apples = apple, are = be
- Named entity recognition: Apples = fruit, red = color
- Coreference resolution: They = Apples
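
The first and third steps above can be sketched in a few lines of Python. This is a toy: the lemma table is hand-written for the example sentence, and a real pipeline would rely on an NLP library rather than a lookup table. The function names are assumptions.

```python
import re

# Tiny hand-written lemma table covering only the example sentence.
LEMMAS = {"apples": "apple", "are": "be"}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)


def lemmatize(tokens):
    """Map each token to its dictionary form, falling back to lowercase."""
    return [LEMMAS.get(token.lower(), token.lower()) for token in tokens]


if __name__ == "__main__":
    tokens = tokenize("Apples are red. They are very delicious")
    print(tokens)
    print(lemmatize(tokens))
```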

Analyzing Sentiments: Machine Learning

Processing Tweets
- SM Ingestor: Tweets -> NLP -> Sentiment Analysis (Machine Learning) -> Tweets + Sentiments (Bullish, Bearish) -> Search
- Processed in milliseconds
- SM Statistic Aggregator:
  - Count of positive tweets
  - Count of negative tweets
  - Count of neutral tweets
  - Count of bullish tweets
  - Count of bearish tweets
  - Total tweet count
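
The six statistics listed for the aggregator could be computed roughly as follows. The `ScoredTweet` record and the idea that each tweet carries one sentiment label plus an optional bullish/bearish label are assumptions for this sketch; the slides do not describe the actual data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ScoredTweet:
    """Hypothetical record produced by the sentiment step."""
    text: str
    sentiment: str  # "positive", "negative", or "neutral"
    stance: str     # "bullish", "bearish", or "" when neither applies


def aggregate(tweets):
    """Count tweets per sentiment and stance label, plus the overall total."""
    counts = Counter()
    for tweet in tweets:
        counts[tweet.sentiment] += 1
        if tweet.stance:
            counts[tweet.stance] += 1
        counts["total"] += 1
    keys = ["positive", "negative", "neutral", "bullish", "bearish", "total"]
    # Counter returns 0 for labels that never occurred.
    return {key: counts[key] for key in keys}


if __name__ == "__main__":
    batch = [
        ScoredTweet("Buy $AAPL", "positive", "bullish"),
        ScoredTweet("$AAPL earnings today", "neutral", ""),
        ScoredTweet("Dumping $AAPL", "negative", "bearish"),
    ]
    print(aggregate(batch))
```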

What is the noise?
- What if people tweet about a company with great bias?
- What if someone tweets jokes? Will this impact the analysis?
- Example: "Buy $Apple?" Is it positive or negative?

Minimizing the noise
- Use the proper filter for the PowerTrack API
- Weight the sentiment score using the Klout score
- Focus on collective sentiment during a specific time period instead of individual tweets
- Use enough training data to train our sentiment engine

Klout score
The Klout Score is a number between 1 and 100 that represents your influence. The more influential you are, the higher your Klout Score.
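
One plausible way to weight sentiment by influence is a weighted mean over all tweets in a time window, sketched below. The slides do not give the formula used in Social Media Monitor, so the formula, the -1.0 to +1.0 sentiment scale and the function name are assumptions.

```python
def weighted_sentiment(scored_tweets):
    """Influence-weighted mean of per-tweet sentiment.

    scored_tweets: iterable of (sentiment, klout) pairs, where sentiment runs
    from -1.0 (negative) to +1.0 (positive) and klout from 1 to 100.
    Both scales are assumptions for this sketch.
    """
    pairs = list(scored_tweets)
    total_weight = sum(klout for _, klout in pairs)
    if total_weight == 0:
        return 0.0  # no tweets in the window
    return sum(sentiment * klout for sentiment, klout in pairs) / total_weight


if __name__ == "__main__":
    # One influential positive tweet outweighs one low-influence negative tweet.
    print(weighted_sentiment([(1.0, 80), (-1.0, 20)]))
```

With equal weights this reduces to the plain average, so the weighting only matters when authors differ in influence.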

Data Visualization: Bubble Chart

Data Visualization: Heatmap

Data Visualization: Technology

Strengths and Challenges: Strengths
- Server-side deployment: no installation on client machines
- Offload presentation logic to client machines: reduces resource requirements on the server side, so it is more scalable (good code needed)
- Scalable: Node.js uses single-threaded, non-blocking I/O, with no overhead for context switching
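
The single-threaded, non-blocking model can be illustrated with an event loop. The sketch below uses Python's asyncio rather than Node.js, purely as an analogy: several simulated requests wait concurrently, yet all of them run on one thread. The handler names and delays are made up for the example.

```python
import asyncio
import threading


async def handle_request(name, delay):
    """Simulate a request that waits on I/O without blocking the thread."""
    # await hands control back to the event loop while this request waits.
    await asyncio.sleep(delay)
    return name, threading.get_ident()


async def serve_all():
    """Run three simulated requests concurrently on one event loop."""
    return await asyncio.gather(
        handle_request("a", 0.02),
        handle_request("b", 0.01),
        handle_request("c", 0.01),
    )


def run_demo():
    """Return the request names and the number of distinct threads used."""
    results = asyncio.run(serve_all())
    names = [name for name, _ in results]
    thread_ids = {thread_id for _, thread_id in results}
    return names, len(thread_ids)


if __name__ == "__main__":
    print(run_demo())
```

Because no second thread is involved, there is no thread context switching, which is the property the slide credits for scalability.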

Strengths and Challenges: Challenges
- Developer skills on the AngularJS framework
- JavaScript performance
- Node.js is quite sensitive to unhandled exceptions, which cause excessive memory usage

Q&A

Thank you