Information Retrieval Elasticsearch

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Information Retrieval Elasticsearch"

Transcription

1 Information Retrieval Elasticsearch

2 IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing. Wikipedia

3 Concepts term t : a noun or compound word used in a specific context tf (t in d) : term frequency in a document; measure the number of times term t appears in the currently scored document d idf (t) : inverse document frequency measures how often the term appears across the index: common or rare obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient.

4 Concepts tf idf, (term frequency inverse document frequency) is a numerical statistic that reflects how important a word is to a document in a collection or corpus. It is used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, adjusted for the fact that some words appear more frequently in general.

5 Concepts Then tf idf is calculated as and

6 Apache Lucene Fast, high performance, scalable search/ir library Open source Indexing and Searching Inverted Index of documents Provides advanced Search options like synonyms, stopwords, based on similarity, proximity.

7 Lucene Internal Representation Credit:

8 Lucene Based on documents model Index contains documents Each document consist of fields Each Field has attributes field data type (FieldType) way to handle the content (Analyzers, Filters) Stored or indexed field (stored="true") or (indexed="true")

9 Indexing Pipeline Analyzer : creates tokens using a Tokenizer and/or applying Filters (Token Filter) Each field can define an Analyzer at index time/query time or both at same time

10 Elasticsearch - Introduction Enterprise Search platform for Apache Lucene Open source Highly reliable, scalable, fault tolerant Support distributed Indexing, Replication, and load balanced querying

11 Elasticsearch - Features Distributed (multiple nodes) RESTful search server (GET, PUT, POST, DELETE) Document oriented, full text (JSON format) Schema free Easy to scale horizontally Real time analytics Multi tenancy (multiple indexes)

12 APIs HTTP RESTful Api Java Api Clients perl, python, php, ruby,.net and more All APIs perform automatic node operation rerouting

13 Cluster Architecture Source:

14 Index Request Source:

15 Search Request Source:

16 Logstash - Architecture Source:

17 Logstash life of an event Input Filters Output Filters are processed in order of config file Outputs are processed in order of config file Input: Input stream File input (tail) Log4j Redis Syslog and many more

18 Kibana

19 Source:

20 Source:

21 Analytics Analytics source : Kibana.org based on ElasticSearch and Logstash Image Source :

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH Real-time Data Analytics mit Elasticsearch Bernhard Pflugfelder inovex GmbH Bernhard Pflugfelder Big Data Engineer @ inovex Fields of interest: search analytics big data bi Working with: Lucene Solr Elasticsearch

More information

Log management with Logstash and Elasticsearch. Matteo Dessalvi

Log management with Logstash and Elasticsearch. Matteo Dessalvi Log management with Logstash and Elasticsearch Matteo Dessalvi HEPiX 2013 Outline Centralized logging. Logstash: what you can do with it. Logstash + Redis + Elasticsearch. Grok filtering. Elasticsearch

More information

Data Discovery and Systems Diagnostics with the ELK stack. Rittman Mead - BI Forum 2015, Brighton. Robin Moffatt, Principal Consultant Rittman Mead

Data Discovery and Systems Diagnostics with the ELK stack. Rittman Mead - BI Forum 2015, Brighton. Robin Moffatt, Principal Consultant Rittman Mead Data Discovery and Systems Diagnostics with the ELK stack Rittman Mead - BI Forum 2015, Brighton Robin Moffatt, Principal Consultant Rittman Mead T : +44 (0) 1273 911 268 (UK) About Me Principal Consultant

More information

www.basho.com Technical Overview Simple, Scalable, Object Storage Software

www.basho.com Technical Overview Simple, Scalable, Object Storage Software www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...

More information

Efficient Management of System Logs using a Cloud Radoslav Bodó, Daniel Kouřil CESNET. ISGC 2013, March 2013

Efficient Management of System Logs using a Cloud Radoslav Bodó, Daniel Kouřil CESNET. ISGC 2013, March 2013 Efficient Management of System Logs using a Cloud Radoslav Bodó, Daniel Kouřil CESNET ISGC 2013, March 2013 Agenda Introduction Collecting logs Log Processing Advanced analysis Resume Introduction Status

More information

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon 2014. Max Putas

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon 2014. Max Putas Analyzing large flow data sets using modern open-source data search and visualization tools FloCon 2014 Max Putas About me Operations Engineer - DevOps BS, MS, and CAS in Telecommunications Work/research

More information

Developing an Application Tracing Utility for Mule ESB Application on EL (Elastic Search, Log stash) Stack Using AOP

Developing an Application Tracing Utility for Mule ESB Application on EL (Elastic Search, Log stash) Stack Using AOP Developing an Application Tracing Utility for Mule ESB Application on EL (Elastic Search, Log stash) Stack Using AOP Mohan Bandaru, Amarendra Kothalanka, Vikram Uppala Student, Department of Computer Science

More information

Mobile Analytics. mit Elasticsearch und Kibana. Dominik Helleberg

Mobile Analytics. mit Elasticsearch und Kibana. Dominik Helleberg Mobile Analytics mit Elasticsearch und Kibana Dominik Helleberg Speaker Dominik Helleberg Mobile Development Android / Embedded Tools http://dominik-helleberg.de/+ Mobile Analytics Warum? Server Software

More information

Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory

Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory A Little Context! The Five Golden Principles of Security! Know your system! Principle

More information

Powering Monitoring Analytics with ELK stack

Powering Monitoring Analytics with ELK stack Powering Monitoring Analytics with ELK stack Abdelkader Lahmadi, Frédéric Beck INRIA Nancy Grand Est, University of Lorraine, France 2015 (compiled on: June 23, 2015) References online Tutorials Elasticsearch

More information

Apache Lucene. Searching the Web and Everything Else. Daniel Naber Mindquarry GmbH ID 380

Apache Lucene. Searching the Web and Everything Else. Daniel Naber Mindquarry GmbH ID 380 Apache Lucene Searching the Web and Everything Else Daniel Naber Mindquarry GmbH ID 380 AGENDA 2 > What's a search engine > Lucene Java Features Code example > Solr Features Integration > Nutch Features

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Search Engine Architecture I

Search Engine Architecture I Search Engine Architecture I Software Architecture The high level structure of a software system Software components The interfaces provided by those components The relationships between those components

More information

WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures

WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures Modern technologies in Zenoss Service Dynamics v5 enable IT organizations to scale out monitoring and scale back costs, avoid service

More information

Search and Real-Time Analytics on Big Data

Search and Real-Time Analytics on Big Data Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its

More information

April 8th - 10th, 2014 LUG14 LUG14. Lustre Log Analyzer. Kalpak Shah. DataDirect Networks. ddn.com. 2014 DataDirect Networks. All Rights Reserved.

April 8th - 10th, 2014 LUG14 LUG14. Lustre Log Analyzer. Kalpak Shah. DataDirect Networks. ddn.com. 2014 DataDirect Networks. All Rights Reserved. April 8th - 10th, 2014 LUG14 LUG14 Lustre Log Analyzer Kalpak Shah DataDirect Networks Lustre Log Analysis Requirements Need scripts to parse Lustre debug logs Only way to effectively use the logs for

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

Open Source IR Tools and Libraries

Open Source IR Tools and Libraries Open Source IR Tools and Libraries Giorgos Vasiliadis, gvasil@csd.uoc.gr CS-463 Information Retrieval Models Computer Science Department University of Crete 1 Outline Google Search API Lucene Terrier Lemur

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Processing millions of logs with Logstash

Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra November 21, 2014 About me My name is Valentin Fischer-Mitoiu and I work for the University of Vienna. More specificaly in a group called Domainis

More information

Andrew Moore Amsterdam 2015

Andrew Moore Amsterdam 2015 Andrew Moore Amsterdam 2015 Agenda Why log How to log Audit plugins Log analysis Demos Logs [timestamp]: [some useful data] Why log? Error Log Binary Log Slow Log General Log Why log? Why log? Why log?

More information

DataStax Enterprise 3.x

DataStax Enterprise 3.x DataStax Enterprise 3.x Realtime Analytics with Solr Jason Rutherglen 2012 DataStax 1 About the Presenter Big Data Engineer at DataStax Co-author of Programming Hive and Lucene and Solr: The Definitive

More information

Improve performance and availability of Banking Portal with HADOOP

Improve performance and availability of Banking Portal with HADOOP Improve performance and availability of Banking Portal with HADOOP Our client is a leading U.S. company providing information management services in Finance Investment, and Banking. This company has a

More information

Using Apache Solr for Ecommerce Search Applications

Using Apache Solr for Ecommerce Search Applications Using Apache Solr for Ecommerce Search Applications Rajani Maski Happiest Minds, IT Services SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. 2 Copyright Information This document

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Logging on a Shoestring Budget

Logging on a Shoestring Budget UNIVERSITY OF NEBRASKA AT OMAHA Logging on a Shoestring Budget James Harr jharr@unomaha.edu Agenda The Tools ElasticSearch Logstash Kibana redis Composing a Log System Q&A, Conclusions, Lessons Learned

More information

MongoDB Developer and Administrator Certification Course Agenda

MongoDB Developer and Administrator Certification Course Agenda MongoDB Developer and Administrator Certification Course Agenda Lesson 1: NoSQL Database Introduction What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL Types of NoSQL

More information

the missing log collector Treasure Data, Inc. Muga Nishizawa

the missing log collector Treasure Data, Inc. Muga Nishizawa the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days

More information

Time series IoT data ingestion into Cassandra using Kaa

Time series IoT data ingestion into Cassandra using Kaa Time series IoT data ingestion into Cassandra using Kaa Andrew Shvayka ashvayka@cybervisiontech.com Agenda Data ingestion challenges Why Kaa? Why Cassandra? Reference architecture overview Hands-on Sandbox

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia Log Management with Open-Source Tools Risto Vaarandi SEB Estonia Outline Why use open source tools for log management? Widely used logging protocols and recently introduced new standards Open-source syslog

More information

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到

More information

MathCloud: From Software Toolkit to Cloud Platform for Building Computing Services

MathCloud: From Software Toolkit to Cloud Platform for Building Computing Services MathCloud: From Software Toolkit to Cloud Platform for Building Computing s O.V. Sukhoroslov Centre for Grid Technologies and Distributed Computing ISA RAS Moscow Institute for Physics and Technology MathCloud

More information

Efficient Management of System Logs using a Cloud

Efficient Management of System Logs using a Cloud , CESNET z.s.p.o.,zikova 4, 160 00 Praha 6, Czech Republic and University of West Bohemia,Univerzitní 8, 306 14 Pilsen, Czech Republic E-mail: bodik@civ.zcu.cz Daniel Kouřil, CESNET z.s.p.o.,zikova 4,

More information

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment International Journal of Applied Information Systems (IJAIS) ISSN : 2249-868 Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment Yusuf Abubakar Department of Computer Science

More information

Log management with Graylog2 Lennart Koopmann, FrOSCon 2012. Mittwoch, 29. August 12

Log management with Graylog2 Lennart Koopmann, FrOSCon 2012. Mittwoch, 29. August 12 Log management with Graylog2 Lennart Koopmann, FrOSCon 2012 About me 24 years old, Software Engineer at XING AG Hamburg, Germany @_lennart Graylog2 Free and open source log management system Started in

More information

Using Logstash and Elasticsearch analytics capabilities as a BI tool

Using Logstash and Elasticsearch analytics capabilities as a BI tool Using Logstash and Elasticsearch analytics capabilities as a BI tool Pashalis Korosoglou, Pavlos Daoglou, Stefanos Laskaridis, Dimitris Daskopoulos Aristotle University of Thessaloniki, IT Center Outline

More information

Log Management with Open-Source Tools. Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M

Log Management with Open-Source Tools. Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M Log Management with Open-Source Tools Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M Outline Why do we need log collection and management? Why use open source tools? Widely used logging protocols and recently

More information

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our

More information

The Open Source Knowledge Discovery and Document Analysis Platform

The Open Source Knowledge Discovery and Document Analysis Platform Enabling Agile Intelligence through Open Analytics The Open Source Knowledge Discovery and Document Analysis Platform 17/10/2012 1 Agenda Introduction and Agenda Problem Definition Knowledge Discovery

More information

Case Study: Real-time Analytics With Druid. Salil Kalia, Tech Lead, TO THE NEW Digital

Case Study: Real-time Analytics With Druid. Salil Kalia, Tech Lead, TO THE NEW Digital Case Study: Real-time Analytics With Druid Salil Kalia, Tech Lead, TO THE NEW Digital Agenda Understanding the use-case Ad workflow Our use case Experiments with technologies Redis Cassandra Introduction

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Open Source Techniques push Enterprise Search & Search Driven Applications and especially foster the application of Text Analytics

Open Source Techniques push Enterprise Search & Search Driven Applications and especially foster the application of Text Analytics Open Source Techniques push Enterprise Search & Search Driven Applications and especially foster the application of Text Analytics Exploring the Future of Enterprise Search, IPTS, Seville, October 2011

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Digital Asset Management Beyond CMIS

Digital Asset Management Beyond CMIS Digital Asset Management Beyond CMIS CMIS is an important component of DAM for many organizations, but knowing how to use it to maximize its effectiveness is the key. In this paper: How organizations use

More information

CS242 PROJECT. Presented by Moloud Shahbazi Spring 2015

CS242 PROJECT. Presented by Moloud Shahbazi Spring 2015 CS242 PROJECT Presented by Moloud Shahbazi Spring 2015 AGENDA Project Overview Data Collection Indexing Big Data Processing PROJECT- PART1 1.1 Data Collection: 5G < data size < 10G Deliverables: Document

More information

Elasticsearch, Logstash, and Kibana (ELK)

Elasticsearch, Logstash, and Kibana (ELK) Elasticsearch, Logstash, and Kibana (ELK) Dwight Beaver dsbeaver@cert.org Sean Hutchison shutchison@cert.org January 2015 2014 Carnegie Mellon University This material is based upon work funded and supported

More information

Graylog2 Lennart Koopmann, OSDC 2014. @_lennart / www.graylog2.org

Graylog2 Lennart Koopmann, OSDC 2014. @_lennart / www.graylog2.org Graylog2 Lennart Koopmann, OSDC 2014 @_lennart / www.graylog2.org About me 25 years old Living in Hamburg, Germany @_lennart on Twitter Co-Founder of TORCH - The Graylog2 company. Graylog2 history Started

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services

More information

WORKSHOP Log Management with NetEye 3.5

WORKSHOP Log Management with NetEye 3.5 WORKSHOP Log Management with NetEye 3.5 Program 2015 by Thomas Forrer LogAnalysis with Logstash & Kibana Log Management in NetEye Configuration of Log sources / Agents Log acquisition and configuration

More information

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015 Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO Big Data Everywhere Conference, NYC November 2015 Agenda 1. Challenges with Risk Data Aggregation and Risk Reporting (RDARR) 2. How a

More information

ifinder ENTERPRISE SEARCH

ifinder ENTERPRISE SEARCH DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality

More information

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) Erdélyi Ernő, Component Soft Kft. erno@component.hu www.component.hu 2013 (c) Component Soft Ltd Leading Hadoop Vendor Copyright 2013,

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715 Big Data Facebook Wall Data using Graph API Presented by: Prashant Patel-2556219 Jaykrushna Patel-2619715 Outline Data Source Processing tools for processing our data Big Data Processing System: Mongodb

More information

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib

More information

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges James Campbell Corporate Systems Engineer HP Vertica jcampbell@vertica.com Big

More information

Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search

Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search Table of Contents Introduction... 3 Why Search?... 3 General Search Requirements... 3 Traditional Deployment

More information

Thapovan In pursuit of perfection

Thapovan In pursuit of perfection Financial Services Health Care Insurance ON-DEMAND LAUNDRY SERVICE Case Study Java/J2EE Mobile Platforms Microsoft.NET Investment Management Investment Banking & Brokerage Custody & Clearing Services Corporate

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements

More information

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,

More information

SCALABLE DATA SERVICES

SCALABLE DATA SERVICES 1 SCALABLE DATA SERVICES 2110414 Large Scale Computing Systems Natawut Nupairoj, Ph.D. Outline 2 Overview MySQL Database Clustering GlusterFS Memcached 3 Overview Problems of Data Services 4 Data retrieval

More information

OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP

OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP 1 KALYANKUMAR B WADDAR, 2 K SRINIVASA 1 P G Student, S.I.T Tumkur, 2 Assistant Professor S.I.T Tumkur Abstract- Product Review System

More information

Elasticsearch for Lua Developers. Pablo Musa pablo@elastic.co

Elasticsearch for Lua Developers. Pablo Musa pablo@elastic.co Elasticsearch for Lua Developers Pablo Musa pablo@elastic.co + + Me Pablo Musa Educational Engineer @ Elastic Which student? 5 interested students 3 very good proposals Key Points: - Background (Lua, Elasticsearch,

More information

Scaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf

Scaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Scaling Out With Apache Spark DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Your hosts Mathijs Kattenberg Technical consultant Jeroen Schot Technical consultant

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

XpoLog Competitive Comparison Sheet

XpoLog Competitive Comparison Sheet XpoLog Competitive Comparison Sheet New frontier in big log data analysis and application intelligence Technical white paper May 2015 XpoLog, a data analysis and management platform for applications' IT

More information

Centralized logging system based on WebSockets protocol

Centralized logging system based on WebSockets protocol Centralized logging system based on WebSockets protocol Radomír Sohlich sohlich@fai.utb.cz Jakub Janoštík janostik@fai.utb.cz František Špaček spacek@fai.utb.cz Abstract: The era of distributed systems

More information

ASAP D7.1 Integration Prototype ASAP System Prototype v.1

ASAP D7.1 Integration Prototype ASAP System Prototype v.1 FP7 Project ASAP Adaptable Scalable Analytics Platform Integration Prototype ASAP System Prototype v.1 WP 7 Integration of the ASAP System Nature: Report Dissemination: Public Version History Version Date

More information

Glassfish Architecture.

Glassfish Architecture. Glassfish Architecture. First part Introduction. Over time, GlassFish has evolved into a server platform that is much more than the reference implementation of the Java EE specifcations. It is now a highly

More information

Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led

Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led Course Description This three day course prepares IT Professionals to administer enterprise search solutions using

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Log managing at PIC. A. Bruno Rodríguez Rodríguez. Port d informació científica Campus UAB, Bellaterra Barcelona. December 3, 2013

Log managing at PIC. A. Bruno Rodríguez Rodríguez. Port d informació científica Campus UAB, Bellaterra Barcelona. December 3, 2013 Log managing at PIC A. Bruno Rodríguez Rodríguez Port d informació científica Campus UAB, Bellaterra Barcelona December 3, 2013 Bruno Rodríguez (PIC) Log managing at PIC December 3, 2013 1 / 21 What will

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Data Services Advisory

Data Services Advisory Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

More information

Making Big Data Processing Simple with Spark. Matei Zaharia

Making Big Data Processing Simple with Spark. Matei Zaharia Making Big Data Processing Simple with Spark Matei Zaharia December 17, 2015 What is Apache Spark? Fast and general cluster computing engine that generalizes the MapReduce model Makes it easy and fast

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Building a logging pipeline with Open Source tools. Iñigo Ortiz de Urbina Cazenave

Building a logging pipeline with Open Source tools. Iñigo Ortiz de Urbina Cazenave Building a logging pipeline with Open Source tools Iñigo Ortiz de Urbina Cazenave NLUUG Utrecht - Netherlands 28 May 2015 whoami; 2 Iñigo Ortiz de Urbina Cazenave Systems Engineer whoami; groups; 3 Iñigo

More information

Apache Sling A REST-based Web Application Framework Carsten Ziegeler cziegeler@apache.org ApacheCon NA 2014

Apache Sling A REST-based Web Application Framework Carsten Ziegeler cziegeler@apache.org ApacheCon NA 2014 Apache Sling A REST-based Web Application Framework Carsten Ziegeler cziegeler@apache.org ApacheCon NA 2014 About cziegeler@apache.org @cziegeler RnD Team at Adobe Research Switzerland Member of the Apache

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation

More information

Building Multilingual Search Index using open source framework

Building Multilingual Search Index using open source framework Building Multilingual Search Index using open source framework ABSTRACT Arjun Atreya V 1 Swapnil Chaudhari 1 Pushpak Bhattacharyya 1 Ganesh Ramakrishnan 1 (1) Deptartment of CSE, IIT Bombay {arjun, swapnil,

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Epimorphics Linked Data Publishing Platform

Epimorphics Linked Data Publishing Platform Epimorphics Linked Data Publishing Platform Epimorphics Services for G-Cloud Version 1.2 15 th December 2014 Authors: Contributors: Review: Andy Seaborne, Martin Merry Dave Reynolds Epimorphics Ltd, 2013

More information

Web Content Mining. Search Engine Mining Improves on the content search of other tools like search engines.

Web Content Mining. Search Engine Mining Improves on the content search of other tools like search engines. Web Content Mining Web Content Mining Pre-processing data before web content mining: feature selection Post-processing data can reduce ambiguous searching results Web Page Content Mining Mines the contents

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information