Building a Search Engine for the Cuban Web


1 Building a Search Engine for the Cuban Web. Jorge Luis Betancourt, Search/Crawl Engineer. November 16-18, 2016, Seville, Spain.

2 Who am I: Jorge Luis Betancourt González. Search/Crawl Engineer; Apache Nutch Committer & PMC; Apache Solr/ES enthusiast.

3 Agenda: Introduction & motivation; Technologies used; Customizations; Conclusions and future work.

4 Introduction / Motivation (diagram labels: Cuba, Internet, Intranet). Global search engines can't access documents hosted on the Cuban Intranet.

5 Writing your own web search engine from scratch? Or ...

6 Common search engine features
1. Web search: HTML & documents (PDF, DOC); highlighting; filters (facets); suggestions; autocorrection
2. Image search (size, format, color, objects): thumbnails; filters (facets); show metadata; match text with images
3. News search (alerting, notifications): near real time, push, SMS

7 How to fulfill these requirements? At its core, a search engine stores some information (store) and retrieves this information when a question is received (query).
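The store/query idea on this slide can be illustrated with a toy inverted index. This is a minimal sketch for illustration only, not how Nutch or Solr are implemented; the class name `TinyIndex`, its methods, and the sample documents are hypothetical.

```python
from collections import defaultdict
import re


class TinyIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}                    # document id -> original text

    def store(self, doc_id, text):
        """Store a document and index every word it contains."""
        self.docs[doc_id] = text
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def query(self, question):
        """Return the ids of documents that contain all words of the question."""
        terms = re.findall(r"\w+", question.lower())
        if not terms:
            return []
        # .get avoids creating empty entries for unknown terms
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = TinyIndex()
index.store("a", "Havana is the capital of Cuba")
index.store("b", "Seville is a city in Spain")
print(index.query("capital cuba"))
```

A real index server adds ranking, analysis (stemming, stop words) and persistence on top of this basic structure.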

8 Open Source to the rescue: (1) crawler, (2) index server, (3) web interface.

9 Apache Nutch: Nutch is a mature, production-ready Web crawler. It enables fine-grained configuration and relies on Apache Hadoop data structures, which are great for batch processing.

10 Apache Nutch: highly scalable; highly extensible; pluggable parsing, protocols, storage, indexing, scoring, ...; active community; Apache License.

11 Apache Solr: total downloads 8M+; monthly downloads 250,000+; Apache License; great community; highly modular; stability / scalability; based on Lucene; battle tested.

12 Back to the list of features
1. Web search: HTML & documents (PDF, DOC); highlighting; filters (facets); suggestions; autocorrection
2. Image search (size, format, color, objects): thumbnails; show metadata; filters (facets); match text with images
3. News search (alerting, notifications): near real time, push, SMS

13 Image search and thumbnails: a custom parser & indexer store the image thumbnail; a custom parser, indexer & scoring identify and store the text related to an image (diagram labels: h1, img, p).

14 How does it work? (diagram: numbered h1, img and p elements of a page)
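One way to relate text to an image, based on the h1, img and p elements shown in the diagram, is to attach the nearest preceding heading and the next paragraph to each image. The sketch below is an assumed heuristic written with Python's standard `html.parser`. It is not the custom Nutch parser described in the talk, and the class name `ImageContext` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class ImageContext(HTMLParser):
    """Collects, for every <img>, the last <h1> seen and the next <p> text."""

    def __init__(self):
        super().__init__()
        self.images = []     # one dict per image: src, heading, paragraph
        self._tag = None     # text-bearing tag currently open (h1 or p)
        self._buf = []       # text fragments of the open tag
        self._heading = ""   # most recent heading text
        self._waiting = []   # images still waiting for a paragraph

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._tag, self._buf = tag, []
        elif tag == "img":
            record = {"src": dict(attrs).get("src", ""),
                      "heading": self._heading, "paragraph": ""}
            self.images.append(record)
            self._waiting.append(record)

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag == "h1":
            self._heading = text
        elif text:
            # the first non-empty paragraph after an image describes it
            for record in self._waiting:
                record["paragraph"] = text
            self._waiting = []
        self._tag = None


page = '<h1>Old Havana</h1><img src="plaza.jpg"><p>The Plaza Vieja at dusk.</p>'
parser = ImageContext()
parser.feed(page)
print(parser.images)
```

The collected heading and paragraph could then be indexed as extra fields next to the image so that text queries can match it.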

15 News search (NRT & alerting): Nutch is really not suited for this task; the batch nature of the Hadoop jobs doesn't fit well in this scenario.

16 Our topology (diagram labels: RSS, fetch, parse, index, flaxsearch/luwak): parse the RSS feed and output the news links to be processed according to SC protocol. monit or https://github.com/commoncrawl/news-crawl
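The two ideas on this slide, extracting news links from an RSS feed and matching incoming news against stored queries for alerting, can be sketched as follows. This is a simplified illustration using only the Python standard library. It does not use luwak or the crawler from the talk, and the sample feed, the URLs, and the function names `news_links` and `matching_alerts` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this example
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example news</title>
  <item><title>Solr released</title><link>http://news.example.org/solr</link></item>
  <item><title>Crawler update</title><link> http://news.example.org/nutch </link></item>
</channel></rss>"""


def news_links(rss_xml):
    """Return the <link> of every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def matching_alerts(stored_queries, text):
    """Return the names of stored queries whose terms all occur in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(name for name, terms in stored_queries.items()
                  if set(terms) <= words)


links = news_links(SAMPLE_FEED)
alerts = matching_alerts(
    {"search-news": ["solr"], "sports": ["baseball", "final"]},
    "Solr released today",
)
print(links, alerts)
```

Matching each new document against stored queries inverts the usual search direction, which is what makes alerting possible as soon as a news item arrives.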

17 Querying the data
1. Web search: HTML & documents (PDF, DOC); highlighting; filters (facets); suggestions; autocorrection
2. Image search (size, format, color, objects): thumbnails; show metadata; filters (facets); match text with images
3. News search (alerting, notifications): near real time, push, SMS


19 Apache Solr: Solr has full support for highlighting (3 implementations); powerful faceting capabilities (even more in recent releases); autocorrection support based on the index content; awesome scalability (SolrCloud, classic master-slave replication).
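As an illustration of how these features are requested, the sketch below builds a Solr select URL that turns on highlighting, faceting and spellchecking through standard request parameters (`hl`, `hl.fl`, `facet`, `facet.field`, `spellcheck`). The base URL, the core name `web`, and the field names `content`, `host` and `type` are assumptions made for the example. Spellchecking also requires a spellcheck component configured on the request handler.

```python
from urllib.parse import urlencode


def solr_select_url(base, user_query):
    """Build a /select URL with highlighting, facets and spellcheck enabled."""
    params = [
        ("q", user_query),
        ("wt", "json"),
        ("hl", "true"),            # highlighting on
        ("hl.fl", "content"),      # field to highlight (assumed name)
        ("facet", "true"),         # faceting on
        ("facet.field", "host"),   # assumed facet fields
        ("facet.field", "type"),
        ("spellcheck", "true"),    # suggestions based on index content
    ]
    return base + "/select?" + urlencode(params)


url = solr_select_url("http://localhost:8983/solr/web", "la habana")
print(url)
```

The response would then carry `highlighting`, `facet_counts` and `spellcheck` sections alongside the matching documents.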

20 The features, once again
1. Web search: HTML & documents (PDF, DOC); highlighting; filters (facets); suggestions; autocorrection
2. Image search (size, format, color, objects): thumbnails; show metadata; filters (facets); match text with images
3. News search (alerting, notifications): near real time, push, SMS


22 Other features - monitoring: We needed a way of monitoring our infrastructure. Without a great Internet connection you can't send GB of logs to a cloud environment, so ... (diagram labels: (and metrics), (and logs), time series store, analytical tool (and facets))

23 Other features - monitoring (diagram labels: (and logs), parsing & aggregation, (and metrics), (and logs), time series store, analytical tool (and facets))
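The "parsing & aggregation" step in the diagram can be pictured as turning raw log lines into counts per time bucket before they reach a time series store. The sketch below is only an assumption of what such a step might look like. The log line format, the regular expression and the function name `per_minute_counts` are hypothetical and do not come from the talk.

```python
from collections import Counter
import re

# Hypothetical log format: "<ISO timestamp> <component> status=<code>"
LINE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2} \S+ status=(?P<status>\d{3})$"
)


def per_minute_counts(lines):
    """Count log lines per (minute, status) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            counts[(match["minute"], match["status"])] += 1
    return counts


sample = [
    "2016-11-16T10:15:02 fetcher status=200",
    "2016-11-16T10:15:40 fetcher status=200",
    "2016-11-16T10:15:41 fetcher status=404",
    "not a log line",
]
counts = per_minute_counts(sample)
print(counts)
```

Aggregating locally like this keeps the data volume small, which matters when the connection is too slow to ship raw logs elsewhere.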

24 Banana (Kibana port) for visualizations

25 Infrastructure (diagram labels: Nutch crawlers (1), JAVABIN, Solr master, HTTP, Solr replicator (2), HTTP, web)

26 Some usage stats: less than ... visits; around 600 unique visitors

27 Future work: apply deep learning techniques to process the raw images and mix them with the current approach; increase the number of signals that we get from our crawlers (correlate even more crawl-related events).

28 Thanks M!


More information

Microsoft Office SharePoint Server (MOSS) 2007 Overview

Microsoft Office SharePoint Server (MOSS) 2007 Overview Microsoft Office SharePoint Server (MOSS) 2007 Overview for Technology Manager Wei Wang MOSS Technical Expert Consultant shangmeizhai@hotmail.commsiw 17.04.2010 - Seite 1 Agenda Collaboration Portal Search

More information

Which Reporting Tool Should I Use for EPM? Glenn Schwartzberg InterRel Consulting info@interrel.com

Which Reporting Tool Should I Use for EPM? Glenn Schwartzberg InterRel Consulting info@interrel.com Which Reporting Tool Should I Use for EPM? Glenn Schwartzberg InterRel Consulting info@interrel.com Disclaimer These slides represent the work and opinions of the presenter and do not constitute official

More information

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide 1 www.microsoft.com/sharepoint The information contained in this document represents the current view of Microsoft Corporation on the issues

More information

INTRODUCING ORACLE APPLICATION EXPRESS. Keywords: database, Oracle, web application, forms, reports

INTRODUCING ORACLE APPLICATION EXPRESS. Keywords: database, Oracle, web application, forms, reports INTRODUCING ORACLE APPLICATION EXPRESS Cristina-Loredana Alexe 1 Abstract Everyone knows that having a database is not enough. You need a way of interacting with it, a way for doing the most common of

More information

Developing Microsoft SharePoint Server 2013 Advanced Solutions MOC 20489

Developing Microsoft SharePoint Server 2013 Advanced Solutions MOC 20489 Developing Microsoft SharePoint Server 2013 Advanced Solutions MOC 20489 Course Outline Module 1: Creating Robust and Efficient Apps for SharePoint In this module, you will review key aspects of the apps

More information

Building a Scalable News Feed Web Service in Clojure

Building a Scalable News Feed Web Service in Clojure Building a Scalable News Feed Web Service in Clojure This is a good time to be in software. The Internet has made communications between computers and people extremely affordable, even at scale. Cloud

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

ACEYUS REPORTING. Aceyus Intelligence Executive Summary

ACEYUS REPORTING. Aceyus Intelligence Executive Summary ACEYUS REPORTING Aceyus Intelligence Executive Summary Aceyus, Inc. June 2015 1 ACEYUS REPORTING ACEYUS INTELLIGENCE EXECUTIVE SUMMARY Aceyus Intelligence is a suite of products for optimizing contact

More information

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP Your business is swimming in data, and your business analysts want to use it to answer the questions of today and tomorrow. YOU LOOK TO

More information

How to select the right Marketing Cloud Edition

How to select the right Marketing Cloud Edition How to select the right Marketing Cloud Edition Email, Mobile & Web Studios ith Salesforce Marketing Cloud, marketers have one platform to manage 1-to-1 customer journeys through the entire customer lifecycle

More information

Using WebLOAD to Monitor Your Production Environment

Using WebLOAD to Monitor Your Production Environment Using WebLOAD to Monitor Your Production Environment Your pre launch performance test scripts can be reused for post launch monitoring to verify application performance. This reuse can save time, money

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

Performance and Scalability Overview

Performance and Scalability Overview Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING

More information

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we

More information

1 Log visualization at CNES (Part II)

1 Log visualization at CNES (Part II) 1 Log visualization at CNES (Part II) 1.1 Background For almost 2 years now, CNES has set up a team dedicated to "log analysis". Its role is multiple: This team is responsible for analyzing the logs after

More information

Wikimedia architecture. Mark Bergsma Wikimedia Foundation Inc.

Wikimedia architecture. Mark Bergsma <mark@wikimedia.org> Wikimedia Foundation Inc. Mark Bergsma Wikimedia Foundation Inc. Overview Intro Global architecture Content Delivery Network (CDN) Application servers Persistent storage Focus on architecture, not so much on

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information