The Open Source Knowledge Discovery and Document Analysis Platform
- Bruce Maxwell
1 Enabling Agile Intelligence through Open Analytics: The Open Source Knowledge Discovery and Document Analysis Platform (17/10/2012)
2 Agenda
- Introduction and Agenda
- Problem Definition
  - Knowledge Discovery
  - Document Analysis
- The Infinit.e Solution
- Architecture
- Use Cases
- Questions
3 The Problem
4 Knowledge Discovery
Knowledge Discovery is the process of indexing and categorizing the contents of a corpus of data sources in order to identify what is contained in those sources and how to retrieve it.
- What information do we have?
- Where is the information located?
5 Document Analysis
Document Analysis is the process of analyzing the contents of large numbers of documents in order to answer questions related to the content of those documents.
- What kind of questions can we answer with our data?
- What kind of enrichment can we apply to our data to improve our ability to answer organizational questions?
6 The Infinit.e Solution
Infinit.e is an Open Source Knowledge Discovery and Document Analysis platform that harvests, enriches, stores, retrieves, analyzes, and visualizes structured and unstructured documents.
7 The Architecture
[Diagram] Components shown: External Applications & GUIs; REST-based API; Core Server; Enrichment; elasticsearch, MongoDB, Hadoop, Linux. Formats shown: RSS, XML, HTML, TXT, PDF, JDBC, etc.; JSON, RSS, KML, GraphML, etc.
8 Storage
Infinit.e uses MongoDB for the following reasons:
- Document-oriented storage
- Horizontal and vertical scalability
The infinit.e.data_model library:
- Manages connections to MongoDB
- Converts JSON (BSON) to POJOs using Google's Gson library
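The JSON-to-object conversion mentioned above can be illustrated with a small Python analogue. The platform itself does this in Java with Gson; the `SourceDoc` class and `from_json` helper below are hypothetical names invented for this sketch, and the field set is borrowed from the sample Source document in this deck.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class SourceDoc:
    """Hypothetical typed view of a stored JSON document (not Infinit.e's actual POJO)."""
    title: str
    description: str = ""
    tags: List[str] = field(default_factory=list)


def from_json(raw: str) -> SourceDoc:
    """Parse a JSON string and keep only the fields the dataclass declares."""
    data = json.loads(raw)
    known = {f.name: data[f.name] for f in fields(SourceDoc) if f.name in data}
    return SourceDoc(**known)


doc = from_json(
    '{"title": "Wired: Top News", "description": "Top News", '
    '"tags": ["technology", "news"], "mediatype": "News"}'
)
print(doc)
```

Unknown keys such as `mediatype` are ignored here, which is a design choice of this sketch rather than a documented behavior of the data model library.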
9 Harvesting - Server
The infinit.e.core.server library manages the process of harvesting and cleansing documents:
- service infinite-px-engine start
- Configurable for timing and number of documents to harvest per cycle
Note: Migrating to the Apache UIMA framework is on our to-do list.
10 Harvesting - Document Types
The Infinit.e platform can harvest documents from:
- URLs: RSS, HTML, etc.
- File shares: Samba, Windows shares, and local files
- Databases via JDBC
11 Harvesting - Sources
Infinit.e harvests documents based on configuration information contained in Source documents like the following example:
{
  "_id": "4cbdb9f05ed98e7bed499270",
  "title": "Wired: Top News",
  "url": "
  "created": "Oct 19, :32:00 AM",
  "description": "Top News",
  "extracttype": "Feed",
  "mediatype": "News",
  "modified": "Oct 19, :32:00 AM",
  "tags": ["technology", "news"]
}
12 Harvesting - Metadata Extraction
- Infinit.e does not store the original document
- Infinit.e extracts the metadata associated with the original document and creates a Document POJO
- Full text can be stored in gzip format within a MongoDB collection
Note: The Infinit.e harvester uses the Apache Tika toolkit to extract metadata and text from a wide variety of file formats.
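Storing full text in gzip format, as noted above, amounts to a compress-on-write and decompress-on-read round trip. The following is a minimal sketch using Python's standard library; the function names are hypothetical, and writing the resulting bytes to a MongoDB collection is left out because the slides do not describe that schema.

```python
import gzip


def compress_text(text: str) -> bytes:
    """Encode the text as UTF-8 and gzip it, ready to store as a binary field."""
    return gzip.compress(text.encode("utf-8"))


def decompress_text(blob: bytes) -> str:
    """Reverse compress_text: gunzip the bytes and decode them as UTF-8."""
    return gzip.decompress(blob).decode("utf-8")


# Illustrative text only; repetitive input compresses well.
full_text = "The conference focuses on the countries of Egypt and Tunisia. " * 20
blob = compress_text(full_text)
print(len(full_text.encode("utf-8")), "bytes raw,", len(blob), "bytes compressed")
```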
13 Harvesting - doc_metadata
{
  "_id" : ObjectId("4f93638e0cf212156d0559d2"),
  "title" : "Mediterranean conference seeks to flourish tourism in Egypt, Tunisia...",
  "url" : " html",
  "description" : "Report by egyptlastminute CAIRO: On Monday, the countries of the Mediterranean opened a conference seeking to enhance the future of tourism in the region. The conference focuses on the countries of Egypt and Tunisia; the most...",
  "created" : ISODate(" T01:49:02Z"),
  "metadata" : { },
  "associations" : [ ],
  "entities" : [ ],
  ...
}
14 Harvesting - metadata
{
  ...
  "metadata" : {
    "location" : [
      {
        "region" : "South Asia",
        "citystateprovince" : { "stateprovince" : "Rolpa", "city" : "Newang" },
        "country" : "Nepal"
      }
    ],
    "icn" : [ " " ],
    "incidentdate" : [ "07/25/2005" ],
    "organization" : [ "Communist Party of Nepal (Maoist)/United People's Front" ],
    ...
  },
  ...
}
15 Enrichment - What is it?
Data enrichment is:
- The extraction of entities (people, places, things) and associations (relationships, events, facts) from unstructured data using Natural Language Processing (NLP) libraries
- Extracting entities and associations from structured data sources
- Applying geo-tags to entities and associations
16 Enrichment - Libraries
The Infinit.e platform ships with several enrichment libraries including:
- Structured Analysis Handler: extracts entities, creates associations, and geo-tags data from databases and other structured source documents like XML
- Unstructured Analysis Handler: uses regular expressions, JavaScript, or XPath to extract entities and associations
- TextRank-based keyphrase extractor: extracts entities (keywords or phrases) from text using the TextRank algorithm and OpenNLP
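Regular-expression extraction of the kind the Unstructured Analysis Handler is described as supporting can be sketched in a few lines. This is not the handler's actual configuration or code; the pattern, the `extract_date_entities` name, and the "Date" type label are assumptions made for illustration, with the date format taken from the "incidentdate" sample value in this deck.

```python
import re
from typing import Dict, List

# Assumed pattern: dates written as MM/DD/YYYY, like the "incidentdate" sample value.
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_date_entities(text: str) -> List[Dict[str, str]]:
    """Return one entity record per date-like match, in order of appearance."""
    return [
        {"disambiguated_name": match, "type": "Date"}  # "Date" is a hypothetical type label
        for match in DATE_PATTERN.findall(text)
    ]


print(extract_date_entities("An incident was reported on 07/25/2005 in Rolpa."))
```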
17 Enrichment - Structured Sample
Structured Analysis Source
{
  "_id": " e4b0bb b7",
  "communityids": ["503663b1e4b0bb b4"],
  "created": "Aug 23, :17:09 PM",
  "description": "NCTC Wits Data",
  ...
  "structuredanalysis": {
    "entities": [
      {
        "dimension": "Who",
        "disambiguated_name": "$characteristic from $nationality",
        "iterateover": "perpetrator",
        "type": "PersonPerpetrator",
        "usedocgeo": false
      }
      ...
    ]
  },
  ...
}
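The "disambiguated_name" value in the sample, "$characteristic from $nationality", reads like a template whose placeholders are filled from each record named by "iterateover". The slides do not say how Infinit.e resolves these placeholders, so the sketch below only shows the general idea with Python's `string.Template`; the function name and the record values are made up for illustration.

```python
from string import Template
from typing import Dict, List


def disambiguated_names(name_template: str, records: List[Dict[str, str]]) -> List[str]:
    """Fill the $placeholders in name_template once per record."""
    template = Template(name_template)
    return [template.substitute(record) for record in records]


# Hypothetical "perpetrator" records; the values are placeholders, not real data.
perpetrators = [
    {"characteristic": "CharacteristicA", "nationality": "NationalityB"},
    {"characteristic": "CharacteristicC", "nationality": "NationalityD"},
]
print(disambiguated_names("$characteristic from $nationality", perpetrators))
```

`Template.substitute` raises `KeyError` when a record lacks a placeholder field, which makes incomplete source records visible instead of silently producing partial names.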
18 Enrichment - 3rd Party Libraries
Infinit.e comes with built-in support for several 3rd party enrichment tools including:
19 Enrichment - Entities
Feature.entity
{
  "_id" : ObjectId("4f9189d48baf188282a1c9ef"),
  "alias" : [ "Zine el Abidine Ben Ali", "Zine El Abidine Ben Ali", "Zine el Abidine ben Ali" ],
  "batch_resync" : true,
  "communityid" : ObjectId("4f8f ee8003bf518"),
  "db_sync_doccount" : NumberLong(143),
  "db_sync_time" : " ",
  "dimension" : "Who",
  "disambiguated_name" : "Zine El Abidine Ben Ali",
  "doccount" : 152,
  "index" : "zine el abidine ben ali/person",
  "totalfreq" : 353,
  "type" : "Person"
}
20 Enrichment - Entities
21 Enrichment - Associations
Feature.association
{
  "_id" : ObjectId("4f9189d48baf188282a1ca24"),
  "assoc_type" : "Fact",
  "communityid" : ObjectId("4f8f ee8003bf518"),
  "db_sync_doccount" : NumberLong(70),
  "db_sync_time" : " ",
  "doccount" : NumberLong(73),
  "entity1" : [ "zine el abidine ben ali", "zine el abidine ben ali/person" ],
  "entity1_index" : "zine el abidine ben ali/person",
  "entity2" : [ "president", "president/position" ],
  "entity2_index" : "president/position",
  "index" : "5e3fff27ddb78d6873ccfc77cf05c52f",
  "verb" : [ "career", "current", "past" ],
  "verb_category" : "career"
}
22 Enrichment - Associations
23 Enrichment - Geolocation
Feature.geo
{
  "_id" : ObjectId("4d8bb5efbe07bb4f7036c82e"),
  "search_field" : "cairo",
  "country" : "Egypt",
  "country_code" : "EG",
  "city" : "cairo",
  "region" : "Al Qahirah",
  "region_code" : "EG11",
  "population" : ,
  "latitude" : "30.05",
  "longitude" : "31.25",
  "geoindex" : { "lat" : 30.05, "lon" : }
}
Note: MongoDB 2d Index
24 Enrichment - Geolocation
25 Retrieval - Indexing
- Infinit.e uses elasticsearch to index the document, entity, and association data stored in MongoDB
- Document, entity, and association data is searchable via Lucene queries
- The fields indexed by elasticsearch can be configured
26 Retrieval - RESTful Interface
- Infinit.e exposes its API via a RESTful interface
- Infinit.e.api.server uses the Restlet API framework
- Example HTTP GET API calls
27 Analysis - What's Built In
The Infinit.e platform ships with built-in algorithms that calculate the following for entities:
- Significance
  - Entity (term frequency-inverse document frequency, a.k.a. TF-IDF)
  - Document (sum of entity significance)
- Coverage: percentage of documents in the dataset returned by a query in which an entity appears
- Frequency: number of occurrences in the dataset returned by a query
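The three measures listed above can be sketched on a toy dataset. The slides do not give Infinit.e's exact formulas, so this example assumes the textbook form of TF-IDF (raw term count times the natural log of total documents over documents containing the entity); all function names are hypothetical, and each document is represented simply as a list of entity mentions.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]  # entity mentions found in one document, repeats included


def coverage_and_frequency(docs: List[Doc]) -> Dict[str, Dict[str, float]]:
    """Frequency = total mentions; coverage = percentage of docs containing the entity."""
    mentions = Counter(e for doc in docs for e in doc)
    containing = Counter(e for doc in docs for e in set(doc))
    return {
        e: {"frequency": mentions[e], "coverage": 100.0 * containing[e] / len(docs)}
        for e in mentions
    }


def entity_significance(doc: Doc, docs: List[Doc]) -> Dict[str, float]:
    """Assumed textbook TF-IDF per entity; doc must be one of the documents in docs."""
    containing = Counter(e for d in docs for e in set(d))
    return {
        e: tf * math.log(len(docs) / containing[e])
        for e, tf in Counter(doc).items()
    }


def document_significance(doc: Doc, docs: List[Doc]) -> float:
    """Document significance as the sum of its entities' significance."""
    return sum(entity_significance(doc, docs).values())


# Toy query result: three documents with short, made-up entity keys.
results = [["a", "a", "b"], ["b"], ["c"]]
print(coverage_and_frequency(results))
print(document_significance(results[0], results))
```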
28 Analysis - Hadoop MapReduce
The Infinit.e platform has a built-in integration with Apache's Hadoop MapReduce framework
29 Analysis - Hadoop MapReduce Configuration Options
- Job schedule
- Custom MongoDB query
- Mapper/combiner/reducer classes
- Output key and value types
- Whether or not to append results to existing data sets
- Data age out in number of days
- Job dependencies
- User arguments
- Reuse existing MapReduce jar
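The mapper and reducer roles named in the options above can be shown with an in-process simulation. Real Hadoop jobs for the platform would be Java classes packaged in a jar; the Python functions below are hypothetical stand-ins that count how many documents mention each entity, assuming each document carries a simplified "entities" list of strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(doc: Dict) -> Iterator[Tuple[str, int]]:
    """Emit (entity, 1) for every entity listed on the document."""
    for entity in doc.get("entities", []):
        yield entity, 1


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum the counts emitted for one entity."""
    return key, sum(values)


def run_job(docs: List[Dict]) -> Dict[str, int]:
    """Simulate the shuffle step: group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


docs = [
    {"entities": ["zine el abidine ben ali/person", "president/position"]},
    {"entities": ["zine el abidine ben ali/person"]},
    {},  # a document with no extracted entities
]
print(run_job(docs))
```

Because summation is associative, the same `reducer` could also serve as a combiner in this example.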
30 Visualization
Infinit.e includes an Adobe Flex-based application with a set of default visualization widgets
31 Use Case
The HTS Problem:
- HTS had a massive amount of unstructured data locked up in thousands of documents with no way to get at it economically
- Highly skilled analysts had to read each document and manually extract the information into an Excel spreadsheet that was used to catalog the contents by Topics
32 Use Case
The Infinit.e Solution:
- Harvest the documents using Infinit.e
- Extract entities from the harvested documents (who, what, where)
- Assign one or more Topics to each document based on the entities extracted (i.e. clustering)
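The last step above, assigning Topics from extracted entities, can be approximated with a simple overlap rule. The slides describe the real approach as clustering and give no details, so this sketch swaps in a rule-based stand-in: a document receives a Topic when it shares at least `min_overlap` entities with that Topic's entity set. The function name, the parameter, and the topic definitions are all hypothetical.

```python
from typing import Dict, List, Set


def assign_topics(
    doc_entities: Set[str],
    topic_entities: Dict[str, Set[str]],
    min_overlap: int = 1,
) -> List[str]:
    """Return, in sorted order, every topic sharing enough entities with the document."""
    return sorted(
        topic
        for topic, members in topic_entities.items()
        if len(doc_entities & members) >= min_overlap
    )


# Made-up topic definitions keyed by entity index strings.
topics = {
    "Tourism": {"egypt/country", "tunisia/country"},
    "Politics": {"president/position"},
}
print(assign_topics({"egypt/country", "president/position"}, topics))
```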
33 Questions?
Thank you!
Craig Vitter, Professional Services Engineer
More informationApache Lucene. Searching the Web and Everything Else. Daniel Naber Mindquarry GmbH ID 380
Apache Lucene Searching the Web and Everything Else Daniel Naber Mindquarry GmbH ID 380 AGENDA 2 > What's a search engine > Lucene Java Features Code example > Solr Features Integration > Nutch Features
More informationDE-20489B Developing Microsoft SharePoint Server 2013 Advanced Solutions
DE-20489B Developing Microsoft SharePoint Server 2013 Advanced Solutions Summary Duration Vendor Audience 5 Days Microsoft Developer Published Level Technology 21 November 2013 300 Microsoft SharePoint
More informationSentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
More informationMashing Up with Google Mashup Editor and Yahoo! Pipes
Mashing Up with Google Mashup Editor and Yahoo! Pipes Gregor Hohpe www.eaipatterns.com Gregor Hohpe: Mashing Up with Google Mashup Editor and Yahoo! Pipes Slide 1 Who's Gregor? Distributed systems, enterprise
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationLeveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015
Leveraging the Power of SOLR with SPARK Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015 Welcome Johannes Weigend - CTO QAware GmbH - Software architect / developer - 25 years
More informationDiscovering Business Insights in Big Data Using SQL-MapReduce
Discovering Business Insights in Big Data Using SQL-MapReduce A Technical Whitepaper Rick F. van der Lans Independent Business Intelligence Analyst R20/Consultancy July 2013 Sponsored by Copyright 2013
More informationUsing Apache Solr for Ecommerce Search Applications
Using Apache Solr for Ecommerce Search Applications Rajani Maski Happiest Minds, IT Services SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. 2 Copyright Information This document
More informationNoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011
NoSQL - What we ve learned with mongodb Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011 DW2.0 and NoSQL management decision support intgrated access - local v. global - structured v.
More informationStructured Content: the Key to Agile. Web Experience Management. Introduction
Structured Content: the Key to Agile CONTENTS Introduction....................... 1 Structured Content Defined...2 Structured Content is Intelligent...2 Structured Content and Customer Experience...3 Structured
More informationEnterprise Content Management with Microsoft SharePoint
Enterprise Content Management with Microsoft SharePoint Overview of ECM Services and Features in Microsoft Office SharePoint Server 2007 and Windows SharePoint Services 3.0. A KnowledgeLake, Inc. White
More information