Adobe Semantic Analysis Platform

1 Adobe Semantic Analysis Platform
Sept. 3, 2008
Walter W. Chang, Senior Computer Scientist, Advanced Technology Labs, Adobe Systems, Inc.

2 Presentation Overview
- Background and motivation
- Challenges
- Semantic analysis techniques
- Platform architecture
- Screenshot demo
- Summary / lessons learned
- Future direction and trends

4 Project history
- Semantic analysis platform for documents: project started in 2005 in Adobe Advanced Technology Labs
- Targeted for enterprise and government intelligence document workflows
- Identified growing opportunity in contextualized advertising
- Launched public Ads for Adobe PDF system in Nov
- Document analysis and topic/keyword recommendations for Yahoo! Ads
- 60 registered publishers (e.g., IEEE, CMP), 40 pending, others in discussion (Scientific American)

5 Challenges in developing our platform
- Finding the correct set of analysis methods to understand documents
- Layers of representation and structure in documents
- Varying degrees of semantic noise
- Popular analysis methods are TF-IDF-based
- Prioritizing results from all analysis methods
- Handling multi-theme documents
- Fluid, dynamic nature of ontologies

6 Key problem statement
For document X, determine Aboutness(X):
- What are the main topics and concepts in X?
- Contextual model for X?
- Intentional attributes of X?
To compute Aboutness(X), a content intelligence system needs:
- Text extraction
- Metadata identification and extraction
- Extraction and statistical analysis of content N-grams
- Shallow and deep semantic analysis methods
- Mechanisms for generation of contextual ad metadata

7 Semantic model to address key problem
For document X, determine Aboutness(X):
- Main topics and concepts in X
- Contextual model for X
- Intentional attributes of X
Develop a canonical semantic model for X:
- Topic domain contextualization (CV, ontology)
- Surface semantics
- Concept subontology
- Intentional semantics
Techniques listed alongside the model:
- Text extraction
- Statistical BOW models
- N-gram TF-IDF & distribution
- Taxonomy/ontology-based classifiers
- Theme-based gist / summarization
- NLP + deep semantic analysis
- Sentiment analysis
- Inference and rule engines

9 Overview of Semantic Analysis Techniques
- Text extraction, lexical encoding and normalization
- Extensions to TF-IDF
- Keyword and N-gram models
  - Document level
  - Page level
- Employing ontologies
- Concept/topic analysis
- Summary/gist creation
- Domain expertise via rule engine
- Analysis result weighting

10 Text Extraction Challenges
- Missing document info (OCR, PDF)
- Complex reading order within document layout
- Presence of document noise (headers/footers)

11 Text Extraction Approach
- Use positional layout of text for inferring structure
- Vertical and horizontal ray projection of text density
- Sampling to infer word, sentence, and column gutter spacing
- Use statistical methods and heuristics to find text zones

12 Text Extraction Approach
- Recursively subdivide page into text zones
- Use heuristics to iteratively scan text in each zone
- Re-synthesize likely reading-order text per zone
- Identify and remove semantic noise (noise artifacts)
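
The projection idea on slides 11 and 12 can be illustrated with a minimal sketch. It assumes word bounding boxes are already available as integer `(x0, x1)` pairs. It projects them onto the horizontal axis and reports empty runs wide enough to be column gutters or margins. The function name `find_gutters` and the `min_gap` threshold are hypothetical and not taken from the Adobe platform.

```python
def find_gutters(boxes, page_width, min_gap=10):
    """Return (start, end) x-ranges of at least min_gap units that no word box covers."""
    # Vertical projection: count how many boxes cover each x coordinate.
    covered = [0] * page_width
    for x0, x1 in boxes:
        for x in range(max(0, x0), min(page_width, x1)):
            covered[x] += 1

    gutters, start = [], None
    for x, count in enumerate(covered):
        if count == 0 and start is None:
            start = x                      # an empty run begins
        elif count != 0 and start is not None:
            if x - start >= min_gap:       # keep only wide empty runs
                gutters.append((start, x))
            start = None
    if start is not None and page_width - start >= min_gap:
        gutters.append((start, page_width))
    return gutters


# Two columns of words on a 200-unit-wide page (illustrative data).
words = [(20, 90), (30, 95), (120, 190), (125, 180)]
gutters = find_gutters(words, page_width=200)
```

Here the result includes the two page margins as well as the gap between the columns. A recursive zone splitter could cut the page at the interior gutter and repeat the same projection along the vertical axis within each zone.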

13 Semantic Analysis Techniques
Determine surface Aboutness(X) for document:
- Normalize, find keywords and N-grams
- Perform statistical analysis:
  - TF-IDF analysis on keywords/n-grams
  - Term distribution analysis
- Rank terms by frequency, section weights, and term cluster position
- Categorize document by topics and concepts
- Summarize document
- Generate and submit terms to query the ad aggregator's inventory
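
As a rough illustration of the TF-IDF step, the sketch below scores the terms of one tokenized document against a small corpus. The smoothed IDF formula and the toy corpus are assumptions chosen for the example. The slides do not say which TF-IDF variant or extensions the platform uses.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each term of doc_tokens against a list of tokenized documents."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)      # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1       # smoothed IDF (assumed variant)
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


# Toy corpus, purely illustrative.
corpus = [
    ["canada", "travel", "guide", "canada"],
    ["database", "guide"],
    ["travel", "rail"],
]
scores = tf_idf(corpus[0], corpus)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus "canada" ranks first because it is frequent in the document and absent from the others.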

14 Keyword and N-gram Analysis
Input: text of source document
- Normalize pluralization and tense ("sky", "skies", etc.)
- Term frequency analysis
- Stopword filtering: remove trivial stopwords ("the", "a", etc.)
- Stemming (e.g., Porter, Krovetz)
- Term N-gram extraction: find term n-grams ("British Columbia", "relational database", etc.)
- Term distribution analysis
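
The stopword filtering and n-gram extraction steps can be sketched as follows. The stopword list is a tiny illustrative set, the tokenizer is a simple regular expression, and stemming (Porter, Krovetz) is omitted for brevity. None of this is the platform's actual implementation.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Yield n-grams that contain no stopword."""
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if not any(t in STOPWORDS for t in gram):
            yield " ".join(gram)


text = "British Columbia is in Canada. The British Columbia coast is scenic."
tokens = tokenize(text)
bigram_counts = Counter(ngrams(tokens, 2))
```

This sketch ignores sentence boundaries, so an n-gram can span two sentences. A sentence segmenter, as shown in later slides, would prevent that.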

15 Basic Keyword and Key Term Analysis
- Term frequency analysis: count per term (Term(i), Term(j))
- Use TF-IDF for surface analysis of semantics, at document level and page level
- Term distribution analysis: min(pos), max(pos), avg(pos), and a 2 S.D. band around each term's average position
- Use N-gram distribution analysis to find the topic center of gravity
[Figure: term frequency chart and position distributions for Term(i) and Term(j)]
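
A minimal sketch of the term distribution statistics named on this slide: it collects the token positions of a term and reports the minimum, maximum, average, and a band of two standard deviations around the average. Using the population standard deviation and token offsets as positions are assumptions. The slide does not define either.

```python
from statistics import mean, pstdev


def term_distribution(tokens, term):
    """Return position statistics for term, or None if it does not occur."""
    positions = [i for i, t in enumerate(tokens) if t == term]
    if not positions:
        return None
    avg = mean(positions)
    sd = pstdev(positions)  # population standard deviation (assumed choice)
    return {
        "min": min(positions),
        "max": max(positions),
        "avg": avg,
        "band": (avg - 2 * sd, avg + 2 * sd),  # the "2 S.D." band
    }


tokens = "canada rail canada travel rail canada".split()
stats = term_distribution(tokens, "canada")
```

Terms whose bands overlap heavily cluster in the same part of the document, which is one way to approximate a topic's "center of gravity".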

16 Semantic Analysis Techniques
How well does statistical document aboutness work? Reasonable results in many cases, but...
Problems:
- Semantic model based on term strength and co-occurrence
- Sensitive to writing styles that skew N-gram distributions
- Poor selectivity for multi-topic documents
Need:
- Semantic model of content (e.g., weighted topic tree)
- Logic-based inferencing using key topics
- Mechanism to weigh statistical and symbolic semantics

17 Build Semantic Model of Concepts
Goal: construct concept/topic graph for document
How:
- Use document categorization analysis methods to build topic hierarchy
- Leverage term statistics to identify strongest topics
- Leverage external taxonomy/thesaurus/ontologies
- Use topic supertypes for generalization
  E.g.: soccer -> field game -> outdoor game -> sport
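
The supertype generalization on this slide can be sketched with a parent map. The `PARENT` dictionary below encodes only the slide's soccer example. The functions `supertypes` and `generalize`, and the idea of adding a term's score to every ancestor, are hypothetical illustrations rather than the platform's method.

```python
# Child -> parent links, taken from the slide's single example.
PARENT = {
    "soccer": "field game",
    "field game": "outdoor game",
    "outdoor game": "sport",
}


def supertypes(term, parent=PARENT):
    """Return the chain of increasingly general topics above term."""
    chain, seen = [], {term}
    while term in parent:
        term = parent[term]
        if term in seen:        # guard against cycles in the taxonomy
            break
        seen.add(term)
        chain.append(term)
    return chain


def generalize(term_scores, parent=PARENT):
    """Propagate each term's score to all of its supertypes (assumed scheme)."""
    totals = dict(term_scores)
    for term, score in term_scores.items():
        for ancestor in supertypes(term, parent):
            totals[ancestor] = totals.get(ancestor, 0.0) + score
    return totals


chain = supertypes("soccer")
totals = generalize({"soccer": 2.0})
```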

18 How does an Ontology work?
- Use standardized term relationships
  - Class generalization / specialization
  - Instance generalization / specialization
  - Class relationships
- Enables upper TM platform layers, e.g., semantic analysis
Relation key:
- NT: Narrower Term
- BT: Broader Term
- SN: Synonym
- RT: Related Term
- UF: Use For (non-preferred term)
- TT: Top Term
[Figure: ontology/thesaurus example relating the terms Agriculture Products, Fruits, Vegetables, Produce, Herbaceous plants, Apples, Pears, and Carrots through NT, BT, RT, and UF links]
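
The relation key above maps naturally onto a small data structure. In the sketch below the `Thesaurus` class is hypothetical, and the specific links between terms are an assumed reading of the figure, whose layout did not survive transcription. Only the relation codes (BT, NT, RT) come from the slide.

```python
from collections import defaultdict

# Adding a relation also records its inverse where one exists.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    def __init__(self):
        # term -> relation code -> set of related terms
        self.rel = defaultdict(lambda: defaultdict(set))

    def add(self, term, relation, other):
        self.rel[term][relation].add(other)
        inverse = INVERSE.get(relation)
        if inverse:
            self.rel[other][inverse].add(term)

    def broader_closure(self, term):
        """Return every term reachable by following BT links upward."""
        found, stack = set(), [term]
        while stack:
            for broader in self.rel[stack.pop()]["BT"]:
                if broader not in found:
                    found.add(broader)
                    stack.append(broader)
        return found


t = Thesaurus()
# Assumed links, for illustration only.
t.add("Apples", "BT", "Fruits")
t.add("Fruits", "BT", "Agriculture Products")
t.add("Fruits", "RT", "Vegetables")
t.add("Carrots", "BT", "Vegetables")
broader = t.broader_closure("Apples")
```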

19 Example document: Travel guide for Canada
- PDF : pages, average = 5 pages, multiple subtopics
- Well-written text, HS to college-level English
- Well-structured topically
- Domain terminology

20 Document Topic/Concept Extraction
Inputs: section weights; term frequencies & distributions
Text stream filter:
- Tokenizers
- Stopword filters
- Term stemmers
- Sentence segmenter
Topic / concept extractor:
- Ontology manager
- Taxonomy / thesaurus inferencing
- Topic / concept weighting
- Scoring rules
Document concept taxonomy (figure labels, in original order): geography, physical geography, bodies of water, oceans, land forms, mountains, political geography, North America, United States, Canada, Alberta, Newfoundland, culture & society, leisure & recreation, vacations, arts & entertainment, broadcast media, television, technology & sciences, social sciences, 4.0, history, transportation, travel industry, tourism

21 Semantic Analysis Techniques
Observation: still other valuable concepts present
- Use document summarization analysis methods
- Goal: capture key statement semantics via sampling
- Leverage topics/concepts to identify best sentences to extract into summary
- Leverage external taxonomy / thesaurus / ontology
- Find terms that support more general topics/concepts
  - E.g.: mention of "sightseeing" supports tourism theme
  - E.g.: mention of "British Columbia" supports Canada theme

22 Document Summarization
Inputs: section weights; term frequencies & distributions
Text stream filter:
- Tokenizers
- Stopword filters
- Term stemmers
- Sentence segmenter
Topic / concept based sentence extractor:
- Ontology manager
- Topic-based sentence selection
- Sentence weighting
- Weighting rules
- Pick up co-occurring terms
Sample text from the example document:
- This will take you to our Virtual Canada Book web site to view or download video clips. They are linked to our website. Inquiries about this ebook should be sent to info@bcpictures.com.
- Virtual Canada Contents. Introduction to Canada: A country of many colors. British Columbia and Vancouver Island: for the most scenic of mountain panoramas.
- The Government Offices. This site provides information on federal programs and services, departments and agencies.
- VIA Rail Canada. VIA operates trains in all regions of Canada over a network spanning the country from the Atlantic to the Pacific.
- Greyhound Canada. Coach service to nearly 1,100 towns and cities in Canada, as well as the United States.
- Visit the Canadian Automobile Association. CAA offices across the entire country.
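
Topic-based sentence selection can be sketched as extractive summarization: each sentence is scored by the weights of the topic terms it mentions, and the best-scoring sentences are returned in document order. The topic weights, the regular-expression sentence splitter, and the sample text (loosely adapted from the slide) are all assumptions made for the example.

```python
import re


def summarize(text, topic_weights, max_sentences=2):
    """Pick the sentences that mention the most heavily weighted topic terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for idx, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        score = sum(w for term, w in topic_weights.items() if term in words)
        scored.append((score, idx, sentence))
    # Highest score first; ties broken by earlier position.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:max_sentences]
    # Restore document order for readability.
    return [sentence for _, _, sentence in sorted(top, key=lambda s: s[1])]


text = (
    "VIA operates trains in all regions of Canada. "
    "Inquiries about this ebook should be sent by email. "
    "British Columbia offers scenic mountain panoramas."
)
# Hypothetical topic weights, single-word terms only.
topic_weights = {"canada": 2.0, "trains": 1.0, "columbia": 1.5, "mountain": 1.0}
summary = summarize(text, topic_weights)
```

The administrative sentence about inquiries scores zero and is dropped, which mirrors the noise-removal goal described earlier in the deck.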

23 Weighing statistical & semantic approaches
Input document (text of source document), passed through the text stream filter:
- Tokenizers
- Stopword filters
- Term stemmers
- Sentence segmenter
Analysis methods, each contributing with its own weight (ω0, ω1, ω2, ..., ωi, ..., ωn):
- Statistical keywords (TF-IDF)
- Statistical N-grams
- Topic / concept extraction
- Summarization
- Ontology-based inferencing
Output: document essence, judged by:
- Relevance
- Inventory match
- Monetization
- Human evaluation
Issues:
- No explicit ground truth!
- Lots of parameters & weights
- Difficult to tune & stabilize
- Changes will break things
Goal: infer and approximate conceptual and intentional semantics of content
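
One simple way to read the ω weights is as coefficients of a linear combination over the analyzers' term scores. The sketch below assumes exactly that. The analyzer names, scores, and weight values are invented for illustration, and the slide does not state that the platform combines results linearly.

```python
def combine(analyzer_scores, weights):
    """Merge per-analyzer term scores into one ranked list using per-analyzer weights."""
    combined = {}
    for name, scores in analyzer_scores.items():
        w = weights.get(name, 0.0)          # analyzers without a weight contribute nothing
        for term, score in scores.items():
            combined[term] = combined.get(term, 0.0) + w * score
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical analyzer outputs and weights.
analyzer_scores = {
    "keywords": {"canada": 0.9, "rail": 0.4},
    "ngrams": {"british columbia": 0.8},
    "topics": {"tourism": 0.7, "canada": 0.5},
}
weights = {"keywords": 0.5, "ngrams": 0.3, "topics": 0.2}
essence = combine(analyzer_scores, weights)
```

Even in this toy form the tuning problem noted on the slide is visible: changing one weight can reorder the whole list, and without ground truth there is no direct way to tell which ordering is better.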

25 Architecture for a semantic analysis platform
- Framework for modular semantic analysis workflows (similar platforms: e.g., IBM UIMA)
- Use Adobe proprietary and 3rd-party semantic services
- One interchange format for all semantic metadata
- Open language, server, and database architecture
  - C/C++, Java, PHP, Python
  - Apache, Tomcat
  - Oracle, SQLite, and JDBC-accessible databases
- Services orchestrated by WF engine

26 Adobe content intelligence platform
Processing steps:
1. Input document
2. Extract, structure, & create text
3. Create semantic metadata & tags
4. Normalize & persist metadata
5. Retrieve, filter, and analyze all metadata
6. Score metadata & create essence
Components by stage:
- Content input: documents, upload interface, tools & utilities, CMS adapters, crawlers
- Text extraction: layout extraction, page/section segmentation, text extraction, text glyph filtering, stopword filtering, term stemming
- Metadata generation: keyterm entity extractor, categorizer & theme analyzer, summarizers, other extractors & analyzers
- Metadata persistence: XMP metadata services, metadata persistence services, metadata repository (XML)
- Semantic analysis: category & summary filters, category taxonomy rule engine, Adobe keyterm ranker
- Essence generation: weight categories & themes, recommend rule-based categories, recommend doc & page keyterms
- Taxonomies & ontologies: commercial & open source taxonomies, domain taxonomies, generic taxonomies, taxonomy & ontology builder
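
The staged design can be pictured as a chain of functions that each enrich a document record, with the metadata repository sitting between generation and analysis. Everything in the sketch below is hypothetical: the function names, the dictionary-based record, and the trivial logic inside each stage stand in for the real services listed above.

```python
def extract_text(doc):
    """Stage 2: turn raw input into clean text (placeholder logic)."""
    doc["text"] = doc["raw"].strip()
    return doc


def create_metadata(doc):
    """Stage 3: derive semantic metadata; here just the distinct lowercased words."""
    doc["metadata"] = {"keyterms": sorted(set(doc["text"].lower().split()))}
    return doc


def persist(doc, repository):
    """Stage 4: store metadata in the repository under the document id."""
    repository[doc["id"]] = doc["metadata"]
    return doc


def analyze(doc, repository):
    """Stage 5: retrieve and filter stored metadata (drop very short terms)."""
    stored = repository[doc["id"]]["keyterms"]
    doc["candidates"] = [t for t in stored if len(t) > 3]
    return doc


def create_essence(doc, limit=3):
    """Stage 6: keep the top candidates as the document essence."""
    doc["essence"] = doc["candidates"][:limit]
    return doc


def run_workflow(doc, repository):
    doc = extract_text(doc)
    doc = create_metadata(doc)
    doc = persist(doc, repository)
    doc = analyze(doc, repository)
    return create_essence(doc)


repository = {}
result = run_workflow({"id": "doc-1", "raw": "  Canada travel by rail  "}, repository)
```

Keeping each stage behind a uniform function signature is what would let a workflow engine reorder or swap services, which is the modularity the architecture slide describes.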

27 Semantic analysis processing node i
- Job queue: PDF file1, PDF file2, PDF file3, PDF file4, PDF file5, PDF file6, ...
- Doc.Reg. processes 01 to 04 feed server processes 01 to 04
- Server process 01: semantic analysis WF 01 to WF 10
- Server process 02: semantic analysis WF 11 to WF 20
- Server process 03: semantic analysis WF 21 to WF 30
- Server process 04: semantic analysis WF 31 to WF 40
- Each semantic analysis workflow = 1 thread
- 10 analysis threads per process
[Figure: each workflow runs the content intelligence pipeline from the previous slide: content input, text extraction, metadata generation, metadata persistence (XMP metadata services, metadata repository, XML), semantic analysis, and essence generation]
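
The "one workflow per thread, ten threads per process" layout can be sketched for a single server process as follows. The job queue holds PDF names, and each worker thread pulls jobs until the queue is empty. The `analyze` function is a stand-in for a full semantic analysis workflow, and all names here are hypothetical.

```python
import queue
import threading

THREADS_PER_PROCESS = 10  # figure taken from the slide


def analyze(pdf_name):
    """Placeholder for one semantic analysis workflow run."""
    return f"analyzed:{pdf_name}"


def worker(jobs, results):
    """Pull jobs until the queue is empty, one workflow per thread."""
    while True:
        try:
            pdf_name = jobs.get_nowait()
        except queue.Empty:
            return
        results.put(analyze(pdf_name))


def run_node(pdf_names, n_threads=THREADS_PER_PROCESS):
    jobs, results = queue.Queue(), queue.Queue()
    for name in pdf_names:
        jobs.put(name)
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All workers have finished, so the result count is final.
    return sorted(results.get() for _ in range(results.qsize()))


output = run_node([f"file{i}.pdf" for i in range(1, 7)])
```

The slide's node runs four such server processes side by side. Scaling this sketch the same way would mean starting several processes that share one job queue, which is not shown here.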

29 Screenshot Demo
Ads for Adobe PDF, Powered by Yahoo!
- Hosted in Adobe co-location
- Launched public beta Q publishers participating
System workflows:
- User registration
- Semantic analysis
- PDF interaction

30 Marketing Adobe Labs

31 Login to Adobe Portal

32 Proceed to Adobe Portal

34 Publish the PDF
- Adobe semantic metadata used to match against ad inventory
- On ad click, the ad network provider, Adobe, and the content publisher share ad revenue

36 Summary
Launched new semantic service: Ads for Adobe PDF
- Features in 1.1:
  - Page-level analysis for page-specific ads
  - High-volume registration and analysis scalability: publishers with millions of PDFs
Adobe content intelligence platform using a semantic model of content
- Multi-level semantic analysis
- Allows publishers to easily monetize content
- Combines:
  - Statistical keyword analysis
  - Document topic analysis and summarization
  - Ontology and rules-based inferencing

37 Lessons Learned in 1.0
Need to use a hybrid semantic analysis approach:
- Statistical methods based on N-grams (TF-IDF)
- Ontologies are key: machine learning and automatic construction
- Symbolic theme/topic inference engine
- Logic rule engines to deal with intentional semantics
Document topic analysis problem: long documents, multiple topics
- Aboutness(X) with generalization
- Segmentation: need to refine approach to topic segmentation (e.g., Hearst)
Plan for ground-truth evaluations
- Large number of tuning points
- Use systematic (WF-wide) analysis tracing & logging
Understand ad network inventory from provider
- Adapt to non-linear ad network behavior (revenue vs. relevance)

38 Future Direction and Trends
Need for deeper semantic analysis of text
- Large-scale computational linguistics
- Use broader knowledge base, e.g., Wikipedia, Google, the Web
Automatic targeted ontology learning
- New vocabulary and topics
- Topic interrelationships
User preference model based on:
- Fine-grained model of content corpus
- Global user behavior
Extensions to other media types: audio and video
- Speech-to-text
- Scene analysis, image/object identification

39

How To Make Sense Of Data With Altilia

How To Make Sense Of Data With Altilia HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

Information Retrieval Elasticsearch

Information Retrieval Elasticsearch Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

How To Manage Your Digital Assets On A Computer Or Tablet Device

How To Manage Your Digital Assets On A Computer Or Tablet Device In This Presentation: What are DAMS? Terms Why use DAMS? DAMS vs. CMS How do DAMS work? Key functions of DAMS DAMS and records management DAMS and DIRKS Examples of DAMS Questions Resources What are DAMS?

More information

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide 1 www.microsoft.com/sharepoint The information contained in this document represents the current view of Microsoft Corporation on the issues

More information

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS. PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

More information

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science Sunil Movva, Rahul Ramachandran, Xiang Li, Phani Cherukuri, Sara Graves Information Technology and Systems Center University

More information

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Folksonomies versus Automatic Keyword Extraction: An Empirical Study Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

More information

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy Much higher Volumes. Processed with more Velocity. With much more Variety. Is Big Data so big? Big Data Smart Data Project HAVEn: Adaptive Intelligence

More information

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

ifinder ENTERPRISE SEARCH

ifinder ENTERPRISE SEARCH DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality

More information

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

BUSINESS VALUE OF SEMANTIC TECHNOLOGY BUSINESS VALUE OF SEMANTIC TECHNOLOGY Preliminary Findings Industry Advisory Council Emerging Technology (ET) SIG Information Sharing & Collaboration Committee July 15, 2005 Mills Davis Managing Director

More information

US Patent and Trademark Office Department of Commerce

US Patent and Trademark Office Department of Commerce US Patent and Trademark Office Department of Commerce Request for Comments Regarding Prior Art Resources for Use in the Examination of Software-Related Patent Applications [Docket No.: PTO-P-2013-0064]

More information

Taxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013

Taxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013 Taxonomies for Auto-Tagging Unstructured Content Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013 About Heather Hedden Independent taxonomy consultant, Hedden

More information

Content Management Systems: Drupal Vs Jahia

Content Management Systems: Drupal Vs Jahia Content Management Systems: Drupal Vs Jahia Mrudula Talloju Department of Computing and Information Sciences Kansas State University Manhattan, KS 66502. mrudula@ksu.edu Abstract Content Management Systems

More information

A guide to the lifeblood of DAM:

A guide to the lifeblood of DAM: A guide to the lifeblood of DAM: Key concepts and best practices for using metadata in digital asset management systems. By John Horodyski. Sponsored by Widen Enterprises and DigitalAssetManagement.com.

More information

Session 2: Designing Information Architecture for SharePoint: Making Sense in a World of SharePoint Architecture

Session 2: Designing Information Architecture for SharePoint: Making Sense in a World of SharePoint Architecture Session 2: Designing Information Architecture for SharePoint: Making Sense in a World of SharePoint Architecture Welcome Don Miller VP Product Development donm@conceptsearching.com Rachel Sondag Knowledge

More information

Semaphore Overview. A Smartlogic White Paper. Executive Summary

Semaphore Overview. A Smartlogic White Paper. Executive Summary Semaphore Overview A Smartlogic White Paper Executive Summary Enterprises no longer face an acute information access challenge. This is mainly because the information search market has matured immensely

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

Digital Asset Management and Controlled Vocabulary

Digital Asset Management and Controlled Vocabulary Digital Asset Management and Controlled Vocabulary Introduction One of the challenges that DataBasics has found in delivering and implementing a digital asset management system is the issue of asset ingestion

More information

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...

More information

IT Insights. Using Microsoft SharePoint 2013 to build a robust support and training portal. A service of Microsoft IT Showcase

IT Insights. Using Microsoft SharePoint 2013 to build a robust support and training portal. A service of Microsoft IT Showcase IT Insights A service of Microsoft IT Showcase Using Microsoft SharePoint 2013 to build a robust support and training portal June 2015 The Microsoft IT team that is responsible for hosting customer and

More information

Key Pain Points Addressed

Key Pain Points Addressed Xerox Image Search 6 th International Photo Metadata Conference, London, May 17, 2012 Mathieu Chuat Director Licensing & Business Development Manager Xerox Corporation Key Pain Points Addressed Explosion

More information

IBM Software Group Thought Leadership Whitepaper. IBM Customer Experience Suite and Enterprise Search Optimization

IBM Software Group Thought Leadership Whitepaper. IBM Customer Experience Suite and Enterprise Search Optimization IBM Software Group Thought Leadership Whitepaper IBM Customer Experience Suite and Enterprise Search Optimization 2 IBM Customer Experience Suite and Enterprise Search Optimization Introduction Delivering

More information

Enhancing Document Review Efficiency with OmniX

Enhancing Document Review Efficiency with OmniX Xerox Litigation Services OmniX Platform Review Technical Brief Enhancing Document Review Efficiency with OmniX Xerox Litigation Services delivers a flexible suite of end-to-end technology-driven services,

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

K@ A collaborative platform for knowledge management

K@ A collaborative platform for knowledge management White Paper K@ A collaborative platform for knowledge management Quinary SpA www.quinary.com via Pietrasanta 14 20141 Milano Italia t +39 02 3090 1500 f +39 02 3090 1501 Copyright 2004 Quinary SpA Index

More information

Why are Organizations Interested?

Why are Organizations Interested? SAS Text Analytics Mary-Elizabeth ( M-E ) Eddlestone SAS Customer Loyalty M-E.Eddlestone@sas.com +1 (607) 256-7929 Why are Organizations Interested? Text Analytics 2009: User Perspectives on Solutions

More information

The Prolog Interface to the Unstructured Information Management Architecture

The Prolog Interface to the Unstructured Information Management Architecture The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, pfodor@cs.sunysb.edu 2 IBM

More information

Semantic SharePoint. Technical Briefing. Helmut Nagy, Semantic Web Company Andreas Blumauer, Semantic Web Company

Semantic SharePoint. Technical Briefing. Helmut Nagy, Semantic Web Company Andreas Blumauer, Semantic Web Company Semantic SharePoint Technical Briefing Helmut Nagy, Semantic Web Company Andreas Blumauer, Semantic Web Company What is Semantic SP? a joint venture between iquest and Semantic Web Company, initiated in

More information

www.coveo.com Unifying Search for the Desktop, the Enterprise and the Web

www.coveo.com Unifying Search for the Desktop, the Enterprise and the Web wwwcoveocom Unifying Search for the Desktop, the Enterprise and the Web wwwcoveocom Why you need Coveo Enterprise Search Quickly find documents scattered across your enterprise network Coveo is actually

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many

More information

4.10 Reports. RFP reference: 6.10 Reports, Page 47

4.10 Reports. RFP reference: 6.10 Reports, Page 47 Section 4 Bidder s Products, Methodology, and Approach to the Project 4.1 FACTS II Requirements Summary 4.2 Functional Requirements 4.3 Technical Requirements 4.4 Customer Relations Management Tools 4.5

More information

Co-evolving document collections and knowledge structures. CoDAK. Dr. Evgeny Knutov! ! (MSc Seminar Nov. 11 2013)

Co-evolving document collections and knowledge structures. CoDAK. Dr. Evgeny Knutov! ! (MSc Seminar Nov. 11 2013) Co-evolving document collections and knowledge structures CoDAK Dr. Evgeny Knutov (MSc Seminar Nov. 11 2013) The CoDAK project CoDAK: Co-evolving Document Collections and Knowledge Structures AgentschapNL:

More information

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE)

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE) HIDDEN WEB EXTRACTOR DYNAMIC WAY TO UNCOVER THE DEEP WEB DR. ANURADHA YMCA,CSE, YMCA University Faridabad, Haryana 121006,India anuangra@yahoo.com http://www.ymcaust.ac.in BABITA AHUJA MRCE, IT, MDU University

More information

T HE I NFORMATION A RCHITECTURE G LOSSARY

T HE I NFORMATION A RCHITECTURE G LOSSARY T HE I NFORMATION A RCHITECTURE G LOSSARY B Y K AT H AGEDORN, ARGUS A SSOCIATES M ARCH 2000 I NTRODUCTION This glossary is intended to foster development of a shared vocabulary within the new and rapidly

More information

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Survey Results: Requirements and Use Cases for Linguistic Linked Data Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management Ram Soma 2, Amol Bakshi 1, Kanwal Gupta 3, Will Da Sie 2, Viktor Prasanna 1 1 University of Southern California,

More information

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights

More information

HOW TO DO A SMART DATA PROJECT

HOW TO DO A SMART DATA PROJECT April 2014 Smart Data Strategies HOW TO DO A SMART DATA PROJECT Guideline www.altiliagroup.com Summary ALTILIA s approach to Smart Data PROJECTS 3 1. BUSINESS USE CASE DEFINITION 4 2. PROJECT PLANNING

More information

Digital Asset Management

Digital Asset Management Digital Asset Management 1 Multimedia content is king but how to conveniently valorize it? Discovery Reply Vision & Mission DISCOVERING, CREATING AND MANAGING SERVICES THAT ENABLE FINAL USERS TO INTERACT

More information

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies Semantic Data Management Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies 1 Enterprise Information Challenge Source: Oracle customer 2 Vision of Semantically Linked Data The Network of Collaborative

More information

SAS BI Course Content; Introduction to DWH / BI Concepts

SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console

Text Mining and Analysis

Practical Methods, Examples, and Case Studies Using SAS Goutam Chakraborty, Murali Pagolu, Satish Garla From Text Mining and Analysis. Full book available for purchase here. Contents

Anotaciones semánticas: unidades de busqueda del futuro?

Hugo Zaragoza, Yahoo! Research, Barcelona Jornadas MAVIR Madrid, Nov.07 Document Understanding Cartoon our work! Complexity of Document Understanding

Digital Asset Management. Content Control for Valuable Media Assets

Overview Digital asset management is a core infrastructure requirement for media organizations and marketing departments that need to

Text Analytics Software Choosing the Right Fit

Tom Reamy Chief Knowledge Architect KAPS Group http://www.kapsgroup.com Text Analytics World San Francisco, 2013 Agenda Introduction Text Analytics Basics

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Table of Contents: Introduction, Process Considerations, Architecture Considerations, Conclusion, About Knowledgent

Technical Report. The KNIME Text Processing Feature:

An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG

WEB& WEBSITE DESIGN TRAINING

Introduction to Websites Course Content: Introduction to Web Technologies Protocols and Port Numbers Domain Names, DNS and Domaining Client and Server Software. Static, Dynamic

Ontology based ranking of documents using Graph Databases: a Big Data Approach

A.M.Abirami Dept. of Information Technology Thiagarajar College of Engineering Madurai, Tamil Nadu, India Dr.A.Askarunisa

Search Result Optimization using Annotators

Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

Web 3.0 image search: a World First

The digital age has provided a virtually free worldwide digital distribution infrastructure through the internet. Many areas of commerce, government and academia have

A Near Real-Time Personalization for ecommerce Platform Amit Rustagi arustagi@ebay.com

Abstract. In today's competitive environment, you only have a few seconds to help site visitors understand that you

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Highly competitive enterprises are increasingly finding ways to maximize and accelerate

GRAPHICAL USER INTERFACE, ACCESS, SEARCH AND REPORTING

MEDIA MONITORING AND ANALYSIS Searchers Reporting Delivery (Player Selection) DATA PROCESSING AND CONTENT REPOSITORY ADMINISTRATION AND MANAGEMENT

Generating Advertising Keywords from Video Content

Michael J. Welch 1, Junghoo Cho, and Walter Chang mjwelch@cs.ucla.edu, cho@cs.ucla.edu, wachang@adobe.com Abstract With the proliferation of online distribution

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

Yu. A. Zagorulko, O. I. Borovikova, S. V. Bulgakov, E. A. Sidorova 1 A.P. Ershov's Institute

The Core Pillars of AN EFFECTIVE DOCUMENT MANAGEMENT SOLUTION

Amanda Perran 6 Time MVP Microsoft SharePoint Server Practice Lead, SharePoint - Plato vts Microsoft Co-Author of Beginning SharePoint 2007

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

A Framework for Ontology-Based Knowledge Management System

Jiangning WU Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: jnwu@dlut.edu.cn Abstract Knowledge

Developing Microsoft SharePoint Server 2013 Advanced Solutions

Course 20489B: Developing Microsoft SharePoint Server 2013 Advanced Solutions Course Details Course Outline Module 1: Creating Robust and Efficient Apps for SharePoint In this module, you will review key

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015

Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD

Digital Asset Management (数字媒体资源管理) Instructor: 张宏鑫 2015-09-15

1. Introduction Outline Content management Industrial Analysis

Big Data and Analytics: Challenges and Opportunities

Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide

IBM Cognos Business Intelligence (BI) helps you make better and smarter business decisions faster. Advanced visualization

Business Intelligence: Recent Experiences in Canada

Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies 2 Business Intelligence

EC Wise Report: Unlocking the Value of Deeply Unstructured Data. The Challenge: Gaining Knowledge from Deeply Unstructured Data.

Feedback from the Market: Forest Rim enables significant improvements in the quality of semantic information derived from text data. This

Taxonomies in Practice Welcome to the second decade of online taxonomy construction

Building a Taxonomy for Auto-classification by Wendi Pohs EDITOR'S SUMMARY Taxonomies have expanded from browsing aids to the foundation for automatic classification. Early auto-classification methods

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

Expected Outcomes You will learn: Basic concepts related to ontologies Semantic model Semantic web Basic features of RDF and RDF

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

Semantic Search in Portals using Ontologies

Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

Chapter 6. Attracting Buyers with Search, Semantic, and Recommendation Technology

Learning Objectives Using Search Technology for Business Success Organic Search and Search Engine Optimization Recommendation Engines

Analysis of Web Archives. Vinay Goel Senior Data Engineer

Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner

Structured Content: the Key to Agile. Web Experience Management. Introduction

CONTENTS: Introduction, Structured Content Defined, Structured Content is Intelligent, Structured Content and Customer Experience, Structured

Enhancing Web Publishing with Digital Asset Management - Using Open Text Artesia DAM to enhance your Open Text WCMS (Red Dot) web sites

Lars Onasch Wolfgang Ruth Agenda A Brief Introduction Customer Examples

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet

ENTERPRISE DOCUMENT MANAGEMENT SYSTEM

A Scalable Document Management for all businesses EDMS is a powerful and cost effective document management that allows businesses to centralize management, storage, collaboration, retrieval and archiving

HP Systinet. Software Version: 10.01 Windows and Linux Operating Systems. Concepts Guide

Document Release Date: June 2015 Software Release Date: June 2015 Legal Notices Warranty The only warranties for HP

CLOUD ANALYTICS: Empowering the Army Intelligence Core Analytic Enterprise

5 APR 2011 1 2005... Advanced Analytics Harnessing Data for the Warfighter I2E GIG Brigade Combat Team Data Silos DCGS LandWarNet

SOA REFERENCE ARCHITECTURE: WEB TIER

SOA Blueprint A structured blog by Yogish Pai Web Application Tier The primary requirement for this tier is that all the business systems and solutions be accessible

1962-12. Joint ICTP-IAEA School of Nuclear Knowledge Management. 1-5 September 2008. Improving Organizational Performance with a KM System

P. PUHR-WESTERHEIDE GRS mbh Forschungsinstitute, Boltzmannstrasse,

Improving EHR Semantic Interoperability Future Vision and Challenges

Catalina MARTÍNEZ-COSTA a,1 Dipak KALRA b, Stefan SCHULZ a a IMI, Medical University of Graz, Austria b CHIME, University College London,

ER/Studio Enterprise Portal 1.0.2 User Guide

Copyright 1994-2008 Embarcadero Technologies, Inc. Embarcadero Technologies, Inc. 100 California Street, 12th Floor San Francisco, CA 94111 U.S.A. All rights

Building Views and Charts in Requests Introduction to Answers views and charts Creating and editing charts Performing common view tasks

Oracle Business Intelligence Enterprise Edition (OBIEE) Training: Working with Oracle Business Intelligence Answers Introduction to Oracle BI Answers Working with requests in Oracle BI Answers Using advanced

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

Authoring Within a Content Management System. The Content Management Story

Learning Goals Understand the roots of content management Define the concept of content Describe what a content management system

Big Data & Security. Aljosa Pasic 12/02/2015

Welcome to Madrid!!! Big Data AND security: what is there on our minds? Big Data tools and technologies Big Data T&T chain and security/privacy concern mappings

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I)

Content Delivery Service (CDS)

Xyleme delivers content management for learning and development. We transform the way you author, publish, deliver, and analyze learning content to drive business performance.

Data Integration Hub for a Hybrid Paper Search

Jungkee Kim 1,2, Geoffrey Fox 2, and Seong-Joon Yoo 3 1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., jungkkim@cs.fsu.edu,

Data Search. Searching and Finding information in Unstructured and Structured Data Sources

Erik Fransen Senior Business Consultant 11.00-12.00 P.M. November, 3 IRM UK, DW/BI 2009, London Centennium BI

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.

White Paper By Mike Harrison, Director,