Information Access Platforms: The Evolution of Search Technologies




Managing Information in the Public Sphere: Shaping the New Information Space
April 26, 2010

Purpose
- To provide an overview of current search technologies, their respective strengths and weaknesses, and how they are currently implemented in the Ontario Public Service (OPS).
- To present the concepts of Information Access Platforms and of Enterprise Search.
- To inform the audience of relevant work currently underway in Justice Technology Services (JTS).
- To share our thoughts on how emerging technologies can fit within the broader Enterprise Information Management (EIM) space.

The Search Spectrum: Definitions
The varieties of search listed here should be seen as points on a spectrum, not as absolute categories. Yes, there is overlap.
- Desktop search lets the individual user discover information from multiple sources and locations.
- Web search engines present a list of results ("hits") made up of web pages and other files.
- Federated search is associated with portals and allows simultaneous search of online databases and web resources.
- Embedded or native search is found in most enterprise and business applications, but is limited to the data in that system.
- Enterprise search provides advanced feature sets for discovering information assets across the enterprise.
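To make "simultaneous search" concrete, here is a minimal sketch of federated search. It sends one query to several sources in parallel and merges the results into a single ranked list. The sources, document IDs, and keyword-count scoring are hypothetical stand-ins, not the OPS Federation Search Service. Rescaling each source's scores by its own best hit is one simple merging choice among many.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A "source" is any callable that takes a query string and returns
# (doc_id, raw_score) pairs. make_source builds hypothetical in-memory
# stand-ins for real back ends (intranet, databases, web resources).
Source = Callable[[str], list]


def make_source(docs: dict) -> Source:
    """Build a toy keyword-count search over an in-memory {doc_id: text} map."""
    def search(query: str) -> list:
        terms = query.lower().split()
        hits = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score:
                hits.append((doc_id, float(score)))
        return hits
    return search


def federated_search(query: str, sources: dict) -> list:
    """Send one query to every source in parallel and merge the result lists."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(src, query) for name, src in sources.items()}
        merged = []
        for name, future in futures.items():
            hits = future.result()
            # Raw scores from different engines are not comparable, so scale
            # each source's scores by its own best hit before merging.
            top = max((score for _, score in hits), default=0.0)
            for doc_id, score in hits:
                merged.append((name, doc_id, score / top))
    # Highest normalized score first; ties broken by source name, then doc id.
    merged.sort(key=lambda r: (-r[2], r[0], r[1]))
    return merged
```

Each result keeps its source name, so the user can see where a hit came from. That provenance is one of the things that separates federated search from a single merged index.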

Types of Search: SWOT Analysis
Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis is an established practice for environmental scanning and strategic planning. This is a start.
- Desktop search (e.g., Google Desktop, Windows Search) empowers the individual user, but can be an enormous drain on network resources. It is NOT available as an OPS standard service.
- Web search results may be voluminous, and ranking can still be an issue; even so, the most widely known search practice is to "google".
- Federated search is often associated with portals and allows simultaneous search of online databases and web resources. The OPS Federation Search Service is used for the internet and intranets.
- Embedded or native search (e.g., Outlook Search) is found in many enterprise and business applications, but is usually limited to the data in that application.

Types of Search: Enterprise Search
Enterprise search is intended to support integrated discovery of information assets from within an organization, as well as key external resources. The defining qualities of enterprise search are:
- The ability to integrate information resources from multiple data sources, types, and locations, potentially spanning both structured and unstructured content.
- The ability to deploy rich features to perform advanced analytics, visualization, and guided navigation of the integrated data.
Enterprise search has its own sub-spectrum (described below), based on the breadth of information sources that can be integrated and the richness of the feature set included.
The OPS Federation Search Service is being promoted by Common Components Application and Services (CCAS) as the foundation of Enterprise Search for the OPS.

Varieties of Enterprise Search
As noted above, enterprise search has its own sub-spectrum:
- Information Access Platform (IAP): the most advanced form of enterprise search. It incorporates text analytics and broad connectivity, and is highly scalable and customizable.
- Embedded Platform Search: an increasingly common form of enterprise search, found in content management applications and platforms such as MOSS 2007 (Microsoft Office SharePoint Server 2007) and OpenText LiveLink.
- Search Solutions: address specific departmental needs through targeted collections and interfaces, but are prone to silos and have limited direct connectivity.
- Commodity Search: closest to desktop search. It serves individual users (often free to download) or acts as a scoped form of enterprise search.
For details, see the handout. References are available on request.
NB: JTS's main interest has been in exploring Information Access Platform technology and its EIM potential. The rest is contextual, and answers "how is this different from...?" questions.

Enterprise Search: IAP Characteristics
IAP technologies have characteristics that distinguish them from other enterprise search offerings:
- Search, navigate, and visualize both data and content
- Transform data
- Analyze text for entities and patterns
- Connect to heterogeneous sources
- Scale to massive volumes
- Customize data ingestion and the front-end search application
Given the advanced capabilities of IAP (and the consequent cost to implement), the trend has been toward strategic deployment. This applies where a broad platform for custom applications is needed, and/or where high risk is present and exceptionally high discovery rates are required.
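"Analyze text for entities and patterns" can be illustrated with a minimal rule-based entity extractor. It uses regular expressions for patterned entities and a gazetteer (a list of known names) for organizations. The categories, patterns, and gazetteer entries are hypothetical. Commercial IAP products typically use trained models and far richer rules, so treat this only as a sketch of the idea.

```python
import re

# Hypothetical predefined categories, matched by regular expressions.
PATTERNS = {
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|September|"
        r"October|November|December) \d{1,2}, \d{4}\b"
    ),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[.-]\d{3}[.-]\d{4}\b"),
}
# A gazetteer lists known names for a category (illustrative entries only).
GAZETTEER = {"ORG": ["Justice Technology Services", "Endeca", "Microsoft FAST"]}


def extract_entities(text: str) -> list:
    """Return (category, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group()))
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                found.append((m.start(), label, m.group()))
    # Sort by character offset so entities come back in reading order.
    return [(label, value) for _, label, value in sorted(found)]
```

Once entities are tagged like this, they can be indexed as structured fields. This is what lets text be filtered and counted alongside database data.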

Enterprise Search: Key Features of IAP
Text analytics (also known as text mining) essentially turns text into data that can be analyzed much like structured data, so that consistent information management practices can be applied to it. Some methods for performing text analytics are described below.
- Collaborative tagging: captures knowledge about unstructured and semi-structured content.
- Entity extraction: locates and classifies text elements into predefined categories.
- Faceted classification: allows users to find items based on more than one dimension.
- Access control: restricts access to results, leveraging existing security models.
- Text clustering: groups search results from multiple sources, compensating for inconsistent metadata between sources.
- Customized / complex user interface: visualizes data and supports use of rich features and guided navigation.
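Faceted classification and guided navigation can be sketched in a few lines. Each record carries values for several dimensions. The interface shows counts per value, and each selection narrows the set and recomputes the counts for the remaining dimensions. The record fields (`type`, `branch`, `year`) and sample data are hypothetical.

```python
from collections import Counter

# Hypothetical records; each carries values for several facets (dimensions).
SAMPLE = [
    {"id": 1, "type": "PDF", "branch": "Courts", "year": 2009},
    {"id": 2, "type": "Email", "branch": "Courts", "year": 2009},
    {"id": 3, "type": "PDF", "branch": "Policy", "year": 2008},
    {"id": 4, "type": "PDF", "branch": "Courts", "year": 2008},
]


def facet_counts(records: list, facets: list) -> dict:
    """For each facet, count how many records carry each value."""
    return {f: dict(Counter(r[f] for r in records if f in r)) for f in facets}


def refine(records: list, selections: dict) -> list:
    """Keep records matching every selected facet value (AND across dimensions)."""
    return [r for r in records if all(r.get(f) == v for f, v in selections.items())]
```

For example, `refine(SAMPLE, {"type": "PDF", "branch": "Courts"})` keeps records 1 and 4. Calling `facet_counts` on that subset for `"year"` then offers the next refinement. This recompute-after-each-click loop is what "data-driven guided navigation" refers to.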

JTS IAP Proof-of-Concept: Background
In early 2009, JTS had an opportunity to work with an IAP leader (Endeca) on a proof-of-concept (POC). The initiative had support from both the JTS CIO and the Chief Information & Privacy Officer (CIPO). The POC ran from June to July, with a final report delivered in August 2009. Its stated objectives were to prove the ability to:
- Efficiently search multiple data sources, types, and locations
- Locate information even when it lacks metadata and is poorly organized
- Find identical and similar information in order to perform cleanup
- Preserve security / access restrictions throughout the search experience
Based on the success of the POC, this work is now referred to as Phase 1. Results are provided below, as well as planning for Phases 2 and 3.
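The objective of preserving access restrictions is often met by "security trimming": each indexed item carries the access list copied from its source system, and results are filtered against the searching user's groups. The sketch below assumes a simplified model with allow-lists only. It ignores deny entries, nested groups, and re-checking permissions against the source at query time. The `Hit` type and group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    source: str
    # Groups allowed to read the item, copied from the source system's
    # access list when the item was crawled (allow-list only; hypothetical).
    allowed_groups: frozenset


def trim_results(hits: list, user_groups: set) -> list:
    """Return only hits the user could open in the originating system."""
    return [h for h in hits if h.allowed_groups & user_groups]
```

Filtering before results are displayed also keeps facet counts and snippets from revealing the existence of documents the user cannot open. The counts in particular are easy to overlook.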

JTS IAP POC Phase 1: Business Results
The IAP POC Phase 1 was successful in all main objectives. Over a 3-week period, the POC user group tested the system and provided feedback and final results. Highlights of the results include:
- Simultaneous search of multiple data sources / data types
- Preservation of access restrictions inherent in the data source
- Identification of duplicate information
- Use of a thesaurus to improve search results
- Basic ability to search and visualize data (limited due to business data sources)
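One common way a thesaurus improves search results is query expansion: each query word is replaced by its synonym ring (the preferred term plus its non-preferred variants), and a document matches if it contains any member of each ring. The thesaurus entries below are hypothetical. Only single query words are looked up, although expansions may be phrases.

```python
import re

# Hypothetical thesaurus entries: preferred term -> non-preferred variants.
THESAURUS = {
    "litigation": ["lawsuit", "legal action"],
    "records": ["files", "documents"],
}


def expand_query(query: str) -> list:
    """Replace each query word with its synonym ring (preferred term + variants)."""
    ring = {}
    for preferred, variants in THESAURUS.items():
        group = {preferred, *variants}
        for term in group:
            ring[term] = group
    # Words not in the thesaurus stay as a one-term group.
    return [sorted(ring.get(tok, {tok})) for tok in query.lower().split()]


def matches(text: str, groups: list) -> bool:
    """True if every group has at least one term (word or phrase) in the text."""
    padded = " " + re.sub(r"[^a-z0-9]+", " ", text.lower()) + " "
    return all(any(f" {term} " in padded for term in group) for group in groups)
```

A search for "lawsuit records" would miss "Litigation files for 2009 are archived." if matched literally. With expansion it is found.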

JTS IAP POC Phase 1: Technology Results
Specific technical requirements were also tested in the POC:
- Data sources: connectivity to a wide variety of data sources, including file shares, content management systems, and databases.
- Document types: text and metadata search of a variety of document types, including MS Office, PDF, Outlook message types, PST files, MS Visio, and MS Project.
- Saved queries: ability to save queries for future use.
- Dashboard: ability to display results in a dashboard view.
- De-duplication: identification of duplicate content.
- File count: ability to view the number of documents in a data source.
- Security: ability to replicate the security found in data sources.
- Audit: ability to log search requests by users.
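Duplicate identification usually covers two cases. Exact duplicates can be caught by hashing normalized text. Near duplicates can be caught by comparing sets of overlapping word sequences ("shingles") with Jaccard similarity. The sketch below shows both; the 0.5 threshold and shingle size of 3 are arbitrary assumptions. It compares every pair, which does not scale to large collections. Techniques such as MinHash with locality-sensitive hashing are the usual replacement.

```python
import hashlib
import re
from itertools import combinations


def normalize(text: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """Hash of the normalized text; equal hashes mean identical content."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences."""
    words = normalize(text).split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_duplicates(docs: dict, threshold: float = 0.5):
    """Return (exact_pairs, near_pairs) over all document pairs."""
    prints = {d: fingerprint(t) for d, t in docs.items()}
    shingle_sets = {d: shingles(t) for d, t in docs.items()}
    exact, near = [], []
    for a, b in combinations(sorted(docs), 2):
        if prints[a] == prints[b]:
            exact.append((a, b))
        elif jaccard(shingle_sets[a], shingle_sets[b]) >= threshold:
            near.append((a, b))
    return exact, near
```

Exact pairs are safe candidates for cleanup. Near pairs (for example, a draft and its final version) generally need a human decision.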

JTS IAP POC Phase 1: Vendor (Endeca)
Endeca is the vendor JTS worked with in Phase 1 of the IAP POC. It is a recognized leader in Information Access Platforms.
- Superior faceted classification and search of structured and unstructured data.
- Introduces business intelligence capability, using a flexible data model that easily accommodates new data.
- Less focused on searching massive amounts of unstructured content than on integrating and mining structured and unstructured content.
- Endeca emphasizes that it is a platform, not a search engine. It provides contextual information for any set of records, along with in-the-moment analytics and data-driven guided navigation.
- Competitors may provide these functions, but only through custom application development rather than configuration.

JTS IAP POC Phase 2: Planned Approach & Objectives
In December 2009, additional work was done specifically to explore potential support for Functional Classification practice:
- Crawled mandate documents for 5 major Justice business areas
- Crawled the E-Laws web site
- Incorporated the Keyword AAA thesaurus as a controlled vocabulary
A Phase 2 POC is required to see how IAP may help us perform Functional Classification. Specific business requirements include:
- Customizable sandboxing of crawled information
- User-customizable thesaurus
- In-system tagging
- Accurate migration of assigned metadata to the EIM platform
The current plan is to work with and learn from another major IAP vendor, Microsoft FAST. FAST is available through the Federated Search Service offered by OPS Common Components Application and Services (CCAS).
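As a rough illustration of how a controlled vocabulary might support Functional Classification, the sketch below suggests function terms for a document by counting indicator words. A person would then confirm or override the suggestion, which is where in-system tagging comes in. The function terms and indicator words are hypothetical and illustrative only; they are not taken from Keyword AAA. The `min_hits` cut-off is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: function term -> indicator words.
# Illustrative entries only; not taken from Keyword AAA.
VOCABULARY = {
    "FINANCIAL MANAGEMENT": {"budget", "invoice", "audit", "expenditure"},
    "LEGAL SERVICES": {"litigation", "statute", "counsel", "prosecution"},
    "PERSONNEL": {"staffing", "recruitment", "leave", "payroll"},
}


def suggest_functions(text: str, min_hits: int = 2) -> list:
    """Suggest function terms whose indicator words occur at least min_hits times."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {term: sum(words[w] for w in indicators) for term, indicators in VOCABULARY.items()}
    # Strongest evidence first; ties broken alphabetically.
    return sorted((t for t, s in scores.items() if s >= min_hits), key=lambda t: (-scores[t], t))
```

Keeping suggestions separate from confirmed tags makes it easier to migrate only reviewed metadata to the EIM platform.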

JTS IAP POC: Proposed Phase 3 + Interim Strategy
IAP POC Phase 3: JTS is proposing a third phase of the POC, concentrating on the ability to apply advanced business analytics to disparate data sources.
- Due to limitations in POC Phase 1, the technical ability to connect to multiple data sources and to integrate structured and unstructured data was proven, but not fully tested.
- Business analytics and data visualization need to be explored more fully. So does the potential for improved management of information in support of decision-making and policy development.
- POC Phase 3 will focus on IAP and the Business Intelligence information technology environment.
Interim strategy: evaluate search technologies on a project / solution basis, depending on specific business requirements. This should leverage what we have learned, and are continuing to learn, from the POCs.

Lessons Learned: Enterprise Search & EIM
- Lesson Learned 1: there is no magic bullet for enterprise search. In reality, the right choice depends on business requirements, best fit, the current technology environment, and available IT services.
- Lesson Learned 2: search is transformative. The search environment can change the way we interact with information and our expectations of the information environment (e.g., browse vs. search).
- Lesson Learned 3: advanced features available in IAP can help us develop much-needed tools to support implementation of EIM. We are only beginning to understand how.
Our conclusion: enterprise search / IAP have profound implications for EIM. Beyond specific toolkit support, the analytics aspect will be critical for achieving the improved decision-making that is a main goal of EIM. Further investigation is definitely required.

Questions?
Kelly Ryan, Lead, Information Management, IM & Planning Branch, JTS. kelly.ryan@ontario.ca, 416.314.0320
Irene McGlashan, Senior Technology Manager, Technology Solutions Branch, JTS. irene.mcglashan@ontario.ca, 416.325.0999