Information Access Platforms: The Evolution of Search Technologies
Managing Information in the Public Sphere: Shaping the New Information Space
April 26, 2010
Purpose
- To provide an overview of current search technologies, their respective strengths and weaknesses, and how they are currently implemented in the OPS.
- To present the concepts of Information Access Platforms and of Enterprise Search.
- To inform the audience of the relevant work currently underway in Justice Technology Services (JTS).
- To share our thoughts on how emerging technologies can fit within the broader Enterprise Information Management (EIM) space.
The Search Spectrum: Definitions
The varieties of search listed here should be seen as points on a spectrum rather than as absolute categories. Yes, there is overlap.
- Desktop search empowers the individual user to discover information from multiple sources and locations.
- Web search engines present a list of results ("hits") consisting of web pages and other files.
- Federated search is associated with portals and allows for simultaneous search of on-line databases and web resources.
- Embedded or native search is found in most enterprise & business applications, but is limited to data in that system.
- Enterprise search provides advanced feature sets for discovery of information assets across the enterprise.
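To make the federated search definition concrete, here is a minimal sketch of the idea: one query is sent to several sources at the same time and the result lists are merged. The source names and titles are hypothetical stand-ins, not OPS systems; a real service would call remote databases and web resources rather than in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources" (assumption for illustration only).
SOURCES = {
    "intranet": ["Records retention policy", "Search service overview"],
    "database": ["Retention schedule table", "Court services directory"],
}


def search_source(name, query):
    """Return (source, title) pairs whose title contains the query term."""
    q = query.lower()
    return [(name, title) for title in SOURCES[name] if q in title.lower()]


def federated_search(query):
    """Send the query to every source concurrently and merge the results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        merged = []
        for future in futures:
            merged.extend(future.result())
    # Sort so the merged list has a stable, source-independent order.
    return sorted(merged)


print(federated_search("retention"))
```

Each hit keeps the name of the source it came from, which is what lets a portal show where a result lives even though the user issued a single query.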
Types of Search: SWOT Analysis
Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis is an established practice for environmental scanning & strategic planning. This is a start.
- Desktop search (e.g., Google Desktop, Windows Search) empowers the individual user, but can be an enormous drain on network resources. NOT available as an OPS standard service.
- Web search results may be voluminous, and ranking can be an issue; still, the most widely known search practice is "to google".
- Federated search is often associated with portals and allows for simultaneous search of on-line databases and web resources. The OPS Federation Search Service is used for internet and intranets.
- Embedded or native search (e.g., Outlook Search) is found in many enterprise & business applications, but is usually limited to data in that application.
Types of Search: Enterprise Search
Enterprise Search is intended to support integrated discovery of information assets from within an organization as well as key external resources. The defining qualities of enterprise search are:
- The ability to integrate information resources from multiple data sources, types, and locations, up to and including both structured and unstructured content.
- The ability to deploy rich features to perform advanced analytics, visualization, and guided navigation of the integrated data.
Enterprise search has its own sub-spectrum, based upon the breadth of information sources which can be integrated and the richness of the feature set included.
The OPS Federation Search Service is being promoted by CCAS as the foundation of Enterprise Search for the OPS.
Varieties of Enterprise Search
Enterprise Search has its own sub-spectrum:
- Information Access Platform: the most advanced form of enterprise search, incorporating text analytics and broad connectivity; highly scalable and customizable.
- Embedded Platform Search: an increasingly common form of enterprise search, found in content management applications & platforms such as MOSS 2007 and OpenText LiveLink.
- Search Solutions: address specific departmental needs through targeted collections & interfaces, but are prone to silos and limited direct connectivity.
- Commodity Search: closest to desktop search; enables individual users (often at no cost for download) or serves as scoping for enterprise search.
For details, see the handout. References available on request.
NB: JTS's main interest has been in exploration of Information Access Platform technology and its EIM potential. The rest is contextual (and responds to "how is this different from...?" questions!).
Enterprise Search: IAP Characteristics
IAP technologies have characteristics which distinguish them from the remainder of Enterprise Search offerings:
- Search, navigate, and visualize both data and content
- Transform data
- Analyze text for entities and patterns
- Connect to heterogeneous sources
- Scale to massive volumes
- Customize data ingestion and the front-end search application
Given the advanced capabilities of IAP (and the consequent cost to implement), the trend has been toward strategic deployment where a broad platform for custom applications is needed, and/or where high risk is present and exceptionally high discovery rates are required.
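As an illustration of "analyze text for entities and patterns", the sketch below pulls two kinds of entity out of free text using regular expressions. This is a deliberately simple stand-in: the patterns and labels are assumptions made for the example, and commercial IAP products typically rely on trained models or curated dictionaries rather than a pair of hand-written patterns.

```python
import re

# Hypothetical patterns chosen for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text):
    """Return every match for each pattern, keyed by entity label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


sample = "Contact a.clerk@example.org before 2010-04-26."
print(extract_entities(sample))
```

Once extracted, values like these can be stored alongside a document as metadata, which is how unstructured text starts to behave like structured data.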
Enterprise Search: Key Features of IAP
Text analytics (also known as text mining) essentially turns text into data that can be analyzed much like structured data and managed with consistent information management practices. Some of the methods and features involved are described below.
- Collaborative tagging: captures knowledge about unstructured & semi-structured content
- Entity extraction: locates and classifies text elements into predefined categories
- Faceted classification: allows users to find items based on more than one dimension
- Access control: restricts access to results, leveraging existing security models
- Text clustering: groups search results from multiple sources, compensating for inconsistent metadata between sources
- Customized / complex user interface: visualizes data and supports use of rich features and guided navigation
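Faceted classification is easiest to see with a small example. The sketch below filters a set of records on more than one dimension at once and then counts the values remaining in another facet, which is the basis of guided navigation. The records and facet names (`type`, `year`, `area`) are hypothetical and are not drawn from any JTS data source.

```python
from collections import Counter

# Hypothetical records with three facets each (assumption for illustration).
DOCS = [
    {"title": "Bail policy memo", "type": "memo", "year": 2008, "area": "courts"},
    {"title": "Bail statistics", "type": "report", "year": 2009, "area": "courts"},
    {"title": "Policing budget", "type": "report", "year": 2009, "area": "policing"},
]


def filter_docs(docs, **selected):
    """Keep only records that match every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


def facet_counts(docs, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(d[facet] for d in docs)


hits = filter_docs(DOCS, type="report", year=2009)
print([d["title"] for d in hits])
print(facet_counts(hits, "area"))
```

The counts shown after each selection tell the user which further refinements are available and how many results each one would leave.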
JTS IAP Proof-of-Concept: Background
In early 2009, JTS had an opportunity to work with an IAP leader (Endeca) on a proof-of-concept (POC). The initiative had support from both the JTS CIO and the Chief Information & Privacy Officer (CIPO). The POC ran from June to July 2009, with a final report delivered in August 2009. Its stated objectives were to prove the ability to:
- Efficiently search multiple data sources, types, and locations
- Locate information even when it lacks metadata and is poorly organized
- Find identical and similar information in order to perform cleanup
- Preserve security / access restrictions throughout the search experience
Based upon the success of the POC, this is now referred to as Phase 1. Results are provided below, as well as planning for Phases 2 and 3.
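The objective of preserving access restrictions is often met by "security trimming": removing hits the user is not entitled to see before results are displayed. The sketch below assumes a simple mapping from document identifier to permitted groups; the identifiers and group names are hypothetical, and the POC itself replicated the security models of the underlying data sources rather than using a table like this one.

```python
# Hypothetical access control lists: document id -> groups allowed to read it.
ACLS = {
    "doc-1": {"jts-staff"},
    "doc-2": {"jts-staff", "policy-team"},
    "doc-3": {"executive"},
}


def trim_results(hits, user_groups):
    """Keep only hits that share at least one group with the user.

    Documents with no ACL entry are dropped (deny by default).
    """
    groups = set(user_groups)
    return [hit for hit in hits if ACLS.get(hit, set()) & groups]


print(trim_results(["doc-1", "doc-2", "doc-3", "doc-4"], {"policy-team"}))
```

Denying access when no entry exists is a design choice made for this sketch; it errs on the side of hiding a document rather than exposing it.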
JTS IAP POC Phase 1: Business Results
The IAP POC Phase 1 was successful in all main objectives. The POC user group tested the system, provided feedback, and delivered final results over a 3-week period. Highlights of the results include:
- Simultaneous search of multiple data sources / data types.
- Preservation of access restrictions inherent in the data source.
- Identification of duplicate information.
- Use of a thesaurus to improve search results.
- Basic ability to search and visualize data (limited due to business data sources).
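One common way a thesaurus improves search results is query expansion: each query term is widened to include its listed synonyms before matching. The sketch below shows the idea with a two-entry thesaurus and a naive substring match. The entries and documents are hypothetical, and nothing here reflects how the thesaurus was actually configured in the POC.

```python
# Hypothetical thesaurus entries (assumption for illustration only).
THESAURUS = {
    "lawyer": {"attorney", "counsel"},
    "jail": {"prison", "correctional facility"},
}


def expand_query(query):
    """Return the query's words plus any synonyms listed in the thesaurus."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= THESAURUS.get(word, set())
    return terms


def search(docs, query):
    """Return documents containing any expanded term (naive substring match)."""
    terms = expand_query(query)
    return [d for d in docs if any(t in d.lower() for t in terms)]


DOCS = ["Duty counsel schedule", "Prison capacity report", "Parking notice"]
print(search(DOCS, "lawyer"))
```

Without expansion, a search for "lawyer" would miss the document that uses the word "counsel"; with it, the document is found even though the user's term never appears in it.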
JTS IAP POC Phase 1: Technology Results
Specific technical requirements were also tested in the POC:
- Data sources: connectivity to a wide variety of data sources, including file shares, content management systems, and databases.
- Document types: text & metadata search of a variety of document types, including MS Office, PDF, Outlook message types, PST files, and MS Visio & Project.
- Save queries: ability to save queries for future use.
- Dashboard: ability to display results using a dashboard view.
- De-duplication: duplicate content identification.
- File count: ability to view the number of documents in a data source.
- Security: ability to replicate security found in data sources.
- Audit: ability to log search requests by users.
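Duplicate content identification can be approximated by fingerprinting each document's text and grouping documents that share a fingerprint. The sketch below normalizes case and whitespace and then hashes with SHA-256, so it finds identical content only; detecting merely similar documents needs other techniques. The file paths are hypothetical, and this is not a description of how the vendor's product performs de-duplication.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the text after normalizing case and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(docs):
    """Group document ids whose normalized content is identical."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[fingerprint(text)].append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical documents from two different sources.
docs = {
    "share/a.txt": "Annual  Report 2009",
    "cms/b.txt": "annual report 2009",
    "share/c.txt": "Budget memo",
}
print(find_duplicates(docs))
```

Because the fingerprint depends only on content, the same file stored on a file share and in a content management system is reported as one duplicate group regardless of its name or location.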
JTS IAP POC Phase 1: Vendor (Endeca)
Endeca is the vendor with whom JTS worked in Phase 1 of the IAP POC. They are a recognized leader in Information Access Platforms.
- Superior faceted classification and search of structured & unstructured data.
- Introduces business intelligence capability, using a flexible data model which accommodates new data easily.
- Less focused on searching massive amounts of unstructured content than on integrating and mining structured & unstructured content.
Endeca emphasizes that it is a platform, not search, providing contextual information for any set of records and the ability to perform in-the-moment analytics & data-driven guided navigation. Competitors may provide these functions, but only through custom application development rather than configuration.
JTS IAP POC Phase 2: Planned Approach & Objectives
In December 2009, additional work was done, specifically to explore potential support for Functional Classification practice:
- Crawled mandate documents for 5 major Justice business areas
- Crawled the E-Laws web site
- Incorporated the Keyword AAA thesaurus as a controlled vocabulary
A Phase 2 POC is required to see how IAP may help us to perform Functional Classification. Specific business requirements include:
- Customizable sandboxing of crawled information
- User-customizable thesaurus
- In-system tagging
- Accurate migration of assigned metadata to the EIM platform
The current plan is to work with and learn from another major IAP vendor, Microsoft FAST, available from the OPS Common Components Application and Services (CCAS) offering through its Federated Search Service.
JTS IAP POC: Proposed Phase 3 + Interim Strategy
IAP POC Phase 3: JTS is proposing a third phase of the POC, concentrating on the ability to apply advanced business analytics to disparate data sources.
- Due to limitations in POC Phase 1, the technical ability to connect to multiple data sources and integrate structured & unstructured data was proven, but not fully tested.
- Business analytics & data visualization need to be more fully explored, as does the potential for improved management of information in support of decision-making and policy development.
- POC Phase 3 will focus upon IAP and the Business Intelligence information technology environment.
Interim strategy: evaluate search technologies on a project / solution basis, depending on specific business requirements, while leveraging what we have learned (and are continuing to learn) from the POCs.
Lessons Learned: Enterprise Search & EIM
- Lesson Learned 1: there is no magic bullet for enterprise search. In reality, it depends on business requirements, best fit, the current technology environment, and available IT services.
- Lesson Learned 2: search is transformative. The search environment can change the way we interact with information and our expectations of the information environment (e.g., browse vs. search).
- Lesson Learned 3: advanced features available in IAP can help us develop much-needed tools to support implementation of EIM; we are only beginning to understand how.
Our conclusion: enterprise search / IAP have profound implications for EIM. Beyond specific toolkit support, the analytics aspect will be critical for achieving the improved decision-making that is a main goal of EIM. Further investigation definitely required.
Questions?
Kelly Ryan, Lead, Information Management, IM & Planning Branch, JTS
kelly.ryan@ontario.ca, 416.314.0320
Irene McGlashan, Senior Technology Manager, Technology Solutions Branch, JTS
irene.mcglashan@ontario.ca, 416.325.0999