HiTech White Paper
A Next Generation Search System for Today's Digital Enterprises
About the Author
Ajay Parashar
Ajay Parashar is a Solution Architect with the HiTech business unit at Tata Consultancy Services (TCS). He has around 15 years of experience in database administration and in project, program, and delivery management. As the Big Data CoE Lead for the business unit, Parashar manages business development and consulting projects in this space. He has a Master's degree in Business Administration from the Institute of Management Studies, DAVV Indore, India, and a Bachelor's degree in Industrial and Production Engineering from Sri Govindram Sakseria Institute of Technology and Science, Indore, India.
Abstract
The importance of effective search for relevant information across different enterprise systems continues to grow, owing to rapidly maturing technologies and the data deluge faced by organizations. The advent of Web 2.0, commonly known as social computing, has brought in a fresh wave of technology enhancements to the information technology landscape. Big Data and analytics, sentiment analysis, natural language processing, and speech recognition are some technologies that are making a big impact. Given the promise of transforming the information and business landscape, organizations worldwide are increasingly adopting these technologies to improve process efficiencies and enhance business productivity. In this paper, we discuss the technology footprint of a next generation enterprise search ecosystem that can add significant business value in today's digital world, and act as a source of competitive advantage.
Contents
The Evolving Nature of Enterprise Search Systems
Limitations of the Existing Enterprise Search Systems
Fundamental Requirements of a Next Generation Enterprise Search System
Developing a Comprehensive Enterprise Search System: The Technology Components
Building a Collaborative Organization: The Benefits
The Future of Search
The Evolving Nature of Enterprise Search Systems
Today's globalized high-tech industry is characterized by distributed operations with R&D, manufacturing, and service centers situated in different locations. In addition, the industry focus is slowly shifting from a business to business (B2B) to a business to business to consumer (B2B2C) model. As industry dynamics continue to change, accessibility to the right information at the right time plays a critical role in maintaining a sense of cohesiveness across distributed operating units. To manage the new operating model, organizations have invested heavily in systems such as data warehouses, knowledge management systems, and collaboration platforms. Information, therefore, is no longer generated from a single source or location, but is spread across multiple enterprise systems. While customer relationship management (CRM) and enterprise content management systems work well in isolation, decision makers need access to a unified data repository to get the required information on time, and in an organized manner.
The scope of search is no longer restricted to providing links to multiple documents. The key is to provide summarized information by integrating multiple data repositories without losing the context of search, to help accelerate the decision making process. In the following sections, we discuss a possible technology mix that can be leveraged to develop such a system.
Limitations of the Existing Enterprise Search Systems
Different stakeholders across industry segments such as professional services, software product engineering, or device manufacturing face several challenges with respect to leveraging enterprise search systems.
Some inherent limitations of the existing enterprise information repositories and search mechanisms are:
Lack of context awareness: Existing search systems lack the ability to distinguish and provide contextual search results based on the user's role and task, location, time of search, or the device from which the search was initiated. This makes it difficult to search for relevant information quickly.
Absence of a unified search platform: Let's consider a scenario where you are looking for an office in a building with 100 floors. You would rather ask the receptionist than go looking for the office on each floor. The current enterprise IT landscape is similar to a huge building, with multiple systems for document management, knowledge management, file management, customer relationship management, enterprise resource and business planning, and auditing and reporting. However, the existing enterprise landscape lacks a 'receptionist', in other words, a unified platform that can help you search for relevant information without going through each 'floor'. This impedes the search process when time is of the essence in decision making.
Complex security policies guarding each repository: Since enterprise systems are typically designed in the context of individual processes, each system has its own set of security policies. While this design works well in securing the systems, it presents a major challenge in deriving integrated insights. Business users looking for information not only need to be aware of the system landscape but also need the right level of access to multiple systems.
Dependence on specialized staff: An enterprise data warehouse is an excellent organizational data repository, but it takes a data engineer to extract something meaningful from it. Typically, business users rely on predefined reports that are generated automatically at periodic intervals. However, if there is a need to add one more dimension to a predefined report, users need to reach out to the internal IT team. In addition, it typically takes months to navigate the complex relational structure of the data warehouse and deliver the change effectively.
Fundamental Requirements of a Next Generation Enterprise Search System
A reassessment and redesign of search systems is required to effectively address the aforementioned challenges, and create the next generation enterprise search platform. Here are a few fundamental functionalities that such a system should provide:
- Allow users to search through multiple repositories from a single easy to use search interface.
- Enable search not only through content management systems but also through other data sources, including enterprise data warehouses, repositories, file systems, collaboration platforms, and more.
- Provide the ability to automatically tag documents or knowledge artifacts leveraging text mining and natural language processing algorithms, and eliminate dependency on manual processing.
- Consider user context such as role, location, and business group, while searching knowledge repositories.
- Take into account the access control mechanism of each system transparently and render only the results that the user is authorized to view, in keeping with his or her role in the organizational hierarchy.
- Summarize results in the form of a concise snapshot.
While the expectations listed might appear simple, the technology ecosystem required to deliver them is far more complex given the current maturity of the IT landscape at most organizations.
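The requirements listed above can be illustrated with a minimal Python sketch. It is not a reference design: the `User`, `Document`, and `Repository` classes, the role-based `can_read` rule, the location-based ranking boost, and the sample data are all hypothetical and exist only to show a single search entry point that queries several repositories, lets each one apply its own access rule, uses the user's context for ranking, and returns a short summary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    role: str
    location: str


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_roles: frozenset
    location: str = ""


@dataclass
class Repository:
    """One enterprise system with its own documents and its own access rule."""
    name: str
    documents: list = field(default_factory=list)

    def can_read(self, user, doc):
        # Assumption: each repository enforces a simple role check of its own.
        return user.role in doc.allowed_roles

    def search(self, user, query):
        terms = query.lower().split()
        hits = []
        for doc in self.documents:
            haystack = (doc.title + " " + doc.text).lower()
            if all(t in haystack for t in terms) and self.can_read(user, doc):
                hits.append(doc)
        return hits


def unified_search(repositories, user, query):
    """Fan the query out to every repository and merge the authorized hits."""
    results = []
    for repo in repositories:
        for doc in repo.search(user, query):
            # Context boost: documents from the user's own location rank first.
            score = 1 if doc.location == user.location else 0
            results.append((score, repo.name, doc.title))
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return [(repo, title) for _, repo, title in results]


def summarize(results):
    """Concise snapshot: number of hits per repository."""
    summary = {}
    for repo, _ in results:
        summary[repo] = summary.get(repo, 0) + 1
    return summary


# Hypothetical sample data for illustration only.
crm = Repository("crm", [
    Document("Acme renewal notes", "pricing discussion for Acme renewal",
             frozenset({"sales"}), "Pune"),
])
dms = Repository("dms", [
    Document("Pricing policy", "standard pricing tiers",
             frozenset({"sales", "engineer"}), "Indore"),
    Document("Board pricing memo", "confidential pricing strategy",
             frozenset({"executive"}), "Indore"),
])
user = User("asha", "sales", "Indore")
results = unified_search([crm, dms], user, "pricing")
snapshot = summarize(results)
```

In this sketch the executive-only memo never reaches the sales user, because the repository that holds it filters it out before the results are merged. A production system would delegate that check to each system's real access control mechanism instead of a role set stored on the document.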
Developing a Comprehensive Enterprise Search System: The Technology Components
Developing the next generation enterprise search system is a technology challenge, as no single technology is capable of handling all the requirements in their entirety. The next generation enterprise search platform will therefore have to rely on a combination of multiple technology components to deliver on its promise. Figure 1 is a high-level overview of the different technology layers that need to be considered while building an effective enterprise search system.
Figure 1: Components of a Next Generation Enterprise Search System (Source: TCS Internal). Layers shown: Business Taxonomy; Data Connectors; Content Mining and Text Analytics; Contents, Tags, Classification; Security Abstraction; Elastic Search.
Let's take a closer look at these components:
Enterprise data connectors to different data sources: Most of the popular content management systems provide application programming interfaces (APIs) to read stored documents. The connectors should not only be able to access the stored documents but also fetch the metadata associated with them. Metadata, such as author details, security restriction level, and group permission, is required to maintain the authenticity of documents and comply with regulatory requirements. While document stores are integral sources of documented information, the data (master data as well as transactional data) used and generated by various enterprise applications (like inventory management applications or finance applications) resides in their respective relational databases. The platform may also need connectors to relational databases or other enterprise applications in order to enrich the insights drawn from the unstructured data present in document stores with the structured data present in relational databases.
Content mining programs: The search system needs to support multiple parsers that can read the content stored within documents. These documents can range from Word documents, PDF files, and presentation slides to spreadsheets, plain text files, and more.
Content analyzer: The mined content will need to be analyzed in the context of varied taxonomy information in order to create an appropriate document tagging mechanism. This is the most crucial technology component, as the success or failure of the search system primarily depends on how different documents are tagged and made available to the search engine.
Data integration platform: Traditionally, businesses have exclusively used structured data stored in relational database management systems. Any data that did not fit the relational structure was pushed to archives to be accessed on a need basis. This was also the case with the text data that was captured as part of comments or remarks in data entry forms. However, with the advent of social computing, growing chatter around machine generated data, progress in the field of natural language processing, and advanced text mining techniques, businesses are increasingly interested in enhancing their existing dashboards and metrics reports, which are based on relational data. Unstructured data (typically text) and semi-structured data (typically from machines and equipment such as sensors) are moving out of 'cold' data archives into live 'hot' data baskets. As businesses deal with multiple data sources, a data integration platform that can handle unstructured data as well as structured data from the data warehouse needs to be deployed.
Security framework: A next generation enterprise search system is expected to source information from multiple data repositories to provide relevant search results, and therefore, it is imperative to build a security framework. This framework should be integrated with the access control mechanism of each underlying data platform. An alternative approach would be to bring all data sources under one security framework. However, this is a difficult proposition considering the spread of data sources in today's organizations. A better approach would be to have a security framework that coordinates independently with each data source, leveraging its inherent security mechanism to control the flow of data.
Search platform: While the data integration platform mentioned above helps integrate searchable data elements, a platform to carry out search on indexes and metadata is also required. This search platform will not only search for data but also interact with different visualization tools to render the results in an understandable format, such as charts for numeric data.
The most crucial components of the system are the content mining programs and analyzers, since the efficiency of a search engine depends on its ability to extract insights from the available document repository. The scope of the system should range from mining simple text files and identifying keywords, to running complex natural language processing (NLP) routines that can enable segmentation of documents using business dictionaries. The content mining and analytics programs should also be able to add context to the search criteria and summarize search results. While NLP routines have advanced considerably over the past few years, they are yet to reach the maturity level of human processing. Added to this is the complexity resulting from providing multi-lingual support.
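As a rough illustration of the simple end of that range, the following Python sketch identifies keywords in plain text and tags documents against a business dictionary. It is only a sketch under stated assumptions: the `TAXONOMY` dictionary, the stopword list, the sample documents, and the function names are hypothetical, and plain dictionary matching stands in for the NLP routines a real content analyzer would use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical business taxonomy: tag -> dictionary terms that signal it.
TAXONOMY = {
    "finance": {"invoice", "budget", "revenue"},
    "supply_chain": {"inventory", "shipment", "supplier"},
}

# Small illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(text, n=3):
    """Most frequent non-stopword tokens; ties are broken alphabetically."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def tag_document(text, taxonomy=TAXONOMY):
    """Return every taxonomy tag whose dictionary terms appear in the text."""
    tokens = set(tokenize(text))
    return sorted(tag for tag, terms in taxonomy.items() if tokens & terms)


def build_tag_index(documents):
    """Map each tag to the set of document ids carrying it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for tag in tag_document(text):
            index[tag].add(doc_id)
    return index


# Hypothetical mined content, keyed by document id.
DOCS = {
    "q3_report.txt": "Revenue grew in Q3 and the budget for Q4 is approved.",
    "warehouse.txt": "Inventory levels are low; the supplier delayed the shipment.",
    "mixed.txt": "Supplier invoice pending. Invoice disputes delay payment.",
}
index = build_tag_index(DOCS)
```

A search platform could then look up `index["finance"]` to retrieve every document tagged with that business term, regardless of which repository it was mined from. Exact token matching of this kind misses plurals, synonyms, and other languages, which is where the NLP and multi-lingual concerns described above come in.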
Many technology components that meet one or more of these requirements are available from the open source community, and others are offered by licensed software providers. For instance, Apache Tika allows users to read different types of documents, while Apache Lucene provides a comprehensive and scalable search platform. However, a generic search platform may not offer a comprehensive solution to meet the organization's holistic requirements. The key is to identify individual technology components and integrate them under one fabric to create a search system that is unique to the organization and caters to its business goals.
Building a Collaborative Organization: The Benefits
A well designed next generation enterprise search platform provides a unified window for all enterprise search requirements. It also helps organizations derive business as well as operational benefits that include:
Reduced time and effort: A single interface to search through multiple data repositories and a summarized view of search results can considerably reduce the time and effort spent in searching and understanding the content. This in turn helps personnel spend their time generating insights from search results rather than merely identifying relevant information.
Ability to identify expert knowledge base: A comprehensive enterprise search platform can help identify knowledge experts by skimming through multiple data repositories. For instance, there might be communities in enterprise collaboration platforms that are actively engaged in the discussion of certain topics but have not formally published any research or findings. A next generation enterprise search platform can potentially identify such hidden communities and experts.
Increased collaboration: A natural outcome of easy search and accessibility to information is increased transparency, trust, and collaboration among knowledge teams.
The Future of Search
Organizations are growing rapidly at a global level, with centralized or decentralized systems supporting their operations in different geographic locations. The workforce is becoming increasingly global and the means to interact with customers is also changing rapidly. Knowledge management has always been, and always will be, a key element supporting the very fabric of an organization. Therefore, it needs to evolve and match the pace of growth and changes in the business landscape.
In the prevailing era of digital economics, the amount of time taken to generate, process, and consume information captured at different levels has taken on a whole new meaning. Speed and agility are imperative to quick and accurate decision making that can make or break organizations. As a result, developing a next generation enterprise search platform is an area that demands immediate attention. Organizations that implement innovative search solutions, leveraging enterprise data and knowledge repositories, stand to add value at every stage of the business cycle and gain a significant competitive advantage.
About TCS' High Tech Business Unit
Accelerated industry growth, rapid technological obsolescence, and the need for faster time to market compel High Tech organizations to improve business agility. High Tech solutions from TCS address fundamental industry problems, improve process efficiency, and enhance productivity and collaboration across businesses, while optimizing overheads. From software product engineering and supply chain, to leveraging Internet of Things, digital reimagination, cloud, Big Data, mobility and others, our solutions empower industry players (computer platform and services companies, software firms, electronics and semiconductor companies, and professional services firms) to compete effectively. Complementing our comprehensive service portfolio of IT solutions, business consulting, product engineering services, infrastructure services, and business process services, are our partnerships and alliances with leading industry vendors. Dedicated innovation labs, infrastructure support, and the Tata Research Development and Design Center (TRDDC) offer our clients access to cutting edge technologies, advanced systems engineering methodologies, storage optimization, and convergence solutions.
About Tata Consultancy Services (TCS)
Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development.
A part of the Tata Group, India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India.
All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.
Copyright 2015 Tata Consultancy Services Limited