Top 10 Technology Trends Impacting Information Infrastructure, 2013

Published: 19 February 2013

Analyst(s): Regina Casonato, Mark A. Beyer, Merv Adrian, Ted Friedman, Debra Logan, Frank Buytendijk, Massimo Pezzini, Roxane Edjlali, Andrew White, Douglas Laney

Information management technologies and disciplines continue to challenge traditional assumptions. CIOs, information leaders and architects need new skills to implement a modern information infrastructure and to seize the opportunities in the second half of the information age.

Key Findings

- Information is one of the Nexus of Forces changing business. In the Nexus, information is the context for delivering enhanced social and mobile experiences, and for connecting operational technology (OT) with IT.
- Many new information management (IM) technologies require new skills and dedicated data scientists. These skills are rare, so organic growth is required, and some critical roles will remain unfilled due to a scarcity of talent.
- In addition to the people, role and process aspects of enterprise information management (EIM), central to success is an enabling technology infrastructure that helps information producers and consumers organize, share and exchange any type of data and content, anytime, anywhere. Gartner calls this enabling technology infrastructure a modern information infrastructure.
- Enterprises that accelerate adoption of these top technology trends will cope better with vastly increased volumes of information, as well as with the increased velocity, variety and proliferation of use cases for information.

Recommendations

- Hire or train more individuals with IM and organizational skills who can work cross-company. Create training programs for business-based information managers and data stewards.
- Use Gartner's Information Capabilities Framework to assess your organization's maturity and gaps, and to establish a vision and road map for improving your information infrastructure (see Figure 1).
- Consider whether the main emerging trends in technology have been factored into your strategic planning process, but recognize that some of these top technologies could fail.

Table of Contents

Analysis
  Describe Information
    Big Data
  Integrate Information
    Modern Information Infrastructure
  Share Information
    Semantic Technologies
    The Logical Data Warehouse
  Integrate/Store Information
    NoSQL DBMSs
    In-Memory Computing
  Govern Information
    Chief Data Officer and Other Information-Centric Roles
    Information Stewardship Applications
    Information Valuation/Infonomics
Recommended Reading

List of Tables
  Table 1. Top Nine Technology Trends Likely to Impact Information Management in 2013

List of Figures
  Figure 1. Gartner's Information Capabilities Framework
  Figure 2. Taxonomy of In-Memory Technologies

Analysis

This document was revised on 22 February. The document you are viewing is the corrected version. For more information, see the Corrections page on gartner.com.
Information is one of the four powerful forces changing the way business is done. In the Nexus of Forces, information is the context for delivering enhanced social and mobile experiences, and for connecting the OT and IT worlds. Mobile devices are a platform for effective social networking and new ways of working; social links people to their work and each other in new and unexpected ways; and cloud enables the delivery of information and functionality to users and systems. The forces of the Nexus intertwine to create a user-driven ecosystem of modern computing.

Significant innovation continues in the field of IM technologies and practices. The key factors driving this innovation are the explosion in the volume, velocity and variety of information, and the huge amount of value and potential liability locked inside all this ungoverned and underused information. However, the growth in information volume, velocity, variety and complexity, together with new information use cases, makes IM far more difficult than it has been. In addition to the new internal and external sources of information, practically all information assets must be available for delivery through varied, multiple, concurrent and, in a growing number of instances, real-time channels and mobile devices. All this demands the ability to share and reuse information across multiple delivery contexts and use cases. More importantly, it demands new skills and roles.

For 2013, we have identified nine technology trends that will play different roles in modernizing IM (see Table 1). Some help structure information; some help organize and mine information; some integrate and share information across applications and repositories; some store information; and some govern information. These 2013 top technology trends make the role of information governance increasingly important. This document updates Gartner's analysis of the top technology trends in IM.
These technologies are strategic, because they directly address some of the broad challenges faced by enterprises (see "Information Management in the 21st Century").

Table 1. Top Nine Technology Trends Likely to Impact Information Management in 2013

  Purpose                       Technology Trend
  Describe information          Big data
  Integrate information         Modern information infrastructure
  Share information             Semantic technologies
                                The logical data warehouse
  Integrate/store information   NoSQL DBMSs
                                In-memory computing
  Govern information            Chief data officer and other information-centric roles
                                Information stewardship applications
                                Information valuation/infonomics

Source: Gartner (February 2013)
This document should be read by CIOs, information architects, information managers, enterprise architects and anyone working in a team evaluating or implementing:

- An enterprise content management system
- A business intelligence (BI)/analytic application
- A master data management (MDM) technology
- A document-centric collaboration initiative
- An e-discovery effort
- An information access project
- A data warehouse (DW) system

This document discusses the main factors that Gartner expects to have the biggest impact in the next 12 to 18 months, and gives recommendations on responding to these factors and improving IT strategies.

Describe Information

Big Data

Analysis by Merv Adrian and Mark Beyer

Gartner estimates that big data will generate $232 billion in revenue cumulatively from 2011 to 2016 (see "Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016"). Gartner defines big data as "high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."

Big data warrants innovative processing solutions for a variety of new and existing data, to provide real business benefits. But processing large volumes or wide varieties of data remains merely a technological solution unless it is tied to business goals and objectives. The only reason for pursuing big data is to deliver against a business objective. For example, knowing how a large population responds to an event is useless unless a business can benefit from influencing that event or its outcomes. New forms of processing are not necessarily required, nor are new forms of processing always the least expensive solution (least expensive and cost-effective are two different things). The technical ability to process more varieties of data in larger volumes is not the payoff. The most important aspects of big data are the benefits that can be realized by an organization.
Increasingly diverse datasets complement each other and permit businesses to fill in missing gaps in the body of information. Filling these gaps improves operations and decisions, and enhances business delivery. There are new types of information asset that warrant processing innovations. These assets are not necessarily bigger, but the value of combining them with each other and with existing assets requires bigger processing and management capabilities. As more assets are combined, the tempo of record creation, and the qualification of the data within use cases, become
more complex. Marketers promoting software, services and hardware frequently attempt to isolate one or two aspects of big data to create demand for their offerings. Big data information assets themselves are only one part of the issue. The other is the demand for innovative, cost-effective forms of processing from capture, through storage, into access, and all the way to analytics and, ultimately, data management throughout the life cycle. Information has different quality, security and access requirements at different points in its life cycle. These differences create much of the complexity of big data.

It's no longer news that vendors are almost universally claiming that they have a big data strategy or solution. However, Gartner clients have made it clear that big data must include large volumes, processed in streams and batches (not MapReduce). They need both, as well as an extensible services framework that can deploy processing to the data, or bring data to the process, and which spans more than one variety of asset type (that is, not just tabular, stream or text).

Gartner research indicates that in 2012, the cost ratio of delivering big data services compared to using software to manage and analyze big data was almost 15-to-1. Therefore, the focus of big data efforts is on solution delivery, not software purchase. Also, organizations addressing big data should insist on big data vendors providing the required functionality. Initially, vendors will discount such demands because they are rare, but they will become more commonplace and force product road map changes (see "Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016"). Products with established functionality, and possibly those already in beta testing, should be accepted as only partial solutions.
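The services-to-software cost ratio discussed above is simple arithmetic, sketched below against the 15-to-1 and 10-to-1 figures from this analysis. The cost inputs are hypothetical reference-customer numbers, invented purely for illustration:

```python
def services_to_software_ratio(staffing, consulting, software, hardware):
    """Services spend (internal staffing plus consulting and implementation
    services) divided by software and hardware spend."""
    return (staffing + consulting) / (software + hardware)

# Hypothetical reference-customer figures (illustrative only).
ratio = services_to_software_ratio(staffing=9_000_000, consulting=3_000_000,
                                   software=800_000, hardware=200_000)
print(round(ratio, 1))          # 12.0
print(ratio < 15, ratio <= 10)  # beats the 2012 market average of ~15-to-1,
                                # but misses the 10-to-1 target
```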
Organizations also need to focus on the specific features that lower the services to software/hardware cost ratio to less than 15-to-1, aiming for, at most, 10-to-1. For example, Gartner anticipates that by 2015 this ratio will approach 8-to-1, based on anticipated advances in software tools. Organizations should ask reference customers what their actual spend on internal staffing is, and what the costs for the necessary increase in staff, plus consulting and implementation services charges from vendors, will be, compared to any software and hardware costs.

The large number of inquiries Gartner has received about this topic is evidence of growing hype in the market. Aspects and types of big data have been around for more than a decade; it is only the recent market hype about legitimate new techniques and solutions that has created this increased demand. Enterprises should be aware of the many big data use cases. Addressing all the extreme aspects of 21st-century IM permits better analysis of all the available data, and the detection of even the smallest details in the body of information. This is a precursor to the effective use of Pattern-Based Strategy and the new type of applications this enables. In the case of complex event processing, queries are complex because of the many different feeds, and volume can be high or low. Also, velocity varies from high to low. Analytics can leverage technology, such as MapReduce, to access data in external Hadoop Distributed File System (HDFS) files, BI in-database, or as service calls managed by the DBMS.

Recommendations:

- Find out where big data work is already going on in your organization and leverage it. Some industries, such as finance, telecommunications and government, have already implemented multiple big data projects.
- Identify whether there are business use cases for existing business processes, or ones that enable new business processes. Determine new information asset types that support the business cases, or "dark data" that can be accessed and analyzed.
- Start to determine your architectural standards for addressing big data. Organizations that have not addressed the more traditional requirements of storage, processing and information architecture need to carefully compare the use of big data solutions with more traditional ones.
- As you consider big-data-vendor-proffered solutions, request and review reference customers that have used them. Determine their professional staffing costs and compare them to the cost of new software/hardware. Secure only those solutions that achieve better than 15-to-1 ratios, and specifically seek those with better than 10-to-1 customer-reported ratios.
- Apply the tests to volume, variety and velocity. If volume is increasing as described, or variety is expanding, or velocity is creating temporal challenges in data, then you have a big data opportunity. Be aware that any combination of two "Vs" forces increased complexity in the analysis and the temporal relationships, and creates data availability issues.

Integrate Information

Modern Information Infrastructure

Analysis by Ted Friedman and Mark Beyer

IM is a discipline that requires action in many different areas, most of which are not technology-specific (see "Gartner's Enterprise Information Management Framework Evolves to Meet Today's Business Demands"). In addition to these softer aspects of EIM, central to success is an enabling technology infrastructure that helps information producers and information consumers organize, share and exchange any type of data and content, anytime, anywhere. This enabling technology infrastructure is what Gartner calls a modern information infrastructure.
The information infrastructure consists of the various IM-related technologies deployed in the enterprise, and is meant to support the provisioning of all kinds of data from the producers of that data to the various consuming applications and processes. For most organizations, information infrastructure is not planned and managed in a proactive or cohesive way. As such, it often has overlaps and creates redundancies, gaps and discrete silos that support only limited use of data. For organizations to be successful in the 21st century, a different approach to information infrastructure is required, and IM presents an opportunity for introducing a new level of architectural discipline. However, it is important that business process management (BPM), application development and enterprise architecture disciplines can leverage this approach, which will have different categories of capabilities and may need to be developed separately.

Because it must support a wide range of information use cases and information types, it is essential that information infrastructure be viewed as strategic, so that a vision to develop it in a cohesive and aligned way over time is possible. Gartner's Information Capabilities Framework represents such a vision. This framework (see Figure 1) describes the structures that lie between the many different sources of information (such as transactional data and social information) and the use cases for that information (such as analytics and transactions). It includes common capabilities for
using information expressed as a set of verbs (describe, organize, integrate, share, govern and implement) and specialized capabilities, as well as the standards, semantics and metadata schemes that give information these capabilities (see "The Information Capabilities Framework: An Aligned Vision for Information Infrastructure"). Also, the Information Capabilities Framework is a model describing the range of technology capabilities that are necessary to deliver a solid information infrastructure, and the manner in which those capabilities are integrated and exposed to various information use cases and applications. Organizations that establish a road map for this type of cohesive, application-independent and information-source-independent set of IM technology capabilities are best placed to achieve long-term EIM goals.

Figure 1. Gartner's Information Capabilities Framework

[Figure not reproduced. It shows common capabilities grouped under six verbs (describe, organize, integrate, share, govern, implement); specialized capabilities and information semantic styles (dedicated, registry, external, auditing, consolidation, orchestration); metadata management; and individual capabilities including provision, search, publish, standardize, enrich, discover, model, profile, identify, value, abstract, reconcile (semantics), categorize, relate, aggregate, resolve (entities), propagate, optimize, quality assure, protect, manage life cycle, build, test, operate, format/structure, persist, synchronize, ingest, access, reformat, deliver, transform, secure, retain, monitor, administer and back up/restore.]

Source: Gartner (February 2013)
However, most organizations are a long way from these ideals in terms of the state of their information infrastructure. Looking at activities in numerous IT functions across various industries, we have identified common opportunities where organizations can strengthen their capabilities, or begin to align and deploy the relevant technology in a more focused, consistent and shared way. In "Ten Ways to Strengthen Your Information Infrastructure," we describe 10 ways in which existing information infrastructures can be improved in line with the Information Capabilities Framework vision, and offer recommendations for how organizations can begin to act in these areas to develop a modern information infrastructure.

Recommendations:

- Use Gartner's Information Capabilities Framework to assess your organization's maturity and gaps, and establish a vision and road map for increasing the effectiveness of your information infrastructure. Look at the current and anticipated range of information use cases to identify which capabilities will be most important and the highest priority.
- Focus on capabilities supporting the development of models that provide transparent meaning, authority and workflow for all critical information assets in the enterprise.
- Position information infrastructure as a strategic component of your EIM initiatives, to secure the requisite support and resources.

Share Information

Semantic Technologies

Analysis by Debra Logan and Frank Buytendijk

Semantic technologies extract meaning from data, ranging from quantitative data and text to video, voice and images. They also include systems that aid the organization and classification of electronic information, adding meaning to existing data. A third category of semantic technologies is machine-learning paradigms. Semantic technologies also include automated decision making, such as expert diagnostic systems and decision rules for data classification.
Here, "meaning" describes the extent to which rich logical constructs that are meaningful to people can also be interpreted by computers. As a consequence, the extent to which a technology has semantic capabilities lies on a scale. Programming languages are poor semantic technologies, because they carry little human meaning unless a person is trained in them. Data model diagrams are also semantically poor: the logic in them doesn't have a rich set of expressions. Ontology-driven software, describing relationships between data elements, scores higher on the semantic scale. Computers can infer and use new insights through the discovery of relationships between data elements, in a way that is also meaningful and understandable to users. Similarly, computers can analyze and group documents based on examples given by humans. This is a form of machine learning that is increasingly used when large sets of documents must somehow be reviewed, for example, in legal proceedings.
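A minimal sketch of that example-based grouping, using only bag-of-words cosine similarity to human-labeled exemplars. This is a deliberate simplification (real predictive-coding products use far richer models), and all names and sample texts below are invented for illustration:

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term counts for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def classify(document, examples):
    """Assign a document the label of its most similar labeled example."""
    doc_vec = vectorize(document)
    best_label, best_score = None, -1.0
    for label, example in examples:
        score = cosine(doc_vec, vectorize(example))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Human-provided exemplars, as in a legal document review.
examples = [
    ("contract", "agreement between the parties governing payment terms"),
    ("invoice", "invoice amount due payment within thirty days"),
]

print(classify("the parties signed an agreement on payment", examples))  # contract
```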
Semantic analysis ranges from the simple (grouping things into categories) to the complex (building highly specific models to represent the "real world" and the relationships between groups of things in it, for example, diagnostic expert systems). Semantic technologies include:

- Ontologies
- Taxonomies
- Entity resolution
- Classification
- Categorization and tagging
- Content analytics and clustering
- Content monitoring and filtering
- Predictive document coding
- Expert systems
- Neural networks
- Various kinds of decision support systems

Semantic processing often works through a metadata layer, a level of abstraction typically taking the form of a metadata model. There are two main ways to build these semantic metadata models. The first, traditional way is through a design process: after design and implementation, the user just sees the model, and the model is imposed on the data. (BI tools function in a similar, often labor-intensive, way.) The second is through automated discovery techniques: through content analysis (of data, text, audio and video), models describing apparent relationships are created, modified and maintained.

Many of these techniques have existed for years and are based on advanced statistics, data mining, machine learning and knowledge management. One reason they are garnering more interest is the renewed business requirement for monetizing information as a strategic asset. Even more pressing is the technical need: increasing volume, variety and velocity (big data) in IM and business operations require semantic technology that makes sense of data for humans, or automates decisions. Examples include:

- MarkLogic, which claims its semantic services can be used to provide a contextual user experience based on information for any type of user experience (for example, desktop, tablet or phone).
- AlchemyAPI, which offers a text-mining platform featuring semantic analysis capabilities in the natural-language processing (NLP) field.
- Oracle-Collective Intellect, which uses latent semantic analysis and NLP to understand conversations and assign sentiments to them.
- SAS, whose products include systematic linkages across text repositories using semantic relationships.

Semantic technologies are found in many types of enterprise software:

- Analytics
- IM
- Databases
- Business applications
- BPM
- Governance, risk and compliance
- Security

They are increasingly being used for data classification and tagging to clean up legacy information, because large enterprises are increasingly striving to gain control of their data storage costs, meet legal and regulatory requirements, or simply find what is useful and relevant among the vast amount of redundant, outdated and trivial data. Using semantic technologies for building taxonomies and ontologies is already becoming more accepted; we estimate that a still small (but rapidly growing) number of large organizations are doing this. Automated discovery techniques are being used significantly less, but leading organizations have already started experimenting with them. Vendors, both the leading ones in enterprise software and many small best-of-breed parties, will start promoting semantic technology.

Recommendations:

Consider using semantic technologies when one or more of the following criteria apply:

- You have massive volumes of data or a large number of documents that need categorization.
- There are expensive and recurring data quality issues.
- Content security issues present a risk to your company.
- Common semantic models are widely adopted in your industry. Or, conversely, there is competitive advantage in exploiting semantic technologies as part of your big data initiatives.

The Logical Data Warehouse

Analysis by Mark Beyer

DW architecture is undergoing an important evolution, compared with the relative stasis of the previous 25 years. Although the term "data warehouse" was coined around 1989, the architectural style predated the term (for example, at American Airlines, Frito-Lay and Coca-Cola).
At its core, a DW is a negotiated, consistent and logical model populated by using predefined transformation processes. Over the years, the various options (centralized enterprise DWs, federated marts, hub-and-spoke arrays of central warehouses with dependent marts, and virtual warehouses) have all emphasized certain aspects of the service expectations required of a DW. The common thread running through all the styles is that they were repository-oriented. This, however, is changing.

The DW is evolving from competing repository concepts into fully enabled data management and information processing platforms. These new warehouses force a complete rethink of how data is manipulated, and of where in the architecture each type of processing that supports transformation and integration occurs. They also introduce a governance model that is only loosely coupled with data models and file structures, as opposed to the very tight, physical orientation used before. This new type of warehouse, the logical DW, is an IM and access engine that takes an architectural approach and de-emphasizes repositories in favor of new guidelines:

- The logical DW follows a semantic directive to orchestrate the consolidation and sharing of information assets, as opposed to one that focuses exclusively on storing integrated datasets.
- The logical DW is highly dependent on the introduction of information semantic services. The semantics are described by governance rules from data creation and use case business processes in a data management layer, instead of going through a negotiated, static transformation process located within individual tools or platforms.
- Integration leverages steady-state data assets in repositories and services in a flexible, audited model, using the best available optimization and comprehension solutions.
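The repository-plus-virtualization idea behind these guidelines can be sketched with SQLite's ATTACH, which lets a single query span a persistent "warehouse" store and a second "operational" source at query time. This is only a toy stand-in for federation (a real logical DW reaches remote systems through a federation layer), and the table and column names are invented for illustration:

```python
import sqlite3

# "Warehouse repository": persistent, integrated history.
repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE sales_history (region TEXT, amount REAL)")
repo.executemany("INSERT INTO sales_history VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0)])

# "Operational source": current data reached at query time, not copied in first.
repo.execute("ATTACH DATABASE ':memory:' AS ops")
repo.execute("CREATE TABLE ops.sales_today (region TEXT, amount REAL)")
repo.execute("INSERT INTO ops.sales_today VALUES ('east', 40.0)")

# One logical query over both sources, without physical consolidation.
rows = repo.execute("""
    SELECT region, SUM(amount) FROM (
        SELECT region, amount FROM sales_history
        UNION ALL
        SELECT region, amount FROM ops.sales_today
    ) GROUP BY region ORDER BY region
""").fetchall()
print(rows)  # [('east', 140.0), ('west', 250.0)]
```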
To develop a logical DW, enterprises should start by introducing the logical DW concepts used for dynamic consolidation, integration and implementation (see "Does the 21st-Century 'Big Data' Warehouse Mean the End of the Enterprise Data Warehouse?"). The biggest change in 2012 regarding logical DW practices and adoption was a better understanding of the separation of service-level expectations in the warehouse, which created clear SLAs for technology. Repositories are now used for pervasive, persistent and latency-tolerant analytics. Federation/virtualization techniques are used for fluctuating data and query models, and/or demands for current data direct from operational sources. Also, distributed processing is becoming the preferred architecture for comprehensive analytics (see "Understanding the Logical Data Warehouse: The Emerging Practice").

Client implementation practices indicate that the old "80/20" rule of analytics data management (in which 80% of end-user needs can be met with 20% of the available enterprise data, delivered under a single logical data model) is giving way to a new rule, which Gartner calls the "80/10/5" rule. For 80% of user needs, the repository follows the long-standing 80/20 rule already highlighted. However, 10% of user needs generally appear in the federation/virtualization model, and 5% of end-user analytic data management needs are met through distributed processes. The remaining, or undeclared, 5% of analytic needs have not yet been
specified for a preferred technology deployment; they are usually managed and used by data scientists conducting investigations.

Federation tools and DBMSs have begun to compete for the coordination role in determining the analytic applications' entry point to logical DWs. The DBMS provides federation through in-built capabilities, such as accessing external tables and foreign data types. Its ability to use long-standing and highly successful performance optimization techniques, such as parallelization, allows it to serve as a stable basis for incorporating new processing models. At the same time, the ability to store functionality as user-defined functions and stored procedures, and to add new processing instructions via SQL extension frameworks, allows DBMSs to continue extending their data management domains. Federation tools are challenging DBMSs for the information access and coordination role by offering enhanced connectivity, multitiered caching, incremental caching and logical modeling capabilities that can be extended to unstructured data calls.

Logical DW adoption is well under way. In our recent Magic Quadrant customer survey for the DW DBMS market, leading DW implementers (representing 15% to 20% of DW practitioners in the market) reported that 49% of their warehouses have introduced Hadoop or NoSQL as preprocessor platforms for data integration (a logical DW characteristic), or will introduce this type of solution in the next 12 months. Similarly, 19% of leading implementers report that they are combining their existing relational warehouse with Hadoop or NoSQL solutions (a logical DW characteristic) to create blended warehouses. At the same time, only 11% of these implementers report that they will completely replace existing warehouses with Hadoop clusters or NoSQL solutions (a competing architecture for analytics data management).
This demonstrates that the logical DW is becoming the preferred best practice in the marketplace, more so than full replacement, by a wide margin (see the forthcoming document "The Future of Data Management for Analytics Is the Logical Data Warehouse"). Note that with the logical DW approach, the differing styles of support (such as federation, data repositories, messaging and reduction) are not mutually exclusive. They are just manifestations of data delivery. The focus is on getting the data first, then figuring out the delivery approach that best achieves the required SLA with the querying application. The transformations occur in separate services (see "The Logical Data Warehouse Will Be a Key Scenario for Using Data Federation").

Recommendations:

- Start your evolution toward a logical DW by identifying data assets that are not easily addressed by traditional data integration approaches and/or not easily supported by a "single version of the truth."
- Consider all technology options for data access and do not focus solely on consolidated repositories. This is especially relevant for big data issues.
- Identify pilot projects in which to use logical DW concepts, by focusing on highly volatile and highly interdependent business processes.
- Use a logical DW to create a single, logically consistent information resource that is independent of any semantic layer specific to a particular analytic platform. The logical DW should manage reused semantics and reused data.
Integrate/Store Information

NoSQL DBMSs

Analysis by Merv Adrian

NoSQL DBMSs (key-value stores, document-style stores, and table-style and graph databases) are designed to support new transaction, interaction and observation use cases involving Web-scale, mobile, cloud and clustered environments. Most NoSQL offerings, not intended for typical transactional applications, do not provide the atomicity, consistency, isolation or durability (ACID) properties. Interest in NoSQL within the programmer community has expanded: customer counts, use cases and download volumes are increasing, and packed conferences are springing up in multiple geographies. Adoption is increasing as commercial providers add functionality, support, training and community building. Job listings and inquiries to Gartner also reflect this rise in interest. NoSQL DBMS usage continues to be driven by programmers, not the typical database team.

The limitations of batch-style MapReduce-on-HDFS are driving increased interest in NoSQL data stores from Hadoop distributors. For example, specialists like Cloudera and megavendors like IBM and EMC are reporting increased use of HBase. Another trend is the continued growth of distributions to include other developing open-source projects, such as DataStax's addition of Apache Solr to its Cassandra-based Hadoop distribution. Big vendors are responding. Oracle's new Oracle NoSQL Database 11g, derived from BerkeleyDB, is a core element of its Big Data Appliance; Amazon's DynamoDB became available in January 2012, representing a significant trade-up from its SimpleDB. Also, many NoSQL offerings are hosted in the cloud by Amazon and others. Microsoft and IBM remain on the sidelines, although IBM has made a tentative probe into the graph database market with triple and SPARQL support in DB2 v.10.
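The schema flexibility that attracts programmers, and the absence of ACID guarantees noted above, can be sketched with a toy document-style store. This is purely illustrative (no real product works this simply), and every name in it is invented:

```python
import json

class ToyDocumentStore:
    """A minimal document-style store: schemaless JSON documents by key.
    Deliberately provides no ACID guarantees: writes are applied one at a
    time with no transactions, isolation or durability."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # Documents need not share a schema: any JSON-serializable dict works.
        self._docs[key] = json.loads(json.dumps(document))

    def get(self, key):
        return self._docs.get(key)

store = ToyDocumentStore()
store.put("user:1", {"name": "Ada", "tags": ["admin"]})
store.put("user:2", {"name": "Lin", "signup": "2013-02-19"})  # different fields: fine

print(store.get("user:1")["name"])  # Ada
```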
Increasing adoption and growing customer demands have opened up a significant gap between commercially supported NoSQL DBMSs and open-source projects that have only community support. The latter remain immature and are used by Web developers for applications that are not mainstream. Commercial products are using their added funding not only to build sales, support and marketing, but also to add enterprise-class features intended to widen adoption and win new business. For example, 10gen claims its MongoDB Management Service has over 5,000 users, and Couchbase's CouchSync targets integration between cloud and mobile devices. The growth of ecosystem partners and supporting products will have an impact on broadening adoption. Dell-Quest Software reports that its Toad for Cloud Databases (with support for HBase and MongoDB, as well as Cassandra and Amazon SimpleDB) is gaining traction in shops that use its tools for mainstream DBMSs. Informatica 9.5, which came onto the market in May 2012, added support for Hadoop source data integration; added support for leading NoSQL targets will not be far behind. Hosting players (not just Amazon and Rackspace, but specialists like MongoLab) offer a lower cost of entry and a place to experiment.
Continuing market penetration, and follow-on project growth from existing customers, is also expected in 2013. Nonetheless, despite offerings and ecosystems continuing to mature, there is a long way to go. Awareness is still limited, and the leading players remain off the direct sales playing field, slowing their penetration of corporate IT strategic plans. As a result, business impact in 2012 was moderate but, in 2013, is increasing as more organizations investigate and experiment.

Decisions about how to persist data for many new-wave applications are being made by a new generation of programmers. These programmers are willing to use special-purpose NoSQL data stores that make their work less complex and provide greater flexibility. Notably, programmers should not drive enterprise procurement, but their influence is being felt. The emergence of in-memory DBMSs (IMDBMSs) has led to the potential for capturing many of the same use cases. However, language choices and commercial considerations may make NoSQL an option, despite IMDBMS success. Organizational unwillingness to build on open source has been a main block to its success, but the rise of commercializers is beginning to shift perceptions. The availability of data integration tools that offer an understanding of data structure in relational DBMSs to support data movement into other systems (such as DWs and other applications) will be a key enabler. NoSQL vendors are pursuing integration with more data management processes, and the recent emergence of Apache HCatalog (a metadata standard for the Hadoop stack) will accelerate development in this area.

Recommendations:

- For multi-user, complex applications requiring high-performance transactional capability, NoSQL DBMSs are not an option, due to their lack of atomicity, consistency, isolation and durability properties, and should not be considered.
- For Web-scale applications requiring large data stores with high performance (especially when transactions are read-only or not complex, and can be supported by a model without full atomicity, consistency, isolation and durability guarantees), some of the more mature NoSQL DBMSs can be used. For transactions that do not require atomicity, consistency, isolation and durability properties, and have complex, mixed data types, these can be very effective.
- Commercial NoSQL DBMSs can be used for use cases that are well served by graph, document or key-value architectures.
- NoSQL DBMSs are well suited for applications expected to have frequent updates, where scale and new requirements will demand rapid iteration.

In-Memory Computing

Analysis by Massimo Pezzini and Roxane Edjlali

In-memory computing is an emerging paradigm that enables user organizations to develop applications that run advanced queries on very large datasets, or perform complex transactions, at least one order of magnitude faster (and in a more scalable way) than when using conventional architectures. This is achieved by storing application data in memory (that is, in the computer's main memory), rather than on electromagnetic disks, without compromising data availability, consistency or integrity.
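As a toy illustration of the core idea (the entire database held in main memory, with no disk I/O on the query path), the Python standard library's SQLite module can be opened in its in-memory mode. This is not one of the specialized IMDBMS products the section discusses; it is only a sketch of the principle:

```python
import sqlite3
import time

# Illustrative only: an entire SQLite database living in RAM. Real IMDBMS
# products differ greatly in architecture; this only demonstrates the idea
# of keeping the whole database structure in main memory.
mem = sqlite3.connect(":memory:")   # no file, no disk blocks
mem.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
mem.executemany("INSERT INTO t VALUES (?, ?)",
                ((i, f"row-{i}") for i in range(100_000)))

t0 = time.perf_counter()
(total,) = mem.execute("SELECT COUNT(*) FROM t WHERE k % 7 = 0").fetchone()
elapsed = time.perf_counter() - t0
print(total, f"{elapsed * 1000:.1f} ms")   # full scan, entirely in memory
```

An IMDBMS as defined below goes further: memory-optimized data structures rather than cached disk pages, plus flash or disk used only for persistence and logging.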
In-memory computing opens unprecedented, and partially unexplored, opportunities for business innovation (for example, via real-time analysis of big data in motion) and cost reduction (for example, through database or mainframe off-loading). However, until recently, only the most deep-pocketed and technologically savvy organizations (in verticals like financial trading, telecommunications, the military and defense, online entertainment, and logistics) could afford the high costs and deal with the complexities of adopting an in-memory computing approach.

Currently, the force driving mainstream users to look increasingly at in-memory computing as an affordable paradigm is the dramatic and never-ending decline in DRAM and NAND flash memory prices. Added to this is the availability of commodity multicore 64-bit microprocessors, which can directly address extremely large main memories (theoretically, up to one billion gigabytes) and concurrently parallel-process large datasets in a computer's main memory. However, equally important for the widespread adoption of in-memory computing is the availability of application infrastructure software that makes it possible for application developers to take advantage of the hardware potential in a reasonably user-friendly way. Gartner has identified a variety of application infrastructure technologies that make available APIs, programming models, infrastructure services and tools to address this requirement. In some cases, these technologies map well-known paradigms (for example, DBMSs) onto an in-memory set of data structures. In other cases (for example, complex event processing platforms), they implement a totally new programming paradigm, enabled by the in-memory computing capability of low-latency access to vast arrays of in-memory data. In some cases, these software technologies are still in the early stages of their life cycle, and are therefore immature and not well understood.
But in other cases, they are already widely used (some products have thousands, or even tens of thousands, of users) and are rapidly achieving adoption by mainstream, risk-averse organizations. Gartner has identified a number of different in-memory technologies, each used for different purposes and contributing in different ways to the overall infrastructure landscape (see Figure 2 and "Taxonomy, Definitions and Vendor Landscape for In-Memory Computing Technologies"). Database administrators and IT architects will need to explore which combination of in-memory technologies will be required to meet their business needs.
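One of these technologies, the in-memory data grid, rests on two mechanisms: hash-partitioning data across nodes, and replicating each entry so that losing a node does not lose data. A minimal sketch of those two mechanisms in plain Python (no real data grid product is modeled; the node count, replica count and key names are invented):

```python
# Toy sketch of in-memory data grid mechanics: hash partitioning plus
# replication to a backup node for availability. Illustrative only.
class DataGrid:
    def __init__(self, node_count=3, replicas=1):
        self.nodes = [{} for _ in range(node_count)]
        self.replicas = replicas

    def _owners(self, key):
        # Primary owner by hash, plus `replicas` backups on following nodes.
        first = hash(key) % len(self.nodes)
        return [(first + i) % len(self.nodes)
                for i in range(self.replicas + 1)]

    def put(self, key, value):
        for n in self._owners(key):
            self.nodes[n][key] = value

    def get(self, key):
        for n in self._owners(key):
            if key in self.nodes[n]:      # fall through on node "loss"
                return self.nodes[n][key]
        raise KeyError(key)

    def fail_node(self, n):
        self.nodes[n].clear()             # simulate losing one node

grid = DataGrid()
grid.put("cust:42", {"name": "Acme", "tier": "gold"})
grid.fail_node(grid._owners("cust:42")[0])   # primary copy is gone ...
print(grid.get("cust:42")["name"])           # ... the replica still answers: Acme
```

Production grids add what this omits: synchronous or asynchronous replication protocols, rebalancing when nodes join or leave, and the consistency guarantees described later in this section.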
Figure 2. Taxonomy of In-Memory Technologies

[Figure: a layered stack. Top: in-memory-enabled applications. Middle: in-memory application platforms (in-memory analytics, event processing platforms, in-memory application servers) alongside in-memory data management (IMDBMS, in-memory data grid, in-memory (low-latency) messaging). Bottom: memory-"intensive" computing platform (DRAM, flash, SSD, multicore, InfiniBand, clusters, grid, cloud).]

IMDBMS = in-memory DBMS; SSD = solid-state drive
Source: Gartner (February 2013)

Business drivers for in-memory computing adoption include:

- The incessant quest for faster performance and greater scalability, required for 24/7 support, and global scale, in Web- and mobile-enabled businesses.
- The desire to analyze increasing volumes of data in real time, at a very fine-grained level of detail.
- The necessity of obtaining real-time business insights that support operational decision making.
- The appetite for delivering unconstrained data visualization and exploration to a wide audience of business users.

Factors inhibiting in-memory computing usage include:

- High software license costs and uncertain ROIs
- Scarcity of in-memory computing development and IT operation skills
- A lack of standards
- Limited availability of best practices
- The fragmented technology landscape
- An overcrowded market
- Skepticism about new paradigms

Mostly, these are not structural factors, but are rather related to the still-turbulent state of the in-memory computing industry, and vendors' desire to maximize revenue. Therefore, the drivers will eventually prevail over the inhibitors, which will lead in-memory computing to widespread adoption by user organizations of any size, in any industry and geography. Further favoring this trend is a growing use of in-memory computing technologies (for example, IMDBMSs or in-memory data grids) "inside" a variety of hardware and software products, to improve their performance, scalability and availability. Such an "in-memory computing inside" approach is typically implemented in a way that minimally impacts the applications leveraging these products. Therefore, in many cases, users can experience the improved quality of service (QoS) benefits derived from in-memory computing, with minimal impact on their established applications and skills.

The in-memory computing application infrastructure technologies that are attracting the most user interest, or that are most widely deployed in real-life production, are:

- IMDBMSs. An IMDBMS is a DBMS that stores the entire database structure in memory, and accesses it without the use of input/output instructions. (It holds all the necessary structure in memory, and is not simply an in-memory disk-block cache.) IMDBMSs can also use flash memory or disk-based storage for persistence and logging. Products in the market are specialized for analytical or online transaction processing applications. However, some vendors are trying to create hybrid products that are equally capable of covering both scenarios.
- In-memory data grids. These provide a distributed, reliable, scalable and consistent in-memory data store (the data grid) that is shareable across distributed applications.
These concurrently perform transactional and analytical operations in the low-latency data grid, which minimizes access to high-latency, disk-based data storage. In-memory data grids maintain data-grid consistency, availability and durability by using replication and partitioning. Although they cannot be classified as full application platforms, in-memory data grids can also host application code.

- High-performance message infrastructures. These provide program-to-program communication with a high QoS, including assured delivery and security. They also leverage innovative design paradigms to support higher throughput (in messages per second), lower latency (in milliseconds of end-to-end delivery time) and more message producers (senders) and consumers (receivers) than traditional message-oriented middleware products.
- In-memory analytics platforms. These provide an alternative BI performance layer in which detailed data is loaded into memory for the fast query and calculation of large volumes of data. This approach removes the need to manually build relational aggregates and generate precalculated cubes to ensure analytics run fast enough for users' needs. In-memory analytics platforms usually incorporate (or integrate with) an underlying in-memory data store.
- Complex event processing. This is a kind of computing in which incoming data about events is distilled into more useful, higher-level complex event data that provides insight into what is happening. It is event-driven, because the computation is triggered by the receipt of event data.
It is used for highly demanding, continuous-intelligence applications that enhance situation awareness and support real-time decisions. It can also be supported by specialized, in-memory-based complex event processing platforms.

- In-memory application servers. These are innovative application platforms designed to support high-performance/high-scale enterprise- and global-class applications by combining in-memory technologies, scale-out architectures and advanced, often event-based, programming models. In-memory application servers usually incorporate (or integrate with) an underlying in-memory data store to hold the database of record, provide distributed transactions, and implement high-availability and elastic scale-out architectures through sophisticated, in-memory-enabled clustering techniques.

Recommendations:

- Brainstorm with business leaders about projects that target business growth or planned transformational initiatives. Explore how in-memory computing can provide breakthrough support for these.
- Define a holistic in-memory computing strategy for your company, addressing long-term strategic business benefits and short-term tactical goals. However, be prepared (in terms of skills and organizational setting) to support different in-memory computing technologies that address different business needs.
- Partner with the business to understand what delivers the greatest business insight and competitive differentiation, in terms of the ability to analyze, and much more responsively explore, data via in-memory analytics and an in-memory DBMS.
- Invest in data governance processes and tools to avoid a wild proliferation of in-memory data that could lead to data chaos.

Govern Information

Chief Data Officer and Other Information-Centric Roles

Analysis by Debra Logan and Ted Friedman

EIM requires dedicated roles and specific organizational structures.
Specific roles such as chief data officer, information manager, information architect and data steward will be critical for meeting the goals of an EIM program. The fundamental objectives of the roles remain constant: to structure and manage information throughout its life cycle, and to better exploit it for risk reduction, efficiency and competitive advantage. We are increasingly seeing information-focused roles develop within specific business functions. BI/analytics, data quality, data warehousing, big data initiatives, and a general desire to improve, correct and align reporting and metrics, are common reasons for creating such roles. In addition, many examples have been emerging in the area of legal and regulatory compliance, where the absolute business need to produce electronic information has forced IT and legal departments to work together to create cross-functional roles. Some leading organizations have been creating a new
chief data officer role, aimed at maximizing the value and use of data across the enterprise, and at managing the associated risks. Currently, the chief data officer tends to be more aligned with business than IT, but has significant influence over both. A Gartner Research Circle panel survey of more than 300 organizations, conducted in July 2012, found that only 7% had a chief data officer. Our secondary research scans concur, and find that most existing chief data officers are in the U.S. However, over the next five years, enterprises with a high information element in their core value-creating activities will experience an increasing need for a single corporate leader of information policy and strategy. The need will be more acute in those industries where the effect of regulation compels action to be taken on information transparency, for compliance or e-discovery, to defend against legal attacks. There are also emerging roles around exploiting the value of information, which may also carry the title of CDO or chief digital officer.

Demand for data steward roles is growing. According to the survey from July 2012, Gartner clients expect recruitment for these roles to increase by 21% over the next two years. These roles are unlike those currently employed by business units (BUs), because they will need to combine domain expertise and technical knowledge with an understanding of information classification and organizational techniques. In most companies, these roles and job titles vary and are only just now coming into existence; most still retain business-oriented titles. The enterprises that move first to create these roles, and to train for them, will be the first to benefit from information exploitation.

IT is changing: this has become the accepted wisdom. The questions that we are now answering are "how is IT changing?" and "what new roles will be required to prepare for the change?"
Some of the roles that we have seen are:

- Data stewards
- Information managers
- Digital archivists
- Knowledge managers
- Business analysts
- Information/data scientists
- Chief data officers
- IT legal professionals

We have been tracking this activity for several years, and believe that the trend is about to become much more prevalent as the success of early adopters of such roles becomes more widely publicized. One of the most important issues here is that we are not talking about pure technology adoption, but about human beings who must essentially cross-train from their area of domain expertise, adding a set of IM skills to what they already know. It's not going to happen quickly. The scarcity of information leader talent will require executive leaders to develop it as much as hire it. Also, because creating EIM-specific roles requires that people change their job responsibilities, or that personnel are actually hired to do these jobs, strong business cases are always going to be required.
Recommendations:

- Create training programs for business-based information managers and data stewards (domain experts who need to learn new IM skills) that focus on the new IT functions.
- Hire more individuals with IM and organizational skills (for example, information architects, data stewards and data scientists) who can serve as cross-company resources. All enterprises should do this.
- Finally, CIOs will need to advise BU leaders about how to task and manage their new IM professionals.

Information Stewardship Applications

Analysis by Andrew White and Ted Friedman

Governance of data is a people- and process-oriented discipline that forms a key part of any EIM program, including MDM. The decision rights and authority model that forms governance has to be enforced and operationalized. This means that technology is needed to help formalize the day-to-day stewardship processes of (business) data stewards and fold them into their normal work routines. This emerging toolset is targeted primarily at the stewardship of structured data. This packaged technology is slowly progressing. Initial vendors, including Collibra, Kalido, Global Data Excellence and SAP, have launched products in the last couple of years, and many more are formulating plans for an offering, or working on multiple offerings. The continued high growth of, and interest in, MDM programs is driving much of the interest in this technology, because MDM gives these solutions a recent and specific context, which makes them applicable and meaningful to users. However, other initiatives, such as data quality improvement and broadening information governance goals, are also driving demand. Individual tools, like the data quality tools that output the data-level analytics monitored and enforced by these applications, have existed for years. They are mature and can help support a manual stewardship project.
These new applications also promise to be more far-reaching, so that they can support a much broader and more automated stewardship process. With the growth of interest in big/social data, these tools will provide a basis for organizations to steward data from outside their organizations, which will become important as more big data challenges arise. The emergence of growing numbers of integrated tools for information stewardship will soon challenge established MDM implementation efforts, because each new MDM program brings native governance requirements that vary greatly. In the short term, organizations are getting by with today's MDM offerings, data quality tools, security/privacy and archiving tools, and by adding manual processes. In five to 10 years' time, mature IT organizations will formally integrate BPM tools with MDM tools, to express information governance routines and monitor business activity and corporate performance management. Some vendors offer part of this technology but, often, not as an integrated environment for use by the master data steward, or deployable across all other master data stores. Additionally, these and other applications will evolve to steward different data in support of other initiatives. However, not all these applications are evolving to steward other data in addition to master data.
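The data-level analytics such applications monitor can be pictured with a small sketch. The rules, field names and records below are entirely hypothetical; the point is only the shape of the workflow: declarative rules run against master data, with failures surfaced to a business data steward for triage:

```python
import re

# Illustrative only: the kind of data-level rule a stewardship application
# monitors on behalf of a (business) data steward. Rules and fields invented.
RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "country is ISO-like":    lambda r: bool(re.fullmatch(r"[A-Z]{2}",
                                                          r.get("country", ""))),
    "email looks valid":      lambda r: "@" in r.get("email", ""),
}

def stewardship_report(records):
    """Return each failed rule with the offending record ids, for triage."""
    failures = {}
    for rec in records:
        for name, check in RULES.items():
            if not check(rec):
                failures.setdefault(name, []).append(rec.get("customer_id"))
    return failures

master_data = [
    {"customer_id": "C1", "country": "DE", "email": "a@x.example"},
    {"customer_id": "C2", "country": "Germany", "email": "b@x.example"},
    {"customer_id": "",   "country": "FR", "email": "no-at-sign"},
]
print(stewardship_report(master_data))
```

A full information stewardship application wraps this loop in workflow: assignment of failures to stewards, remediation tracking, and audit trails, which is where the integration with MDM and BPM tools discussed above comes in.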