MarkLogic for Government
July 2014
Table of Contents

Executive Summary
Big Trends, Big Data
Public Sector Challenges
The Case For NoSQL
Three Use Cases for MarkLogic's NoSQL Platform
    Use Case #1: Rescuing Healthcare.gov
    Use Case #2: Improving Army IT
    Use Case #3: Integrating Intelligence
Not All NoSQL Databases Are Created Equal
Executive Summary

A decade and a half into the 21st century, it's time government agencies moved past their 20th century database technology. Durable as they have been, applications built on a relational DBMS plus Structured Query Language (SQL) simply aren't up to today's requirements. You need something radically different.

Why? Because computing, applications, and data itself have all changed radically. More precisely, the notions of what constitutes data, and to what applications data is available, have expanded. Data now includes a host of information sources (Word documents, PDFs, images, social media posts, and location information, to name a few) that simply don't fit into the rows and columns of traditional relational databases. Or, for that matter, the structures in flat file databases.

These changes are why a growing number of federal agencies are evaluating a whole new class of databases that don't rely solely on Structured Query Language, yet can act relationally if required. The product they choose most often for mission-critical operations is MarkLogic.

NoSQL doesn't mean non-SQL. To the contrary, developers can use many of the SQL tools with which they are already familiar, plus many new ones. In other words, NoSQL means "not only SQL": SQL and much more.

MarkLogic is indeed radically different, yet it's a robust, tested, and proven alternative. Now in its 7th version since hitting the market 12 years ago, MarkLogic lets government agencies use not only traditional, cell-type data but also the government's vast stores of unstructured data to create value for their internal operations and in the services they are able to deploy to the public. This data takes many forms that, until the advent of NoSQL databases, could not be employed in an integrated way with mainframe and client-server applications.

Another radical benefit: With MarkLogic, the tech staff can dispense with data schema and modeling work that, in some cases, can take scores or hundreds of man-hours.
Whereas information stored in one relational DBMS may not be immediately compatible with another, when using MarkLogic you can ingest data from virtually any source and begin using it.

Big Trends, Big Data

Let's take a closer look at some of the data and computing trends affecting government. No discussion of this topic can get around the three important enterprise computing trends: mobility, cloud computing, and big data.

On the mobility front, agencies face the dual challenges of enabling greater mobility in the federal workforce and deploying apps to an increasingly mobile public.
On the cloud front, usage is coalescing into four basic use cases: clouds host productivity applications such as e-mail, software development projects, virtual machines, and big data stores. The last use case is where you find so many types of data. And big data by definition encompasses not only large but also dissimilar data sets harnessed together for analytical needs.

These phenomena both encourage and enable the treatment of data as a separate and distinct commodity. Data is no longer tied to a specific application. Mobile apps have many characteristics that distinguish them from client-server or web applications, but high on the list is the way they call on multiple, often unlike data sources. Conversely, they create data that doesn't fit neatly into rows and columns. For example, agencies have thousands of people in the field conducting inspections, gathering evidence, taking censuses of people, animals, or things, or testing food and consumer products. They are increasingly using mobile devices to record and upload this information, which can consist of entries in standard databases, images, videos, voice files, and text.

Whether developing traditional or mobile apps, agencies need the flexibility to incorporate many types of structured and unstructured data sources. With the MarkLogic Enterprise NoSQL platform, agencies can design applications that use video, audio, XML, digital documents, geospatial information, and text, all stored in a single repository. This means contracts, manuals, books, and e-mail are no longer islands of information, unusable by applications. Applications can also incorporate linked data from the web, such as RDF triples, plus social media like tweets, Facebook comments, and blogs.
Public Sector Challenges

When you look at the public sector through the right prism, you see a very large enterprise encompassing a wide set of vertical industries. Nearly everything done in the private sector has a counterpart in government. For example, federal missions involve transportation, health care, communications, and cybersecurity. Finance, human resources, logistics, procurement, even sales and marketing all occur in the common lines of government business. Government also has large exclusive functions in defense, intelligence, public safety, and law enforcement.
Many of the government's challenges in these three areas have a data component, specifically the ability, or inability, to deal with data from different sources and of different types. Agencies can take on these challenges more readily if they have applications powered by databases that can themselves incorporate both structured and unstructured data.

Some challenges relate to mission or operations. Health care provides a good example. Data challenges there range from relatively small to really big. The Defense Department and VA have tried mightily to fashion a single medical record for service members as they migrate to veteran status. It's not lack of desire or even money that has stalled this effort, but rather the sheer technical difficulty of matching data sets.

Other challenges are bigger. Collectively, the Food and Drug Administration, the National Institutes of Health, TRICARE Management Activity, and others conduct vast amounts of research, collect field information from all over the world, and treat millions of patients. All of this creates structured databases, notes, digital documents, images, and videos. Each data set might be developed for a specific purpose, but as a big data store it can power research and applications limited only by the imagination.

Agency-generated data sets become even more valuable when mixed with other publicly available data sources. In one example, the FDA recently issued a request for information on how it could mine social media (wikis, blogs, Facebook entries) for early warnings of medical device problems or foodborne illnesses.

Other data challenges stem from policy. Agencies across the board are challenged to use evidence to justify or improve program effectiveness. At one time managerial competence was largely a function of whether allocated funds were spent within the time allotted and within the rules of the program. No longer.
Office of Management and Budget guidance, for instance, and the analytical perspectives accompanying the last several federal budget requests specifically call for evidence-based decision-making and budgeting. This goes for contracted projects, grants, and public assistance and subsidy programs. Judging the effectiveness of a national program, whether assistance addressing veterans' homelessness or grants to engineering colleges to test unmanned aircraft in controlled airspace, often requires synthesizing data from several sources, including those from outside the agency.

Homeland security and law enforcement together make up another prime use case for the NoSQL database. In addition to integrating data from many sources (including sensors) and of many types, there is a powerful and durable need to share data and information across siloed systems and among the thousands of agencies at the federal, state, and local levels.
Even the relatively tiny Consumer Product Safety Commission conducts epidemiologic work with reports coming in from more than 100 emergency rooms across the country, and 600 product complaints from citizens every week.

The Case For NoSQL

When you boil it all down, governments are vast information generators. Regardless of mission, agencies share the challenge of getting value out of data in such a way that it leads to new or improved services, better decision-making, and more-informed policy development.

In the era of multi-source big data, the model of a relational database bolted to a client-server application won't help agencies meet that challenge. Again, the reason is that most data simply doesn't fit the columns and rows of the RDBMS. A different type of database is required to create new value by combining data in new ways using fast, agile development.

The emerging model provides the platform for realizing the data value proposition. It looks like this: Agencies are gathering and creating data from sensors, network logs, surveillance, social media, geospatial information systems (GIS), and documents of all types, including texts, spreadsheets, e-mail, contracts, and PDFs; the list is endless. They are storing that data in the cloud to improve accessibility and scalability. The choice of public, private, or hybrid cloud depends on a host of factors, including sensitivity, cost, and network performance considerations.

This is where the NoSQL database comes in. Selecting a NoSQL container helps government agencies solve both data and system management challenges while enabling the deployment of new and different applications. The foundation technology supporting cloud storage and big, diverse data is the NoSQL database.
From a policy standpoint, agencies need a way to comply with data governance and discovery regulations. At the federal level, and at many state and municipal levels, policy requires that government data sets be made available in machine-readable formats. Agencies can greatly ease the management of these requirements
when multiple data sources, both structured and unstructured, can be stored in a unified, scalable database, a capability only available with a NoSQL solution like MarkLogic.

Data management is also simplified when data sets are unified under one roof. That avoids the costs of duplication, lowers the possibility of unsynchronized data, and simplifies storage subsystems and database administration. You can't separate data and storage considerations. MarkLogic brings high availability, elasticity, and tiered storage features to let administrators minimize storage costs while maximizing availability, all on commodity hardware.

MarkLogic opens up new possibilities for application development simply by freeing IT man-hours otherwise devoted to administration, data modeling, and extract-transform-load (ETL) functions. Thanks to MarkLogic's built-in search function and automatic indexing, application developers can easily and quickly find data and information relevant to the application they are building.

Many federal agencies are exploring the Hadoop model for cost-effective data storage, as well as for analytics and other compute-intensive chores on large data sets, particularly those composed of multiple data types that don't fit into relational tables. MarkLogic runs natively atop the Hadoop Distributed File System, and also has a connector for the Hadoop MapReduce engine.

Three Use Cases for MarkLogic's NoSQL Platform

Use Case #1: Rescuing Healthcare.gov

When the new Health and Human Services secretary appointed a technology specialist to focus on Healthcare.gov, it was no coincidence that she chose someone who had worked on the Data Services Hub (DSH) component of Healthcare.gov. Despite the site's troubled rollout, the DSH was the one component that worked.

That's no accident either. The hub relies on MarkLogic to integrate data on applicants' citizenship, Social Security Number, vital statistics, and tax information. It adds health care provider data by zip code.
It does all this by connecting with the web services of those sources, each of which looks different. In fact, MarkLogic even ingests data from existing relational databases, in effect unchaining them from their applications and unlocking value. Often the logic for calculating insurance eligibility and rates runs right within MarkLogic. And it all occurred with no predefined standard schema and no data reference model, because the NoSQL approach is schema-agnostic. In fact, early calculations show that using a relational approach to the DSH would have required data modeling work totaling 100 years!
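To make the idea of schema-agnostic ingestion with automatic indexing concrete, here is a minimal sketch in Python. It is not MarkLogic's API and does not reflect how the Data Services Hub is built: the class `TinyDocumentStore`, its methods, and the sample records are all hypothetical, invented purely for illustration. It only shows the general principle that documents of different shapes can be stored as-is and found through a word index built at load time, with no schema defined up front.

```python
from collections import defaultdict
import re


class TinyDocumentStore:
    """Toy schema-agnostic store (hypothetical, not a MarkLogic API)."""

    def __init__(self):
        self._docs = {}                 # doc_id -> original document
        self._index = defaultdict(set)  # lowercase word -> set of doc_ids

    def ingest(self, doc_id, document):
        """Store a document as-is and index every word found in its values."""
        self._docs[doc_id] = document
        for value in self._walk(document):
            for word in re.findall(r"\w+", str(value).lower()):
                self._index[word].add(doc_id)

    def _walk(self, node):
        """Yield every leaf value, whatever the document's shape."""
        if isinstance(node, dict):
            for child in node.values():
                yield from self._walk(child)
        elif isinstance(node, (list, tuple)):
            for child in node:
                yield from self._walk(child)
        else:
            yield node

    def search(self, term):
        """Return the sorted ids of documents containing the word."""
        return sorted(self._index.get(term.lower(), set()))


# Made-up sample records with three unrelated shapes; no schema is declared.
store = TinyDocumentStore()
store.ingest("ssa-1", {"applicant": {"name": "Pat Doe", "ssn_verified": True}})
store.ingest("irs-7", {"filer": "Pat Doe", "tax_year": 2013, "income": 41000})
store.ingest("plans-94070", {"zip": "94070", "providers": ["Acme Health", "Bay Care"]})

print(store.search("doe"))    # ['irs-7', 'ssa-1']
print(store.search("94070"))  # ['plans-94070']
```

The point of the sketch is the design choice, not the implementation: because the index is derived from whatever values each document happens to contain, adding a new source with a new shape requires no modeling step before the data becomes searchable.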
Use Case #2: Improving Army IT

The Army Network Command in Ft. Huachuca, Ariz., needed to quantify its IT assets on both classified and unclassified-but-sensitive networks. This required information from 58 data sources in a variety of formats, and an attempt to feed all of the sources into an RDBMS simply could not go live. Yet MarkLogic was able to create an operational census of IT using 22 of the data sets within 30 days; full operating capabilities launched within 90 days.

Use Case #3: Integrating Intelligence

Some agencies have found that MarkLogic's NoSQL platform in effect rescues an RDBMS in a critical application. A case in point is a program in the intelligence community for analyzing electronic intelligence data. Disparate types of information from 18 data sources (later raised to 26) are transformed in MarkLogic before they load into the popular RDBMS that planners had first chosen. Queries that took 12 minutes of parsing when using just the RDBMS now take only 6 seconds with MarkLogic, and 4 seconds of that is the loading of transformed data into the RDBMS. Plus, the system that could barely support a few users now scales to hundreds.

Not All NoSQL Databases Are Created Equal

Pioneered by MarkLogic over a decade ago, the NoSQL database market is diverse and growing. But MarkLogic uniquely offers an enterprise-grade NoSQL database platform bundled with the tools necessary for mission-critical implementations.
MarkLogic brings to public sector enterprises:

- High-speed search and automatic data indexing capabilities embedded in the core product, to enable structure-aware searches across all text and data elements, in multiple languages
- Full atomicity, consistency, isolation, and durability (ACID) compliance for assured transactions and no data loss
- High availability and rapid disaster recovery
- Real-time alerts to changes in database objects
- Elasticity and scalability to meet data volume and access demands
- World-class, government-grade security controls

Join agencies such as Health and Human Services, the Army, the Intelligence Community, the National Archives and Records Administration, and the Patent and Trademark Office that have moved to a 21st century data platform. Unify your data, speed application development, and master big data.
About MarkLogic

For more than a decade, MarkLogic has delivered a powerful, agile, and trusted Enterprise NoSQL database platform that enables organizations to turn all data into valuable and actionable information. Organizations around the world rely on MarkLogic's enterprise-grade technology to power the new generation of information applications. MarkLogic is headquartered in Silicon Valley with offices in Washington D.C., New York, London, Frankfurt, Utrecht, and Tokyo. For more information, please visit www.marklogic.com.

© 2014 MarkLogic Corporation. All rights reserved. This technology is protected by U.S. Patent No. 7,127,469 B2, U.S. Patent No. 7,171,404 B2, U.S. Patent No. 7,756,858 B2, and U.S. Patent No. 7,962,474 B2. MarkLogic is a trademark or registered trademark of MarkLogic Corporation in the United States and/or other countries. All other trademarks mentioned are the property of their respective owners. [SS-MLIH-13-06]

999 Skyway Road, Suite 200, San Carlos, CA 94070
US: +1 650 655 2300
INT'L.: +1 877 992 8885
sales@marklogic.com
www.marklogic.com