Empowering Analysts With Big Data


White Paper: Empowering Analysts With Big Data

Inside:
- Balancing your approach to Big Data
- Criteria for evaluating your enterprise approach
- Tips for getting started

Four Years of Research Into Big Data for Analysts

For the last four years, the research team at CTOlabs.com has been contributing to studies, analyses, and community events on the topic of Big Data. Through our leadership of venues like the yearly Government Big Data Forum and our continuous dialog with thought leaders via our Weekly Government Big Data Newsletter, we have sought to highlight lessons learned, share best practices, and foster a greater dialog among practitioners fielding real-world solutions in the Big Data space. Our research team, led by former CTO of the Defense Intelligence Agency Bob Gourley, has been collecting community advice and success tips on the implementation of Big Data projects, with the goal of continually feeding those back to the community to enhance as many efforts as possible. This paper presents design criteria, best practices and lessons learned in a way you can use to enhance your organization's approach to Big Data. We constructed this piece in a way we hope you will find logical and compelling, but are ready at any time to provide more background, insights, introductions to thought leaders, or other additional information. Contact us at CTOlabs.com at any time to weigh in with your thoughts.

Empowering Analysts With Big Data

Enterprises are awash in more data than they can make sense of, and it is only getting worse. Every agency is realizing that if it does not act now to think through this challenge, it will be far harder to address in the future. These challenges are highlighted in the accompanying graph: the ability of humans to analyze data, represented by the red arrow, is growing only slightly, whereas the amount of data available to support national security missions, represented by the blue arrow, has grown far beyond the ability of analysts to make sense of it. For militaries and intelligence organizations, data has been growing due to the proliferation of collection systems, but new open source information, including social media feeds, is also having a dramatic impact on data growth. This curve is relevant to enterprises everywhere, but lessons from national security community successes may be most relevant due to the scale of data they have been working with.

Lessons learned from the national security community give us a framework to solve Big Data challenges

National security enterprises, including military and intelligence organizations and the commanders that depend on them, were among the first to face today's big data challenges. National security missions have long required a rigorous analytical tradecraft, sensemaking, a term that emphasizes the action orientation of operational analysis. Sensemaking is the creation of knowledge and the optimization of decisions from data. Sensemaking enables organizations to develop situational awareness and make maximum use of their data holdings. In the national security space, the most promising big data solutions are those that enable sensemaking in a balanced way - where analysts are empowered to do what they do best but supported by technologies that do what humans cannot - and are governed by policies forged from experience. We will leverage this conceptual framework of people, technology and policy in a more prescriptive way below.

What do humans do best, and what do computers do best?

Our years of operational experience in enterprise IT and continuous interaction with the emerging Big Data community have made something incredibly clear: organizations are optimized for analysis when they design systems that empower their analysts to do what they do best and leverage IT to do what it does best. Here is the logic behind this observation:

Analysts leverage the greatest processor on earth, their brains. They generate knowledge that supports their organization's mission. Humans develop insights and inferences and produce actionable intelligence for decision makers to act upon. Analysts can be great at utilizing pattern recognition and sensemaking skills, up to a point. Even the most trained analyst can only process a fixed number of objects at any one time. Once analysts pass that threshold, human processing power degrades rapidly. No human can handle the multi-dimensional correlations of factors present in large data challenges. The need to comprehend the large arrays of data in modern enterprises, and to assess how that data interrelates, is beyond human capabilities. And although the trained analyst takes steps to avoid bias, enterprises should consider leveraging automation in ways that mitigate human bias.

Computers exist to compute. They can conduct repetitive tasks at scale and can also apply logical reasoning over large and intricate data sources. When the right architecture is in place, computers can operate over vast quantities of data of all formats, at speeds that no person or team could ever hope to match. Computers can deliver to humans new inferences based on complex evidentiary discovery and insight.

A well functioning enterprise architecture can enable computers to apply analytics holistically over data holdings, comparing many millions of relationships and correlations to each other, leading to enhanced discovery and knowledge creation for presentation to analysts.

The differentiation between human and machine computational reasoning is significant. Humans are incredibly flexible, adaptive, and broad in their reasoning constructs, but they are not deep and therefore cannot handle large amounts of information or reasoning tasks. Computers, on the other hand, are incredibly efficient at handling large amounts of information or reasoning tasks, but are not flexible, adaptive, nor broad in their reasoning constructs. As a side effect, the vast majority of human reasoning happens at the sub-conscious level, meaning it is difficult to near-impossible to audit the logic trail. Computer reasoning is exposed and therefore auditable. Because auditability is difficult with human reasoning, confidence assessments must be based on past performance rather than a logical assessment of the analysis in question.

The key takeaway for heightened analysis efficiency is to balance the human and the computer in an analytic functional pairing. This takes advantage of the best of both worlds. This pairing, however, is not limited to human sensory assist functionality such as data visualization; it also extends to compare and contrast operations (pairwise analysis) that effectively discover more from difficult evidence in a Big Data environment. Without this pairing, the ability to exploit non-obvious relationships (NOR) is limited and the results are sub-par. (A minimal sketch of such automated pairwise comparison appears after the Difficult Evidence dimensions below.)

The problem of Overwhelming Data

Gartner analyst Doug Laney created a widely used construct for understanding the enterprise data landscape, using Volume, Velocity and Variety as dimensions. Many government and military organizations add fourth and fifth dimensions: Veracity and Volatility. These dimensions are important constructs for considering the contributions of big data technologies to the modern enterprise:

Volume: Enterprise data holdings have grown exponentially; making use of data at rest requires computer-based automation that can index, search, correlate and discover connections. This must be done in ways that bring new insights to analysts.

Velocity: Data streams into organizations continuously and, in operational organizations, must be quickly understood. The velocity of data poses many architectural challenges (how fast can it be stored?) but introduces the biggest issues around quickly determining the relevance of new data in the context of existing knowledge.

Variety: The widely varying formats of data include both structured data that comes in fields and unstructured data that must have structure divined. Technologies that work over all types of data help balanced organizations leverage all their data holdings.

Veracity: What is the true meaning of the data? Pre-processing of data ensures the system knows data provenance and can assess validity, at speed. This is important with all data sources but is especially relevant for data created or touched by humans, such as social media. Technological contributions to veracity should also include advanced identity assessment/entity extraction and relationship building.

Volatility: When is data valuable, and when is it most valuable? Data often has a half-life of value, meaning it is valuable only for a certain period of time. What makes this even more complex is that this volatility is often related to the availability and detection of other, similarly volatile data. It is like putting a puzzle together on a moving board. Responsive technologies such as analytic visualization, combined with automated compare/contrast (pairwise) operations, assist here; a balanced approach is needed, because either alone may be incomplete.

Big Data alone, however, is not enough to fully grasp the significance of the intelligence and/or investigatory analysis problem. Doug Laney's highly applicable Big Data characterization model is focused on the data itself; within that data we need to further explore the impact of complex evidence that is difficult to isolate, identify, and comprehend. This is the additional concept of Difficult Evidence. Like Big Data, Difficult Evidence has several dimensions:

Sparse: This is the ratio of the evidence that matters (relates to and has analytic impact on the question or information goal) to the information at hand - the proverbial needles in the haystack, or often needles in the needle stack. Technology assists with filtering out the overall body of non-applicable information, but elevating the essential elements with analytics is essential to finding the key evidence.

Obscure: Key evidence is rarely obvious. It can be incomplete, inaccurate, vague, or intermixed with non-applicable information. These obscuration factors make pulling out the essential evidentiary patterns extremely challenging. Obscuration cloaks the essential meaning of the evidence.

Ambiguous: Evidence can mean different things to different analysts. Like obscuration, ambiguity cloaks the meaning of the evidence in relation to the other evidence; it is a contextual obscuration. Disambiguation is best accomplished by relating other evidence to the ambiguous information, thus enhancing the context and eliminating multiple meanings.

Fragmented: Evidence is often not complete. Fragmentation occurs due to the nature of the information or as an artifact of its gathering and storage. Whether the silo-ing of the information is internal or external, the result is the same: evidence must often be identified, partially understood, holistically recognized, and associated with what is missing before context is achieved.
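The following is a minimal, hypothetical sketch of the automated compare/contrast (pairwise) operations described above, written in Python. It is not any vendor's implementation; the record fields, source names, and sample values are invented for illustration. The point it shows is simply that a machine can exhaustively compare every record with every other record and surface cross-source overlaps as candidate non-obvious relationships for an analyst to review.

```python
# Illustrative sketch only: exhaustive pairwise compare/contrast over small,
# hypothetical records to surface candidate non-obvious relationships (NOR).
from itertools import combinations

records = [
    {"id": "r1", "source": "field_report", "phone": "555-0100", "city": "Springfield"},
    {"id": "r2", "source": "social_media", "phone": "555-0100", "city": "Riverton"},
    {"id": "r3", "source": "customs_log",  "phone": "555-0199", "city": "Riverton"},
]

def shared_attributes(a, b):
    """Return the attribute/value pairs two records have in common."""
    return {k: a[k] for k in a.keys() & b.keys()
            if k not in ("id", "source") and a[k] == b[k]}

candidate_links = []
for a, b in combinations(records, 2):           # compare every record with every other
    overlap = shared_attributes(a, b)
    if overlap and a["source"] != b["source"]:  # cross-source overlap is the interesting case
        candidate_links.append((a["id"], b["id"], overlap))

for left, right, why in candidate_links:
    print(f"possible relationship {left} <-> {right}: {why}")
```

At enterprise scale this brute-force loop would be replaced by indexed or distributed techniques, but the balance it illustrates is the point: the machine does the exhaustive comparison, and the analyst judges what the surfaced links mean.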

What does an Analyst-Centric Big Data framework look like?

How will you know if you are building towards a balanced, analyst-centric big data framework? We offer insights below based on our interactions with experienced technologists through the Government Big Data Forum and through direct consultations with leaders across government. We review key capabilities you should consider for your enterprise in three broad categories:

- Analyst-facing capabilities
- Enterprise IT capabilities
- Enterprise policy considerations

Evaluation criteria in each of these areas are presented below.

Evaluating Analyst-Facing Capabilities

Operability: How well can the analysts in your organization operate their tools? To what degree do they require assistance from the IT department or from specialized outside contractors? Are technologies in place that funnel the right data and assessments to the right person? Do technologies support analysts' needs for social network analysis (SNA)?

Functionality: To what degree does the functionality of tools provided to your analysts support the full spectrum of functions (find who, what, where, when, connections and concepts, and changes to all of the above)? Do the capabilities your analysts use help them discover connections and concepts over large/diverse data sets? Can analysts evaluate data veracity? Can analysts evaluate data relevance?

Tailorability: When analysts need to tailor their capabilities for new data sources, can they import them themselves, or do they require assistance from outsiders? Are there flexible import specifications, or are rigid schemas used? Flexible schemas give analysts the opportunity to get data into their analysis tools and do analysis.

Accessibility: Can analysts access enterprise capabilities wherever the mission requires it? Are there thin client and mobility options? Are there stand-alone options that can synchronize with the enterprise when reconnected?

Interoperability: Can analysts work with other analysts both inside and outside their organization? Solutions should be integrated and configurable across domains and work across internal boundaries and with partners. Interoperability should include an ability to work with all standard GIS solutions.

Team Support: Can analysts work across the collaborative spectrum, from one independent analyst to an entire enterprise collaborating together? Can analysts move their conclusions quickly to others on the team and to decision-makers? Have coalition sharing capabilities been engineered that enable sanitization while protecting the essence of the information?

Knowledge Capture: As new conclusions and insights are developed, they need to be smartly captured to build upon and for continued fusion and analysis. This can include knowledge from partners and others outside the organization.

The criteria above are best assessed in conjunction with experienced analysts who know your organization's mission and function and are familiar with their current tools. But keep in mind that analysts are not paid to know the full potential of modern technologies. The additional evaluation factors below are best evaluated in conjunction with both your internal technology team and the broader technology community.

Evaluating Enterprise IT Capabilities for Big Data

Data Layers: Does your data layer connect all relevant data? Have you established a trusted information layer? Does the system require loading all information into a proprietary repository, or does it allow federated search among distributed data sources to gather required information for analysis, and do this leveraging open architectures?

Data Grooming: Does your trusted information layer include incorporation of unstructured information into a semi-structured format? Key here is being able to crawl massive amounts of unstructured data, identify documents of interest, extract entities, and prepare this information for use. Seek solutions that automatically extract entities based on semantic rules, extracting directly into the intelligence repositories available for analysis; any manual process at this point has a direct impact on time spent on analysis. (A minimal sketch of this kind of extraction appears after the policy factors below.)

Multi-Dimensional Security: Does your enterprise security model support getting all the information to those that need it? Have you engineered for a multidimensional security model that ensures your policies are always enforced while the mission is still always supported with the best possible analysis? Additionally, this security model needs to interoperate with the existing access and security systems.

Synchronized: Have you engineered in an ability to synchronize data sources? Does this enable smooth interoperability between single users, workgroups and enterprises?

True Service Orientation: Do you have an architecture that facilitates sharing of secure information for both service (request/response) and notification (publish/subscribe) via widely supported standards and best practices? Does this architecture provide the flexibility and adaptability needed to keep pace with the change and evolution of data types and volume, the analytic tools, and the analytic mission?

Enhanceability: Can enterprise IT staff tailor the capabilities for analyst use, or do they need to task an outside vendor to re-code capabilities?

Most modern enterprises, especially those in the national security community, have already been building towards more service-oriented, data-smart structures, so it is very likely that your organization has a good foundation along this path. But remember it is a journey, and the balanced approach your mission requires may well call for changes to your configuration, and perhaps even more modern technologies, to optimize your ability to support the mission.

Evaluating Enterprise Policy Factors for Big Data

Efficient: Do your policies emphasize the need for automation for efficiency? Do you have measures of Return on Investment or Return for Mission that are used to inform architecture decisions?

Frictionless: Do your policies seek out and eliminate barriers to collaboration that impact data design?

Interoperable: Do you seek out and remove capabilities that do not play well with others?

Learning: Before selecting new capabilities, do you conduct market assessments and solicit the opinions of others with similar mission needs?

Governing: Do you enforce mandates for open APIs and SOA best practices?
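Below is a minimal, hypothetical sketch of the kind of data grooming named in the Enterprise IT criteria above: turning unstructured text into a semi-structured record that a trusted information layer could index. The regular expressions are simple stand-ins for the semantic extraction rules a real platform would apply, and all names, fields, and sample text are invented for illustration.

```python
# Illustrative sketch only: rule-based extraction of entities from raw text
# into a semi-structured record ready for indexing and correlation.
import json
import re

EXTRACTION_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
    "date":  re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def groom(doc_id: str, text: str) -> dict:
    """Extract entities from raw text and emit a semi-structured record."""
    entities = {name: rule.findall(text) for name, rule in EXTRACTION_RULES.items()}
    return {
        "doc_id": doc_id,
        "entities": {k: v for k, v in entities.items() if v},  # keep non-empty hits only
        "raw_length": len(text),
    }

sample = "Meeting on 2024-03-14, contact analyst@example.org or 555-0100."
print(json.dumps(groom("doc-001", sample), indent=2))
```

In practice the extraction rules, entity types, and target repository would come from your own platform and mission, but the output shape - a document identifier plus the entities pulled from it - is what keeps manual preparation work out of the analysis loop.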

Whatever the status of your technology infrastructure, you will need a good governance process in place to move to a more optimized infrastructure.

Accelerating balanced analytical solutions: Ready to move out? Here are four steps to consider as you do:

1) Evaluate your enterprise in light of the recommended criteria above. Use that to build your plan.
2) Enlist the aid of your analyst community to prioritize the analytical capabilities to deliver.
3) After prioritizing the analytical capabilities your mission requires, address the enterprise technology gaps that must be closed to enhance support to the mission.
4) Track improvements to your enterprise like a project: watch cost, schedule and performance.

Concluding Thoughts

Every enterprise is different, with different missions, different infrastructures and architectures. You may find that many of the criteria we outlined above are already met by your existing enterprise. A quick inventory of capabilities and gaps will help you assess the challenge and prioritize how you architect for improvement. We most strongly recommend a structured engagement with your organization's analysts. They understand your organization's mission and vision and will likely be strong supporters in your move to bring more balance to your organization's approach to big data analytical solutions. Their prioritization of needs and capabilities should help drive organizational improvement plans. However, keep in mind that your analysts are not paid to understand the power of modern computing. External advice and assistance in this area, including connecting with other organizations that have met similar challenges, will provide important insights into your road ahead.

We have observed organizations making this type of transformation around the globe, including commercial organizations, government agencies and militaries. One thing all seem to have in common is a deep need to automate with efficiency. For some this translates to a calculation of Return on Investment. For militaries it can be a more operationally focused Return on Mission. But in every case, understanding the efficiencies and total cost to the enterprise of a solution is critically important to ensuring success.

More Reading

For more on federal technology and policy issues, visit:

CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.
CTOlabs.com - A reference for research and reporting on all IT issues.
J.mp/ctonews - Sign up for the government technology newsletters, including the Government Big Data Weekly.

About the Authors

Bob Gourley is CTO and founder of Crucial Point LLC and editor in chief of CTOvision.com. He is a former federal CTO. His career included service in operational intelligence centers around the globe, where his focus was operational all-source intelligence analysis. He was the first director of intelligence at DoD's Joint Task Force for Computer Network Defense, served as director of technology for a division of Northrop Grumman, and spent three years as the CTO of the Defense Intelligence Agency. Bob serves on numerous government and industry advisory boards. Contact Bob at bob@crucialpointllc.com

Ryan Kamauff is a technology research analyst at Crucial Point LLC, focusing on disruptive technologies of interest to enterprise technologists. He writes at http://ctovision.com. He researches and writes on developments in technology and government best practices for CTOvision.com and CTOlabs.com, and has written numerous whitepapers on these subjects. Contact Ryan at Ryan@crucialpointllc.com

For More Information

If you have questions or would like to discuss this report, please contact me. As an advocate for better IT use in enterprises, I am committed to keeping this dialogue open on technologies, processes and best practices that will keep us all continually improving our capabilities and our ability to support organizational missions.

Contact: Bob Gourley bob@crucialpointllc.com CTOlabs.com