HOW THE DATA LAKE WORKS


by Mark Jacobsohn, Senior Vice President, Booz Allen Hamilton
and Michael Delurey, EngD, Principal, Booz Allen Hamilton

As organizations rush to take advantage of large and diverse data sets, many find they simply cannot keep up with the exponential growth in the volume, velocity and variety of information today. So much data is coming in at such an overwhelming rate that organizations with conventional approaches to data storage and management cannot hope to capture it all, much less process it all. Inevitably, some of the most valuable information, particularly unstructured data, gets left on the cutting room floor. And organizations have no way of knowing how much critical knowledge and insight is being lost.

To meet this challenge, Booz Allen Hamilton pioneered the Data Lake, a completely new approach that not only manages the volume, velocity and variety of data, but actually becomes more powerful as all three aspects increase. What makes this possible is a transformative shift from schema-on-write to schema-on-read. With schema-on-write, which underlies the process known as extract, transform and load (ETL), it is necessary to design the data model and analytic frameworks before any data is loaded. This means we need to know in advance how we might use our data in the future, a kind of catch-22 that severely limits the scope and value of our inquiries. With schema-on-read, however, we can call upon the data for analysis as needed. The frameworks are created ad hoc and iteratively, for whatever purpose we have in mind, with only a minimal amount of preparation.

This fundamental change in approach has far-reaching implications. Business and government organizations are discovering that the larger and more diverse their data, the less effective ETL becomes. Analysts often must spend the bulk of their time simply creating the frameworks, preparing the data and maintaining the infrastructure.
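The contrast between the two models can be illustrated with a minimal sketch. This is hypothetical code, not a Booz Allen implementation; the field names are invented for illustration. Schema-on-write rejects records that do not fit a predefined structure, while schema-on-read accepts anything and applies structure only at query time.

```python
# Hypothetical sketch of schema-on-write vs. schema-on-read (illustrative only).

WRITE_SCHEMA = ("name", "birthdate", "account_number")

def load_schema_on_write(record, table):
    # Schema-on-write: reject anything that does not fit the predefined model.
    if set(record) != set(WRITE_SCHEMA):
        raise ValueError("record does not match the predefined schema")
    table.append(tuple(record[col] for col in WRITE_SCHEMA))

def load_schema_on_read(record, store):
    # Schema-on-read: load first, ask questions later.
    store.append(dict(record))

def project(store, wanted_fields):
    # Structure is applied at query time and can change from one inquiry to the next.
    return [{f: r.get(f) for f in wanted_fields}
            for r in store if any(f in r for f in wanted_fields)]

store = []
load_schema_on_read({"name": "John Doe", "tweet": "gold is up"}, store)  # accepted as-is
print(project(store, ["name"]))  # [{'name': 'John Doe'}]
```

The same record would be rejected by the schema-on-write loader, because "tweet" was never anticipated in the data model.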
As lines of inquiry inevitably change, the frameworks must be torn down and rebuilt, data must be re-ingested and re-indexed, and schemas must be updated, all at great effort. And the frameworks themselves are

© 2014 Booz Allen Hamilton Inc. All rights reserved. No part of this document may be reproduced without prior written permission of Booz Allen Hamilton.

difficult to connect, hampering the ability of organizations to integrate and analyze their data.

Figure 1: The data cell within the Data Lake. Key fields: Row ID, Column Tag, Group Tag, Visibility, Time Stamp, Value. Source: Booz Allen Hamilton

The Data Lake's schema-on-read eliminates these and many other constraints of ETL, enabling organizations to draw full value from their data, no matter how large it grows. Data can be loaded first, then transformed and indexed iteratively as organizational understanding of the data improves. The Data Lake uses a key/value store, an innovative approach founded on schema-on-read. With the key/value store, all relevant information associated with a piece of data is stored with that item in the form of metadata tags. These tags make it possible to store and manage vast amounts of data of all types and have it immediately available for analysis. This ability, coupled with the Data Lake's inexpensive storage running on commodity hardware, enables organizations to add a virtually unlimited number of new data sources at minimal risk.

USING THE KEY/VALUE STORE

With the Data Lake, an organization's entire repository of data is entered into a giant table and organized through the metadata tags. Each piece of data, such as a name, a photograph, an incident report or a Twitter feed, is placed in an individual cell. It does not matter where in the Data Lake any piece of data is located, where it comes from, or how it might be formatted. Because all of the data can easily be connected through the tags, the time-intensive frameworks of ETL are no longer necessary. Tags can evolve and be added or changed as analytic needs change; this is a fundamental difference between a relational database, which requires a predefined schema, and the Data Lake. Four different types of tags essentially serve as pointers to the data within the cell: the primary tag, the tag group, the time stamp, and the Row ID.
In addition, the cell contains information on visibility, presented in a logical expression that governs who has access to the data in the cell and under what circumstances. Figure 1 shows how this information is structured within an individual cell.

To help show how these data work together in practice, the data and tags are represented in rows in an example shown in Figure 2. Here, a variety of data about an investor is entered into the Data Lake, such as personal information and stock transactions. The first column shows the actual data. The second column, with the primary tag, identifies the type of data in the cell, such as name, birthdate, account number, etc. The tags themselves can be organized into groups; in this case, the groups might be "Investor Information" and "Transactions." There can be any number of primary tags and tag groups, and they do not have to be defined before data is ingested. The time stamp tag uses information embedded with the data in the original data source, here, the time and date of the various stock transactions. The time stamp helps to distinguish different versions of a similar activity. Not all data entered into the Data Lake must be accompanied by time and date information, so in those cases a time stamp tag would not be applicable.

JUNE 2014
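The cell structure of Figure 1 can be modeled in a minimal sketch. This is hypothetical Python, not the actual Accumulo data model, and the field values (including the visibility string) are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    row_id: str                # connects all cells about one person or entity
    primary_tag: str           # type of data in the cell ("Name", "Date of Birth", ...)
    tag_group: str             # organizing principle ("Investor Information", "Transactions")
    visibility: str            # logical expression governing access (illustrative value here)
    timestamp: Optional[str]   # taken from the source data; None when not applicable
    value: str                 # the data itself

# An example cell for the investor "John Doe":
name_cell = Cell(row_id="1", primary_tag="Name", tag_group="Investor Information",
                 visibility="claims|legal", timestamp=None, value="John Doe")

def cells_for(cells, row_id):
    # All cells sharing a Row ID belong to the same person or entity.
    return [c for c in cells if c.row_id == row_id]

print(cells_for([name_cell], "1")[0].value)  # John Doe
```

Because every cell carries its own tags, nothing about the table's layout needs to be fixed before data arrives; new primary tags and tag groups are simply new string values.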

Figure 2: Four types of tags serving as pointers to the data. The primary tag gives the type of data in the cell; the tag group is the organizing principle for the tags used; the time stamp records the time and date of the stock transactions; the Row ID designates that the entries are directly connected. Source: Booz Allen Hamilton

DATA               PRIMARY TAG     TAG GROUP              TIME STAMP           ROW ID
John Doe           Name            Investor Information                        1
5/17/71            Date of Birth   Investor Information                        1
                   Account #       Investor Information                        1
Shares ABBC Stock  Sales           Transactions           9/17/ :43 AM         1
Shares ABBC Stock  Sales           Transactions           9/17/2013 2:34 PM    1
Shares XYYZ Stock  Purchases       Transactions           9/17/2013 3:03 PM    1

The fourth type of tag is Row ID. The rows themselves are not each given their own number. Instead, entire groups of rows, usually all relating to a single person or entity, are given the same Row ID number. This designates that they are all directly connected with each other. It also allows closely related data to be sharded, or horizontally partitioned, into close disk locations in the underlying storage. In the example, we know that the birthdate, account number and stock transactions are all associated with John Doe because they all have the same Row ID. In the Data Lake, there can be hundreds and even thousands of rows with the same Row ID.

It now becomes possible to ask questions of the data, or search for patterns, using any combination of these points: the data itself, the primary tag, the tag group, the time stamp, or the Row ID. We might want to know, for example, which investors made large purchases of a particular stock within a certain time frame. Or perhaps we want to know the frequency with which investors in certain foreign countries make transactions. Any combination of data and tags can be used in our queries.

Not every piece of structured and unstructured data has to be tagged upon ingest. Say, for example, a data source has a large number of data points about an individual, but only a few are needed for an initial inquiry.
Every data point can be added to the Data Lake, though only selected ones are assigned tags. The others do not need to be tagged, other than being given the particular Row ID associated with the individual so the data can be located later. This saves time because the analysts do not need to assign tags to the bulk of the data. Not defining all tags at the beginning also avoids expensive data-modeling activities. And the additional data points are now part of the Data Lake, available to be tagged and analyzed whenever needed. Unlike with traditional data structures, we do not need to capture and define the data up front. A single piece of data can be given multiple primary tags and tag groups, which can be assigned ad hoc as we learn more about our information. Say we later decide to conduct queries on investors who are employees of the

Figure 3: Creating new rows for additional data on John Doe. The new rows provide additional investor information on John Doe, and a new tag group indicates whether John Doe is an employee. Source: Booz Allen Hamilton

DATA               PRIMARY TAG     TAG GROUP              TIME STAMP           ROW ID
John Doe           Name            Investor Information                        1
5/17/71            Date of Birth   Investor Information                        1
                   Account #       Investor Information                        1
Shares ABBC Stock  Sales           Transactions           9/17/ :43 AM         1
Shares ABBC Stock  Sales           Transactions           9/17/2013 2:34 PM    1
Shares XYYZ Stock  Purchases       Transactions           9/17/2013 3:03 PM    1
John Doe           Name            Employee                                    1
                   Telephone #     Investor Information                        1

Figure 4: Creating a new Row ID, with related data and tags. Source: Booz Allen Hamilton

DATA               PRIMARY TAG     TAG GROUP              TIME STAMP           ROW ID
John Doe           Name            Investor Information                        1
5/17/71            Date of Birth   Investor Information                        1
                   Account #       Investor Information                        1
Shares ABBC Stock  Sales           Transactions           9/17/ :43 AM         1
Shares ABBC Stock  Sales           Transactions           9/17/2013 2:34 PM    1
Shares XYYZ Stock  Purchases       Transactions           9/17/2013 3:03 PM    1
John Doe           Name            Employee                                    1
                   Telephone #     Investor Information                        1
Jane Smith         Name            Investor Information                        2
2/1/76             Date of Birth   Investor Information                        2
                   Account #       Investor Information                        2
Shares ABBC Stock  Sales           Transactions           6/24/2013 8:16 AM    2
Shares QQWD Stock  Purchases       Transactions           6/24/ :11 AM         2
Shares XYYZ Stock  Purchases       Transactions           6/24/2013 2:36 PM    2
                   Telephone #     Investor Information                        2

bank. As shown in Figure 3, we can create a new row in the key/value store that provides the data "John Doe" with an additional tag group: "Employee." And at any time, we can add new data on the person, such as a phone number. Figure 3 shows how our updated example might look.

Data about another investor might be given the Row ID #2. The Row ID not only connects the data associated with a particular person or entity, it also distinguishes one person or entity from another. Updating the example in Figure 4, we see the content in the new rows.

In addition, the key/value store's flexibility in assigning tags means we do not have to know what the data refers to when we enter it into the Data Lake. We might have a nine-digit number associated with a certain person, but perhaps we do not know whether it is a phone number, a Social Security number, a bank account number, or whether it refers to something else. We can add it to the Data Lake and then run queries to see if it is similar to other nine-digit numbers in the Data Lake. Unlike with relational databases, we do not need to know in advance how we will be using the information, or whether we will be using it at all. We can simply add potentially relevant information into the Data Lake and add tags iteratively, as we gain more insight into the data.

THE DATA LAKE IN ACTION

Because the data in the Data Lake is all connected and uses a schema-on-read approach, its entirety can be searched during any query. In addition, subsets of data and tags can be indexed and analyzed independently. There are three basic ways of searching the Data Lake: by the data itself; by the data and primary tags together; or by the data, primary tags and tag groups together. Say we want to learn what influence a prominent expert on gold trading has on gold prices. We might load into the Data Lake articles, blogs and other content in which the expert is either the author or is quoted by others.
Because the Data Lake easily accepts unstructured data, we can include posts on Twitter, Facebook and other social media sites, as well as podcasts and television programs.

Searching by the data alone. Say our first question is, "Is the price of gold tied to how often the expert's name appears in a tweet, article, blog or in other content?" Using the time stamps, we can run analytics that track mentions of the expert against changes in the price of gold.

Searching by data and primary tags together. Say we next want to know, "Is the price of gold tied to how often the expert is the author of a tweet, article, blog, etc.?" In this case, we would search for the tag "Author" in all the content that mentions the expert.

Searching by the data, primary tags and tag groups together. Next, we might want to narrow our question to, "Is the price of gold tied to how often the expert is the author of a tweet?" Here, we are looking at content mentioning the expert in which the tag is "Author" and the tag group is "Tweet."

Unlike with schema-on-write, where the data structures and analytics must be torn down and rebuilt with each new line of inquiry, the key/value store makes it easy to switch variables in and out. We might, for example, want to drill down into the content of the tweets, which can also be tagged, and ask what happens when our gold expert discusses a particular subject, such as gold production in China or global supply and demand. Or we may want to see the effect of several experts combined. Or perhaps we want to gauge how influential particular television programs or blogs are on the price of gold.

BUILDING THE DATA LAKE

The Data Lake is a combination of publicly available powerhouse software programs, like Hadoop and Accumulo, and a wide range of Booz Allen proprietary tools and techniques, primarily associated with ingest and analytics. In particular, Hadoop and Accumulo, as adapted by the Data Lake, work together to deliver schema-on-read.
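The three search modes described above, by the data alone, by data and primary tag, and by data, primary tag and tag group, can be sketched in a few lines. This is hypothetical code over an assumed list of tagged cells, not a product API; the cell contents are invented for illustration.

```python
# Hypothetical sketch of the three Data Lake search modes (illustrative only).

def search(cells, data, primary_tag=None, tag_group=None):
    # Mode 1: by the data alone (case-insensitive substring match here).
    hits = [c for c in cells if data.lower() in c["value"].lower()]
    # Mode 2: narrow by primary tag.
    if primary_tag is not None:
        hits = [c for c in hits if c["primary_tag"] == primary_tag]
    # Mode 3: narrow further by tag group.
    if tag_group is not None:
        hits = [c for c in hits if c["tag_group"] == tag_group]
    return hits

cells = [
    {"value": "Gold expert says buy", "primary_tag": "Author", "tag_group": "Tweet"},
    {"value": "Interview with the gold expert", "primary_tag": "Quoted", "tag_group": "Article"},
]

print(len(search(cells, "gold expert")))                                           # 2
print(len(search(cells, "gold expert", primary_tag="Author")))                     # 1
print(len(search(cells, "gold expert", primary_tag="Author", tag_group="Tweet")))  # 1
```

Because each mode is just an extra filter over the same cells, switching a variable in or out of the inquiry does not require rebuilding any structure.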
The Data Lake's key/value store, derived from Accumulo, is supported by a distributed file system (i.e., the Hadoop Distributed File System, or HDFS), rather than by a conventional storage area network (SAN). With a SAN, data is taken out of storage for processing and then returned, traveling back and forth through a narrow Fibre Channel link that substantially limits speed and

capacity. With the Data Lake's distributed file system, however, the processing is conducted right at the point of storage on thousands of nodes, all networked together in a cloud environment. Through Hadoop, the calculations on all these nodes are conducted in parallel, making it possible for the entirety of the Data Lake to be processed all at once. In essence, Accumulo uses Hadoop to get the data and then moves it to the appropriate locations for analysis.

The distributed file system also makes it considerably less expensive to add storage than with a SAN. Instead of continually purchasing and configuring new storage systems, as with a SAN, more nodes can simply be added to the distributed file system as needed. This enables the Data Lake to quickly and easily scale to an organization's growing data.

FILLING THE DATA LAKE

One of the chief drawbacks of schema-on-write is the sheer time and expense of preparing data. Major IT projects typically require huge data-modeling and standardization committees that often take a year or more to complete their work. The committees must define the problem space they want to tackle, decide what questions they need to ask and then figure out how to design the database schema to answer those questions. Because it is difficult to bring in new data sources once the structure is complete, there is often much disagreement over exactly what information should be included or left out. With these limitations, analysts cannot interactively ask questions of the data; they must form hypotheses well in advance of the actual analysis and then create the data structures and analytics to test those hypotheses. Consequently, the only results that come back are the ones that the data structures and analytics are designed to provide. There is a high risk of creating a closed-loop system, an echo chamber that merely validates the original hypothesis.
This is not an issue with the Data Lake, where both structured and unstructured data can be ingested quickly and easily, without data modeling or standardization. Structured data from conventional databases is placed into the rows of the Data Lake table in a largely automated process. Analysts choose which tags and tag groups to assign, typically drawn from the original tabular information. As noted earlier, the same piece of data can be given multiple tags, and tags can be changed or added at any time. Because the schema for storing does not need to be defined up front, expensive and time-consuming modeling is not needed.

INGESTING UNSTRUCTURED AND SEMI-STRUCTURED DATA

Unstructured data, widely seen as holding the most promise for creating new areas of business growth and efficiency, accounts for much of the explosion in big data today (see Figure 5). However, because of the constraints of the conventional schema-on-write approach, only a small portion of this valuable resource is ever tapped. Using schema-on-write, unstructured data must be substantially transformed. The process is so time-intensive that many organizations have found it nearly impossible to scale to their growing unstructured data. With the Data Lake's schema-on-read, however, there is no need for extensive data transformation; unstructured and semi-structured data can be quickly ingested and made ready for analysis. Individual pieces of unstructured data, such as all or portions of tweets, are placed in rows and assigned the appropriate tags. Say a Fortune 500 company tweets its third-quarter financial results. Software configured for the Data Lake identifies various elements of the tweet; it recognizes, for example, that a # symbol followed by text is a hashtag. It also recognizes the patterns of URLs, addresses and other types of information. In a largely automated process, the individual elements of the tweet are identified and then loaded into the Data Lake.
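A simplified sketch of that recognition step might look like the following. The patterns and tag names are illustrative assumptions, not Booz Allen's actual ingest software.

```python
import re

# Illustrative patterns for recognizable tweet elements (a real ingest pipeline
# would use far more robust rules).
PATTERNS = {
    "Hashtag": r"#\w+",
    "URL": r"https?://\S+",
    "Mention": r"@\w+",
}

def extract_elements(tweet):
    # Emit each recognized element as a tagged cell ready for loading.
    elements = []
    for primary_tag, pattern in PATTERNS.items():
        for match in re.findall(pattern, tweet):
            elements.append({"value": match, "primary_tag": primary_tag,
                             "tag_group": "Tweet"})
    return elements

tweet = "Q3 results are out #earnings https://example.com/q3 @investors"
for element in extract_elements(tweet):
    print(element)
```

Anything the patterns do not recognize simply remains untagged text in the Data Lake, available for tagging later as analytic needs evolve.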
Even unstructured data without easily identifiable content, such as doctors' notes, can be quickly ingested into the Data Lake. Doctors' notes, for example, are typically filled with sentence fragments, medical shorthand and other quirks, such as the framing of patient conditions in the negative (as in, "the patient was not sweating"). The Data Lake brings together a variety of natural

language processing techniques and customizes them for specific types of unstructured data, in this case, the phrasing of medical conditions. Again, the process is automated, making it possible to ingest and tag large amounts of unstructured data in a short period.

Figure 5: The volume of structured vs. unstructured data in the world, measured in trillions of gigabytes (zettabytes). Unstructured data includes text, log files, blogs, tweets, audio, video, etc. Source: IDC 2011 Digital Universe Study

ADDING NEW DATA SOURCES

With the conventional approach, organizations may be reluctant to add new data sources, no matter how promising, because they fear the time and expense may outweigh the possible benefit. But with the Data Lake, organizations can add new data sources with little or no risk. This is possible because of two powerful features of the Data Lake's schema-on-read approach: all types of data can be ingested quickly and storage is inexpensive, and data can be stored in HDFS until it is ready to be analyzed. Say an organization has 20 new potential data sources, but does not know in advance which ones, if any, might be useful. An organization using the conventional schema-on-write may be reluctant to add any of the sources. But the Data Lake actually encourages organizations to add new data sources because the time and resources needed are significantly reduced. Organizations need not fear adding what might be useless information; in a sense, there is no useless information in the Data Lake.

NO WASTED SPACE

With schema-on-write, data storage is inefficient, even in the cloud. The reason is that so much space is wasted due to the sparse table problem. Imagine a spreadsheet combining two data sources, an original one with 100 fields and the other with 50. The process of combining means that we will be adding 50 new columns into the original spreadsheet.
Rows from the original will hold no data for the new columns, and rows from the new source will hold no data from the original. The result will be a great many empty cells. This not only wastes storage space, it also creates the opportunity for a great many errors.

In the Data Lake, however, no space is wasted. Each piece of data is assigned a row, and since the data does not need to be combined at ingest, there are no empty rows or columns. This makes it possible to store vast amounts of data in far less space than would be required for even relatively small conventional cloud databases.

A GRANULAR LEVEL OF SECURITY AND PRIVACY

With most relational databases, security and privacy restrictions tend to be at the level of the database or of a table within the database (some databases have row-level security, but it is expensive to implement and maintain). If someone is not authorized to see a single

piece of data in a table, for example, then the entire table is off limits. Analysts running queries of databases may not have access to large swaths of data that should be available to them, severely degrading the results. This is not an issue in the Data Lake, which uses an Attribute-Based Access Control (ABAC) system that allows security and privacy restrictions to be built around each piece of data. As data is ingested into the Data Lake, it is placed in individual cells. Each cell also contains that piece of data's visibility, which determines who has access to the data in the cell and under what circumstances.

Visibility might be based on a user's role in the organization. For example, at a health insurance company, cells with patient names and birthdates might be accessible to employees in management and in the accounting, claims, and legal departments. But other cells, say with patient medical information, might have visibility only to employees in claims and legal. When employees log onto the computer system to run queries of the Data Lake, or when analytics are run, their department is identified. Their queries will automatically be limited to the appropriate cells.

The visibility of a particular piece of data can be configured in multiple ways. Instead of the user's role, it might be based on an individual's clearance to see certain types of information. Or visibility might require both role and clearance. Visibility might also be based on the data source; some users, for example, may have access to newspaper articles but not Twitter feeds. Any factor can be considered and can be used in combination with any others. With the conventional approach, changing the visibility of data can be cumbersome and time-consuming; it often means stripping information out of one database and putting it in another. But with the Data Lake, it is as simple as making a change to the logical expression, which is a quick, automated task.
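As a toy illustration of how such a logical expression might be checked against a user's attributes, consider the sketch below. It is far simpler than Accumulo's real column-visibility grammar, which also supports parentheses and nesting; here an expression contains only '&' (the user needs every attribute) or '|' (any one attribute suffices).

```python
# Toy visibility check for flat expressions like "claims|legal" or
# "claims&secret_clearance" (illustrative only; not Accumulo's grammar).
def visible(expression, user_attrs):
    if "&" in expression:
        return all(token in user_attrs for token in expression.split("&"))
    return any(token in user_attrs for token in expression.split("|"))

# Patient names are open to several departments; medical details are limited
# to claims and legal, as in the health insurance example above.
print(visible("management|accounting|claims|legal", {"claims"}))  # True
print(visible("claims|legal", {"accounting"}))                    # False
print(visible("claims&secret_clearance", {"claims"}))             # False
```

Changing who may see a cell is then just a matter of rewriting its expression string; no data has to move between databases.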
OVERCOMING THE HURDLES OF VOLUME, VELOCITY AND VARIETY

The shift from schema-on-write to schema-on-read is not an incremental advance; it represents a completely new mindset, one designed expressly for the challenges of large and diverse data sets. With the virtually unlimited capacity of the Data Lake's key/value store, and with its underlying infrastructure that expands easily and inexpensively, organizations can analyze an exponentially increasing volume of data. Free of the constraints of data modeling, normalization and other schema-on-write requirements, organizations can keep pace with the velocity of information. And because the Data Lake accepts unstructured data without painstaking formatting and structuring, organizations can draw full value from big data in all its variety. Schema-on-read and the Data Lake are new approaches for a new time.

FOR MORE INFORMATION

Mark Jacobsohn, Senior Vice President, Jacobsohn_Mark@bah.com
Michael Delurey, EngD, Principal, Delurey_Mike@bah.com

This document is part of a collection of papers developed by Booz Allen Hamilton to introduce new concepts and ideas spanning cloud solutions, challenges, and opportunities across government and business. For media inquiries or more information on reproducing this document, please contact:

James Fisher, Senior Manager, Media Relations, fisher_james_w@bah.com
Carrie Lake, Manager, Media Relations, lake_carrie@bah.com


The Principles of the Business Data Lake The Principles of the Business Data Lake The Business Data Lake Culture eats Strategy for Breakfast, so said Peter Drucker, elegantly making the point that the hardest thing to change in any organization

More information

Big Data and Apache Hadoop Adoption:

Big Data and Apache Hadoop Adoption: Expert Reference Series of White Papers Big Data and Apache Hadoop Adoption: Key Challenges and Rewards 1-800-COURSES www.globalknowledge.com Big Data and Apache Hadoop Adoption: Key Challenges and Rewards

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

Adobe Insight, powered by Omniture

Adobe Insight, powered by Omniture Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before

More information

BBBT Podcast Transcript

BBBT Podcast Transcript BBBT Podcast Transcript About the BBBT Vendor: The Boulder Brain Trust, or BBBT, was founded in 2006 by Claudia Imhoff. Its mission is to leverage business intelligence for industry vendors, for its members,

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Two Recent LE Use Cases

Two Recent LE Use Cases Two Recent LE Use Cases Case Study I Have A Bomb On This Plane (Miami Airport) In January 2012, an airline passenger tweeted she had a bomb on a Jet Blue commercial aircraft at the Miami International

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Ten Mistakes to Avoid

Ten Mistakes to Avoid EXCLUSIVELY FOR TDWI PREMIUM MEMBERS TDWI RESEARCH SECOND QUARTER 2014 Ten Mistakes to Avoid In Big Data Analytics Projects By Fern Halper tdwi.org Ten Mistakes to Avoid In Big Data Analytics Projects

More information

Eliminating Complexity to Ensure Fastest Time to Big Data Value

Eliminating Complexity to Ensure Fastest Time to Big Data Value Eliminating Complexity to Ensure Fastest Time to Big Data Value Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest

More information

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY Analytics for Enterprise Data Warehouse Management and Optimization Executive Summary Successful enterprise data management is an important initiative for growing

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Generating the Business Value of Big Data:

Generating the Business Value of Big Data: Leveraging People, Processes, and Technology Generating the Business Value of Big Data: Analyzing Data to Make Better Decisions Authors: Rajesh Ramasubramanian, MBA, PMP, Program Manager, Catapult Technology

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

Sources: Summary Data is exploding in volume, variety and velocity timely

Sources: Summary Data is exploding in volume, variety and velocity timely 1 Sources: The Guardian, May 2010 IDC Digital Universe, 2010 IBM Institute for Business Value, 2009 IBM CIO Study 2010 TDWI: Next Generation Data Warehouse Platforms Q4 2009 Summary Data is exploding

More information

Formal Methods for Preserving Privacy for Big Data Extraction Software

Formal Methods for Preserving Privacy for Big Data Extraction Software Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

Forward Thinking for Tomorrow s Projects Requirements for Business Analytics

Forward Thinking for Tomorrow s Projects Requirements for Business Analytics Seilevel Whitepaper Forward Thinking for Tomorrow s Projects Requirements for Business Analytics By: Joy Beatty, VP of Research & Development & Karl Wiegers, Founder Process Impact We are seeing a change

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Before You Buy: A Checklist for Evaluating Your Analytics Vendor

Before You Buy: A Checklist for Evaluating Your Analytics Vendor Executive Report Before You Buy: A Checklist for Evaluating Your Analytics Vendor By Dale Sanders Sr. Vice President Health Catalyst Embarking on an assessment with the knowledge of key, general criteria

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Big Data at Cloud Scale

Big Data at Cloud Scale Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

Eliminating Complexity to Ensure Fastest Time to Big Data Value

Eliminating Complexity to Ensure Fastest Time to Big Data Value Eliminating Complexity to Ensure Fastest Time to Big Data Value Copyright 2013 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest

More information

Banking On A Customer-Centric Approach To Data

Banking On A Customer-Centric Approach To Data Banking On A Customer-Centric Approach To Data Putting Content into Context to Enhance Customer Lifetime Value No matter which company they interact with, consumers today have far greater expectations

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

Identifying Fraud, Managing Risk and Improving Compliance in Financial Services

Identifying Fraud, Managing Risk and Improving Compliance in Financial Services SOLUTION BRIEF Identifying Fraud, Managing Risk and Improving Compliance in Financial Services DATAMEER CORPORATION WEBSITE www.datameer.com COMPANY OVERVIEW Datameer offers the first end-to-end big data

More information

Business Intelligence Data Detectives. The Truth is in There

Business Intelligence Data Detectives. The Truth is in There Business Intelligence Data Detectives The Truth is in There Welcome Jason Hernandez Director, Information Management Y&L Consulting, Inc. @jasonuhernandez Clint Campbell Solutions Architect Y&L Consulting,

More information

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications NOSQL, BIG DATA AND GRAPHS Technology Choices for Today s Mission- Critical Applications 2 NOSQL, BIG DATA AND GRAPHS NOSQL, BIG DATA AND GRAPHS TECHNOLOGY CHOICES FOR TODAY S MISSION- CRITICAL APPLICATIONS

More information

Beyond the Data Lake

Beyond the Data Lake WHITE PAPER Beyond the Data Lake Managing Big Data for Value Creation In this white paper 1 The Data Lake Fallacy 2 Moving Beyond Data Lakes 3 A Big Data Warehouse Supports Strategy, Value Creation Beyond

More information

Wrangling Actionable Insights from Organizational Data

Wrangling Actionable Insights from Organizational Data Wrangling Actionable Insights from Organizational Data Koverse Eases Big Data Analytics for Those with Strong Security Requirements The amount of data created and stored by organizations around the world

More information

DISCOVERING AND SECURING SENSITIVE DATA IN HADOOP DATA STORES

DISCOVERING AND SECURING SENSITIVE DATA IN HADOOP DATA STORES DATAGUISE WHITE PAPER SECURING HADOOP: DISCOVERING AND SECURING SENSITIVE DATA IN HADOOP DATA STORES OVERVIEW: The rapid expansion of corporate data being transferred or collected and stored in Hadoop

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data : High-throughput and Scalable Storage Technology for Streaming Data Munenori Maeda Toshihiro Ozawa Real-time analytical processing (RTAP) of vast amounts of time-series data from sensors, server logs,

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that

More information

White. Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014

White. Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014 White Paper EMC Isilon: A Scalable Storage Platform for Big Data By Nik Rouda, Senior Analyst and Terri McClure, Senior Analyst April 2014 This ESG White Paper was commissioned by EMC Isilon and is distributed

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

Modern Data Integration

Modern Data Integration Modern Data Integration Whitepaper Table of contents Preface(by Jonathan Wu)... 3 The Pardigm Shift... 4 The Shift in Data... 5 The Shift in Complexity... 6 New Challenges Require New Approaches... 6 Big

More information

There s no way around it: learning about Big Data means

There s no way around it: learning about Big Data means In This Chapter Chapter 1 Introducing Big Data Beginning with Big Data Meeting MapReduce Saying hello to Hadoop Making connections between Big Data, MapReduce, and Hadoop There s no way around it: learning

More information

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through

More information

KPMG Unlocks Hidden Value in Client Information with Smartlogic Semaphore

KPMG Unlocks Hidden Value in Client Information with Smartlogic Semaphore CASE STUDY KPMG Unlocks Hidden Value in Client Information with Smartlogic Semaphore Sponsored by: IDC David Schubmehl July 2014 IDC OPINION Dan Vesset Big data in all its forms and associated technologies,

More information

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc. Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has

More information

What happens when Big Data and Master Data come together?

What happens when Big Data and Master Data come together? What happens when Big Data and Master Data come together? Jeremy Pritchard Master Data Management fgdd 1 What is Master Data? Master data is data that is shared by multiple computer systems. The Information

More information

Where have you been all my life? How the financial services industry can unlock the value in Big Data

Where have you been all my life? How the financial services industry can unlock the value in Big Data Where have you been all my life? How the financial services industry can unlock the value in Big Data Agenda Why should I care? What is Big Data? Is Big Data for me? What will it take? PwC Slide 1 The

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep Neil Raden Hired Brains Research, LLC Traditionally, the job of gathering and integrating data for analytics fell on data warehouses.

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

KNOWLEDGENT REPORT. 2015 Big Data Survey: Current Implementation Challenges

KNOWLEDGENT REPORT. 2015 Big Data Survey: Current Implementation Challenges KNOWLEDGENT REPORT 2015 Big Data Survey: Current Implementation Challenges INTRODUCTION The amount of data in both the private and public domain is experiencing exponential growth. Mobile devices, sensors,

More information

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Master big data to optimize the oil and gas lifecycle

Master big data to optimize the oil and gas lifecycle Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on

More information

Content Marketing Integration Workbook

Content Marketing Integration Workbook Content Marketing Integration Workbook 730 Yale Avenue Swarthmore, PA 19081 www.raabassociatesinc.com info@raabassociatesinc.com Introduction Like the Molière character who is delighted to learn he has

More information

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE Current technology for Big Data allows organizations to dramatically improve return on investment (ROI) from their existing data warehouse environment.

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

Big Data Efficiencies That Will Transform Media Company Businesses

Big Data Efficiencies That Will Transform Media Company Businesses Big Data Efficiencies That Will Transform Media Company Businesses TV, digital and print media companies are getting ever-smarter about how to serve the diverse needs of viewers who consume content across

More information

Using Predictive Maintenance to Approach Zero Downtime

Using Predictive Maintenance to Approach Zero Downtime SAP Thought Leadership Paper Predictive Maintenance Using Predictive Maintenance to Approach Zero Downtime How Predictive Analytics Makes This Possible Table of Contents 4 Optimizing Machine Maintenance

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Using Big Data Analytics for Financial Services Regulatory Compliance

Using Big Data Analytics for Financial Services Regulatory Compliance Using Big Data Analytics for Financial Services Regulatory Compliance Industry Overview In today s financial services industry, the pendulum continues to swing further in the direction of lower risk and

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Top Data Management Terms to Know Fifteen essential definitions you need to know

Top Data Management Terms to Know Fifteen essential definitions you need to know Top Data Management Terms to Know Fifteen essential definitions you need to know We know it s not always easy to keep up-to-date with the latest data management terms. That s why we have put together the

More information

Big Data - Infrastructure Considerations

Big Data - Infrastructure Considerations April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

More information