A Survey on Big Data Analytical Tools




Mr. Mahesh G Huddar, Sr. Lecturer, Dept. of Computer Science and Engineering, Hirasugar Institute of Technology, Nidasoshi, Karnataka, India
Manjula M Ramannavar, Asst. Professor, Dept. of Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India

Abstract: Due to the increasing use of social media forums, email, documents, sensor data, etc., data is generated at an exponential rate. This growth has affected all fields, whether the business sector or the world of science. A larger amount of data can give better output, but working with it can become a challenge due to processing limitations. Achieving the full use of data in this increasingly digital world requires not only new data analysis algorithms but also a new generation of systems and distributed computing environments that can handle the increase in volume, the lack of structure, and the growing computational needs of massive-scale analytics. In this paper, we review different big data analytical tools. We cover a variety of platforms for big data analytics and compare them based on computing environment, owner, latency, operational mode, data shapes, Hadoop dependency, schema, license, query language and source code.

Keywords: Big Data, Enterprise, Open Source, Business Intelligence, Metadata, SQL, HDFS

I. INTRODUCTION

We are in a flood of data today. Statistics show that we create around 2.5 exabytes of data daily, that 90% of the world's data has been created in the last two years, and that the amount is growing exponentially. To get an idea of the scale involved: one exabyte (EB) equals 10^18 bytes, i.e., 10^9 GB. [19]

A. What is Big Data?

The term Big Data was first introduced to the computing world by Roger Magoulas of O'Reilly Media in 2005, to describe a great amount of data that traditional data management techniques cannot manage and process due to its complexity and size.
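The unit arithmetic in the introduction can be sanity-checked with a short sketch (assuming SI prefixes, i.e., 1 EB = 10^18 bytes):

```python
# Sanity-check the storage-unit arithmetic used in the introduction.
GB = 10**9   # one gigabyte in bytes (SI prefix)
EB = 10**18  # one exabyte in bytes (SI prefix)

print(EB // GB)          # gigabytes per exabyte: 1 EB = 10^9 GB
daily_bytes = 2.5 * EB   # ~2.5 EB of data created per day
print(daily_bytes / GB)  # i.e., 2.5e9 GB per day
```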
Big Data [2] refers to the large amounts of data that are collected over time and are difficult to analyze using traditional database tools. This data comes from everywhere: posts to social media sites, digital videos and pictures, sensors used to gather climate information, cell phone GPS signals, and online purchase transaction records, to name a few. According to MIKE 2.0, the open source standard for information management, Big Data is defined by its size, comprising a large, complex and independent collection of data sets, each with the potential to interact. In addition, an important aspect of Big Data is that it cannot be handled with standard data management techniques due to the inconsistency and unpredictability of the possible combinations. [18]

B. Characteristics of Big Data

There are four characteristics of Big Data: Volume, Velocity, Variety and Veracity.

Volume: Volume is the first and most prominent characteristic. In the year 2000, 800,000 petabytes of data were stored in the world; this number is expected to reach 35 zettabytes by 2020. Twitter and Facebook generate around 7 TB and 10 TB of data every day, respectively, and some organizations generate data in terms of terabytes per hour. As the term Big Data implies, organizations are facing massive volumes of data. Organizations that do not know how to manage this data face a big problem, but analytical tools let them analyze the data and make the best use of it for the organization's growth.

Variety: Variety refers to different types of data. With the increased use of smart devices, sensors, and social collaboration technologies, data has become large and complex, because it includes not only traditional relational data but also semi-structured and unstructured data from different sources such as web pages, search indexes, e-mail, documents, sensor data, social media forums, and web log files (including click-stream data). Organizations should choose an analytical tool that supports both traditional and non-traditional methods of data analysis, as traditional analytical tools are limited to structured data. An organization's success depends on its ability to analyze both relational and non-relational data.

Velocity: Velocity refers to how quickly data is generated and stored, and to its associated rates of processing and retrieval. Nowadays, organizations deal with data sizes of hundreds of terabytes, petabytes, exabytes and beyond, generated at an ever-increasing rate; it has become impossible for traditional systems to handle it. Organizations must therefore be able to analyze this large and varied data in real time or near real time to find insights in it, and must choose an analytical tool that deals effectively with Big Data.

Veracity: Veracity refers to the degree to which a leader trusts the information used to make a decision; finding the right correlations in Big Data is therefore very important for the future of the business. [16]

In addition, Gartner's IT Glossary defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. [17]

Special Issue - IDEAS-2013, ISSN: 2278-621X

C. Uses of Big Data

This large and varied data must be analyzed for different reasons. Organizations can use it to make decisions that yield a competitive advantage. For example, service providers can analyze call detail record (CDR) data to assess their quality of service and initiate the necessary improvements. Customer transactions can help a credit card company detect fraud. In an organization, fraud can be detected by analyzing server logs, and user navigation patterns can be understood by analyzing web logs.
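The server-log analysis mentioned above can be sketched in a few lines. The log format, event names and threshold below are illustrative assumptions, not a real log schema:

```python
from collections import Counter

# Toy server log as (client_ip, event) pairs; the format and the
# "3 failures" threshold are assumptions made for illustration.
log = [
    ("10.0.0.1", "login_ok"),
    ("10.0.0.2", "login_failed"),
    ("10.0.0.2", "login_failed"),
    ("10.0.0.2", "login_failed"),
    ("10.0.0.3", "login_ok"),
]

# Count failed logins per client and flag any client at or over the threshold.
failures = Counter(ip for ip, event in log if event == "login_failed")
suspicious = [ip for ip, n in failures.items() if n >= 3]
print(suspicious)  # -> ['10.0.0.2']
```

Real fraud-detection pipelines apply far richer models, but the shape is the same: aggregate events per entity, then flag outliers.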
To understand customer behavior and interest, an organization can analyze customers' e-mails. Product sales records can help it understand problems with a product as well as customer interest in it.

D. Managing and Analyzing Big Data

The most important question at this point is how to store and process such a huge amount of data, most of which is raw, semi-structured, or unstructured. Big data platforms are categorized by how they store and process data in a scalable, fault-tolerant and efficient manner [10]. Two important information management styles for handling big data are relational DBMS products enhanced for analytic workloads (often known as analytic RDBMSs, or ADBMSs) and non-relational techniques (sometimes known as NoSQL systems) for handling raw, semi-structured and unstructured data. Non-relational techniques can be used to produce statistics from big data, or to preprocess big data before it is combined into a data warehouse.

Figure 1. Big Data Management

E. Big Data Analysis

When a business can make use of all the information available, rather than just part of it, it has a powerful advantage over its market competitors. Big Data analytics can help to gain insights and make better choices, and provides an opportunity to create unmatched business benefits and better service delivery. It also requires new infrastructure and a new way of thinking about how business and the IT market work. The idea of Big Data is going to change the way we do things today. International Data Corporation (IDC) research forecasts that overall data will grow 50-fold by 2020, driven mainly by more embedded systems such as sensors in clothing, in medical devices and in structures like buildings and bridges. The research also predicts that unstructured data, such as files, email and video, will account for 90% of all data created over the next several years.
But the number of IT professionals available to manage all that data will grow by only 1.5 times the present levels.
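The preprocessing role of non-relational techniques described above can be sketched as a tiny "schema-on-read" step: semi-structured records are projected onto the fixed columns a warehouse expects. The field names are invented for illustration:

```python
import json

# Semi-structured input records (e.g., JSON event logs). Fields vary
# per record, which is what makes a fixed relational schema awkward.
raw = [
    '{"user": "alice", "action": "view", "item": "p1"}',
    '{"user": "bob", "action": "buy", "item": "p2", "price": 9.99}',
    '{"user": "alice", "action": "buy", "item": "p3", "price": 4.50}',
]

# Schema-on-read: project each record onto the warehouse's columns,
# filling missing fields with None instead of rejecting the record.
columns = ("user", "action", "item", "price")
rows = [tuple(json.loads(r).get(c) for c in columns) for r in raw]
print(rows[1])  # -> ('bob', 'buy', 'p2', 9.99)
```

At scale this projection would run inside a NoSQL or Hadoop layer, with the resulting tabular rows loaded into the data warehouse.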

The digital universe is 1.8 trillion gigabytes in size, stored in 500 quadrillion files, and its size more than doubles every two years. If we compare the digital universe with our physical universe, there are nearly as many bits of information in the digital universe as there are stars in our physical universe.

F. Characteristics of Big Data Platforms

A Big Data platform should provide a solution designed specifically with the needs of the enterprise in mind. The basic features of a Big Data platform are: comprehensive, enterprise-ready, integrated, open source based, low-latency reads and updates, robust and fault-tolerant, scalable, extensible, supports ad-hoc queries, and requires little maintenance.

Figure 2. Developing a Big Data Strategy [15]

II. ENTERPRISE BIG DATA ANALYTICAL TOOLS

A. CitusDB

CitusDB [3] is a scalable and robust analytical tool built on top of PostgreSQL and designed with parallelism in mind. CitusDB is the first analytical database that enables execution of distributed SQL queries on data external to the database, giving fast and flexible access to massive volumes of data. Applicable datasets include event streams, user actions, machine-generated data and log files. CitusDB partitions this massive data and efficiently executes queries that involve groupings, look-ups, orderings, and complex selections; it also supports JOINs between multiple small tables and one large table. CitusDB enables real-time responsiveness: for simple queries, run time is around 100 ms, increasing with dataset size and query complexity. Real-time insertion and deletion are not available in CitusDB, and it does not support real-time analytics.

B. Google BigQuery

Google BigQuery uses SQL to analyze Big Data and delivers real-time business insights in seconds. It is a managed data analysis service, with no need for server installation or maintenance.
The features of Google BigQuery [4] are:

Figure 3. Google BigQuery

Managing data: creation and deletion of tables is based on a JSON-encoded schema, and data is imported from Google Storage.
Query: queries are expressed in a Structured Query Language (SQL) dialect, and results of up to around 64 MB are returned in JSON. There are some limitations relative to standard SQL queries.
Integration: it is easy to integrate BigQuery with Google Spreadsheets and Google Apps Script.
Access control: handled in BigQuery via Google Storage.

C. Greenplum HD

Greenplum HD [5] allows customers to get started with big data analytics without having to build an entirely new undertaking. It is available as software or in a preconfigured Data Computing Appliance module. Greenplum HD is a 100% open-source, certified and supported version of the Apache Hadoop stack that includes HDFS, Pig, MapReduce, HBase, ZooKeeper and Hive. It constitutes a complete data analysis platform, combining Hadoop and the Greenplum database in a single Data Computing Appliance. Whether delivered as software or as the preconfigured appliance module, Greenplum HD provides a complete platform, including installation, training and global support, beyond a simple repackaging of the Apache Hadoop distribution. Greenplum HD makes Hadoop faster, more reliable, and easier to use. A comprehensive big data analytics solution can be quickly deployed on HDFS by combining Greenplum HD with Isilon scale-out NAS storage systems, yielding a highly efficient and flexible data storage and analytics environment. The Greenplum HD DCA module integrates the Greenplum HD software into an appliance, offering an optimized environment tuned for performance and reliability. The Greenplum Data Computing Appliance harnesses Hadoop's batch processing to handle unstructured data, allowing businesses to extract value from both structured and unstructured data on a single, seamless platform.

D. Hadapt

Hadapt's [6] flagship product is the Adaptive Analytical Platform, which brings a standard implementation of Structured Query Language (SQL) to the open source Apache Hadoop project. Hadapt enables interactive SQL-based analytics of large data sets by combining the scalable and robust architecture of Hadoop with a hybrid storage layer. Hadapt 2.0 delivers Hadapt Interactive Query, the Hadapt Development Kit for custom analytics, and integration with Tableau Software on Apache Hadoop.

Figure 4. Hadapt

III. OPEN SOURCE BIG DATA ANALYTICAL TOOLS

A. Hadoop

Apache Hadoop [1], [7] is an open source project that allows the distributed processing of massive data sets across clusters of servers. Hadoop is designed to scale from a single server to thousands of servers, with a high degree of fault tolerance. Hadoop can detect and handle faults at the application layer without depending on high-end hardware. The characteristics of Hadoop are: scalable, flexible, fault tolerant, and cost effective.
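Hadoop's programming model can be illustrated with a single-process sketch of its classic word-count job. This is a simulation of the map, shuffle and reduce phases, not actual Hadoop code:

```python
from collections import defaultdict
from itertools import chain

docs = ["big data tools", "big data analytics", "open source tools"]

# Map phase: each input record is turned into (word, 1) pairs.
def mapper(line):
    return [(word, 1) for word in line.split()]

# Shuffle phase: intermediate pairs are grouped by key.
groups = defaultdict(list)
for key, value in chain.from_iterable(mapper(d) for d in docs):
    groups[key].append(value)

# Reduce phase: the values for each key are summed into a final count.
counts = {word: sum(values) for word, values in groups.items()}
print(counts["big"], counts["tools"])  # -> 2 2
```

In real Hadoop, the mapper and reducer run as separate tasks across the cluster and the shuffle moves data between nodes; the logic, however, is exactly this.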

B. Apache Drill and Dremel

Figure 5. Hadoop MapReduce Enterprise

Apache Drill [8] and Dremel [9] make it possible to execute large-scale, ad-hoc queries with low latency; these tools can scan petabytes of data in a few seconds. Apache Drill and Dremel put power not only in the hands of data engineers but also of business analysts. Business organizations like these tools a great deal, but they have not yet attracted a strong development community. Hadoop has the disadvantage of using batch processing for all workflows, and the Hadoop team has worked hard to incorporate ad-hoc analytics. Many interface layers, such as Sawzall and Pig, have been built on top of Hadoop to make it more user friendly and business-accessible. In contrast to workflow-based analysis, most business-driven Business Intelligence and analytics queries are interactive, ad-hoc, low-latency analyses. Writing MapReduce workflows is limiting for many business analysts, and in an interactive application it is not acceptable to wait several minutes for a job to start and finish. Because Apache Drill and Dremel can execute ad-hoc queries with low latency, it has been argued that they are better suited than Apache Hadoop for such workloads and may be a replacement for it.

C. Apache Hive

Apache Hive [10] is Apache Hadoop's data warehouse system. Hive offers ad-hoc queries, easy data summarization, and analysis of massive datasets. Hive uses an SQL-like query language, HiveQL, which provides a mechanism to query the data. Traditional MapReduce programmers can also plug in custom mappers and reducers when it is inefficient or inconvenient to express them in HiveQL.

D. Cloudera Impala

Cloudera Impala [11] is an open source MPP query engine that runs on top of Apache Hadoop.
Cloudera Impala brings a scalable, parallel database to Apache Hadoop, letting users run low-latency SQL queries on data stored in the Hadoop Distributed File System and Apache HBase without requiring data transformation or movement. With Cloudera Impala, data scientists and analysts can perform real-time analytics via SQL or BI tools on data stored in Hadoop. Both interactive and large-scale batch queries can be run on the same data and metadata on the same system, without migrating to specialized systems or proprietary formats just to perform analysis.

E. Giraph

Giraph [12] is an analytical tool for graph analysis. Such tools are often coupled with graph databases like InfiniteGraph or Neo4j; another graph-based project is GoldenOrb. Graph databases are fairly cutting edge. Graphs do a great job in social networks, mapping, computer networks, geographic pathways (calculating shortest routes, for example), and in general anything that links data together; they are also used in physics and bioscience. Graph databases, big-picture analysis languages and frameworks are examples of how the world is starting to realize that Big Data is not about a single programming model or a single database framework. Graph-database-based techniques are killer applications, more specifically, for the analysis of large networks with many linked pathways.

IV. COMPARISON BETWEEN ENTERPRISE TOOLS AND OPEN SOURCE TOOLS

A. Based on What They Offer

Table 1 is a comparison chart for ad-hoc query tools for the interactive analysis of large data sets. Apache Drill lets a user deal with many different types of (large) data sources, and hence removes the need for expensive and error-prone ETL (Extract-Transform-Load) of the data.

Table 1. Comparison Chart for Big Data Analytical Tools [13] [14]

Apache Drill - Owner: Community; Low latency: Yes; Operational mode: On-premise; Data shapes: Nested, tabular; Data sources: HDFS, HBase, Cassandra, MongoDB, RDBMS, etc.; Hadoop dependent: No; Schema: Optional; License: Apache 2.0; Source code: Open; Query languages: SQL 2003 subset, MongoQL, DSL, etc. (extensible); Columnar storage: Yes

Apache Hive - Owner: Community; Low latency: No; Operational mode: On-premise; Data shapes: Nested, tabular; Data sources: HDFS, HBase; Hadoop dependent: Yes; Schema: Required; License: Apache 2.0; Source code: Open; Query language: HiveQL; Columnar storage: Possible

Impala - Owner: Cloudera; Low latency: Yes; Operational mode: On-premise; Data shapes: Nested, tabular; Data sources: HDFS, HBase; Hadoop dependent: Yes; Schema: Required; License: Apache 2.0; Source code: Open; Query language: SQL/HiveQL subset; Columnar storage: Yes

Giraph - Owner: Community; Low latency: No; Operational mode: On-premise; Data shapes: Nested, tabular; Data sources: HDFS, HBase; Hadoop dependent: Yes; Schema: Required; License: Apache 2.0; Source code: Open; Query language: Graph Query Language, Cypher; Columnar storage: No

BigQuery - Owner: Google; Low latency: Yes; Operational mode: Hosted, SaaS offering; Data shapes: Nested, tabular; Data sources: N/A; Hadoop dependent: No; Schema: Required; License: ToS/SLA; Source code: Closed; Query language: SQL subset; Columnar storage: Yes

CitusDB - Owner: CitusData; Low latency: Yes; Operational mode: On-premise; Data shapes: Tabular; Data sources: PostgreSQL, MongoDB, HDFS; Hadoop dependent: No; Schema: Required; License: Commercial; Source code: Closed; Query language: SQL; Columnar storage: No

Hadapt - Owner: Hadapt; Low latency: Yes; Operational mode: On-premise; Data shapes: Tabular; Data sources: HDFS / RDBMS; Hadoop dependent: Yes; Schema: Required; License: Commercial; Source code: Closed; Query language: SQL; Columnar storage: No

HAWQ - Owner: Greenplum; Low latency: Yes; Operational mode: Part of Pivotal HD appliance; Data shapes: Tabular; Data sources: HDFS, HBase; Hadoop dependent: No; Schema: Required; License: Commercial; Source code: Closed; Query language: SQL subset; Columnar storage: Yes

B. Based on Security

With a set of observations or measurements one can compare a closed source project with an open source project and conclude that one is more secure than the other, but this conclusion must be based on factors other than whether the project is closed or open source. The design process, source code auditing, developer quality, secure design, and other factors play an important role in the security of a project, and none of these factors is directly tied to a project being closed or open source.
It is striking to see the vulnerabilities found in some closed source projects. This certainly does not mean that open source projects are more secure than closed source ones: the number of vulnerabilities in a given system does not depend on the openness or closedness of its source code. At the end of the day, the security of a system depends on how the project's developers handle security.

V. SELECTING THE RIGHT TOOLS FOR DATA ANALYTICS

The factors discussed in this paper have a significant impact on technology selection. Organizations are not ready to make risky investments in expensive alternatives just in case there is something more to be discovered; this is where multiple alternatives come into play. Existing proprietary and generally expensive storage and database solutions are being complemented by some of the more cost-effective emerging technologies, generally from the Apache Hadoop ecosystem. Initial discovery and exploration of large data volumes, where the "nuggets" are well hidden, can be performed in a Hadoop environment. Once the "nuggets" have been discovered and extracted, a reduced and more structured data set can then be fed into an existing data warehouse or analytics system. From that viewpoint, it makes sense for providers of existing storage, database, and data warehousing and analytics software to provide connectors and APIs to Hadoop solutions, and also to put together integrated offerings that work with both the proprietary and free components. While some of them rush to embrace Hadoop, there is no evidence that this is always a sensible and suitable move. As already described, many of the new big data technologies are not ready for mainstream business use, and organizations without the IT capabilities of the trailblazers or typical early adopters will welcome the support of established providers.

VI. CONCLUSION

To conclude, after this analysis of both closed and open source Big Data tools, it is evident that the choice comes down to the usage and needs of the individual or the company. Some tools are unaffordable at a personal level because of their prices and complexity, while open source systems may pose problems of obsolescence and modification; there are also security issues involved in choosing a tool. Open source promotes development and innovation and supports developers. Big data is on every CIO's mind, and for good reason: companies spent more than $4 billion on big data technologies in 2012, and these investments will in turn trigger a domino effect of upgrades and new initiatives valued at $34 billion for 2013.

REFERENCES

[1] P. Carter, Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO, IDC, September 2011.
[2] Marcus R. Wigan, Roger Clarke, Big Data's Big Unintended Consequences, IEEE Computer Society, pp. 46-53.
[3] http://www.citusdata.com/docs
[4] https://cloud.google.com/products/big-query
[5] http://www.greenplum.com/
[6] http://hadapt.com/product/
[7] http://hadoop.apache.org/
[8] http://www.mapr.com/support/community-resources/drill
[9] www.dremel.com/
[10] http://hive.apache.org/
[11] http://www.cloudera.com/content/cloudera/en/home.html
[12] http://giraph.apache.org/
[13] http://www.infoivy.com/2013/08/comparison-table-of-interactive.html
[14] http://online.liebertpub.com/action/showpopup?citid=citart1&id=t1&doi=10.1089%2fbig.2013.0011
[15] http://www.navint.com/
[16] P. Zikopoulos, T. Deutsch, D. Deroos, Harness the Power of Big Data, 2012, http://www.ibmbigdatahub.com/blog/harness-power-big-databook-excerpt
[17] Gartner, Big Data Definition, http://www.gartner.com/it-glossary/big-data/
[18] MIKE 2.0, Big Data Definition, http://mike2.openmethodology.org/wiki/big_data_definition
[19] G. Noseworthy, Infographic: Managing the Big Flood of Big Data in Digital Marketing, 2012, http://analyzingmedia.com/2012/infographichic-big-flood-of-big-data-in-digital-marketing/