A Survey on Big Data Analytical Tools
Mr. Mahesh G Huddar
Sr. Lecturer, Dept. of Computer Science and Engineering, Hirasugar Institute of Technology, Nidasoshi, Karnataka, India

Manjula M Ramannavar
Asst. Professor, Dept. of Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India

Abstract: Due to the increasing use of social media forums, e-mail, documents, sensor data and similar sources, data is being generated at exponential speed. The growth of data has affected all fields, whether the business sector or the world of science. A larger amount of data can give better results, but working with it can become a challenge because of processing limitations. Making full use of data in this increasingly digital world requires not only new data analysis algorithms but also a new generation of systems and distributed computing environments to handle the increase in volume, the lack of structure of the data, and the growing computational needs of massive-scale analytics. In this paper, we review different big data analytical tools. We try to cover a variety of platforms for big data analytics and compare them based on computing environment, owner, latency, operational mode, data shapes, Hadoop dependency, schema, license, query language and source code.

Keywords: Big Data, Enterprise, Open Source, Business Intelligence, Metadata, SQL, HDFS

I. INTRODUCTION

We are in a flood of data today. Statistics show that we create around 2.5 exabytes of data every day, that 90% of the world's data has been created in the last two years, and that the volume is growing exponentially. To give an idea of the amount of data being generated, one exabyte (EB) equals 10^18 bytes, that is, 10^9 GB. [19]

A. What is Big Data?

The term Big Data was first introduced to the computing world by Roger Magoulas of O'Reilly Media in 2005 to describe an amount of data so large and complex that traditional data management techniques cannot manage and process it.
Big Data [2] is the large amount of data that is collected over time and is difficult to analyze using traditional database tools. This data comes from everywhere: posts on social media sites, digital videos and pictures, sensors used to gather climate information, cell phone GPS signals, and online purchase transaction records, to name a few. According to MIKE 2.0, the open source standard for Information Management, Big Data is defined by its size, comprising a large, complex and independent collection of data sets, each with the potential to interact. In addition, an important aspect of Big Data is that it cannot be handled with standard data management techniques because of the inconsistency and unpredictability of the possible combinations. [18]

B. Characteristics of Big Data

There are four characteristics of Big Data: Volume, Velocity, Variety and Veracity.

Volume: Volume is the first and most notorious feature. In the year 2000, 800,000 petabytes of data were stored in the world. This number is expected to grow to 35 zettabytes. Twitter and Facebook generate around 7 TB and 10 TB of data every day, respectively. Some organizations generate terabytes of data per hour. As the term Big Data implies, organizations are facing large volumes of data, and those that do not know how to manage it face a serious problem. Organizations can, however, use analytical tools to analyze the data and make the best use of it for their growth.

Variety: Variety refers to the different types of data. With the increased use of smart devices, sensors and social collaboration technologies, data has become large and complex, because it includes not only traditional relational data but also semi-structured and unstructured data from sources such as web pages, search indexes, e-mail, documents, sensor data, social media forums and web log files (including click-stream data). Because traditional analytical tools are limited to structured data, organizations should choose an analytical tool that combines traditional and non-traditional methods of data analysis. An organization's success depends on its ability to analyze both relational and non-relational data.

Special Issue - IDEAS ISSN: X

Velocity: Velocity refers to how quickly data is generated and stored, and to the associated rates of processing and retrieval. Nowadays organizations deal with data sizes of hundreds of terabytes, petabytes and even exabytes, and this data is generated at an ever-increasing rate that traditional systems cannot handle. Organizations must therefore be able to analyze this large and varied data in real time or near real time to find insights, and must choose a better analytical tool to deal effectively with Big Data.

Veracity: Veracity refers to the degree to which a decision maker trusts the information used to make a decision. Getting the right correlations in Big Data is therefore very important for the future of the business. [16]

In addition, Gartner's IT Glossary defines Big Data as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. [17]

C. Uses of Big Data

This large and varied data must be analyzed for different reasons. Organizations can use it to make decisions that give them a competitive advantage. For example, service providers can analyze call detail record (CDR) data to assess their quality of service and initiate the necessary improvements. Customer transactions can help a credit card company detect fraud. Within an organization, fraud can be detected by analyzing server logs. User navigation patterns can be understood by analyzing web logs.
Customer e-mails can help in understanding customer behavior and interests. Product sales records can help in understanding problems with the products as well as customer interest in them.

D. Managing and Analyzing Big Data

The most important question at this point is how to store and process such a huge amount of data, most of which is raw, semi-structured or unstructured. Big data platforms are categorized by how they store and process data in a scalable, fault-tolerant and efficient manner [10]. Two important information management styles for handling big data are relational DBMS products optimized for analytical workloads (often known as analytic RDBMSs, or ADBMSs) and non-relational techniques (sometimes known as NoSQL systems) for handling raw, semi-structured and unstructured data. Non-relational techniques can be used to produce analytics from big data, or to preprocess big data before it is consolidated into a data warehouse.

Figure 1. Big Data Management

E. Big Data Analysis

A business that can make use of all the information available to it, rather than just a part, has a powerful advantage over its competitors. Big Data analytics can help to gain insights and make better decisions. It provides an opportunity to create unprecedented business advantage and better service delivery. It also requires new infrastructure and a new way of thinking about the way business and the IT industry work. The idea of Big Data is going to change the way we do things today.

International Data Corporation (IDC) research forecasts that overall data will grow by 50 times by 2020, driven mainly by more embedded systems such as sensors in clothing, medical devices and structures like buildings and bridges. The research also found that unstructured data, such as files and video, will account for 90% of all data created over the next several years. But the number of IT professionals available to manage all that data will grow by only 1.5 times the present level.
The digital universe is 1.8 billion gigabytes in size and is stored in 500 quadrillion files. Its size more than doubles every two years. Compared with the physical universe, there are nearly as many bits of information in the digital universe as there are stars in the physical universe.

F. Characteristics of Big Data Platforms

A Big Data platform should provide a solution designed specifically with the needs of the enterprise in mind. The basic features of a Big Data platform are: comprehensive, enterprise-ready, integrated, open source based, low-latency reads and updates, robust and fault-tolerant, scalable, extensible, support for ad hoc queries, and minimal maintenance.

Figure 2. Developing a Big Data Strategy [15]

II. ENTERPRISE BIG DATA ANALYTICAL TOOLS

A. Citus DB

Citus DB [3] is a scalable and robust analytical tool built on top of PostgreSQL and designed with parallelism in mind. Citus DB is the first analytical database that enables distributed SQL queries on data that is external to the database. It gives fast and flexible access to massive volumes of data. Applicable data sets include event streams, user actions, machine-generated data and log files. Citus DB partitions this data and efficiently executes queries on it that involve groupings, look-ups, orderings and complex selections. Citus DB also supports JOINs between one large table and multiple small tables. Citus DB is responsive: simple queries run in around 100 ms, and run time increases with data set size and query complexity. However, real-time insertion and deletion are not available in Citus DB, and it does not support real-time analytics.

B. Google BigQuery

Google BigQuery uses SQL to analyze Big Data and gives real-time business insights in seconds. It is a managed data analysis service that requires no server installation or maintenance.
Features of Google BigQuery [4] are:

Managing data: Tables are created and deleted based on a JSON-encoded schema, and data is imported from Google Storage.
Query: Queries are expressed in a dialect of Structured Query Language, and results, around 64 MB in size, are returned as JSON. There are some limitations compared with standard SQL queries.
Integration: BigQuery is easy to integrate with Google Spreadsheets and Google Apps Script.
Access control: Access control in BigQuery is handled via Google Storage.

Figure 3. Google Big Query

C. Greenplum HD

Greenplum [4] HD allows customers to start with big data analytics without the need to build an entirely new project. It is supplied as software or can be used in a preconfigured Data Computing Appliance (DCA) module. Greenplum HD is a 100% open-source, certified and supported version of the Apache Hadoop stack that contains HDFS, Pig, MapReduce, HBase, ZooKeeper and Hive. It consists of a complete data analysis platform and combines Hadoop and the Greenplum database in a single Data Computing Appliance. Greenplum HD provides a complete platform, including installation, training and global support, beyond simple packaging of the Apache Hadoop distribution.

Greenplum HD makes Hadoop faster, more reliable and easier to use. A comprehensive big data analytics solution can be set up quickly using HDFS, combining Greenplum HD with Isilon scale-out NAS storage systems to provide a highly efficient and flexible data storage and analytics environment. The Greenplum HD DCA module integrates the Greenplum HD software into an appliance, offering an optimized environment designed for performance and reliability. The Greenplum Data Computing Appliance uses the power of Hadoop batch processing to process unstructured data. This allows businesses to extract value from both structured and unstructured data on a single, seamless platform.

D. Hadapt

Hadapt's [6] flagship product is an Adaptive Analytical tool, which brings a standard implementation of Structured Query Language (SQL) to the open source Apache Hadoop project. Hadapt enables interactive SQL-based analytics of large data sets by combining the scalable and robust architecture of Hadoop with a hybrid storage layer. Hadapt 2.0 delivers Hadapt Interactive Query, the Hadapt Development Kit for custom analytics, and integration with Tableau Software using Apache Hadoop.

Figure 4. Hadapt

III. OPEN SOURCE BIG DATA ANALYTICAL TOOLS

A. Hadoop

Apache Hadoop [1], [7] is an open source project that allows the distributed processing of massive data sets across clusters of servers. Hadoop is designed to scale from a single server to thousands of servers with a high degree of fault tolerance. Hadoop can detect and handle faults at the application layer without depending on high-end hardware. The characteristics of Hadoop are:

Scalable
Flexible
Fault tolerant
Cost effective
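Hadoop's distributed processing is usually expressed in the MapReduce model, which the tool descriptions in this survey refer to repeatedly. The following is a minimal single-process sketch of that model for a word-count job, not Hadoop's own API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration, and the whole data set is assumed to fit in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory data set."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    print(word_count(["big data tools", "big data analytics"]))
```

In a real cluster the map and reduce steps run in parallel on many servers and the shuffle moves data between them over the network; the sketch only shows how the three steps fit together.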
Figure 5. Hadoop MapReduce Enterprise

B. Apache Drill and Dremel

Apache Drill [8] and Dremel [9] make it possible to execute large-scale, ad hoc queries with low latency. These tools can scan petabytes of data in a few seconds. Apache Drill and Dremel put power in the hands not only of data engineers but also of business analysts. Business organizations like these tools a great deal, but they have not yet attracted strong attention from the development community.

Hadoop has certain disadvantages because it uses batch processing for all workflows. The Hadoop team has worked very hard to incorporate ad hoc analytics, and many interface layers, such as Sawzall and Pig, have been built on top of Hadoop to make it more user friendly and business accessible. In contrast to workflow-based analysis, most business-driven Business Intelligence and analytics queries are interactive, ad hoc and low latency. Writing MapReduce workflows is restrictive for many business analysts, and in an interactive application it is not desirable to wait several minutes for a job to start and finish. Because Apache Drill and Dremel can execute ad hoc queries with low latency, it has been argued that they are better than Apache Hadoop and may replace it.

C. Apache Hive

Apache Hive [10] is Apache Hadoop's data warehouse system. Hive offers ad hoc queries, easy data summarization and analysis of massive data sets. Hive uses a SQL-like query language, HiveQL, which provides a mechanism to query the data. Hive also allows traditional MapReduce programmers to plug in custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL.

D. Cloudera Impala

Cloudera Impala [11] is an open source MPP query engine that runs on top of Apache Hadoop.
Cloudera Impala brings scalable, parallel database technology to Apache Hadoop, giving users low-latency SQL queries on data stored in the Hadoop Distributed File System and Apache HBase without the need for data transformation or movement. With Cloudera Impala, data scientists and analysts can perform real-time analytics on data stored in Hadoop via SQL or BI tools. Both interactive and large-scale data processing queries can run on the same system using the same data and metadata, without the need to migrate to specialized systems or proprietary formats just to perform analysis.

E. Giraph

The Giraph [12] analytical tool enables graph analysis. Such tools are often coupled with graph databases like InfiniteGraph or Neo4j. Another tool for graph-based projects is GoldenOrb. Graph databases are still cutting edge. Graphs do a great job of modeling social networks, maps, computer networks and geographic pathways (for example, for calculating shortest routes) and, in general, anything that links data together. Graphs are also used in physics and bioscience. Graph databases and graph analysis languages and frameworks are examples of how the world is starting to realize that Big Data is not about having one programming framework or one database framework. Graph-based techniques are a killer application, specifically for the analysis of large networks with many linked pathways.

IV. COMPARISON BETWEEN ENTERPRISE TOOLS AND OPEN SOURCE TOOLS
A. Based on What They Offer

Table 1 is a comparison chart of ad hoc query tools for interactive analysis of large data sets. Apache Drill allows a user to work with many different types of large data sources and hence removes the need for expensive and error-prone ETL (Extract-Transform-Load) of data.

Table 1. Comparison Chart for Big Data Analytical Tools [13] [14]

| | Apache Drill | Apache Hive | Impala | Giraph | BigQuery | CitusDB | Hadapt | HAWQ |
|---|---|---|---|---|---|---|---|---|
| Owner | Community | Community | Cloudera | Community | Google | CitusData | Hadapt | Greenplum |
| Low latency | Yes | No | Yes | No | Yes | Yes | Yes | Yes |
| Operational mode | On-premise | On-premise | On-premise | On-premise | Hosted, SaaS offering | On-premise | On-premise | Part of Pivotal HD appliance |
| Data shapes | Nested, tabular | Nested, tabular | Nested, tabular | Nested, tabular | Nested, tabular | Tabular | Tabular | Tabular |
| Data sources | HDFS, HBase, Cassandra, MongoDB, RDBMS, etc. | HDFS, HBase | HDFS, HBase | HDFS, HBase | N/A | PostgreSQL, MongoDB, HDFS | HDFS / RDBMS | HDFS, HBase |
| Hadoop dependent | No | Yes | Yes | Yes | No | No | Yes | No |
| Schema | Optional | Required | Required | Required | Required | Required | Required | Required |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 | ToS/SLA | Commercial | Commercial | Commercial |
| Source code | Open | Open | Open | Open | Closed | Closed | Closed | Closed |
| Query languages | Extensible: SQL 2003, MongoQL, DSL, etc. | HiveQL | SQL/HiveQL subset | Graph Query Language, Cypher | SQL subset | SQL | SQL subset | SQL subset |
| Columnar storage | Yes | Possible | Yes | No | Yes | No | No | Yes |

B. Based on Security

With a set of observations or measurements, one can compare a closed source project with an open source project and conclude that one is more secure than the other, but this conclusion must rest on factors other than whether the project is closed or open source. The design process, source code auditing, the quality of the developers, secure design and other factors all play an important role in the security of a project, and none of these factors is directly related to a project being closed or open source.
It is surprising to see some of the vulnerabilities found in closed source projects. This certainly does not mean that open source projects are more secure than closed source projects; the number of vulnerabilities present in a given system does not depend on whether its source code is open or closed. At the end of the day, the security of a system depends on the way the project's developers handle security.

V. SELECTING THE RIGHT TOOLS FOR DATA ANALYTICS

The factors discussed in this paper have a significant impact on technology selection. Organizations are not ready to make risky investments in expensive solutions just in case there is something more to be discovered. This is where combining multiple solutions comes into play. Existing proprietary and generally expensive storage and database solutions are being supplemented by some of the more cost-effective emerging technologies, generally from the Apache Hadoop ecosystem. Initial exploration and analysis of large data volumes, where the "nuggets" are well hidden, can be performed in a Hadoop environment. Once the "nuggets" have been found and extracted, a reduced and more structured data set can be fed into an existing data warehouse or analytics system.

From that viewpoint, it makes sense for vendors of existing storage, database, data warehousing and analytics software to provide connectors and APIs to Hadoop solutions, and to put together integrated offerings that combine the proprietary and open source components. While some of them hurry to adopt Hadoop, there is no evidence that it is a sensible and suitable move. As already described, many of the new big data technologies are not ready for mainstream enterprise use, and organizations without the IT capabilities of the trailblazers or typical early adopters will welcome the support of established vendors.

VI. CONCLUSION

To conclude, after analyzing both closed and open source Big Data tools, it is evident that the choice depends on the usage and needs of the individual or the company. Some tools are unaffordable at a personal level because of their price and complexity, while open source systems may pose problems of obsolescence and modification. There are also security issues involved in choosing a tool. Open source promotes development and innovation and supports developers. Big data is on every CIO's mind, and for good reason: companies have spent more than $4 billion on big data technologies. These investments will in turn trigger a domino effect of upgrades and new initiatives valued at $34 billion.

REFERENCES

[1] P. Carter, Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO, IDC, September
[2] Marcus R. Wigan, Roger Clarke, Big Data's Big Unintended Consequences, IEEE Computer Society, pp.
[3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]
[16] P. Zikopoulos, T. Deutsch, D. Deroos, Harness the Power of Big Data, 2012,
[17] Gartner, Big Data Definition,
[18] MIKE 2.0, Big Data Definition,
[19] G. Noseworthy, Infographic: Managing the Big Flood of Big Data in Digital Marketing,
1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
Big Data and Apache Hadoop Adoption:
Expert Reference Series of White Papers Big Data and Apache Hadoop Adoption: Key Challenges and Rewards 1-800-COURSES www.globalknowledge.com Big Data and Apache Hadoop Adoption: Key Challenges and Rewards
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
Information Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
The Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM
Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that
The Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6
Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...
Modernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist [email protected] O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
Bringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
Big Data Can Drive the Business and IT to Evolve and Adapt
Big Data Can Drive the Business and IT to Evolve and Adapt Ralph Kimball Associates 2013 Ralph Kimball Brussels 2013 Big Data Itself is Being Monetized Executives see the short path from data insights
Using Tableau Software with Hortonworks Data Platform
Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
Data Warehouse design
Data Warehouse design Design of Enterprise Systems University of Pavia 10/12/2013 2h for the first; 2h for hadoop - 1- Table of Contents Big Data Overview Big Data DW & BI Big Data Market Hadoop & Mahout
Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning
Hadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP Business Analytics for All Amsterdam - 2015 Value of Big Data is Being Recognized Executives beginning to see the path from data insights to revenue
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
A Survey on Big Data Concepts and Tools
A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering
White Paper: What You Need To Know About Hadoop
CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
BIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
The Enterprise Data Hub and The Modern Information Architecture
The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP
Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify
Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012
Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation
MapReduce with Apache Hadoop Analysing Big Data
MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside [email protected] About Journey Dynamics Founded in 2006 to develop software technology to address the issues
In-Memory Analytics for Big Data
In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Business Intelligence for Big Data
Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,
BIRT in the World of Big Data
BIRT in the World of Big Data David Rosenbacher VP Sales Engineering Actuate Corporation 2013 Actuate Customer Days Today s Agenda and Goals Introduction to Big Data Compare with Regular Data Common Approaches
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Big Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
