Big Data Analytics processing with Apache Hadoop storage


R.Gayathri 1, M.BalaAnand 2
1 Assistant Professor, Computer Science and Engineering, V.R.S. College of Engineering and Technology, Tamil Nadu, India
2 Assistant Professor, Computer Science and Engineering, V.R.S. College of Engineering and Technology, Tamil Nadu, India

Abstract - Big Data is a term applied to data sets whose size is beyond the ability of conventional software technologies to capture, store, manage and process within an acceptable elapsed time. The popular assumption about Big Data analytics is that it requires web-scale scalability: hundreds of compute nodes with attached storage. In this paper, we debate the need for a massively scalable distributed computing platform for Big Data analytics in traditional enterprises. For organizations which do not need horizontal, web-scale scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system using a shared-storage model. In this study, we compared a widely used clustered file system (SF-CFS) with the Hadoop Distributed File System (HDFS) using popular Map-Reduce benchmarks. In our experiments VxCFS not only matched the performance of HDFS but also outperformed it in many cases. Enterprises can therefore meet their Big Data analytics needs with a traditional, existing shared-storage model without migrating to a different storage model in their data centers. This also brings other benefits such as stability and robustness, a rich set of features and compatibility with traditional analytics applications.

Key words - Big Data; Hadoop; Clustered File Systems; Analytics; Cloud

I. INTRODUCTION
The exponential growth of data over the last decade has introduced a new domain in the field of information technology called Big Data. Datasets that stretch the limits of traditional data processing and storage systems are often referred to as Big Data. The need to process and analyze such huge datasets has given rise to a new form of data analytics called Big Data Analytics. Big Data analytics involves analyzing large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Many organizations increasingly use Big Data analytics to gain better insight into their businesses, increase their revenue and profitability, and gain competitive advantage over rival organizations. The characteristics of Big Data can be broadly described by four Vs: Volume, Velocity, Variety and Variability. Volume refers to the size of the data, Velocity to the rate at which data is generated, while Variety and Variability describe the complexity and structure of the data and the different ways of interpreting it. A common assumption about applications which consume or analyze Big Data is that they require a massively scalable and parallel infrastructure. This assumption is correct and makes sense for web-scale organizations like Facebook or Google.

For traditional enterprise companies, however, this is usually not the case. According to the Apache Hadoop wiki [3], a large number of Hadoop deployments in enterprises do not exceed 16 nodes. In such scenarios, the role of the traditional storage model, where shared storage serves conventional as well as Big Data analytics needs, cannot be ruled out. A Big Data analytics platform today usually refers to the Map-Reduce framework developed by Google [4] and the tools and ecosystem built around it. The Map-Reduce framework provides a programming model based on "map" and "reduce" functions over key-value pairs that can be executed in parallel on a large cluster of compute nodes. Apache Hadoop [1] is an open-source implementation of Google's Map-Reduce model and has become extremely popular over the years for building Big Data analytics platforms. The other key aspect of Big Data analytics is pushing computation close to the data. Typically, in a Map-Reduce environment, the compute and storage nodes are the same, i.e., the computational tasks run on the same set of nodes that hold the data required for the computations. By default, Apache Hadoop uses the Hadoop Distributed File System (HDFS) [2] as the underlying storage backend, but it is designed to work with other file systems as well. HDFS is not a POSIX-compliant file system, and once data is written it is not modifiable (a write-once, read-many access model). HDFS protects data by replicating data blocks across multiple nodes, with a default replication factor of 3. In this paper, we examine the reasoning behind the perceived need for a new, non-POSIX storage stack for Big Data analytics and advocate, based on evaluation and analysis, that such a platform can be built on traditional POSIX-based cluster file systems. Traditional cluster file systems are often viewed with the presumption that they require expensive high-end servers with state-of-the-art SANs. Contrary to such impressions, these file systems can be configured using commodity or mid-range servers at lower cost. More importantly, they can support traditional applications that rely on POSIX APIs. The wide availability of tools, software applications and human expertise is another advantage of these file systems. Similar efforts have been undertaken by IBM Research [5], which introduced the concept of a metablock in GPFS to let a larger block granularity for Map/Reduce applications coexist with the smaller block granularity required for traditional applications, and compared the performance of GPFS. The rest of the paper is organized as follows. Section 2 describes the concept of shared-disk Big Data analytics. Section 3 describes the architecture of the Hadoop connector. Section 4 describes our experimental setup, followed by our experiments and results in Section 5. Section 6 discusses additional use cases and benefits of our solution. Future work as a continuation of the current proposal is described in Section 7, followed by the conclusion and references.
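To make the programming model concrete, the following is a minimal word-count sketch written against the org.apache.hadoop.mapreduce API that ships with Hadoop 1.x; the class name and input/output paths are illustrative and not part of the workloads evaluated in this paper. The map function emits a (word, 1) pair per token and the reduce function sums the counts per key, with many map and reduce tasks running in parallel across the cluster.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // "map": emit a (word, 1) pair for every token in this task's input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // "reduce": sum the counts for each distinct word; each reducer owns a
  // partition of the key space.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}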

II. SHARED DISK BIG DATA ANALYTICS
In our study we compare the performance of the Hadoop Distributed File System (HDFS), the de facto file system of Apache Hadoop, with a commercial cluster file system, VERITAS Storage Foundation Cluster File System (SF-CFS) by Symantec, using a variety of workloads and map-reduce applications. We show that a clustered file system can match the performance of HDFS for map/reduce workloads and can even outperform it in some cases. We used VMware virtual machines as compute nodes in our cluster and a mid-range storage array (Hitachi HUS130) for our study. While we understand that comparing a clustered file system running on top of a SAN with a distributed file system running on local disks is not an apples-to-apples comparison, the study is mainly directed at establishing a proper and correct reasoning (if any) behind the idea of introducing a new storage model for Big Data analytics in the data centers of enterprises and organizations that do not operate at web scale. For a fair assessment, we also ran the same workloads with HDFS in a SAN environment. Both SF-CFS and HDFS were configured with their default settings/tunables in our experiments. We developed a file system connector module for SF-CFS that replaces HDFS altogether, and exploited SF-CFS's potential by implementing the native interfaces in this module. Our shared-disk Big Data analytics solution does not need any modification to the Map-Reduce implementation; just by setting a few parameters in the configuration of Apache Hadoop, the entire Big Data analytics platform can be brought up and running very quickly.

III. ARCHITECTURE
The clustered file system connector module we developed for the Apache Hadoop platform has a very simple architecture. It removes the HDFS functionality from the Hadoop stack and introduces SF-CFS by implementing the APIs that are used for communication between the Map/Reduce framework and the file system. This is possible because the Map-Reduce framework always talks in terms of a well-defined FileSystem [6] API for all data access. The FileSystem API is an abstract class which the file-serving technology underneath Hadoop must implement. Both HDFS and our clustered file system connector module implement this FileSystem class, as shown in Figure 1.

Figure 1. Outline of the SF-CFS Hadoop Connector
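The following is a minimal sketch of such a connector, assuming a POSIX cluster file system mounted at the same path on every node. The CfsFileSystem class name and cfs:// scheme are hypothetical, not the actual Symantec module; the overridden methods, however, are the real Hadoop 1.x org.apache.hadoop.fs.FileSystem extension points that any file-serving backend, HDFS included, must provide.

import java.io.IOException;
import java.net.InetAddress;
import java.net.URI;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class CfsFileSystem extends FileSystem {

  @Override
  public URI getUri() {
    return URI.create("cfs:///"); // hypothetical scheme for the shared mount
  }

  // Every node sees the whole namespace, so a locality query from the
  // Map/Reduce framework can be answered by the compute node itself:
  // report the local host as holding any requested block range.
  @Override
  public BlockLocation[] getFileBlockLocations(FileStatus file, long start,
      long len) throws IOException {
    String host = InetAddress.getLocalHost().getHostName();
    return new BlockLocation[] {
        new BlockLocation(new String[] {host}, new String[] {host}, start, len)
    };
  }

  // The abstract data-path methods below form the contract shared by HDFS
  // and any connector; a POSIX cluster file system can map them one-to-one
  // onto ordinary file operations on the shared mount. Bodies are elided
  // in this sketch.
  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission,
      boolean overwrite, int bufferSize, short replication, long blockSize,
      Progressable progress) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FSDataOutputStream append(Path f, int bufferSize,
      Progressable progress) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  @Deprecated
  public boolean delete(Path f) throws IOException {
    return delete(f, true);
  }

  @Override
  public boolean delete(Path f, boolean recursive) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FileStatus[] listStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public void setWorkingDirectory(Path newDir) {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public Path getWorkingDirectory() {
    return new Path("/");
  }

  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }
}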

Being a parallel shared-data file system, the clustered file system makes the file system namespace and the data available to all the nodes in the cluster at any given point in time. Unlike HDFS, where a NameNode maintains the metadata of the entire file system namespace, with SF-CFS all the nodes in the cluster can serve the metadata. Hence a query from the Map-Reduce framework regarding data locality can always be resolved by the compute node itself. The benefit of such resolution is the elimination of the extra hops traversed with HDFS when data is not locally available. Moreover, data need not be replicated across data nodes in the case of a clustered file system: since all the nodes have access to the data, the replication in SF-CFS is effectively equivalent to HDFS with a replication factor equal to the number of nodes in the cluster. This architecture does away with the risk of losing data when a data node dies before minimum replication has been achieved for a block of data. The use of RAID technologies and vendor SLAs on the storage arrays used in a SAN environment adds to the overall reliability of the data.

IV. EXPERIMENTAL SETUP
In this study of shared-disk Big Data analytics, we compared HDFS (Apache Hadoop 1.0.2), the default file system of Apache Hadoop, with Symantec Corporation's Cluster File System (SFCFSHA 6.0), which is widely deployed by enterprises and organizations in the banking, financial, telecom, aviation and other sectors. The hardware setup for our evaluation comprises an 8-node cluster of VMware virtual machines on ESX 4.1. Each VM is hosted on an individual ESX host and has 8 CPUs of 2.77 GHz and 32 GB of physical memory. The cluster nodes are interconnected by a 1 Gbps network link dedicated to Map-Reduce traffic through a D-Link switch. Shared storage for the clustered file system is carved out of SAS disks from a mid-range Hitachi array, and direct-attached storage for HDFS is made available from the local SAS disks of the ESX hosts. Each compute-node virtual machine runs Linux (RHEL 6.2). The HDFS-SAN setup uses the same storage LUNs as SF-CFS, but configured such that no two nodes see the same storage, in order to mimic a local-disk scenario. The HDFS-Local setup uses the DAS of each of the compute nodes. In both cases, we used ext4 as the underlying file system. The following table summarizes the scenarios we compared:

Scenario                                               Label
Our solution                                           SF-CFS
HDFS in SAN with replication factor 1                  HDFS-SAN (1)
HDFS in SAN with replication factor 3                  HDFS-SAN (3)
HDFS in local disks (DAS) with replication factor 1    HDFS-Local (1)
HDFS in local disks (DAS) with replication factor 3    HDFS-Local (3)
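As noted in Section II, only a few Hadoop configuration parameters need to change to bring such a platform up. The sketch below shows that wiring programmatically, using the standard Hadoop 1.x fs.default.name key and the fs.<scheme>.impl convention for registering a FileSystem implementation; the cfs scheme and connector class are the same hypothetical names used in the Section III sketch, not the actual product settings.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CfsConfigExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Route all paths with the cfs:// scheme to the connector class
    // (hypothetical names for illustration).
    conf.set("fs.cfs.impl", "com.example.CfsFileSystem");
    // Make the clustered file system the default for Map-Reduce jobs.
    conf.set("fs.default.name", "cfs:///");
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Default file system: " + fs.getUri());
  }
}

The same two properties can equally be set in core-site.xml, which is how a deployment would normally apply them cluster-wide.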

V. TEST RESULTS
We used TeraSort, TestDFSIO, MRBench and GridMix3 [7] to compare the performance of SF-CFS and HDFS. These are widely used map/reduce benchmarks and come pre-packaged with the Apache Hadoop distribution. In our performance evaluation, for TestDFSIO and TeraSort, we ran the comparison with HDFS replication factors of both 1 and 3.

TeraSort: TeraSort is a Map-Reduce application that performs a parallel merge sort on the keys in a data set generated by TeraGen. It is a benchmark that exercises both the HDFS and Map-Reduce layers of a Hadoop cluster. A full TeraSort benchmark run consists of the following three phases, as sketched below:
1. Generate the input data with TeraGen
2. Run TeraSort on the data
3. Validate the sorted output data using TeraValidate
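The sketch below drives the three phases programmatically, assuming the TeraGen, TeraSort and TeraValidate example classes shipped with Hadoop 1.x (org.apache.hadoop.examples.terasort); the row count and paths are illustrative. The same phases are normally launched from the command line via the hadoop jar examples jar.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.examples.terasort.TeraValidate;
import org.apache.hadoop.util.ToolRunner;

public class TeraSortRun {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 1. Generate 10 GB of input: 100 million 100-byte rows.
    ToolRunner.run(conf, new TeraGen(),
        new String[] {"100000000", "/tera/in"});
    // 2. Sort the generated data.
    ToolRunner.run(conf, new TeraSort(),
        new String[] {"/tera/in", "/tera/out"});
    // 3. Validate that the output is globally sorted.
    ToolRunner.run(conf, new TeraValidate(),
        new String[] {"/tera/out", "/tera/report"});
  }
}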

Figure 2: TeraSort results

Hadoop TeraSort is a Map-Reduce job with a custom partitioner that uses a sorted list of n-1 sampled keys to define the key range for each reducer. Figure 2 above shows the behavior of the TeraSort benchmark for dataset sizes of 10 GB and 100 GB. As observed, SF-CFS performs better than HDFS in all the different scenarios.

TestDFSIO: In our study, the performance tests are done on 64 MB files, varying the number of files across test scenarios as illustrated in Figures 3 and 4 above. It was observed that HDFS significantly outperforms SF-CFS on reads for replication factors of both 1 and 3. This is because HDFS pre-fetches a chunk of data equal to the block size, and HDFS does not suffer from any cache-coherency issues thanks to its write-once semantics. HDFS-Local (3) gains the added advantage of read parallelism equal to the number of compute nodes, assuming the blocks are evenly distributed/replicated across all nodes, which a shared file system lacks. For TestDFSIO writes, HDFS on DAS with a replication factor of 1 outperforms SF-CFS; this performance gain, however, comes at the cost of data loss in the event of node failures. In all other cases, SF-CFS performs comparably to or better than HDFS on the TestDFSIO write workload.

MRBench: MRBench benchmarks a Hadoop cluster by running small jobs repeated a number of times. It focuses on the Map-Reduce layer, and its impact on the file system layer of Hadoop is negligible. In our evaluation we ran MRBench jobs repeated 50 times for SF-CFS, HDFS in SAN and HDFS in local disks with a replication factor of 3. The average response time reported by MRBench in milliseconds was found to be best for SF-CFS.

GridMix3: GridMix3 is used to simulate Map-Reduce load on a Hadoop cluster by emulating real-world load mined from production Hadoop clusters. The goal of GridMix3 is to generate a realistic workload on a cluster to validate cluster utilization and to measure Map-Reduce as well as file system performance by replaying job traces from Hadoop clusters

that capture the essential ingredients of job executions. In our tests, we used the job trace available from Apache Subversion. We observed that SF-CFS performed better than HDFS in SAN as well as HDFS in local disks with a replication factor of 3. Over the course of our study, we also compared the performance of SF-CFS and HDFS using the SWIM [9] benchmark by running Facebook job traces, and observed SF-CFS to perform better than or on par with HDFS. SWIM contains suites of workloads of thousands of jobs, with complex data, arrival and computation patterns, which enables rigorous performance measurement of Map/Reduce systems.

VI. ADDITIONAL CONSIDERATIONS
In addition to the comparable performance demonstrated by SF-CFS for various Map/Reduce workloads and applications, SF-CFS provides the benefits of being a robust, stable and highly reliable file system. It offers the ability to run analytics on top of existing data using existing analytics tools and applications, which eliminates the need to copy data in and out of a Hadoop cluster, saving a significant amount of time. SF-CFS also supports data ingestion over NFS. Along with all these, it brings other standard features like snapshots, compression, file-level replication, deduplication and so on. For example, gzip compression of input data is impractical with HDFS because it is impossible to start reading at an arbitrary point in a gzip stream, so a map task cannot read its split independently of the others [8]. If compression is enabled in SF-CFS, however, the file system performs the decompression and returns the data to applications transparently. Data backups and disaster recovery are other built-in benefits of using SF-CFS for Big Data analytics. The SF-CFS solution for Hadoop, also known as Symantec Enterprise Solution for Hadoop(TM), is available as a free download for SF-CFS customers of Symantec Corporation [10].

VII. FUTURE WORK
During our study of the performance of a commercial cluster file system on Map/Reduce workloads and its comparison with a distributed file system, we observed that a lot of time is spent in the copy phase of the Map/Reduce model after a map task finishes. In the Hadoop platform, the input and output data of Map/Reduce jobs are stored in HDFS, while the intermediate data generated by map tasks is stored in the local file system of the mapper nodes and is copied (shuffled) via HTTP to the reducer nodes. The time taken to copy this intermediate map output grows in proportion to the size of the data. However, since in the case of a clustered file system all the nodes see all the data, this copy phase can be avoided by keeping the intermediate data in the clustered file system as well and having the reducer nodes read it directly from there. This approach would completely eliminate the copy phase after map completion and is likely to give a significant boost to the overall performance of Map/Reduce jobs. It will require changes to the logic and code of the Map/Reduce framework implemented within Apache Hadoop.

VIII. CONCLUSIONS
From all the performance benchmark numbers and their analysis, it can be confidently concluded that for Big Data analytics needs, the traditional shared-storage model cannot be completely ruled out. While, due to architectural and design constraints, a cluster file system may not scale at the same rate as a shared-nothing model, for use cases where web-scale scalability is not required, a clustered file system can do a fine job even in the Big Data analytics domain. A clustered file system like SF-CFS can provide numerous other benefits through its plethora of features. This decade has seen the success of virtualization, which introduced the recent trends of server consolidation and green computing initiatives in enterprises [11]. Big Data analytics with a clustered file system on existing infrastructure aligns with this model and direction. A careful analysis of the needs and use cases is required before building a Big Data analytics platform, rather than going with the notion that the shared-nothing model is the only answer to Big Data needs.

REFERENCES
[1] D. E. O'Leary. Artificial Intelligence and Big Data. IEEE Computer Society, 96-99.
[2] S. Chaudhuri. How Different is Big Data? IEEE 28th International Conference on Data Engineering, 5.
[3] H. Topi. Where is Big Data in Your Information Systems Curriculum? ACM Inroads, Vol. 4, No. 1, 12-13.
[4] IBM. Big Data at the Speed of Business: What is Big Data. Online.
[5] C. Li, N. Onose, P. Pirzadeh, R. Vernica, J. Wen. ASTERIX: An Open Source System for Big Data Management and Analysis (Demo). Proceedings of the VLDB Endowment, Vol. 5, No. 12.
[6] C. Okoli, K. Schabram. A Guide to Conducting a Systematic Literature Review of Information Systems Research. Sprouts: Working Papers on Information Systems.
[7] B. Kitchenham, S. Charters. Guidelines for Performing Systematic Literature Reviews in Software Engineering. EBSE Technical Report, 2007.
[8] Merriam-Webster. Online available from mwebster.com/dictionary/definition.
[9] H. Suonuuti. Guide to Terminology, 2nd edition. Tekniikan sanastokeskus ry, Helsinki.
[10] B. Marr. Key Performance Indicators: The 75 Measures Every Manager Needs to Know. Pearson Education Limited.
[11] Helsinki Region Infoshare, Open Data. Online.

BIOGRAPHIES
R. Gayathri was born in Cuddalore, India. She received the B.E. degree in Computer Science and Engineering from the University of Trichy in 2011, and the M.E. in Software Engineering from the College of Engineering, Guindy, Chennai. In 2014, she joined V.R.S. College of Engineering and Technology as an Assistant Professor. Her areas of interest include data mining. She is a Life Member of the Indian Society for Technical Education (ISTE).

Mr. M. BalaAnand, M.E., (Ph.D.), was born in Villupuram, India. He received the B.Tech. degree in Information Technology from V.R.S. College of Engineering & Technology, Arasur, Villupuram, in 2010, and the M.E. in Software Engineering from Hindustan University in 2012. He joined V.R.S. College of Engineering and Technology as an Assistant Professor. His areas of interest include Big Data and Software Engineering. He is a Life Member of the Indian Society for Technical Education (ISTE).
