Second Credit Seminar Presentation on Big Data Analytics Platforms: A Survey
1 Second Credit Seminar Presentation on Big Data Analytics Platforms: A Survey
By Mr. Brijesh B. Mehta (Admission No.: D14CO002)
Supervised by Dr. Udai Pratap Rao
Computer Engineering Department, S. V. National Institute of Technology, Surat
m.brijesh@coed.svnit.ac.in
16/04/2015
Mr. Brijesh B. Mehta (SVNIT) Big Data Analytics Platforms: A Survey 16/04/ / 28
2 Outline of Presentation
1 Introduction
  - Categorization of Big Data Analytics Platforms
  - Advantages and Drawbacks of both Horizontal and Vertical Scaling Platforms
2 Horizontal Scaling Platforms
  - Peer-to-peer Networks
  - Apache Hadoop
  - Spark
3 Vertical Scaling Platforms
  - HPC Cluster
  - Multicore CPU
  - GPU
  - FPGA
4 Comparison of Big Data Analytics Platforms
5 Conclusion and Future Scope
3 Outline for section 1: Introduction
4 Introduction / Categorization of Big Data Analytics Platforms
Fig-1: Categorization of big data analytics platforms [1]
5 Introduction / Pros and Cons of Horizontal and Vertical Scaling Platforms
Advantages and Drawbacks of both Horizontal and Vertical Scaling Platforms
Table-1: Advantages and drawbacks of both horizontal and vertical scaling [1]
Horizontal scaling
  Advantages:
  - Performance can be increased in small steps as required
  - Upgrade cost is relatively low
  - System can be scaled out as much as needed
  Drawbacks:
  - Software must handle all the complexity of data distribution and parallel processing
  - Little software is available that can take advantage of horizontal scaling
Vertical scaling
  Advantages:
  - Most software supports it
  - Hardware is relatively easy to manage and install
  Drawbacks:
  - Cost is relatively high
  - System can only be scaled up to a certain limit
6 Outline for section 2: Horizontal Scaling Platforms
7 Horizontal Scaling Platforms / Peer-to-peer Networks
Fig-2: Architecture of peer-to-peer networks
- MPI is used to communicate and exchange data between nodes
- The major drawback of peer-to-peer networks is their lack of fault tolerance
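The data exchange between peers can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not use MPI (a real program would call a collective such as MPI_Allreduce), the function name `ring_allreduce_sum` is hypothetical, and the peers are modelled as list positions in a ring rather than as separate machines. A failed peer is represented by `None`, and the exchange simply aborts, which mirrors the lack of fault tolerance noted on the slide.

```python
def ring_allreduce_sum(local_values):
    """Simulate peers in a ring that exchange values until all hold the global sum.

    local_values[i] is the value held by peer i; None marks a failed peer
    (an assumption of this sketch, not part of any MPI interface).
    """
    n = len(local_values)
    if any(v is None for v in local_values):
        # No recovery mechanism: one missing peer stops the whole exchange.
        raise RuntimeError("a peer is down; the exchange cannot complete")

    totals = list(local_values)     # running sum held by each peer
    in_flight = list(local_values)  # message each peer will send next

    for _ in range(n - 1):
        # Peer i receives the message sent by its left neighbour (i - 1),
        # adds it to its running sum, and forwards it in the next round.
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [t + m for t, m in zip(totals, in_flight)]
    return totals


if __name__ == "__main__":
    print(ring_allreduce_sum([1, 2, 3]))
```

After n - 1 rounds every peer has seen every other peer's value exactly once, so all peers end with the same total.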
8 Horizontal Scaling Platforms / Apache Hadoop
Hadoop Stack
Fig-3: Hadoop stack with different components [1]
9 Horizontal Scaling Platforms / Apache Hadoop
Hadoop Stack (HDFS)
Fig-4: Working of HDFS [2]
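A minimal sketch of the general idea behind HDFS storage: a file is cut into fixed-size blocks and each block is stored on several DataNodes, so a block stays readable while at least one replica survives. The function names, the tiny block size, and the round-robin placement are assumptions made for illustration; real HDFS placement is rack-aware and is decided by the NameNode.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct nodes (round-robin, an assumption)."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    return [
        block
        for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    ]


if __name__ == "__main__":
    blocks = split_into_blocks(300, 128)
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(blocks, placement, readable_blocks(placement, {"dn1", "dn2"}))
```

With three replicas per block, losing any two nodes in this toy layout still leaves every block readable.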
10 Horizontal Scaling Platforms / Apache Hadoop
Hadoop Stack (YARN)
Fig-5: YARN Architecture [3]
11 Horizontal Scaling Platforms / Apache Hadoop
Hadoop Stack (MapReduce)
Fig-6: Execution overview of MapReduce [4]
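The map, shuffle, and reduce phases of the MapReduce model [4] can be sketched in a few lines of single-process Python. This is an illustration of the programming model only, not of Hadoop's API: the function names are hypothetical, and a real job would run the map and reduce tasks on different machines with intermediate data written to local disks.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(splits):
    """Run the three phases in sequence over a list of input splits."""
    intermediate = []
    for split in splits:
        intermediate.extend(map_phase(split))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big platforms"]))
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread over many nodes, which is what makes the model suitable for horizontal scaling.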
12 Horizontal Scaling Platforms / Apache Hadoop
Hadoop Stack (MapReduce Wrappers)
- Apache Pig [5] is a SQL-like environment developed at Yahoo.
- Hive [6] is a SQL-like data-warehousing solution developed at Facebook.
- DryadLINQ [7] is a C#-like environment developed at Microsoft Research to give .NET framework users more flexibility.
- Mahout [8] is a scalable machine learning library built on the MapReduce paradigm.
13 Horizontal Scaling Platforms / Apache Hadoop
Limitations of Hadoop (MapReduce)
- The major drawback of MapReduce is its handling of iterative tasks
- In every iteration, MapReduce reads data from disk and writes it back to disk, which results in I/O overhead
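The I/O overhead of iterative jobs can be made concrete with a toy simulation. The sketch below is an assumption-laden illustration, not Hadoop or Spark code: `CountingDisk` is a hypothetical stand-in for disk storage that only counts accesses, and `step` is an arbitrary iterative update. The first runner reads and writes the disk in every iteration, as chained MapReduce jobs do; the second reads once, iterates in memory, and writes once.

```python
class CountingDisk:
    """Hypothetical stand-in for disk storage that counts reads and writes."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self._data)

    def write(self, data):
        self.writes += 1
        self._data = list(data)


def step(values):
    """One iteration of a toy job: move every value halfway towards the mean."""
    mean = sum(values) / len(values)
    return [(v + mean) / 2 for v in values]


def run_disk_per_iteration(disk, iterations):
    """Each iteration reads its input from disk and writes its output back."""
    values = None
    for _ in range(iterations):
        values = step(disk.read())
        disk.write(values)
    return values


def run_in_memory(disk, iterations):
    """Read once, keep intermediate results in memory, write the result once."""
    values = disk.read()
    for _ in range(iterations):
        values = step(values)
    disk.write(values)
    return values


if __name__ == "__main__":
    a, b = CountingDisk([0.0, 4.0]), CountingDisk([0.0, 4.0])
    print(run_disk_per_iteration(a, 3), a.reads, a.writes)
    print(run_in_memory(b, 3), b.reads, b.writes)
```

Both runners compute the same result, but the number of disk accesses grows with the iteration count only in the first one.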
14 Horizontal Scaling Platforms / Spark
Berkeley Data Analysis Stack I
Fig-7: Berkeley data analysis stack [9]
15 Horizontal Scaling Platforms / Spark
Berkeley Data Analysis Stack II
- Tachyon [10]: similar to HDFS, but makes more aggressive use of memory and supports caching of frequently used files
- Mesos [11]: similar to YARN
- Spark [12]: similar to MapReduce, but supports in-memory processing
- Spark wrappers, similar to MapReduce wrappers:
  - Spark Streaming (large-scale real-time stream processing) [12]
  - BlinkDB (queries with bounded errors and bounded response times on very large data) [13]
  - GraphX (resilient distributed graph system on Spark) [14]
  - MLbase (distributed machine learning library based on Spark) [15]
16 Horizontal Scaling Platforms / Spark
Limitations of Spark (BDAS)
- Most of the components are still under development
- Comparatively little support is available
17 Outline for section 3: Vertical Scaling Platforms
18 Vertical Scaling Platforms / HPC Cluster
HPC Clusters
- Also known as blades or supercomputers
- They have thousands of cores, with a variety of disk organizations, caches, communication mechanisms, etc.
- MPI is generally used as the communication scheme
- The major drawback is limited scalability
19 Vertical Scaling Platforms / Multicore CPU
Fig-8: General architecture of multicore CPU
- Multithreading [16] is used to parallelize tasks on the CPU
- The major drawbacks are the limited number of cores and the dependence on system memory for data access, which is limited to a few gigabytes
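The pattern of splitting a task across threads can be sketched as follows. The function names are hypothetical, and the example is meant to show the partition-and-combine structure rather than a speed-up: in CPython the global interpreter lock prevents CPU-bound threads from running Python bytecode in parallel, so a real workload would typically use processes or native code instead.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, num_chunks):
    """Split data into at most num_chunks contiguous chunks of similar size."""
    size = -(-len(data) // num_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """Work done by one thread on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """Partition the data, process each chunk in a thread, combine the partial sums."""
    if not data:
        return 0
    chunks = chunked(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), num_workers=3))
```

All threads read from the same in-memory list, which reflects the slide's point that a multicore CPU depends on system memory for data access.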
20 Vertical Scaling Platforms / GPU
Fig-9: GPGPU architecture [17]
- The major drawback of a GPU is its limited memory, approx. 12 GB per GPU
21 Vertical Scaling Platforms / FPGA
- FPGAs are custom-built, highly specialized hardware components for specific applications [18]
- An HDL [19] is used to program such components
- Applications of FPGAs include industrial control [20], educational tools [21], and network security [22]
- The major drawback is the very high development cost
22 Outline for section 4: Comparison of Big Data Analytics Platforms
23 Comparison of Big Data Analytics Platforms
Table-2: Comparison of Big Data Analytics Platforms [1]
Columns:
- Scaling type
- Platforms
- System/Platform: Scalability, Data I/O performance, Fault tolerance
- Application/Algorithm: Real-time processing, Data size supported, Iterative task support
Rows:
- Horizontal scaling: Peer-to-peer Networks, Apache Hadoop, Spark
- Vertical scaling: HPC Clusters, Multicore CPU, GPU, FPGA
24 Outline for section 5: Conclusion and Future Scope
25 Conclusion and Future Scope
- Various big data analytics platforms have been discussed in detail: peer-to-peer networks, Apache Hadoop, Spark, HPC clusters, multicore CPUs, GPUs, and FPGAs
- These platforms have been compared with respect to scalability, data I/O performance, fault tolerance, real-time processing, data size supported, and iterative task support
- This qualitative analysis may help in choosing an appropriate platform for big data analytics
- As we are going to work on unstructured big data analytics, this comparison will be very helpful to us
- In future, we are going to survey various privacy-preserving techniques for unstructured data
26 References I
[1] D. Singh and C. K. Reddy, A survey on platforms for big data analytics, Journal of Big Data, vol. 2, no. 1, p. 8, Oct. 2014. [Online] Available: [Accessed: 02-Mar-2015].
[2] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The Hadoop distributed file system, in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), ser. MSST '10. Washington, DC, USA: IEEE Computer Society, 2010, pp.
[3] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O'Malley, S. Radia, B. Reed, and E. Baldeschwieler, Apache Hadoop YARN: Yet Another Resource Negotiator, in Proceedings of the 4th Annual Symposium on Cloud Computing, ser. SOCC '13. New York, NY, USA: ACM, 2013, pp. 5:1-5:16.
[4] J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM, vol. 51, no. 1, pp., Jan.
[5] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, Pig Latin: A not-so-foreign language for data processing, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '08. New York, NY, USA: ACM, 2008, pp.
[6] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, Hive: A warehousing solution over a map-reduce framework, Proc. VLDB Endow., vol. 2, no. 2, pp., Aug.
[7] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey, DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language, in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI '08. Berkeley, CA, USA: USENIX Association, 2008, pp.
[8] S. Owen, R. Anil, T. Dunning, and E. Friedman, Mahout in Action. Greenwich, CT, USA: Manning Publications Co.
[9] Berkeley data analysis stack, [Online] Available: [Accessed: 11-Mar-2015].
[10] Tachyon, [Online] Available: [Accessed: 11-Mar-2015].
[11] Mesos, [Online] Available: [Accessed: 11-Mar-2015].
27 References II
[12] Spark: Lightning-fast cluster computing, [Online] Available: [Accessed: 03-Mar-2015].
[13] S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica, BlinkDB: Queries with bounded errors and bounded response times on very large data, in Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys '13. New York, NY, USA: ACM, 2013, pp.
[14] R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica, GraphX: A resilient distributed graph system on Spark, in First International Workshop on Graph Data Management Experiences and Systems, ser. GRADES '13. New York, NY, USA: ACM, 2013, pp. 2:1-2:6.
[15] T. Kraska, A. Talwalkar, J. C. Duchi, R. Griffith, M. J. Franklin, and M. I. Jordan, MLbase: A distributed machine-learning system, in Conference on Innovative Data Systems Research (CIDR).
[16] D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous multithreading: Maximizing on-chip parallelism, in 25 Years of the International Symposia on Computer Architecture (Selected Papers), ser. ISCA '98. New York, NY, USA: ACM, 1998, pp.
[17] GPGPU architecture, [Online] Available: papers/kaldeway/kaldeway html/img2.png, [Accessed: 11-Mar-2015].
[18] S. Brown, R. Francis, J. Rose, and Z. Vranesic, Field-Programmable Gate Arrays, ser. VLSI, computer architecture and digital signal processing. Springer US.
[19] D. Thomas and P. Moorby, The Verilog® Hardware Description Language. Springer London, Limited.
[20] E. Monmasson, L. Idkhajine, M. Cirstea, I. Bahri, A. Tisan, and M. Naouar, FPGAs in industrial control applications, Industrial Informatics, IEEE Transactions on, vol. 7, no. 2, pp., May.
[21] D. Bouldin, Impacting education using FPGAs, in Parallel and Distributed Processing Symposium, Proceedings. 18th International, April 2004, pp.
[22] H. Chen, Y. Chen, and D. Summerville, A survey on the application of FPGAs for network infrastructure security, Communications Surveys Tutorials, IEEE, vol. 13, no. 4, pp., Fourth
28 Thank You
More informationScheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
ABSTRACT Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds 1 B.Thirumala Rao, 2 L.S.S.Reddy Department of Computer Science and Engineering, Lakireddy Bali Reddy College
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationFlexPRICE: Flexible Provisioning of Resources in a Cloud Environment
FlexPRICE: Flexible Provisioning of Resources in a Cloud Environment Thomas A. Henzinger Anmol V. Singh Vasu Singh Thomas Wies Damien Zufferey IST Austria A-3400 Klosterneuburg, Austria {tah,anmol.tomar,vasu.singh,thomas.wies,damien.zufferey}@ist.ac.at
More informationThe Berkeley AMPLab - Collaborative Big Data Research
The Berkeley AMPLab - Collaborative Big Data Research UC BERKELEY Anthony D. Joseph LASER Summer School September 2013 About Me Education: MIT SB, MS, PhD Joined Univ. of California, Berkeley in 1998 Current
More informationData and Algorithms of the Web: MapReduce
Data and Algorithms of the Web: MapReduce Mauro Sozio May 13, 2014 Mauro Sozio (Telecom Paristech) Data and Algorithms of the Web: MapReduce May 13, 2014 1 / 39 Outline 1 MapReduce Introduction MapReduce
More informationSpark: Cluster Computing with Working Sets
Spark: Cluster Computing with Working Sets Matei Zaharia N. M. Mosharaf Chowdhury Michael Franklin Scott Shenker Ion Stoica Electrical Engineering and Computer Sciences University of California at Berkeley
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationsince 2009. His interests include cloud computing, distributed computing, and microeconomic applications in computer science. alig@cs.berkeley.
PROGRAMMING Mesos Flexible Resource Sharing for the Cloud BENJAMIN HINDMAN, ANDY KONWINSKI, MATEI ZAHARIA, ALI GHODSI, ANTHONY D. JOSEPH, RANDY H. KATZ, SCOTT SHENKER, AND ION STOICA Benjamin Hindman is
More informationGuidelines for Selecting Hadoop Schedulers based on System Heterogeneity
Noname manuscript No. (will be inserted by the editor) Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity Aysan Rasooli Douglas G. Down Received: date / Accepted: date Abstract Hadoop
More informationReconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra
More informationComparison of Different Implementation of Inverted Indexes in Hadoop
Comparison of Different Implementation of Inverted Indexes in Hadoop Hediyeh Baban, S. Kami Makki, and Stefan Andrei Department of Computer Science Lamar University Beaumont, Texas (hbaban, kami.makki,
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationInternational Journal of Innovative Research in Information Security (IJIRIS) ISSN: 2349-7017(O) Volume 1 Issue 3 (September 2014)
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE N.Alamelu Menaka * Department of Computer Applications Dr.Jabasheela Department of Computer Applications Abstract-We are in the age of big data which
More informationApache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas
Apache Flink Next-gen data analysis Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas What is Flink Project undergoing incubation in the Apache Software Foundation Originating from the Stratosphere research
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationToward Lightweight Transparent Data Middleware in Support of Document Stores
Toward Lightweight Transparent Data Middleware in Support of Document Stores Kun Ma, Ajith Abraham Shandong Provincial Key Laboratory of Network Based Intelligent Computing University of Jinan, Jinan,
More informationHadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
More information11/18/15 CS 6030. q Hadoop was not designed to migrate data from traditional relational databases to its HDFS. q This is where Hive comes in.
by shatha muhi CS 6030 1 q Big Data: collections of large datasets (huge volume, high velocity, and variety of data). q Apache Hadoop framework emerged to solve big data management and processing challenges.
More informationResource Scalability for Efficient Parallel Processing in Cloud
Resource Scalability for Efficient Parallel Processing in Cloud ABSTRACT Govinda.K #1, Abirami.M #2, Divya Mercy Silva.J #3 #1 SCSE, VIT University #2 SITE, VIT University #3 SITE, VIT University In the
More informationSnapshots in Hadoop Distributed File System
Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationBSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,
More informationSARAH Statistical Analysis for Resource Allocation in Hadoop
SARAH Statistical Analysis for Resource Allocation in Hadoop Bruce Martin Cloudera, Inc. Palo Alto, California, USA bruce@cloudera.com Abstract Improving the performance of big data applications requires
More informationRadoop: Analyzing Big Data with RapidMiner and Hadoop
Radoop: Analyzing Big Data with RapidMiner and Hadoop Zoltán Prekopcsák, Gábor Makrai, Tamás Henk, Csaba Gáspár-Papanek Budapest University of Technology and Economics, Hungary Abstract Working with large
More informationSee Spot Run: Using Spot Instances for MapReduce Workflows
See Spot Run: Using Spot Instances for MapReduce Workflows Navraj Chohan Claris Castillo Mike Spreitzer Malgorzata Steinder Asser Tantawi Chandra Krintz IBM Watson Research Hawthorne, New York Computer
More informationProcessing Large Amounts of Images on Hadoop with OpenCV
Processing Large Amounts of Images on Hadoop with OpenCV Timofei Epanchintsev 1,2 and Andrey Sozykin 1,2 1 IMM UB RAS, Yekaterinburg, Russia, 2 Ural Federal University, Yekaterinburg, Russia {eti,avs}@imm.uran.ru
More informationData pipeline in MapReduce
Data pipeline in MapReduce Jiaan Zeng and Beth Plale School of Informatics and Computing Indiana University Bloomington, Indiana 47408 Email: {jiaazeng, plale} @indiana.edu Abstract MapReduce is an effective
More informationBig Data Frameworks Course. Prof. Sasu Tarkoma 10.3.2015
Big Data Frameworks Course Prof. Sasu Tarkoma 10.3.2015 Contents Course Overview Lectures Assignments/Exercises Course Overview This course examines current and emerging Big Data frameworks with focus
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationEvaluating Cassandra Data-sets with Hadoop Approaches
Evaluating Cassandra Data-sets with Hadoop Approaches Ruchira A. Kulkarni Student (BE), Computer Science & Engineering Department, Shri Sant Gadge Baba College of Engineering & Technology, Bhusawal, India
More information