Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Multicore Processors

Size: px
Start display at page:

Download "Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Multicore Processors"

Transcription

1 Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Sudarsanam P Abstract G. Singaravel Parallel computing is an base mechanism for data process with scheduling task, executing the task within time span, reducing the data block through indexing etc., for analytic the given data and send the data as a outcome to further process in the system. Cluster, Hadoop environment and Map Reduce are various important factors for making platform to creating a parallel processing to execute the process very effectively in job scheduling and memory utilization. The number of job scheduling algorithm is promoted in real time processing for various applications to data analytics, even though the mismatching between the job scheduling and their time assign to particular task. In this research paper, a approach is introduce the Map Parallel- scheduling (MPS) using Hadoop environment and Map Reduce concept to create the parallel processing with scheduling algorithm for size the data, memory space utilization and matching between the job scheduling with time span through this MPS. Keyword : Parallel processing, Data analytic, Hadoop Environment, Map Reduce, Map Parallelscheduling (MPS). Introduction Parallel computing is a process to work simultaneously take different operation or activities for same domain, the main principle of parallel processing is divided into smaller part to execute at same time, the best real time example is washing machine. In digital era, the modernisation technology executing different platform to reduce data dimension and improving the speed of data processing via various computing facilitate to smooth computing process like parallel computing, distributing computing, grid computing, utility computing, cloud computing etc., these technology to reduce the time, utilize the memory space, scheduling, job execution [6][7] with powerful support of operating system, complier and programming tools [8]. In parallel computing/distributing computing base for hand out of information and process the information with time span with limited allocation space to meet the designation part, while parallelism using different type of level such as bit, instruction and task to calculate the data, it can be re-ordered and collective joint in a groups which are then executed in parallel without changing the result of the program. Data is disorder form to make them in analytical way to processes in large outcome to meaningful methods, big data analytic, cluster, hadoop environment were making supporting to processing the data either in any computing system [9]. The following section discuss briefly about, how parallel/distributing computing making the BMS Institute of Technology, Bangalore, India Department of Information Technology K.S.R College of Engineering Tiruchengode, Tamilnadu, India

2 Advances in Theoretical Computer Applications scheduling process and creating new platform for this methods with the help of cluster and hadoop environment for data analytics [10]. Theoretical Foundation Big data is a data set is mainly used for the purpose of examining the big data to uncover pattern, unknown correlations and other useful information to fetch faster and better with analysis of all available data set. Big data is volume, velocity, variety which of the information can gather towards it. Big data refer the dataset storage capacity and use to reduce the size beyond the need volume. The variety which explains the source of data and types structured data, unstructured data, and semi structured data. The structure data gives the result which may use to measure the processing needs of data set. The unstructured data which may access the entire text document, video, audio, etc... That takes place the much byte to store in dataset. The semi structured data which implies the HTML and XML document which has the storage of bytes with accurate result. A. Clustering Figure 1.1: Basic Hadoop architecture. Clustering is the task which may help to forming the group to store such information within the dataset and used to separate each of the data with the different folders. The concept is settable for reducing the space and these are obtained many algorithms that are not specified which are get mean value of other cluster. The ISBN : [61]

3 Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for disturbed cluster can separate entire document that are necessary for takenupon the structure of the data set with the cluster. Cluster can remove all over other data which is irrelevant to the particular data set. B. Hadoop Hadoop used to spired the conditions which is often processed in big data that can analysis the entire dataset and its subsets sequences for considering the creating the new environment. This provide the large data stores with extremely adaptive to the data set and which may runs the application using map reduce algorithm that may contain turn towards the fortune of business trend.hadoop frame-worked application works in an platform /environment, it provides spread the storage and calculation the various clusters of radius of the given distance. Hadoop environment is for the considered to scale up from single server to thousands of machines, each offering local computation and storage.a distributed file system that provides high-throughput access to application data. The following figure 1.1 show the basic hadoop environment for new created the parallel computing C. Map Reducing Map reducing is the processing of programming model that keep the large data set in the parallel levels of the disturbed cluster algorithm. Map reducing which may filtering and sort list the entire data which stores is in cluster the distributed server which running the different tasks that follows the dataset with the parallel communication and storage system. Map reducing can write in varies kinds of programming language that are enhanced with the dataset. Input reader can read the information and send towards the Map reduce and filter the data and partition the data then it compare all kind of data in the particular data set that can reduce the remaining data and store in cluster dataset. The about background is required to create a new hadoop environment for the parallel/ distributing process. I. Literature Review Scheduling of Parallel Applications Using Map Reduce On Cloud: A Literature Survey (2015) [1]: The application or environment in parallel form that are introduce to create new trend which is large members used, measure and modify the requirement of the data that can identify the varies size, volume, velocity of the data with execution speed of the process. Cloud computing that can develop the negotiation data with varies size of application and cost of execution. Map reducing model which is used to widely processing the large scale data exhaustive application on cluster in cloud environment. Scheduling can be prepared efficient by using the knowledge of data identification of the map tasks, helps out to reduce the in-between network traffic throughout the reduce phase, speeding the execution of map reduce applications i cloud environment. ISBN : [62]

4 Advances in Theoretical Computer Applications A survey on DyScale: Hadoop Job Scheduler for Different (2016) [2]: The process can contribute the condition of the limited speed and possible complication of the processor and modern functionality of the processor that trade towards the power efficiency of the processor that are correlated to the slowdown and faster trend of the core processor. Dyscale is the framework that can gives the occasion of the schedulers and performance of the servers that occurs the heterogeneous for processing the map reducing in multicore processor like parallel and distributor. The hadoop condition based on the new trends of the job scheduling process since the data can be assume either slow or fast serves the batch job process. Interacting the Map reduce while small scheduler that aborted performing the large scheduling process and the input files which has the task between the positive and negative situation which occupy the information throughout the mapping process of the job trackers and filter the environment and reduce the combined phases of the core processors. Dynamic Clustering for Scientific Workflows with Load Balancing in Resource (2015) [3]: The clustering task which can combine together multiple tasks that are easily balanced toward single task source with the data set. The various workflow which is necessary to the enhanced the needs of the cluster the important of the workflow which can identify the limitation of the running tasks which is concurrently available at the workflow of the load balancing resources. This may assume the subworkflow which is predicated to the cluster that can dedicated the separate task for each of the balancing workflow. It increases the inter task communication between the balancing workflow which discovery similar sub-workflow in the tasks and the load balancer that can spired the information the similar way towards the node of another balancing node which are required in entire information form particular type of cluster storage that can gather the information according to the dynamic cluster with the help of load balancer. An efficient Mapreduce scheduling algorithm in hadoop (2015) [4]: The concept of hadoop that are open source framework programming that are very supportive to the large number of dataset which are distributed in nature. Hereby, Mapreducing is the per pose of getting the large dataset and parallel disturbed algorithm on cluster. The most benefit mapreduing which are handles the information and fault automatically which hide the complexity that abided from the users. Hadoop mainly uses the FIFO conditions that allocated the jobs are executed in the order of their appearance. The progress is only suitable for homogenous not for heterogeneous the performance will be poor the progress the algorithm which used to reduce the execution time between the various algorithm FIFO and SAMR is reduce the task time. The time interval of loader which gives input of the entire time complex of the with the unbalanced job tracker in the split which is reduce task separately parameter in the parallel level of the Mapreduce framework which is required an SAMR algorithm Outlookon Various Scheduling Approaches inhadoop(2016) [5]: Heterogeneous are used for single core process that are generated under the simulated process of ISBN : [63]

5 Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for hadoop that are reduced an processor which event better the single core or multi core process to many core process. Since both core are functions under the processing of other efficiency processor which has hadoop concept based with the implementation of the overview scheduling programming that may provide power efficiency of the multilayer performance of the core scheduling that are helps to perform various approaches of the environment and the core are parallel to the enhancing needs of the hadoop programme with the tracker of the single core processor with the many processor Mapreducing of the dataset which scheduled in the large scale data set. Proposal Research From the literature review, the various researcher mention that scheduler is major role to making the effective processor, but fully can be proper scheduling algorithm not derived for the task of give data, for example: Scheduling algorithm is not append for time sharing system. The first come, first served (FCFS) Scheduling algorithm is non pre-emptive, is an unsatisfactory for interactive systems as it favours long tasks. Priority scheduling Algorithm (PSA) Scheduling algorithm which is preventative in which all things are based on the precedence, each process in the system is based on the priority whereas maximum priority job can run first while lower priority job can be made to wait. Even through, number of scheduling algorithm like Sampling Based Scheduling,Random Scheduling, Memory Dominance Scheduling, Dynamic Scheduling, Age-based Scheduling etc., The proposal system design the environment with hadoop with base layer as Hadoop Distributed File System (HDFS) stores a large number of data to accessing the data on the clusters platform and second layer map reduce to processing the data from parallel computing/ distributing computing ad it act as intermediate layer to data generated by the task and helps to enhancing the performance of the Map Reduce task. In this layer focused for the data generated by the map task and that are later used by the reduce task in parallel and automatic execution and framework plays an essential role in improving the performance. ISBN : [64]

6 Advances in Theoretical Computer Applications The figure 1.3 show the Hadoop environment create the parallel processing with scheduling algorithm for size the data, which name as Map Parallel- scheduling (MPS) using HDFC environment HDFC Map Reduce Parallel Computing Schedulling Algorithm HIVE PIG Conclusion Map-Parallel-Scheduling (MPS) using HDFC environment to performance is the main aspects of any problem or solution for data analytics and processors which uses different core types on a single processor can be used and improve energy and efficiency without giving up the most significant performance above mentioned schedulers and improves scalability with multithreaded workload. The scheduling can used extended for optimization of map reduce programming sequence of data security and data management. MPS Hadoop to make low cost high availability and processing power with job ordering scheduling policies for achieving fairness to job completion processes. References [1] A.Sree Lakshmi, Dr.M.BalRaju, Dr.N.Subhash Chandra, Scheduling of Parallel Applications Using Map Reduce On Cloud: A Literature Survey (2015). In International Journal of Computer Science and Information Technologies,(IJCSIT) Vol. 6 (1), 2015, [2] Supriya.R and Mr.Kantharaju.H.C, A survey on DyScale: Hadoop Job Scheduler for Different (2016).Imperial Journal of Interdisciplinary Research (IJIR) Vol-2, Issue-3, 2016 ISSN : [3] Roya Bagheri1 and Abolfazel Toroghi Haghighat, Dynamic Clustering for Scientific Workflows with Load Balancing in Resource (2015), International Journal of Computer Science and Telecommunications(IJCST) Volume 6, Issue 8, August ISBN : [65]

7 Map-Parallel Scheduling (mps) using Hadoop environment for job scheduler and time span for [4] R.Thangaselvi, S.Ananthbabu and R.Aruna, An efficient Mapreduce scheduling algorithm in hadoop (2015), International Journal of Engineering Research & Science (IJOER) Vol-1, Issue-9, December [5] P. Amuthabala, Kavya. T.C, Kruthika. R and Nagalakshmi. N, Outlook on Various Scheduling Approaches in Hadoop, International Journal on Computer Science and Engineering (IJCSE), ISSN : Vol. 8 No.2 Feb [6] Feng Yan, Ludmila Cherkasova, Zhuoyao Zhang and Evgenia Smirni, DyScale: a MapReduce Job Scheduler for Heterogeneous, IEEE Transactions on Cloud Computing, volume PP, issue 99, [7] Dazhao Cheng, Jia Rao, Changjun Jiang and Xiaobo Zhou, Resource and Deadline- Aware Job Scheduling in Dynamic Hadoop Clusters, IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pp , [8] Sofia D'Souza and K. Chandrasekaran, Analysis of Map Reduce scheduling and its improvements in cloud environment, IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), pp. 1-5, [9] Hongyang Sun; Yangjie Cao; Wen-Jing Hsu, Efficient Adaptive Scheduling of Multiprocessors with Stable Parallelism Feedback, IEEE Transactions on Parallel and Distributed Systems, volume 22, issue 4, pp , [10] N. Saranya; R. C. Hansdah, Dynamic Partitioning Based Scheduling of Real- Time Tasks in, IEEE 18 th International Symposium on Real- Time Distributed Computing, ISBN : [66]

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

SCHEDULING IN CLOUD COMPUTING

SCHEDULING IN CLOUD COMPUTING SCHEDULING IN CLOUD COMPUTING Lipsa Tripathy, Rasmi Ranjan Patra CSA,CPGS,OUAT,Bhubaneswar,Odisha Abstract Cloud computing is an emerging technology. It process huge amount of data so scheduling mechanism

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

Big Data: Study in Structured and Unstructured Data

Big Data: Study in Structured and Unstructured Data Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available

More information

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Hadoop Scheduler w i t h Deadline Constraint

Hadoop Scheduler w i t h Deadline Constraint Hadoop Scheduler w i t h Deadline Constraint Geetha J 1, N UdayBhaskar 2, P ChennaReddy 3,Neha Sniha 4 1,4 Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore,

More information

International Journal of Innovative Research in Computer and Communication Engineering

International Journal of Innovative Research in Computer and Communication Engineering FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

More information

Hadoop Cluster Applications

Hadoop Cluster Applications Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved

Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Perspective Big Data Framework for Healthcare using Hadoop

More information

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,

More information

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 25 The MathWorks, Inc. 빅 데이터 및 다양한 데이터 처리 위한 MATLAB의 인터페이스 환경 및 새로운 기능 엄준상 대리 Application Engineer MathWorks 25 The MathWorks, Inc. 2 Challenges of Data Any collection of data sets so large and complex

More information

Hadoop Technology for Flow Analysis of the Internet Traffic

Hadoop Technology for Flow Analysis of the Internet Traffic Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

Survey on Scheduling Algorithm in MapReduce Framework

Survey on Scheduling Algorithm in MapReduce Framework Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India

More information

Group Based Load Balancing Algorithm in Cloud Computing Virtualization

Group Based Load Balancing Algorithm in Cloud Computing Virtualization Group Based Load Balancing Algorithm in Cloud Computing Virtualization Rishi Bhardwaj, 2 Sangeeta Mittal, Student, 2 Assistant Professor, Department of Computer Science, Jaypee Institute of Information

More information

Task Scheduling in Hadoop

Task Scheduling in Hadoop Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

Task Scheduling Algorithm for Map Reduce To Control Load Balancing In Big Data

Task Scheduling Algorithm for Map Reduce To Control Load Balancing In Big Data Task Scheduling Algorithm for Map Reduce To Control Load Balancing In Big Data Ms.N.Saranya, M.E., (CSE), Jay Shriram Group of Institutions, Tirupur. charanyaa19@gmail.com Abstract- Load balancing is biggest

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING A SURVEY ON MAPREDUCE IN CLOUD COMPUTING Dr.M.Newlin Rajkumar 1, S.Balachandar 2, Dr.V.Venkatesakumar 3, T.Mahadevan 4 1 Asst. Prof, Dept. of CSE,Anna University Regional Centre, Coimbatore, newlin_rajkumar@yahoo.co.in

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Pooja.B. Jewargi Prof. Jyoti.Patil Department of computer science and engineering,

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

More information

Fault Tolerance in Hadoop for Work Migration

Fault Tolerance in Hadoop for Work Migration 1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE Mr. Swapnil A. Kale 1, Prof. Sangram S.Dandge 2 1 ME (CSE), First Year, Department of CSE, Prof. Ram Meghe Institute

More information

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster , pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing

More information

Survey on Load Rebalancing for Distributed File System in Cloud

Survey on Load Rebalancing for Distributed File System in Cloud Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university

More information

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of

More information

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk. Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated

More information

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.

More information

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2

More information

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Understanding Big Data and Big Data Analytics Getting familiar with Hadoop Technology Hadoop release and upgrades

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

DyScale: a MapReduce Job Scheduler for Heterogeneous Multicore Processors

DyScale: a MapReduce Job Scheduler for Heterogeneous Multicore Processors JOURNAL OF L A T E X CLASS FILES, VOL. 6, NO. 1, JULY 214 1 DyScale: a MapReduce Job Scheduler for Heterogeneous Multicore Processors Feng Yan, Member, IEEE, Ludmila Cherkasova, Member, IEEE, Zhuoyao Zhang,

More information

An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems

An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems Ardhendu Mandal and Subhas Chandra Pal Department of Computer Science and Application, University

More information

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

A Novel Cloud Based Elastic Framework for Big Data Preprocessing School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Keywords: Big Data, HDFS, Map Reduce, Hadoop

Keywords: Big Data, HDFS, Map Reduce, Hadoop Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Big Application Execution on Cloud using Hadoop Distributed File System

Big Application Execution on Cloud using Hadoop Distributed File System Big Application Execution on Cloud using Hadoop Distributed File System Ashkan Vates*, Upendra, Muwafaq Rahi Ali RPIIT Campus, Bastara Karnal, Haryana, India ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

A REAL TIME MEMORY SLOT UTILIZATION DESIGN FOR MAPREDUCE MEMORY CLUSTERS

A REAL TIME MEMORY SLOT UTILIZATION DESIGN FOR MAPREDUCE MEMORY CLUSTERS A REAL TIME MEMORY SLOT UTILIZATION DESIGN FOR MAPREDUCE MEMORY CLUSTERS Suma R 1, Vinay T R 2, Byre Gowda B K 3 1 Post graduate Student, CSE, SVCE, Bangalore 2 Assistant Professor, CSE, SVCE, Bangalore

More information

MEASURING PERFORMANCE OF DYNAMIC LOAD BALANCING ALGORITHMS IN DISTRIBUTED COMPUTING APPLICATIONS

MEASURING PERFORMANCE OF DYNAMIC LOAD BALANCING ALGORITHMS IN DISTRIBUTED COMPUTING APPLICATIONS MEASURING PERFORMANCE OF DYNAMIC LOAD BALANCING ALGORITHMS IN DISTRIBUTED COMPUTING APPLICATIONS Priyesh Kanungo 1 Professor and Senior Systems Engineer (Computer Centre), School of Computer Science and

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

CloudRank-D:A Benchmark Suite for Private Cloud Systems

CloudRank-D:A Benchmark Suite for Private Cloud Systems CloudRank-D:A Benchmark Suite for Private Cloud Systems Jing Quan Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China HVC tutorial in conjunction

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

ISSN:2320-0790. Keywords: HDFS, Replication, Map-Reduce I Introduction:

ISSN:2320-0790. Keywords: HDFS, Replication, Map-Reduce I Introduction: ISSN:2320-0790 Dynamic Data Replication for HPC Analytics Applications in Hadoop Ragupathi T 1, Sujaudeen N 2 1 PG Scholar, Department of CSE, SSN College of Engineering, Chennai, India 2 Assistant Professor,

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Grid Computing Approach for Dynamic Load Balancing

Grid Computing Approach for Dynamic Load Balancing International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-1 E-ISSN: 2347-2693 Grid Computing Approach for Dynamic Load Balancing Kapil B. Morey 1*, Sachin B. Jadhav

More information

Big RDF Data Partitioning and Processing using hadoop in Cloud

Big RDF Data Partitioning and Processing using hadoop in Cloud Big RDF Data Partitioning and Processing using hadoop in Cloud Tejas Bharat Thorat Dept. of Computer Engineering MIT Academy of Engineering, Alandi, Pune, India Prof.Ranjana R.Badre Dept. of Computer Engineering

More information

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult

More information

Approaches for parallel data loading and data querying

Approaches for parallel data loading and data querying 78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro This paper

More information

EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALLEL JOBS IN CLUSTERS

EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALLEL JOBS IN CLUSTERS EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALLEL JOBS IN CLUSTERS A.Neela madheswari 1 and R.S.D.Wahida Banu 2 1 Department of Information Technology, KMEA Engineering College,

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters

A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters Abhijit A. Rajguru, S.S. Apte Abstract - A distributed system can be viewed as a collection

More information

A Review on Load Balancing Algorithms in Cloud

A Review on Load Balancing Algorithms in Cloud A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi johnpm12@gmail.com Yedhu Sastri Dept. of IT, RSET,

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3

An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3 An efficient Mapreduce scheduling algorithm in hadoop R.Thangaselvi 1, S.Ananthbabu 2, R.Aruna 3 1 M.E: Department of Computer Science, VV College of Engineering, Tirunelveli, India 2 Assistant Professor,

More information

IoT and Big Data- The Current and Future Technologies: A Review

IoT and Big Data- The Current and Future Technologies: A Review Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 5, Issue. 1, January 2016,

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Cloud Computing and Amazon Web Services Cloud Computing Amazon

More information

CDBMS Physical Layer issue: Load Balancing

CDBMS Physical Layer issue: Load Balancing CDBMS Physical Layer issue: Load Balancing Shweta Mongia CSE, School of Engineering G D Goenka University, Sohna Shweta.mongia@gdgoenka.ac.in Shipra Kataria CSE, School of Engineering G D Goenka University,

More information

Massive Cloud Auditing using Data Mining on Hadoop

Massive Cloud Auditing using Data Mining on Hadoop Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed

More information

Introduction to DISC and Hadoop

Introduction to DISC and Hadoop Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Fair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing

Fair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing Research Inventy: International Journal Of Engineering And Science Vol.2, Issue 10 (April 2013), Pp 53-57 Issn(e): 2278-4721, Issn(p):2319-6483, Www.Researchinventy.Com Fair Scheduling Algorithm with Dynamic

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

A Study of Data Management Technology for Handling Big Data

A Study of Data Management Technology for Handling Big Data Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 9, September 2014,

More information

Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics

Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics P.Saravana kumar 1, M.Athigopal 2, S.Vetrivel 3 Assistant Professor, Dept

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information