Cloud Computing: Data-Intensive Computing and Scheduling. Frederic Magoules, Jie Pan, and Fei Teng. CRC Press, Taylor & Francis Group




Cloud Computing: Data-Intensive Computing and Scheduling
Frederic Magoules, Jie Pan, and Fei Teng
CRC Press, Taylor & Francis Group
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an Informa business
A CHAPMAN & HALL BOOK

Contents

List of figures
List of tables
Foreword
Preface
Warranty

1 Overview of cloud computing
  1.1 Introduction
    1.1.1 Cloud definitions
    1.1.2 System architecture
    1.1.3 Deployment models
    1.1.4 Cloud characteristics
  1.2 Cloud evolution
    1.2.1 Getting ready for the cloud
    1.2.2 Brief history
    1.2.3 Comparison with related technologies
  1.3 Cloud services
  1.4 Cloud projects
    1.4.1 Commercial products
    1.4.2 Research projects
  1.5 Cloud challenges
    1.5.1 MapReduce programming model
    1.5.2 Data management
    1.5.3 Resource scheduling
  1.6 Concluding remarks

2 Resource scheduling for cloud computing
  2.1 Introduction
  2.2 Cloud service scheduling hierarchy
  2.3 Economic models for resource-allocation scheduling
    2.3.1 Market strategies
    2.3.2 Auction strategies
    2.3.3 Economic schedulers
  2.4 Heuristic models for task-execution scheduling
    2.4.1 Static strategies
    2.4.2 Dynamic strategies
    2.4.3 Heuristic schedulers
  2.5 Real-time scheduling in cloud computing
    2.5.1 Fixed priority strategies
    2.5.2 Dynamic priority strategies
    2.5.3 Real-time schedulers
  2.6 Concluding remarks

3 Game theoretical allocation in a cloud datacenter
  3.1 Introduction
  3.2 Game theory
    3.2.1 Normal formulation
    3.2.2 Payoff choice and utility function
    3.2.3 Strategy choice and Nash equilibrium
  3.3 Cloud resource allocation model
    3.3.1 Bid-shared auction
    3.3.2 Non-cooperative game
  3.4 Nash equilibrium allocation algorithms
    3.4.1 Bid functions
    3.4.2 Parameters estimation
    3.4.3 Equilibrium price
  3.5 Implementation in a cloud datacenter
    3.5.1 CloudSim toolkit
    3.5.2 Communication among entities
    3.5.3 Bidding algorithms
    3.5.4 Comparison of forecasting methods
  3.6 Concluding remarks

4 Multi-dimensional data analysis in a cloud datacenter
  4.1 Introduction
  4.2 Pre-computing
    4.2.1 Data cube
    4.2.2 Sparse cube
    4.2.3 Reuse of previous query results
    4.2.4 Data compressing
  4.3 Data indexing
  4.4 Data partitioning
    4.4.1 Data partitioning methods
    4.4.2 Horizontal partitioning of a multi-dimensional dataset
    4.4.3 Vertical partitioning of a multi-dimensional dataset
  4.5 Data replication
  4.6 Query processing parallelism
    4.6.1 Inter- and intra-operators
    4.6.2 Exchange operator
    4.6.3 SQL operator parallelization
  4.7 Concluding remarks

5 Data intensive applications with MapReduce
  5.1 Introduction
  5.2 MapReduce: New parallel computing model in cloud computing
    5.2.1 Dataflow model
    5.2.2 Two frameworks: GridGain versus Hadoop
    5.2.3 Communication cost analysis
  5.3 Distributed data storage underlying MapReduce
    5.3.1 Google file system
    5.3.2 Distributed cache memory
    5.3.3 Data accessing
  5.4 Large-scale data analysis based on MapReduce
    5.4.1 Data query languages
    5.4.2 Data analysis applications
    5.4.3 Comparison with shared-nothing parallel databases
  5.5 SimMapReduce: Simulator for modeling MapReduce framework
    5.5.1 Multi-layer architecture
    5.5.2 Input and output of simulator
    5.5.3 Implementation details of simulator
    5.5.4 Modeling process
  5.6 Concluding remarks

6 Large-scale multi-dimensional data aggregation
  6.1 Introduction
  6.2 Data organization
    6.2.1 Computations in data explorations
    6.2.2 Multiple group-by query
  6.3 Choosing a right MapReduce framework
    6.3.1 Advantages of GridGain
    6.3.2 Combiner support in Hadoop and GridGain
    6.3.3 Realizing MapReduce applications with GridGain
    6.3.4 Workflow analysis of GridGain procedure
  6.4 Parallelizing single group-by query with MapReduce
  6.5 Parallelizing multiple group-by query with MapReduce
    6.5.1 Data partitioning and data placement
    6.5.2 MapReduce model-based implementation
    6.5.3 MapCombineReduce model-based implementation
  6.6 Cost estimation
    6.6.1 MapReduce model-based implementation
    6.6.2 MapCombineReduce model-based implementation
    6.6.3 Comparison of implementations
  6.7 Concluding remarks

7 Multi-dimensional data analysis optimization
  7.1 Introduction
  7.2 Data-locating based job-scheduling
    7.2.1 Job-scheduling implementation
    7.2.2 Two-level scheduling
    7.2.3 Alternative job-scheduling schemes
  7.3 Improvements by speed-up measurements
    7.3.1 Horizontal partitioning
    7.3.2 Vertical partitioning
  7.4 Improvements by affecting factors
    7.4.1 Query selectivity
    7.4.2 Side effects
  7.5 Improvement by cost estimation
    7.5.1 Horizontal partitioning
    7.5.2 Vertical partitioning
    7.5.3 Comparison of partitioning
  7.6 Compressed data structures
    7.6.1 Data structure description
    7.6.2 Data structures for storing recordId-list
    7.6.3 Compressed data structures for different dimensions
    7.6.4 Bitmap sparsity and compressing
  7.7 Concluding remarks

8 Real-time scheduling with MapReduce
  8.1 Introduction
  8.2 Real-time scheduling problem
    8.2.1 Real-time task
    8.2.2 Processing resource
    8.2.3 Scheduling algorithms
  8.3 Schedulability test in the cloud datacenter
    8.3.1 Pseudo-polynomial complexity
    8.3.2 Polynomial complexity
    8.3.3 Constant complexity
  8.4 Utilization bounds for schedulability testing
    8.4.1 Classical bound
    8.4.2 Closer periods
    8.4.3 Harmonic chains
    8.4.4 Hyperbolic bound
  8.5 Real-time task scheduling with MapReduce
    8.5.1 System model
    8.5.2 MapReduce segmentation
    8.5.3 Worst pattern for a schedulable task set
  8.6 Reliability indication methods
    8.6.1 Reliability indicator
    8.6.2 Schedulability test conditions
    8.6.3 Comparison of rate monotonic conditions
    8.6.4 Comparison of deadline monotonic conditions
  8.7 Concluding remarks

9 Future for cloud computing

Bibliography
Index