Technische Universität München
Institut für Informatik
Lehrstuhl für Rechnertechnik und Rechnerorganisation

Automatic Performance Engineering Workflows for High Performance Computing

Ventsislav Petkov

Complete reprint of the dissertation approved by the Faculty of Informatics of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Dr. Helmut Krcmar

Examiners of the dissertation:
1. Univ.-Prof. Dr. Hans Michael Gerndt
2. Univ.-Prof. Dr. Felix Gerd Eugen Wolf (Rheinisch-Westfälische Technische Hochschule Aachen)

The dissertation was submitted to the Technische Universität München on 25.09.2013 and was accepted by the Faculty of Informatics on 03.02.2014.
Contents

Acknowledgements iii
Abstract v
List of Figures xiii
List of Tables xvii

1. Introduction 1
   1.1. Motivation and Problem Statement 1
   1.2. Performance Analysis and Tuning Methodology 2
   1.3. Process Automation and Standardization 4
   1.4. Contributions of This Work 5
   1.5. Outline of This Work 6

I. Theoretical Background and Technological Overview 11

2. Software Development Life-Cycle 13
   2.1. Software Requirements Engineering 13
   2.2. Software Design 14
   2.3. Software Construction 15
   2.4. Software Testing 16
   2.5. Software Maintenance 17

3. Process Automation and Design of Workflows 19
   3.1. Foundations of Process Automation 20
   3.2. Process Automation Languages and Standards 24
   3.3. Business Process Management Suites 30
   3.4. Scientific Workflow Automation Tools 33
4. Supportive Software Development Tools 37
   4.1. Revision Control Systems 37
   4.2. Client-Server Repository Model 38
   4.3. Distributed Repository Model 41

5. Related Work 45
   5.1. The Need for Parallel Programming 46
   5.2. Performance Engineering Automation 47
   5.3. Performance Engineering Tools 49
   5.4. Performance Tuning Libraries and Frameworks 62
   5.5. Eclipse and the Parallel Tools Platform 64

II. PAThWay to Performance Analysis and Tuning Workflows 69

6. Performance Engineering Workflows 71
   6.1. Performance Engineering Workflows 71
   6.2. Workflow for Scalability Analysis of Parallel Applications 72
   6.3. Workflow for Cross-Platform Performance Analysis 74
   6.4. Workflow for Code Migration and Tuning on GPGPUs 76
   6.5. Summary and Requirements Overview 78

7. Architecture for Automation of Performance Engineering Processes 79
   7.1. Architecture for Automation of Performance Engineering Processes 79
   7.2. PAThWay Graphical User Interface 80
   7.3. Logic Implementation Layer 81
   7.4. Supportive Modules Layer 83

8. Workflow Modeling Environment 85
   8.1. Design Goals 86
   8.2. jBPM Workflow Execution Environment 86
   8.3. Business Process Model and Notation 2.0 90
   8.4. PAThWay's Custom Domain-Specific BPMN Nodes 98

9. Internal Data Storage Repository 113
   9.1. Motivation and Design Goals 113
   9.2. Database Entities and Their Application 114
   9.3. Object-Relational Mapping using Hibernate 120

10. Project Documentation Module 123
   10.1. Motivation and Design Goals 123
   10.2. EclipseWiki and Its Features 124
   10.3. Integration of EclipseWiki with PAThWay 125
11. Other Supportive Modules 129
   11.1. Internal Supportive Modules 129
   11.2. Revision Control Module 130
   11.3. Environment Detection Module 131
   11.4. Runtime Manager and the Parallel Tools Platform Interface 133
   11.5. Experiments Browser 136
   11.6. Other Graphical User Interfaces 138

III. Performance Engineering Workflows in Action 143

12. Scalability Analysis Workflow 145
   12.1. Scalability Analysis Process 145
   12.2. Workflow Model 146
   12.3. LRZ Linux Cluster 147
   12.4. NAS Parallel Benchmarks Multi-Zone 147
   12.5. Workflow Execution 148
   12.6. Results Exploration 149

13. Cross-Platform Memory Analysis Workflow 153
   13.1. Cross-Platform Memory Analysis Process 153
   13.2. Workflow Model 154
   13.3. STREAM Memory Benchmark 155
   13.4. Runtime Environment 156
   13.5. Generic Memory Analysis Strategy for Periscope 157
   13.6. Workflow Execution 158
   13.7. Results Exploration 159

14. Exhaustive Benchmarking Workflow 161
   14.1. Exhaustive Benchmarking Process 161
   14.2. Workflow Model 162
   14.3. SPEC MPI-2007 Benchmarks 164
   14.4. Online Data Clustering Support for Periscope 164
   14.5. Workflow Execution 166

15. Generic Performance Tuning Workflow 169
   15.1. Performance Tuning and Its Application in HPC Centers 169
   15.2. Generic Performance Tuning Workflow 170

16. Conclusion 177
   16.1. Summary 177
   16.2. Future Work 180

Appendices 183
A. Glossary 185

B. Custom Domain-Specific BPMN Nodes: XML Definitions 191
   B.1. Application Configuration 191
   B.2. MPI and OpenMP Configuration 192
   B.3. Target HPC System Selection 192
   B.4. Performance Tool Selection 193
   B.5. Source Code Instrumentation 193
   B.6. Experiments Creation 194
   B.7. Execution Experiment 195
   B.8. Runtime Creation Manager 196
   B.9. Load Custom Performance Results to the Database 197
   B.10. Store Additional Information to an Experiment 198
   B.11. Node for Interactive Questions 199
   B.12. Execute a Remote Process 200

C. Internal Database Scheme and Data Persistence Classes 203
   C.1. Scheme of the Internal Database 203
   C.2. Data Persistence Classes 217

D. Documentation Module: Modifications to EclipseWiki 219
   D.1. PAThWay Extension Classes 219
   D.2. Modifications to EclipseWiki's Internal Classes 220
   D.3. Example of a Wiki Markup of a Historical Note 222

E. Other Supportive Modules 223
   E.1. Parallel Tools Platform Interface Module 223
   E.2. Experiments Browser GUI 225

F. Bibliography 227