Marc Casas Guix

Education

Ph.D. in Computer Sciences, March 2010
Department of Computer Architecture, Technical University of Catalonia (UPC), Barcelona, Spain.
5-year degree in Applied Mathematics (equivalent to M.S.) June
Faculty of Mathematics and Statistics (FME), UPC, Barcelona, Spain.

Honors and Awards

Best Paper Finalist, International Conference for High Performance Computing, Networking, Storage and Analysis (SC 15).
Marie Curie Fellow (Beatriu de Pinós - Marie Curie COFUND 7FP Award) September Now.
Scholarship for Research Staff Training (Beca de Formación de Personal Investigador, FPI) from the Spanish Government. September August
Best Paper Award at the International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par) August
Scholarship from Computer Architecture Department to develop analytical models of parallel programs. September August
Award from Bank of Manresa (Caixa de Manresa) for placement among the top 300 undergraduates in Catalonia during the academic year

Employment

Barcelona Supercomputing Center, Spain. Senior Researcher June Now
Lawrence Livermore National Laboratory, USA. Postdoctoral Researcher May May 2013
Barcelona Supercomputing Center, Spain. PhD Student July April 2010

Journals

1. M. Casas, G. Bronevetsky, Evaluation of HPC applications Memory Resource Consumption via Active Measurement, accepted in IEEE Transactions on Parallel and Distributed Systems (TPDS),
2. D. Chasapis, M. Casas, M. Moreto, R. Vidal, E. Ayguade, J. Labarta and M. Valero, PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite, accepted in Transactions on Architecture and Code Optimization (TACO),
3. S. Chen, G. Bronevetsky, B. Li, M. Casas, L. Peng, A framework for evaluating comprehensive fault resilience mechanisms in numerical programs, The Journal of Supercomputing, Volume 71, Issue 8, pages ,
4. J. González, J. Giménez, M. Casas, M. Moretó, A. Ramírez, J. Labarta, M. Valero, Simulating Whole Supercomputer Applications, IEEE Micro, Volume 31, Number 31, pages 32-45,
5. M. Casas, H. Servat, R. M. Badia, J. Labarta, Extracting the Optimal Sampling Frequency of Applications Using Spectral Analysis, Concurrency and Computation: Practice and Experience, Volume 23, Number 3, pages ,
6. M. Casas, R. M. Badia, J. Labarta, Automatic Phase Detection and Structure Extraction of MPI Applications, International Journal of High Performance Computing Applications (IJHPCA), Volume 24, Number 3, pages , 2010.

International Conferences

1. E. Castillo, M. Moreto, M. Casas, L. Alvarez, E. Vallejo, K. Chronaki, R. M. Badia, J. L. Bosque, R. Beivide, E. Ayguade, J. Labarta, M. Valero, CATA: Criticality Aware Task Acceleration for Multicore Processors, accepted in the 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS),
2. L. Alvarez, M. Moreto, M. Casas, E. Castillo, X. Martorell, J. Labarta, E. Ayguade, M. Valero, Runtime-Guided Management of Scratchpad Memories in Multicore Architectures, accepted in the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT),
3. L. Jaulmes, M. Casas, M. Moreto, E. Ayguade, J. Labarta, M. Valero, Exploiting Asynchrony from Exact Forward Recovery for DUE in Iterative Solvers, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 15), pages 53:1-53:12,
4. L. Alvarez, L. Vilanova, M. Moreto, M. Casas, M. Gonzalez, X. Martorell, N. Navarro, E. Ayguade, M. Valero, Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures, proceedings of the International Symposium on Computer Architecture (ISCA), pages ,
5. M. Casas, M. Moreto, L. Alvarez, E. Castillo, D. Chasapis, T. Hayes, L. Jaulmes, O. Palomar, O. Unsal, A. Cristal, E. Ayguade, J. Labarta, M. Valero, Runtime-Aware Architectures, proceedings of the International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par), pages 16-27,
6. M. Casas, G. Bronevetsky, Active Measurement of Memory Resource Consumption, proceedings of the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pages ,
7. M. Casas, G. Bronevetsky, Active Measurement of the Impact of Network Switch Utilization on Application Performance, proceedings of the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pages ,
8. M. Schulz, J. Belak, A. Bhatele, P.-T. Bremer, G. Bronevetsky, M. Casas, T. Gamblin, K. Isaacs, I. Laguna, J. Levine, V. Pascucci, D. Richards, B. Rountree, Performance Analysis Techniques for the Exascale Co-Design Process, proceedings of PARCO 2013, Munich, Germany, September
9. M. Casas, B. R. de Supinski, G. Bronevetsky, M. Schulz, Fault Resilience of the Algebraic Multi-Grid Solver, 26th International Conference on Supercomputing (ICS), pages ,
10. G. Llort, M. Casas, H. Servat, K. Huck, J. Giménez, J. Labarta, Trace Spectral Analysis Toward Dynamic Levels of Detail, Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS), pages ,
11. M. Casas, R. M. Badia, J. Labarta, Prediction of Behavior of MPI Applications, 2008 IEEE International Conference on Cluster Computing (Cluster), pages ,
12. M. Casas, R. M. Badia, J. Labarta, Automatic analysis of speedup of MPI applications, 22nd ACM International Conference on Supercomputing (ICS), pages ,
13. M. Casas, R. M. Badia, J. Labarta, Automatic Phase Detection of MPI Applications, Parallel Computing: Architectures, Algorithms and Applications (ParCo), Volume 15, pages ,
14. M. Casas, R. M. Badia, J. Labarta, Automatic Extraction of Structure of MPI Applications Tracefiles, International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par), pages 3-12, 2007.

Workshops and Tech Reports

1. R. Vidal, M. Casas, M. Moreto, D. Chasapis, R. Ferrer, X. Martorell, E. Ayguade, J. Labarta, M. Valero, Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads, International Workshop on OpenMP (IWOMP), Volume 9342 of the series Lecture Notes in Computer Science, pages 60-72,
2. D. Prat, C. Ortega, M. Casas, M. Moretó and M. Valero, Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7, in the 5th International Workshop on Adaptive Self-tuning Computing Systems (co-located with HiPEAC 2015), 21 January 2015, Amsterdam, The Netherlands.
3. M. Valero, M. Moreto, M. Casas, E. Ayguade, J. Labarta, Runtime-Aware Architectures: A First Approach, Supercomputing Frontiers and Innovations, Volume 1, Number 1,
4. T. Grass, A. Rico, M. Casas, M. Moretó, A. Ramírez, Evaluating Execution Time Predictability of Task-Based Programs on Multi-Core Processors, in the 7th International Workshop on Multi-/Many-Core Computing Systems (MuCoCoS-2014). August, 2014, Porto, Portugal.
5. Adolfy Hoisie, Kevin J Barker, Greg Bronevetsky, Laura Carrington, Marc Casas, Daniel Chavarria, Roberto Gioiosa, Darren J Kerbyson, Gokcen Kestor, Nathan R Tallent, Ananta Tiwari, Progress report for the X-Stack Meeting.

Invited Talks

4th Workshop of the Joint Laboratory for Exascale Computing (JLESC), December 2015, Bonn, Germany. Invitational talk on The Hybrid OmpSs + Charm++ Programming Model.
3rd Workshop of the Joint Laboratory for Exascale Computing (JLESC), June 2015, Barcelona, Spain. Invitational talk on Asynchronous algorithms to mitigate faults recoveries and enable approximate computing.
SIAM Conference on Computational Science and Engineering, Mini-Symposia on Resilience in Numerical Simulations and Algorithms at Extreme Scale, March 2015, Salt Lake City, talk on Runtime Systems for Fault Tolerant Computing.
2nd Workshop of the Joint Laboratory for Exascale Computing (JLESC), November 2014, Chicago, Illinois. Invitational talk on Exploiting Asynchronous Programming Models to Reduce Faults Impact in Iterative Solvers.
HiPEAC Computing Systems Week (CSW), October 8 - October , Athens, Greece. Invited to be a member of a panel on Handling Errors at Multiple Levels: Opportunities and Challenges.
8th International Workshop on Parallel Matrix Algorithms and Applications (PMAA14), June 2 - June , Lugano, Switzerland. Invitational talk on Dealing with Faults in HPC systems.
International Conference on Parallel and Distributed Processing (IPDPS), May 2014, Phoenix, Arizona, Research presentation.
26th ACM International Conference on Supercomputing (ICS), June 25-29, San Servolo Island, Venice, Italy, Research presentation.
Technische Universität Dresden - ZIH Colloquium, May 24, Dresden, Germany, Invitational talk on Automatic Phase Detection and Structure Extraction of Parallel Applications.
Schloss Dagstuhl Seminars, Program Developing for Extreme Scale Computing, May 2 - May 6, Dagstuhl, Germany, Invited to give a talk on the Applications of Spectral Analysis in Data Acquisition, Multiplexing Hardware Counters and Architecture Simulation.
IEEE International Conference on Cluster Computing (Cluster), September 29 - October 1, Tsukuba, Japan, Research presentation.
22nd ACM International Conference on Supercomputing (ICS), June 7-12, Island of Kos, Aegean Sea, Greece, Research presentation.
International Conference on Parallel Computing (ParCo), September 3-7, Jülich, Germany, Research presentation.
International Euro-Par Conference on Parallel and Distributed Computing (Euro-Par), August 28-31, Rennes, France, Research presentation.

Research Projects

Mont-Blanc 3
The main target of the Mont-Blanc 3 project is the creation of a new high-end HPC platform (SoC and node) that is able to deliver a new level of performance/energy ratio whilst executing real applications. The technical objectives are:
1. To design a well-balanced architecture and to deliver the design for an ARM based SoC or SoP (System on Package) capable of providing pre-exascale performance when implemented in the time frame of The predicted performance target must be measured using real HPC applications.
2. To maximise the benefit for HPC applications with new high-performance ARM processors and throughput-oriented compute accelerators designed to work together within the well-balanced architecture.
3. To develop the necessary software ecosystem for the future SoC.
Work Package 4 (Compute Efficiency) Leader: Marc Casas

IBM-BSC Joint Study Agreement (JSA) on OmpSs for Asynchronous Algorithms
This JSA focuses on applications that are likely to benefit from the locality awareness of OmpSs, and the irregular or asynchronous forms of parallelism it supports. These characteristics allow for additional asynchronicity in the execution of parallel tasks (compared to OpenMP) and lower bandwidth requirements. As a result, an application's tolerance for network or memory latency increases, which is an interesting property for the target platform.
PI: Marc Casas, Costas Bekas

IBM-BSC Joint Study Agreement (JSA) on Adaptive resource management for Power architectures
This research focuses on adaptive resource management for improvement of power-performance metrics associated with current and future POWER-series microprocessors. Both hardware-only and runtime-aided adaptive control systems will be pursued. The collaboration will pursue the development of new adaptive algorithms to exploit prefetching enhancements in current and future POWER architectures and generalized concepts in cross-layer co-optimization for improving power-performance metrics in future POWER systems. Proposals on new hardware requirements to support the development of a new generation of hardware-software co-managed adaptive systems will be pursued in this collaboration.
PI: Miquel Moreto, Marc Casas, Alper Buyuktosunoglu, Pradip Bose

Riding on Moore's Law (RoMoL),
RoMoL is a 5-year project funded by an ERC Advanced Grant awarded to Prof. Mateo Valero (GA ). RoMoL involves research in microarchitecture, runtime systems, compilers and programming languages, and has the objective to maximize positive synergies between current research activities in the Computer Sciences Department at BSC.
PI: Mateo Valero

Institute for Sustained Performance, Energy and Resilience (SUPER),
The SUPER project is a broadly-based SciDAC institute with expertise in compilers and other system tools, performance engineering, energy management, and resilience. The goal of the project is to ensure that DOE's computational scientists can successfully exploit the emerging generation of high performance computing (HPC) systems. This goal will be met by providing application scientists with strategies and tools to productively maximize performance, conserve energy, and attain resilience.
PI: Bob Lucas

Exascale Computing Technologies (ExaCT)
ExaCT is a project focused on creating essential capabilities to overcome exascale barriers to predictive simulation. It integrates research on these barriers into 3 key LLNL applications using LLNL's most advanced multicore systems. Specifically, the main targets of the project are: Scalable Multicore Algorithms, Application Level Fault Tolerance, Asynchronous Load Analyzer and Adaptor, and Next Generation Debugging Methodologies.
PI: Bronis R. de Supinski

Reliable High Performance Peta- and Exa-Scale Computing:
This project will develop a detailed understanding of the effects of faults on real HPC systems, producing fault models that will enable new tools to identify the root causes of failures and help developers to create more reliable applications and systems by characterizing their vulnerabilities to errors. The ultimate goal of this project is to help create a new generation of highly productive and cost-efficient supercomputers to enable novel DOE science applications.
PI: Greg Bronevetsky

IBM/BSC MareIncognito Project (MI):
MI is a bilateral project between BSC and IBM. The project considers several fields that define the technical characteristics and the component design for a new generation of Petascale supercomputers for the year 2011, involving all aspects related to that machine: applications, programming models, performance tools, interconnection and processor architecture, etc.
PI: Jesús Labarta

High Performance Computing V: Architectures, compilers, operative systems, tools and applications.
(Computacion de Altas Prestaciones V TIN ) An R&D project funded by the Spanish Interministerial Commission of Science and Technology (Comision Interministerial de Ciencia y Tecnologia CICYT), a Spanish government agency focused on research and development.
PI: Mateo Valero.

High Performance Computing IV: Architectures, compilers, operative systems, tools and applications.
(Computacion de Altas Prestaciones IV TIN C0201) An R&D project funded by the Spanish Interministerial Commission of Science and Technology (Comision Interministerial de Ciencia y Tecnologia CICYT), a Spanish government agency focused on research and development.
PI: Mateo Valero.

Directed Bachelor, Master and PhD Theses

Analysis of Adaptive Prefetcher Configuration in Advanced Server-Class Processors, Calvin Bulla, June 2015
Parallelization of the Facesim simulator, Raúl Vidal, January 2015
Exploring Scalability Techniques of OmpSs, Iulian Brumar, June 2014
Parallelization techniques of the x264 video encoder, Daniel Ruiz, June 2014
Adaptive and Application Dependant Runtime Guided Hardware Reconfiguration for the IBM POWER7, David Prat, September 2014
Runtime Assisted Cache Memory Optimizations, Vladimir Dimic, July 2015
Transparent Management of Scratchpad Memories in Shared Memory Programming Models, Lluc Àlvarez, December 2015

Program Committees

Program Committee (PC) member of the ACM International Conference on Supercomputing (ICS),
Program Committee (PC) member of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Program Committee (PC) member of the 1st IEEE Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM2016)
External Review Committee (ERC) member of the ACM International Conference on Supercomputing (ICS) 2014.

Service

Reviewer of the Concurrency and Computation: Practice and Experience Journal,
Reviewer for the Euro-Par conference
Reviewer for the CCGRID conference
Organizer of the European Initiative on Runtime Systems and Architecture Co-Design thematic session in the HiPEAC Computing Systems Week (CSW). Oslo, May
Reviewer for the Parallel Computing Journal
Reviewer for the ACM Transactions on Architecture and Code Optimization Journal (TACO)
Reviewer for the International Conference on Parallel Architectures and Compilation Techniques (PACT)
Reviewer for IEEE Transactions on Parallel and Distributed Systems Journal
Reviewer for PARA 2010: State of the Art in Scientific and Parallel Computing Conference.
Organizer of the Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques Workshop (SLAML), held in conjunction with the 23rd ACM Symposium on Operating Systems Principles (SOSP),

PhD Committees

Vladimir Marjanovic, Technical University of Catalonia (UPC), January
Stojce Nakov, University of Bordeaux, December
Judit Planas, Technical University of Catalonia (UPC), November 2015.