Research Statement
Hung-Wei Tseng


I have research experience in many areas of computer science and engineering, including computer architecture [1, 2, 3, 4], high-performance and reliable storage systems [5, 6, 3], software runtime systems [7, 8], programming languages [1, 7], compilers [9], embedded systems [10, 11, 12, 13], and computer networks [14, 15]. Much of my research springs from the observation that entrenched programming and execution models do not take advantage of modern parallel and heterogeneous computer architectures, creating redundancies and limiting applications' ability to use computing resources. The resulting software wastes the potential of modern computer systems and undermines their effectiveness. Therefore, my research focuses on making computation more efficient. I started pursuing this route with my PhD thesis on the data-triggered threads (DTT) model. The DTT model eliminates redundant computation and creates non-traditional parallelism for applications running on multi-core processors through microarchitecture, programming languages, software runtime systems, and compiler optimizations. Since receiving my PhD, I have also led a large-scale project that builds efficient data storage and communication mechanisms for big data applications in heterogeneous computing systems that include high-speed non-volatile memory devices, GPU accelerators, and FPGAs. These projects have laid the foundation for my future research, in which I hope to build efficient heterogeneous parallel computers and IoT (Internet of Things) systems for emerging applications by rethinking architectures, programming languages, systems, and compilers. Below, I describe my research projects at UCSD and sketch my future research directions.

Data-triggered threads

Data-triggered threads (DTT) [1, 2] is a programming and execution model that avoids redundant computation and exploits parallelism by initiating computation only when the application changes memory content. With the DTT model, code that depends on changing data can immediately execute in parallel, while code depending on data that remains the same can be skipped, avoiding redundant computation. I am the principal designer and developer of DTT. My thesis research makes the following contributions. (1) It defines a programming model, based on imperative programming languages, that allows programmers to express computation in a way that exposes redundancies and identifies new opportunities for parallel execution. (2) It shows that the DTT model requires only a small amount of microarchitectural change. (3) It provides a software-only solution for executing DTT applications on existing architectures. (4) It demonstrates how legacy programs can take advantage of the DTT model without any programmer intervention, through a transparent compiler-only transformation. Viewed as a whole, my research on the DTT model exploits architectures, programming languages, runtime systems, and compilers to improve applications' performance and energy efficiency.

DTT model and microarchitecture

Most computers today use the von Neumann model, in which applications initiate parallelism based on the program counter. This conventional approach limits a system's ability to exploit parallelism. In addition, it incurs significant redundant computation. Our research shows that loading redundant values (i.e., unchanged values previously loaded from the same addresses) accounts for more than 70% of memory loads, and that more than half of the computation that follows these loads is unnecessary. In my research, I defined a set of language extensions to C/C++ that allow programmers to describe applications using the DTT model. Unlike other proposals that try to reduce redundant computation, the DTT model needs only a small amount of hardware support. Simulation results [1] show that the DTT model can improve performance by 45% on SPEC2000 applications, and a significant portion of this gain comes from eliminating redundant computation.
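
To make the triggering idea concrete, the fragment below is a minimal plain-C sketch of the behavior the model enforces: a store that actually changes a tracked value fires the dependent computation, while a store of the same value skips it. It is an illustration only, not the DTT language extensions from [1, 2]; all names are hypothetical, the triggered code runs inline rather than as a separate thread, and the change check that the real hardware or runtime performs is emulated with an explicit comparison.

    /* Minimal sketch (not the DTT syntax from [1, 2]): recomputation fires only
     * when a tracked value actually changes; redundant stores skip the work.   */
    #include <stdio.h>

    static int weight = 10;        /* hypothetical tracked datum                */
    static int derived = 0;        /* result of the dependent computation       */
    static int triggers = 0;       /* how many times we actually recomputed     */

    /* The "data-triggered" computation: runs only when its input has changed.  */
    static void recompute(void)
    {
        derived = weight * weight; /* stand-in for expensive dependent work     */
        triggers++;
    }

    /* Every update to the tracked datum goes through this store wrapper.       */
    static void store_weight(int new_value)
    {
        if (new_value == weight)
            return;                /* redundant store: skip the computation     */
        weight = new_value;
        recompute();               /* value changed: trigger the computation    */
    }

    int main(void)
    {
        store_weight(10);          /* same value: nothing happens               */
        store_weight(12);          /* changed: recompute                        */
        store_weight(12);          /* same value again: skipped                 */
        printf("derived = %d, triggered %d time(s)\n", derived, triggers);
        return 0;
    }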

DTT runtime system

To increase the applicability of the DTT model, I designed a software-only framework. I applied several optimizations to minimize the multithreading overhead on real systems. In addition, the runtime system dynamically and transparently disables DTT when a DTT code section would likely underperform the conventional model. The runtime system achieves a 15% performance improvement on SPEC2000 applications using one additional thread and, when DTT parallelism is added to traditional parallelism, a 1.19x speedup on PARSEC applications [7]. To better support fine-grained, massive data-level parallelism, I further extended the runtime system and the DTT model to allow out-of-order execution of multiple data-triggered threads. The extended DTT model also enables the runtime system to schedule tasks according to data locations and further improve performance. The results in [8] demonstrate that the DTT model can effectively overlap computation with I/O and achieve better scalability than traditional parallelism.

CDTT compiler

While the original DTT proposal relies on programmers' efforts to achieve performance improvement, I have also been working on an LLVM-based compiler, CDTT, that automatically generates DTT code for legacy C/C++ applications [9]. Even without profile data, the compiler can identify code with redundant behavior, which is where the DTT model provides the largest performance advantages. In most applications, the compiled binary running on the software-only DTT runtime system achieves nearly the same level of performance as manual programmer modifications, with an average performance gain of 10% on SPEC2000 [9].

Efficient Heterogeneous Systems for Data-Intensive Applications

The growing size of application data, coupled with the emergence of heterogeneous computing resources, high-performance non-volatile memory technologies, and fast network devices, is reshaping the computing landscape. However, programming and execution models on these platforms still follow a CPU-centric approach, which results in inefficiencies. I am currently leading a group of 6 PhD students and 2 undergraduate students in a project to improve the performance of data-intensive applications on these systems. Within 15 months, the project has made the following contributions: (1) We designed a system that enables peer-to-peer data transfers between SSDs and GPUs, eliminating redundant CPU and main memory operations. (2) We designed a simplified API and system software stack that improves data transfer performance and accelerates applications. (3) We demonstrated that exploiting the processing power inside storage devices to deserialize application objects improves energy efficiency without sacrificing performance. The following paragraphs describe these projects in more detail.

Efficiently Supplying Data in Computers

As computers become heterogeneous, the demand for exchanging data among storage and computing devices increases. One set of GPU benchmarks we studied left the GPU idle for 54% of the total execution time because of stalls due to data transfers [3]. Current programming models in heterogeneous computing systems still transfer data through the CPU and main memory, even though the majority of the computation may not use the CPU at all. The result is redundant data copies that consume CPU time, waste memory bandwidth, and occupy memory space. In addition, current programming models require applications to set up the data transfer themselves, preventing them from dynamically utilizing more efficient data transfer mechanisms. To address these deficiencies, my team and I re-engineered the system to support peer-to-peer data transfers between an SSD and a GPU, bypassing the CPU and main memory. We defined an application interface that frees applications from the task of setting up data routes, and we designed a runtime system that dynamically chooses the most efficient route to carry application data. A real-world evaluation shows that the proposed design can improve application performance by 46% and reduce energy use by 28%, without modifying the computation kernels. The system is even more effective for multiprogrammed server workloads, where improving the utilization of computing resources yields a 50% performance gain [3]. The resulting system is now the backbone of many other research projects in the group; I am advising several students as they build database systems and a high-performance MapReduce framework on top of this platform.
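
The paper [3] describes the actual interface; the C sketch below is only a hypothetical illustration of the contrast it enables. In the conventional path the CPU stages every block in a host DRAM bounce buffer before copying it to the GPU, while in the peer-to-peer path the application only names the source and destination and the runtime picks the route. All function names here are assumptions for illustration, and the stubs emulate the behavior in host memory so the sketch compiles and runs anywhere.

    /* Hypothetical illustration of the API contrast -- not the interface from
     * [3].  The stubs only emulate behavior; on real hardware p2p_transfer()
     * would move data directly from the SSD to GPU memory over PCIe. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* --- assumed runtime stubs (placeholders) ----------------------------- */
    static int gpu_copy_from_host(void *gpu_dst, const void *src, size_t n)
    {
        memcpy(gpu_dst, src, n);          /* stand-in for a host-to-device copy */
        return 0;
    }

    static int p2p_transfer(const char *path, void *gpu_dst, size_t n)
    {
        /* Placeholder: a real runtime would pick the route (peer-to-peer when
         * possible) without the application describing it. */
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        size_t got = fread(gpu_dst, 1, n, f);
        fclose(f);
        return got == n ? 0 : -1;
    }

    /* --- conventional, CPU-centric path: SSD -> host DRAM -> GPU ----------- */
    static int load_via_host(const char *path, void *gpu_dst, size_t n)
    {
        void *bounce = malloc(n);         /* extra copy that occupies DRAM      */
        if (!bounce) return -1;
        FILE *f = fopen(path, "rb");
        if (!f) { free(bounce); return -1; }
        size_t got = fread(bounce, 1, n, f);   /* CPU shepherds the data        */
        fclose(f);
        int rc = (got == n) ? gpu_copy_from_host(gpu_dst, bounce, n) : -1;
        free(bounce);
        return rc;
    }

    int main(void)
    {
        char gpu_buffer[64];              /* stand-in for device memory         */
        int a = load_via_host("input.bin", gpu_buffer, sizeof gpu_buffer);
        int b = p2p_transfer("input.bin", gpu_buffer, sizeof gpu_buffer);
        printf("host path: %d, peer-to-peer path: %d\n", a, b);
        return 0;
    }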

Efficiently Using Computing Resources

The emergence of heterogeneous computing systems also encourages us to re-examine the role of the CPU and make greater use of the processing resources spread throughout the system. In my research, I observed that using the CPU to deserialize application objects from text files accounts for 64% of total execution time and prevents applications from sending data directly between storage devices and heterogeneous accelerators. At the same time, emerging NVMe SSDs contain energy-efficient embedded cores that allow us to perform object deserialization while bypassing the overheads of the host system. I worked with two undergraduate researchers to move deserialization onto these otherwise unused processing resources within the SSD. By offloading object deserialization from the CPU to the SSD, we were able to speed up applications by 1.39x and reduce energy consumption by 42%. This work has demonstrated the value of redefining the interaction between applications and storage devices. Accordingly, I am working with the group to provide innovative SSD interfaces and network semantics that help eliminate inefficient CPU code and improve system efficiency.
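
As a hypothetical illustration of what such an offload changes for the host (not the interface we built), the sketch below contrasts host-side parsing of a text file of numbers with a call that would hand the raw blocks to the SSD's embedded cores and return ready-to-use binary objects. The ssd_deserialize() name is an assumption, and its stub simply reuses the CPU path so the sketch runs anywhere.

    /* Hypothetical sketch: host-side text deserialization versus an offload
     * call whose real implementation would run on the SSD's embedded cores. */
    #include <stdio.h>
    #include <stdlib.h>

    /* CPU-side path: the host burns cycles turning text into binary objects. */
    static size_t cpu_deserialize(const char *path, double *out, size_t max)
    {
        FILE *f = fopen(path, "r");
        if (!f) return 0;
        size_t n = 0;
        while (n < max && fscanf(f, "%lf", &out[n]) == 1)  /* parsing dominates */
            n++;
        fclose(f);
        return n;
    }

    /* Offloaded path (assumed API): the device returns parsed objects, so the
     * host never touches the text representation.  The stub reuses the CPU
     * path only so that this illustration compiles and runs. */
    static size_t ssd_deserialize(const char *path, double *out, size_t max)
    {
        return cpu_deserialize(path, out, max);  /* placeholder for in-SSD code */
    }

    int main(void)
    {
        double objs[1024];
        size_t n = ssd_deserialize("records.txt", objs, 1024);
        printf("deserialized %zu objects\n", n);
        return 0;
    }
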
Future Directions

Because of the limitations imposed by dark silicon, computer systems are shifting to rely more on parallel and heterogeneous architectures. In this environment, the overhead of moving data among devices becomes critical for performance, especially when the application needs to deal with a large amount of data. While my prior work helps address the problem of redundant computation, as well as unnecessary data movement (including cache-to-cache, memory-to-memory, and storage-to-memory), the fundamental cost of moving data from storage to device memory remains a challenge for high-performance computing resources. To reduce the amount of data movement and improve the scalability of computing, I would like to bring computation closer to data sources, so that the system propagates computation and data to other computing resources only when doing so is beneficial. To provide this kind of in-house computing power for data sources (including storage devices, network interface cards, and memory modules), I plan to investigate efficient processor designs for these devices. In addition, I will enhance and extend programming tools so that programmers can easily take advantage of this new architecture. The ultimate goal is to perform computation at the optimal place within every kind of computer system, including heterogeneous parallel computers and emerging larger-scale systems such as the IoT. In the following, I outline several specific research topics directed toward this goal.

Embedding computing power in different layers of the system

Embedding computing power in different layers of data storage is not just a matter of adding processors to peripheral devices. Each type of hardware's specific characteristics affect the design philosophy of the processors that will work on that device. The target applications running in the system also drive design decisions and require different trade-offs in the architecture. This line of inquiry raises many research questions. For example, what types of processor architecture are needed to balance the performance, power, energy, and hardware costs of different types of data storage and I/O devices? Which layers of the memory hierarchy and what kinds of I/O devices need computing power? How do these new processors in the memory/storage hierarchy affect the role and design of the CPU? How can we efficiently share data while still maintaining consistency and coherence among processors on different devices in a system? What architectural support is required to efficiently and dynamically move computation across different processing units in the new design?

The computing scenarios of data storage and I/O devices differ from those of CPUs in several respects, including latencies, bandwidth, and parallelism. For example, the access time of current storage-class non-volatile memory technology is still orders of magnitude slower than that of on-chip caches. At the same time, these devices offer rich internal parallelism. In addition, workloads that are suitable for processing inside storage or I/O devices can exhibit different behaviors from those that current CPUs are optimized for. As a result, we will need brand-new processor architectures to efficiently process these workloads. With heterogeneous processors taking on more of the burden of computation in the system, we can also revisit the CPU design and refocus the CPU on compute-intensive workloads.

Sharing data efficiently among different processor-equipped devices requires new models of coherence and consistency. Even in the current computation model, maintaining coherence across high-speed volatile memories with different latencies still incurs significant overhead.
If the system is to extend memory coherence and consistency even further (e.g., to the non-volatile memory that most storage devices use), data persistence becomes even more challenging, as does the problem of latency.

Easy-to-use programming model and efficient system software

As computer systems offer increased computing resources, writing a program that best utilizes those resources can become enormously complicated yet still provide only limited flexibility. To efficiently manage tasks and communication across different computing resources, we must rethink programming models, as well as the design of software interfaces, runtime systems, and operating systems. Extending existing programming models and their runtime systems, such as Spark and MapReduce, is the first step in this direction. With these high-level programming models hiding the details of hardware architecture and data storage, programmers are able to distribute computation to different heterogeneous devices by composing a single program. However, designing a lightweight and efficient runtime system that supports these programming models in a system with tens of computing devices (or more) presents a wealth of research problems. Programming models like DTT that trigger asynchronous, event-driven, and data-aware parallelism offer an alternative framework, as they avoid the latency and resource-utilization issues of Spark and MapReduce. A programming model like DTT that is compatible with existing high-level programming languages also presents the opportunity to explore ways of using legacy code to generate new code that can make effective use of emerging computing resources. (My work on the CDTT compiler described above is only one example.)

Near-data processing for IoT

Bringing computation closer to data locations is beneficial for systems where communication is the dominant cost. In the IoT, where the latency and energy required to exchange data over wireless links generate significant overhead, applying the concept of near-data processing is especially valuable. My previous work in this area [14, 15] helps reduce the energy consumption of wireless communication, but fundamentally changing the computation model can reduce that cost even further. As with supporting processing inside the data storage devices of a computer, bringing computation closer to IoT devices requires both hardware and software support. However, designing a processor for tiny wireless devices is more challenging than doing so within a single-node computer, because we must both keep most computation in-house to reduce the amount of outgoing data and stay within a limited energy budget. Processors that combine an energy-efficient, general-purpose core with hardware accelerators would be a compelling choice. To make programming IoT systems as easy as writing programs on a single computer, I also plan to explore a programming model that allows the programmer to easily configure tasks while requiring the least possible middleware overhead on each device. To further reduce the cost of communication, I am also interested in developing a lightweight, RDMA-like protocol that bypasses part of the overhead of the Internet protocol stack.
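
As a purely illustrative sketch of the keep-computation-in-house principle (my own toy example, not a system from [14, 15]), the C fragment below aggregates a window of sensor samples on the node and transmits only the summary, replacing many radio messages with one at the cost of a little local computation. The radio_send() call is a hypothetical stub.

    /* Illustrative sketch of near-data processing on an IoT node: aggregate
     * locally and transmit only the summary.  radio_send() is hypothetical;
     * printing stands in for an actual radio transmission. */
    #include <stdio.h>

    #define WINDOW 8

    static void radio_send(const char *label, double value)
    {
        printf("TX %s = %.2f\n", label, value);   /* pretend this costs energy */
    }

    /* Naive approach: ship every raw sample off the node (WINDOW messages).   */
    static void send_raw(const double *samples)
    {
        for (int i = 0; i < WINDOW; i++)
            radio_send("sample", samples[i]);
    }

    /* Near-data approach: compute the summary in-house, send one message.     */
    static void send_aggregate(const double *samples)
    {
        double sum = 0.0;
        for (int i = 0; i < WINDOW; i++)
            sum += samples[i];
        radio_send("mean", sum / WINDOW);         /* 1 transmission, not 8     */
    }

    int main(void)
    {
        double samples[WINDOW];
        for (int i = 0; i < WINDOW; i++)
            samples[i] = 20.0 + 0.1 * i;          /* fake sensor readings      */
        send_raw(samples);                        /* 8 radio messages          */
        send_aggregate(samples);                  /* 1 radio message           */
        return 0;
    }
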
Finally, since many architectural design decisions are determined by the needs of real applications, I also hope to conduct interdisciplinary research projects with researchers from health care, the social sciences, and human-computer interaction to characterize the behavior of IoT applications.

The significant changes in computer architectures and application demands are forcing us to rethink computing models, including all aspects of architectures, programming languages, compilers, and systems. In my future work as a professor, I plan to focus on bridging the gap between emerging computer architectures and applications, and to conduct interdisciplinary projects that will help shed light on these emerging issues.

References

[1] H.-W. Tseng and D. M. Tullsen, "Data-triggered threads: Eliminating redundant computation," in 17th International Symposium on High Performance Computer Architecture (HPCA 2011).

[2] H.-W. Tseng and D. M. Tullsen, "Eliminating redundant computation and exposing parallelism through data-triggered threads," IEEE Micro, Special Issue on the Top Picks from Computer Architecture Conferences, vol. 32.

[3] H.-W. Tseng, Y. Liu, M. Gahagan, J. Li, Y. Jin, and S. Swanson, "Gullfoss: Accelerating and simplifying data movement among heterogeneous computing and storage resources," technical report, Department of Computer Science and Engineering, University of California, San Diego.

[4] C.-L. Yang, A. R. Lebeck, H.-W. Tseng, and C.-H. Lee, "Tolerating memory latency through push prefetching for pointer-intensive applications," ACM Transactions on Architecture and Code Optimization (TACO), vol. 1, no. 4.

[5] H.-W. Tseng, L. M. Grupp, and S. Swanson, "Understanding the impact of power loss on flash memory," in 48th Design Automation Conference (DAC 2011).

[6] H.-W. Tseng, L. M. Grupp, and S. Swanson, "Underpowering NAND flash: Profits and perils," in 50th Design Automation Conference (DAC 2013), pp. 1-6.

[7] H.-W. Tseng and D. M. Tullsen, "Software data-triggered threads," in ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2012).

[8] H.-W. Tseng and D. M. Tullsen, "Data-triggered multithreading for near-data processing," in 1st Workshop on Near-Data Processing (WoNDP 2013).

[9] H.-W. Tseng and D. M. Tullsen, "CDTT: Compiler-generated data-triggered threads," in 20th International Symposium on High Performance Computer Architecture (HPCA 2014).

[10] H.-L. Li, C.-L. Yang, and H.-W. Tseng, "Energy-aware flash memory management in virtual memory system," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 16, no. 8.

[11] H.-W. Tseng, H.-L. Li, and C.-L. Yang, "An energy-efficient virtual memory system with flash memory as the secondary storage," in International Symposium on Low Power Electronics and Design (ISLPED 2006).

[12] C.-L. Yang, H.-W. Tseng, C.-C. Ho, and J.-L. Wu, "Software-controlled cache architecture for energy efficiency," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 5.

[13] C.-L. Yang, H.-W. Tseng, and C.-C. Ho, "Smart cache: An energy-efficient D-cache for a software MPEG-2 video decoder," in Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia (ICICS-PCM 2003).

[14] S.-H. Yang, H.-W. Tseng, E. H.-K. Wu, and G.-H. Chen, "Utilization-based duty cycle tuning MAC protocol for wireless sensor networks," in IEEE Global Telecommunications Conference (GLOBECOM 2005).

[15] H.-W. Tseng, S.-H. Yang, P.-Y. Chuangi, E. H.-K. Wu, and G.-H. Chen, "An energy consumption analytic model for a wireless sensor MAC protocol," in IEEE 60th Vehicular Technology Conference (VTC2004-Fall).
