Research Statement. Hung-Wei Tseng
I have research experience in many areas of computer science and engineering, including computer architecture [1, 2, 3, 4], high-performance and reliable storage systems [5, 6, 3], software runtime systems [7, 8], programming languages [1, 7], compilers [9], embedded systems [10, 11, 12, 13], and computer networks [14, 15]. Much of my research springs from the observation that entrenched programming and execution models do not take advantage of modern parallel and heterogeneous computer architectures, creating redundancies and limiting applications' ability to use computing resources. The resulting software programs waste the potential of modern computer systems and undermine their effectiveness. Therefore, my research projects focus on making computation more efficient. I started pursuing this route with my PhD thesis on the data-triggered threads (DTT) model. The DTT model eliminates redundant computation and creates non-traditional parallelism for applications running on multi-core processors through microarchitecture, programming languages, software runtime systems, and compiler optimizations. Since receiving my PhD, I have also led a large-scale project that builds efficient data storage and communication mechanisms for big data applications in heterogeneous computing systems that include high-speed non-volatile memory devices, GPU accelerators, and FPGAs. These projects have laid the foundation for my future research, in which I hope to build efficient heterogeneous parallel computers and IoT (Internet of Things) systems for emerging applications by rethinking architectures, programming languages, systems, and compilers. Below, I describe my research projects at UCSD and sketch my future research directions.
Data-triggered threads

Data-triggered threads (DTT) [1, 2] is a programming and execution model that avoids redundant computation and exploits parallelism by initiating computation only when the application changes memory content. With the DTT model, code that depends on changing data can immediately execute in parallel, while code depending on data that remains the same can be skipped, avoiding redundant computation. I am the principal designer and developer of DTT. My thesis research makes the following contributions. (1) It defines a programming model, based on imperative programming languages, that allows programmers to express computation in a way that exposes redundancies and identifies new opportunities for parallel execution. (2) It proves that the DTT model requires only a small amount of microarchitectural change. (3) It provides a software-only solution for executing DTT applications on existing architectures. (4) It demonstrates how legacy programs can take advantage of the DTT model without any programmer intervention, using a transparent compiler-only transformation. Viewed as a whole, my research projects on the DTT model exploit architectures, programming languages, runtime systems, and compilers to improve applications' performance and energy efficiency.

DTT model and microarchitecture

Most computers today use the von Neumann model, in which applications initiate parallelism based on the program counter. This conventional approach limits systems' ability to make use of parallelism. In addition, this model incurs significant redundant computation. Our research shows that loading redundant values (i.e., unchanged values previously loaded from the same addresses) accounts for more than 70% of memory loads, and that more than half of the computations that follow these loads are unnecessary. In my research, I defined a set of language extensions to C/C++ that allow programmers to describe applications using the DTT model.
Unlike other proposals that try to reduce redundant computation, the DTT model needs only a small amount of hardware support. The simulation results [1] show that the DTT model can improve performance by 45% on SPEC2000 applications, and a significant portion of this performance gain comes from eliminating redundant computation.
DTT runtime system

To increase the applicability of the DTT model, I designed a software-only framework. I applied several optimizations to minimize the multithreading overhead on real systems. In addition, the runtime system dynamically and transparently disables DTT when a DTT code section may potentially underperform the conventional model. The runtime system achieves a 15% performance improvement on SPEC2000 applications using an additional thread and, when DTT parallelism is added to traditional parallelism, a 19% speedup on PARSEC applications [7]. To provide better support for fine-grained, massive data-level parallelism, I further extended the runtime system and the DTT model to allow out-of-order execution of multiple data-triggered threads. The extended DTT model also enables the runtime system to schedule tasks according to data locations and further improve performance. The results in [8] demonstrate that the DTT model can effectively overlap computation with I/O and achieve better scalability than traditional parallelism.

CDTT compiler

While the original DTT proposal relies on programmers' efforts to achieve performance improvement, I have also been working on an LLVM-based compiler, CDTT, to automatically generate DTT code for legacy C/C++ applications [9]. Even without profile data, the CDTT compiler can identify code that contains redundant behavior, which is where the DTT model provides the largest performance advantages. In most applications, the compiled binary running on the software-only DTT runtime system achieves nearly the same level of performance as programmers' manual modifications, with an average 10% performance gain on SPEC2000 [9].

Efficient Heterogeneous Systems for Data-Intensive Applications

The growing size of application data, coupled with the emergence of heterogeneous computing resources, high-performance non-volatile memory technologies, and fast network devices, is reshaping the computing landscape.
However, programming and execution models on these platforms still follow the CPU-centric approach, which results in inefficiencies. I am currently leading a group of 6 PhD students and 2 undergraduate students in a project to improve the performance of data-intensive applications on these systems. Within 15 months, the project has made the following contributions: (1) We designed a system that enables peer-to-peer data transfers between SSDs and GPUs, eliminating redundant CPU and main memory operations. (2) We designed a simplified API and system software stack that improves data transfer performance and accelerates applications. (3) We demonstrated that exploiting the processing power inside storage devices to deserialize application objects improves energy efficiency without sacrificing performance. The following paragraphs describe these projects in more detail.

Efficiently Supplying Data in Computers

As computers become heterogeneous, the demands of exchanging data among storage and computing devices increase. One set of GPU benchmarks we studied left the GPU idle for 54% of the total execution time because of stalls due to data transfers [3]. Current programming models in heterogeneous computing systems still transfer data through the CPU and the main memory, even though the majority of the computation may not use the CPU. The result is redundant data copies that consume CPU time, waste memory bandwidth, and occupy memory space. In addition, current programming models require applications to set up the data transfer, preventing applications from dynamically utilizing more efficient data transfer mechanisms. To address these deficiencies, my team and I re-engineered the system to support peer-to-peer data transfers between an SSD and a GPU, bypassing the CPU and the main memory. We defined an application interface that frees applications from the task of setting up data routes.
We designed a runtime system to dynamically choose the most efficient route to carry application data. A real-world evaluation shows that my proposed design can improve application performance by 46% and reduce energy use by 28%, without modifying the computation kernels. Such a system is even more effective for a multiprogrammed server workload, as we can improve the utilization of computing resources and obtain a 50% performance gain [3]. The resulting system is now the backbone of many other research projects in the group. I am advising several students in my group as they work to develop database systems and a high-performance MapReduce framework on this platform.
Efficiently Using Computing Resources

The emergence of heterogeneous computing systems also encourages us to re-examine the role of the CPU and make greater use of the processing resources spread throughout the system. In my research, I observed that using the CPU to deserialize application objects from text files accounts for 64% of total execution time and prevents applications from sending data directly between storage devices and heterogeneous accelerators. At the same time, emerging NVMe SSDs contain energy-efficient embedded cores that allow us to perform object deserialization while bypassing the system overhead. I worked with two undergraduate researchers to move deserialization onto unused processing resources within the SSD. By offloading object deserialization from the CPU to the SSD, we were able to speed up applications by 1.39x and reduce energy consumption by 42%. This work has demonstrated the value of redefining the interaction between applications and storage devices. Accordingly, I am working with the group to provide innovative SSD interfaces and network semantics that help eliminate inefficient CPU code and improve system efficiency.

Future Directions

Because of the limitations imposed by dark silicon, computer systems are shifting to rely more on parallel and heterogeneous architectures. In this environment, the overhead of moving data among devices becomes critical for performance, especially when the application needs to deal with a large amount of data. While my prior work helps address the problem of redundant computation, as well as unnecessary data movement (including cache-to-cache, memory-to-memory, and storage-to-memory), the fundamental demands of moving data from data storage to device memory remain a challenge for high-performance computing resources.
To reduce the amount of data movement and improve the scalability of computing, I would like to bring computation closer to data sources, so that the system propagates computation and data to other computing resources only when doing so is beneficial. To provide this kind of in-house computing power for data sources (including storage devices, network interface cards, and memory modules), I plan to investigate efficient processor designs for these devices. In addition, I will enhance and extend programming tools so that programmers can easily take advantage of this new architecture. The ultimate goal is to perform computation at the optimal place within every kind of computer system, including heterogeneous parallel computers and emerging larger-scale systems like the IoT. In the following, I outline several specific research topics directed toward this goal.

Embedding computing power in different layers of the system

Embedding computing power in different layers of data storage is not just a matter of adding processors to peripheral devices. Each type of hardware's specific characteristics affect the design philosophy of processors that will work on that device. The target applications running in the system also drive design decisions and require different trade-offs in the architecture. This line of inquiry raises many research questions. For example, what types of processor architecture are needed to balance the performance, power, energy, and hardware costs of different types of data storage and I/O devices? Which layers of the memory hierarchy and what kinds of I/O devices need computing power? How do these new processors in the memory/storage hierarchy affect the role and design of the CPU? How can we efficiently share data while maintaining consistency and coherence among processors on different devices in a system?
What architectural support is required to efficiently and dynamically move computation across different processing units in the new design? The computing scenarios of data storage and I/O devices differ from those of CPUs in several respects, including latencies, bandwidth, and parallelism. For example, the access time of current storage-class non-volatile memory technology is still orders of magnitude slower than that of on-chip caches. At the same time, these devices also offer rich internal parallelism. In addition, workloads that are suitable for processing inside storage or I/O devices can exhibit different behaviors from those that current CPUs are optimized for. As a result, we will need brand-new processor architectures to efficiently process these workloads. With heterogeneous processors taking on the burden of computation in the system, we can revisit the CPU design and refocus the CPU on compute-intensive workloads. Sharing data efficiently among different processor-equipped devices requires new models of coherence and consistency. Even in the current computation model, maintaining coherence across high-speed volatile
memories with different latencies still incurs significant overhead. If the system is to extend memory coherence and consistency even further (e.g., to the non-volatile memory that most storage devices use), data persistence becomes even more challenging, as does the problem of latency.

Easy-to-use programming model and efficient system software

As computer systems offer increased computing resources, writing a program that best utilizes those resources can become enormously complicated yet still provide only limited flexibility. To efficiently manage tasks and communication across different computing resources, we must rethink programming models, as well as the design of software interfaces, runtime systems, and operating systems. Extending existing programming models and their runtime systems, such as Spark and MapReduce, is the first step in this direction. With these high-level programming models hiding details of hardware architecture and data storage, programmers are able to distribute computation to different heterogeneous devices by composing a single program. However, designing a lightweight and efficient runtime system that supports these programming models in a system with tens of computing devices (or more) presents a wealth of research problems. Programming models like the DTT model that trigger asynchronous, event-driven, and data-aware parallelism can also serve as an alternative programming framework, as these models avoid the latency and resource utilization issues of Spark and MapReduce. A programming model like DTT that is compatible with existing high-level programming languages presents the opportunity to explore ways of using legacy code to generate new code that can make effective use of emerging computing resources. (My work on the CDTT compiler described above is only one example.)

Near-data processing for IoT

Bringing computation closer to data locations is beneficial for systems where communication is the most expensive cost.
In the IoT, where the latency and energy consumption required to exchange data through wireless communication generate significant overhead, applying the concept of near-data processing is especially valuable. My previous work in this area [14, 15] helps reduce the energy consumption of wireless communication, but fundamentally changing the computation model can mitigate the problem even further. As with supporting processing inside data storage devices within a computer, bringing computation closer to IoT devices requires both hardware and software support. However, designing a processor for tiny wireless devices is more challenging than doing so within a single-node computer, because we must both keep most computation in-house to reduce the amount of outgoing data and stay within a limited energy budget. Processors that combine an energy-efficient, general-purpose core with hardware accelerators would be a compelling choice. To make programming IoT systems as easy as writing programs on a single computer, I also plan to explore a programming model that allows the programmer to easily configure tasks while requiring the least possible middleware overhead on each device. To further reduce the cost of communication, I am also interested in developing a lightweight, RDMA-like protocol that bypasses part of the overhead of the Internet stack. Finally, since many architectural design decisions are determined by the needs of real applications, I also hope to conduct interdisciplinary research projects with researchers from health care, the social sciences, and human-computer interaction to understand the behavior of IoT applications. The significant changes in computer architectures and application demands are forcing us to rethink computing models, including all aspects of architectures, programming languages, compilers, and systems.
In my future work as a professor, I plan to focus on bridging the gap between emerging computer architectures and applications, and to conduct interdisciplinary projects that will help shed light on these emerging issues.

References

[1] H.-W. Tseng and D. M. Tullsen, Data-triggered threads: Eliminating redundant computation, in 17th International Symposium on High Performance Computer Architecture, HPCA 2011.
[2] H.-W. Tseng and D. M. Tullsen, Eliminating redundant computation and exposing parallelism through data-triggered threads, IEEE Micro, Special Issue on the Top Picks from Computer Architecture Conferences, vol. 32.
[3] H.-W. Tseng, Y. Liu, M. Gahagan, J. Li, Y. Jin, and S. Swanson, Gullfoss: Accelerating and simplifying data movement among heterogeneous computing and storage resources, Tech. Rep. CS, Department of Computer Science and Engineering, University of California, San Diego.
[4] C.-L. Yang, A. R. Lebeck, H.-W. Tseng, and C.-H. Lee, Tolerating memory latency through push prefetching for pointer-intensive applications, ACM Transactions on Architecture and Code Optimization (TACO), vol. 1, no. 4.
[5] H.-W. Tseng, L. M. Grupp, and S. Swanson, Understanding the impact of power loss on flash memory, in 48th Design Automation Conference, DAC 2011.
[6] H.-W. Tseng, L. M. Grupp, and S. Swanson, Underpowering NAND flash: Profits and perils, in 50th Design Automation Conference, DAC 2013, pp. 1-6.
[7] H.-W. Tseng and D. M. Tullsen, Software data-triggered threads, in ACM SIGPLAN 2012 Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA 2012.
[8] H.-W. Tseng and D. M. Tullsen, Data-triggered multithreading for near-data processing, in 1st Workshop on Near-Data Processing, WoNDP 2013.
[9] H.-W. Tseng and D. M. Tullsen, CDTT: Compiler-generated data-triggered threads, in 20th International Symposium on High Performance Computer Architecture, HPCA 2014.
[10] H.-L. Li, C.-L. Yang, and H.-W. Tseng, Energy-aware flash memory management in virtual memory system, IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 16, no. 8.
[11] H.-W. Tseng, H.-L. Li, and C.-L. Yang, An energy-efficient virtual memory system with flash memory as the secondary storage, in 2006 International Symposium on Low Power Electronics and Design, ISLPED 2006.
[12] C.-L. Yang, H.-W. Tseng, C.-C. Ho, and J.-L. Wu, Software-controlled cache architecture for energy efficiency, IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 5.
[13] C.-L. Yang, H.-W. Tseng, and C.-C. Ho, Smart cache: An energy-efficient D-cache for a software MPEG-2 video decoder, in 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and Fourth Pacific Rim Conference on Multimedia, ICICS-PCM 2003.
[14] S.-H. Yang, H.-W. Tseng, E. H.-K. Wu, and G.-H. Chen, Utilization based duty cycle tuning MAC protocol for wireless sensor networks, in IEEE Global Telecommunications Conference, GLOBECOM 2005.
[15] H.-W. Tseng, S.-H. Yang, P.-Y. Chuangi, E. H.-K. Wu, and G.-H. Chen, An energy consumption analytic model for a wireless sensor MAC protocol, in IEEE 60th Vehicular Technology Conference, VTC2004-Fall.
SOFT 437. Software Performance Analysis. Ch 5:Web Applications and Other Distributed Systems
SOFT 437 Software Performance Analysis Ch 5:Web Applications and Other Distributed Systems Outline Overview of Web applications, distributed object technologies, and the important considerations for SPE
Vortex White Paper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems
Vortex White Paper Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems Version 1.0 February 2015 Andrew Foster, Product Marketing Manager, PrismTech Vortex
Dynamic resource management for energy saving in the cloud computing environment
Dynamic resource management for energy saving in the cloud computing environment Liang-Teh Lee, Kang-Yuan Liu, and Hui-Yang Huang Department of Computer Science and Engineering, Tatung University, Taiwan
Big data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
EHA: The Extremely Heterogeneous Architecture
EHA: The Extremely Heterogeneous Architecture Shaoshan Liu 1, Won W. Ro 2, Chen Liu 3, Alfredo C. Salas 4, Christophe Cérin 5, Jian-Jun Han 6 and Jean-Luc Gaudiot 7 1 Microsoft, WA, U.S.A. 2 Yonsei University,
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
The Key Technology Research of Virtual Laboratory based On Cloud Computing Ling Zhang
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Key Technology Research of Virtual Laboratory based On Cloud Computing Ling Zhang Nanjing Communications
Architecture Support for Big Data Analytics
Architecture Support for Big Data Analytics Ahsan Javed Awan EMJD-DC (KTH-UPC) (http://uk.linkedin.com/in/ahsanjavedawan/) Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir Vlassov(KTH) 1
Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division
Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big
Disk Storage Shortfall
Understanding the root cause of the I/O bottleneck November 2010 2 Introduction Many data centers have performance bottlenecks that impact application performance and service delivery to users. These bottlenecks
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest
GETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS
Embedded Systems White Paper GETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS September 2009 ABSTRACT Android is an open source platform built by Google that includes an operating system,
Seeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
Secured Embedded Many-Core Accelerator for Big Data Processing
Secured Embedded Many- Accelerator for Big Data Processing Amey Kulkarni PhD Candidate Advisor: Professor Tinoosh Mohsenin Energy Efficient High Performance Computing (EEHPC) Lab University of Maryland,
Design and Implementation of the Heterogeneous Multikernel Operating System
223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,
Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2
Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of
high-performance computing so you can move your enterprise forward
Whether targeted to HPC or embedded applications, Pico Computing s modular and highly-scalable architecture, based on Field Programmable Gate Array (FPGA) technologies, brings orders-of-magnitude performance
Performance Oriented Management System for Reconfigurable Network Appliances
Performance Oriented Management System for Reconfigurable Network Appliances Hiroki Matsutani, Ryuji Wakikawa, Koshiro Mitsuya and Jun Murai Faculty of Environmental Information, Keio University Graduate
Control 2004, University of Bath, UK, September 2004
Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of
Desktop Virtualization and Storage Infrastructure Optimization
Desktop Virtualization and Storage Infrastructure Optimization Realizing the Most Value from Virtualization Investment Contents Executive Summary......................................... 1 Introduction.............................................
SPARC64 VIIIfx: CPU for the K computer
SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
Benchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
Rambus Smart Data Acceleration
Rambus Smart Data Acceleration Back to the Future Memory and Data Access: The Final Frontier As an industry, if real progress is to be made towards the level of computing that the future mandates, then
ioscale: The Holy Grail for Hyperscale
ioscale: The Holy Grail for Hyperscale The New World of Hyperscale Hyperscale describes new cloud computing deployments where hundreds or thousands of distributed servers support millions of remote, often
Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise
Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise Introducing Unisys All in One software based weather platform designed to reduce server space, streamline operations, consolidate
Getting More Performance and Efficiency in the Application Delivery Network
SOLUTION BRIEF Intel Xeon Processor E5-2600 v2 Product Family Intel Solid-State Drives (Intel SSD) F5* Networks Delivery Controllers (ADCs) Networking and Communications Getting More Performance and Efficiency
A Study of Application Performance with Non-Volatile Main Memory
A Study of Application Performance with Non-Volatile Main Memory Yiying Zhang, Steven Swanson 2 Memory Storage Fast Slow Volatile In bytes Persistent In blocks Next-Generation Non-Volatile Memory (NVM)
System Software Integration: An Expansive View. Overview
Software Integration: An Expansive View Steven P. Smith Design of Embedded s EE382V Fall, 2009 EE382 SoC Design Software Integration SPS-1 University of Texas at Austin Overview Some Definitions Introduction:
Cloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
Data Management for Portable Media Players
Data Management for Portable Media Players Table of Contents Introduction...2 The New Role of Database...3 Design Considerations...3 Hardware Limitations...3 Value of a Lightweight Relational Database...4
Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip
Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC
IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus
Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,
Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
Principles and characteristics of distributed systems and environments
Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single
Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
A Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011
A Close Look at PCI Express SSDs Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011 Macro Datacenter Trends Key driver: Information Processing Data Footprint (PB) CAGR: 100%
evm Virtualization Platform for Windows
B A C K G R O U N D E R evm Virtualization Platform for Windows Host your Embedded OS and Windows on a Single Hardware Platform using Intel Virtualization Technology April, 2008 TenAsys Corporation 1400
How to Choose your Red Hat Enterprise Linux Filesystem
How to Choose your Red Hat Enterprise Linux Filesystem EXECUTIVE SUMMARY Choosing the Red Hat Enterprise Linux filesystem that is appropriate for your application is often a non-trivial decision due to
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
Manjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.
Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts
Part V Applications Cloud Computing: General concepts Copyright K.Goseva 2010 CS 736 Software Performance Engineering Slide 1 What is cloud computing? SaaS: Software as a Service Cloud: Datacenters hardware
