Feb. 2012 Benefits of the big.LITTLE Architecture
Hyun-Duk Cho, Ph.D., Principal Engineer
Kisuk Chung, Senior Engineer
Taehoon Kim, Vice President
System LSI Business, Samsung Electronics
Abstract
The demand for performance from portable computing devices, such as tablet computers and smartphones, has been rising steadily. As an immediate consequence of this trend, computing- and display-intensive applications, such as gaming and high-definition video playback, are turning out to be enormous drains on the device's power source, that is, the on-board battery. Current-generation handheld computers and smartphones need high-performance CPUs, for instance, quad-core CPUs running at or above 1.5 GHz, to leverage all of the device features. However, present-generation batteries cannot sustain the very high computing power required in such devices, and thus fail to provide users with long usage times. Moreover, when such processors are combined with high-resolution LCDs in any device, the drain is significantly higher and much quicker.
One of the proposed solutions to increase battery life in such devices is to use CPUs with a different, power-efficient architecture. This paper introduces the big.LITTLE architecture launched by ARM, and its adoption for the Exynos line of processors by Samsung. The paper also discusses several key advantages of this architecture, compares it to other competing architectures, and demonstrates its efficiency and superiority over other architectures, especially in extending battery life in portable, handheld computing and communications devices.
1. Introduction
Rapid advancements in technology have brought about a remarkable increase in the performance of the CPUs used in mobile computing devices, such as smartphones and tablet computers. Operating frequencies or clock rates of such CPUs have increased from a few hundred MHz to several GHz, and the devices have expanded from having a single core to multiple, separate, high-performance cores.
Due to this rise in available computing power, a mobile computing device today can effortlessly run applications that only a traditional computer, such as a desktop or notebook computer, could run in the past. However, traditional computers have advanced at a comparable pace, so a significant performance gap remains between a mobile computing device and a traditional desktop or laptop computer.
Current-generation technology allows the performance limits of mobile computing devices to be pushed up to a certain extent. This can be accomplished by increasing the clock rates of the CPUs used in the devices, by increasing the number of cores within the CPUs, or both. However, the immediate downside of any of these techniques is a significant rise in the power consumed by the device's CPU. For battery-powered mobile devices, this translates to a higher drain on the device's battery, resulting in reduced battery life and a shorter usable duration of the device for the user.
It may be noted that, in contrast to traditional computing devices, battery capacities for mobile computing devices cannot be increased beyond certain limits, due to the physical and form-factor limitations of the device. As an example, it may not be possible to increase the size of displays on smartphones and tablet computers beyond, say, 5 and 11 inches, respectively. Such restrictions limit the sizes and capacities of the batteries designed to power such devices.
To achieve higher computing performance at the same or lower power consumption levels in mobile computing devices, several new processor architectures have been designed. One of these is the asynchronous clock architecture, which, in a dual-core CPU, runs one core at a lower voltage and clock frequency and the other at a higher voltage and clock frequency. Such a CPU schedules computationally intensive tasks on the faster core, while less demanding tasks are diverted to the slower core.
Samsung has adopted a different architecture, known as the big.LITTLE architecture of ARM, for its Exynos line of high-performance CPUs, designed specifically for the mobile computing and communications platform. The big.LITTLE architecture combines multiple cores to create a CPU that can execute both high- and low-intensity tasks in the most efficient manner. The following sections of this paper introduce the big.LITTLE architecture, and demonstrate its superiority over the asynchronous clock architecture in terms of both performance and power consumption.
2. Overview of the big.LITTLE Architecture
The big.LITTLE architecture employs separate cores with different computing powers within the same CPU. Figure 1 illustrates the block diagram of a representative CPU designed using the big.LITTLE architecture.
Figure 1: The big.LITTLE architecture
The more powerful (or "big") core is responsible for handling compute-intensive tasks, such as high-definition video playback, whereas the less powerful (or "LITTLE") core handles less demanding tasks, such as text editing. Both cores implement exactly the same processor architecture (ARMv7), and are capable of executing the same instructions. The only difference lies in the way the cores handle the execution. While the big core is designed with performance as its primary goal, the LITTLE core is designed with efficiency as its principal target.
Thus, an application or program designed to run on one core would also run on the other, but at a different performance level.
The big.LITTLE architecture from ARM uses a Cortex-A15 processor as its big CPU, along with a Cortex-A7 processor as the LITTLE CPU, to form the combination of high- and low-powered cores within a single unit. The high-performance Cortex-A15 is an out-of-order, triple-issue processor with a pipeline length of between 15 and 24 stages. The low-power Cortex-A7 is an in-order, non-symmetric dual-issue processor with a pipeline length of between 8 and 10 stages. Because of the different pipeline lengths of the Cortex-A15 and Cortex-A7, the power consumption levels of the two processors are significantly different. The Cortex-A15 is capable of delivering higher performance at the cost of lower power efficiency, whereas the Cortex-A7 design enables it to run in the most power-efficient mode.
The big.LITTLE architecture benefits from the fact that the two types of core are architecturally identical, capable of handling the same instructions and the same higher-level software applications. This proves to be extremely beneficial in power-conscious applications, such as smartphones. The big.LITTLE architecture can be deployed in smartphones, with the LITTLE core handling the normal telephony-related functions of the device, and the big core taking over control from the LITTLE core when higher levels of performance, such as for multimedia playback, are needed.
To efficiently realize a system based on the big.LITTLE architecture, the time needed to migrate a task between the cores must be taken into account. A finite amount of time is needed to save the state (which includes the contents of all the registers and caches) and transfer it between the cores, and this time should not be too long. If the context switching latency is beyond a certain threshold, the system will suffer from noticeable performance degradation. To complete the switching in the minimum possible time, a specially designed interconnection bus, known as the CCI (Cache Coherent Interconnect), is used to transfer data among the cores, as shown in Figure 1. For an application running on a device at 1 GHz, the big.LITTLE architecture ensures that this switching is completed in less than 20,000 clock cycles (or 20 µs), a duration so small that it does not affect or interfere with the end user's application running on the device [1].
The big.LITTLE architecture also provides the flexibility of configuring the number of big and LITTLE cores within the CPU. The present version of the architecture allows up to four cores of each type to be configured, giving the designer extensive control over the performance of the CPU. Furthermore, due to the difference in computing power, the big and LITTLE cores occupy different areas on the die, with the LITTLE core taking up just 13% of the area of the big core.
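The switching-latency figure quoted above (fewer than 20,000 cycles at 1 GHz) is a simple cycles-to-time conversion, and the following minimal Python sketch reproduces it. The function names are illustrative, and the 60 Hz display refresh used to put the latency in perspective is an assumption that does not come from the paper.

```python
def cycles_to_microseconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count at a given clock rate into microseconds."""
    return cycles / clock_hz * 1e6


def share_of_frame(latency_us: float, refresh_hz: float = 60.0) -> float:
    """Fraction of one display frame taken up by the given latency.

    The 60 Hz default is an assumed refresh rate, used only for scale.
    """
    frame_us = 1e6 / refresh_hz
    return latency_us / frame_us


if __name__ == "__main__":
    # Upper bound quoted in the text: 20,000 cycles at 1 GHz.
    migration_us = cycles_to_microseconds(20_000, 1e9)
    print(f"migration latency: about {migration_us:.1f} us")
    print(f"share of one 60 Hz frame: {share_of_frame(migration_us):.4%}")
```

Under these assumptions the migration takes well under one percent of a single frame, which is consistent with the claim that the switch is not noticeable to the user.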
It is recommended that the number of big and LITTLE cores in a particular CPU be kept equal, to simplify the context switching and task migration software.
3. Comparison with Asynchronous Clock Architecture
Asynchronous clock architecture is a design for multi-core processors in which each core can be operated at a different voltage level and clock frequency. As stated earlier, such a design schedules computationally intensive tasks on the faster core and less demanding tasks on the slower core. Asynchronous clock architecture can thus be considered one of the techniques for striking an optimum balance between performance and power consumption in systems based on multi-core processors. In contrast, synchronous clock architecture implements a CPU with separate cores, with every core running at the same voltage and frequency. Processors based on the big.LITTLE architecture adopt the synchronous clock architecture.
Figure 2 illustrates both the asynchronous and the synchronous clock architecture. The figure depicts the relationship between the computing tasks (shown as the inner violet boxes) and the operating clock frequencies (shown as the outer blue boxes) for both cores of a dual-core processor, in the asynchronous and the synchronous case. A CPU core with an asynchronous architecture (a) is capable of adjusting its operating clock frequency in accordance with the required computing power. Thus, for less intensive tasks, a core (CPU1) may be operated at a lower clock, correspondingly reducing its power envelope. This is the reason for the lower power consumption of an asynchronous architecture. The adjoining figure (b) illustrates that a synchronous architecture is incapable of providing such clock frequencies that depend on the required computing power.
Consequently, irrespective of the demand for computing power, the core (CPU1) always runs at a fixed, constant clock rate, without any control over the power envelope.
Figure 2: Power management by DVFS in both asynchronous and synchronous architectures: (a) asynchronous case, (b) synchronous case
Nonetheless, the asynchronous architecture suffers from performance degradation, as explained in the following paragraphs.
Although CPU1 of the synchronous architecture processor consumes more power than CPU1 of the asynchronous architecture processor, because it is operated at a higher clock frequency, it completes the execution of its scheduled tasks faster. Consequently, any simple power management scheme can be deployed to turn it off (or put it in a low-power state) once it has completed executing its tasks. Overall, the net energy consumption (that is, the power consumed summed over a particular duration of time) is the same in the asynchronous and synchronous architectures. Figure 3 compares the power dissipation and energy consumption of the asynchronous and synchronous architectures in the example discussed above.
Figure 3: Comparison of power dissipation over time of the CPU1 cores in the synchronous and asynchronous architectures
The performance degradation in the asynchronous architecture occurs during two kinds of data transfers. In an asynchronous architecture CPU, each CPU core, together with its L1 cache, runs in a different clock domain from the L2 cache and the SCU (Snoop Control Unit). If an L1 cache miss occurs on one core, the requested data are transferred from the L2 cache to the L1 cache via the SCU, as shown by arrow A in Figure 4. Because the CPU, SCU, and L2 cache are in different clock domains, additional clock cycles are required to complete this data transfer, which degrades the overall performance of the processor.
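Returning to the energy comparison of Figure 3, the equal-energy claim can be illustrated with a minimal Python sketch. It assumes that active power scales linearly with clock frequency (supply voltage held fixed) and that an idle core draws no power. Both are simplifying assumptions made for illustration, and the function names and numeric values are hypothetical.

```python
def run_time_seconds(work_cycles: float, clock_hz: float) -> float:
    """Time needed to execute a fixed amount of work at a given clock rate."""
    return work_cycles / clock_hz


def task_energy_joules(work_cycles: float, clock_hz: float,
                       watts_per_hz: float) -> float:
    """Energy used to finish the work, after which the core is powered off.

    Assumes active power = watts_per_hz * clock_hz (linear in frequency)
    and zero power once the core has been turned off.
    """
    active_power_w = watts_per_hz * clock_hz
    return active_power_w * run_time_seconds(work_cycles, clock_hz)


if __name__ == "__main__":
    work = 1e9     # hypothetical workload, in cycles
    k = 1e-9       # hypothetical coefficient: 1 W at 1 GHz
    for label, hz in (("sync CPU1 (1 GHz)", 1e9), ("async CPU1 (200 MHz)", 2e8)):
        t = run_time_seconds(work, hz)
        e = task_energy_joules(work, hz, k)
        print(f"{label}: {t:.1f} s busy, {e:.2f} J")
```

Under these assumptions the faster core draws more power for a shorter time and the slower core draws less power for longer, so the two energies coincide. If the supply voltage were also lowered with frequency, the linear-power assumption would no longer hold and the sketch would not apply as written.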
Figure 4: Asynchronous architecture depicting CPU cores, L1 and L2 caches, and SCU. (A) represents data transfer from L2 to L1, and (B) represents inter-CPU data transfer
The second source of performance degradation is low throughput when transferring data between CPU cores, illustrated by arrow B in Figure 4. When there is an L1 cache miss on any CPU core (say, CPU0), data can be transferred from the L1 cache of the other CPU core. If the source core is operating at a significantly lower clock rate than the destination core, this results in severe performance degradation, since the destination core has to wait additional clock cycles to receive data from the source core, even though the destination core is capable of operating at a much higher clock rate (see Figure 5). In a synchronous architecture, by contrast, every CPU core runs at the same clock frequency, so this situation cannot arise (see Figure 6).
Figure 5: Inter-core data transfer cycle in an asynchronous architecture for a CPU task load ratio of 5:1
Figure 6: Inter-core data transfer cycle in a synchronous architecture for a CPU task load ratio of 5:1
The discussion so far shows that the asynchronous architecture may not be the most efficient under all operating conditions. The following paragraphs highlight the key benefits of the big.LITTLE architecture, and the advantages it brings to Samsung's Exynos line of processors.
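The inter-core stall described above can be sketched with a simple model in Python. The model assumes that the source core supplies one word per source clock cycle and ignores any clock-domain synchronizer latency. It also treats the 5:1 ratio of Figures 5 and 6 as a clock ratio, which is an interpretation made for illustration. The function name and the eight-word transfer size are hypothetical.

```python
def destination_cycles_for_transfer(words: int, src_hz: float, dst_hz: float,
                                    words_per_src_cycle: float = 1.0) -> float:
    """Destination-core cycles that elapse while `words` arrive from the source.

    Assumes the transfer is paced by the source clock and that crossing
    between clock domains adds no extra latency.
    """
    transfer_seconds = words / (words_per_src_cycle * src_hz)
    return transfer_seconds * dst_hz


if __name__ == "__main__":
    words = 8  # hypothetical cache-line size, in words
    asynchronous = destination_cycles_for_transfer(words, src_hz=2e8, dst_hz=1e9)
    synchronous = destination_cycles_for_transfer(words, src_hz=1e9, dst_hz=1e9)
    print(f"asynchronous (200 MHz source, 1 GHz destination): {asynchronous:.0f} cycles")
    print(f"synchronous  (1 GHz source, 1 GHz destination):   {synchronous:.0f} cycles")
```

In this model the fast destination core spends five of its own cycles on every word delivered by the slow source, whereas with matched clocks it spends one.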
The mW/DMIPS metric is commonly used to indicate the power-performance advantage that processors based on the big.LITTLE architecture stand to gain. Using the Cortex-A7 core in this architecture results in a 3.3x reduction in power consumption compared to the Cortex-A15 core. Figure 7 compares the energy efficiencies and power-performance mappings of the asynchronous and big.LITTLE architectures.
Figure 7: Comparison of energy efficiencies of the asynchronous and big.LITTLE architectures
The graph in the figure highlights two distinct sections (A to B and B to C) for the performance of both the asynchronous and big.LITTLE architectures. For applications such as text editing, which require relatively low CPU performance, big.LITTLE proves to be superior to the plain asynchronous architecture, since the Cortex-A7 core in the big.LITTLE architecture handles such applications in the most power-efficient manner. Likewise, when the demand for performance rises beyond a certain threshold (as in 3D gaming), the Cortex-A15 core of the big.LITTLE architecture takes over, and provides significantly higher performance than the asynchronous architecture. By and large, for a majority of commonly used applications on the mobile platform, which include messaging, web browsing, and multimedia playback, the big.LITTLE architecture proves to be far superior to the asynchronous architecture.
It is to be noted that the applications run most frequently on a mobile computing device are those that require fewer computing resources. As stated above, such applications include editing text files and playing audio files. On the other hand, applications such as 3D gaming and high-definition video playback are seldom used on mobile devices. These usage patterns are used to deduce an important characteristic of the device: the net energy consumption.
Energy is the power consumed summed over a particular duration of time, and one of the chief design aspects of the big.LITTLE architecture takes into account the fact that frequent use of low-energy applications proves to be a much smaller drain on the device's power source (the battery) than occasional use of high-energy applications. The big.LITTLE architecture thus possesses a significant edge in providing longer battery life for a majority of mobile computing and communications devices in use.
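The usage-mix argument above can be made concrete with a small Python sketch that averages power over the time spent in light and heavy workloads. Only the 3.3x power reduction comes from the text. The 80% share of light use and the 1 W big-core power are hypothetical values, and the model ignores the fact that the LITTLE core may need more time than the big core for the same work.

```python
A7_POWER_REDUCTION = 3.3  # Cortex-A7 versus Cortex-A15, as quoted in the text


def average_power_w(light_fraction: float, big_power_w: float,
                    reduction: float = A7_POWER_REDUCTION) -> float:
    """Time-weighted average CPU power for a given usage mix.

    light_fraction is the share of active time spent in light workloads,
    which are assumed to run on the LITTLE core at big_power_w / reduction.
    The remaining time is assumed to run on the big core at big_power_w.
    """
    if not 0.0 <= light_fraction <= 1.0:
        raise ValueError("light_fraction must be between 0 and 1")
    little_power_w = big_power_w / reduction
    return light_fraction * little_power_w + (1.0 - light_fraction) * big_power_w


if __name__ == "__main__":
    big_only = average_power_w(0.0, big_power_w=1.0)  # everything on the big core
    mixed = average_power_w(0.8, big_power_w=1.0)     # hypothetical 80% light use
    print(f"big core only: {big_only:.2f} W, big.LITTLE mix: {mixed:.2f} W")
```

Because energy is average power multiplied by usage time, a lower time-weighted average power translates directly into a longer time on the same battery charge under these assumptions.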
4. Conclusion
This paper discussed the need for a different architecture for processors used in the mobile computing platform, and introduced readers to Samsung's power-efficient Exynos line of processors based on the big.LITTLE architecture from ARM Limited. The paper presented an overview of the big.LITTLE architecture and its key benefits in providing superior performance at reduced power consumption. It also compared the big.LITTLE architecture with a competing architecture, the asynchronous clock architecture, and demonstrated the superiority of the former over the latter, with detailed analyses of how and when performance degradation occurs in the asynchronous architecture. The paper then compared some of the key power versus performance metrics of the two architectures, and showed why the big.LITTLE architecture is superior when it comes to providing high levels of power efficiency at sustained levels of performance.
References
[1] Peter Greenhalgh, big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7 (Improving Energy Efficiency in High-Performance Mobile Platforms), White Paper released by ARM, Sep
App coverage. ericsson White paper Uen 284 23-3212 Rev B August 2015
ericsson White paper Uen 284 23-3212 Rev B August 2015 App coverage effectively relating network performance to user experience Mobile broadband networks, smart devices and apps bring significant benefits
OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING
OBJECTIVE ANALYSIS WHITE PAPER MATCH ATCHING FLASH TO THE PROCESSOR Why Multithreading Requires Parallelized Flash T he computing community is at an important juncture: flash memory is now generally accepted
PC computer configurations & Windows optimizations (Updated November 2012)
PC computer configurations & Windows optimizations (Updated November 2012) A fast processor and a good amount of memory are important, but do not necessarily guarantee that a computer will perform well
Intel Atom Processor Tristan Greenidge CPTR 350 Introduction Intel Atom is Intel s line for ultra-low-voltage processors. Atoms are used in netbooks, nettops, embedded applications ranging from health
IBM Lotus Notes and Lotus inotes 8.5.2 on Citrix XenApp 4.5/5.0: A scalability analysis
IBM Lotus Notes and Lotus inotes 8.5.2 on Citrix XenApp 4.5/5.0: A scalability analysis Gary Denner IBM Software Group IBM Collaboration Solutions Technical Lead - Lotus Domino SVT Mulhuddart, Ireland
WD Storage Solutions
WD Storage Solutions Your digital life is unique. That s why WD offers a full range of internal hard drives. This handy guide will help you find the perfect drive based on where and how you want to use
LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
Intel s SL Enhanced Intel486(TM) Microprocessor Family
Intel s SL Enhanced Intel486(TM) Microprocessor Family June 1993 Intel's SL Enhanced Intel486 Microprocessor Family Technical Backgrounder Intel's SL Enhanced Intel486 Microprocessor Family With the announcement
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Communicating with devices
Introduction to I/O Where does the data for our CPU and memory come from or go to? Computers communicate with the outside world via I/O devices. Input devices supply computers with data to operate on.
Broadcom 10GbE High-Performance Adapters for Dell PowerEdge 12th Generation Servers
White Paper Broadcom 10GbE High-Performance Adapters for Dell PowerEdge 12th As the deployment of bandwidth-intensive applications such as public and private cloud computing continues to increase, IT administrators
The IntelliMagic White Paper: Green Storage: Reduce Power not Performance. December 2010
The IntelliMagic White Paper: Green Storage: Reduce Power not Performance December 2010 Summary: This white paper provides techniques to configure the disk drives in your storage system such that they
How To Create An Ad Hoc Cloud On A Cell Phone
Ad Hoc Cloud Computing using Mobile Devices Gonzalo Huerta-Canepa and Dongman Lee KAIST MCS Workshop @ MobiSys 2010 Agenda Smart Phones are not just phones Desire versus reality Why using mobile devices
QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE
QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture
EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Celerra Unified Storage Platforms Using iscsi
EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Applied Technology Abstract Microsoft SQL Server includes a powerful capability to protect active databases by using either
AdminToys Suite. Installation & Setup Guide
AdminToys Suite Installation & Setup Guide Copyright 2008-2009 Lovelysoft. All Rights Reserved. Information in this document is subject to change without prior notice. Certain names of program products
Resource Utilization of Middleware Components in Embedded Systems
Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system
Towards Energy Efficient Query Processing in Database Management System
Towards Energy Efficient Query Processing in Database Management System Report by: Ajaz Shaik, Ervina Cergani Abstract Rising concerns about the amount of energy consumed by the data centers, several computer
We Deliver the Future of Television The benefits of off-the-shelf hardware and virtualization for OTT video delivery
We Deliver the Future of Television The benefits of off-the-shelf hardware and virtualization for OTT video delivery istockphoto.com Introduction Over the last few years, the television world has gone
Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers
WHITE PAPER FUJITSU PRIMERGY AND PRIMEPOWER SERVERS Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers CHALLENGE Replace a Fujitsu PRIMEPOWER 2500 partition with a lower cost solution that
Interconnection Networks
Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode
McAfee Enterprise Mobility Management Versus Microsoft Exchange ActiveSync
McAfee Enterprise Mobility Management Versus Microsoft Secure, easy, and scalable mobile device management Table of Contents What Can Do? 3 The smartphone revolution is sweeping the enterprise 3 Can enterprises
PC Solutions That Mean Business
PC Solutions That Mean Business Desktop and notebook PCs for small business Powered by the Intel Core 2 Duo Processor The Next Big Thing in Business PCs The Features and Performance to Drive Business Success
Mobile Processors: Future Trends
Mobile Processors: Future Trends Mário André Pinto Ferreira de Araújo Departamento de Informática, Universidade do Minho 4710-057 Braga, Portugal [email protected] Abstract. Mobile devices, such as handhelds,
