Altera SDK for OpenCL v14.1. February 6, 2015
1 Altera SDK for OpenCL v14.1 February 6, 2015
2 Industry Challenges
- A variety of applications are becoming bottlenecked by scalable performance requirements
  - E.g. object detection and recognition, image tracking and processing, cryptography, cloud, search engines, deep packet inspection, etc.
- Overloading CPUs' capabilities
  - Frequencies are capped
  - Processors keep adding more cores
  - Need to coordinate all the cores and manage data
- Product life cycles are long
  - GPU lifespan is short
  - Require re-optimization and regression testing between generations
  - Support agreements for GPUs are costly
- Power dissipation of CPUs and GPUs limits system size
- Maintaining coherency throughout a scalable system
3 OpenCL and FPGAs Address These Challenges
- Power-efficient acceleration
  - Typically 1/5 the power of a GPU and orders of magnitude more performance per watt than a CPU
- FPGA lifecycle of over 15 years
  - GPU lifespan is short
  - Require re-optimization and testing between generations
  - FPGA OpenCL code is retargeted to future devices without modification
- Our OpenCL flow abstracts away the FPGA hardware flow
  - Puts the FPGA into software engineers' hands
- Our OpenCL SDK allows for streaming IO channels and kernel channels
  - Data movement without host involvement
  - Low-latency data transmission to the accelerator
- Shared virtual memory
  - IBM CAPI and Intel QPI
4 Efficiency via Specialization
[Figure: GPUs, FPGAs and ASICs compared. Source: Bob Brodersen, Berkeley Wireless group]
5 Application Development Paradigm
[Figure: developer populations: ASIC, FPGA programmers, parallel programmers, standard CPU programmers]
- OpenCL expands the number of application developers
6 More SW Engineering Resources than HW?
- 1000:1 software engineers to FPGA designers
- Software engineers are not used to long compile times
- OpenCL solves this!
- Our OpenCL flow abstracts away the FPGA hardware flow, bringing the FPGA to low-level software programmers
  - Software developers write, optimize and debug in their familiar software environment
  - Quartus is run behind the scenes
- Emulator and profiler are software development tools
  - Pushing long compile times to the end
- OpenCL optimization doesn't require a board
  - Allowing SW to drive board requirements (.xml file)
7 OpenCL on FPGAs Fits Into All Markets
Data processing algorithms in:
- Automotive/Industrial (pedestrian detection, motion estimation)
- Military/Government (crypto, image detection)
- Networking (DPI, SDN, NFV)
- Computer & Storage (HPC, financial, data compression)
- Broadcast, Consumer (video image processing)
- Medical (diagnostic image processing, bioinformatics)
8 OpenCL and FPGA Acceleration in the News
- IBM and Altera Collaborate on OpenCL: "IBM's collaboration with Altera on OpenCL and support of the IBM Power architecture with the Altera SDK for OpenCL can bring more innovation to address Big Data and cloud computing challenges," said Tom Rosamilia, senior vice president, IBM Systems.
- Intel Reveals FPGA and Xeon in One Socket: "That allows end users that have applications that can benefit from acceleration to load their IP and accelerate that algorithm on that FPGA as an offload," explained the vice president of Intel's data center group, Diane Bryant.
- Search Engine Gets Help From FPGA: "Altera was really interested in helping with the development; the resources they were willing to throw our way were more significant than those from Xilinx." (Microsoft engineering manager)
- Baidu and Altera Demonstrate Faster Image Classification: Altera Corp. and Baidu, China's largest online search engine, are collaborating on using FPGAs and convolutional neural network (CNN) algorithms for deep learning applications.
- Xilinx Announces SDAccel Development Environment for OpenCL: delivering up to 25X better performance/watt to the data center.
9 What is OpenCL?
- A software programming model for software engineers and a software methodology for system architects
  - First industry standard for heterogeneous computing
- Provides increased performance with hardware acceleration
  - Low-level programming language
  - Based on ANSI C99
- Open, royalty-free standard
  - Managed by Khronos Group
  - Altera is an active member
  - Conformance requirements
    - V1.0 is current reference
    - V2.0 is current release
[Figure: host (C/C++ API) and accelerator (OpenCL C)]
10 Why Does OpenCL Exist?
[Figure: programmability versus performance. Labels: CPU (single-core with C/C++, multi-core with AVX/OpenMP), GPGPU (driver, stream API, CUDA), PCIe accelerator; OpenCL shown as the heterogeneous programming architecture and programming language]
11 Programming Language Offerings
Targets, in column order: GPU / multi-core CPU; DSP / embedded; FPGA
- Scope: system (heterogeneous platform); device; IP block
- Designer: programmer; embedded programmer; hardware designer
- Design flow: CUDA/OpenCL; Code Composer Studio (TI C); Quartus II (Verilog/VHDL)
- Design activity: task parallelism, data parallelism; real-time function acceleration; HLS IP design and integration
- Design constraints: throughput/latency, power efficiency; real-time execution, cost; clock frequency, resource utilization, interface requirements, power
- Hardware knowledge: none (coding style guidelines); limited (macro-architecture, bandwidth level); yes (protocol level, timing closure, micro-architecture)
- Today; Today; PoC
12 HLS vs OpenCL Positioning
OpenCL:
- Targets CPUs, GPUs and FPGAs
- Target user is a software developer
- Implements the FPGA in a software development flow
- Performance is determined by the resources allocated
- Host required
HLS:
- Targets FPGAs
- Target user is an FPGA designer
- Implements the FPGA in a traditional FPGA development flow
- Performance is defined, and the amount of resource needed to achieve it is reported
- Host not required
13 Altera SDK for OpenCL Competitive Differentiator
- Altera's SDK for OpenCL has proven to be a powerful solution for many vendors
  - Won the design tool and development software Elektra award in Europe
  - Won Ultimate Product of the Year for
- Actively being used today: "I was extremely happy to get a great performance with such low effort. I was so impressed with how powerful the Altera tool was!" (Senior Engineer, Altera OpenCL customer)
14 First Conformant OpenCL Solution for FPGAs!
- OpenCL v1.0 specification
- >8500 programs tested
- Supports ARM host: CV and AV SoC
15 Heterogeneous Platform Model
[Figure: OpenCL platform model. A host with host memory connects to a (compute) device built from compute units and processing elements, with global memory. Example platform: x86 host, PCIe]
16 Heterogeneous Platform Model
[Figure: OpenCL platform model with one host, host memory and multiple devices with global memory. Example platform: x86 host, PCIe]
17 OpenCL Use Model: Abstracting the FPGA Away
Host code (standard gcc compiler, producing an EXE that runs on the host):
    main() {
        read_data( ... );
        manipulate( ... );
        clEnqueueWriteBuffer( ... );
        clEnqueueNDRangeKernel( ..., sum, ... );
        clEnqueueReadBuffer( ... );
        display_result( ... );
    }
OpenCL accelerator code (Altera Offline Compiler, through Verilog and Quartus II, producing an AOCX that runs on the accelerator):
    kernel void sum ( global float *a, global float *b, global float *y) {
        int gid = get_global_id(0);
        y[gid] = a[gid] + b[gid];
    }
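To make the NDRange idea on this slide concrete, here is a minimal plain-C sketch of what launching the `sum` kernel means: the kernel body runs once per global ID. It is only an emulation under stated assumptions; `sum_work_item` and `run_sum_ndrange` are hypothetical helper names, not part of the OpenCL API or the Altera SDK, and a real host program would go through the `clEnqueue...` calls shown above.

```c
#include <stddef.h>

/* Body of the slide's `sum` kernel for a single work-item.
   `gid` stands in for get_global_id(0), which does not exist in plain C. */
static void sum_work_item(size_t gid, const float *a, const float *b, float *y)
{
    y[gid] = a[gid] + b[gid];
}

/* Hypothetical helper: emulates a 1-D NDRange launch by visiting every
   global ID in turn. On an FPGA the work-items would instead flow through
   a hardware pipeline. */
void run_sum_ndrange(size_t global_size, const float *a, const float *b, float *y)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        sum_work_item(gid, a, b, y);
    }
}
```

Calling `run_sum_ndrange(n, a, b, y)` fills `y` with the element-wise sums, which is the result the host would read back with `clEnqueueReadBuffer`.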
18 OpenCL Programming Model
[Figure: host.c and opencl.h are compiled with gcc; device.cl is compiled with aoc. Host-side objects: driver, platform, context, device, queue, program, kernel, buffer. Stages: acquire, compute (launch), visualize]
19 The Only Custom Accelerator Solution: Platforms
[Figure: FPGA platform block diagram]
- OpenCL domain, built with the Altera OpenCL Compiler: kernel IP blocks and interconnect
- IO infrastructure, a prebuilt BSP made with standard HDL tools by an FPGA developer:
  - DDR3 memory interfaces (DDR)
  - QDRII memory interfaces (QDR)
  - 10Gb MAC/UOE data interfaces (10G network)
  - PCIe gen2x8 host interface (host)
20 Altera Reference Platforms
Network-enabled platform:
- Requirement: network enabled, low latency
- Architecture: OpenCL API, HAL, UMD, KMD; Stratix V FPGA with DMA, PCIe, DDR3 (OpenCL kernels), 10G UDP ports, CPLD bridge, CPLD, FLASH
- Global memory: DDR and QDRII+
- IO channels: 2x10GbE (MAC/UOE)
- Reference designs: OPRA (streaming), trading (with global memory access)
High Performance Computing (HPC) platform:
- Requirement: compute power / memory bandwidth
- Architecture: OpenCL API, HAL, UMD, KMD; Stratix V FPGA with DMA, PCIe, DDR3 (OpenCL kernels)
- Global memory: large amount of DDR
- IO channels: none (minimize IP overhead)
- Reference design: option pricing
21 SoC Reference Platforms
- HPS block removes the complexities of BSP creation
- Coherency between host and accelerator
[Figure: HPS with DDR3 connected to the FPGA (labelled Stratix V FPGA) through the H2F/F2H, LWH2F and F2S bridges; CSR 32-bit, 50 MHz; FPGA memory, OpenCL kernels, DVI, DVO, scratch DDR3, camera, monitor]
- OpenCL Platforms page contains the CV SoC devkit platform user's guide
22 Altera Network Enabled Reference Platform for OpenCL
[Figure: software layer (host.c through the C/C++ API, device.cl in OpenCL C, compiler, reference design) over hardware layer (reference platform: host and device)]
- Host: bit RHEL 6.4, Windows 7
- Reference board: s5_hft (S5PH-Q)
23 Guaranteed Timing Flow
- Inputs: kernel.cl, Boardspec.xml
- AOC uses a post-fit QXP partition (PCIe, UniPHY, DMA, ...)
- Synthesis / P&R / STA on the OpenCL kernels ONLY
- Meet timing?
  - Yes: DONE!
  - No: reconfigure the kernel PLL and re-run STA with the new PLL value
24 Heterogeneous Memory Support
[Figure: host with host memory and host IO, connected through an interface to a device with compute units (CU), Global Memory1, Global Memory2 and IO]
- Memories with different characteristics
  - DDR: sequential access
  - QDR: random access
  - On-chip: low latency
  - MoSys: efficient
  - HMC: high capacity
- Combine different memories
  - Attribute-based
  - Automatic
Example:
    kernel void foo( global uint *data __attribute__((buffer_location("QDR"))) ) {
        foo(data[i]);
    }
25 Channels Advantage
[Figure: two platform block diagrams, standard OpenCL versus the Altera vendor extension with IO and kernel channels. Each shows DDR3 and QDRII interfaces, 10Gb network interfaces, host interface, CvP update, OpenCL kernels and interconnect]
26 Kernel Development Flow
- Modify kernel.cl
- Tools, fastest to slowest: x86 emulator (sec), optimization report (min), prototype (min), profiler (hours)
- Checks along the way: Functional bugs? Stall-free pipeline? Memory coalesced? Hardware performance met?
- DONE!
27 x86 Emulator (Beta, v14.1)
- Enables functional debug of kernel code on an x86 system
- Prototype support to allow users to run kernels on an x86 platform
- Debug support for Altera vendor-specific features such as channels
- Supports OpenCL syntax, channels, printf
Example:
    kernel void accel( ) {
        gid = get_global_id(0);
        out[gid] = proc(data[gid]);
    }
[Figure: the x86 kernel compiler produces ./kernel_tb, shown running]
28 Example: Load to Store Dependency
    kernel void prefixsum( global int* restrict A, unsigned N ) {
        for ( unsigned i = 1 ; i < N ; i++ ) {
            int a = A[i-1];
            A[i] += a;
        }
    }

    ==============================================================
    *** Optimization Report ***
    ==============================================================
    Kernel: prefixsum                                       Ln.Col
    ==============================================================
    Loop for.body                                           2.25
      Pipelined execution inferred.
      Successive iterations launched every 321 cycles due to:
        Memory dependency on Load Operation from:           3.21
          Store Operation                                   4.7
      Largest Critical Path Contributors:
        49%: Load Operation
          %: Store Operation                                4.7
    ==============================================================
Slide annotations:
- 321 cycles: relative cost of global memory to local computation
- True fix requires restructuring the code
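The report on this slide says the true fix requires restructuring the code. One possible restructuring, sketched here in plain C rather than OpenCL C, is to carry the running total in a local variable so that no iteration has to reload a value the previous iteration just stored. This is an illustrative assumption, not the fix shown in the deck; both function names are hypothetical.

```c
/* Direct C transcription of the slide's loop: every iteration reloads
   A[i-1], which the previous iteration has only just stored. */
void prefixsum_reload(int *A, unsigned N)
{
    for (unsigned i = 1; i < N; i++) {
        int a = A[i - 1];
        A[i] += a;
    }
}

/* Restructured sketch: the running total lives in a local variable, so the
   loop-carried dependency is on a register-like value instead of on a
   store followed by a load from memory. */
void prefixsum_carried(int *A, unsigned N)
{
    if (N == 0) {
        return;
    }
    int running = A[0];
    for (unsigned i = 1; i < N; i++) {
        running += A[i];
        A[i] = running;
    }
}
```

Both functions produce the same prefix sums; the second simply removes the load of `A[i-1]` from the dependency chain between iterations.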
29 Example: Accumulating a Value
    kernel void test( global float* restrict input,
                      global float* restrict output,
                      unsigned N ) {
        float mul = 1.0f;
        for ( unsigned i = 0; i < N; i++ ) {
            mul *= input[ i ];
        }
        *output = mul;
    }

    ==============================================================
    *** Optimization Report ***
    ==============================================================
    Kernel: test                                            Ln.Col
    ==============================================================
    Loop for.body                                           5.24
      Pipelined execution inferred.
      Successive iterations launched every 3 cycles due to:
        Data dependency on variable mul                     4.10
      Largest Critical Path Contributor:
        100%: Fmul Operation                                6.7
    ==============================================================
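The report attributes the 3-cycle launch interval to the data dependency on `mul`. A common way to relax such a dependency is to spread the accumulation over several independent partial products and combine them at the end. The plain-C sketch below illustrates that idea; `LANES` and both function names are assumptions made for illustration, and the deck itself does not show this rewrite. Note that reordering floating-point multiplications can change rounding in general.

```c
#define LANES 4 /* assumed to be at least the multiplier latency in cycles */

/* Serial accumulation, as in the slide: every multiply waits for the
   previous one to finish. */
float product_serial(const float *input, unsigned n)
{
    float mul = 1.0f;
    for (unsigned i = 0; i < n; i++) {
        mul *= input[i];
    }
    return mul;
}

/* Interleaved accumulation: consecutive iterations update different
   partial products, so each one depends on a value computed LANES
   iterations earlier rather than on the immediately preceding multiply. */
float product_interleaved(const float *input, unsigned n)
{
    float partial[LANES];
    for (unsigned k = 0; k < LANES; k++) {
        partial[k] = 1.0f;
    }
    for (unsigned i = 0; i < n; i++) {
        partial[i % LANES] *= input[i];
    }
    float mul = 1.0f;
    for (unsigned k = 0; k < LANES; k++) {
        mul *= partial[k];
    }
    return mul;
}
```

With inputs that are exact powers of two, both versions return identical results; with arbitrary inputs the two may differ in the last bits because the multiplication order differs.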
30 Rapid Prototyping (Beta, v14.1)
- Increases productivity during application development
- Uses a library of pre-compiled templates to skip Quartus II compilation
- Can test small versions of the final design on hardware very quickly
- Ability to generate custom templates based on user kernels
  - Tailors the Rapid Prototyping Template Library to the user
[Figure: user program -> OpenCL Compiler (aoc) -> Quartus II (~hours) -> HW implementation, versus OpenCL Compiler (aoc march=prototype) -> configuration from the Template Library (~minutes)]
31 Profiler (Beta, v14.1)
- Instruments the pipeline with performance counters and profiling logic
- Transfers the profiling information to the host via the PCIe link
Example kernel:
    kernel void accel( ) {
        gid = get_global_id(0);
        out[gid] = a[gid]+b[gid];
    }
[Figure: kernel pipeline (Load, Load, +, Store) with memory-mapped registers]
32 Profiler (Beta, v14.1)
- Bottlenecks, bandwidth, saturation, pipeline occupancy
33 OpenCL Host Library & Run Time Environment (RTE)
Host library improvements:
- Lower CPU usage
- Improved scalability
- Lower memory footprint
- Faster run time
SDK & Run Time Environment:
    OS                      SDK (needs ACDS)    RTE
    Windows x86-64          Installer           Installer
    Linux (RHEL) x86-64     Installer, RPM      Installer, RPM
    Linux (RHEL) Power      -                   RPM
    Linux (custom) CV SoC   -                   Tarball
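34 Installable Client Driver (Beta, v14.1)
[Figure: host.c (opencl.h) calls clGetPlatformIDs; the ICD dispatches to vendor libraries such as nvidiaopencl and AlteraOpenCL; stages acquire, compute, visualize; device.cl]
- Windows: HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors, with <library>.dll as a DWORD value
- Linux: /etc/OpenCL/vendors/<vendor>.icd, containing <library>.so
35 Altera Client Driver (Beta, v14.1)
[Figure: host.c (opencl.h) calls clGetPlatformIDs, dispatched by the ICD to nvidiaopencl or AlteraOpenCL; clGetDeviceIDs is then dispatched by the ACD below AlteraOpenCL; stages acquire, compute, visualize; device.cl]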
36 OpenCL + FPGA Key Benefits
- Faster development vs. traditional FPGA design flow
  - Puts the FPGA in the software developer's hands
  - Familiar C-based development flow
- Higher performance/watt vs. CPU/GPGPU
  - Implement exactly what you need
  - Pipeline parallel structures
  - Custom interconnect converging with data processing cores
- Lower power vs. CPU/GPGPU
  - Core frequency lower: MHz vs 1GHz
  - Turn off unused logic
  - Up to 1/5 the power
- Portability & obsolescence free
  - Code can transfer between different HW accelerators (CPU, GPGPU, FPGA, etc.)
  - Code ports seamlessly to new generations of the FPGA
  - FPGA life cycle considerably longer than CPUs or GPGPUs
37 Additional Resources
38 Altera SDK for OpenCL Design Flow
Set up: Getting Started Guide (document)
- Install Quartus II v13.1 with Altera SDK for OpenCL
- Install C compiler or development environment
- Obtain and set up the license from the Self Service Licensing Center
- Install the FPGA (OpenCL) board: aocl install
Design: Programming Guide (document)
- Develop kernel code and compile on CPU/GPU for functional correctness
- Build, compile & link the host application (Visual Studio/GCC)
- Compile the OpenCL kernel with the Altera Offline Compiler (aoc)
- Run the application
Optimize: Best Practices (document)
- Optimize kernel for FPGA hardware
39 Additional Altera OpenCL Collateral
- White papers on OpenCL
- OpenCL online demos
- OpenCL design examples
- Instructor-led training
  - Parallel Computing with OpenCL Workshop by Altera (1 day)
  - Optimization of OpenCL for Altera FPGAs Training by Altera (1 day)
- Online training
  - Introduction to Parallel Computing with OpenCL
  - Writing OpenCL Programs for Altera FPGAs
  - Running OpenCL on Altera FPGAs
  - Single-Threaded vs. Multi-Threaded Kernels
  - Building Custom Platforms for Altera SDK for OpenCL
- OpenCL board partners page
40 Application Benchmarking
41 Case Study: GZIP Compression
- OpenCL was:
  - 10% slower
  - 12% more resources
  - 3x faster development time
- Altera summer intern ported and optimized the GZIP algorithm in a little more than a month
- An industry-leading company's FPGA engineer coded it in Verilog in 3 months
- Much lower design effort and design time
42 Conclusions
Results: CHREC/Univ. of Florida; apps: Sobel, Canny, & SURF
- OpenCL vs. VHDL productivity table: VHDL development time 6 months; OpenCL development time 1 month
[Table: OpenCL vs. VHDL performance, frames/sec and max freq. for Sobel, Canny and SURF; VHDL performance on Stratix 4 and predicted on Stratix 5, OpenCL performance on Stratix 5]
Productivity:
- Avoid productivity challenges of HDL
  - 6x increase in productivity
- OpenCL offers a familiar C environment
Performance:
- Develop fully pipelined kernels
- Minimum performance cost
  - < 10% overhead
43 Case Study: Image Classification
- Deep learning algorithm
  - Convolutional neural network
  - Based on Hinton's CNN
- Early results on Stratix V
  - 2X perf./power vs. GPGPU despite soft floating point
  - 8+ simultaneous kernels vs. 2 on GPGPU
  - Exploiting OpenCL channels between kernels
- A10 expectations
  - Hard floating point
  - Better density and frequency
  - ~4X performance/watt vs. SV
- The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
[Figure: 10 random images from each class; Hinton's CNN algorithm]
44 AES Encryption
- Encryption/decryption
- 256-bit key
- Counter (CTR) method
- Advantage FPGA
  - Integer arithmetic
  - Coarse-grain bit operations
  - Complex decision making
Results
[Table: power (W), performance (GB/s) and efficiency (MB/s/W) by platform: E5503 Xeon Processor (single core, est.), AMD Radeon HD 7970 (est.), PCIe385 A7 Accelerator]
45 Multi-Asset Barrier Option Pricing
- Monte-Carlo simulation
  - No closed-form solution possible
  - High-quality random number generator required
  - Billions of simulations required
- Used GPU vendor's example code
- Advantage FPGA
  - Complex control flow
- Optimizations
  - Channels, loop pipelining
Results
[Table: power (W), performance (Bsims/s) and efficiency (Msims/s/W) by platform: W3690 Xeon Processor, nvidia Kepler, Bittware S5-PCIe-HQ]
46 Document Filtering
- Unstructured data analytics
  - Bloom filter
- Advantage FPGA
  - Integer arithmetic
  - Flexible memory configuration
Results
[Table: power (W), performance (MTs) and efficiency (MTs/W) by platform: W3690 Xeon Processor, nvidia Tesla C, PCIe385 A7 Accelerator]
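The slide names the Bloom filter as the core of the document-filtering benchmark and cites integer arithmetic as the FPGA advantage. For readers unfamiliar with the structure, here is a minimal plain-C sketch. The filter size, the number of hashes and the seeded FNV-1a hash are illustrative assumptions, not details of the benchmark, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   1024u /* assumed filter size in bits */
#define BLOOM_HASHES 3u    /* assumed number of hash functions */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* FNV-1a string hash, seeded so one routine yields several hash functions. */
static uint32_t fnv1a(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (; *s != '\0'; ++s) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

void bloom_init(bloom_t *f)
{
    memset(f->bits, 0, sizeof f->bits);
}

/* Set one bit per hash function for the given term. */
void bloom_add(bloom_t *f, const char *term)
{
    for (uint32_t k = 0; k < BLOOM_HASHES; ++k) {
        uint32_t bit = fnv1a(term, k) % BLOOM_BITS;
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 1 if the term may be present (false positives are possible),
   0 if it is definitely absent. */
int bloom_maybe_contains(const bloom_t *f, const char *term)
{
    for (uint32_t k = 0; k < BLOOM_HASHES; ++k) {
        uint32_t bit = fnv1a(term, k) % BLOOM_BITS;
        if ((f->bits[bit / 8] & (1u << (bit % 8))) == 0) {
            return 0;
        }
    }
    return 1;
}
```

Every operation is an integer hash, a modulo and a bit test, which is the kind of work the slide lists under integer arithmetic and flexible memory configuration.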
47 Consumer (Japan) Image Processing
- Adaptive weighted images
[Equation: pixel value p_xy computed as a weighted combination of neighbouring pixels with coefficients c and distances d]
- Advantage FPGA
  - Integer arithmetic
Results
[Table: power (W), performance (FPS) and efficiency (FPS/W) by platform: W3565 Xeon Processor (est.), nvidia Quadro 4000 (est.), PCIe385 A7 Accelerator]
48 Smith-Waterman
- Sequence alignment
- Scoring matrix
- Advantage FPGA
  - Integer arithmetic
  - SMT streaming
Results
[Table: power (W), performance (MCUPS) and efficiency (MCUPS/W) by platform: W3565 Xeon Processor, nvidia K, PCIe385 A7 Accelerator]
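The scoring matrix mentioned on this slide can be illustrated with a short plain-C sketch of the Smith-Waterman recurrence using a linear gap penalty. The scores (match +2, mismatch -1, gap -1) and the function name `sw_score` are assumptions chosen for illustration; the benchmark's actual parameters are not given in the deck.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the best local-alignment score of strings a and b,
   or -1 if memory allocation fails. Only two rows of the scoring
   matrix are kept, since each cell depends on its upper, left and
   upper-left neighbours. */
int sw_score(const char *a, const char *b)
{
    const int match = 2, mismatch = -1, gap = -1; /* assumed scores */
    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev);
    int *curr = calloc(m + 1, sizeof *curr);
    int best = 0;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }
    for (size_t i = 1; i <= n; ++i) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up = prev[j] + gap;
            int left = curr[j - 1] + gap;
            int h = diag > up ? diag : up;
            if (left > h) h = left;
            if (h < 0) h = 0; /* local alignment never goes negative */
            curr[j] = h;
            if (h > best) best = h;
        }
        int *tmp = prev; /* reuse the two rows alternately */
        prev = curr;
        curr = tmp;
    }
    free(prev);
    free(curr);
    return best;
}
```

Each cell update is one step of the kind counted by the CUPS (cell updates per second) metric, and it uses only integer additions and comparisons.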
49 Multi Function Printer Image Processing
- RGB output of raster scanner converted to CMYK colorants for printing
- Advantage FPGA
  - SoC solution
  - IO and kernel channels
  - Heterogeneous memory accesses
- Goal: 50PPM at A4/letter size
- Results
  - >40X improvement over C-based algorithm on ARM only
  - No NEON coprocessor used
  - C6 speed grade part improved 20% to 128PPM
50 Suricata: IDS/IPS Implementation (Cybersecurity)
[Figure: ingress network path: 2x 10 Gbps ETH IO, STD IDS packet processing and analysis, DPI packet processing and analysis, traffic control and IPS packet manipulation, ETH IO 2x 10 Gbps; STD rules memory and DPI rules memory (QDR or DDR); IDS/IPS management; mirrored for the egress network path]
- Decoder Kernel (autorun)
  - Stream in encoded packets (aoclreadchannel)
  - Unpack single streams
  - Stream out decoded packets (aoclwritechannel)
- Packet Analysis Kernel - IDS (task)
  - Stream in decoded packets and store in local memory (aoclreadchannel)
  - Parallel regex with STD rules in global memory (heterogeneous memory support)
  - Write results to global memory
  - Stream out decoded packets (aoclwritechannel)
- Packet Analysis Kernel - DPI (task)
  - Stream in decoded packets and store in local memory (aoclreadchannel)
  - Parallel regex with DPI rules in global memory (heterogeneous memory support)
  - Write results to global memory
  - Stream out decoded packets (aoclwritechannel)
- Host IDS/IPS Management
  - Read results from global memory and log
  - Decide to modify or delete packets
- Packet Manipulation Kernel - IPS (task)
  - Stream in decoded packets (aoclreadchannel)
  - Read and process decision from the host
  - Stream out decoded packets (aoclwritechannel)
- Encoder Kernel (autorun)
  - Stream in decoded packets (aoclreadchannel)
  - Repack multiple streams
  - Stream out encoded packets (aoclwritechannel)
51 Haplotype Caller (Pair-HMM)
- Smith-Waterman-like algorithm
  - Uses hidden Markov models to compare gene sequences
- 3 stages: Assembler, Pair-HMM (70%), Traversal + Genotyping
- Floating point (SP + DP)
- C++ code starting point (from Java)
- Whole genome takes 7.6 days!
Results
[Table: runtime (ms) by platform: Java (gatk 2.8) 10,800; Intel Xeon E; nvidia Tesla K40 70; Nallatech SV-D]
52 Sobel Filter
- Fundamental image filter algorithm
  - Used commonly in industrial and automotive applications
- Sliding-window based design pattern
  - Same shift register structure, except in two dimensions
[Figure: shift register laid out in rows of WIDTH pixels (indices such as WIDTH-1, WIDTH*3, WIDTH*4-1) with window taps labelled A, B, C, E, F, G; pixels enter at one end]
53 Task Implementation and Results
Altera OpenCL:
    kernel void sobel(int iters) {
        // Coefficients
        int Gx[3][3] = {{-1,-2,-1},{0,0,0},{1,2,1}};
        int Gy[3][3] = {{-1,0,1},{-2,0,2},{-1,0,1}};
        int rows[2 * COLS + 3]; // line buffer
        int count = 0;
        while (count != iters) {
            // Shift the line buffer
            #pragma unroll
            for (int i = COLS * 2 + 2; i > 0; --i) {
                rows[i] = rows[i - 1];
            }
            rows[0] = read_channel_altera(in_channel);
            int x_dir = 0;
            int y_dir = 0;
            #pragma unroll
            for (int i = 0; i < 3; ++i) {
                #pragma unroll
                for (int j = 0; j < 3; ++j) {
                    x_dir += rows[i * COLS + j] * Gx[i][j];
                    y_dir += rows[i * COLS + j] * Gy[i][j];
                }
            }
            int edge_weight = abs(x_dir) + abs(y_dir);
            write_channel_altera(out_channel, edge_weight);
            ++count;
        }
    }
On our design example website: pport/examples/opencl/sobel-filter.html
Results:
    Device       Resolution   FPS
    Cyclone V    1080p        60
    Stratix V    1080p
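To see how the line buffer in this kernel behaves, the following plain-C sketch emulates the same loop on a host CPU. It is a sketch under stated assumptions: the channels are replaced by the hypothetical `in` and `out` arrays, `COLS` is set to a small illustrative width, and the line buffer is zero-initialized (the kernel above leaves it uninitialized). `sobel_stream` is not an Altera API.

```c
#include <stdlib.h>

#define COLS 4 /* assumed image width in pixels, kept tiny for illustration */

/* Consumes `iters` pixels from `in` in raster order and writes one edge
   weight per pixel to `out`, mirroring the loop body of the slide's kernel. */
void sobel_stream(const int *in, int *out, int iters)
{
    static const int Gx[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    static const int Gy[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    int rows[2 * COLS + 3] = {0}; /* line buffer: two full rows plus three pixels */

    for (int count = 0; count < iters; ++count) {
        /* Shift the line buffer by one pixel and insert the new one. */
        for (int i = COLS * 2 + 2; i > 0; --i) {
            rows[i] = rows[i - 1];
        }
        rows[0] = in[count];

        /* Apply both 3x3 masks to the window held in the line buffer. */
        int x_dir = 0;
        int y_dir = 0;
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) {
                x_dir += rows[i * COLS + j] * Gx[i][j];
                y_dir += rows[i * COLS + j] * Gy[i][j];
            }
        }
        out[count] = abs(x_dir) + abs(y_dir);
    }
}
```

On a constant image the output settles to zero once the buffer has filled with real pixels, because both masks sum to zero; the first outputs are nonzero only because the window still contains the initial zeros.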
Tutorial: Harnessing the Power of FPGAs using Altera s OpenCL Compiler Desh Singh, Tom Czajkowski, Andrew Ling
Tutorial: Harnessing the Power of FPGAs using Altera s OpenCL Compiler Desh Singh, Tom Czajkowski, Andrew Ling OPENCL INTRODUCTION Programmable Solutions Technology scaling favors programmability and parallelism
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,
AN FPGA FRAMEWORK SUPPORTING SOFTWARE PROGRAMMABLE RECONFIGURATION AND RAPID DEVELOPMENT OF SDR APPLICATIONS
AN FPGA FRAMEWORK SUPPORTING SOFTWARE PROGRAMMABLE RECONFIGURATION AND RAPID DEVELOPMENT OF SDR APPLICATIONS David Rupe (BittWare, Concord, NH, USA; [email protected]) ABSTRACT The role of FPGAs in Software
Parallelization of video compressing with FFmpeg and OpenMP in supercomputing environment
Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 231 237 doi: 10.14794/ICAI.9.2014.1.231 Parallelization of video compressing
High Performance GPGPU Computer for Embedded Systems
High Performance GPGPU Computer for Embedded Systems Author: Dan Mor, Aitech Product Manager September 2015 Contents 1. Introduction... 3 2. Existing Challenges in Modern Embedded Systems... 3 2.1. Not
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration Jinglin Zhang, Jean François Nezan, Jean-Gabriel Cousin, Erwan Raffin To cite this version: Jinglin Zhang,
Next Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
High Performance or Cycle Accuracy?
CHIP DESIGN High Performance or Cycle Accuracy? You can have both! Bill Neifert, Carbon Design Systems Rob Kaye, ARM ATC-100 AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing
Parallel Firewalls on General-Purpose Graphics Processing Units
Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering
High-performance vswitch of the user, by the user, for the user
A bird in cloud High-performance vswitch of the user, by the user, for the user Yoshihiro Nakajima, Wataru Ishida, Tomonori Fujita, Takahashi Hirokazu, Tomoya Hibi, Hitoshi Matsutahi, Katsuhiro Shimano
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected]
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected] Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
NVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim?
Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim? Successful FPGA datacenter usage at scale will require differentiated capability, programming ease, and scalable implementation models Executive
A Computer Vision System on a Chip: a case study from the automotive domain
A Computer Vision System on a Chip: a case study from the automotive domain Gideon P. Stein Elchanan Rushinek Gaby Hayun Amnon Shashua Mobileye Vision Technologies Ltd. Hebrew University Jerusalem, Israel
Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data
Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:
COMPUTING. SharpStreamer Platform. 1U Video Transcode Acceleration Appliance
COMPUTING Preliminary Data Sheet SharpStreamer Platform 1U Video Transcode Acceleration Appliance The SharpStreamer 1U Platform enables high density voice and video processing in a 1U rack server appliance
HPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful:
Course materials In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful: OpenCL C 1.2 Reference Card OpenCL C++ 1.2 Reference Card These cards will
Introduction to GPGPU. Tiziano Diamanti [email protected]
[email protected] Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate
Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com
Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and
High Efficiency Video Coding (HEVC) or H.265 is a next generation video coding standard developed by ITU-T (VCEG) and ISO/IEC (MPEG).
HEVC - Introduction High Efficiency Video Coding (HEVC) or H.265 is a next generation video coding standard developed by ITU-T (VCEG) and ISO/IEC (MPEG). HEVC / H.265 reduces bit-rate requirement by 50%
Development With ARM DS-5. Mervyn Liu FAE Aug. 2015
Development With ARM DS-5 Mervyn Liu FAE Aug. 2015 1 Support for all Stages of Product Development Single IDE, compiler, debug, trace and performance analysis for all stages in the product development
Accelerating variant calling
Accelerating variant calling Mauricio Carneiro GSA Broad Institute Intel Genomic Sequencing Pipeline Workshop Mount Sinai 12/10/2013 This is the work of many Genome sequencing and analysis team Mark DePristo
GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics
GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),
High Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
Accelerating CFD using OpenFOAM with GPUs
Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide
HP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich [email protected]
A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications
1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
GPU-Based Network Traffic Monitoring & Analysis Tools
GPU-Based Network Traffic Monitoring & Analysis Tools Wenji Wu; Phil DeMar [email protected], [email protected] CHEP 2013 October 17, 2013 Coarse Detailed Background Main uses for network traffic monitoring
Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1
(DSF) Quartus II Stand: Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] Quartus II 1 Quartus II Software Design Series : Foundation 2007 Altera
Cloud-Based Apps Drive the Need for Frequency-Flexible Clock Generators in Converged Data Center Networks
Cloud-Based Apps Drive the Need for Frequency-Flexible Generators in Converged Data Center Networks Introduction By Phil Callahan, Senior Marketing Manager, Timing Products, Silicon Labs Skyrocketing network
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS
HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS JASON POWER*, ARKAPRAVA BASU*, JUNLI GU, SOORAJ PUTHOOR, BRADFORD M BECKMANN, MARK D HILL*, STEVEN K REINHARDT, DAVID A WOOD* *University of
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
Fujisoft solves graphics acceleration for the Android platform
DESIGN SOLUTION: A C U S T O M E R S U C C E S S S T O R Y Fujisoft solves graphics acceleration for the Android platform by Hiroyuki Ito, Senior Engineer Embedded Core Technology Department, Solution
ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer
ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop Emily Apsey Performance Engineer Presentation Overview What it takes to successfully virtualize ArcGIS Pro in Citrix XenApp and XenDesktop - Shareable
OpenCL Programming for the CUDA Architecture. Version 2.3
OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different
Le langage OCaml et la programmation des GPU
Le langage OCaml et la programmation des GPU GPU programming with OCaml Mathias Bourgoin - Emmanuel Chailloux - Jean-Luc Lamotte Le projet OpenGPU : un an plus tard Ecole Polytechnique - 8 juin 2011 Outline
Scaling from Datacenter to Client
Scaling from Datacenter to Client KeunSoo Jo Sr. Manager Memory Product Planning Samsung Semiconductor Audio-Visual Sponsor Outline SSD Market Overview & Trends - Enterprise What brought us to NVMe Technology
What is a System on a Chip?
What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
Qsys and IP Core Integration
Qsys and IP Core Integration Prof. David Lariviere Columbia University Spring 2014 Overview What are IP Cores? Altera Design Tools for using and integrating IP Cores Overview of various IP Core Interconnect
~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck
Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering
OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect
OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE Guillène Ribière, CEO, System Architect Problem Statement Low Performances on Hardware Accelerated Encryption: Max Measured 10MBps Expectations: 90 MBps
Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education
Lesson 7: SYSTEM-ON ON-CHIP (SoC( SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY 1 VLSI chip Integration of high-level components Possess gate-level sophistication in circuits above that of the counter,
How SSDs Fit in Different Data Center Applications
How SSDs Fit in Different Data Center Applications Tahmid Rahman Senior Technical Marketing Engineer NVM Solutions Group Flash Memory Summit 2012 Santa Clara, CA 1 Agenda SSD market momentum and drivers
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
