Wireless Interconnect for Board and Chip Level


© 2013 IEEE. Reprinted, with permission, from Gerhard P. Fettweis, Najeeb ul Hassan, Lukas Landau, and Erik Fischer, "Wireless Interconnect for Board and Chip Level," in Proceedings of the Design Automation and Test in Europe (DATE'13), Grenoble, France, March 18-22, 2013. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the products or services of Technical University Dresden. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Wireless Interconnect for Board and Chip Level

Gerhard P. Fettweis, Najeeb ul Hassan, Lukas Landau, and Erik Fischer
Vodafone Chair Mobile Communications Systems, Dresden University of Technology (TU Dresden), 01062 Dresden, Germany
Email: {fettweis, najeeb.ul.hassan, lukas.landau, erik.fischer}@ifn.et.tu-dresden.de

Abstract: Electronic systems of the future require a very high bandwidth communications infrastructure within the system. This way the massive amount of compute power which will be available can be interconnected to realize future powerful advanced electronic systems. Today, electronic inter-connects between 3D chip-stacks, as well as intra-connects within 3D chip-stacks, are approaching data rates of 100 Gbit/s. Hence, the question to be answered is how to efficiently design the communications infrastructure within electronic systems. This paper addresses approaches and results for building this infrastructure for future electronics.

I. INTRODUCTION

Future computing platforms will be dominated by massive parallelism in the number of processing elements (processors of any kind). Today we are reaching on the order of 1000 processors on a single die for GPU implementations [1], and multiple dies are already being stacked into 3D chip-stacks for high-capacity flash memory [2]. With Moore's Law continuing to 7nm technology, chip stacking becoming mainstream, and chip-stacks growing higher, it is foreseeable that the number of processors in a chip-stack package will reach far beyond multiple million elements. This creates the challenge of building a highly efficient and high-bandwidth intra-connect for this massive number of processors in a chip-stack. Many instances of such chip-stack packages will be placed on a printed circuit board, e.g. of size 10cm x 10cm.
Assuming 4-5 boards placed in a 1 liter box, a billion processors in a liter can be foreseen, which is an extraordinarily large number in terms of today's systems. Connecting these up to a billion processors across multiple boards with a backplane bus system requires massive bandwidth and switching capabilities. Again, as in the chip-stack intra-connect, the major challenge lies in building a highly efficient interconnect architecture which can carry the bandwidth, enable switching and connectivity, and meet the data rate requirements. In all cases, designing communications links that deliver extreme data rates is of utmost importance: for intra-connects within a 3D chip-stack, for inter-connects between chip-stacks/packages on a board, and for the backplane of a multi-board system.

This work has been supported in part by the DFG in the CRC 912 "Highly Adaptive Energy-Efficient Computing" and the European Social Fund in the framework of the Young Investigators Group "3D Chip-Stack Intraconnects". 978-3-9815370-0-0/DATE13/ ©2013 EDAA

The backplane, as an aggregator of traffic and infrastructure provider for multiple simultaneously active connections, is a serious bottleneck for building systems of the future. Hence, we propose to take the load off the backplane by providing direct wireless links between boards, from chip-stack to chip-stack. These links shall use beam-steering antennas at carrier frequencies beyond 200 GHz. In this case, e.g., a 4x4 antenna array can be realized in a 2mm x 2mm real estate. To ensure the best coupling out of the electromagnetic wave, we propose at this time to use the interposer as a carrier for the arrays. This way the interposer can be designed with a permeability that achieves the best coupling out of the wave.

Within a 3D chip-stack multiple alternatives exist for communicating between the different chiplets, of which at least two are wireless: inductive coupling and capacitive coupling. Each connection must be able to carry high data traffic.
We propose that solutions must be developed today to achieve at least 100 Gbit/s per link, as e.g. in [3]. In the coming years this data rate per link needs to be increased into the Tbit/s range.

In this paper we first analyze the wireless board-to-board link design challenge in the 200 GHz carrier frequency range. After measuring and calculating the link budget, the targets which have to be met are defined. The intra-connect within a 3D chip-stack is addressed next, showing that the analog/digital conversion needs to be designed carefully to meet a very low power consumption target. The network design within a 3D chip-stack is addressed thereafter. Finally, new results on very low latency error correction coding for inter/intra-connects are presented.

II. LINK BUDGET FOR BOARD-TO-BOARD COMMUNICATIONS

Wireless board-to-board communications requires no routing delay and less material, which leads to spatial relaxation. Furthermore, it is in general more flexible than conventional communication methods in a system with multiple printed circuit boards (PCBs). In this section, we consider a scenario where two PCBs are placed in parallel. Both PCBs are equipped with multiple wireless communication nodes. The board-to-board channel has been measured between 220 and 245 GHz. This data is used to derive a link budget, which is essential for the design of a wireless link.

Fig. 1. Theoretical pathloss and measurement data from board-to-board communications (pathloss / [dB] versus distance / [mm]).

Fig. 2. Impulse response for an antenna distance of 5mm, freespace versus parallel copper boards (impulse response / [dB] versus τ / [ns]).

A. Measurements with the Vector Network Analyser

For the measurements, the network analyser R&S ZVA24 has been used with an extension for the frequency range between 220 GHz and 245 GHz. The channel is measured in the frequency domain with 4096 samples. The system is calibrated with a direct connection of the waveguides. Standard gain horn antennas, which provide approximately 10 dB gain in the considered frequency range, have been installed on both measurement ports. The distance between the measurement ports is controlled via a stepping motor. Two scenarios are considered. In the first setup, we consider freespace measurements with absorber material at the ground, for different distances. The purpose of this measurement is to identify the effective phase center and the effective antenna gain. In the second scenario, copper boards are included. This represents the worst case of a printed circuit board. Notches are prepared for inserting the horn antennas.
The distance between the two boards is fixed at 5mm, which shall be a lower bound on the board distance. Diagonal communications are modeled by rotating the boards about the z-axis, which also corresponds to different distances between the measurement ports. Analysing the corresponding impulse response, obtained by applying the discrete Fourier transform, leads to a unique identification of the reflecting objects. We conclude that the reflections presented in Fig. 2 are always at least 15 dB below the main signal path (line of sight), where we do not distinguish between the copper plate and the measurement equipment itself. This motivates a detailed look at the line-of-sight component and the evaluation of a simplified freespace pathloss model, which can be represented by

PL_d [dB] = PL_{d_0} [dB] + 10 n \log_{10}(d / d_0),    (1)

where d is the distance, PL_{d_0} is the reference path loss at d = d_0, and n is the pathloss exponent. After applying an effective phase center of the antennas and an antenna gain of 9.5 dB, it can be seen in Fig. 1 that the pathloss model is in line with the measurements, and in particular also with the measurements including the copper boards. These results justify the pathloss model assumption that is used for the following link budget calculation.

Fig. 3. Impulse response for an antenna distance of 15mm, freespace versus parallel copper boards (diagonal link) (impulse response / [dB] versus τ / [ns]).

TABLE I
LINK BUDGET PARAMETERS FOR BOARD-TO-BOARD COMMUNICATIONS

Parameter                                        Unit   Value
RX noise figure                                  dB     10
Path loss exponent                               -      2
Path loss for shortest link 0.1 m (232.5 GHz)    dB     59.8
Path loss for largest link 0.3 m (232.5 GHz)     dB     69.3
Array gain                                       dB     12
Butler matrix inaccuracy                         dB     5
Polarization mismatch                            dB     3
Implementation loss                              dB     5
RX temperature                                   K      323

B. Link Budget

For the board-to-board communication with multiple communication nodes on each board, we consider the extreme cases, which are given by the ahead link (100mm) and the diagonal link (300mm).
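The pathloss model in (1) and the entries of Table I can be cross-checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the reference loss is taken as the ideal free-space value at 232.5 GHz, the receiver noise figure is read as 10 dB, the noise bandwidth is taken as the 25 GHz width of the measured band, and the way gains and losses are summed is a conventional dB link budget rather than a calculation confirmed by the measurements.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # m/s
BOLTZMANN = 1.380649e-23         # J/K


def free_space_path_loss_db(distance_m, frequency_hz):
    """Ideal free-space path loss 20*log10(4*pi*d*f/c) in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


def path_loss_db(distance_m, ref_loss_db, ref_distance_m, exponent):
    """Equation (1): PL_d = PL_d0 + 10 * n * log10(d / d0)."""
    return ref_loss_db + 10.0 * exponent * math.log10(distance_m / ref_distance_m)


def required_tx_power_dbm(target_snr_db, link_loss_db, butler_loss_db=0.0,
                          bandwidth_hz=25e9, rx_temperature_k=323.0,
                          noise_figure_db=10.0, array_gain_db=12.0,
                          polarization_mismatch_db=3.0, implementation_loss_db=5.0):
    """Transmit power for a target receive SNR (assumed dB bookkeeping)."""
    # Thermal noise power k*T*B, converted from W to dBm.
    noise_dbm = 10.0 * math.log10(BOLTZMANN * rx_temperature_k * bandwidth_hz) + 30.0
    losses_db = (link_loss_db + polarization_mismatch_db
                 + implementation_loss_db + butler_loss_db)
    gains_db = 2.0 * array_gain_db  # array gain at transmitter and receiver
    return target_snr_db + noise_dbm + noise_figure_db + losses_db - gains_db


if __name__ == "__main__":
    f_c = 232.5e9  # centre of the measured band
    pl_short = free_space_path_loss_db(0.1, f_c)
    pl_long = path_loss_db(0.3, pl_short, 0.1, exponent=2.0)
    print(f"path loss at 0.1 m: {pl_short:.1f} dB, at 0.3 m: {pl_long:.1f} dB")
    print(f"P_TX, 25 dB SNR, shortest link: "
          f"{required_tx_power_dbm(25.0, pl_short):.1f} dBm")
    print(f"P_TX, 25 dB SNR, longest link with Butler matrix: "
          f"{required_tx_power_dbm(25.0, pl_long, butler_loss_db=5.0):.1f} dBm")
```

With these assumptions, the free-space loss evaluates to about 59.8 dB at 0.1 m and 69.3 dB at 0.3 m, matching the corresponding entries of Table I.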
It is assumed that each communication node uses a 4-by-4 antenna array, which corresponds to an array gain of 12 dB each for the transmitter and the receiver. We distinguish between beamforming/beamsteering, where we refer to the discrete realization of the beamforming vector investigated in [4], and a Butler matrix realization as a complexity trade-off, which is investigated in [5]. Table I summarizes the link budget parameters, which are similar to those in [6]. In order to obtain wireless connections with data rates up to 100 Gbit/s (using dual polarization), the bandwidth is chosen as 25 GHz. Figure 4 shows the required transmit power for a target SNR at the receiver, where it is assumed that only the worst-case links suffer from the Butler matrix realization.

Fig. 4. Required transmit power for a desired SNR at the receiver (PTX / [dBm] versus SNR / [dB]; curves: shortest link 100mm, longest link 300mm, longest link 300mm with Butler matrix direction mismatch).

III. BANDWIDTH- AND ENERGY-EFFICIENT MULTIGIGABIT/S COMMUNICATIONS BASED ON ONE-BIT OVERSAMPLING RECEIVERS

When considering multigigabit/s communication speeds over a short distance, the analog-to-digital conversion accounts for the main part of the total energy consumption. Consequently, the conversion resolution has to be chosen as low as possible in order to save energy. To obtain a high spectral efficiency, advanced communication methods have to be applied. In this section we introduce an alternative scheme which is based on a simple one-bit oversampling receiver architecture [7]. When an optimized intersymbol interference (ISI) is included, it can be shown that the information rate increases significantly [8].

For our investigations we have considered a regular 4-amplitude shift keying (ASK) modulation scheme, and we found 5-fold oversampling to be the smallest sampling rate which enables unique detection. The investigated channel is the additive white Gaussian noise (AWGN) channel, which could be the discussed board-to-board channel. For simplicity, the noise samples are considered to be uncorrelated within the oversampling vector. The ISI is represented by a linear filter which can overlap with another symbol. We allow for the design of this filter and propose different strategies for different receiver architectures. On the one hand, we consider symbol-by-symbol detection, where the ISI is an arbitrary distortion from the receiver's point of view, which appears similar to dithering.
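The remark on dithering can be illustrated for the ISI-free reference case. The sketch below is a minimal illustration, not the filter optimization of [8], and rests on several assumptions that are not taken from the paper: a rectangular pulse, so that all 5 samples of a symbol see the same amplitude; equiprobable 4-ASK symbols normalized to unit average power; an SNR defined per sample as the ratio of average symbol power to noise variance; and hypothetical function names. Because the noise samples are uncorrelated, the number of positive one-bit samples is a sufficient statistic, so the information rate follows exactly from binomial distributions.

```python
import math


def prob_positive(amplitude, sigma):
    """Probability that amplitude + N(0, sigma^2) is quantized to +1."""
    return 0.5 * math.erfc(-amplitude / (sigma * math.sqrt(2.0)))


def binomial_pmf(n, p):
    """Distribution of the number of positive samples among n."""
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]


def entropy_bits(pmf):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)


def info_rate_rect_one_bit(snr_db, oversampling=5):
    """I(X;Y) in bits per channel use for 4-ASK with one-bit oversampling."""
    levels = [-3.0, -1.0, 1.0, 3.0]
    scale = math.sqrt(sum(a * a for a in levels) / len(levels))
    symbols = [a / scale for a in levels]          # unit average power
    sigma = math.sqrt(10.0 ** (-snr_db / 10.0))    # assumed per-sample SNR
    conditional = [binomial_pmf(oversampling, prob_positive(x, sigma))
                   for x in symbols]
    marginal = [sum(column) / len(symbols) for column in zip(*conditional)]
    h_conditional = sum(entropy_bits(pmf) for pmf in conditional) / len(symbols)
    return entropy_bits(marginal) - h_conditional


if __name__ == "__main__":
    for snr_db in (-20.0, 0.0, 5.0, 10.0, 20.0, 60.0):
        print(f"SNR {snr_db:6.1f} dB: {info_rate_rect_one_bit(snr_db):.3f} bpcu")
```

Under these assumptions the rate equals 1 bit per channel use at very high SNR, where only the sign survives quantization, and exceeds 1 bit at moderate SNR (for example 5 dB), where the noise acts as a dither across the oversampled decisions.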
For symbol-by-symbol detection, we use the information rate directly as the objective for the filter design optimization, illustrated in Fig. 5(b). On the other hand, it has been shown that it is beneficial to consider sequence estimation, where the linear combination introduced by the ISI can be exploited even better. For this case we propose the design which maximizes the information rate, shown in Fig. 5(c). We also propose a suboptimal filter design which is not based on the noise characteristics, which might be unknown. In this case the information rate cannot be computed, and therefore the design is based on the unique detection property in the noise-free case, shown in Fig. 5(d).

Fig. 5. Impulse response for different ISI filter designs: (a) rectangular pulse, no ISI; (b) optimal ISI for symbol-by-symbol detection for SNR=25 dB; (c) optimal ISI for sequence detection for SNR=25 dB; (d) suboptimal ISI design.

Fig. 6. Information rates considering 4-ASK communications; comparison of different pulse designs for 5-fold oversampling and one-bit quantization at the receiver (I(X;Y) / [bpcu] versus SNR / [dB]).

We have compared our results with the ISI-free case corresponding to the rectangular pulse [7]. Also, we consider two reference cases: one without oversampling and one without quantization. Our results in Fig. 6 indicate a significant improvement of the information rate when considering intersymbol interference, and especially when considering sequence estimation.

IV. 3D NICS: A TOPOLOGY FOR FUTURE MANY-CORE SYSTEMS

In the last few years, a technology has emerged to close the gap between today's multi-processor systems-on-chip (SoCs) and future many-core SoCs [9], which implement thousands of processors, memories and interfaces on a single chip. The technology is called three-dimensional (3D) Network-in-Chip-Stack (NiCS) and allows the vertical stacking of multiple chips using, e.g., through-silicon vias (TSVs), optical links, or inductive or capacitive coupling [10].
Moreover, wireless chip-to-chip communication can provide a very flexible (even dynamic) solution for the interconnection. 3D NiCS enables a natural extension of the well-known concept of network-on-chip (NoC) [11] [12] for the interconnection of a large number of processors by exploiting the third dimension. Therefore, a high degree of freedom is provided for topology selection. This is especially the case if wireless chip-to-chip communication is employed. New topologies can be explored that were infeasible or inefficient to realize on a two-dimensional plane due to wiring constraints.

Fig. 7. Topology types: 2D mesh, star-mesh, 3D mesh and ciliated 3D mesh.

Many 3D topologies have recently been discussed in the literature, such as 3D mesh, stacked mesh, ciliated 3D mesh, or tree-based topologies [13]. In this section, we focus on the 3D mesh topology with the objective of demonstrating its performance potential compared to classical 2D topologies and studying its properties when scaling to many-core SoCs. For investigating the network performance, an analytic model based on queueing theory is employed [14]. The model is very flexible and allows for fast and accurate simulation of large NoC topologies. A classical two-dimensional (2D) mesh as well as a hierarchical star-mesh (also called concentrated mesh) serve as 2D reference topologies [15]. They are compared with a 3D mesh. Note that a star-mesh topology can also be applied to a 3D layered architecture, which yields a ciliated 3D mesh as shown in [13]. Figure 7 illustrates these four topology types. The following advantages of the 3D mesh are expected.

Low latency: The high network concentration and short wire lengths promise low routing latencies.
High throughput: A high degree of interconnection, i.e., a high bisection bandwidth, combined with low routing latencies provides a high network throughput.
Short wires: The small distance between the vertical layers and the regular structure of the 3D mesh result in short wires.

The results of the performance analysis for the case of 64 modules (8x8 2D mesh vs. 4x4x4 star-mesh vs. 4x4x4 3D mesh) are shown in Fig. 8(a). Therein, the mean latency in the network is analyzed. A global uniform traffic pattern is assumed with Poisson arrival streams. Different injection rates are considered, ranging from 0.1 to 0.8 flits/cycle/module.
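The expected advantages in routing path length and bisection bandwidth can be made concrete with a simple count for the 64-module case. The sketch below is not the queueing model of [14]; it only assumes minimal routing (hop count equal to the Manhattan distance), uniform traffic between distinct modules, one module per router, and bidirectional links counted once. The star-mesh is not modeled, and the function names are hypothetical.

```python
from itertools import product


def mean_hops(dims):
    """Mean Manhattan distance between distinct nodes of a mesh."""
    nodes = list(product(*(range(k) for k in dims)))
    total = 0
    pairs = 0
    for src in nodes:
        for dst in nodes:
            if src != dst:
                total += sum(abs(a - b) for a, b in zip(src, dst))
                pairs += 1
    return total / pairs


def bisection_links(dims):
    """Links cut when the mesh is halved across its longest dimension."""
    longest = max(dims)
    if longest % 2:
        raise ValueError("longest dimension must be even for an equal split")
    node_count = 1
    for k in dims:
        node_count *= k
    # One link per node position in the cutting plane.
    return node_count // longest


if __name__ == "__main__":
    for name, dims in (("8x8 2D mesh", (8, 8)), ("4x4x4 3D mesh", (4, 4, 4))):
        print(f"{name}: mean hops {mean_hops(dims):.2f}, "
              f"bisection links {bisection_links(dims)}")
```

For 64 modules this gives a mean distance of 16/3 (about 5.33) hops and 8 bisection links for the 8x8 2D mesh, against 80/21 (about 3.81) hops and 16 bisection links for the 4x4x4 3D mesh.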
We clearly find that the classical 2D mesh is a bad choice with respect to latency (13 clock cycles at low traffic) due to the low network concentration and long routing paths. The point where the latency tends towards infinity is called the network saturation point. It determines the capacity, i.e., the maximum throughput, of the network. It can be seen that the 2D mesh provides a medium throughput of 0.41 flits/cycle/module in this case.

Fig. 8. Performance analysis of average latency in the network for the 3D mesh topology (average packet latency [clock cycles] versus injection rate [flits/cycle/module]): (a) 64 cores; (b) 512 cores.

The star-mesh topology provides a very good latency at low traffic (7 clock cycles) due to the high network concentration. However, this advantage comes at the cost of low network throughput (0.19 flits/cycle/module). To improve the low bisection bandwidth of this topology, a common technique is to employ multiple inter-router links (IRLs) or express channels [15]. The drawback of this approach is the high area consumption of the routers due to the large number of ports. In addition, the star-mesh topology does not provide inherent (natural) scaling, i.e., the number of IRLs has to be adapted manually with increasing network size and concentration factor.

Finally, the 3D mesh shows a very good tradeoff between latency and achievable throughput. We observed a good latency (10 clock cycles) combined with a very high throughput limit (0.75 flits/cycle/module), as depicted in Fig. 8(a). Furthermore, the 3D mesh offers very good scaling abilities, as Fig. 8(b) shows for the case of a NoC with 512 modules (32x16 2D mesh vs. 8x8x8 3D mesh). We observe that the latency gap between these two topologies increases significantly.

We conclude that 3D NiCS using a 3D mesh topology is a promising approach for closing the gap to many-core SoCs. Further investigation of 3D topologies is still necessary. For example, the large area of TSVs will probably not allow every router to be equipped with a vertical link. Furthermore, the vertical inter-chip links are expected to offer a higher bandwidth compared to on-chip links. Therefore, irregular topologies with heterogeneous links should be investigated more closely.

V. LOW-LATENCY ERROR CORRECTION CODING

One of the most important issues in terms of latency and link performance is the selection of a suitable channel coding scheme. In [16] and [17], it has been shown that convolutional codes are most favorable for low-latency applications, whereas strong codes like low-density parity-check (LDPC) codes have a better bit-error-rate (BER) performance when higher latency can be tolerated. LDPC convolutional codes (LDPC-CCs) can combine both advantages [18], which makes them suitable for latency-constrained high-performance error correction applications. We consider here the structural latency of the code, defined as the time that the en/decoder has to wait for input bits, due to the structure of the code, before the mapping of input bits can take place. The structural latency is a feature of the coding scheme itself, regardless of current and future ways of implementation. Hence, as pointed out in [16], it provides an ultimate lower bound on the actual delay of the code.

A. Low-Density Parity-Check Convolutional Codes

Consider the transmission of a sequence of L codewords v_t, t = 1, ..., L. Unlike block encoding, these L blocks are coupled over various time instants t, where m_cc determines the maximal distance between a pair of coupled blocks. Here we restrict ourselves to protograph-based codes due to their ability to facilitate low-complexity hardware implementation. A protograph consists of n_c check nodes and n_v variable nodes and is represented by its bi-adjacency matrix B, called the base matrix.
The edges are spread according to the component base matrices B_0, B_1, ..., B_{m_cc}. In order to maintain the degree distribution and structure of the original ensemble, a valid edge spreading should satisfy the condition [19]

\sum_{i=0}^{m_{cc}} B_i = B.    (2)

The resultant ensemble of terminated LDPC-CCs can be described by means of a convolutional protograph with termination length L,

B_{[1,L]} =
\begin{bmatrix}
B_0        &        &            \\
\vdots     & \ddots &            \\
B_{m_{cc}} &        & B_0        \\
           & \ddots & \vdots     \\
           &        & B_{m_{cc}}
\end{bmatrix}_{(L+m_{cc}) n_c \times L n_v}.    (3)

The last m_cc additional check nodes result in a rate loss due to termination. This can be decreased by increasing L, which in turn increases the resultant structural latency. In the following, we introduce an elegant yet natural way to decode the LDPC-CC with large L. The parity-check matrix H of the LDPC-CC can be obtained by replacing every 1 in B_{[1,L]} by a permutation matrix of size N x N, with N being the lifting factor.

Fig. 9. Schematic diagram of a window decoder. The decoding unit here represents the belief propagation decoder for an LDPC block code.

B. Window Decoding

The sequence of L blocks in (3) corresponds to a coupled codeword v = [v_1, v_2, ..., v_t, ..., v_L]. The decoding can be performed by applying belief propagation over the lifted matrix H, but this results in a large structural latency. A sliding window decoder of size W operates on W consecutive coupled code blocks v_t [20]. The size W of the window can vary from m_cc + 1 to L - 1. Consider the decoding of a received block y_t at time instant t. Based on the results of Sec. II, we consider an AWGN channel between the nodes on the boards. Figure 9 shows the schematic block diagram of the window decoder when the symbols in the received block y_t are the target symbols. The decoding of y_t can only start once the succeeding W - 1 blocks are available. Each of these code blocks contains N n_v code bits. Furthermore, the window decoder also requires read access to the m_cc previously decoded blocks due to the memory of the code, as shown in Fig. 9.
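The edge spreading condition (2), the band structure in (3) and the rate loss due to termination can be illustrated with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, the component matrices B_0 = [2, 2], B_1 = B_2 = [1, 1] are those quoted in the caption of Fig. 10, and the rate is the design rate 1 - (L + m_cc) n_c / (L n_v), which presumes that all checks are linearly independent.

```python
def sum_components(components):
    """Element-wise sum of the component base matrices, i.e. B in (2)."""
    rows = len(components[0])
    cols = len(components[0][0])
    return [[sum(comp[r][c] for comp in components) for c in range(cols)]
            for r in range(rows)]


def coupled_base_matrix(components, length):
    """Build B_[1,L] as in (3): B_0..B_mcc stacked down each block column."""
    m_cc = len(components) - 1
    n_c = len(components[0])
    n_v = len(components[0][0])
    matrix = [[0] * (length * n_v) for _ in range((length + m_cc) * n_c)]
    for t in range(length):                    # block column (time instant)
        for i, comp in enumerate(components):  # component B_i sits i blocks lower
            for r in range(n_c):
                for c in range(n_v):
                    matrix[(t + i) * n_c + r][t * n_v + c] = comp[r][c]
    return matrix


def design_rate(components, length):
    """Design rate of the terminated ensemble (full-rank assumption)."""
    m_cc = len(components) - 1
    n_c = len(components[0])
    n_v = len(components[0][0])
    return 1.0 - (length + m_cc) * n_c / (length * n_v)


if __name__ == "__main__":
    comps = [[[2, 2]], [[1, 1]], [[1, 1]]]     # B_0, B_1, B_2 (m_cc = 2)
    print("sum of components:", sum_components(comps))
    for row in coupled_base_matrix(comps, 4):
        print(row)
    for length in (10, 100):
        print(f"L = {length}: design rate {design_rate(comps, length):.3f}")
```

The components sum to B = [4, 4], every column of B_[1,L] keeps weight 4, and the design rate moves from 0.4 at L = 10 towards the block-code rate 1/2 as L grows (0.49 at L = 100), which is the rate loss that the termination introduces.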
For a code with rate R, the structural latency of the window decoder T_WD therefore depends on the window size W and is expressed in terms of the number of information bits as [18] [20]

T_{WD} = W N n_v R  [information bits].    (4)

Note that the latency of the window decoder in (4) is independent of L. Figure 10 shows the required E_b/N_0 to achieve a BER of 10^{-5} as a function of the decoding latency. The decoding latency of the window decoder depends on W and N. The window size is a property of the decoder and can be varied to reduce the required E_b/N_0 for a given code. It can be adjusted in the decoder depending on the requirements of the application at a given time, without changing the encoder. This provides flexibility in terms of latency and performance on the decoder side. For example, consider the curve with N = 40 in Fig. 10. The performance improves with increasing W (W = 3 to 4), but eventually the rate of this improvement decreases (W = 7 to 8). To cope with this, the lifting factor N has to be increased. The lifting factor determines the constraint length of the code; hence increasing N increases the constraint length and thus the strength of the code. This is demonstrated in Fig. 10, where the window size is varied for different codes with N = 25, 40 and 60.

Fig. 10. Required E_b/N_0 for (4, 8)-regular LDPC-CCs to achieve a BER of 10^{-5} as a function of decoding latency (required E_b/N_0 [dB] versus decoding latency [information bits]; curves: N=25, W=3,...,8; N=40, W=3,...,8; N=60, W=4,...,6; LDPC-BC). The component base matrices used here for the LDPC-CC are B_0 = [2, 2], B_1 = B_2 = [1, 1]; the (4, 8)-regular base matrix B = [4, 4] is used for the LDPC-BC.

The required E_b/N_0 to achieve a BER of 10^{-5} for the LDPC block code (LDPC-BC) is also plotted. The structural latency of a block code is equal to the number of information bits in one block and is given as

T_B = N n_v R  [information bits].    (5)

Figure 10 shows that over the complete range of latency, LDPC-CCs outperform the corresponding block codes from which they are derived. For example, consider the operating point E_b/N_0 = 3 dB. The LDPC-CC requires T_WD = 200 information bits, whereas the LDPC-BC requires a latency of T_B = 400 information bits to achieve a BER of 10^{-5}. This provides a gain of 200 information bits in terms of latency compared to the LDPC-BC.

VI. CONCLUSION

We propose a new system consisting of wireless links between boards, where each node is a 3D chip-stack. We analyzed wireless board-to-board links operating in the 200 GHz carrier frequency range. The channel measurements suggest that the channel can be assumed to be static and largely frequency flat. In terms of quantization, one-bit quantization together with oversampling makes it possible to achieve ultra-high data rates at the low power consumption of one-bit analog-to-digital converters. The results also indicate a significant improvement of the information rate when considering ISI and sequence estimation. Moreover, 3D NiCS using a 3D mesh topology is shown to be a promising approach for fulfilling the requirements of future data rates.
Finally, an LDPC-CC has been analyzed, which is suitable for providing flexibility between latency and performance of the system. This provides adaptability to the system depending on the application requirements.

ACKNOWLEDGMENT

The authors would like to thank Prof. D. Plettemeier, M. Jenning and K. Wolf from Technische Universität Dresden for undertaking the board-to-board measurements.

REFERENCES

[1] [Online]. Available: http://www.theregister.co.uk/2012/09/18/nvidia_tesla_k20_benchmarks/
[2] Cadence White paper, "3D ICs with TSVs - Design Challenges and Requirements."
[3] D. Walter, S. Hoppner, H. Eisenreich, G. Ellguth, S. Henker, S. Hanzsche, R. Schuffny, M. Winter, and G. Fettweis, "A source-synchronous 90 Gb/s capacitively driven serial on-chip link over 6 mm in 65 nm CMOS," in Proc. of the 59th International Solid-State Circuits Conference (ISSCC), Feb. 2012, pp. 180-182.
[4] J. Israel and A. Fischer, "An approach to discrete receive beamforming," in Proc. 9th International ITG Conference on Systems, Communications and Coding (SCC), Munich, Germany, Jan. 2013.
[5] J. Israel, A. Fischer, and J. Martinovic, "Optimal antenna positioning for wireless board-to-board communication using a Butler matrix beamforming network," in 17th International ITG Workshop on Smart Antennas (WSA), Stuttgart, Germany, Mar. 2013.
[6] S. Krone, F. Guderian, G. Fettweis, M. Petri, M. Piz, M. Marinkovic, M. Peter, R. Felbecker, and W. Keusgen, "Physical layer design, link budget analysis and digital baseband implementation for 60 GHz short-range applications," EuMA International Journal of Microwave and Wireless Technologies (IJMWT), vol. 2, no. 3, 2011.
[7] S. Krone and G. Fettweis, "Achievable rate with 1-bit quantization and oversampling at the receiver," in Proc. IEEE Communication Theory Workshop, Cancun, Mexico, May 2010.
[8] L. Landau, S. Krone, and G. Fettweis, "Intersymbol-interference design for maximum information rates with 1-bit quantization and oversampling at the receiver," in Proc. 9th International ITG Conference on Systems, Communications and Coding (SCC), Munich, Germany, Jan. 2013.
[9] S. Borkar, "Thousand core chips - a technology perspective," in Proc. of DAC, 2007.
[10] TU-Dresden, "ESF Young Investigators Group; 3D Chip Stack Intraconnects - 3DCSI," last visited on 15/1/2012. [Online]. Available: http://tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_elektrotechnik_und_informationstechnik/3dcsi
[11] W. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in Proc. of Design Automation Conference (DAC), 2001, pp. 684-689.
[12] L. Benini and G. De Micheli, "Networks on chips: a new SoC paradigm," Computer, vol. 35, no. 1, pp. 70-78, Jan. 2002.
[13] B. Feero and P. Pande, "Networks-on-chip in a three-dimensional environment: A performance evaluation," IEEE Transactions on Computers, vol. 58, no. 1, pp. 32-45, Jan. 2009.
[14] E. Fischer, A. Fehske, and G. Fettweis, "A flexible analytic model for the design space exploration of many-core network-on-chips based on queueing theory," in Proc. of The Fourth International Conference on Advances in System Simulation (SIMUL), 2012, pp. 119-124.
[15] J. Balfour and W. J. Dally, "Design tradeoffs for tiled CMP on-chip networks," in Proc. of the 20th Annual International Conference on Supercomputing (ICS), 2006, pp. 187-198.
[16] T. Hehn and J. Huber, "LDPC codes and convolutional codes with equal structural delay: A comparison," IEEE Transactions on Communications, vol. 57, no. 6, pp. 1683-1692, Jun. 2009.
[17] S. Maiya, D. Costello, T. Fuja, and W. Fong, "Coding with a latency constraint: The benefits of sequential decoding," in 48th Annual Allerton Conference on Communication, Control, and Computing, Oct. 2010, pp. 201-207.
[18] N. Ul Hassan, M. Lentmaier, and G. Fettweis, "Comparison of LDPC block and LDPC convolutional codes based on their decoding latency," in Proc. 7th International Symposium on Turbo Codes & Iterative Information Processing, Aug. 2012, pp. 225-229.
[19] M. Lentmaier, M. Prenda, and G. Fettweis, "Efficient message passing scheduling for terminated LDPC convolutional codes," in Proceedings of IEEE International Symposium on Information Theory (ISIT), Aug. 2011, pp. 1826-1830.
[20] M. Papaleo, A. Iyengar, P. Siegel, J. Wolf, and G. Corazza, "Windowed erasure decoding of LDPC convolutional codes," in IEEE Information Theory Workshop (ITW), Jan. 2010, pp. 1-5.
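As a numerical illustration of the structural latency expressions (4) and (5), the following minimal Python sketch evaluates the window decoder latency for the lifting factors and window sizes listed in the legend of Fig. 1. It rests on assumptions that the text does not state explicitly: n_v = 2 is read off as the number of columns of the component base matrices, and R = 1/2 is taken as the design rate of the (4, 8)-regular code. The function and variable names are illustrative only.

```python
from fractions import Fraction


def window_decoder_latency(W, N, n_v, R):
    """Structural latency of the window decoder, Eq. (4), in information bits."""
    return W * N * n_v * R


def block_code_latency(N, n_v, R):
    """Structural latency of a block code, Eq. (5), in information bits."""
    return N * n_v * R


# Assumed values, not stated explicitly in the text:
# n_v = 2 variable nodes per component base matrix (e.g. B_0 = [2, 2]),
# R = 1/2 as the design rate of a (4, 8)-regular code.
N_V = 2
RATE = Fraction(1, 2)

# Lifting factors N and window size ranges W, taken from the legend of Fig. 1.
FIG1_CURVES = {25: range(3, 9), 40: range(3, 9), 60: range(4, 7)}


def latency_table(curves, n_v, R):
    """Map each lifting factor N to a list of (W, T_WD) pairs."""
    return {
        N: [(W, window_decoder_latency(W, N, n_v, R)) for W in windows]
        for N, windows in curves.items()
    }


if __name__ == "__main__":
    for N, rows in latency_table(FIG1_CURVES, N_V, RATE).items():
        cells = ", ".join(f"W={W}: {int(T)}" for W, T in rows)
        print(f"N = {N} -> T_WD [information bits]: {cells}")
```

Under these assumptions, the latencies span 75 to 200 information bits for N = 25, 120 to 320 for N = 40, and 240 to 360 for N = 60, which lies within the latency axis of Fig. 1. The quoted operating point T_WD = 200 is reached, for instance, with N = 25 and W = 8. Neither function takes L as an argument, which reflects the statement that the latency in (4) is independent of L.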