NANOMOS 4.0: A TOOL TO EXPLORE ULTIMATE SI TRANSISTORS AND BEYOND. A Dissertation. Submitted to the Faculty. Purdue University.


NANOMOS 4.0: A TOOL TO EXPLORE ULTIMATE SI TRANSISTORS AND BEYOND

A Dissertation Submitted to the Faculty of Purdue University by Xufeng Wang

In Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical and Computer Engineering

May 2010
Purdue University
West Lafayette, Indiana

To my family and the past nine years.

ACKNOWLEDGMENTS

I would like to express my sincere thanks to both my advisors, Professor Gerhard Klimeck and Professor Mark Lundstrom, for their guidance and help throughout my Master's study here at Purdue. I met Professor Klimeck four years ago when I was an undergraduate student. It was he who first brought me into the realm of nanoelectronic research. He provided me with opportunities and support in various projects, and more importantly the freedom to explore and guidance when needed. For everything he has done for me, I'm forever indebted to him. I equally thank Professor Lundstrom for his kindness in taking me under his guidance as well. He is always very careful in his studies and pays attention to details, which sets a great example for me. I especially liked his strict "understanding before doing" attitude, and benefited much from his experience and knowledge.

I'm also very thankful for all the help I've received from discussions with Dr. Mathieu Luisier, Dr. Tillmann Kubis, Dr. Sebastian Steiger, Dr. Tony Low, and other postdoctoral fellows. Whenever I ran into a problem and bothered them, they were always kind and patient in answering me. I especially thank Dr. Tillmann Kubis for reading my thesis and for his recommendations and help on LaTeX issues.

I appreciate the help and support from my colleagues here at NCN@Purdue, especially my peers from Professor Klimeck's and Professor Lundstrom's research groups. I thank Yunfei, Yang, Himadri, Dionisis, Zhengping and many others for the fruitful discussions. Mrs. Cheryl Haines and Mrs. Vicki Johnson have my many thanks for helping me with various scheduling matters and tasks. They are the most dutiful secretaries I've ever seen.

Outside of Purdue, I would like to thank Dr. Dmitri Nikonov for guidance on nanoMOS development and for reading my thesis. Dr. Nikonov is a knowledgeable professional from industry, and I particularly liked his visits since he always had nice stories and refreshing perspectives. I also thank Professor Dragica Vasileska at University of Arizona for her help on various topics over the past several years. I would also like to thank Professor Alejandro Strachan for serving on my advisory committee.

This August marks the 6th anniversary of my days at Purdue. Life here is quiet and wonderful, and every day of mine is delighted by all the friends I have around. The D.C. trip with Zhengping and Yi, the Chicago visit with Yunfei, cherry picking with Yang... little pieces here and there form the unforgettable memory which deepens my love for this place. To all my dear friends: I'm just so glad to have you!

Last but most of all, I would like to thank my entire family for their support and love. No word can describe how much I love and miss them. I'm also fortunate enough to have the best girl on earth, Hui, to care for me. For all of this and to all of them, I dedicate my work.

TABLE OF CONTENTS

LIST OF FIGURES
ABSTRACT

1 Introduction
  Difficulties in Si MOSFET scaling
  Emerging candidates for Si transistor replacement
  Si Double-gated MOSFET
  III-V Double-gated MOSFET
  Schottky barrier FET
  III-V High Electron Mobility Transistor (HEMT)
  SpinFET
  Development history
  Objective
  Chapters overview

2 Semi-classical transport in nanoMOS
  Introduction
  Transport model overview
  Overall scheme of nanoMOS simulator
  Introduction
  Basic equations
  Self-consistent calculation
  Effective mass model
  Mode space approach
  Drift-diffusion transport
  Introduction
  Equation reformulated
  Scharfetter and Gummel Method
  Drift-diffusion current
  Semi-classical Ballistic transport
  Introduction
  The top of the barrier model
  Electron density in top of the barrier model
  Ballistic current
  Poisson's equation
  Introduction
  Device grid and boundary conditions
  Transport-Poisson coupling with Gummel scheme
  Transforming electron density between 2D and 1D
  Coupled solution scheme
  Non-linear damping
  Newton-Raphson iteration for Poisson's equation

3 Quantum ballistic transport
  Introduction
  Transport model overview
  System Hamiltonian
  Introduction
  The Schrödinger equation
  Choice of basis: atomistic tight-binding vs. effective mass model
  Choice of representation: real space vs. mode space
  System Hamiltonian in matrix form with finite difference approximation
  Recursive Green's Function formalism
  Introduction
  Motivation
  Dyson's equation and recursive method
  Simple NEGF formalism
  Introduction
  Ballistic NEGF electron density
  Ballistic NEGF current

4 nanoMOS 4.0 Application Example
  Introduction
  Model Device: an Undoped-body Extremely Thin SOI MOSFET with Back Gate
  Results: Internal Quantities
  Introduction
  Internal Quantities
  Results: IV Characteristics
  Introduction
  I-V characteristics in the ballistic limit via NEGF
  Subthreshold characteristics
  Results: Comparison between various transport models
  Semiclassical Ballistic vs. Quantum Ballistic
  Drift-Diffusion vs. Quantum Ballistic
  Summary

5 Auxiliary programs of nanoMOS
  Introduction
  Graphical User Interface
  Introduction
  Rapid Application Infrastructure (Rappture) toolbox
  Understanding the Rappture toolbox
  Implementation in nanoMOS
  5.3 nanoMOS on computer clusters
  Embarrassingly parallel scheme
  nanoMOS Parallel Job Submitter (PJS)
  Benchmark and testing suite

A Finite difference method (FDM) and discretization of Hamiltonian
B Matrix inversion techniques
  B.1 Matrix types
  B.2 LU Decomposition
C Bandstructure calculation, wavefunction and NEGF formalism in atomistic tight-binding simulation
  C.1 Introduction
  C.2 Complex electronic bandstructure
  C.3 Wave function formalism
    C.3.1 Alternative form
    C.3.2 Choice of k
    C.3.3 Transmission from coefficient
  C.4 NEGF formalism
    C.4.1 Contact Self-energy from wavefunction formalism
    C.4.2 Other methods of determining contact self-energy

LIST OF REFERENCES

LIST OF FIGURES

1.1 Ideal double-gated MOSFET structure used in nanoMOS. Source, drain, and channel are all of the same semiconductor material.
1.2 Ideal Schottky barrier FET structure used in nanoMOS. It is similar to the double-gate MOSFET except with metallic contacts.
1.3 Ideal HEMT structure used in nanoMOS.
1.4 Ideal SpinFET structure used in nanoMOS. Here we show an example of up-spin electrons being transported across the channel. In this case, the two HMF layers are in parallel configuration. Size of dot denotes relative population of electrons with that spin.
2.1 Picture shows the change in point of view from an electron with rest mass traveling in a crystal lattice to one with modified mass traveling in constant potential. By introducing the so-called effective mass, we eliminate the need to consider the rapidly changing potential from the crystal lattice and simplify our calculation.
2.2 1D Schrödinger equation is solved in each vertical slice illustrated in the figure. The grid is the same as the one used in the Poisson solver.
2.3 Comparison of wavefunction profiles with oxide penetration on and off.
2.4 A possible trajectory of an electron traveling in diffusive manner across the channel. Red crosses are scattering centers where the electron's momentum is relaxed. Notice electrons with energy lower than the barrier can still get across due to scattering, possibly from phonons.
2.5 Illustration of the meaning of mid-nodes. The red nodes are actual solution nodes of the Poisson grid. Green nodes are mid-nodes, which are in the middle of actual solution nodes.
2.6 Illustration of electrons injected into the channel with higher or lower energy than the top of the barrier.
2.7 Illustration of electron density evaluation at point A (red box on the left). The red box includes every possible electron stream, and the resulting electron density is just a simple summation of all the streams.
2.8 Illustration of electron current calculation at point A (red box on the left).
2.9 The device grid and boundary conditions used for solving Poisson's equation. The actual grid is finer than the one illustrated here.
2.10 Illustration of the Control Volume Method (CVM) on node φ_{m,n}. The dashed black box is the control volume.
2.11 This chart illustrates how to convert electron density between 2D and 1D within a certain vertical slice.
2.12 Illustration of how Newton iterations converge toward a solution with a good starting guess.
2.13 Illustration of how Newton iteration with a bad starting guess can be saved from divergence with the fix suggested by Brown and Lindsay.
3.1 Forward partition of the total Hamiltonian. The lower right piece is just the Hamiltonian of a single layer. The two non-square pieces are the coupling between the singled-out layer and the rest of the device.
3.2 Illustration of the recursive partition of the Hamiltonian matrix and the chain of equations associated with it.
3.3 After the first recursion process, we forward partition again to obtain the diagonal and nearest off-diagonal blocks. This time, however, we partition the blocks differently.
3.4 Illustration of the second forward partition process. The red blocks are ones determined, and green blocks are ones to be determined in that recursion step.
4.1 Fabricated device structure by IBM Corporation. The curve cut in the buried oxide layer indicates a thick region not shown in the figure. The nanoMOS simulation region is enclosed by the dashed line.
4.2 Electrostatic potential profile of the entire device region. Most of the plot is occupied by the buried oxide layer, which is 145 nm thick.
4.3 Electrostatic potential profile focusing on the channel region. x is the transport direction; z is the transverse direction.
4.4 Bottom conduction band profile in the transverse direction.
4.5 Electron density distribution of the entire device. The majority of the plot is void of electrons, corresponding to the buried oxide layer.
4.6 Electron density distribution focusing on the channel region. Due to quantum confinement in the transverse direction, the electron density exhibits a wave-like pattern in source and drain. Electron density in the channel is low compared to that in the source and drain, so it is difficult to observe in this plot.
4.7 Electron density distribution in the transport direction. Notice the log scale of this plot.
4.8 Average carrier velocity along the channel.
4.9 Energy vs. transmission coefficient. The 0 eV reference is the quasi-Fermi level in the source, where electrons are injected from.
4.10 LDOS induced by the source.
4.11 LDOS induced by the drain.
4.12 Full LDOS induced by both source and drain.
4.13 Energy resolved current density. The white curve is a line illustration of current density vs. energy.
4.14 Energy resolved current density focusing around the quasi-Fermi level at the source.
4.15 Energy resolved electron density.
4.16 I_d vs. V_g with various back biases.
4.17 Electrostatic potential profile at back gate voltage = 0 V.
4.18 Subband electron density in the transverse direction at back gate voltage = 0 V.
4.19 Electrostatic potential profile at back gate voltage = -20 V.
4.20 Subband electron density at back gate voltage = -20 V.
4.21 Threshold voltage vs. gate length at linear/saturated current with/without back gate bias.
4.22 DIBL vs. gate length, with and without back gate bias.
4.23 I_d vs. V_g comparison between semiclassical and ballistic transport models in linear scale.
4.24 I_d vs. V_g comparison between semiclassical and quantum ballistic transport models in log scale.
4.25 Conduction band profile along the channel. Zoomed into the beginning of the channel for better comparison.
4.26 Charge comparison between semiclassical and quantum ballistic models. Notice the log scale in the y-axis.
4.27 Average carrier velocity comparison within the channel. Two gate voltages corresponding to on-state and off-state are plotted for each transport model.
4.28 I_d vs. V_g comparison between drift-diffusion and quantum ballistic transport models.
4.29 Charge distribution comparison along the channel. Notice the log scale used in the y-axis.
5.1 Snapshots of the nanoMOS GUI built from the Rappture toolbox. It is deployed on nanohub.org. The left figure is the input interface, and the right one is the output interface displaying a 3D plot.
5.2 Implementation flow chart of the Rappture toolbox in nanoMOS. The red box indicates that the computational code is hidden from users. All users see and need is an easy-to-use graphical interface.

ABSTRACT

Wang, Xufeng. MSECE, Purdue University, May 2010. NanoMOS 4.0: A Tool to Explore Ultimate Si Transistors and Beyond. Major Professors: Gerhard Klimeck, Mark Lundstrom.

This thesis discusses the modeling of nano-scale field-effect transistors using nanoMOS 4.0. Our goal is to present the reader with comprehensive documentation of nanoMOS 4.0, including its code structure in detail, development history, and new features. As silicon device scaling is reaching its limit, nanoMOS 4.0 is able to simulate the ultimate Si transistor performance and explore new non-silicon based devices. In this report, we focus on a simple double-gate thin-body structure for demonstration purposes. One primary aim is to show how theories are incorporated computationally into a fully functional simulator. Physically, the two main transport modules of nanoMOS, semi-classical (drift-diffusion and ballistic) and NEGF, are examined in detail, including the driving theories behind them; computationally, we further discuss several important numerical issues and document tricks of the trade used in nanoMOS to resolve those issues. We present auxiliary programs such as a benchmark and testing suite and discuss how they are used to support the development of nanoMOS. In the end, we demonstrate the application of nanoMOS to real devices fabricated by the Intel Corporation to illustrate the methods used to benchmark simulation results against experimental data. Thus, with this report, we deliver a deeper understanding and comprehensive review of this tool to its users.

1. INTRODUCTION

The backbone of today's digital products is complementary metal-oxide-semiconductor (CMOS) technology. Built from complementary pairs of p-type and n-type metal-oxide-semiconductor field-effect transistors (MOSFETs), CMOS units can be cascaded in large numbers, as in the central processing unit (CPU) of a computer, to carry out complicated logical computations. Since the advent of the MOS transistor at Bell Laboratories [1], the semiconductor industry has enjoyed an era of rapid growth and invented numerous revolutionary digital products that changed our lives. One major force that has pushed digital technology forward so rapidly over the past several decades is the dramatic increase in CMOS speed, and this speed gain has mainly been obtained through MOSFET scaling, that is, making the transistors smaller and smaller by scaling their dimensions down proportionally (see, for example, the scaling theory of Dennard [2]).

MOSFET scaling enhances speed in two respects. First, with a shortened channel, electrons take less time to travel from source to drain, which gives a faster switching time. Electrons also encounter fewer scatterers, such as impurities and surface roughness, in a shorter channel and thus gain a higher mobility. Second, scaling allows more devices to be packed within a given area, making CPUs of the same size more powerful and feature rich. A smaller packing area is also desirable so that communication between transistors is faster. Depending on the design of the CPU, the packing area, which determines the communication delay, and the number of devices per area, which determines the logic power, need to reach an optimized balance.

1.1 Difficulties in Si MOSFET scaling

However, device scaling faces many challenges.

One serious challenge is the set of short channel effects [3]. As the channel length is shortened, the source and drain are brought closer to each other. When a barrier is induced by the gate within the channel, the barrier has to drop to the drain potential on the drain side. Channels are not doped as heavily as contacts, so the screening length can be significant compared to the channel length in nanoscale transistors, which means this lowering of the barrier on the drain side occurs gradually over a significant region of the channel. In the worst case, the drain may even affect the height of the top of the barrier, an effect called drain induced barrier lowering (DIBL). Such undesired short channel effects weaken the control of the gate and increase the off-state current. In order to counter these effects, stronger gate control is needed. One solution is to use a material with a high dielectric constant as the gate oxide layer, so that a higher gate capacitance can be obtained without thinning the layer to the point where gate leakage becomes excessive.

Besides the physical difficulties mentioned above, another major roadblock in device scaling is the inadequacy of fabrication techniques. The backbone of MOSFET fabrication is a process called lithography [4]. A pre-defined mask is first placed over a layer of deposited photoresist, and after exposure to light, areas of photoresist not protected by the mask react and are etched off to create the desired patterns. With this process repeated over and over, one can create complex layers of MOSFETs and interconnect structures. As devices are shrunk toward the nanoscale today, one needs to define very fine patterns. In this case, mask alignment becomes increasingly difficult. Etching a thin channel also becomes sensitive and requires stricter control. Diffraction due to the wave nature of light adds an additional burden by blurring the masked edges. Although more sophisticated techniques such as electron beam lithography are able to define sharper edges, their use in mass production is not yet realized.
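The diffraction limit mentioned above is commonly estimated with the Rayleigh criterion, in which the smallest printable feature is k1 * wavelength / NA. The sketch below is only an illustration of that textbook relation and is not part of nanoMOS; the function name and the process factor k1 and numerical aperture NA used in the loop are assumed values chosen for the example.

```python
def rayleigh_resolution_nm(wavelength_nm: float, numerical_aperture: float, k1: float) -> float:
    """Estimate the smallest printable feature (in nm) from the Rayleigh criterion.

    resolution = k1 * wavelength / NA
    """
    if wavelength_nm <= 0 or numerical_aperture <= 0 or k1 <= 0:
        raise ValueError("wavelength, numerical aperture and k1 must be positive")
    return k1 * wavelength_nm / numerical_aperture


if __name__ == "__main__":
    # Assumed, illustrative optics: the same NA and k1 for every exposure wavelength.
    assumed_na = 0.9
    assumed_k1 = 0.4
    for wavelength_nm in (365.0, 248.0, 193.0):
        feature = rayleigh_resolution_nm(wavelength_nm, assumed_na, assumed_k1)
        print(f"wavelength {wavelength_nm:5.0f} nm -> feature size about {feature:.0f} nm")
```

For fixed optics the printable feature shrinks only linearly with the exposure wavelength, which is one way to see why optical patterning becomes harder as devices approach the nanoscale.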

1.2 Emerging candidates for Si transistor replacement

Facing the many challenges of Si transistor scaling, people have begun to look for other ways forward and to seek candidates to replace or complement the existing Si transistors. Many promising proposals have been made, and each has its own advantages and disadvantages. Modeling has thus become essential in order to understand the behavior of these new devices and evaluate their performance.

In this thesis, we present nanoMOS [5]: a 2-D simulator for thin body (less than 5 nm), fully depleted, double-gated n-MOSFETs. It is a simple and effective program that couples Poisson's equation self-consistently with one of several transport models. Originally designed for simulating Si n-MOSFETs, its features have been greatly expanded by a team of developers to model new devices. As of the current version, nanoMOS 4.0 incorporates the following device geometries:

Si Double-gated MOSFET

Fig. 1.1. Ideal double-gated MOSFET structure used in nanoMOS. Source, drain, and channel are all of the same semiconductor material.

A traditional bulk MOSFET is built upon a substrate of intrinsic silicon, and such a MOSFET has only a single gate on top to control the potential barrier induced in the channel. As devices are scaled down, one major drawback is the loss of decent gate control due to short channel effects. A better gate-controlled potential barrier

allows the MOSFET to have a better current on/off ratio and subthreshold slope. One way to achieve this is to build two gates symmetrically on the top and bottom of the channel in the so-called double-gate MOSFET structure (Fig. 1.1).

One major difference between a traditional bulk MOSFET and a double-gate MOSFET is how electrons are confined in the channel. In a traditional bulk MOSFET, electrons traveling in the channel are confined within a triangular quantum well due to the substrate band bending induced by the gate. This triangular well is located right at the semiconductor-oxide interface, so the interface roughness scatters electrons traveling toward the drain and thus degrades the mobility. In the double-gate MOSFET structure, the channel is confined by the gates to form a rectangular quantum well, and two situations may occur depending on the thickness of the channel. If the channel is thick, depletion simply occurs near the semiconductor-oxide interfaces, leaving the middle of the channel between the two gates un-depleted. Thus, two separate channels are created. In addition, the separations between energy levels in the well are then comparable to the thermal energy k_B T, so electrons injected from the source may occupy several of these subbands. The resulting situation is similar to two bulk MOSFETs built back-to-back [6]. On the other hand, if the channel is thin enough (less than 5 nm, for example), the splitting between energy levels becomes significantly larger than the thermal energy, and electrons are only able to occupy the bottom subband without hopping to higher levels. In this case, the electron spatial distribution can be obtained from the wavefunction solution of the Schrödinger equation, which indicates that electrons are now confined within the middle of the channel. The channel is fully depleted, and mobility degradation due to surface roughness is lessened since electrons are now concentrated away from the interface. An on-current roughly four times that of a traditional bulk structure has been estimated for such a thin-body double-gate MOSFET [7].
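The thick versus thin channel argument above can be checked with a rough estimate. The sketch below treats the channel as an infinitely deep rectangular well, for which E_n = (n * pi * hbar)^2 / (2 * m* * t^2), and compares the spacing of the two lowest subbands with k_B T at 300 K. It is an idealized illustration and not the nanoMOS solver: wavefunction penetration into the oxide is ignored, and the effective mass of 0.91 m0 is an assumed value for silicon confined along its heavy-mass direction.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M0 = 9.1093837015e-31    # free electron mass, kg
Q = 1.602176634e-19      # elementary charge, C (also J per eV)
KB = 1.380649e-23        # Boltzmann constant, J/K


def subband_energy_ev(n: int, thickness_nm: float, m_eff_ratio: float) -> float:
    """Energy of subband n (n = 1, 2, ...) in an infinite square well, in eV."""
    thickness_m = thickness_nm * 1e-9
    energy_j = (n * math.pi * HBAR) ** 2 / (2.0 * m_eff_ratio * M0 * thickness_m ** 2)
    return energy_j / Q


def splitting_in_kt(thickness_nm: float, m_eff_ratio: float = 0.91,
                    temperature_k: float = 300.0) -> float:
    """Spacing between the two lowest subbands, in units of k_B T."""
    delta_ev = (subband_energy_ev(2, thickness_nm, m_eff_ratio)
                - subband_energy_ev(1, thickness_nm, m_eff_ratio))
    return delta_ev / (KB * temperature_k / Q)


if __name__ == "__main__":
    for thickness_nm in (3.0, 5.0, 10.0, 20.0):
        print(f"t = {thickness_nm:4.1f} nm: E2 - E1 = {splitting_in_kt(thickness_nm):6.2f} k_B T")
```

Because the spacing scales as 1/t^2, halving the body thickness quadruples the subband separation, which is why only a few nanometers of thickness separate the multi-subband regime from the single-subband regime.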

III-V Double-gated MOSFET

It is well known that electrons in III-V compound materials such as GaAs have a higher intrinsic mobility than those in silicon. Higher mobility translates into the much desired faster switching time and higher current density. Based on the Si thin-body double-gate MOSFET structure already discussed, we can substitute the silicon channel with a III-V compound. In reality, building a III-V MOSFET is not as simple as this swap suggests. The oxide layer in silicon devices is commonly obtained by oxidizing the silicon itself into SiO2. III-V compounds, however, cannot be oxidized this way to create an insulating barrier. Introducing an additional oxide layer over the III-V channel is also difficult since the two may not adhere well to each other [4]. Alternatively, people have found a structure that overcomes the fabrication difficulties and uses intrinsic III-V material as the channel, in a device called the High Electron Mobility Transistor (HEMT). In nanoMOS, we simply assume the oxide layer is some material described by a dielectric constant that adheres well enough to the III-V channel. In addition, several mobility models are incorporated to suit different III-V materials.

Schottky barrier FET

Fig. 1.2. Ideal Schottky barrier FET structure used in nanoMOS. It is similar to the double-gate MOSFET except with metallic contacts.

A regular MOSFET forms its source and drain by heavily doping those regions. The Schottky barrier FET (Fig. 1.2) instead uses metallic material as contacts. Metal is the ideal contact because it has very low resistance and an abundant supply of free electrons. When the metal contact forms a junction with the semiconductor channel, a Schottky barrier is formed. The upside is that this junction is abrupt, which is difficult to obtain by doping. An abrupt junction means a narrow depletion width, which brings faster switching behavior. In addition, the abruptness allows a superior scaling ability. The downside is that, in order to be injected into the channel, electrons have to either (rarely) hop over or (mostly) tunnel through the thin Schottky barrier.

The Schottky barrier FET operates slightly differently from a MOSFET. In a MOSFET, the gate controls the height of the potential barrier within the channel. In a Schottky barrier FET, the Schottky barrier at the metal-semiconductor interface is present no matter what gate bias is applied. The gate instead controls the width of the barrier, which in turn modulates the transmission of electrons tunneling through it [8].

III-V High Electron Mobility Transistor (HEMT)

Fig. 1.3. Ideal HEMT structure used in nanoMOS.

In a traditional MOSFET, the channel needs to be doped to obtain free electrons in order to conduct. However, the drawback is that electrons traveling in the channel scatter

at these dopant sites, and their mobility is degraded as a result. The HEMT deals with this issue by having an intrinsic channel (InGaAs) and obtaining free electrons from an adjacent doped layer of a different, high-bandgap material (the InAlAs barrier). Due to its higher bandgap, the electrons from InAlAs spill over into the InGaAs channel, leaving the InAlAs depleted. This creates a triangular quantum well inside the InGaAs at the InAlAs-InGaAs interface. Electrons in the channel are confined within this well and travel inside an intrinsic material with high mobility. Notice that this higher mobility comes both from the absence of impurity scattering and from the inherent properties of the III-V material. It is common practice to add a heavily doped delta layer inside the InAlAs. By adjusting the doping dose in this delta layer, one can accurately change the threshold voltage of the HEMT.

In nanoMOS, we do not simulate the entire HEMT structure due to complications arising from the contacts. We instead simulate only the channel region, where current flows horizontally in a 1D manner (Fig. 1.3); including the contacts would force us to consider a vertical flow of current, which is not compatible with the 1D transport models used in nanoMOS. HEMT contacts consist of layers of materials, and to take the contacts into account, we simply add a series resistance term which can be determined by fitting experimental data [9].

SpinFET

In addition to the gate induced potential barrier in the channel, the spin polarization of electrons can give another degree of freedom for current control. Figure 1.4 shows one possible scheme for utilizing spin effects in a FET structure. This scheme was originally suggested by Sugahara and Tanaka [10], and nanoMOS simulates it with some slight modifications. In this structure, electrons injected from a nonmagnetic source are unpolarized, meaning there is equal probability for them to be up-spin or down-spin.
When going through a half-metallic ferromagnet (HMF), electrons become polarized in one spin direction (up-spin or down-spin). The HMF is able to polarize electrons effectively to about 100%. The magnetization of the HMF layer determines the polarization direction of the electrons, and it can be switched, for example, by the spin transfer torque of the current flowing through it. The HMF layer and the semiconducting channel (silicon in this case) form a Schottky barrier which the polarized electrons tunnel through. The gate, just as in the case of the Schottky barrier FET, modulates the thickness of the Schottky barrier and thus the transmission probability of the electrons. On the drain side of the channel, polarized electrons encounter a second layer of HMF, which may or may not have the same magnetization as the first. In the case of the same magnetization, the so-called parallel configuration, the polarized electrons can simply travel through and reach the nonmagnetic drain; in the case of opposite magnetization, the anti-parallel configuration, the polarized electrons cannot travel through the second HMF layer, and the resulting current is greatly lessened. Therefore, both the gate and the magnetization of the HMF layers have an impact on the current.

Fig. 1.4. Ideal SpinFET structure used in nanoMOS. Here we show an example of up-spin electrons being transported across the channel. In this case, the two HMF layers are in parallel configuration. Size of dot denotes relative population of electrons with that spin.

The theoretical operation of the Sugahara-Tanaka SpinFET just described ignores several experimentally observed factors. First of all, it is suggested that the polarization of electrons is randomized to a certain degree at the interface between the HMF and the semiconductor. This randomization, of course, is not 100%, which would render the SpinFET

pointless, but its effect is significant enough that it must be considered. Therefore, for the sake of modeling in nanoMOS, an additional artificial randomization layer is added between the HMF and the semiconducting channel. Within this randomization layer, spins are randomly flipped with a pre-set probability. Another issue is the conduction mismatch between the HMF and the semiconductor. Due to the randomized polarization, both up-spin and down-spin electrons enter the channel, but the conduction mismatch undesirably causes the population difference between them to decrease. To lessen this effect, an additional thin oxide layer is placed between the randomization layer and the semiconductor channel. The populations of both up-spin and down-spin electrons are reduced by this oxide tunneling layer, so their difference remains roughly the same after entering the channel.

In nanoMOS, the spin relaxation of electrons in the channel is also considered. This, however, is implemented numerically through the concept of self-energy in the NEGF calculation [11].

1.3 Development history

nanoMOS 1.0 (Published in 2000)
Developer: Zhibin Ren
The original nanoMOS code for silicon MOSFETs is written in MATLAB.

nanoMOS 2.0 (Published in 2005)
Developer: Steve Clark, Shaikh S. Ahmed
A Rappture interface is added to nanoMOS, and the code becomes available on nanohub.org.

nanoMOS 3.0 (Published in 2007)
Developer: Kurtis Cantley
Support for III-V materials in the semi-classical ballistic and quantum ballistic transport models is added. The Rappture interface is updated to reflect the III-V implementation.

nanoMOS 3.0 (Published in 2007)
Developer: Himadri Pal
Top and bottom gates can now have asymmetric configurations with different gate dielectrics and capping layers.

nanoMOS 3.5 (Published in 2008)
Developer: Xufeng Wang
Support for III-V materials in drift-diffusion transport is added. Additional mobility models are added.

nanoMOS 3.5 (Published in 2009)
Developer: Xufeng Wang, Dmitri Nikonov
The nanoMOS source code is restructured and modularized. Material parameters are separated out as a mini-library. Debugging functions are planted within the source code to assist code development. A benchmark and testing suite is created based on a script from Dmitri Nikonov.

nanoMOS 4.0 (Developed in 2009)
Developer: Himadri Pal
Support for the Schottky FET is added. NanoMOS now has the ability to simulate a double-gate MOSFET structure with metallic source/drain via the NEGF formalism.

nanoMOS 4.0 (Developed in 2009)
Developer: Yang Liu
Support for the HEMT is added. NanoMOS now has the ability to simulate a III-V HEMT structure via the NEGF formalism.

nanoMOS 4.0 (Developed in 2009)
Developer: Xufeng Wang
The Parallel Jobs Submitter (PJS) is added. PJS allows nanoMOS to sweep gate/source bias and run each bias on a cluster node. It supports only clusters with the Portable Batch System (PBS) installed, such as steele (steele.rcac.purdue.edu) or coates (coates.rcac.purdue.edu).

nanoMOS 4.0 (Developed in 2009)
Developer: Yunfei Gao
Support for the SpinFET is added. NanoMOS now has the ability to simulate a SpinFET structure via the NEGF formalism.

nanoMOS 4.0 (To be published in 2010)
Developer: Xufeng Wang
The working branches of the Schottky FET, HEMT, and SpinFET modules are merged. The code is restructured. The Rappture interface is updated to accommodate the newly published features.

1.4 Objective

This thesis is dedicated to comprehensive documentation of the nanoMOS simulator at its latest version, 4.0. Since its creation in 2000, nanoMOS has expanded from a simple Si MOSFET simulator to one that is rich in features, with several transport models and device types incorporated. As with any other simulation program, it is essential to look through the user interface and understand the physics buried underneath; using a program like a magical black box, without sufficient knowledge and understanding of its internal structure, is no different from seeking an answer from a crystal

25 12 ball, and data it spills out will offer no insight whatsoever. As the program grows to be more complex, it is also important to keep tracks of the changes and document the progress. Auxiliary programs are written to wrap around the core to provide automatic services such as data plotting, parallel submission, and benchmarking. Thus, in this thesis, we aim to boil the code down to details and discuss its meaning and origin including what physics is implemented, how it is incorporated numerically, and why it is done that way. Any interested user who has seen what nanomos can do will see why nanomos can do from this thesis. 1.5 Chapters overview In chapter 2, we will look at the semi-classical transport models deployed in nanomos. NanoMOS currently has two models based on semi-classical approach: the drift-diffusion model and the top of the barrier model. 1D Drift-diffusion equation is solved in transport dimension by the merit of mode space approach, and Scharfetter and Gummel method is used to ensure stability of the solutions. The top of the barrier model is the solution to Boltzmann s equation in its ballistic limit. Carriers having energies higher than the top of the potential barrier within the channel will be fully transmitted through. The 1D electron density obtained from these classical transport equations are then expanded to 2D grid based on solutions of Schrdinger equation in confinement direction. Electron density and electric potential has to satisfy Poisson s equation at the same time, so a 2D Poisson s equation is solved via controlled volume method. The coupling between Poisson s equation and transport equation forms the self-consistent scheme which iterates until a converged solution is found. In chapter 3, we will look at the quantum transport models deployed in nanomos. Quantum treatment allows us to take into account the phenomena such as interference and tunneling that are not accounted for in semi-classical pictures. 
These quantum phenomena have significant impacts on off-current and electron density distribution. We first open the discussion with an examination of the effective mass model and mode space

approach. We present and validate the Schrödinger equation within these approximations. The effective mass Schrödinger equation is then solved via the non-equilibrium Green's function (NEGF) formalism, and it is coupled with Poisson's equation for self-consistency. In this chapter, we introduce the recursive Green's function (RGF) method as an efficient way to obtain quantities of interest such as current and electron density.

In chapter 4, we will use nanomos 4.0 to simulate an SOI MOSFET fabricated by IBM Corporation. It is important to benchmark a simulation program against a real device to verify its validity and compare the results. We give an overview of several important internal plots obtainable in nanomos and discuss their meaning. The subthreshold characteristics, which are essential to device design and operation, can be explored with nanomos 4.0. We also take this opportunity to see how different transport models predict the device performance differently.

In chapter 5, we will look at the auxiliary programs designed for nanomos. As nanomos becomes richer in features and more in demand, several auxiliary programs have been made to enhance the capability of the nanomos software. A graphical user interface (GUI) has been made to allow users to communicate with nanomos more easily. The GUI is powered by the Rapid Application Infrastructure (Rappture) developed by nanohub.org. To accommodate large parallel submissions of nanomos jobs on clusters, a parallel job submitter has been created to automate the process. Subversion is used to provide version control for software development. In addition, to ensure the health of the nanomos code in development, a benchmark module is developed to test and check for possible code breakage.

2. SEMI-CLASSICAL TRANSPORT IN NANOMOS

2.1 Introduction

A major branch of the nanomos simulator is the semi-classical transport module. It consists of the diffusive transport model based on the drift-diffusion equation and the ballistic transport model based on Boltzmann's equation [12]. Because both use the same simulation structure, one can run both semi-classical transport models and compare the results. Implementing these equations requires numerical details and tricks often omitted from result-oriented literature. Here we look into the details of how these semi-classical transport models are incorporated into nanomos.

We start with an overview of the characteristics of the semi-classical branch, including its advantages and disadvantages compared to the quantum transport branch. We then examine some critical assumptions and concepts fundamental to the nanomos scheme. After establishing a common ground, the implementation of drift-diffusion transport is discussed. We show that the Scharfetter and Gummel method [13] is needed in order to ensure stability of the drift-diffusion equations. For the ballistic limit counterpart, the top of the barrier model [12] is illustrated and discussed as an equivalent of the solution to Boltzmann's equation in the absence of scattering. The electron density output from any transport equation is then coupled with Poisson's equation for self-consistency. We describe the implementation details of the Control Volume Method (CVM) [14] and the Newton-Raphson iterations used in our simple Poisson solver. In the end, some important numerical convergence issues are discussed.

Transport model overview

Our task here is to simulate how electrons move in a certain device geometry. For this, we have different choices of models to use. Each model is based on certain facts and assumptions, resulting in its own advantages and disadvantages.

Classically, electrons are treated as particles and their dynamics can be described by the Boltzmann equation. When a large number of scattering events occur during the flight of the particle, the Boltzmann equation becomes the familiar drift-diffusion equation in the diffusive limit; on the other hand, in the total absence of scattering, the Boltzmann equation can be directly evaluated to obtain the performance of the device in the ballistic limit [12]. The Boltzmann equation, however, does not take into account any quantum effects such as interference and tunneling, so it is a purely classical treatment. Yet some important aspects of electronic transport, especially in nanoscale MOSFETs, originate from quantum effects and cannot simply be ignored.

In the semi-classical transport models, we avoid the complicated full quantum treatment and instead combine the simplicity of the classical treatment with some quantum corrections. The quantum corrections are reflected in the use of the electron effective mass and the calculation of subbands in the confinement direction. The semi-classical transport models deployed in nanomos thus have the great advantage of being simple and computationally efficient compared to full quantum treatments. They of course have the disadvantage of not being able to capture some important quantum effects and are thus unable to explain certain transport phenomena. In nanomos, two semi-classical transport models are available: the drift-diffusion and the semi-classical ballistic transport models.

Overall scheme of nanomos simulator

Introduction

In this section, we look at preliminary topics essential to the drift-diffusion model; some of them are also common to the rest of nanomos. We first present the three basic equations underlying the validity of the drift-diffusion equation in its finite-difference discretized form. The concepts of self-consistency, the mode space approach, and effective mass are simply pointed out without demonstration or derivation. These topics will be addressed in greater detail in a later chapter.

Basic equations

The underlying principles that make the simulation of electronic transport possible can be described by a system of basic equations.

Transport equations

Transport equations such as the Boltzmann equation describe electron dynamics. They determine how the electron density distribution responds to perturbations such as an external electric field and an electron density gradient. The electron drift-diffusion equation is shown as an example:

$$J_n = q n \mu_n \xi + q D_n \nabla n = -q n \mu_n \nabla\phi + q D_n \nabla n \tag{2.1}$$

$J_n$: electron current density
$n$: electron density
$\mu_n$: electron mobility
$D_n$: electron diffusion coefficient
$\phi$, $\xi$: electric potential, electric field

Poisson's equation

Poisson's equation is a fundamental equation describing the spatial relationship between a given electron density distribution and the corresponding electric field. It holds true no matter which transport equation or model we use, so it is a common routine for all simulation options.

$$\nabla^2 \phi = -\frac{1}{\varepsilon}\left(p - n + N_D^+ - N_A^-\right) \tag{2.2}$$

$\phi$: electric potential
$p$, $n$: hole, electron density
$N_D^+$, $N_A^-$: donor, acceptor density

In nanomos, we assume the absence of holes and only treat electrons. Thus, Poisson's equation becomes

$$\nabla^2 \phi = -\frac{1}{\varepsilon}\left(-n + N_D^+ - N_A^-\right) \tag{2.3}$$

Continuity equation

The continuity equation, unlike Eq. (2.1) and (2.2), deals with time-dependent phenomena such as carrier generation and recombination. Essentially, it is an equation stating the conservation of current, which is a fundamental property of nature.

$$\frac{\partial n}{\partial t} = \frac{1}{q}\nabla\cdot J + G - R \tag{2.4}$$

$n$: electron density
$J$: electron current density
$G$, $R$: electron generation, recombination rate

In the steady state without generation or recombination, the continuity equation reduces to a simple form and states the conservation of electron current

density spatially throughout the entire device. In other words, current density is simply a constant in this particular case (constant cross-section area for current along x and constant current density along y).

$$\nabla\cdot J = 0 \tag{2.5}$$

Self-consistent calculation

Observe the three basic equations: there are three unknowns in total, n, φ, and J. Throughout the entire device, J is a constant, while the other two can vary with position. Poisson's equation contains two unknowns, n and φ, while the transport equation contains all three; the continuity equation contains a simple derivative of J, which cannot be used to determine its exact value. We thus use Poisson's equation as the entry point. Notice that if either the electron density distribution n or the electric potential φ somehow becomes known to us, the entire problem is readily solved.

We start the simulation with a reasonable guess for the electric potential profile φ, solve the transport equation for the electron density distribution n, and then insert n into Poisson's equation to back-calculate a new potential φ_new. If φ_new agrees with φ_old within a certain tolerance, our solution is converged and our guess is good. In nanomos, this tolerance is a user-defined convergence parameter. If not, we update the guess and repeat this loop until the solution is converged. The uniqueness theorem for second-order elliptic equations guarantees that the converged solution is the only solution to Poisson's equation [15]. This widely used iterative scheme is called self-consistent calculation or Gummel's method [16], [17].

Of course, one can also start with a guess for the electron density distribution n and Poisson's equation to initiate the self-consistent calculation. The most important message here is: no matter where one starts, one has to find a pair of n and φ that satisfies all three basic equations at the same time.
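The loop described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the nanomos implementation: `solve_transport` and `solve_poisson` are hypothetical zero-dimensional stand-ins (a Boltzmann-like density and a linear charge feedback), the potential is normalized, and the `damping` factor is an assumed simple under-relaxation of the guess.

```python
import math

def solve_transport(phi):
    """Toy transport step (assumption): density for a normalized potential."""
    return math.exp(phi)

def solve_poisson(n, v_gate=1.0, coupling=0.5):
    """Toy 'Poisson' step (assumption): charge pulls the potential below the bias."""
    return v_gate - coupling * n

def self_consistent(phi_guess=0.0, tol=1e-10, damping=0.5, max_iter=500):
    """Iterate transport -> Poisson until the potential stops changing."""
    phi_old = phi_guess
    for iteration in range(1, max_iter + 1):
        n = solve_transport(phi_old)      # density from the current guess
        phi_new = solve_poisson(n)        # potential implied by that density
        if abs(phi_new - phi_old) < tol:  # agreement within tolerance: converged
            return phi_new, n, iteration
        # otherwise update the guess (under-relaxed) and repeat the loop
        phi_old = phi_old + damping * (phi_new - phi_old)
    raise RuntimeError("self-consistent loop did not converge")

phi, n, iterations = self_consistent()
print(f"phi = {phi:.6f}, n = {n:.6f}, iterations = {iterations}")
```

The converged pair satisfies both toy equations at once, which is the point of the scheme; in a real device the two steps are the 2D Poisson solver and the chosen transport model.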

Effective mass model

An important aspect of the nanomos simulator is the use of the effective mass model [13]. In this section, we simply point out the results and consequences of this model. A more detailed look at its derivation and validity is deferred to chapter 3.

Fig. The picture shows the change in point of view from an electron with rest mass traveling in a crystal lattice to one with modified mass traveling in a constant potential. By introducing the so-called effective mass, we eliminate the need to consider the rapidly changing potential of the crystal lattice and simplify our calculation.

Electrons traveling in a crystal feel the existence of the lattice structure. This microscopic detail has a great impact on the electronic transport properties and cannot simply be ignored. In the so-called Kronig-Penney model [18], the crystal lattice is modeled as a series of finite-barrier rectangular quantum wells reflecting the atomic sites and the spacings in between. Such approximations are helpful to account for the crystal lattice effects, but they are not computationally feasible in our large structures containing tens of thousands of atoms. In the effective mass model, we use a modified version of the electron's free mass, the effective mass [19], to collectively account for the lattice effects. Therefore, with a change of electron mass, we can ignore the lattice details and continue to use classical equations. This is the main advantage of the effective mass model over others.

Effective mass can be obtained empirically from experimental studies such as cyclotron resonance [20] and direct measurement of electronic band structures [21]. In nanomos, we simply import effective mass parameters from widely used sources such as [22]. In general, effective mass varies mainly with material type and crystal

orientation. One needs to specify these or look up a proper effective mass to be used in a nanomos simulation.

Mode space approach

Another important aspect of the nanomos simulator is the use of the mode space approach. In this section, we simply point out the results and consequences of this approach. A more detailed look at its derivation and validity is presented in a later chapter.

nanomos studies a thin MOSFET channel sandwiched between two gate oxide layers. It is essentially a quantum well structure: when the width of the well is narrow (less than 10 nm), the energy separations between bound states (subbands) become several times larger than the thermal energy k_B T. As a result, it is proper to assume that electrons residing in different states do not interact with each other. The assumption of uncoupled subbands allows us to treat the electrons in each state separately, instead of treating all electrons together. This divide-and-conquer plan relieves the computational burden and is the main motivation for its use in nanomos.

In order to obtain the subband energies and wavefunctions, we need to solve the Schrödinger equation in the confinement direction. The effective mass Schrödinger equation in the confinement direction (z-direction) is

$$-\frac{\hbar^2}{2m^*}\frac{\partial^2 \varphi_z}{\partial z^2} + \left(E_{c0} + U(r)\right)\varphi_z = E_v\,\varphi_z \tag{2.6}$$

$$\varphi = \sum_k C_{v,k}\,|k\rangle \tag{2.7}$$

$\hbar$: Planck's constant
$m^*$: electron effective mass
$\varphi_z$: wavefunction in the confinement direction (z-direction)
$E_{c0}$: energy at the bottom of the subband
$U(r)$: spatial perturbation potential due to an external field or other factors
$E_v$: $v$th subband energy
$C_{v,k}$: wavefunction expansion coefficient for the $v$th subband and wavenumber $k$

[Figure: grid layout for solving the Schrödinger equation in the confinement direction]

Fig. The 1D Schrödinger equation is solved in each vertical slice illustrated in the figure. The grid is the same as the one used in the Poisson solver.
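To make the per-slice subband calculation concrete, the following is a minimal finite-difference sketch of the 1D effective mass Schrödinger equation in one vertical slice. It is not the nanomos code: it assumes the infinite-barrier option (wavefunctions vanish at the oxide interfaces), a flat potential, a uniform grid, and illustrative values for the body thickness (5 nm) and effective mass (0.19 m0); the function name `subband_energies` is hypothetical.

```python
import numpy as np

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M0 = 9.1093837015e-31    # free electron mass, kg
Q = 1.602176634e-19      # elementary charge, C (used for J -> eV)

def subband_energies(thickness, m_eff, n_nodes, potential=None):
    """Subband energies (eV) and wavefunctions of one slice, hard-wall boundaries."""
    a = thickness / (n_nodes + 1)             # spacing between interior nodes
    t = HBAR**2 / (2.0 * m_eff * a**2)        # finite-difference coupling energy
    if potential is None:
        potential = np.zeros(n_nodes)         # flat band inside the body (J)
    hamiltonian = (np.diag(2.0 * t + potential)
                   - t * np.diag(np.ones(n_nodes - 1), 1)
                   - t * np.diag(np.ones(n_nodes - 1), -1))
    energies, wavefunctions = np.linalg.eigh(hamiltonian)
    return energies / Q, wavefunctions

thickness = 5e-9            # assumed body thickness
m_eff = 0.19 * M0           # assumed effective mass, for illustration only
energies_ev, _ = subband_energies(thickness, m_eff, n_nodes=200)

# Analytic infinite-well levels for comparison: E_v = (hbar*pi*v)^2 / (2 m L^2)
analytic_ev = (HBAR * np.pi) ** 2 / (2 * m_eff * thickness**2) / Q * np.arange(1, 4) ** 2
print(energies_ev[:3], analytic_ev)
```

For this flat, hard-wall case the lowest numerical subbands should agree closely with the analytic infinite-well levels, which gives a quick sanity check before a position-dependent potential is passed in.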

Oxide penetration

The oxide sandwiching the semiconductor body presents a high but finite barrier, and electrons are able to penetrate into any finite barrier. We have to make a choice here: either simply ignore the electron penetration and assume an infinitely high oxide barrier, or make our problem slightly more complicated by allowing electron penetration but assuming the wavefunctions at the metal-oxide interfaces are zero. Both options are present in nanomos. Notice that in either case, electrons cannot leak through the gate, so there is no leakage current whatsoever.

Fig. Comparison of wavefunction profiles with oxide penetration on and off.

2.4 Drift-diffusion transport

Introduction

Our task in this section: assuming the 1D electric potential profile φ is known throughout the entire device, we solve the 1D drift-diffusion transport equation [23] for the electron density distribution n.

$$J_n = -q n \mu_n \frac{d\phi}{dx} + q D_n \frac{dn}{dx} \tag{2.8}$$

$J_n$: electron current density
$n$: electron density
$\mu_n$: electron mobility
$D_n$: electron diffusion coefficient
$\phi$: electric potential

Since we do not treat holes in nanomos, the subscript n denoting electrons will be omitted from now on for clarity.

Fig. A possible trajectory of an electron traveling in a diffusive manner across the channel. Red crosses are scattering centers where the electron's momentum is relaxed. Notice that electrons with energy lower than the barrier can still get across due to scattering, possibly from phonons.

We start by examining the equations and transforming them into a more convenient form with proper assumptions and simplifications. Numerically, the discretized drift-diffusion equation derived in a straightforward manner is not stable. We derive and introduce the Scharfetter and Gummel technique to ensure stability of our solutions.

Equation reformulated

We can use some simple relations to recast the drift-diffusion transport equation into a more convenient form. Assume non-degenerate carriers and apply Einstein's relationship (2.9) to eliminate the diffusion constant D:

$$\frac{D}{\mu} = \frac{kT}{q} \tag{2.9}$$

$$J = -q n \mu \frac{d\phi}{dx} + kT\mu\frac{dn}{dx} \tag{2.10}$$

$$J = kT\mu\left(-n\frac{d}{dx}\left(\frac{q\phi}{kT}\right) + \frac{dn}{dx}\right) \tag{2.11}$$

As suggested by eq. (2.11), it is convenient to normalize the electric potential with respect to the thermal voltage. From now on, the electric potential in this section is in units of kT/q. The normalized equation is

$$J = kT\mu\left(-n\frac{d\phi}{dx} + \frac{dn}{dx}\right) \tag{2.12}$$

Finite difference method (FDM) and continuity equation

Proceeding with the regular procedure and transforming the drift-diffusion equation (2.12) via FDM, we get

$$J_{i-1/2} = kT\mu_{i-1/2}\left(-n_{i-1/2}\frac{\phi_i - \phi_{i-1}}{a} + \frac{n_i - n_{i-1}}{a}\right) \tag{2.13}$$

$$J_{i+1/2} = kT\mu_{i+1/2}\left(-n_{i+1/2}\frac{\phi_{i+1} - \phi_i}{a} + \frac{n_{i+1} - n_i}{a}\right) \tag{2.14}$$

$a$: grid spacing constant
$i$: grid node index

The drift-diffusion equation in its current form (2.13)-(2.14) cannot be solved, because we do not yet know the current density; that is actually what we are solving for in

the end. However, even without explicit knowledge of the current density, we can still solve the drift-diffusion equation with the help of the continuity equation. Under steady-state conditions with no generation or recombination, the continuity equation (2.5) in 1D is simply

$$\frac{dJ}{dx} = 0 \tag{2.15}$$

This means the current densities at neighboring mid-nodes (and thus at every node in the device) are equal:

$$J_{i-1/2} = J_{i+1/2} \tag{2.16}$$

Therefore, we can now eliminate the unknown current density in the drift-diffusion equation by pairing up neighboring nodes.

$$kT\mu_{i-1/2}\left(-n_{i-1/2}\frac{\phi_i - \phi_{i-1}}{a} + \frac{n_i - n_{i-1}}{a}\right) = kT\mu_{i+1/2}\left(-n_{i+1/2}\frac{\phi_{i+1} - \phi_i}{a} + \frac{n_{i+1} - n_i}{a}\right) \tag{2.17}$$

Observe that eq. (2.17) is centered at node i. In the end, we can write N such equations for the N non-boundary nodes, so the system of equations is solvable. This justifies why we choose to use mid-point nodes: only by using mid-point nodes are we able to write one such equation centered at every node. If we instead let the FDM lattice spacing be 2a rather than a, no unknown mid-point values are needed, which eliminates the need for interpolation, but we can only find N − 1 equations for the N unknowns.

Mid-node interpolation

Since the values at mid-nodes are unknown, we have to rely on interpolation techniques to approximate them. The most straightforward method is linear interpolation.

Fig. Illustration of the meaning of mid-nodes. The red nodes are actual solution nodes of the Poisson grid. Green nodes are mid-nodes, which lie halfway between the actual solution nodes.

$$\mu_{i-1/2} = \frac{\mu_i + \mu_{i-1}}{2} \tag{2.18}$$

$$\mu_{i+1/2} = \frac{\mu_i + \mu_{i+1}}{2} \tag{2.19}$$

$$n_{i-1/2} = \frac{n_i + n_{i-1}}{2} \tag{2.20}$$

$$n_{i+1/2} = \frac{n_i + n_{i+1}}{2} \tag{2.21}$$

Substituting the interpolated results (2.18)-(2.21) into the FDM drift-diffusion equation (2.17), we get

$$kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(-\frac{n_i + n_{i-1}}{2}\,\frac{\phi_i - \phi_{i-1}}{a} + \frac{n_i - n_{i-1}}{a}\right) = kT\,\frac{\mu_i + \mu_{i+1}}{2}\left(-\frac{n_i + n_{i+1}}{2}\,\frac{\phi_{i+1} - \phi_i}{a} + \frac{n_{i+1} - n_i}{a}\right) \tag{2.22}$$

Regroup the terms

$$\left(\phi_{i+1} - \phi_i + 2\right) n_{i+1} + \left[\frac{\mu_i + \mu_{i-1}}{\mu_i + \mu_{i+1}}\left(\phi_i - \phi_{i-1} + 2\right) + \left(\phi_{i+1} - \phi_i - 2\right)\right] n_i + \frac{\mu_i + \mu_{i-1}}{\mu_i + \mu_{i+1}}\left(\phi_i - \phi_{i-1} - 2\right) n_{i-1} = 0 \tag{2.23}$$

It seems we have a complete system of equations to solve for the charge, and up to now everything is straightforward. It is true that in theory we can obtain the charge density this way, but computationally the above simple method is unstable and sometimes produces wrong results (see next subsection).

Scharfetter and Gummel Method

Numerical instability with traditional approach

Observe the finite difference equation we obtained:

$$\left(\phi_{i+1} - \phi_i + 2\right) n_{i+1} + \left[\frac{\mu_i + \mu_{i-1}}{\mu_i + \mu_{i+1}}\left(\phi_i - \phi_{i-1} + 2\right) + \left(\phi_{i+1} - \phi_i - 2\right)\right] n_i + \frac{\mu_i + \mu_{i-1}}{\mu_i + \mu_{i+1}}\left(\phi_i - \phi_{i-1} - 2\right) n_{i-1} = 0 \tag{2.24}$$

If

$$\phi_{i+1} - \phi_i > 2 \tag{2.25}$$

$$\phi_i - \phi_{i-1} > 2 \tag{2.26}$$

then every coefficient of the electron densities n in equation (2.24) is positive. Trying to solve such an equation forces one or more electron densities to become negative, which is unphysical. Adding the two conditions (2.25) and (2.26) and writing the result in SI units, we see the criterion for instability more clearly

$$\phi_{i+1} - \phi_{i-1} > 4kT/q \tag{2.27}$$

This tells us that with the traditional approach, if two second-nearest neighboring nodes have a potential difference of more than 4kT/q, instability occurs. Such a rapid change in potential may occur at the source/drain interfaces with the channel under high V_G and low V_ds. To avoid instability, one has to place finer grids at such places of rapid potential change. This usually requires a smart non-uniform grid layout. Doing so may dramatically increase the number of grid points and thus the computational time.

A stable numerical technique: Scharfetter and Gummel method

In 1969, Scharfetter and Gummel suggested an alternative way to discretize and solve the drift-diffusion equation while bypassing the aforementioned instability [13]. We now backtrack and start with the original drift-diffusion equation centered at mid-point i − 1/2:

$$J_{i-1/2} = kT\mu_{i-1/2}\left(-n_{i-1/2}\frac{d\phi_{i-1/2}}{dx} + \frac{dn_{i-1/2}}{dx}\right) \tag{2.28}$$

Recall that only two conditions were applied in order to obtain the above equation:

- 1D transport
- Einstein's relationship

In addition, the linear interpolation for mobility (2.18)-(2.19) is reasonable. Since mobility usually does not change rapidly, if it changes at all, the instability should not come from its interpolation error.

$$J_{i-1/2} = kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(-n_{i-1/2}\frac{d\phi_{i-1/2}}{dx} + \frac{dn_{i-1/2}}{dx}\right) \tag{2.29}$$

From now on, no linear interpolation of the potential φ or the electron density n is made unless further justified.
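The instability of the traditional approach can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the nanomos code: it assembles J_{i+1/2} − J_{i−1/2} = 0 directly from the centered fluxes (2.13)-(2.14) with linearly interpolated mid-node densities, takes a uniform mobility (so it drops out), uses the normalized potential in units of kT/q, and imposes equilibrium-like Dirichlet densities n = e^φ at both ends. The function name `central_difference_density` is hypothetical.

```python
import numpy as np

def central_difference_density(phi, n_left, n_right):
    """Densities from the centered scheme; common factor kT*mu/a is dropped."""
    n_nodes = len(phi)
    A = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)
    A[0, 0] = 1.0
    b[0] = n_left            # Dirichlet density at the left contact
    A[-1, -1] = 1.0
    b[-1] = n_right          # Dirichlet density at the right contact
    for i in range(1, n_nodes - 1):
        d_minus = phi[i] - phi[i - 1]
        d_plus = phi[i + 1] - phi[i]
        # flux(i+1/2) - flux(i-1/2) = 0 with flux = -n_mid * dphi + dn
        A[i, i - 1] = 0.5 * d_minus + 1.0
        A[i, i] = -0.5 * d_plus - 1.0 + 0.5 * d_minus - 1.0
        A[i, i + 1] = -0.5 * d_plus + 1.0
    return np.linalg.solve(A, b)

# Gentle potential: 0.5 kT/q per grid step, well inside the stable range.
phi_gentle = np.linspace(0.0, 4.0, 9)
gentle = central_difference_density(phi_gentle, 1.0, np.exp(4.0))

# Steep potential: 3 kT/q per grid step, i.e. both (2.25) and (2.26) hold.
phi_steep = np.linspace(0.0, 12.0, 5)
steep = central_difference_density(phi_steep, 1.0, np.exp(12.0))

print("gentle min density:", gentle.min())
print("steep densities:   ", steep)
```

With the gentle potential all densities stay positive, while the steep potential yields sign-alternating (partly negative) densities on the same scheme, which is the unphysical behavior the text attributes to the traditional discretization.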

The instability dilemma discussed previously brings the undesired consequence of a possibly negative electron density n. The electric potential φ, on the other hand, does have the freedom to be negative. Therefore, it is natural to seek a function of φ that always returns positive values, so that by expressing n through such a function, one forces the electron density to be positive. One simple and common function satisfying this criterion is the exponential function:

$$n = u\,e^{\phi} \tag{2.30}$$

where u(x) is a positive unknown function between nodes i and i − 1.

Substituting it into the drift-diffusion equation (2.29), one gets

$$
\begin{aligned}
J_{i-1/2} &= kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(-u e^{\phi}\frac{d\phi}{dx} + \frac{d\left(u e^{\phi}\right)}{dx}\right) \\
&= kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(-u e^{\phi}\frac{d\phi}{dx} + e^{\phi}\frac{du}{dx} + u\frac{d}{dx}e^{\phi}\right) \\
&= kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(-u e^{\phi}\frac{d\phi}{dx} + e^{\phi}\frac{du}{dx} + u e^{\phi}\frac{d\phi}{dx}\right) \\
&= kT\,\frac{\mu_i + \mu_{i-1}}{2}\,e^{\phi}\frac{du}{dx}
\end{aligned}
\tag{2.31}
$$

Recast it into a form convenient for integration:

$$e^{-\phi} J_{i-1/2} = kT\,\frac{\mu_i + \mu_{i-1}}{2}\frac{du}{dx} \tag{2.32}$$

Integrate both sides between nodes i − 1 and i:

$$\int_{i-1}^{i} e^{-\phi} J_{i-1/2}\,dx = \int_{i-1}^{i} kT\,\frac{\mu_i + \mu_{i-1}}{2}\frac{du}{dx}\,dx \tag{2.33}$$

First, we look at the left side

$$\int_{i-1}^{i} e^{-\phi} J_{i-1/2}\,dx \tag{2.34}$$

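The continuity equation (2.15) states the spatial invariance of the current density, so we can bring it out of the integral sign. The electric potential φ between two nodes is unknown to us. It is reasonable to assume a linear variation of φ between two nodes, so

$$\phi(x) = \phi_{i-1}(x_{i-1}) + \frac{\phi_i(x_i) - \phi_{i-1}(x_{i-1})}{a}\left(x - x_{i-1}\right), \qquad x_{i-1} \le x \le x_i \tag{2.35}$$

The left side (2.34) then becomes, with φ_i standing for φ_i(x_i),

$$
\begin{aligned}
\int_{i-1}^{i} e^{-\phi} J_{i-1/2}\,dx
&= J_{i-1/2}\int_{x_{i-1}}^{x_i} \exp\left[-\phi_{i-1} - \frac{\phi_i - \phi_{i-1}}{a}\left(x - x_{i-1}\right)\right] dx \\
&= -J_{i-1/2}\,\frac{a}{\phi_i - \phi_{i-1}}\,\exp\left[-\phi_{i-1} - \frac{\phi_i - \phi_{i-1}}{a}\left(x - x_{i-1}\right)\right]\Bigg|_{x_{i-1}}^{x_i} \\
&= -J_{i-1/2}\,\frac{a}{\phi_i - \phi_{i-1}}\left\{\exp\left[-\phi_{i-1} - \frac{\phi_i - \phi_{i-1}}{a}\left(x_i - x_{i-1}\right)\right] - \exp\left[-\phi_{i-1} - \frac{\phi_i - \phi_{i-1}}{a}\left(x_{i-1} - x_{i-1}\right)\right]\right\} \\
&= -J_{i-1/2}\,\frac{a}{\phi_i - \phi_{i-1}}\left\{\exp\left[-\phi_{i-1} - \frac{\phi_i - \phi_{i-1}}{a}\,a\right] - \exp\left(-\phi_{i-1}\right)\right\} \\
&= -J_{i-1/2}\,\frac{a}{\phi_i - \phi_{i-1}}\left(e^{-\phi_i} - e^{-\phi_{i-1}}\right)
\end{aligned}
\tag{2.36}
$$

Now, for the right side

$$
\begin{aligned}
\int_{i-1}^{i} kT\,\frac{\mu_i + \mu_{i-1}}{2}\frac{du}{dx}\,dx
&= kT\,\frac{\mu_i + \mu_{i-1}}{2}\int_{u_{i-1}}^{u_i} du \\
&= kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(u_i - u_{i-1}\right) \\
&= kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(n_i e^{-\phi_i} - n_{i-1} e^{-\phi_{i-1}}\right)
\end{aligned}
\tag{2.37}
$$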

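Now, joining the left side (2.36) and the right side (2.37), we obtain the finalized Scharfetter and Gummel discretization equation at mid-node i − 1/2:

$$-J_{i-1/2}\,a\,\frac{e^{-\phi_i} - e^{-\phi_{i-1}}}{\phi_i - \phi_{i-1}} = kT\,\frac{\mu_i + \mu_{i-1}}{2}\left(n_i e^{-\phi_i} - n_{i-1} e^{-\phi_{i-1}}\right) \tag{2.38}$$

Recast in a more suggestive form:

$$
\begin{aligned}
J_{i-1/2} &= kT\,\frac{1}{a}\,\frac{\phi_i - \phi_{i-1}}{e^{-\phi_{i-1}} - e^{-\phi_i}}\,\frac{\mu_i + \mu_{i-1}}{2}\left(n_i e^{-\phi_i} - n_{i-1} e^{-\phi_{i-1}}\right) \\
&= \frac{kT}{2a}\left(\mu_i + \mu_{i-1}\right)\frac{\phi_i - \phi_{i-1}}{e^{-\phi_{i-1}} - e^{-\phi_i}}\,e^{-\phi_i}\left(n_i - n_{i-1} e^{\phi_i - \phi_{i-1}}\right) \\
&= \frac{kT}{2a}\left(\mu_i + \mu_{i-1}\right)\frac{\phi_i - \phi_{i-1}}{e^{\phi_i - \phi_{i-1}} - 1}\left(n_i - n_{i-1} e^{\phi_i - \phi_{i-1}}\right) \\
&= \frac{kT}{2a}\left(\mu_i + \mu_{i-1}\right) B\left(\phi_i - \phi_{i-1}\right)\left(n_i - n_{i-1} e^{\phi_i - \phi_{i-1}}\right)
\end{aligned}
\tag{2.39}
$$

B(φ_i − φ_{i−1}) is the Bernoulli function, defined as

$$B(z) = \frac{z}{e^{z} - 1} \tag{2.40}$$

With the same procedure, one can write a similar equation for the current density at mid-node i + 1/2:

$$J_{i+1/2} = \frac{kT}{2a}\left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right)\left(n_{i+1} - n_i e^{\phi_{i+1} - \phi_i}\right) \tag{2.41}$$

The continuity equation tells us the current densities at the two mid-nodes are equal, thus

$$
\begin{aligned}
J_{i-1/2} &= J_{i+1/2} \\
\frac{kT}{2a}\left(\mu_i + \mu_{i-1}\right) B\left(\phi_i - \phi_{i-1}\right)\left(n_i - n_{i-1} e^{\phi_i - \phi_{i-1}}\right) &= \frac{kT}{2a}\left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right)\left(n_{i+1} - n_i e^{\phi_{i+1} - \phi_i}\right) \\
\left(\mu_i + \mu_{i-1}\right) B\left(\phi_i - \phi_{i-1}\right) n_i - \left(\mu_i + \mu_{i-1}\right) B\left(\phi_i - \phi_{i-1}\right) n_{i-1} e^{\phi_i - \phi_{i-1}} &= \left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right) n_{i+1} - \left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right) n_i e^{\phi_{i+1} - \phi_i}
\end{aligned}
\tag{2.42}
$$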

After simplification, one gets

$$\left(\mu_i + \mu_{i-1}\right) B\left(\phi_i - \phi_{i-1}\right) e^{\phi_i - \phi_{i-1}}\,n_{i-1} - \left[\left(\mu_i + \mu_{i-1}\right) B\left(\phi_i - \phi_{i-1}\right) + \left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right) e^{\phi_{i+1} - \phi_i}\right] n_i + \left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right) n_{i+1} = 0 \tag{2.43}$$

This is the final discretized drift-diffusion equation centered at node i via the Scharfetter and Gummel technique, in exactly the form deployed in nanomos.

Here, we approached the problem from a practical motivation: we do not want a negative electron density n, so we impose an exponential function to keep it positive. Mathematicians especially may not find such a solution procedure elegant or satisfying. In fact, the mathematical value of the Scharfetter and Gummel technique was not realized until years after it was published. Since then, people have offered a wide range of interpretations, from a Green's function perspective [24] to generalized splines [25] to the scheme of up-winding [26].

Drift-diffusion current

The evaluation of the current is straightforward from what we have already accomplished; it is simply

$$J_{i+1/2} = \frac{kT}{2a}\left(\mu_{i+1} + \mu_i\right) B\left(\phi_{i+1} - \phi_i\right)\left(n_{i+1} - n_i e^{\phi_{i+1} - \phi_i}\right) \tag{2.44}$$

This equation can be evaluated anywhere in the channel, because the continuity equation (2.15) ensures the current at every node is the same.
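As a hedged illustration of how equations (2.43) and (2.44) can be assembled and solved, the sketch below builds the tridiagonal system on a uniform 1D grid and solves it with a dense solver. It is not the nanomos source: the function names (`bernoulli`, `sg_density`, `sg_current`), the Dirichlet densities at both ends, the normalized units (φ in kT/q, kT = a = 1 by default), and the test potential and mobility profiles are all assumptions made for the example.

```python
import numpy as np

def bernoulli(z):
    """B(z) = z / (exp(z) - 1), with the z -> 0 limit handled explicitly."""
    z = np.asarray(z, dtype=float)
    small = np.abs(z) < 1e-10
    safe = np.where(small, 1.0, z)          # avoid 0/0 in the unused branch
    return np.where(small, 1.0 - 0.5 * z, safe / np.expm1(safe))

def sg_density(phi, mu, n_left, n_right):
    """Solve eq. (2.43) for n at every node; phi is in units of kT/q."""
    n_nodes = len(phi)
    A = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)
    A[0, 0] = 1.0
    b[0] = n_left                            # assumed Dirichlet contact density
    A[-1, -1] = 1.0
    b[-1] = n_right
    for i in range(1, n_nodes - 1):
        d_minus = phi[i] - phi[i - 1]
        d_plus = phi[i + 1] - phi[i]
        m_minus = mu[i] + mu[i - 1]
        m_plus = mu[i + 1] + mu[i]
        A[i, i - 1] = m_minus * bernoulli(d_minus) * np.exp(d_minus)
        A[i, i] = -(m_minus * bernoulli(d_minus)
                    + m_plus * bernoulli(d_plus) * np.exp(d_plus))
        A[i, i + 1] = m_plus * bernoulli(d_plus)
    return np.linalg.solve(A, b)

def sg_current(phi, mu, n, kT=1.0, a=1.0):
    """Eq. (2.44) evaluated at every mid-node."""
    d = np.diff(phi)
    return kT / (2.0 * a) * (mu[1:] + mu[:-1]) * bernoulli(d) * (n[1:] - n[:-1] * np.exp(d))

phi = np.linspace(0.0, 12.0, 5)              # 3 kT/q per step: steep on purpose
mu = np.linspace(1.0, 2.0, 5)                # slowly varying mobility (assumed)
density = sg_density(phi, mu, 1.0, 5.0)
current = sg_current(phi, mu, density)
print("density:", density)
print("current at mid-nodes:", current)
```

On this deliberately steep potential the densities remain positive and the mid-node currents come out equal, as the continuity equation requires. A production solver would use a banded or tridiagonal routine instead of a dense matrix; the dense form is kept here only for readability.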

Semi-classical ballistic transport

Introduction

In the previous section, we have seen how to numerically implement a simple and stable drift-diffusion solver. The drift-diffusion equation is a solution of Boltzmann's equation in its diffusive limit. In this section, we explore the same equation in its ballistic limit. We first explain the equivalent top of the barrier model: it is a simple model from which the electron density and current expressions are derived in a straightforward manner.

To restate our task in this section: assuming the 1D electric potential profile φ is known throughout the entire device, we solve the 1D Boltzmann's equation in the ballistic limit for the electron density distribution n.

The top of the barrier model

In the drift-diffusion picture, electrons drift and diffuse toward the drain end due to the electric field and the concentration gradient. While traveling, electrons encounter numerous scattering events which relax their energy and momentum. The influence of these relaxations is effectively modeled by the diffusion constant and the mobility.

In the ballistic picture, electrons are assumed to travel ballistically in the channel without losing any energy or momentum. The gate, however, is able to modify the electrostatic potential profile in the channel by changing its bias, and electrons encountering a potential barrier higher than their energy are simply reflected back; if their energy is higher than that of the barrier, the electrons are transmitted across with total certainty. This of course is a classical treatment, because we have completely ignored quantum effects such as tunneling and interference.

The ballistic picture is summarized below. It is not hard to see that the critical potential is the maximum barrier height in the channel: it is the clear-cut point between the energy

ranges for total transmission and reflection, and thus this simple semi-classical model is also known as the top of the barrier model.

Fig. Illustration of electrons injected into the channel with higher or lower energy than the top of the barrier.

Electron density in top of the barrier model

At any place in the channel, there are two types of carriers going in opposite directions as shown in the figure: the left-going and the right-going ones. Each type of carrier distributes itself over a range of energies in a way described by the Fermi distribution function. If we can correctly identify the types of electrons present, it is a simple matter of summation to determine the electron density at that point.

For illustration purposes, we will concentrate on evaluating the electron density at point A, which is a point near the source side of the channel. From the source, left-going electrons are injected into the channel with all possible energies, and they all pass point A. The portion of them whose energy is below the top of the barrier is reflected back, so they pass point A as well. From the drain, right-going electrons are injected into the channel with all possible energies, but only those whose energy is higher than the top of the barrier get across and reach point A.

Fig. Illustration of the electron density evaluation at point A (red box on the left). The red box includes every possible electron stream, and the resulting electron density is just a simple summation of all the streams.

The electrons inherit the Fermi distribution of the contact from which they are injected. The electron density at a certain energy level is defined as the product of the LDOS and the Fermi distribution at that energy.

$$n(E) = D(E)\,f(E) \tag{2.45}$$

$n(E)$: electron density at energy $E$ (2.46)
$D(E)$: local density of states at energy $E$ (2.47)
$f(E)$: Fermi distribution, at energy $E$, of the contact that injected the electron (2.48)

Since we use the mode space approach here, we calculate the electron density from each subband and sum over all possible subbands. In the transverse direction, we

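assume everything is uniform, and electrons behave as plane waves. Let us consider the 1D electron density at point A in subband i and transverse mode j.

$$
\begin{aligned}
n_{i,j}(A) &= \int_0^{\infty} D_{1D}(E_x)\,f_S(E_x)\,dE_x + \int_0^{E_{top}} D_{1D}(E_x)\,f_S(E_x)\,dE_x + \int_{E_{top}}^{\infty} D_{1D}(E_x)\,f_D(E_x)\,dE_x \\
&= \int_0^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x \\
&\quad + \int_0^{E_{top}} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x \\
&\quad + \int_{E_{top}}^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_D)/k_B T}}\,dE_x
\end{aligned}
\tag{2.49}
$$

Now, we have to sum up all available modes in the transverse direction.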

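$$
\begin{aligned}
n_i(A) &= \int_0^{\infty} D_{1D}(E_j)\,n_{i,j}(A)\,dE_j = \int_0^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_y}{2E_j}}\,n_{i,j}(A)\,dE_j \\
&= \int_0^{\infty}\!\!\int_0^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{\pi\hbar}\sqrt{\frac{m_y}{2E_j}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x\,dE_j \\
&\quad + \int_0^{\infty}\!\!\int_0^{E_{top}} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{\pi\hbar}\sqrt{\frac{m_y}{2E_j}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x\,dE_j \\
&\quad + \int_0^{\infty}\!\!\int_{E_{top}}^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{\pi\hbar}\sqrt{\frac{m_y}{2E_j}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_D)/k_B T}}\,dE_x\,dE_j \\
&= \int_0^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\sqrt{\frac{m_y}{2\pi}}\,\frac{1}{\hbar}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x \\
&\quad + \int_0^{E_{top}} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\sqrt{\frac{m_y}{2\pi}}\,\frac{1}{\hbar}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x \\
&\quad + \int_{E_{top}}^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\sqrt{\frac{m_y}{2\pi}}\,\frac{1}{\hbar}\,F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x \\
&= \frac{\sqrt{m_x m_y}}{2\pi\hbar^2}\Bigg[\frac{1}{\sqrt{\pi}}\int_0^{\infty} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x + \frac{1}{\sqrt{\pi}}\int_0^{E_{top}} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x \\
&\qquad\qquad\quad + \frac{1}{\sqrt{\pi}}\int_{E_{top}}^{\infty} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x\Bigg]
\end{aligned}
\tag{2.50}
$$

The complete Fermi-Dirac integral of order −1/2 inside (2.50) is

$$F_{-1/2}\left(\mu_S - E_i - E_x\right) = \int_0^{\infty} \frac{1}{\sqrt{\pi}}\,\frac{E_j^{-1/2}}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_j \tag{2.51}$$

with µ_D taking the place of µ_S for drain-injected electrons. In the end, we need to sum up the contributions from all subbands:

$$
\begin{aligned}
n(A) &= \sum_{i=1}^{n} n_i(A) \\
&= \sum_{i=1}^{n} \frac{\sqrt{m_x m_y}}{2\pi\hbar^2}\Bigg[\frac{1}{\sqrt{\pi}}\int_0^{\infty} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x + \frac{1}{\sqrt{\pi}}\int_0^{E_{top}} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x \\
&\qquad\qquad\qquad\quad + \frac{1}{\sqrt{\pi}}\int_{E_{top}}^{\infty} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x\Bigg]
\end{aligned}
\tag{2.52}
$$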
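The Fermi-Dirac integral of order −1/2 that appears in these expressions has an integrable singularity at zero energy. The sketch below shows one possible way to evaluate it numerically; it is an illustration under stated assumptions, not the routine of [27] used in nanomos. Energies are taken in units of k_B T (so the argument `eta` is the normalized quantity (µ − E_i − E_x)/k_B T), and the singularity is removed with the substitution E_j = t², which is a swapped-in technique rather than the small-energy cutoff mentioned in the text. The function name and the default cutoff are hypothetical.

```python
import numpy as np

def fermi_dirac_minus_half(eta, t_max=None, n_steps=4000):
    """F_{-1/2}(eta) = (1/sqrt(pi)) * int_0^inf x^(-1/2) / (1 + exp(x - eta)) dx.

    With x = t**2 the integrand becomes 2 / (1 + exp(t**2 - eta)), which is
    smooth at t = 0, so a plain composite trapezoidal rule is sufficient.
    """
    if t_max is None:
        # beyond this point the Fermi factor is below about exp(-40)
        t_max = np.sqrt(max(eta, 0.0) + 40.0)
    t = np.linspace(0.0, t_max, n_steps + 1)
    integrand = 1.0 / (1.0 + np.exp(t**2 - eta))
    h = t[1] - t[0]
    integral = h * (0.5 * (integrand[0] + integrand[-1]) + integrand[1:-1].sum())
    return 2.0 * integral / np.sqrt(np.pi)

for eta in (-10.0, 0.0, 5.0):
    print(eta, fermi_dirac_minus_half(eta))
```

A convenient sanity check is the non-degenerate limit: for strongly negative `eta` the Fermi factor reduces to a Boltzmann factor and F_{-1/2}(eta) approaches exp(eta).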

For a point B on the right side of the top of the barrier, we can determine a similar equation using the same procedure as for point A.

$$
\begin{aligned}
n(B) &= \sum_{i=1}^{n} n_i(B) \\
&= \sum_{i=1}^{n} \frac{\sqrt{m_x m_y}}{2\pi\hbar^2}\Bigg[\frac{1}{\sqrt{\pi}}\int_0^{\infty} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x + \frac{1}{\sqrt{\pi}}\int_0^{E_{top}} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x \\
&\qquad\qquad\qquad\quad + \frac{1}{\sqrt{\pi}}\int_{E_{top}}^{\infty} \frac{1}{\sqrt{E_x}}\,F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x\Bigg]
\end{aligned}
\tag{2.53}
$$

For the electron density at the top of the barrier, either formula (2.52) or (2.53) will work.

In nanomos, the numerical integration is carried out by a simple composite trapezoidal rule:

$$\int_a^b f(x)\,dx \approx \frac{b - a}{n}\left[\frac{f(a) + f(b)}{2} + \sum_{k=1}^{n-1} f\left(a + k\,\frac{b - a}{n}\right)\right] \tag{2.54}$$

Infinity is replaced by a large energy where the Fermi distribution of electrons is effectively zero. Near zero energy, due to the term 1/√E_x, E_x has to be replaced by a very small number to avoid singularity issues. The Fermi-Dirac integrals are carried out numerically [27].

Ballistic current

Electron density is merely a summation over all the electron streams, but when calculating current, we have to take into account their velocity (speed and direction). We will evaluate the current at point A as an example. The electron velocity in the effective mass approximation is the derivative of the electron energy spectrum at the bottom of the lowest conduction band, and it is

$$v = \frac{1}{\hbar}\frac{dE}{dk} \tag{2.55}$$
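As a small worked example of the composite trapezoidal rule (2.54) with the infinite upper limit replaced by a large cutoff, the sketch below integrates a Fermi function whose integral is known in closed form. The function names, the normalized energy variable (in units of k_B T), the cutoff of 40, and the number of intervals are illustrative assumptions, not nanomos settings.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal intervals, as in eq. (2.54)."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

def fermi(x, eta):
    """Fermi function in normalized energy x, with normalized Fermi level eta."""
    return 1.0 / (1.0 + math.exp(x - eta))

eta = 2.0
cutoff = 40.0          # stands in for infinity: fermi(40, 2) is about exp(-38)
numeric = composite_trapezoid(lambda x: fermi(x, eta), 0.0, cutoff, 4000)
exact = math.log(1.0 + math.exp(eta))   # closed form of the untruncated integral
print(numeric, exact)
```

The truncation error is negligible once the Fermi function has decayed, so the remaining error is the usual second-order discretization error of the trapezoidal rule and shrinks as the number of intervals grows.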

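$v$: electron velocity (2.56)

Fig. Illustration of the electron current calculation at point A (red box on the left).

For semiconductors with parabolic bands,

$$E = \frac{\hbar^2 k^2}{2m^*} \tag{2.57}$$

Thus

$$v = \frac{1}{\hbar}\frac{dE}{dk} = \frac{1}{\hbar}\frac{d}{dk}\left(\frac{\hbar^2 k^2}{2m^*}\right) = \frac{1}{\hbar}\,\frac{\hbar^2 k}{m^*} = \frac{1}{\hbar}\,\frac{\hbar^2}{m^*}\sqrt{\frac{2m^* E}{\hbar^2}} = \sqrt{\frac{2E}{m^*}} \tag{2.58}$$

The 1D current density at point A in subband i and transverse mode j is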

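$$
\begin{aligned}
J_{i,j}(A) &= q\,v\,n_{i,j}(A) \\
&= \int_0^{\infty} q\sqrt{\frac{2E_x}{m_x}}\,\frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x \\
&\quad - \int_0^{E_{top}} q\sqrt{\frac{2E_x}{m_x}}\,\frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x \\
&\quad - \int_{E_{top}}^{\infty} q\sqrt{\frac{2E_x}{m_x}}\,\frac{1}{\pi\hbar}\sqrt{\frac{m_x}{2E_x}}\,\frac{1}{1 + e^{(E_x + E_i + E_j - \mu_D)/k_B T}}\,dE_x \\
&= \frac{q}{\pi\hbar}\int_{E_{top}}^{\infty} \frac{1}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}}\,dE_x - \frac{q}{\pi\hbar}\int_{E_{top}}^{\infty} \frac{1}{1 + e^{(E_x + E_i + E_j - \mu_D)/k_B T}}\,dE_x
\end{aligned}
\tag{2.59}
$$

The current density at point A in subband i is

$$
\begin{aligned}
J_i(A) &= \int_0^{\infty} D_{1D}(E_j)\,J_{i,j}(A)\,dE_j = \int_0^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_y}{2E_j}}\,J_{i,j}(A)\,dE_j \\
&= \frac{q}{\pi\hbar}\int_0^{\infty} \frac{1}{\pi\hbar}\sqrt{\frac{m_y}{2E_j}}\left[\int_{E_{top}}^{\infty} \frac{dE_x}{1 + e^{(E_x + E_i + E_j - \mu_S)/k_B T}} - \int_{E_{top}}^{\infty} \frac{dE_x}{1 + e^{(E_x + E_i + E_j - \mu_D)/k_B T}}\right] dE_j \\
&= \frac{q}{\pi\hbar^2}\sqrt{\frac{m_y}{2\pi}}\left[\int_{E_{top}}^{\infty} F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x - \int_{E_{top}}^{\infty} F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x\right]
\end{aligned}
\tag{2.60}
$$

Finally, we obtain the total current density as a summation over all possible subbands:

$$
\begin{aligned}
J(A) &= \sum_{i=1}^{n} J_i(A) \\
&= \sum_{i=1}^{n} \frac{q}{\pi\hbar^2}\sqrt{\frac{m_y}{2\pi}}\left[\int_{E_{top}}^{\infty} F_{-1/2}\left(\mu_S - E_i - E_x\right) dE_x - \int_{E_{top}}^{\infty} F_{-1/2}\left(\mu_D - E_i - E_x\right) dE_x\right]
\end{aligned}
\tag{2.61}
$$

This current density expression (2.61) actually holds true for all points across the channel, unlike the electron density expression, which differs between the left and right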

side of the top of the barrier. The injected electrons with energy below the top of the barrier are reflected back, so their contributions cancel in the current integration; only electrons with energy above the top of the barrier contribute to the net current.

2.6 Poisson's equation

Introduction

To restate our task in this section: assuming the electron density distribution n is known throughout the entire device, we solve Poisson's equation for the electric potential profile $\phi$.

$$
\nabla^2 \phi = -\frac{1}{\varepsilon} \left( -n + N_D^{+} - N_A^{-} \right) \tag{2.62}
$$

Poisson's equation is a simple second-order partial differential equation (PDE), but in this case the solution is complicated by the 2-dimensional nature of the problem and the spatially varying material composition $\varepsilon$. One common way to solve a PDE in such a case is to discretize it and seek solutions on a grid.

In this section, we present our simple Poisson solver and how it is coupled with the transport equations. We start with the device grid and the boundary conditions used. We point out that Poisson's equation is to be solved self-consistently with the transport equations, and that in such a case more efficient algorithms can be deployed. By using quasi-Fermi levels as a bridge, the coupling between the transport equations and Poisson's equation acquires a damping factor in the self-consistent loop. We look into the details of this coupled scheme and its implementation. In the end, Poisson's equation is solved using the Newton-Raphson iterative technique.

Fig. The device grid and boundary conditions used for solving Poisson's equation. The actual grid is finer than the one illustrated here.

Device grid and boundary conditions

Boundary conditions

To solve a PDE, boundary conditions have to be properly set. Boundary conditions are imposed on the grid nodes at the edges, so we have to make sure not only that the grid completely covers the region of interest, but also that it terminates at places where the boundary condition is known. As marked on figure (2.9), we have the following types of boundaries:

At the metallic gate/oxide interface: Dirichlet boundary condition, $\phi = V_G$. These points are in direct contact with the metallic gate, so they take whatever potential the gate has.

At the semiconductor/oxide-vacuum interface: Neumann boundary condition, $d\phi/dr = 0$, r being the direction perpendicular to the interface. Zero field here means no electron drift or leakage toward the outside of the device.

At the semiconductor/metallic source and drain contact interfaces: Neumann boundary condition, $d\phi/dr = 0$, r being the direction perpendicular to the interface. Zero field here allows the potential to float to whatever value it needs to maintain charge neutrality. Although the source/drain extension regions are heavily doped semiconductors, they are not perfect metals and in reality can still have an electric field within them under high drain bias. The Neumann boundary condition forces the boundary, and only the boundary, to assume the role of a metal by having no electric field, thus eliminating drift current at that point. Under non-equilibrium conditions, electrons are supplied from the source solely by diffusive current, whose diffusion constant is that of the semiconductor body.

Control volume method

It may seem that we can straightforwardly use the grid laid out in figure (2.9) and apply the finite difference method (FDM) to discretize Poisson's equation. In doing so, however, one realizes that material interfaces cannot be handled properly. At an oxide/semiconductor interface, for example, the FDM expression for the interface node involves all neighboring nodes, but the dielectric constant $\varepsilon$ differs among these neighbors. In addition, it is not even clear which dielectric constant should be used for the interface node itself. Therefore, it is not immediately clear how to treat the interface nodes properly in FDM.

The proper treatment at such material interfaces is the Control Volume Method (CVM). CVM is essentially a discretization method based on flux balances over a finite control volume (3D) or area (2D) instead of on the governing partial differential equations. In order to demonstrate how it is implemented in nanomos, we choose to treat a very general case shown in the figure.

Fig. Illustration of the Control Volume Method (CVM) on node $\phi_{m,n}$. The dashed black box is the control volume.

The methodology is different, but the goal is the same: we seek a linear expression centered on node (m, n) of this finite difference grid. The most important thing is to recognize the flux balance: our case is steady-state, so the flux balance is governed by Gauss's law. The net electric field flux through the Gaussian surface (control area) equals its enclosed net charge.

$$
Q_{enclosed} = \varepsilon_0 \oint_l \mathbf{E} \cdot d\mathbf{l} \tag{2.63}
$$

where $Q_{enclosed}$ is the total charge enclosed by the Gaussian surface.

The electric field can be approximated as (flow into the Gaussian surface is defined as positive)

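$$
\begin{align}
E &= \frac{V}{d} \tag{2.64} \\
E_{top} &= \frac{\phi_{m+1,n} - \phi_{m,n}}{b} \tag{2.65} \\
E_{bottom} &= \frac{\phi_{m-1,n} - \phi_{m,n}}{b} \tag{2.66} \\
E_{left} &= \frac{\phi_{m,n-1} - \phi_{m,n}}{a} \tag{2.67} \\
E_{right} &= \frac{\phi_{m,n+1} - \phi_{m,n}}{a} \tag{2.68}
\end{align}
$$

Now we compute the net electric field flux. In the first quadrant, the Gaussian surface has two segments, A and B, and the electric field flux through them is

$$
\begin{aligned}
Q_1 &= \varepsilon_1 \int_l \mathbf{E} \cdot d\mathbf{l} = \varepsilon_1 \left( E_{top} \frac{a}{2} + E_{right} \frac{b}{2} \right) \\
&= \varepsilon_1 \left[ \frac{\phi_{m+1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n+1} - \phi_{m,n}}{a} \frac{b}{2} \right]
\end{aligned}
\tag{2.69}
$$

Similarly, we can find the flux in the other quadrants:

$$
\begin{align}
Q_2 &= \varepsilon_2 \left( E_{top} \frac{a}{2} + E_{left} \frac{b}{2} \right) = \varepsilon_2 \left[ \frac{\phi_{m+1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n-1} - \phi_{m,n}}{a} \frac{b}{2} \right] \tag{2.70} \\
Q_3 &= \varepsilon_3 \left( E_{bottom} \frac{a}{2} + E_{left} \frac{b}{2} \right) = \varepsilon_3 \left[ \frac{\phi_{m-1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n-1} - \phi_{m,n}}{a} \frac{b}{2} \right] \tag{2.71} \\
Q_4 &= \varepsilon_4 \left( E_{bottom} \frac{a}{2} + E_{right} \frac{b}{2} \right) = \varepsilon_4 \left[ \frac{\phi_{m-1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n+1} - \phi_{m,n}}{a} \frac{b}{2} \right] \tag{2.72}
\end{align}
$$

Therefore, the total charge enclosed is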

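$$
\begin{aligned}
Q_{enclosed} &= Q_1 + Q_2 + Q_3 + Q_4 \\
&= \varepsilon_1 \left[ \frac{\phi_{m+1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n+1} - \phi_{m,n}}{a} \frac{b}{2} \right]
+ \varepsilon_2 \left[ \frac{\phi_{m+1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n-1} - \phi_{m,n}}{a} \frac{b}{2} \right] \\
&\quad + \varepsilon_3 \left[ \frac{\phi_{m-1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n-1} - \phi_{m,n}}{a} \frac{b}{2} \right]
+ \varepsilon_4 \left[ \frac{\phi_{m-1,n} - \phi_{m,n}}{b} \frac{a}{2} + \frac{\phi_{m,n+1} - \phi_{m,n}}{a} \frac{b}{2} \right]
\end{aligned}
\tag{2.73}
$$

The enclosed charge can be approximated by the charge density at the central node multiplied by the control area:

$$
Q_{enclosed} = a b\, n_{m,n} \tag{2.74}
$$

Cleaning up the equation, we get

$$
\begin{aligned}
a b\, n_{m,n} &= \frac{a}{2b} (\varepsilon_1 + \varepsilon_2)\, \phi_{m+1,n} + \frac{a}{2b} (\varepsilon_3 + \varepsilon_4)\, \phi_{m-1,n}
- \left( \frac{a}{2b} + \frac{b}{2a} \right) (\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4)\, \phi_{m,n} \\
&\quad + \frac{b}{2a} (\varepsilon_2 + \varepsilon_3)\, \phi_{m,n-1} + \frac{b}{2a} (\varepsilon_1 + \varepsilon_4)\, \phi_{m,n+1}
\end{aligned}
\tag{2.75}
$$

It is left in this form without further reduction because it is the exact CVM equation used in nanomos. It is also interesting to see that for a homogeneous material, all dielectric constants are equal and the formula simply reduces to that of FDM.

Solving the system of equations

After obtaining the CVM form of Poisson's equation, it is straightforward to write down the complete system of equations for the device. For a 2D finite difference grid with N non-boundary nodes, we have a system of N equations with N unknowns.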
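The five coefficients of the CVM stencil (2.75) can be written down directly. The sketch below is an illustration rather than the nanomos implementation; the function name and the dictionary keys are hypothetical, and the quadrant numbering (1 and 2 above the node, 1 and 4 to its right) is the one implied by (2.69) to (2.72).

```python
def cvm_coefficients(a, b, eps1, eps2, eps3, eps4):
    """Stencil coefficients of Eq. (2.75) for the node (m, n).

    a, b       : grid spacings along the n (horizontal) and m (vertical) index
    eps1..eps4 : dielectric constants of the four quadrants around the node
    Returns the weights multiplying the potential at each of the five nodes.
    """
    return {
        "top": a / (2.0 * b) * (eps1 + eps2),      # phi[m+1][n]
        "bottom": a / (2.0 * b) * (eps3 + eps4),   # phi[m-1][n]
        "left": b / (2.0 * a) * (eps2 + eps3),     # phi[m][n-1]
        "right": b / (2.0 * a) * (eps1 + eps4),    # phi[m][n+1]
        "center": -(a / (2.0 * b) + b / (2.0 * a))
        * (eps1 + eps2 + eps3 + eps4),             # phi[m][n]
    }


# Homogeneous material on a square grid: the familiar 5-point FDM stencil.
homogeneous = cvm_coefficients(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)

# Interface node: two quadrants of one dielectric, two of another
# (the numerical values are arbitrary illustration values).
interface = cvm_coefficients(1.0, 0.5, 3.9, 3.9, 11.7, 11.7)
```

In both cases the five weights sum to zero, which is the discrete statement that a constant potential produces no flux and therefore encloses no charge.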

Transport-Poisson coupling with Gummel scheme

At this point, we have two systems of discretized equations: the 2D Poisson's equation and the 1D transport equation. For the transport equation, we use the drift-diffusion equation as an example.

The discretized Poisson's equation on a 2D finite difference grid, centered at node (h, i), is

$$
\begin{aligned}
a b\, n_{h,i} &= \frac{a}{2b} (\varepsilon_1 + \varepsilon_2)\, V_{h+1,i} + \frac{a}{2b} (\varepsilon_3 + \varepsilon_4)\, V_{h-1,i}
- \left( \frac{a}{2b} + \frac{b}{2a} \right) (\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4)\, V_{h,i} \\
&\quad + \frac{b}{2a} (\varepsilon_2 + \varepsilon_3)\, V_{h,i-1} + \frac{b}{2a} (\varepsilon_1 + \varepsilon_4)\, V_{h,i+1}
\end{aligned}
\tag{2.76}
$$

The discretized drift-diffusion transport equation on a 1D finite difference grid, centered at node (i), is

$$
\begin{aligned}
&(\mu_i + \mu_{i-1})\, B(\phi_i - \phi_{i-1})\, e^{\phi_i - \phi_{i-1}}\, n_{i-1} \\
&\quad - \left[ (\mu_i + \mu_{i-1})\, B(\phi_i - \phi_{i-1}) + (\mu_{i+1} + \mu_i)\, B(\phi_{i+1} - \phi_i)\, e^{\phi_{i+1} - \phi_i} \right] n_i \\
&\quad + (\mu_{i+1} + \mu_i)\, B(\phi_{i+1} - \phi_i)\, n_{i+1} = 0
\end{aligned}
\tag{2.77}
$$

Transforming electron density between 2D and 1D

Poisson's equation and the transport equation are still in their own dimensions: Poisson's equation is in 2D, and the transport equation is in a 1D mode space representation. In order to couple them, we have to find a way to relate the 2D and 1D electron densities.

To eliminate possible confusion: both Poisson's equation and the transport equation contain two unknowns, the electron density and the electrostatic potential, in

real space representation. However, because of the change to mode space representation, the electrostatic potential term in the transport equation takes on a new meaning: it is the subband potential at a certain vertical slice along the channel. Relating this subband potential to the 2D electrostatic potential is the job of the Schrödinger equation solver, and it is not straightforward. Thus, the only shared unknown that is easy to transform between 2D and 1D is the electron density; the 1D electron density used in the transport equation is merely the sum of the 2D density inside a vertical slice.

Fig. This chart illustrates how to convert electron density between 2D and 1D within a certain vertical slice.

1. Electron density 2D -> 1D

Simply sum up the 2D electron density in a vertical slice:

$$
n_i^{1D} = \sum_{h=1}^{n} n_{h,i}^{2D} \tag{2.78}
$$

Notice that $n_i^{1D}$ is the total 1D electron density across all subbands. The electron density within each subband is related to the Fermi distribution function.

2. Electron density 1D -> 2D

For this operation, we have to refer to the wavefunctions of the subbands. For a given subband, the electron density distribution is proportional to the squared magnitude of the wavefunction.

$$
\tilde{n}_{h,i} = \sum_{s=1}^{n} f_s \left| \varphi_{h,i,s} \right|^2 \tag{2.79}
$$

where $\varphi_{h,i,s}$ is the wavefunction at layer i, vertical grid node h, and subband s. The summation is over all possible subbands. The wavefunction is obtained by solving the Schrödinger equation in a principal layer along the confinement direction; the details are presented in the previous section. $f_s$ is the Fermi distribution function, acting as a weight for the probability of electron occupation at that subband energy.

By doing so, we obtain $\tilde{n}_{h,i}$ for every vertical node. $\tilde{n}_{h,i}$ is not the electron density; it is merely a weight proportional to the electron density that resides at that node. The true electron density at node (k, i) is given by

$$
n_{k,i} = \frac{\tilde{n}_{k,i}}{\sum_{h=1}^{N_z} \tilde{n}_{h,i}}\, n_i \tag{2.80}
$$

We have thus expanded a 1D electron density onto a 2D grid.

Coupled solution scheme

Using the method presented in this section, we have coupled a real space 2D Poisson's equation with a mode space 1D transport equation. The remaining job is to solve for all the unknowns in the system of discretized equations. Here we face a choice: we can either solve the transport and Poisson's equations together at once, or we can start with an initial guess and solve the two equations self-consistently. The former method is called a non-iterative scheme, and very often people use Newton's method to solve such a large system of equations, thus it

is known as Newton's scheme; the latter method relies on a self-consistent iteration and is known as the Gummel iterative scheme.

Newton's scheme has a one-step-solves-all style, and since it relies on Newton's method, a stable solution is almost guaranteed. If a set of equations has trouble converging in this scheme, it is almost certain that they are ill conditioned. The disadvantage, however, is the large matrix size. Since everything is solved together, one has to manipulate a square matrix whose number of elements equals the square of the total number of unknowns. This can be computationally very challenging for a large device region.

Gummel's scheme is an iterative method. It has the advantage of solving the transport and Poisson's equations separately, thus dramatically reducing the computational demand. However, it suffers from critical drawbacks regarding its convergence. Like most iterative methods, Gummel's scheme is sensitive to the initial guess. Without a good guess, it may converge very slowly or, in the worst case, diverge from the true solution. As we progress, we discuss techniques to counter these drawbacks and obtain stable convergence. As discussed in the next section, the stability problem forces us to transform Poisson's equation into a non-linear form.

In order to achieve computational efficiency while maintaining stability of the solutions, nanomos implements Gummel's scheme for the outer transport-Poisson coupling and Newton-Raphson iteration for solving the inner non-linear Poisson's equation.

Non-linear damping

The discretized transport equation in matrix form usually has a small size because it is 1D and linear. Therefore, we can use any of the standard techniques to invert the matrix and solve it. In nanomos, straightforward Gaussian elimination is used to obtain the matrix inverse. For a complete discussion of related methods for inverting a matrix, please see appendix B.
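To make the "small, 1D and linear" statement concrete, the sketch below assembles the tridiagonal system of (2.77) and solves it by Gaussian elimination specialized to three diagonals (the Thomas algorithm). It is an illustration under stated assumptions, not the nanomos code: the function names are hypothetical, B is taken to be the Bernoulli function B(x) = x / (exp(x) - 1), the potential is assumed to be normalized so that it can be exponentiated directly, and fixed (Dirichlet) densities are assumed at the two end nodes.

```python
import math


def bernoulli(x):
    """Bernoulli function B(x) = x / (exp(x) - 1), with B(0) = 1."""
    if abs(x) < 1e-8:
        return 1.0 - 0.5 * x
    return x / math.expm1(x)


def solve_tridiagonal(lower, diag, upper, rhs):
    """Gaussian elimination for a tridiagonal system (Thomas algorithm)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_density(phi, mu, n_left, n_right):
    """Solve Eq. (2.77) for the density on a 1D grid (at least 3 nodes).

    phi : normalized potential at each node
    mu  : mobility at each node
    n_left, n_right : assumed fixed densities at the first and last node
    """
    last = len(phi) - 1
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, last):
        w_l = (mu[i] + mu[i - 1]) * bernoulli(phi[i] - phi[i - 1])
        w_r = (mu[i + 1] + mu[i]) * bernoulli(phi[i + 1] - phi[i])
        c_l = w_l * math.exp(phi[i] - phi[i - 1])          # weight of n[i-1]
        c_c = -(w_l + w_r * math.exp(phi[i + 1] - phi[i]))  # weight of n[i]
        c_r = w_r                                           # weight of n[i+1]
        r = 0.0
        if i == 1:          # move the known left density to the right-hand side
            r -= c_l * n_left
            c_l = 0.0
        if i == last - 1:   # move the known right density to the right-hand side
            r -= c_r * n_right
            c_r = 0.0
        lower.append(c_l)
        diag.append(c_c)
        upper.append(c_r)
        rhs.append(r)
    return [n_left] + solve_tridiagonal(lower, diag, upper, rhs) + [n_right]


# Flat potential: pure diffusion, so the density is linear between the contacts.
flat = solve_density([0.0] * 5, [1.0] * 5, 1.0, 5.0)

# Zero-current check: if the boundary densities follow exp(phi), so does the interior.
phi = [0.0, 0.5, 1.0, 1.2, 2.0]
equilibrium = solve_density(phi, [1.0] * 5, math.exp(phi[0]), math.exp(phi[-1]))
```

The two checks exercise the limits that can be read off (2.77) by hand: with a flat potential the equation reduces to a second difference, and with n proportional to exp(φ) every term cancels.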

Now, in Gummel's scheme, the equations are solved iteratively starting from an initial guess. Each iteration takes the previous solution as input and solves for a new solution. If the new and old solutions become very similar to each other, we declare the solution converged. However, if the new and old solutions do not approach each other, or in the worst case move apart, we call this unwanted situation divergence.

One can picture the relationship between the new and old solutions as a vibrating spring fixed at one end, with its free end moving up and down. If no perturbation occurs (wind, gravity, friction, etc.), the spring moves up and down forever (the new and old solutions never approach each other because there is no damping). If we add a helpful perturbation such as friction, the spring is damped, and eventually the maximum and minimum positions become very close to each other (converged) or the spring even stops moving (exact solution). If we add a destructive force, such as intentionally pulling the spring, it may go wild, and the maximum and minimum positions move further apart (divergence).

We therefore need to accomplish two things: first, perturb our solutions in each iteration (so the spring does not vibrate forever); second, ensure that we are perturbing them in a damping way instead of a diverging way. The spring in our case is a complicated coupled system of transport and Poisson's equations; the perturbation is the adjustment made to the electrostatic potential and electron density in each iteration, and this is built in since we solve the transport and Poisson's equations back and forth. The mechanism still missing is a way to ensure that the perturbation is of the damping type, to avoid divergence.

The transport equation is solved in one step for its exact solution, so the burden of damping is placed on Poisson's equation. In order to achieve stable convergence in Poisson's equation, we choose to exploit the concept of the quasi-Fermi level because of its exponential form. The reason for this will become clear shortly.

The quasi-Fermi level for electrons is defined as

$$
F_n = -qV + k_B T\, \mathcal{F}_{1/2}^{-1}\!\left( \frac{n}{N_C} \right) \tag{2.81}
$$

In terms of the quasi-Fermi level, the electron density thus becomes

$$
n = N_C\, \mathcal{F}_{1/2}\!\left( \frac{F_n + qV}{k_B T} \right) \tag{2.82}
$$

In the non-degenerate limit, the Fermi-Dirac integral can be simplified:

$$
n = N_C \exp\!\left( \frac{F_n + qV}{k_B T} \right) \tag{2.83}
$$

The transport equation uses the electrostatic potential of the previous iteration to solve for the electron density, and we denote this potential $V_{old}$. Poisson's equation uses the electron density from the transport equation to solve for a new electrostatic potential, and we denote this potential $V_{new}$. Substituting the quasi-Fermi level obtained from the transport equation into the electron density term in Poisson's equation, we get

$$
\begin{aligned}
n &= N_C \exp\!\left( \frac{-qV_{old} + k_B T \ln\!\left( \frac{n}{N_C} \right) + qV_{new}}{k_B T} \right) \\
&= N_C \exp\!\left( \frac{q (V_{new} - V_{old})}{k_B T} \right) \exp\!\left( \frac{k_B T \ln\!\left( \frac{n}{N_C} \right)}{k_B T} \right) \\
&= n \exp\!\left( \frac{q (V_{new} - V_{old})}{k_B T} \right)
\end{aligned}
\tag{2.84}
$$

Equation (2.84) holds as an identity only if $V_{new}$ and $V_{old}$ are equal, which is the case when the solution has converged and become exact. Since $V_{new}$ is to be determined by Poisson's equation, it gives the solution an extra degree of freedom.

Newton-Raphson iteration for Poisson's equation

Poisson's equation has now become non-linear, and to solve it we need to use the Newton-Raphson method.
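The exponential factor of (2.84) is what makes the charge term respond to a trial potential inside the Poisson solve. A minimal sketch of that factor and of its derivative with respect to the new potential (the quantity a Newton solver needs) is given below; the function names and the room-temperature thermal voltage are assumptions made for illustration only.

```python
import math

THERMAL_VOLTAGE = 0.025852  # k_B T / q in volts near 300 K (assumed value)


def damped_density(n_transport, v_new, v_old, vt=THERMAL_VOLTAGE):
    """Right-hand side of Eq. (2.84): density seen by Poisson's equation.

    n_transport : density returned by the transport solve (computed with v_old)
    v_new       : trial potential inside the Poisson solve, in volts
    v_old       : potential used by the transport solve, in volts
    """
    return n_transport * math.exp((v_new - v_old) / vt)


def damped_density_derivative(n_transport, v_new, v_old, vt=THERMAL_VOLTAGE):
    """Derivative of the damped density with respect to v_new."""
    return damped_density(n_transport, v_new, v_old, vt) / vt


# At convergence (v_new == v_old) the transport density is recovered unchanged.
unchanged = damped_density(1.0e18, 0.30, 0.30)

# One thermal voltage of potential change scales the density by e.
ratio = damped_density(1.0e18, 0.30 + THERMAL_VOLTAGE, 0.30) / 1.0e18
```

The second check shows why the factor acts as a restoring term: a trial potential that overshoots is met by an exponentially larger electron charge in the same Poisson solve.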

Fig. Illustration of how Newton iterations converge toward a solution with a good starting guess.

The Newton-Raphson method is a simple and powerful iterative technique for finding the solution of an equation. Here, we use a simple example to illustrate its usage. Suppose we wish to solve

$$
\exp(x) - x - 10 = 0
$$

The reason for using this particular equation will become clear soon. Using Newton's method, we start from an initial guess. As illustrated by the figure, we start from the guess

$$
x_{guess} = 3.5
$$

Calculate the slope of the curve at that point by taking the derivative of the equation

$$
\begin{aligned}
y &= \exp(x) - x - 10 \\
\left. \frac{dy}{dx} \right|_{x=3.5} &= \exp(x) - 1 = \exp(3.5) - 1 = 32.1155
\end{aligned}
$$

The equation of the tangent line at x = 3.5 is thus

$$
z = 32.1155\, x - 92.7886
$$

Then, find the value of x at which the tangent line crosses zero:

$$
32.1155\, x - 92.7886 = 0, \qquad x = 2.8892
$$

This updated x value is used as the new guess, and the iteration goes on until x becomes very close to the true value, or in other words, until the difference between the x values of two consecutive iterations is negligible. As one can see from figure (2.12), which presents Newton's method graphically, the method converges quickly, and if the initial guess is well placed, convergence is just a matter of time.

We now turn to our non-linear Poisson's equation. Centered at node (h, i), Poisson's equation can be written as

$$
\begin{aligned}
f_\alpha &= \frac{a}{2b} (\varepsilon_1 + \varepsilon_2)\, V_{h+1,i} + \frac{a}{2b} (\varepsilon_3 + \varepsilon_4)\, V_{h-1,i}
- \left( \frac{a}{2b} + \frac{b}{2a} \right) (\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4)\, V_{h,i} \\
&\quad + \frac{b}{2a} (\varepsilon_2 + \varepsilon_3)\, V_{h,i-1} + \frac{b}{2a} (\varepsilon_1 + \varepsilon_4)\, V_{h,i+1}
- a b\, n_{h,i} \exp\!\left( V_{h,i} - V_{h,i}^{old} \right) = 0
\end{aligned}
\tag{2.85}
$$
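The scalar example exp(x) - x - 10 = 0 used above to introduce Newton's method can be reproduced in a few lines. This is a generic textbook sketch; the function name, tolerance and iteration limit are arbitrary choices and are not taken from nanomos.

```python
import math


def newton(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Scalar Newton-Raphson iteration; returns the root and the iterates."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        step = f(x) / dfdx(x)   # distance to the zero of the tangent line
        x -= step
        history.append(x)
        if abs(step) < tol:     # consecutive iterates agree: converged
            break
    return x, history


def f(x):
    return math.exp(x) - x - 10.0


def dfdx(x):
    return math.exp(x) - 1.0


root, history = newton(f, dfdx, 3.5)
first_update = history[1]  # zero of the tangent line drawn at x = 3.5
```

Starting from 3.5, the first update lands near 2.889, matching the tangent-line construction in the text, and later iterates change by progressively smaller steps.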

The system of equations containing N unknowns is of course more complicated than our simple example, but the essential steps are the same. We need to find its derivative, then follow the tangent line from one guess to the next, and repeat until the solution converges. In our simple 1D example, the slope is just a number; in our Poisson system of equations, this derivative becomes a full matrix called the Jacobian matrix. The Jacobian is a familiar term from vector calculus, denoting the matrix of all first-order partial derivatives of a vector-valued function.

The Jacobian matrix is formed by taking the derivative of each equation with respect to all nodes. Notice that each equation centered at a specific node contains five unique variables, so the resulting Jacobian matrix has five non-zero diagonals. The matrix is very sparse, and we can take advantage of this to save memory. We denote the Jacobian matrix element containing the derivative of node α with respect to node β as

$$
f_{\alpha,\beta} = \frac{\partial f_\alpha}{\partial V_\beta} \tag{2.86}
$$

Thus, we obtain the Newton-Raphson iterative equation from a Taylor expansion to first order:

$$
f_\alpha (V_{new}) \approx f_\alpha (V_{old}) + f_{\alpha,\beta} (V_{old})\, \Delta V_\beta = 0 \tag{2.87}
$$

where $\Delta V_\beta$ is the linearly interpolated shift of V at node β.

We now look at Poisson's equation in this particular form in detail and discuss the strategy suggested by Brown and Lindsay [28] to ensure stable convergence. It is not difficult to see that the iterative equation is of the form

$$
e^{V} - V + C = 0 \tag{2.88}
$$

This is exactly the form of our simple equation, with the complication of multiple dimensions. We now illustrate the instability problem using the simple 1D case. If for some reason (a bad starting guess, for example) V lands at point A, the following linear interpolation leads to a $V_{new}$ at B. If we continue as usual, the value

of $f_\alpha$ becomes extremely large and may well cause a numerical overflow; this is a bad step that takes us away from the true solution. In order to avoid such divergence, Brown and Lindsay suggested imposing an artificial reduction on such out-of-bound solutions. A bad solution at point B, for example, can be reduced by taking its fifth root or natural log to a less aggressive position C, thus avoiding the divergent solution.

In nanomos, the following original recipe suggested by Brown and Lindsay is used and has proven to work well:

if $|V_{new} - V_{old}| < 1$: $V_{change} = V_{new} - V_{old}$ (no reduction)

if $1 < |V_{new} - V_{old}| < 3.7$: $V_{change} = \mathrm{sign}(V_{new} - V_{old})\, |V_{new} - V_{old}|^{1/5}$ (fifth root reduction)

if $3.7 < |V_{new} - V_{old}|$: $V_{change} = \mathrm{sign}(V_{new} - V_{old})\, \log |V_{new} - V_{old}|$ (natural log reduction)

Fig. Illustration of how a Newton iteration with a bad starting guess can be saved from divergence with the fix suggested by Brown and Lindsay.
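The three-branch recipe quoted above translates directly into a small update limiter. The sketch below follows the thresholds as stated in the text; the function name is hypothetical, the potential is assumed to be in whatever normalized units the solver uses, and the treatment of the exact boundary values 1 and 3.7 (which the recipe leaves open) is an arbitrary choice.

```python
import math


def limit_update(v_new, v_old):
    """Return the limited potential change for one Newton step."""
    delta = v_new - v_old
    magnitude = abs(delta)
    if magnitude <= 1.0:
        return delta                           # small step: no reduction
    sign = 1.0 if delta > 0.0 else -1.0
    if magnitude < 3.7:
        return sign * magnitude ** 0.2         # fifth root reduction
    return sign * math.log(magnitude)          # natural log reduction


small = limit_update(0.5, 0.0)
medium = limit_update(2.0, 0.0)
large = limit_update(0.0, 32.0)

# Size of the jump between the two reduced branches at the 3.7 threshold.
gap_at_threshold = abs(3.7 ** 0.2 - math.log(3.7))
```

A caller would apply the result as `v_old + limit_update(v_new, v_old)` at every node. The two reduced branches give nearly the same value at 3.7, so the limited step has almost no jump there.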

Such a divergent situation usually occurs only at the beginning of the simulation, when the first several guesses can be far from the true solution. Once the guess is near the true solution, it is in the nature of Newton's method that fast and stable convergence can be achieved.

3. QUANTUM BALLISTIC TRANSPORT

Introduction

The other major branch of the nanomos simulator is the quantum transport module. In nanomos, we implement the non-equilibrium Green's function (NEGF) formalism as a means to solve the Schrödinger equation in the transport direction [29]. Only with version 4.0 does nanomos implement a proper phonon scattering treatment in NEGF. Scattering prior to version 4.0 was implemented via the Buttiker probe approach [30], which is now obsolete and no longer available in nanomos. In this section, we look only at ballistic NEGF transport in detail and briefly mention dissipative NEGF transport with phonon scattering, which is documented elsewhere [31] [32].

We start with a general discussion of the quantum ballistic transport model, including its advantages and disadvantages compared to semi-classical models. The formation of the model Hamiltonian for our device is then presented. The Hamiltonian can take different forms depending on which representation and basis we use. We explain in detail the differences between some common choices and why we choose the effective mass and mode space approach in nanomos. Contact effects are incorporated into a finite size Hamiltonian matrix through the introduction of self-energies. Electron density and current can be evaluated from the electron correlation function, and we present the recursive Green's function (RGF) method to obtain them in a computationally efficient manner. In the end, we simply point out the formulas used to determine electron density and current in the NEGF formalism, which are well documented in standard texts.

Transport model overview

In order to take quantum effects such as interference and tunneling into account, one must solve the Schrödinger equation within the device region instead of the previously demonstrated variants of the semi-classical Boltzmann transport equation. Interference and tunneling can have a significant impact on device characteristics, especially at the nanoscale. For example, because of interference, the electron density profile in the source region may show fluctuations not captured in the semi-classical picture. Also, a portion of the off-state current is attributed to electron tunneling between source and drain, which may be significant if the channel is short.

3.3 System Hamiltonian

Introduction

Constructing the Hamiltonian matrix for the device is always one of the first steps toward a quantum simulation. Depending on the model one uses, the Hamiltonian can take different forms. Here we start from the fundamentals and examine every step that leads us to the final formalism used in the nanomos quantum transport module.

We start from the stationary Schrödinger equation as common ground and then discuss the use of the effective mass and mode space approaches, including their advantages and disadvantages. For the effective mass concept, we derive the effective mass Schrödinger equation used in nanomos and compare it with the atomistic tight-binding approach. For the mode space approach, we derive the mode space Schrödinger equation and highlight its range of validity. With these preliminaries discussed, we discretize the Schrödinger equation to form the Hamiltonian matrix and introduce the self-energy concept to take the effects of the infinite contacts into account.

The Schrödinger equation

In order to take quantum effects into consideration, the Schrödinger equation must be solved. The most general form of the Schrödinger equation within our interest (no magnetic field, no relativistic effects, 1D) is the time dependent Schrödinger equation

$$
i\hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi}{\partial x^2} + V \Psi \tag{3.1}
$$

where $\hbar$ is the reduced Planck constant, $\Psi$ is the real wavefunction, $V$ is the total potential, and $m$ is the mass of the electron.

In our case, we seek stationary solutions, which means our solutions do not vary with time. Thus we use the time independent Schrödinger equation. It can be obtained by a separation of variables with

$$
\Psi(x, t) = e^{-iEt/\hbar}\, \psi(x) \tag{3.2}
$$

$$
\left[ -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + V \right] \psi = E \psi \tag{3.3}
$$

where the operator in brackets is the Hamiltonian, $E$ denotes the allowed energies of the system, and $\psi$ is the stationary wavefunction of the system.

Our system is complicated, so seeking a continuous and explicit wavefunction solution is not realistic. Therefore, we need to expand the unknown

stationary wavefunction of the system in terms of basis functions that are known to us.

$$
\psi = \sum_n C_n \varphi_n \tag{3.4}
$$

where $n$ is the expansion basis index, $C_n$ is the wavefunction expansion coefficient for basis index n, and $\varphi_n$ is the wavefunction component for basis index n.

Up to this point, the concepts above are fundamental to, and shared by, most if not all of the quantum device simulators available today. Where they begin to differ in general structure is in which basis they use and how they solve the Schrödinger equation. We now examine some of the most popular choices of basis in order to illustrate the reasoning behind the choice made in nanomos.

Choice of basis: atomistic tight-binding vs. effective mass model

Atomistic tight-binding model

Since our system is composed of individual atoms, a natural choice of basis is the atomic orbitals. Each individual atom has certain allowed states for electrons to reside in, the so-called orbitals. When individual atoms are arranged together to form a solid, interactions occur among neighboring atoms, and their originally highly localized orbital states become less localized. Löwdin [33] proposed that one can still find a set of equivalent orthogonal orbitals, the so-called Löwdin orbitals, as an alternative to neglecting the overlap.

Therefore, starting with orthogonalized Löwdin orbitals, one is able to construct the stationary wavefunction of the system with a localized atomistic basis. The advantage

of the tight-binding approach is that it describes the lattice structure accurately. The disadvantage is that its validity relies mostly on the coupling parameters, which usually need to be extracted empirically from experimental data or calculated from first principles. In appendix C, we nevertheless present the wavefunction formalism with the tight-binding model for comparison.

Effective mass model

The effective mass model essentially approximates the atomic scale potential perturbation arising from the periodic crystal structure by replacing the electron rest mass with an effective one. This also permits the use of simple plane waves as the expansion basis for the real wavefunction instead of the more complicated Bloch wave [34] basis. We start from the stationary Schrödinger equation of the system

$$
\left[ -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + U_{crystal}(x) + U(x) \right] \psi = E \psi \tag{3.5}
$$

where $U_{crystal}$ is the perturbation due to the crystal lattice and $U$ represents other forms of external perturbation.

Notice that we have separated the crystal lattice perturbation from other possible sources of perturbation (such as scattering or the gate potential).

Now, we expand the real wavefunction with Bloch waves as the basis

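$$
\psi = \sum_{v,k} C_{v,k}\, |v,k\rangle \tag{3.6}
$$

where $|v,k\rangle = e^{ikr} u_{v,k}(r)$ is the Bloch function, $v$ is the subband index, $k$ is the wavevector, and $C$ is the wavefunction expansion weight (in the time dependent case, it is time varying).

Substituting the basis into the Schrödinger equation, we get

$$
\sum_{v,k} C_{v,k} \left[ -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + U_{crystal,v,k}(r) + U_{v,k}(r) \right] |v,k\rangle = \sum_{v,k} C_{v,k}\, E_v(k)\, |v,k\rangle \tag{3.7}
$$

Following the usual procedure to turn it into a matrix equation by Fourier's trick [35],

$$
\begin{aligned}
\langle v',k'| \sum_{v,k} C_{v,k} \left[ -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + U_{crystal,v,k}(r) + U_{v,k}(r) \right] |v,k\rangle &= \langle v',k'| \sum_{v,k} C_{v,k}\, E_v(k)\, |v,k\rangle \\
\sum_{v,k} \langle v',k'| -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + U_{crystal,v,k}(r)\, |v,k\rangle\, C_{v,k} + \sum_{v,k} \langle v',k'|\, U_{v,k}(r)\, |v,k\rangle\, C_{v,k} &= E_{v'}(k')\, C_{v',k'}
\end{aligned}
\tag{3.8}
$$

The first term, $\langle v',k'| -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + U_{crystal,v,k}(x)\, |v,k\rangle$, is the expectation value of energy for an electron traveling in a pure crystal without external perturbation. Clearly, this is the bandstructure of the crystal, and it is readily available for common semiconductors.

The second term is the expectation value of energy for external perturbations between different k and v.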

$$
\begin{aligned}
\langle v',k'|\, U_{v,k}(r)\, |v,k\rangle &= \int_L dr\, e^{-ik'r} u^*_{v',k'}(r)\, U_{v,k,v',k'}(r)\, e^{ikr} u_{v,k}(r) \\
&= \int_L dr\, u^*_{v',k'}(r)\, u_{v,k}(r)\, e^{i(k-k')r}\, U_{v,k,v',k'}(r)
\end{aligned}
\tag{3.9}
$$

Here we make an important assumption of the effective mass model:

$$
u_{v,k}(r) \approx u_{v,k'}(r) \tag{3.10}
$$

This assumption states that the periodic part of the Bloch wave is independent of the wavevector within a subband. In most cases, we are only interested in electron transport near the bottom of the conduction band, so the range of k values involved is small. Further, this assumption decouples the subbands and permits us to treat each subband individually. To see this, insert the assumption into the equation:

$$
\begin{aligned}
\langle v',k'|\, U_{v,k}(r)\, |v,k\rangle &= \int_L dr\, u^*_{v',k'}(r)\, u_{v,k}(r)\, e^{i(k-k')r}\, U_{v,k,v',k'}(r) \\
&= \langle k'|\, U_{v,k}(r)\, |k\rangle\, \delta_{v,v'}
\end{aligned}
\tag{3.11}
$$

The original Schrödinger equation then becomes

$$
\begin{aligned}
E_{v'}(k')\, C_{v',k'} + \sum_{v,k} \langle k'|\, U_{v,k}(r)\, |k\rangle\, \delta_{v,v'}\, C_{v,k} &= E_{eigen,v'}(k')\, C_{v',k'} \\
E_{v'}(k')\, C_{v',k'} + \sum_{k} \langle k'|\, U_{v',k}(r)\, |k\rangle\, C_{v',k} &= E_{eigen,v'}(k')\, C_{v',k'} \\
E_v(k)\, C_{v,k}\, |k\rangle + U_{v,k}(r)\, C_{v,k}\, |k\rangle &= E_{eigen,v}(k)\, C_{v,k}\, |k\rangle
\end{aligned}
\tag{3.12}
$$

This looks much like the original Schrödinger equation, but the basis has been simplified to plane waves instead of Bloch waves. In other words, one advantage of the effective mass model is that it permits us to use plane waves as the basis.

Now, we introduce the second important approximation: the concept of effective mass. $E_v(k)$ is the electron energy corresponding to a certain k in a certain subband v. Usually, one would look up a band diagram to obtain this energy. However, as mentioned before, in most cases we are only interested in electronic transport occurring

near the bottom of the conduction band, and in the major semiconductors (Si, Ge, and others) a simple parabolic shape describes the bottom of the conduction band well once an effective electron mass, which depends on the material type, is introduced. Usually this effective mass has to be fitted empirically from experimental data. At the bottom of the conduction band,

$$
E_v(k) = E_{c0} + \frac{\hbar^2 k^2}{2m^*} \tag{3.13}
$$

where $E_{c0}$ is the energy at the bottom of the conduction band. Thus, with the concept of effective mass, the Schrödinger equation becomes

$$
\begin{aligned}
\left( E_{c0} + \frac{\hbar^2 k^2}{2m^*} \right) C_{v,k}\, |k\rangle + U_{v,k}(r)\, C_{v,k}\, |k\rangle &= E_{eigen,v}(k)\, C_{v,k}\, |k\rangle \\
\frac{\hbar^2 k^2}{2m^*}\, C_{v,k}\, |k\rangle + \left( E_{c0} + U_{v,k}(r) \right) C_{v,k}\, |k\rangle &= E_{eigen,v}(k)\, C_{v,k}\, |k\rangle
\end{aligned}
\tag{3.14}
$$

Also notice that

$$
\frac{\hbar^2 k^2}{2m^*}\, C_{v,k}\, |k\rangle = -\frac{\hbar^2}{2m^*} \nabla^2 \left( C_{v,k}\, |k\rangle \right) \tag{3.15}
$$

So, the final effective mass Schrödinger equation is

$$
-\frac{\hbar^2}{2m^*} \nabla^2 \varphi + \left( E_{c0} + U(r) \right) \varphi = E_v\, \varphi, \qquad \varphi = C_{v,k}\, |k\rangle \tag{3.16}
$$

In nanomos, we use the effective mass concept rather than the atomistic tight-binding model because of its simplicity and computational efficiency.

Choice of representation: real space vs. mode space

Real space approach

Since we are considering a 2D device region, we need to solve the 2D effective mass Schrödinger equation.

$$
-\frac{\hbar^2}{2m_x^*} \frac{\partial^2}{\partial x^2} \varphi(x, z) - \frac{\hbar^2}{2m_z^*} \frac{\partial^2}{\partial z^2} \varphi(x, z) + \left( E_{c0} + U(x, z) \right) \varphi(x, z) = E \varphi(x, z) \tag{3.17}
$$

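The wavefunction depends on both the x and z directions, so it is a 2D wavefunction. With a real space grid of size $N_x \times N_z$, we deal with a Hamiltonian of size $(N_x N_z)^2$. This can be a huge matrix whose solution is computationally challenging.

Mode space approach

The motivation behind the mode space approach is to exploit the quantized momentum of carriers in the confined transverse direction and thereby reduce the total size of the Hamiltonian. Here we justify this approach.

Expand the wavefunction in terms of eigenfunctions in the transverse direction:

$$
\varphi(x, z) = \sum_n C_n(x)\, \phi_n(x, z) \tag{3.18}
$$

Multiply each side of the Schrödinger equation (3.17) by $\phi_i^*(x, z)\, \delta(x - x')$ and integrate over the entire 2D region. Doing so allows us to focus on a certain subband i at position $x'$.

$$
\begin{aligned}
&\iint \phi_i^*(x, z)\, \delta(x - x') \left[ -\frac{\hbar^2}{2m_z^*} \frac{\partial^2}{\partial z^2} \right] \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz \\
&\quad + \iint \phi_i^*(x, z)\, \delta(x - x')\, U(x, z) \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz \\
&\quad + \iint \phi_i^*(x, z)\, \delta(x - x') \left[ -\frac{\hbar^2}{2m_x^*} \frac{\partial^2}{\partial x^2} \right] \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz \\
&= \iint \phi_i^*(x, z)\, \delta(x - x')\, (E - E_{c0}) \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz
\end{aligned}
\tag{3.19}
$$

Using the orthogonality of the basis, the first term (the z-derivative and potential contributions taken together) becomes

$$
\begin{aligned}
&\iint \phi_i^*(x, z)\, \delta(x - x') \left[ -\frac{\hbar^2}{2m_z^*} \frac{\partial^2}{\partial z^2} \right] \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz + \iint \phi_i^*(x, z)\, \delta(x - x')\, U(x, z) \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz \\
&= \iint \phi_i^*(x, z)\, \delta(x - x') \left[ -\frac{\hbar^2}{2m_z^*} \frac{\partial^2}{\partial z^2} + U(x, z) \right] \sum_n C_n(x)\, \phi_n(x, z)\, dx\, dz \\
&= \int \phi_i^*(x', z) \left[ -\frac{\hbar^2}{2m_z^*} \frac{\partial^2}{\partial z^2} + U(x', z) \right] \sum_n C_n(x')\, \phi_n(x', z)\, dz \\
&= \left\langle \phi_i(x', z) \right| -\frac{\hbar^2}{2m_z^*} \frac{\partial^2}{\partial z^2} + U(x', z) \left| \varphi(x', z) \right\rangle \\
&= E_i\, C_i(x')
\end{aligned}
\tag{3.20}
$$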

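For the second term (the kinetic part in the transport direction), recall the product rule for second order derivatives:

$$\begin{aligned}
\frac{\partial^2}{\partial x^2}\left[A(x)B(x)\right]
&= \frac{\partial}{\partial x}\left(\frac{\partial}{\partial x}AB\right)
= \frac{\partial}{\partial x}\left(A\frac{\partial B}{\partial x} + B\frac{\partial A}{\partial x}\right) \\
&= A\frac{\partial^2 B}{\partial x^2} + \frac{\partial A}{\partial x}\frac{\partial B}{\partial x} + B\frac{\partial^2 A}{\partial x^2} + \frac{\partial A}{\partial x}\frac{\partial B}{\partial x} \\
&= A\frac{\partial^2 B}{\partial x^2} + B\frac{\partial^2 A}{\partial x^2} + 2\frac{\partial A}{\partial x}\frac{\partial B}{\partial x}
\end{aligned} \tag{3.21}$$

Therefore,

$$A\frac{\partial^2 B}{\partial x^2} = \frac{\partial^2}{\partial x^2}\left(AB\right) - B\frac{\partial^2 A}{\partial x^2} - 2\frac{\partial A}{\partial x}\frac{\partial B}{\partial x} \tag{3.22}$$

Using the relation above with $A = \varphi_i$ and $B = \sum_n C_n\varphi_n$, the second term on the left becomes

$$\begin{aligned}
&\iint \varphi_i(x,z)\,\delta(x-x')\left(-\frac{\hbar^2}{2m_x^*}\frac{\partial^2}{\partial x^2}\right)\sum_n C_n(x)\varphi_n(x,z)\,dx\,dz \\
&= \iint \delta(x-x')\left(-\frac{\hbar^2}{2m_x^*}\right)\frac{\partial^2}{\partial x^2}\left[\varphi_i(x,z)\sum_n C_n(x)\varphi_n(x,z)\right]dx\,dz \\
&\quad - \iint \delta(x-x')\left(-\frac{\hbar^2}{2m_x^*}\right)\left[\sum_n C_n(x)\varphi_n(x,z)\right]\frac{\partial^2}{\partial x^2}\varphi_i(x,z)\,dx\,dz \\
&\quad - 2\iint \delta(x-x')\left(-\frac{\hbar^2}{2m_x^*}\right)\frac{\partial \varphi_i(x,z)}{\partial x}\,\frac{\partial}{\partial x}\left[\sum_n C_n(x)\varphi_n(x,z)\right]dx\,dz
\end{aligned} \tag{3.23}$$

Now, we make the critical assumption that the transverse eigenfunctions do not vary with position in the transport direction, so that all of their x-derivatives vanish:

$$\frac{\partial}{\partial x}\varphi_i(x,z) = 0 \tag{3.24}$$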

Then, the second term (3.23) further reduces to a simple form

$$\begin{aligned}
&\iint \varphi_i(x,z)\,\delta(x-x')\left(-\frac{\hbar^2}{2m_x^*}\frac{\partial^2}{\partial x^2}\right)\sum_n C_n(x)\varphi_n(x,z)\,dx\,dz \\
&= -\frac{\hbar^2}{2m_x^*}\int \delta(x-x')\,\frac{\partial^2}{\partial x^2}C_i(x)\,dx \\
&= -\frac{\hbar^2}{2m_x^*}\frac{\partial^2}{\partial x'^2}C_i(x')
\end{aligned} \tag{3.25}$$

The term on the right hand side is

$$\begin{aligned}
&\iint \varphi_i(x,z)\,\delta(x-x')\,(E - E_{c0})\sum_n C_n(x)\varphi_n(x,z)\,dx\,dz \\
&= \int \varphi_i(x',z)\,(E - E_{c0})\sum_n C_n(x')\varphi_n(x',z)\,dz \\
&= (E - E_{c0})\,C_i(x')
\end{aligned} \tag{3.26}$$

Thus, the 2D effective mass Schrödinger equation has become 1D:

$$-\frac{\hbar^2}{2m_x^*}\frac{\partial^2}{\partial x'^2}C_i(x') + E_i\,C_i(x') = (E - E_{c0})\,C_i(x') \tag{3.27}$$

$C_i(x')$ is the transverse eigenfunction basis coefficient of the ith mode at position $x'$ along the channel. What we have accomplished here is to turn a 2D transport problem into a 1D one. In order to obtain the subband energies and wavefunction coefficients, we need to solve another 1D effective mass Schrödinger equation in the confinement direction. The greatest advantage is the reduction in Hamiltonian size. In the confinement direction, we have a 1D Hamiltonian of size $N_z \times N_z$; in the transport direction, we have another 1D Hamiltonian of size $N_x \times N_x$. This is a dramatic reduction compared to the real space 2D Hamiltonian of size $(N_x N_z)^2$, and it saves a great deal of computing time. However, one essential thing to keep in mind is that the validity of the mode space formula used here relies on the assumption (3.24).
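To make the size argument concrete, the following minimal sketch counts dense matrix entries for the two representations. The function names and the 100 x 50 grid are hypothetical and chosen only for illustration; the count follows the text in assuming one $N_z \times N_z$ confinement matrix and one $N_x \times N_x$ transport matrix, and it ignores sparsity.

```python
def dense_entries_real_space(n_x: int, n_z: int) -> int:
    """Entries of one dense (N_x N_z) x (N_x N_z) real-space Hamiltonian."""
    size = n_x * n_z
    return size * size


def dense_entries_mode_space(n_x: int, n_z: int) -> int:
    """Entries of one N_z x N_z confinement matrix plus one N_x x N_x transport matrix."""
    return n_z * n_z + n_x * n_x


if __name__ == "__main__":
    n_x, n_z = 100, 50  # hypothetical grid, for illustration only
    real = dense_entries_real_space(n_x, n_z)
    mode = dense_entries_mode_space(n_x, n_z)
    print(f"real space: {real} entries, mode space: {mode} entries")
    print(f"reduction factor: {real / mode:.0f}")
```

For this hypothetical grid the real space matrix holds 25,000,000 entries against 12,500 in mode space. A practical solver repeats the small problems (one confinement problem per slice, one transport problem per mode), so the real saving is smaller than this raw ratio but still large.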

System Hamiltonian in matrix form with finite difference approximation

In nanomos, the 1D mode-space effective mass Schrödinger equation is used to treat quantum transport:

$$-\frac{\hbar^2}{2m_x^*}\frac{\partial^2}{\partial x'^2}C_i(x') + E_i\,C_i(x') = (E - E_{c0})\,C_i(x') \tag{3.28}$$

We apply the usual recipe of the finite difference method to transform the differential equation into a linear difference equation. Lay out a uniform 1D finite difference grid with spacing $a$ along the channel, and seek the solution to the Schrödinger equation at each grid node. At node $n$, the Schrödinger equation becomes

$$\begin{aligned}
-\frac{\hbar^2}{2m_x^*}\frac{\partial^2}{\partial x^2}C_i(x_n) + E_i\,C_i(x_n) &= (E - E_{c0})\,C_i(x_n) \\
-\frac{\hbar^2}{2m_x^*}\,\frac{C_i(x_{n-1}) - 2C_i(x_n) + C_i(x_{n+1})}{a^2} + E_i\,C_i(x_n) &= (E - E_{c0})\,C_i(x_n) \\
-\frac{\hbar^2}{2a^2m_x^*}C_i(x_{n-1}) + \left(\frac{2\hbar^2}{2a^2m_x^*} + E_i + E_{c0}\right)C_i(x_n) - \frac{\hbar^2}{2a^2m_x^*}C_i(x_{n+1}) &= E\,C_i(x_n) \\
-t\,C_i(x_{n-1}) + \left(2t + E_i + E_{c0}\right)C_i(x_n) - t\,C_i(x_{n+1}) &= E\,C_i(x_n)
\end{aligned} \tag{3.29}$$

with

$$t = \frac{\hbar^2}{2a^2m_x^*} \tag{3.30}$$

For each node, we obtain such a finite difference equation relating it to its two neighboring grid nodes. However, if we keep doing so for all nodes, we quickly run into a problem with the left and right terminating nodes at the boundaries: we do not know the wavefunction at those nodes. In previous cases such as drift-diffusion, we know the boundary electrostatic potentials to be those of the source and drain, so we can terminate the finite difference chain of equations. Here, we are forced to find a way to terminate the infinitely large Hamiltonian at the boundaries while taking into account the coupling with the contacts. In other words, we are looking for the self-energy terms.
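The interior rows of eq. (3.29) can be assembled into a tridiagonal matrix as in the sketch below. This is an illustration under stated assumptions, not the nanomos source: the function names are hypothetical, NumPy is assumed to be available, energies are in eV, and the example values (0.3 nm spacing, effective mass 0.19 $m_0$) are arbitrary. The boundary rows are left open here; the contact self-energy has not been added.

```python
import numpy as np

HBAR = 1.054571817e-34  # reduced Planck constant, J s
Q = 1.602176634e-19     # elementary charge, C (also J per eV)
M0 = 9.1093837015e-31   # free electron mass, kg


def hopping_energy(a_m, m_rel):
    """t = hbar^2 / (2 a^2 m*), eq. (3.30), returned in eV."""
    return HBAR**2 / (2.0 * a_m**2 * m_rel * M0) / Q


def mode_space_hamiltonian(subband_ev, ec0_ev, a_m, m_rel):
    """Tridiagonal matrix of eq. (3.29) for one mode, without contact terms.

    subband_ev : subband energy E_i at each node along the channel (eV)
    ec0_ev     : conduction band edge E_c0 (eV)
    a_m        : grid spacing (m)
    m_rel      : transport effective mass in units of the free electron mass
    """
    subband_ev = np.asarray(subband_ev, dtype=float)
    n = subband_ev.size
    t = hopping_energy(a_m, m_rel)
    H = np.diag(2.0 * t + subband_ev + ec0_ev)   # diagonal: 2t + E_i + E_c0
    off = -t * np.ones(n - 1)                    # nearest-neighbor coupling: -t
    H += np.diag(off, 1) + np.diag(off, -1)
    return H, t


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    H, t = mode_space_hamiltonian([0.0, 0.1, 0.2], 0.0, 0.3e-9, 0.19)
    print(f"t = {t:.3f} eV")
    print(H)
```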

Self-energy is a broad concept in the NEGF formalism [29]; it is used to describe perturbations from sources such as contacts, phonon scattering, photon injection, and others. In this simple ballistic transport model, no scattering is considered, so the self-energy term is used solely to describe the contact couplings and to truncate the infinitely large Hamiltonian into a finite one. In the case of dissipative scattering in nanomos, phonon scattering is introduced via self-energy terms that must be solved self-consistently with the Schrödinger equation [36].

Self-energy for contacts

A contact is a semi-infinite region connected to the device region, serving as a reservoir for electrons. In our case, although the geometry suggests that the source and drain have finite sizes, they in fact extend left and right toward infinity. In the true contacts, the electrostatic potential does not vary, and the electron wavefunction is of Bloch type. The Bloch theorem with our finite difference grid allows us to write the following, where $k$ is the wavevector:

$$C_i(x_n) = C_i(x_{n-1})\,e^{-ika} \tag{3.31}$$

$$C_i(x_n) = C_i(x_{n+1})\,e^{-ika} \tag{3.32}$$

We are now able to substitute the Bloch relation into our Hamiltonian and turn it into a finite matrix. At the source end, node 1,

$$\begin{aligned}
-t\,C_i(x_1)\,e^{ika} + \left(2t + E_i + E_{c0}\right)C_i(x_1) - t\,C_i(x_2) &= E\,C_i(x_1) \\
\left(2t + E_i + E_{c0} - t\,e^{ika}\right)C_i(x_1) - t\,C_i(x_2) &= E\,C_i(x_1)
\end{aligned} \tag{3.33}$$

At the drain end, node $n$,

$$\begin{aligned}
-t\,C_i(x_{n-1}) + \left(2t + E_i + E_{c0}\right)C_i(x_n) - t\,C_i(x_n)\,e^{ika} &= E\,C_i(x_n) \\
-t\,C_i(x_{n-1}) + \left(2t + E_i + E_{c0} - t\,e^{ika}\right)C_i(x_n) &= E\,C_i(x_n)
\end{aligned} \tag{3.34}$$

Now, we can write the 1D Schrödinger equation describing the device region in finite matrix form

$$(H + \Sigma)\,C_i = E\,C_i \tag{3.35}$$

1D Bandstructure

Now, we need to find an explicit expression that replaces the wavevector with its corresponding energy, so that the Schrödinger equation contains only one unknown. This explicit relationship between wavevector $k$ and energy $E$ is commonly known as the bandstructure, energy spectrum, or dispersion relation. In our case, we are looking for a 1D bandstructure for electrons in the source/drain. The finite difference Schrödinger equation centered at a node $z$ inside the contact is

$$-t\,C_i(x_{z-1}) + \left(2t + E_i + E_{c0}\right)C_i(x_z) - t\,C_i(x_{z+1}) = E\,C_i(x_z) \tag{3.36}$$

Since node $z$ and its neighboring nodes are all inside the contact, we are entitled to apply the Bloch theorem for a periodic crystal:

$$C_i(x_{z-1}) = C_i(x_z)\,e^{-ika} \tag{3.37}$$

$$C_i(x_{z+1}) = C_i(x_z)\,e^{ika} \tag{3.38}$$

Substituting these Bloch relations into the Schrödinger equation, one gets

$$\begin{aligned}
-t\,C_i(x_z)\,e^{-ika} + \left(2t + E_i + E_{c0}\right)C_i(x_z) - t\,C_i(x_z)\,e^{ika} &= E\,C_i(x_z) \\
-t\,e^{-ika} + 2t + E_i + E_{c0} - t\,e^{ika} &= E
\end{aligned} \tag{3.39}$$

Applying Euler's identity,

$$\begin{aligned}
E &= -t\left(\cos(ka) - i\sin(ka)\right) + 2t + E_i + E_{c0} - t\left(\cos(ka) + i\sin(ka)\right) \\
&= -2t\cos(ka) + 2t + E_i + E_{c0} \\
&= E_i + E_{c0} + 2t\left(1 - \cos(ka)\right)
\end{aligned} \tag{3.40}$$

$$k = \frac{1}{a}\cos^{-1}\left(\frac{2t + E_i + E_{c0} - E}{2t}\right) \tag{3.41}$$

Now, we have a 1D finite difference Schrödinger equation with only the energy $E$ as unknown.
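As a small numerical check of the dispersion relation, the sketch below inverts eq. (3.40) for $ka$ and builds a contact self-energy of the form $-t\,e^{ika}$, the standard result for a semi-infinite 1D chain with nearest-neighbor coupling $-t$. All function names are hypothetical, the argument `band_edge` stands for $E_i + E_{c0}$ in the contact, and only energies inside the contact band (real $k$) are handled.

```python
import cmath
import math


def energy_from_ka(ka, t, band_edge):
    """Dispersion relation of eq. (3.40): E = band_edge + 2t (1 - cos ka)."""
    return band_edge + 2.0 * t * (1.0 - math.cos(ka))


def ka_from_energy(E, t, band_edge):
    """Inverse relation of eq. (3.41), returned as the product k*a (radians)."""
    cos_ka = (2.0 * t + band_edge - E) / (2.0 * t)
    if not -1.0 <= cos_ka <= 1.0:
        raise ValueError("energy lies outside the contact band; k is not real")
    return math.acos(cos_ka)


def contact_self_energy(E, t, band_edge):
    """Self-energy -t exp(i k a) added to the terminating node of the device."""
    return -t * cmath.exp(1j * ka_from_energy(E, t, band_edge))


if __name__ == "__main__":
    t, band_edge = 2.0, 0.1          # hypothetical values in eV
    E = energy_from_ka(0.7, t, band_edge)
    print(ka_from_energy(E, t, band_edge))       # recovers ka = 0.7
    print(contact_self_energy(E, t, band_edge))  # negative imaginary part
```

A negative imaginary part of the self-energy corresponds to electrons leaking out of the device into the contact, which is the sign expected for a retarded quantity.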

Recursive Green's Function formalism

Introduction

In this section, we introduce the recursive Green's function (RGF) method [37]. Knowledge of the retarded Green's function and the electron correlation function is essential in determining quantities of interest such as the local density of states, electron density, and current. It however requires inverting the Hamiltonian matrix, which can be fairly large. We start with the motivation behind using the RGF method. Then, using Dyson's equation as the solution to a matrix inversion problem, we derive the RGF formulas for the retarded Green's function and the electron correlation function.

Motivation

When calculating quantities of interest such as electron density, transmission, and current density, two quantities of paramount importance in the NEGF formalism are the retarded Green's function $G$ and the electron correlation function $G^n$, defined as

$$A\,G = I \tag{3.42}$$

$$A\,G^n = \Sigma\,G^\dagger \tag{3.43}$$

$A$: system Hamiltonian matrix
$G$: Green's function
$I$: identity matrix of the same size as $A$ and $G$
$\Sigma$: self-energy term
$G^\dagger$: Hermitian conjugate of the Green's function

From their definitions, it seems both quantities are easily obtainable. However, the difficulty is numerical: we have to invert a full Hamiltonian matrix of the entire system in both cases. Inverting a matrix is computationally expensive and poses as the major

factor slowing down the calculation. If the device region is large and the mesh is fine, one can easily run out of computer memory. In addition, we have another reason to seek a more efficient way to use the NEGF formalism: the self-consistent scheme couples the NEGF transport module to the Poisson solver. Until the solutions converge, there is no need to ask the transport module for anything other than the electron density profile. After convergence, we can calculate transmission and current density once and for all; anything calculated while the solution is not converged has no use apart from debugging. Moreover, both electron density and current can be calculated solely from the main diagonal of the electron correlation matrix $G^n$; the rest of $G^n$ does not have to be known. This opens another opportunity to save computation time. One commonly encountered Hamiltonian matrix is of block tri-diagonal type. In general, it can be written as

$$H = \begin{pmatrix}
D_{0,0} & S_{0,1} & 0 & \cdots & 0 \\
S_{1,0} & D_{1,1} & S_{1,2} & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & S_{n-1,n-2} & D_{n-1,n-1} & S_{n-1,n} \\
0 & \cdots & 0 & S_{n,n-1} & D_{n,n}
\end{pmatrix} \tag{3.44}$$

$S$ is the coupling block matrix between two principal layers.
$D$ is the diagonal block matrix containing the principal layer Hamiltonian.

In order to find its Green's function, one needs to invert it. For clarity, we assume the diagonal terms have absorbed the self-energy and other external perturbation terms, and we concentrate on how to find

$$G = H^{-1} \tag{3.45}$$

One can of course use a direct inversion method, or the faster LU decomposition. However, here we exploit the block tri-diagonal structure of the matrix and deploy what is called the Recursive Green's Function (RGF) method. RGF is essentially a decimation technique based on inversion of a matrix by partitioning.

Dyson's equation and recursive method

Before we move ahead, it is necessary to introduce Dyson's equations for the Green's function and the electron density Green's function in order to appreciate the recursive method [38]. Dyson's equation is a broad concept associated with the concept of self-energy; here we only take advantage of its identity that separates the diagonal terms of the Green's function from the off-diagonal terms.

Dyson's equation for Green's function

For a device Hamiltonian containing $N$ principal layers, its Green's function is defined as

$$H^{N,N}\,G^{N,N} = I \tag{3.46}$$

Notice here that each element of the Hamiltonian is in fact a matrix corresponding to a principal layer. If we partition it into four blocks, with the diagonal blocks being square matrices of any size,

$$\begin{pmatrix} D_{00} & S_{01} \\ S_{10} & D_{11} \end{pmatrix}
\begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \tag{3.47}$$

$D$: diagonal square matrix partition of the total Hamiltonian
$S$: off-diagonal matrix partition of the total Hamiltonian

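In equation form,

$$D_{00}\,g_{00} + S_{01}\,g_{10} = I \tag{3.48}$$

$$D_{00}\,g_{01} + S_{01}\,g_{11} = 0 \tag{3.49}$$

$$S_{10}\,g_{00} + D_{11}\,g_{10} = 0 \tag{3.50}$$

$$S_{10}\,g_{01} + D_{11}\,g_{11} = I \tag{3.51}$$

If the Green's function partitions are the only unknowns, this system of equations is solvable, and what solves it is called Dyson's equation

$$G = G^0 + G^0\,U\,G \tag{3.52}$$

$$\phantom{G} = G^0 + G\,U\,G^0 \tag{3.53}$$

with

$$G^0 = \begin{pmatrix} D_{00}^{-1} & 0 \\ 0 & D_{11}^{-1} \end{pmatrix} \tag{3.54}$$

$$U = \begin{pmatrix} 0 & -S_{01} \\ -S_{10} & 0 \end{pmatrix} \tag{3.55}$$

$$G = \begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix} \tag{3.56}$$

Again, we write Dyson's equation as an equivalent system of equations. In matrix form,

$$\begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix}
= \begin{pmatrix} D_{00}^{-1} & 0 \\ 0 & D_{11}^{-1} \end{pmatrix}
+ \begin{pmatrix} D_{00}^{-1} & 0 \\ 0 & D_{11}^{-1} \end{pmatrix}
\begin{pmatrix} 0 & -S_{01} \\ -S_{10} & 0 \end{pmatrix}
\begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix} \tag{3.57}$$

$$\phantom{\begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix}}
= \begin{pmatrix} D_{00}^{-1} & 0 \\ 0 & D_{11}^{-1} \end{pmatrix}
+ \begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix}
\begin{pmatrix} 0 & -S_{01} \\ -S_{10} & 0 \end{pmatrix}
\begin{pmatrix} D_{00}^{-1} & 0 \\ 0 & D_{11}^{-1} \end{pmatrix} \tag{3.58}$$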

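$$g_{00} = D_{00}^{-1} - D_{00}^{-1}\,S_{01}\,g_{10} \tag{3.59}$$

$$g_{01} = -D_{00}^{-1}\,S_{01}\,g_{11} \tag{3.60}$$

$$\phantom{g_{01}} = -g_{00}\,S_{01}\,D_{11}^{-1} \tag{3.61}$$

$$g_{10} = -D_{11}^{-1}\,S_{10}\,g_{00} \tag{3.62}$$

$$\phantom{g_{10}} = -g_{11}\,S_{10}\,D_{00}^{-1} \tag{3.63}$$

$$g_{11} = D_{11}^{-1} - D_{11}^{-1}\,S_{10}\,g_{01} \tag{3.64}$$

Similarly, we can obtain a system of equations for the Hermitian conjugate of the Green's function

$$G^\dagger = G^{0\dagger} + G^{0\dagger}\,U^\dagger\,G^\dagger \tag{3.65}$$

$$\phantom{G^\dagger} = G^{0\dagger} + G^\dagger\,U^\dagger\,G^{0\dagger} \tag{3.66}$$

In matrix form, writing $g^\dagger_{ij}$ for the $(i,j)$ block of $G^\dagger$,

$$\begin{pmatrix} g^\dagger_{00} & g^\dagger_{01} \\ g^\dagger_{10} & g^\dagger_{11} \end{pmatrix}
= \begin{pmatrix} (D_{00}^{-1})^\dagger & 0 \\ 0 & (D_{11}^{-1})^\dagger \end{pmatrix}
+ \begin{pmatrix} (D_{00}^{-1})^\dagger & 0 \\ 0 & (D_{11}^{-1})^\dagger \end{pmatrix}
\begin{pmatrix} 0 & -S_{10}^\dagger \\ -S_{01}^\dagger & 0 \end{pmatrix}
\begin{pmatrix} g^\dagger_{00} & g^\dagger_{01} \\ g^\dagger_{10} & g^\dagger_{11} \end{pmatrix} \tag{3.67}$$

$$\phantom{\begin{pmatrix} g^\dagger_{00} & g^\dagger_{01} \\ g^\dagger_{10} & g^\dagger_{11} \end{pmatrix}}
= \begin{pmatrix} (D_{00}^{-1})^\dagger & 0 \\ 0 & (D_{11}^{-1})^\dagger \end{pmatrix}
+ \begin{pmatrix} g^\dagger_{00} & g^\dagger_{01} \\ g^\dagger_{10} & g^\dagger_{11} \end{pmatrix}
\begin{pmatrix} 0 & -S_{10}^\dagger \\ -S_{01}^\dagger & 0 \end{pmatrix}
\begin{pmatrix} (D_{00}^{-1})^\dagger & 0 \\ 0 & (D_{11}^{-1})^\dagger \end{pmatrix} \tag{3.68}$$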

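$$g^\dagger_{00} = (D_{00}^{-1})^\dagger - g^\dagger_{01}\,S_{01}^\dagger\,(D_{00}^{-1})^\dagger \tag{3.69}$$

$$g^\dagger_{01} = -g^\dagger_{00}\,S_{10}^\dagger\,(D_{11}^{-1})^\dagger \tag{3.70}$$

$$\phantom{g^\dagger_{01}} = -(D_{00}^{-1})^\dagger\,S_{10}^\dagger\,g^\dagger_{11} \tag{3.71}$$

$$g^\dagger_{10} = -g^\dagger_{11}\,S_{01}^\dagger\,(D_{00}^{-1})^\dagger \tag{3.72}$$

$$\phantom{g^\dagger_{10}} = -(D_{11}^{-1})^\dagger\,S_{01}^\dagger\,g^\dagger_{00} \tag{3.73}$$

$$g^\dagger_{11} = (D_{11}^{-1})^\dagger - g^\dagger_{10}\,S_{10}^\dagger\,(D_{11}^{-1})^\dagger \tag{3.74}$$

Reminder on Hermitian conjugate properties:

$$(ABC)^\dagger = C^\dagger B^\dagger A^\dagger \tag{3.75}$$

$$\left(A^{-1}\right)^\dagger = \left(A^\dagger\right)^{-1} \tag{3.76}$$

Recursive method for Green's function

Here we seek an alternative way to determine the diagonal and off-diagonal terms of the Green's function. First, we choose to partition the matrix shown in the previous section in such a way that the bottom right diagonal term includes only a single principal layer. Combining eq. (3.51) and eq. (3.60),

$$\begin{aligned}
-S_{10}\,D_{00}^{-1}\,S_{01}\,g_{11} + D_{11}\,g_{11} &= I \\
\left(-S_{10}\,D_{00}^{-1}\,S_{01} + D_{11}\right)g_{11} &= I \\
g_{11} &= \left(D_{11} - S_{10}\,D_{00}^{-1}\,S_{01}\right)^{-1}
\end{aligned} \tag{3.77}$$

Now, we have obtained a formula describing the relation between the upper-left and lower-right diagonal blocks of the retarded Green's function. We are free to choose the size of the blocks as long as both remain square. If we are clever about it,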

we can choose the lower-right diagonal block $g_{11}$ to correspond to the last principal layer all the way to the right, and $g_{00}$ to correspond to the rest of the principal layers. If we have a total of $N$ principal layers ordered from left to right,

$$g_{N,N} = \left(D_{N,N} - S_{N,1:N-1}\,D^{-1}_{1:N-1,1:N-1}\,S_{1:N-1,N}\right)^{-1} \tag{3.78}$$

Fig. 3.1. Forward partition of the total Hamiltonian. The lower right piece is just the Hamiltonian of a single layer. The two non-square pieces are the coupling between the singled out layer and the rest of the device.

Now, the matrix size becomes important, and we start to denote the matrix size on essential matrices using the following notation:

$$g^{N,N}_{N-1,N-2}$$

The superscript $(N,N)$ denotes the size of the total Hamiltonian it is derived from.
The subscript $(N-1,N-2)$ denotes its row and column indices.

The above notation thus means the $(N-1)$th row, $(N-2)$th column block of the inverse of the Hamiltonian of size $(N,N)$. Since the Hamiltonian loses one row and one column each time it is partitioned, its size is a unique indication of partition progress.
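Equations (3.77) and (3.78) state that the lower-right block of the inverse of a partitioned matrix is the inverse of a Schur complement. The following sketch checks this numerically against a direct inversion. It assumes NumPy is available; the function name and the random test matrices are hypothetical and serve only as an illustration.

```python
import numpy as np


def lower_right_block_of_inverse(D00, S01, S10, D11):
    """g_11 = (D_11 - S_10 D_00^{-1} S_01)^{-1}, as in eq. (3.77)."""
    return np.linalg.inv(D11 - S10 @ np.linalg.inv(D00) @ S01)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n0, n1 = 6, 2  # hypothetical block sizes: "rest of device" and "last layer"
    D00 = rng.normal(size=(n0, n0)) + 6.0 * np.eye(n0)
    D11 = rng.normal(size=(n1, n1)) + 6.0 * np.eye(n1)
    S01 = 0.3 * rng.normal(size=(n0, n1))
    S10 = 0.3 * rng.normal(size=(n1, n0))

    H = np.block([[D00, S01], [S10, D11]])
    direct = np.linalg.inv(H)[n0:, n0:]           # lower-right block of H^{-1}
    partitioned = lower_right_block_of_inverse(D00, S01, S10, D11)
    print(np.allclose(direct, partitioned))
```

Note that the two off-diagonal blocks need not be square; only the diagonal blocks must be.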

If we write it down in matrix form, we find a dramatic reduction:

$$\begin{aligned}
g^{N,N}_{N,N} &= \left[D_{N,N} -
\begin{pmatrix} 0 & \cdots & S_{N,N-1} \end{pmatrix}
\begin{pmatrix}
g^{N-1,N-1}_{1,1} & \cdots & g^{N-1,N-1}_{1,N-1} \\
\vdots & \ddots & \vdots \\
g^{N-1,N-1}_{N-1,1} & \cdots & g^{N-1,N-1}_{N-1,N-1}
\end{pmatrix}
\begin{pmatrix} 0 \\ \vdots \\ S_{N-1,N} \end{pmatrix}\right]^{-1} \\
&= \left[D_{N,N} - S_{N,N-1}\,g^{N-1,N-1}_{N-1,N-1}\,S_{N-1,N}\right]^{-1}
\end{aligned} \tag{3.79}$$

Now, we have not only reduced the matrix size, but also expressed the relation between adjacent diagonal blocks of the retarded Green's function. In order to determine $g^{N,N}_{N,N}$, we need to determine $g^{N-1,N-1}_{N-1,N-1}$, which is the lower-right diagonal block of the inverse of the Hamiltonian of all layers to the left of layer $N$. It is very important to realize (this is easily seen from the figure) that

$$g^{N-1,N-1}_{N-1,N-1} \neq g^{N,N}_{N-1,N-1} \tag{3.80}$$

$g^{N-1,N-1}_{N-1,N-1}$ is the last diagonal block of a new Green's function, not the second last diagonal block of the old one, which is $g^{N,N}_{N-1,N-1}$. In other words, we have moved away from the old matrix, as shown in figure (3.2). We can re-apply the recursive formula to $g^{N-1,N-1}_{N-1,N-1}$:

$$g^{N-1,N-1}_{N-1,N-1} = \left[D_{N-1,N-1} - S_{N-1,N-2}\,g^{N-2,N-2}_{N-2,N-2}\,S_{N-2,N-1}\right]^{-1} \tag{3.81}$$

Keep applying the recursive formula,

$$g^{N-2,N-2}_{N-2,N-2} = \left[D_{N-2,N-2} - S_{N-2,N-3}\,g^{N-3,N-3}_{N-3,N-3}\,S_{N-3,N-2}\right]^{-1} \tag{3.82}$$

$$\vdots$$

$$g^{1,1}_{1,1} = \left(D_{1,1}\right)^{-1} \tag{3.83}$$

With each recursion, the Hamiltonian that needs to be inverted shrinks by one principal layer toward the left. It is as if we are progressing from the right to the left. Thus, this inverse of the partial Hamiltonian in each step is called the left-connected

Green's function. If we keep recursing and reach the end, we are left with just the inversion of the Hamiltonian of one last principal layer. After we have obtained $g^{1,1}_{1,1}$, we reverse what we did and recurse backward toward $g^{N,N}_{N,N}$. After $N-1$ steps, we obtain $g^{N,N}_{N,N}$ along with all the other $g$ on the way.

Fig. 3.2. Illustration of the recursive partition of the Hamiltonian matrix and the chain of equations associated with it.

Our ultimate goal is to find the tri-diagonal blocks of the Green's function, and we currently have only its very lower-right block, $G_{N,N} = g^{N,N}_{N,N}$. Observe that eq. (3.60) gives

$$G_{1:N-1,N} = -D^{-1}_{1:N-1,1:N-1}\,S_{1:N-1,N}\,G_{N,N}$$

As we did for eq. (3.79), we can write it in matrix form and reduce it

$$\begin{aligned}
\begin{pmatrix} G_{1,N} \\ \vdots \\ G_{N-1,N} \end{pmatrix}
&= -\begin{pmatrix}
g^{N-1,N-1}_{1,1} & \cdots & g^{N-1,N-1}_{1,N-1} \\
\vdots & \ddots & \vdots \\
g^{N-1,N-1}_{N-1,1} & \cdots & g^{N-1,N-1}_{N-1,N-1}
\end{pmatrix}
\begin{pmatrix} 0 \\ \vdots \\ S_{N-1,N} \end{pmatrix} G_{N,N} \\
G_{N-1,N} &= -g^{N-1,N-1}_{N-1,N-1}\,S_{N-1,N}\,G_{N,N}
\end{aligned} \tag{3.84}$$

Similarly, for $G_{N,N-1}$ we can find a reduced expression using eq. (3.63):

$$\begin{aligned}
G_{N,1:N-1} &= -G_{N,N}\,S_{N,1:N-1}\,D^{-1}_{1:N-1,1:N-1} \\
G_{N,N-1} &= -G_{N,N}\,S_{N,N-1}\,g^{N-1,N-1}_{N-1,N-1}
\end{aligned} \tag{3.85}$$

With the two off-diagonal elements of eq. (3.84) and eq. (3.85) known, we can determine the next diagonal block via eq. (3.59). As before, we can reduce it:

$$G_{1:N-1,1:N-1} = D^{-1}_{1:N-1,1:N-1} - D^{-1}_{1:N-1,1:N-1}\,S_{1:N-1,N}\,G_{N,1:N-1} \tag{3.86}$$

$$G_{N-1,N-1} = g^{N-1,N-1}_{N-1,N-1} - g^{N-1,N-1}_{N-1,N-1}\,S_{N-1,N}\,G_{N,N-1} \tag{3.87}$$

$G_{N,N-1}$ has been determined by eq. (3.85). In order to determine the next block of the Green's function, $G_{N-2,N-2}$, we need to repartition $G$. Dyson's equation once again tells us

$$G_{1:N-2,N-1:N} = -D^{-1}_{1:N-2,1:N-2}\,S_{1:N-2,N-1:N}\,G_{N-1:N,N-1:N} \tag{3.88}$$

This in matrix form is

$$\begin{pmatrix}
G_{1,N-1} & G_{1,N} \\
\vdots & \vdots \\
G_{N-2,N-1} & G_{N-2,N}
\end{pmatrix}
= -\begin{pmatrix}
g^{N-2,N-2}_{1,1} & \cdots & g^{N-2,N-2}_{1,N-2} \\
\vdots & \ddots & \vdots \\
g^{N-2,N-2}_{N-2,1} & \cdots & g^{N-2,N-2}_{N-2,N-2}
\end{pmatrix}
\begin{pmatrix}
0 & 0 \\
\vdots & \vdots \\
S_{N-2,N-1} & 0
\end{pmatrix}
\begin{pmatrix}
G_{N-1,N-1} & G_{N-1,N} \\
G_{N,N-1} & G_{N,N}
\end{pmatrix} \tag{3.89}$$

Fig. 3.3. After the first recursion process, we forward partition again to obtain the diagonal and nearest off-diagonal blocks. This time, however, we partition the blocks differently.

Similarly, we can obtain

$$G_{N-2,N-1} = -g^{N-2,N-2}_{N-2,N-2}\,S_{N-2,N-1}\,G_{N-1,N-1} \tag{3.90}$$

$$G_{N-1,N-2} = -G_{N-1,N-1}\,S_{N-1,N-2}\,g^{N-2,N-2}_{N-2,N-2} \tag{3.91}$$

$$G_{N-2,N-2} = g^{N-2,N-2}_{N-2,N-2} - g^{N-2,N-2}_{N-2,N-2}\,S_{N-2,N-1}\,G_{N-1,N-2} \tag{3.92}$$

If we keep re-partitioning like this, we eventually obtain every diagonal and nearest off-diagonal block of the retarded Green's function for the device. The general equations can be summarized as follows, for the nth diagonal block of $G$:

$$G_{n,n+1} = -g^{n,n}_{n,n}\,S_{n,n+1}\,G_{n+1,n+1} \tag{3.93}$$

$$G_{n+1,n} = -G_{n+1,n+1}\,S_{n+1,n}\,g^{n,n}_{n,n} \tag{3.94}$$

$$G_{n,n} = g^{n,n}_{n,n} - g^{n,n}_{n,n}\,S_{n,n+1}\,G_{n+1,n} \tag{3.95}$$
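The forward sweep of eqs. (3.81) to (3.83) and the backward sweep of eqs. (3.93) to (3.95) can be sketched as below and checked against a direct inversion. This is a minimal illustration, not the nanomos implementation: it assumes NumPy, equal block sizes, and zero-based layer indices, and the names `rgf_tridiagonal_blocks` and `assemble` are hypothetical.

```python
import numpy as np


def rgf_tridiagonal_blocks(D, S_up, S_dn):
    """Diagonal and nearest off-diagonal blocks of G = H^{-1}.

    D[n]    : diagonal block D_{n,n}
    S_up[n] : coupling block S_{n,n+1}
    S_dn[n] : coupling block S_{n+1,n}
    """
    N = len(D)
    # Forward sweep: left-connected Green's functions g^{n,n}_{n,n}.
    g = [np.linalg.inv(D[0])]
    for n in range(1, N):
        g.append(np.linalg.inv(D[n] - S_dn[n - 1] @ g[n - 1] @ S_up[n - 1]))

    # Backward sweep: the last left-connected block is already exact.
    G_diag = [None] * N
    G_upper = [None] * (N - 1)   # G_{n,n+1}
    G_lower = [None] * (N - 1)   # G_{n+1,n}
    G_diag[N - 1] = g[N - 1]
    for n in range(N - 2, -1, -1):
        G_upper[n] = -g[n] @ S_up[n] @ G_diag[n + 1]     # eq. (3.93)
        G_lower[n] = -G_diag[n + 1] @ S_dn[n] @ g[n]     # eq. (3.94)
        G_diag[n] = g[n] - g[n] @ S_up[n] @ G_lower[n]   # eq. (3.95)
    return G_diag, G_upper, G_lower


def assemble(D, S_up, S_dn):
    """Full block tri-diagonal matrix, used here only for verification."""
    N, b = len(D), D[0].shape[0]
    H = np.zeros((N * b, N * b), dtype=complex)
    for n in range(N):
        H[n * b:(n + 1) * b, n * b:(n + 1) * b] = D[n]
    for n in range(N - 1):
        H[n * b:(n + 1) * b, (n + 1) * b:(n + 2) * b] = S_up[n]
        H[(n + 1) * b:(n + 2) * b, n * b:(n + 1) * b] = S_dn[n]
    return H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, b = 5, 3  # hypothetical: 5 layers with 3 x 3 blocks
    D = [rng.normal(size=(b, b)) + 5.0 * np.eye(b) for _ in range(N)]
    S_up = [0.3 * rng.normal(size=(b, b)) for _ in range(N - 1)]
    S_dn = [0.3 * rng.normal(size=(b, b)) for _ in range(N - 1)]
    G_full = np.linalg.inv(assemble(D, S_up, S_dn))
    G_diag, _, _ = rgf_tridiagonal_blocks(D, S_up, S_dn)
    print(all(np.allclose(G_diag[n], G_full[n * b:(n + 1) * b, n * b:(n + 1) * b])
              for n in range(N)))
```

Only matrices of the block size are ever inverted, which is where the savings over a full inversion come from.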

Fig. 3.4. Illustration of the second forward partition process. The red blocks are the ones already determined, and the green blocks are the ones to be determined in that recursion step. Notice that, unlike the recursive steps in figure (3.2), here we stay within the same Green's function matrix and recursively partition it.

Dyson's equation for electron density Green's function

In order to evaluate the electron density, one needs the electron density Green's function, defined as

$$H\,G^n = \Sigma\,G^\dagger \tag{3.96}$$

The Dyson's equation solving this system is

$$G^n = G^0\,U\,G^n + G^0\,\Sigma\,G^\dagger \tag{3.97}$$

$$\phantom{G^n} = G^0\,U\,G^n + G^0\,\Sigma\left(G^{0\dagger} + G^{0\dagger}\,U^\dagger\,G^\dagger\right) \tag{3.98}$$

$$\phantom{G^n} = G^0\,\Sigma\,G^{0\dagger} + G^0\,U\,G^n + G^0\,\Sigma\,G^{0\dagger}\,U^\dagger\,G^\dagger \tag{3.99}$$

$$\phantom{G^n} = G^0\,\Sigma\,G^{0\dagger} + G^n\,U^\dagger\,G^{0\dagger} + G\,U\,G^0\,\Sigma\,G^{0\dagger} \tag{3.100}$$

As demonstrated in the previous section, we can follow similar steps and obtain a recursive formula for the electron density Green's function

$$g^n_{1,1} = g_{1,1}\left(\Sigma_{1,1} + H_{1,0}\,g^n_{0,0}\,H_{0,1}\right)g^\dagger_{1,1} \tag{3.101}$$

3.5 Simple NEGF formalism

Introduction

With the knowledge of the retarded Green's function and the electron correlation function, evaluation of electron density and current follows the standard NEGF formalism. In this section, we simply show the expressions used in nanomos and refer to standard texts and articles for most of the derivations [39] [29] [38] [40].

Ballistic NEGF electron density

The retarded Green's function is simply defined as

$$G(E) = \left[\left(E + i\eta^+\right)I - H - \Sigma\right]^{-1} \tag{3.102}$$

and the advanced Green's function is defined as

$$G^+(E) = \left[\left(E - i\eta^+\right)I - H - \Sigma^+\right]^{-1} \tag{3.103}$$

$\eta^+$ is an infinitesimally small positive number added to the energy. It is important both conceptually and computationally.
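As a sanity check on eq. (3.102), the sketch below evaluates the retarded Green's function of a uniform 1D chain terminated by contact self-energies of the form $-t\,e^{ika}$. With exact open boundaries the finite device should be indistinguishable from an infinite chain, whose diagonal Green's function is $-i/(2t\sin ka)$. NumPy is assumed, the function names are hypothetical, and the parameter values are arbitrary.

```python
import numpy as np


def uniform_chain(n, t, onsite):
    """Tridiagonal 1D Hamiltonian: diagonal 2t + onsite, nearest-neighbor coupling -t."""
    return (2.0 * t + onsite) * np.eye(n) - t * np.eye(n, k=1) - t * np.eye(n, k=-1)


def contact_self_energy(E, t, onsite):
    """-t exp(i k a), with k a taken from the 1D dispersion; in-band energies only."""
    cos_ka = 1.0 - (E - onsite) / (2.0 * t)
    if abs(cos_ka) > 1.0:
        raise ValueError("energy lies outside the contact band")
    return -t * np.exp(1j * np.arccos(cos_ka))


def retarded_green(E, H, sigma_s, sigma_d, eta=1e-12):
    """G(E) = [(E + i eta) I - H - Sigma]^{-1}, with Sigma on the two end nodes."""
    n = H.shape[0]
    sigma = np.zeros((n, n), dtype=complex)
    sigma[0, 0] += sigma_s
    sigma[-1, -1] += sigma_d
    return np.linalg.inv((E + 1j * eta) * np.eye(n) - H - sigma)


if __name__ == "__main__":
    t, onsite, ka = 1.0, 0.2, 0.9  # hypothetical values
    E = onsite + 2.0 * t * (1.0 - np.cos(ka))
    s = contact_self_energy(E, t, onsite)
    G = retarded_green(E, uniform_chain(6, t, onsite), s, s)
    print(np.diag(G))                     # every entry is the same
    print(-1j / (2.0 * t * np.sin(ka)))   # infinite-chain value
```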

Now, we define the coupling matrices for source and drain

$$\Gamma_S = i\left(\Sigma_S - \Sigma_S^+\right) \tag{3.104}$$

$$\Gamma_D = i\left(\Sigma_D - \Sigma_D^+\right) \tag{3.105}$$

Once the coupling matrix $\Gamma$ is constructed, we can obtain the source and drain partial spectral functions by

$$A_S = G\,\Gamma_S\,G^+ \tag{3.106}$$

$$A_D = G\,\Gamma_D\,G^+ \tag{3.107}$$

The diagonal term of the spectral function is the density of states at each node. We can thus find the 1D electron density at each node in subband $i$ and transverse mode $j$, which is a combination of electrons streaming in from both source and drain:

$$n_{i,j} = \frac{1}{2\pi a}\,G^n \tag{3.108}$$

with the electron density function

$$G^n = f_S\,A_S + f_D\,A_D \tag{3.109}$$

and the Fermi functions evaluated at the source and drain quasi-Fermi levels

$$f_S = f\left(\mu_S - E - E_i - E_{k_j}\right) \tag{3.110}$$

$$f_D = f\left(\mu_D - E - E_i - E_{k_j}\right) \tag{3.111}$$

The factor $1/a$ distributes a point charge density over an interval. The 1D density of states in the transverse direction is [12]

$$g_{1D} = \frac{m_y^*}{\pi\hbar^2}\,\frac{2\hbar}{\sqrt{2m_y^*E_j}} = \frac{2}{\pi\hbar}\sqrt{\frac{m_y^*}{2E_j}} \tag{3.112}$$

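We integrate across all transverse modes for the 2D electron density,

$$\begin{aligned}
n_i &= \int_0^\infty n_{i,j}\,\frac{2}{\pi\hbar}\sqrt{\frac{m_y^*}{2E_j}}\,dE_j \\
&= \frac{1}{2\pi a}\int_0^\infty \frac{2}{\pi\hbar}\sqrt{\frac{m_y^*}{2E_j}}\left(f_S\,A_S + f_D\,A_D\right)dE_j \\
&= \frac{1}{\hbar a}\sqrt{\frac{m_y^*k_BT}{2\pi^3}}\left[F_{-1/2}\left(\mu_S - E - E_i\right)A_S + F_{-1/2}\left(\mu_D - E - E_i\right)A_D\right]
\end{aligned} \tag{3.113}$$

We then need to sum over all possible subbands to reach the 3D electron density

$$\begin{aligned}
n(E) &= \sum_{i=1}^{n} n_i \\
&= \sum_{i=1}^{n}\frac{1}{\hbar a}\sqrt{\frac{m_y^*k_BT}{2\pi^3}}\left[F_{-1/2}\left(\mu_S - E - E_i\right)A_S + F_{-1/2}\left(\mu_D - E - E_i\right)A_D\right]
\end{aligned} \tag{3.114}$$

This only tells us the density of electrons at a certain energy, and we need to integrate over all possible energies to obtain the total electron density

$$\begin{aligned}
n &= \int_0^\infty n(E)\,dE \\
&= \int_0^\infty \sum_{i=1}^{n}\frac{1}{\hbar a}\sqrt{\frac{m_y^*k_BT}{2\pi^3}}\left[F_{-1/2}\left(\mu_S - E - E_i\right)A_S + F_{-1/2}\left(\mu_D - E - E_i\right)A_D\right]dE
\end{aligned} \tag{3.115}$$

Ballistic NEGF current

The current at a certain energy should be the same whether evaluated at the source or at the drain, since there are no scattering events and only two terminals are present. Here we evaluate it at the source end. The current can be found by [12]

$$I = \frac{q}{h}\int I(E)\,dE \tag{3.116}$$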

$$I(E) = \mathrm{Trace}\left(\Gamma_S\,A\right)f_S - \mathrm{Trace}\left(\Gamma_S\,G^n\right) \tag{3.117}$$

The factor $1/h$ in front of the energy integration arises from the time derivative of the Schrödinger equation, and $q$ is just the charge of an electron. $A$ is the complete spectral function, defined as

$$A = i\left(G - G^+\right) = A_S + A_D \tag{3.118}$$

$G^n$ is the electron density function defined in (3.109). Substituting into eq. (3.117) and reducing, we get

$$\begin{aligned}
I(E) &= \mathrm{Trace}\left(\Gamma_S\,A_S + \Gamma_S\,A_D\right)f_S - \mathrm{Trace}\left(\Gamma_S\,f_S\,A_S + \Gamma_S\,f_D\,A_D\right) \\
&= \mathrm{Trace}\left(\Gamma_S\,A_D\,f_S - \Gamma_S\,A_D\,f_D\right) \\
&= \mathrm{Trace}\left(\Gamma_S\,G\,\Gamma_D\,G^+\right)\left(f_S - f_D\right) \\
&= T_{SD}(E)\left(f_S - f_D\right)
\end{aligned} \tag{3.119}$$

$T_{SD}$ is the transmission from source to drain at energy $E$, and it is defined as

$$T_{SD}(E) = \mathrm{Trace}\left(\Gamma_S\,G\,\Gamma_D\,G^+\right) \tag{3.120}$$

Thus, the current is simply

$$I\left(E_i, E_{k_j}\right) = \frac{q}{h}\int_0^\infty T_{SD}(E)\left[f\left(\mu_S - E - E_i - E_{k_j}\right) - f\left(\mu_D - E - E_i - E_{k_j}\right)\right]dE \tag{3.121}$$

To obtain the total current, we need to follow the usual steps and integrate or sum over all transverse energies, subbands, and valleys. Over all transverse energies, we obtain

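$$\begin{aligned}
I(E_i) &= \frac{q}{h}\int_0^\infty\!\!\int_0^\infty T_{SD}(E)\,\frac{2}{\pi\hbar}\sqrt{\frac{m_y^*}{2E_j}}\left[f\left(\mu_S - E - E_i - E_{k_j}\right) - f\left(\mu_D - E - E_i - E_{k_j}\right)\right]dE\,dE_{k_j} \\
&= \frac{q}{h}\,\frac{2}{\hbar}\sqrt{\frac{m_y^*}{2\pi}}\int_0^\infty\!\!\int_0^\infty T_{SD}(E)\,\frac{1}{\sqrt{\pi E_j}}\left[f\left(\mu_S - E - E_i - E_{k_j}\right) - f\left(\mu_D - E - E_i - E_{k_j}\right)\right]dE\,dE_{k_j} \\
&= \frac{2q}{h\hbar}\sqrt{\frac{m_y^*}{2\pi}}\int_0^\infty\!\!\int_0^\infty T_{SD}(E)\,\frac{1}{\sqrt{\pi E_j}}\left[f\left(\mu_S - E - E_i - E_{k_j}\right) - f\left(\mu_D - E - E_i - E_{k_j}\right)\right]dE\,dE_{k_j} \\
&= \frac{q}{\hbar^2}\sqrt{\frac{m_y^*k_BT}{2\pi^3}}\int_0^\infty\left[F_{-1/2}\left(\mu_S - E - E_i\right) - F_{-1/2}\left(\mu_D - E - E_i\right)\right]T_{SD}(E)\,dE
\end{aligned} \tag{3.122}$$

where the last step uses $h = 2\pi\hbar$. Over all subbands, we obtain

$$\begin{aligned}
I &= \sum_{i=1}^{n} I(E_i) \\
&= \int_0^\infty \sum_{i=1}^{n}\frac{q}{\hbar^2}\sqrt{\frac{m_y^*k_BT}{2\pi^3}}\left[F_{-1/2}\left(\mu_S - E - E_i\right) - F_{-1/2}\left(\mu_D - E - E_i\right)\right]T_{SD}(E)\,dE
\end{aligned} \tag{3.123}$$

This is the final current to be evaluated in nanomos.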
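The Fermi-Dirac integral of order $-1/2$ that appears in eqs. (3.113) and (3.122) has no closed form, so it has to be evaluated numerically. The sketch below is one simple way to do so, not necessarily the method used in nanomos. It assumes the normalization $F_{-1/2}(\eta) = \pi^{-1/2}\int_0^\infty x^{-1/2}/(1 + e^{x-\eta})\,dx$ with the argument $\eta$ already divided by $k_BT$, and it removes the endpoint singularity with the substitution $x = u^2$. The function name and step count are hypothetical.

```python
import math


def fermi_dirac_minus_half(eta, steps=4000):
    """F_{-1/2}(eta) = (2 / sqrt(pi)) * integral_0^inf du / (1 + exp(u^2 - eta)).

    Trapezoidal rule on a truncated interval; the integrand is negligible
    once u^2 - eta exceeds about 40.
    """
    upper = math.sqrt(max(eta, 0.0) + 40.0)
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        z = u * u - eta
        if z > 700.0:          # exp would overflow; the term is effectively zero
            continue
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight / (1.0 + math.exp(z))
    return 2.0 / math.sqrt(math.pi) * total * h


if __name__ == "__main__":
    for eta in (-10.0, 0.0, 5.0, 50.0):
        print(eta, fermi_dirac_minus_half(eta))
```

Two limits make a convenient check: for strongly negative $\eta$ the integral approaches $e^{\eta}$ (Boltzmann statistics), and for large positive $\eta$ it approaches $2\sqrt{\eta/\pi}$.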

4. NANOMOS 4.0 APPLICATION EXAMPLE

4.1 Introduction

In previous chapters, we discussed the underlying structure of nanomos and the methods deployed to simulate supported devices. In this chapter, we use nanomos to simulate an undoped-body, extremely thin SOI MOSFET with back gate, which was recently reported by IBM Corporation [41]. We examine the internal physical operation of this device and benchmark nanomos simulations against experimentally measured results. Finally, we compare three different transport models: 1) quantum ballistic, 2) semiclassical ballistic, and 3) drift-diffusion, and discuss the differences.

The chapter begins with an overview of the SOI MOSFET structure, after which we compare it with other popular candidates for nanoscale MOSFETs and discuss the advantages and disadvantages of each device structure. Next, clear step-by-step instructions on how to use nanomos 4.0 to set up the geometry and perform simulations are given. Key details on the simulation methods are also discussed. We then discuss the simulation results, focusing on internal quantities such as carrier and current densities, electric fields and electrostatic potentials, local density of states, etc. Next, we examine the I-V characteristics and subthreshold behavior to reveal the advantage of back gate control in such SOI MOSFETs. Finally, we repeat the simulations for semiclassical ballistic and then drift-diffusion transport, and compare and contrast the results with the quantum ballistic simulations.

Fig. 4.1. Fabricated device structure by IBM Corporation. The curved cut in the buried oxide layer indicates a thick region not shown in the figure. The nanomos simulation region is enclosed by the dashed line.

4.2 Model Device: an Undoped-body Extremely Thin SOI MOSFET with Back Gate

The device we choose to model is displayed above (Figure 4.1). It is an undoped-body, extremely thin SOI MOSFET with back gate. As its name suggests, its channel is made of undoped silicon. Undoped channels have the advantage of lacking ionized impurity scattering, thus potentially giving carriers higher mobility. They also avoid the complications that arise from doping the channel, especially when the channel gets short. The source and drain as well as the extension regions are doped by ion implantation. The channel and the poly-silicon top gate are separated by a thin layer of SiON oxide, which serves as the insulator. The back of the device sits entirely on a thick layer of buried oxide, on which a back gate is affixed.

Compared to the traditional bulk silicon MOSFET, this SOI MOSFET replaces the bulk with an insulating oxide. This thick layer of buried oxide helps confine electrons within the thin body channel, desirably suppressing 2D electrostatics for lower off-current, as in the case of the double-gate thin body MOSFET. In addition, the

back gate is able to bend the conduction band in the transverse direction, and as a result the electrons are pushed toward the front gate for better electrostatic control. By biasing the back gate, we can also adjust the threshold voltage to a desirable value. On top of the physics advantages, SOI MOSFETs are easy to fabricate using existing technology, since they are planar like the traditional bulk silicon MOSFET.

In order to simulate this structure using nanomos 4.0, we have to confine our simulation region to a smaller portion than the entire device. As discussed in previous chapters, nanomos 4.0 can only handle 1D transport with the mode space approach. This limits our interest to the region where the current flows only along the channel in the x-direction (shown in the figure). We thus have to exclude the source/drain and the regions beyond, since current flows vertically in these regions. In addition, a larger region adds computational burden to the simulation. However, doing so does not compromise the conclusions: we are only interested in the solutions in the channel region in order to evaluate the behavior of the device. The simulation region is enclosed by the dashed line in Figure 4.1. We follow the specifications given in [41] and set the parameters in nanomos as follows to best resemble the fabricated device.

channel length = 47 nm
source/drain included length = 10 nm
source/drain overlap with gate = 1 nm
channel thickness = 8.6 nm
top oxide thickness = 1.1 nm
buried oxide layer thickness = 145 nm
source/drain doping = 1E20 /cm^3
body doping = 0 /cm^3
source/drain doping slope = 0 nm/dec (abrupt doping)

Results: Internal Quantities

Introduction

In this section, we concentrate on some important internal quantities that nanomos computes. All these plots are obtainable using the plotting function saveoutput.m within the nanomos source code. We explain in detail what each plot means and how it is obtained. With the device geometry parameters given in the previous section, the biasing conditions are configured as follows:

front gate bias = 0.4 V
back gate bias = -20 V
drain bias = 1 V

We only probe plots from this one bias; plots related to the I-V characteristics are discussed in the next section. As suggested by the experimental data [41], such biasing conditions place the SOI device in the on-state with saturated on-current. In addition, the transport model we choose is quantum ballistic via NEGF. It treats carriers quantum mechanically, but scattering is absent.

Internal Quantities

Electrostatic potential profile

One of the most explanatory plots of device behavior is the electrostatic potential profile. It illustrates the potential landscape an electron effectively sees while traveling within the device. This plot is obtained from the solution of the 2D Poisson equation, which is solved self-consistently with the transport equation. One important thing to keep in mind: the plots generated here are raw data taken

straight from the solution of the Poisson solver, which does not include the different electron affinities that materials may have.

Fig. 4.2. Electrostatic potential profile of the entire device region. Most of the plot is occupied by the buried oxide layer, which is 145 nm thick.

Fig. 4.3. Electrostatic potential profile focusing on the channel region. x is the transport direction; z is the transverse direction.

For example, the oxide layer has a much lower

electron affinity than the silicon channel, so the actual conduction band profile an electron sees has that affinity difference added in the oxide region.

Fig. 4.4. Bottom conduction band profile in the transverse direction.

The first plot, Figure 4.2 in 3D, shows the electrostatic profile of the entire simulation region. The majority of the plot is occupied by the buried oxide layer, whose potential profile has a constant slope, indicating in this case a lack of electrons due to its high dielectric constant. More interesting facts can be found by focusing the plot around the channel region, which is shown in Figure 4.3. This zoomed-in version of Figure 4.2 shows a downward bending of the conduction band toward the front gate. This is where electrons reside, since they favor lower energy. The situation here is similar to that in a bulk MOSFET: electrons are induced near the interface of the front gate oxide and the channel. What is different now is that the back gate can bend the conduction band so that the electron population moves closer to the front gate. This desirably gives the front gate better control.

Figure 4.4 is the conduction band profile at the top of the barrier in the transverse direction. The top of the barrier is the point where the conduction band is highest in the channel. nanomos finds this point and plots the conduction band profile in

transverse direction. Internally, this plot gives us a better idea of what kind of potential is used in the Schrödinger equation to solve for the subbands.

Electron density distribution

Fig. 4.5. Electron density distribution of the entire device. The majority of the plot is void of electrons, corresponding to the buried oxide layer.

Figure 4.5 shows the electron density distribution in the entire device region. It is easy to see that the electron population concentrates within the channel region, leaving the buried oxide void of electrons, as expected for its high dielectric constant. Figure 4.6 focuses on the channel region. The electron density is obtained as a solution to the 1D Schrödinger equation along the transport direction via NEGF. It is then expanded according to the wavefunctions in the transverse direction to form the full 2D electron density profile. We now examine this plot along both the transport and transverse directions.

In the transport direction, we can understand the distribution by simply considering its rough dynamic behavior. Electrons populate the source side abundantly, and those with energy higher than the top of the barrier are injected into the channel. Since we are treating electrons quantum mechanically, even though electrons have

energy higher than the top of the barrier, they still have a chance to be reflected back into the source, and this is why the electron density keeps decreasing in the channel toward the drain.

Fig. 4.6. Electron density distribution focusing on the channel region. Due to quantum confinement in the transverse direction, the electron density exhibits a wave-like pattern in the source and drain. The electron density in the channel is low compared to that in the source and drain, so it is difficult to observe in this plot.

Fig. 4.7. Electron density distribution in the transport direction. Notice the log scale of this plot.

Inside the drain, the injected electrons that made it across the channel accumulate. Electrons are

also injected from the right side, but under high drain bias, we can roughly ignore the contribution to the current from the drain, since most of these electrons are reflected back by the barrier.

Since nanomos uses an uncoupled mode space approach, the electron density is solved individually in each mode. All the simulations done here include electron penetration; electrons are allowed to penetrate into potential barriers where they are classically forbidden. When the Schrödinger equation is solved across the transverse direction, electron affinities are added to the potential profile, so oxide regions have a high but finite barrier. A small number of electrons can exist in the barrier as a result. If electron penetration is turned off, the barrier is simply set to infinity while solving the Schrödinger equation, so no electron exists in the barrier.

Average carrier velocity

Fig. 4.8. Average carrier velocity along the channel.

Current is conserved throughout the device as a result of satisfying the continuity equation; it is the product of the electron concentration and its velocity. From the

111 98 previous electron density plot and the current, we can easily derive the average carrier velocity, which is shown in Figure 4.8. This velocity is merely an average over all the electrons at that point; it does not describe the true velocity of individual electrons.

Transmission

Fig. Energy vs. transmission coefficient. The 0 eV reference is the quasi-Fermi level in the source, from which electrons are injected.

In ballistic NEGF, the current is determined via the transmission coefficient. It is a quantity indicating the probability that an electron is transmitted through the channel in a single subband. In the case of multiple subbands, as in our case, it becomes a simple sum of the transmission coefficients of all subbands. In our simulation, where we take into consideration two subbands from each of the two valleys, the maximum obtainable transmission is thus four. The details of how it is obtained numerically can be found in the previous chapter; here we concentrate on this result in particular.
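As a rough illustration of how the per-subband transmissions add up, the sketch below uses an idealized step-like transmission for each subband: fully transmitted at or above that subband's barrier top, fully reflected below it. This is a simplification and not the NEGF calculation used in nanomos; the function name and the barrier energies are hypothetical and chosen only for illustration.

```python
def total_transmission(energy_ev, barrier_tops_ev):
    """Sum idealized step-like transmissions over all subbands.

    Each subband contributes 1 when the energy is at or above the top of
    its barrier and 0 otherwise, so the total is capped at the number of
    subbands included in the simulation.
    """
    return sum(1 for top in barrier_tops_ev if energy_ev >= top)


# Hypothetical barrier tops (eV) for four subbands; illustrative values only.
barrier_tops = [-0.10, 0.02, 0.03, 0.15]

for energy in (-0.20, 0.00, 0.05, 0.30):
    print(f"E = {energy:+.2f} eV -> T = {total_transmission(energy, barrier_tops)}")
```

With four subbands included, the total can never exceed four, which mirrors the saturation described above.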

112 99 Figure 4.9 shows the transmission coefficient vs. injected electron energy. It indicates that electrons with energy below approximately -0.1 eV have negligible transmission. Above this cut-off energy there is significant transmission, meaning an electron has a good chance of reaching the drain from the source. The 0 eV reference corresponds to the quasi-Fermi level in the source. The cut-off energy roughly corresponds to the top of the barrier of the first conduction subband. Electrons above this energy can hop over the barrier and reach the drain. As energy increases, electrons are able to hop over the barriers of the second subband, the third, and so on, hence the rise of the transmission coefficient. Once the energy reaches the top of the fourth subband, the transmission coefficient ceases to increase, since no more subbands are available for conduction. This is only true in this particular simulation; if we include more subbands, the transmission coefficient can keep increasing. However, the higher the energy, the fewer electrons reside there, since it lies in the far tail of the Fermi distribution. Saving computational time while keeping the results accurate is the main principle in choosing how many subbands to take into consideration.

The last detail is the smooth appearance of the transmission curve. At a given quasi-Fermi level, electrons are injected according to the Fermi distribution. At 0 K, the Fermi distribution is a sharp step (its energy derivative is a delta function), meaning electrons are injected up to one sharply defined energy level. Therefore, electrons are either completely transmitted or reflected, and the resulting transmission curve is step-like. At finite temperature, however, this energy level is broadened, so partial transmission of the entire distribution may occur, giving the transmission curve a smooth look. In our simulation, we set the temperature to 300 K.

Local Density of States (LDOS)

Electrons can be injected from both source and drain in NEGF. They are injected into allowable energy states.
In the transverse direction, due to quantum

113 100 Fig. LDOS induced by the source.

Fig. LDOS induced by the drain.

confinement, states are separated by fairly large energy gaps, so we have to treat them individually. In the transport direction, where confinement is lacking, the states are continuous. Since we use the effective mass concept and the E-k diagram consists merely of parabolas, we can employ a simple analytic equation to determine the density of states at a given energy in 1D.
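The text does not spell out the analytic equation. A standard textbook form for a single parabolic subband with edge $E_n$ and effective mass $m^*$, including a factor of 2 for spin, is $g(E) = \frac{1}{\pi\hbar}\sqrt{2m^*/(E - E_n)}$ per unit length. The sketch below evaluates this form; the function name, the default effective mass, and the subband edge are assumptions made for illustration and are not taken from nanomos.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant (J s)
M0 = 9.1093837015e-31    # free electron mass (kg)
Q = 1.602176634e-19      # elementary charge (C), converts eV to J


def dos_1d(energy_ev, subband_edge_ev, m_eff=0.19):
    """1D density of states per unit length and energy, in 1/(eV m).

    Assumes one parabolic subband and a factor of 2 for spin.
    Returns 0 at or below the subband edge, where this sketch
    assigns no states (the analytic form diverges at the edge).
    m_eff is the effective mass in units of the free electron mass
    (0.19 is an illustrative default).
    """
    delta_j = (energy_ev - subband_edge_ev) * Q
    if delta_j <= 0.0:
        return 0.0
    dos_per_joule = math.sqrt(2.0 * m_eff * M0 / delta_j) / (math.pi * HBAR)
    return dos_per_joule * Q  # convert 1/(J m) to 1/(eV m)


# The DOS falls off as 1/sqrt(E - E_n) above the subband edge.
print(dos_1d(0.01, 0.0), dos_1d(0.04, 0.0))
```

The inverse square-root decay above each subband edge is what produces a peak at the edge followed by a gradual decrease.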

114 101 Fig. Full LDOS induced by both source and drain.

This density of states, however, offers no spatial insight, so we have to weight it by the spatial wavefunction. The result is the local density of states. As mentioned in the previous chapter, the LDOS is generalized as the diagonal elements of the spectral function in NEGF.

Figure 4.10 and Figure 4.11 show the LDOS arising from the source and the drain, and their sum is the total LDOS in Figure 4.12. Qualitatively, we can understand the plot as follows: the wave-like pattern arises from the squared amplitude of the wavefunction. The wave decays rapidly into the barrier region, where barely any states exist. As energy increases, the DOS decreases, but when a new subband is encountered, the DOS suddenly peaks and then gradually decreases again. This explains the horizontal peaks of the LDOS inside the source and drain.

Energy resolved current density

Current calculated in NEGF uses the concept of transmission, and transmission coefficients are energy dependent. This suggests that electrons injected at different energy levels may contribute unequally to the total current. On the other hand, the Fermi

115 102 Fig. Energy resolved current density. The white curve is a line illustration of current density vs. energy.

Fig. Energy resolved current density focusing around the quasi-Fermi level at the source.

distributions centered around the quasi-Fermi levels in the source and drain are also energy dependent. The energy resolved current density is defined as the product of the transmission coefficient and the Fermi distribution at that energy point. The Fermi distribution is a

116 103 difference between those arising from the source and the drain, since the total current is the net flow.

Figure 4.13 shows the obtained energy resolved current density. In general, the result is as expected. The strongest current contribution is near the source quasi-Fermi level, where most electrons are injected. Upon close inspection, the two peaks require a more detailed explanation. First of all, the temperature is finite, so the Fermi distribution is not a step function. The probability of state occupation is not highest at the quasi-Fermi level but below it. This explains why the lower peak does not occur right at 0 eV but below it, even though the transmission coefficient increases with energy. The second, smaller peak at around 0.03 eV arises because the second and third subbands have just turned on, and the transmission coefficient increases rapidly there. As the energy continues to increase, however, this peak is reduced by the shrinking tail of the Fermi distribution.

Energy resolved electron density

Fig. Energy resolved electron density.

117 104 Just like the current density, the electron density can also be resolved in energy. The energy resolved electron density gives us a better idea of the energy at which electrons reside in each part of the device. In NEGF, it is evaluated directly through the density matrix $G^n$. The calculation details are explained in the previous chapter. Qualitatively, this plot is closely connected with the LDOS plots presented earlier: it is merely the product of the LDOS with the source/drain Fermi distributions.

Figure 4.15 shows the energy resolved electron density plot. Few electrons reside within the channel, because most states there arise from the drain, as illustrated in Figure 4.11; the quasi-Fermi level of the drain is low in energy, so the product of the LDOS with the weak high-energy tail of the Fermi distribution yields little electron occupation. Another detail is the electrons injected into the channel from the source. Their density decreases across the channel due to quantum reflection. Whether an electron is transmitted or reflected, it stays at the same energy because there is no inelastic scattering. It releases all its energy once it reaches the drain, mostly through inelastic phonon scattering. That process cannot be seen here, but we can assume it happens outside our simulation domain.

4.4 Results: IV Characteristics

Introduction

In this section, we concentrate on the IV characteristics of the model SOI MOSFET and discuss its subthreshold behavior.

I-V characteristics in the ballistic limit via NEGF

Figure 4.16 shows the $I_d$ vs. $V_g$ plot. The five curves correspond to five back gate biases ranging from 0 V to -20 V in steps of -5 V.

118 105 Fig. $I_d$ vs. $V_g$ with various back biases.

Without any back gate bias, the transistor displays fairly poor characteristics, with $I_{on}$ approximately $10^{-3}$ A/um and $I_{off}$ approximately $10^{-4}$ A/um. The on/off current ratio is 10:1. As the back gate bias increases in magnitude, the IV characteristic improves significantly. At its best, with $V_{bg} = -20$ V, the off current is much lower while the on current remains roughly the same, giving an on/off current ratio of 333:1.

The impact of the back gate can be better understood by looking at the electrostatic potential profile and the electron density profile. Figure 4.17 and Figure ?? compare the electrostatic potential profiles at $V_{bg} = 0$ V and $V_{bg} = -20$ V. Figure 4.18 and Figure 4.20 show a cut along the transverse direction, revealing the bending of the conduction band. With a biased back gate, the conduction band is bent upward in the channel. This pushes electrons toward the front gate, giving the front gate better electrostatic control, which is highly desirable for suppressing short channel effects and improving DIBL, subthreshold slope, and off current.

119 106 Fig. Electrostatic potential profile at back gate voltage = 0 V.

Fig. Subband electron density in the transverse direction at back gate voltage = 0 V.

Subthreshold characteristics

With the IV curves and the other plots in hand, we can extract useful information to study the subthreshold behavior of such an SOI MOSFET.

120 107 Fig. Electrostatic potential profile at back gate voltage = -20 V.

Fig. Subband electron density at back gate voltage = -20 V.

The first quantity of interest is the threshold voltage, shown in the figure of threshold voltage vs. gate length. The threshold voltage is extracted via a linear interpolation method from $I_d$ vs. $V_g$ plots for gate lengths ranging from 30 nm to 70 nm. We collected data from both

121 108 Fig. Threshold voltage vs. gate length at linear/saturated current with/without back gate bias.

linear ($V_{SD} = 0.05$ V) and saturated ($V_{SD} = 1$ V) biasing regimes. Applying a back gate bias generally increases the threshold voltage, especially for short channel devices. The $V_t$ roll-off is greatly lessened. This can be understood as a result of better electrostatic control, since the back gate bias shifts the channel electron population toward the gate.

Drain induced barrier lowering (DIBL) occurs when the channel gets short and the electric field from the drain undesirably invades the channel to compete with the field induced by the gate. As a consequence, the channel potential barrier may be lowered or moved toward the source by the drain bias. DIBL can be extracted straightforwardly from the $I_d$ vs.

122 109 Fig. DIBL vs. gate length, with and without back gate bias.

$V_g$ plot by sweeping the drain bias several times. Alternatively, we can extract it from the existing threshold voltage data by

$$\mathrm{DIBL} = \frac{V_t^{lin} - V_t^{sat}}{\left| V_{SD}^{lin} - V_{SD}^{sat} \right|} \qquad (4.1)$$

$$= \frac{V_t^{lin} - V_t^{sat}}{0.95\,\mathrm{V}} \qquad (4.2)$$

The DIBL values extracted using either method look the same, as shown in the figure of DIBL vs. gate length. Without back gate bias, the DIBL effect is significant in such an SOI MOSFET when the channel becomes short. The DIBL effect is greatly reduced once a back gate bias of -20 V is applied.

The IV and subthreshold characteristics above illustrate the beneficial impact of a back gate bias on various short channel effects. In addition, having a back gate allows engineers to adjust the threshold voltage freely to suit their design. The thinner the buried oxide layer, the less variability the device has in threshold voltage.
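Returning to the DIBL extraction in Eq. (4.1) and (4.2), the arithmetic can be sketched as below. The function name and the threshold voltages in the example are hypothetical; only the two drain biases (0.05 V and 1 V) come from the text. The sketch takes the magnitude of the drain bias difference, so that the denominator is 0.95 V as in Eq. (4.2), and reports the result in mV/V, which is an assumed choice of unit.

```python
def dibl_mv_per_v(vt_lin, vt_sat, vsd_lin=0.05, vsd_sat=1.0):
    """DIBL in mV/V from threshold voltages (V) at two drain biases (V).

    vt_lin and vt_sat are the threshold voltages extracted in the linear
    and saturated regimes; the denominator is the magnitude of the drain
    bias difference.
    """
    delta_vsd = abs(vsd_lin - vsd_sat)
    if delta_vsd == 0.0:
        raise ValueError("the two drain biases must differ")
    return 1000.0 * (vt_lin - vt_sat) / delta_vsd


# Hypothetical threshold voltages, for illustration only.
example = dibl_mv_per_v(vt_lin=0.30, vt_sat=0.205)
print(f"DIBL = {example:.1f} mV/V")
```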

123 4.5 Results: Comparison between various transport models

Semiclassical Ballistic vs. Quantum Ballistic

Semiclassical and quantum ballistic transport models differ only in the treatment of electrons in the transport direction. The quantum ballistic transport model employs the Schrödinger equation and solves it via NEGF, so quantum effects such as interference and tunneling are taken into consideration. The semiclassical ballistic transport model, on the other hand, considers only the top of the potential barrier in the channel: everything above it is transmitted and everything below it is reflected. We now compare I-V plots from the two transport models under the same bias conditions.

Fig. $I_d$ vs. $V_g$ comparison between semiclassical and quantum ballistic transport models in linear scale.

Figure 4.24 shows the off-current comparison. The off-current is about the same in both transport models. This can be better understood by examining the channel potential profiles shown in the conduction band profile figure. The SOI device modeled has a channel length of 47 nm. A similar double gate device with a 10 nm channel has been reported

124 111 Fig. $I_d$ vs. $V_g$ comparison between semiclassical and quantum ballistic transport models in log scale.

Fig. Conduction band profile along the channel, zoomed into the beginning of the channel for better comparison.

125 112 to have significantly more off-current in the quantum ballistic model than in the semiclassical treatment [ref]. This is because, in addition to the thermionic current present in both models, a tunneling current exists in the quantum ballistic model. Tunneling current mostly flows near the tip of the channel barrier, where the barrier is thinnest and easiest to penetrate. This can be observed in the energy resolved current density plot. However, the expectation that tunneling current should increase the off-current in the quantum ballistic model does not match the results we obtained. This is because, at low gate bias and a not-too-short channel length of 47 nm, the barrier is fairly thick, and the contribution of tunneling to the total off-current is very small. The results from the two transport models are therefore very close.

As the device begins to turn on, we again observe something a bit unusual: the current from the quantum ballistic model continues to be lower than that from the semiclassical treatment. As the gate bias increases, the top of the barrier becomes lower and thinner, and the tunneling current increases as a result. However, a comparison of the potential profiles within the channel reveals additional details. The self-consistent potential has a lower top of the barrier in the semiclassical model, which allows more current to pass. In addition, this lowered barrier region is where the tunneling current passes, but the tunneled electrons have low velocity. Together, these factors cause the semiclassical current to remain higher than that of the quantum ballistic model.

Although the currents differ, both models have similar electron density distributions along the channel, except at the end of the channel. The current is the product of the average carrier velocity and the electron density, so the average carrier velocity of the semiclassical ballistic model is naturally higher than that of the quantum ballistic model along the channel. This can be attributed to the fact that tunneling and interference can both degrade carrier velocity.
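The relation between current, electron density, and average carrier velocity used in this comparison can be sketched as follows, assuming the current is given per unit device width and the density as a sheet density, so that $I = q\,n\,v$. The function name and the numbers in the example are hypothetical and are not simulation results.

```python
Q = 1.602176634e-19  # elementary charge (C)


def average_velocity(current_per_width, sheet_density):
    """Average carrier velocity (m/s) from I = q * n * v.

    current_per_width: current per unit width in A/m
    sheet_density: electron sheet density in 1/m^2
    """
    if sheet_density <= 0.0:
        raise ValueError("sheet density must be positive")
    return current_per_width / (Q * sheet_density)


# Illustrative numbers only: 1e-3 A/um = 1e3 A/m, 1e13 cm^-2 = 1e17 m^-2.
v = average_velocity(1.0e3, 1.0e17)
print(f"average velocity = {v:.3e} m/s")
```

For equal electron density, a larger current therefore directly implies a larger average velocity, which is the argument made above for the semiclassical model.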

126 113 Fig. Charge comparison between semiclassical and quantum ballistic models. Notice the log scale on the y-axis.

Drift-Diffusion vs. Quantum Ballistic

The drift-diffusion transport model, like the semiclassical ballistic transport model, does not take quantum effects in the transport direction into account. It introduces scattering, described by mobility, into the channel region, so the transport of electrons between source and drain is no longer ballistic. Figure 4.28 shows the I-V comparison between drift-diffusion and quantum ballistic transport. Evidently, the current from the drift-diffusion model is much lower in both the on-state and the off-state.

Mobility is usually an ill-defined concept in short channel devices, because after injection from the source, electrons may not be slowed down enough by scatterers to reach the common velocity described by the mobility; instead, they may encounter very few scatterers before reaching the drain end. This is especially true in our thin body, undoped SOI MOSFET. If we choose to use the drift-diffusion model here, what we accomplish is to artificially limit the velocity at which electrons may travel within the channel. Without any

127 114 Fig. Average carrier velocity comparison within the channel. Two gate voltages, corresponding to the on-state and off-state, are plotted for each transport model.

Fig. $I_d$ vs. $V_g$ comparison between drift-diffusion and quantum ballistic transport models.

128 115 experimental measurement, the mobility to use for the channel is unknown. Using a bulk-material mobility underestimates the velocity for the reason just discussed. Here we use the constant bulk mobility throughout the channel to illustrate its effect. The quantum ballistic model yields higher current in both the on-state and the off-state for two main reasons. First and most importantly, it has no scattering, in contrast to the drift-diffusion model, where scattering is described by mobility. Second, it includes tunneling current. By adjusting the mobility, one can adjust the current level obtained from the drift-diffusion simulation.

Fig. Charge distribution comparison along the channel. Notice the log scale used on the y-axis.

4.6 Summary

Here we used nanomos 4.0 to simulate a laboratory-fabricated, undoped-body, extremely thin SOI MOSFET with a back gate, which was recently reported by IBM Corporation [ref]. We first reviewed the various internal plots of nanomos using the model SOI device as the simulation target. The plots revealed some important operating mechanisms

129 116 behind SOI MOSFETs. The simulated results show that biasing the back gate in the model SOI structure can improve short channel effects such as threshold voltage roll-off and DIBL. The back gate can also bend the conduction band in the transverse direction and shift the electron population toward the front gate for better electrostatic control. A comparison between the two ballistic models shows similar results characterizing the device's maximum ballistic performance, while the drift-diffusion model predicts lower current in all cases.

130 AUXILIARY PROGRAMS OF NANOMOS

5.1 Introduction

Auxiliary programs assisting nanomos are made to enhance the user experience and to aid development. In this chapter, we review these programs in detail and show how they help nanomos become more efficient and robust. The auxiliary programs include a Graphical User Interface (GUI) providing an easy way to interact with nanomos, a Parallel Job Submitter (PJS) for submitting large batches of jobs to clusters, and a Benchmark and Testing Suite for catching errors introduced during development. Many of the concepts here are shared with software development in general.

5.2 Graphical User Interface

Introduction

One significant addition to the nanomos source code is a Graphical User Interface (GUI). It allows the user to interact easily with nanomos (input parameters, simulate, and display results) via a graphical interface consisting of buttons and text boxes. It is built using a GUI builder called the Rapid Application Infrastructure (Rappture) toolbox, developed by nanohub.org [42].

In this section, we start with an overview of the Rappture toolbox's features. We then discuss the detailed structure of Rappture, including how it constructs a GUI and maintains the communication between the GUI and the application. This discussion might be useful for those who wish to use Rappture in their own applications. Finally, we show how it is implemented with nanomos and present a chart to visualize the process.

131 118 Fig. Snapshots of the nanomos GUI built with the Rappture toolbox. It is deployed on nanohub.org. The left figure is the input interface, and the right one is the output interface displaying a 3D plot.

Rapid Application Infrastructure (Rappture) toolbox

With few exceptions, most scientific applications still live at the command line today. A potential user who downloads the application has to supply either an ASCII input deck or type the parameters manually after the executable name to run the program. There can be many parameters scattered around an input deck, making any modification confusing, difficult, and error-prone. In the worst case, complicated programs like SPICE [43] have so many parameters that their input decks effectively form their own grammar and format, which a user has to learn from the manual. The Rapid Application Infrastructure was developed to overcome this difficulty by providing a quick way to build GUIs for scientific applications written in different computer languages. In summary, its strengths are:

132 1. Rappture can wrap around a core application without invasive modification.
2. With the wrap-around concept, it can provide a GUI to an application written in any language.
3. It also supports an invasive/built-in style of implementation. An application can use the Rappture library to implant a GUI within itself, eliminating the need for a wrap-around.
4. It is easy to use and implement. Online documentation is available [42].
5. A Rappture application can be deployed freely on nanohub.org, allowing other users around the world to use it.

Understanding the Rappture toolbox

Rappture offers two styles of implementing a GUI for an application: the wrap-around method and the built-in method. The main difference between them is invasiveness. The wrap-around method can be applied to an application written in any language, and it does not require implanting any Rappture-specific commands into the application. In contrast, the built-in method, as its name suggests, requires implanting Rappture-specific commands into the application source code. It requires using the Rappture library in the source code, along with language binding support for the computer language of the source code. Therefore, the built-in method only applies to languages that currently have Rappture Application Programming Interfaces (APIs) available.

Here we put more emphasis on the wrap-around method, since it is easier to use and applies to applications written in virtually any computer language. In addition, we only discuss the concept and structure of Rappture. For command manuals and other information, readers can consult the Rappture documentation online.

133 120 Wrap-around method

General concept

A typical application accepts input as an ASCII file, simulates, and returns output in some format (an ASCII text file, a MIME picture, or another form). We call such an application without a GUI a core application. Rappture wraps a GUI module around this core application. The GUI module first creates a GUI, which accepts inputs from the user through easy-to-fill items such as text boxes or drop-down option boxes. After the user has adjusted the inputs, the GUI passes them on to a wrapper module. There, the inputs are treated and transformed into an ASCII input deck readable by the core application. By "treated and transformed" we mean tasks such as re-adjusting the input formats, converting units, arranging the inputs in the proper order, and finally writing them into one ASCII input deck.

After the simulation is completed, output files are generated. These output files are read by Rappture's wrapper module and again treated and transformed into a format that Rappture accepts. Depending on the type of output data, the wrapper module decides which display method to use via pre-defined rules (e.g., plain text display for ASCII data, 3D visualization for 3D data). The GUI module is then called upon to accept the data and display them in the GUI. A full simulation cycle has thus been completed.

GUI module

The GUI module is one of the two parts necessary for a Rappture application. It is described by a file commonly named tool.xml. Its sole purpose is to generate the GUI; nothing else can or should be accomplished in tool.xml. Once opened, tool.xml is basically an XML-style file with Rappture-specific tags, each with its own usage. Since it is in XML style, it is important to keep in mind that tags always come in pairs, one opening and one closing. Using different tags, one can spawn different interface features such as a text box or a picture. Each input is one such spawned feature, and it is assigned a unique ID.
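To make the idea of paired tags and unique input IDs concrete, the sketch below parses a small XML fragment with Python's standard library and looks up each input by its ID. The tag names are modeled loosely on Rappture conventions but are assumptions here and have not been checked against the Rappture documentation; the fragment is not the actual nanomos tool.xml, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: every opening tag has a matching closing tag,
# and every input carries a unique id attribute.
TOOL_XML = """
<run>
  <input>
    <number id="gate_length">
      <about><label>Gate length</label></about>
      <default>47nm</default>
    </number>
    <number id="temperature">
      <about><label>Temperature</label></about>
      <default>300K</default>
    </number>
  </input>
</run>
"""


def defaults_by_id(xml_text):
    """Map each input's unique id to its default value string."""
    root = ET.fromstring(xml_text)
    return {
        node.get("id"): node.findtext("default")
        for node in root.find("input")
        if node.get("id") is not None
    }


inputs = defaults_by_id(TOOL_XML)
print(inputs)
```

Because each ID is unique, a wrapper can fetch exactly the value it needs without depending on the order in which the inputs appear in the file.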

134 121 Wrapper module

The wrapper module is the other part necessary for a Rappture application. As a matter of good practice, it is usually kept in the same folder as tool.xml. It is less restrictive to create than tool.xml, since one can write it in any computer language supported by Rappture. "Supported" here means that APIs are available to interface between the computer language and Rappture's various functions. Currently, the supported languages include MATLAB, Tcl, C/C++, FORTRAN, Python, and several others [42].

Since Rappture has only two parts, the wrapper module has to do everything that is not done by the GUI module, which is everything besides the GUI description. After the GUI is generated by the GUI module, the user enters inputs and then clicks the simulate button to start the simulation. This simulate button invokes the wrapper module. Inputs have been assigned unique IDs in tool.xml to distinguish them, and the wrapper module accesses them by these IDs and stores them locally. These local variables are treated and transformed into a format acceptable to the core application. For example, the core application might only accept lengths in meters, but a user might enter 3 mm as input. This unit conversion has to be done in the wrapper, and Rappture provides a unit conversion package to carry out such tasks easily. The inputs are then written into an ASCII input deck readable by the core application. The wrapper module then invokes the core application with the input deck to start the simulation.

Once the simulation completes, output files are generated and the wrapper module is notified. The wrapper module opens each output file and imports the contents into local variables. These variables are treated and transformed into a format acceptable to the Rappture plotting functions. Whether it is a 2D plot or a 3D visual, Rappture has a fixed format for accepting the data to display. These Rappture plotting functions communicate with the GUI and show the output on it.
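The input half of the wrapper's job (unit conversion followed by writing an ASCII input deck) can be sketched in plain Python as below. This deliberately does not use the Rappture API or its unit conversion package; the function names, the unit table, and the "name = value" deck layout are all hypothetical and do not reflect the actual nanomos input deck format.

```python
import re

# Conversion factors to meters for a few length units (illustrative subset).
TO_METERS = {"m": 1.0, "mm": 1e-3, "um": 1e-6, "nm": 1e-9}

# A number (optionally signed, optionally in scientific notation), then a unit.
LENGTH_PATTERN = re.compile(
    r"\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*([a-zA-Z]+)\s*"
)


def length_in_meters(text):
    """Parse strings such as '3 mm' or '47nm' and return the value in meters."""
    match = LENGTH_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse length: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    if unit not in TO_METERS:
        raise ValueError(f"unknown length unit: {unit!r}")
    return value * TO_METERS[unit]


def build_input_deck(gui_inputs):
    """Turn {id: 'value unit'} GUI inputs into sorted 'name = value' lines."""
    lines = [
        f"{name} = {length_in_meters(raw):.6e}"
        for name, raw in sorted(gui_inputs.items())
    ]
    return "\n".join(lines) + "\n"


deck = build_input_deck({"length": "3 mm", "gate_length": "47nm"})
print(deck)
```

Keeping the conversion in the wrapper means the core application only ever sees values in the single unit it expects.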

135 122 Built-in method

In the built-in method, the need for a separate wrapper module is eliminated. Instead, the tasks belonging to the wrapper module are merged into the core application. This of course requires the core application to be written in one of the aforementioned languages supported by Rappture APIs. Doing so essentially eliminates the need to generate an input deck and read from output files; Rappture communicates directly with the source code via the proper API functions instead. This also has other benefits, such as making Rappture built-in functions and those of the core application directly available to each other. One downside of such integration is that the Rappture package always has to be bound to the core application. This can be troublesome, since it adds potential difficulties in compiling the code on other platforms, which may be fine for the core application but not for Rappture.

Implementation in nanomos

The Rappture GUI is implemented in nanomos via the wrap-around method, so nanomos can be distributed as core source alone, without Rappture support. The detailed implementation steps are the same as those described in the wrap-around method section. The Rappture GUI version of nanomos is available on nanohub.org and can be used in any web browser with an up-to-date Java plug-in installed.

One additional fact worth noting is that nanomos has been compiled into an executable to work with the Rappture GUI. This is a preference rather than a necessity; uncompiled MATLAB (which is essentially compiled C/C++ routines at a lower level) can be used with Rappture. Therefore, if one is using nanomos with the Rappture-supported interface, compilation must be carried out after any modification to the source code.

136 123 Fig. Implementation flow chart of the Rappture toolbox in nanomos. The red box indicates that the computational code is hidden from users. All that users see and need is an easy-to-use graphical interface.

5.3 nanomos on computer clusters

Embarrassingly parallel scheme

As scientific simulation programs deploy increasingly detailed models and become more and more powerful, the added complexity usually degrades the run

137 124 speed. A device structure in a sophisticated program such as OMEN [44] can take hours, if not days, to simulate. Such lengthy delays in obtaining results decrease the efficiency of research and sometimes even make certain simulations impractical.

Several things can be done to speed up a code. First, a faster CPU and sometimes more RAM can obviously help. Second, better structured and better written code usually yields a shorter run time. In the case of MATLAB, good practices such as avoiding unneeded for loops, allocating array memory effectively, and using sparse matrices correctly can all lead to significant savings in computational time. Third, rewriting computationally intensive portions of the code in a more efficient computer language can sometimes help. MATLAB commands are basically pre-compiled C/C++ code with a convenient style and many options. As a result, MATLAB functions sometimes sacrifice speed and efficiency for convenience and generality. By rewriting these functions in C/C++, one can build dedicated, specialized functions for one's own code to gain speed. In nanomos, such attempts were made, but since the code is not overly complicated, little performance gain was noticed.

Modern simulation applications like OMEN sometimes utilize parallel programming concepts to obtain extreme performance. As an entry-level parallel scheme, an application can run instances of itself on different computers at the same time, each with a unique input deck. For example, if one wishes to run 5 different simulations, doing so can finish the overall task up to 5 times faster than running the simulations one after another. Such a simple parallel scheme is known as the embarrassingly parallel scheme.

Also worth noting is genuine parallel programming within an application. Packages, mainly the Message Passing Interface (MPI) today, allow developers to control the distribution of computation among many CPUs in the source code. Sophisticated multi-core operations can be carried out as a result, and simulation time can be dramatically reduced [45]. MPI is, however, only available to C/C++ and

138 125 FORTRAN. In MATLAB, a Parallel Computing Toolbox is available, but it comes with a license cost.

nanomos Parallel Job Submitter (PJS)

nanomos, being a simple simulator, usually has a short simulation time, in the range of minutes to hours. Therefore, code-level parallelization via the MATLAB Parallel Computing Toolbox is not implemented. Instead, a parallel job submitter using the embarrassingly parallel scheme is available to reduce the total run time for a series of independent jobs.

The parallel job submitter (PJS) is programmed to interface with the standard Portable Batch System (PBS) for cluster queuing. It can be used on clusters such as Steele (steele.rcac.purdue.edu) and Coates (coates.rcac.purdue.edu) at Purdue [46]. PJS takes from the user a nanomos input deck and a gate and/or source bias sweep. For example, if 3 different gate biases are to be simulated, PJS first generates 3 nanomos input decks from the original one supplied by the user, each with a distinct gate bias. PJS then interfaces with PBS and reserves 3 CPUs with the amount of memory specified by the user. Once the CPUs are ready, PJS submits the jobs to PBS and begins the simulations. Once the simulations are completed, PJS takes the outputs back and stores them in separate folders.

PJS can allocate jobs intelligently. If hundreds of gate/source biases are to be simulated and only a handful of CPUs are available, PJS can monitor the PBS queue and submit a new job as soon as a CPU is freed by a newly completed job. PJS thus eliminates the need to manually submit and monitor jobs on clusters and provides users with a hands-free experience.

5.4 Benchmark and testing suite

nanomos development combines the efforts of a team of developers. Each developer is responsible for a portion of the code, and the code might be modified

139 126 simultaneously from part to part. The use of Subversion (SVN) in nanomos s development provides the necessary version control in a multi-developer environment and resolves conflicts in coding. However, it cannot detect any error existing in the code possibly unknowingly made by a developer who checked in the code. Without any error prevention software, potential errors hide and accumulate inside the code from revision to revision and breakout at some later time, making debugging difficult and confusing. A benchmark and testing suite is therefore created for nanomos to screen errors in the code. The benchmark and testing suite is written in MATLAB. It consists of a library of input decks with a diverse range of simulation types and parameters. Also stored in the library are output files generated from those input decks. These benchmark simulations are successfully generated by nanomos meaning nanomos should not break down while simulating them. They are also inspected by developers and deemed to be reasonable results. Each time a new feature is made inside nanomos, a new pair of input deck and output files reflecting the latest features must be generated and placed in the library. Once invoked, the benchmark and testing suite run every available input decks in the library with latest nanomos version in development. After the simulations are done, it compares the newly generated outputs results with those stored in the library if the simulation is successfully completed. In case of crash or differences between the new and benchmark outputs, a warning is given to the developer with crash information or which output variables being inconsistent. While comparing two variables for consistency, a tolerance is given for numerical type variables that may differ slightly due to machine precision.
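The tolerance-based comparison described above can be sketched in a few lines. This is only an illustration in Python, not the MATLAB code of the actual suite; the function names `values_match` and `inconsistent_variables`, the dictionary layout of the outputs, and the tolerance values are assumptions made for the example.

```python
import math
from numbers import Real


def values_match(ref, new, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two output values; real numbers are compared within a tolerance."""
    if isinstance(ref, bool) or isinstance(new, bool):
        return ref == new
    if isinstance(ref, Real) and isinstance(new, Real):
        return math.isclose(ref, new, rel_tol=rel_tol, abs_tol=abs_tol)
    if isinstance(ref, (list, tuple)) and isinstance(new, (list, tuple)):
        return len(ref) == len(new) and all(
            values_match(r, n, rel_tol, abs_tol) for r, n in zip(ref, new)
        )
    # Strings and other non-numerical variables must agree exactly.
    return ref == new


def inconsistent_variables(reference, candidate, **tolerances):
    """Names of output variables that are missing or disagree with the benchmark."""
    names = sorted(set(reference) | set(candidate))
    return [
        name
        for name in names
        if name not in reference
        or name not in candidate
        or not values_match(reference[name], candidate[name], **tolerances)
    ]


# Hypothetical stored benchmark and a fresh run that differs only by round-off.
stored = {"Vg": 0.5, "Id": [1.0e-6, 2.5e-6], "transport": "ballistic"}
fresh = {"Vg": 0.5, "Id": [1.0e-6 * (1 + 1e-13), 2.5e-6], "transport": "ballistic"}
print(inconsistent_variables(stored, fresh))
```

A relative tolerance is used so that currents spanning many orders of magnitude are treated alike; the small absolute tolerance only matters for values that are expected to be zero.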

APPENDICES

A. FINITE DIFFERENCE METHOD (FDM) AND DISCRETIZATION OF HAMILTONIAN

The finite difference method approximates an otherwise continuous differential equation by a discrete one. It is usually used when an explicit solution to the differential equation is too difficult, if not impossible, to obtain; instead, one seeks discrete solutions at pre-defined solution nodes on a finite difference mesh. The nodal values are taken as the solution, and values in between can be approximated via linear interpolation or other interpolation methods.

For first order derivatives,

$$\left.\frac{dy}{dx}\right|_{i} = \frac{y_{i+1} - y_{i-1}}{2a} \tag{A.1}$$

For second order derivatives,

$$\left.\frac{d^{2}y}{dx^{2}}\right|_{i} = \frac{y_{i+1}(x+a) - 2y_{i}(x) + y_{i-1}(x-a)}{a^{2}} \tag{A.2}$$

where $a$ is the distance between nodes, which is assumed to be constant.

Other discretization methods popularly deployed in device simulation include the finite element method (FEM) and the finite volume method (FVM) [47].
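As a quick check of (A.1) and (A.2), the sketch below applies both stencils to $y = \sin x$ on a uniform mesh and compares the result with the analytic derivatives. The function names, the mesh spacing, and the test function are choices made for this illustration only.

```python
import math


def first_derivative(y, a):
    """Central difference (A.1) evaluated at the interior nodes of a uniform mesh."""
    return [(y[i + 1] - y[i - 1]) / (2.0 * a) for i in range(1, len(y) - 1)]


def second_derivative(y, a):
    """Central difference (A.2) evaluated at the interior nodes of a uniform mesh."""
    return [(y[i + 1] - 2.0 * y[i] + y[i - 1]) / a**2 for i in range(1, len(y) - 1)]


a = 0.01                                  # node spacing
x = [i * a for i in range(101)]           # mesh on [0, 1]
y = [math.sin(xi) for xi in x]

dy = first_derivative(y, a)
d2y = second_derivative(y, a)

# Largest deviation from the analytic derivatives cos(x) and -sin(x).
err_first = max(abs(d - math.cos(xi)) for d, xi in zip(dy, x[1:-1]))
err_second = max(abs(d + math.sin(xi)) for d, xi in zip(d2y, x[1:-1]))
print(err_first, err_second)
```

Both errors shrink roughly as $a^{2}$ when the mesh is refined, which is the expected behavior of central differences.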

B. MATRIX INVERSION TECHNIQUES

As is evident from its definition (3.102), the evaluation of the Green's function involves the crucial step of inverting a Hamiltonian matrix, and depending on the system one models, the Hamiltonian can be very large. In situations such as a self-consistent calculation or a bias sweep, one needs to evaluate the Green's function, and thus invert the Hamiltonian, repeatedly. Inverting a matrix, especially a large one, can be computationally very costly. Therefore, we will have a look at the special techniques deployed in nanomos to speed up the matrix inversion process.

B.1 Matrix types

Due to the nature of our modeling methods, Hamiltonian matrices often have a banded block tri-diagonal form: the matrix is not tri-diagonal in terms of individual elements, but when the elements are grouped into principal layers, the Hamiltonian takes a block tri-diagonal form. Each diagonal block corresponds to a principal layer's Hamiltonian matrix, and the blocks to its left and right are coupling matrices describing the interactions with the nearest neighboring layers. The concept of principal layers, which only permits nearest neighbor interactions, is of paramount importance to the Recursive Green's Function (RGF) method [37]. Here, we simply wish to point out that the Hamiltonian appearing in nanomos is often of block tri-diagonal form.

Another important type of matrix, of course, is a regular matrix that has no special characteristics besides being invertible. A principal layer's Hamiltonian can be of this type.
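The block tri-diagonal structure can be made concrete with a small sketch. It assumes identical principal layers and uses NumPy; the function name `block_tridiagonal` and the two-orbital example blocks are hypothetical and are not taken from nanomos.

```python
import numpy as np


def block_tridiagonal(h_layer, tau, n_layers):
    """Assemble a Hamiltonian from identical principal-layer blocks.

    h_layer : Hamiltonian of one principal layer (diagonal block)
    tau     : coupling from layer n to layer n+1 (upper off-diagonal block)
    """
    h_layer = np.asarray(h_layer, dtype=complex)
    tau = np.asarray(tau, dtype=complex)
    upper = np.eye(n_layers, k=1)    # marks the block positions (n, n+1)
    lower = np.eye(n_layers, k=-1)   # marks the block positions (n+1, n)
    return (
        np.kron(np.eye(n_layers), h_layer)
        + np.kron(upper, tau)
        + np.kron(lower, tau.conj().T)
    )


# Two orbitals per layer; only the second orbital couples to the next layer.
H = block_tridiagonal([[0.0, -1.0], [-1.0, 0.0]], [[0.0, 0.0], [-1.0, 0.0]], 4)
print(H.real.astype(int))
```

The printed matrix is not tri-diagonal element by element, but every non-zero entry lies in a diagonal block or in one of the two neighboring off-diagonal blocks.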

B.2 LU Decomposition

The LU decomposition method [48] is often the preferred direct solution method for low to medium sized systems (usually fewer than […] equations). The basic idea of LU decomposition in solving a matrix equation of the form

$$Ax = B \tag{B.1}$$

is to decompose the matrix $A$ into two matrices $L$ and $U$, with $L$ being a lower triangular matrix and $U$ being an upper triangular matrix. That is,

$$A = LU \tag{B.2}$$

so the original problem becomes

$$LUx = B \tag{B.3}$$

or, as a pair of equations in a more suggestive form,

$$Ly = B \tag{B.4}$$

$$Ux = y \tag{B.5}$$

The advantage of this from a computational point of view (in MATLAB for example) is that, once the matrix is decomposed into $L$ and $U$, the problem can be solved by forward and backward substitution on the triangular factors (backslash \ in MATLAB) instead of the CPU-costly matrix inversion (command inv in MATLAB), especially for large matrices. Notice that if one uses LU decomposition in MATLAB (command lu), one has to solve the two triangular systems with backslash \ in order to obtain the speed advantage, for the reason given above.
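Equations (B.2) to (B.5) can be illustrated with a short, dependency-free sketch. It uses a Doolittle factorization without pivoting, which is an assumption that only holds for well-behaved matrices such as the diagonally dominant example below; production solvers (including MATLAB's lu) use pivoting. All function names are chosen for this illustration.

```python
def lu_decompose(A):
    """Doolittle factorization A = L U without pivoting (L has a unit diagonal)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def forward_substitution(L, b):
    """Solve L y = b for lower triangular L, equation (B.4)."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_substitution(U, y):
    """Solve U x = y for upper triangular U, equation (B.5)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


def lu_solve(L, U, b):
    return backward_substitution(U, forward_substitution(L, b))


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
L, U = lu_decompose(A)                 # factor once ...
x1 = lu_solve(L, U, [3.0, 2.0, 3.0])   # ... and reuse for several right-hand sides
x2 = lu_solve(L, U, [4.0, -1.0, 0.0])
print(x1, x2)
```

Factoring once and reusing the factors for many right-hand sides is where the saving over an explicit inverse comes from in a bias sweep or a self-consistent loop.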

C. BANDSTRUCTURE CALCULATION, WAVEFUNCTION AND NEGF FORMALISM IN ATOMISTIC TIGHT-BINDING SIMULATION

C.1 Introduction

In this study, the wavefunction formalism and several related topics are documented using the Graphene Nanoribbon (GNR) as an example. The wavefunction formalism is based on a direct solution of the Schrödinger equation in the absence of scattering events in the channel, coupled with scattering boundary conditions for the contacts. It is a numerically more efficient way to evaluate ballistic transport properties than the usual NEGF formalism. It is presented here with a few other related topics to provide a deeper understanding of the quantum transport section and to bring a perspective from outside of nanomos. Many of the techniques here are not utilized in nanomos but are instead deployed in more sophisticated simulators such as OMEN.

C.2 Complex electronic bandstructure

The electronic bandstructure is the relationship between an electron's energy ($E$) and its crystal momentum ($k$). In tight binding language, it is found via discretization of the system with orthogonal Löwdin orbitals. The Schrödinger equation for the entire system is

$$H\psi_{E} = E\psi_{E} \tag{C.1}$$

($H$ is the Hamiltonian of the system, and $\psi_{E}$ is the wavefunction at the specific energy $E$.)

In tight binding language, the wavefunction of the system at a certain position is a linear combination of the orthogonal Löwdin orbital basis, thus

$$\psi_{E}(R) = \sum_{n} C_{n}\phi_{n} \tag{C.2}$$

or more compactly

$$|\psi_{E}\rangle = \sum_{n} C_{n}|n\rangle \tag{C.3}$$

($n$ is the atomic orbital index. In the case of graphene, we only consider one atomic orbital per carbon atom, so the position dependency is built into the atomic index. $\phi_{n}$ is the $n$th atomic orbital wavefunction. $C_{n}$ is the $n$th wavefunction coefficient/weight.)

Inserting equation (C.2) into (C.1), one gets

$$H\sum_{n} C_{n}|n\rangle = E\sum_{n} C_{n}|n\rangle \tag{C.4}$$

Multiplying each side with the complex conjugate of the $n$th state,

$$H_{n}C_{n} - T_{n,n+1}C_{n+1} - T_{n,n-1}C_{n-1} = EC_{n} \tag{C.5}$$

Here, we only considered nearest neighbor coupling. One is able to obtain such an equation for each principal layer in the GNR. A principal layer is a slab of atoms that can be periodically extended to produce the entire system, and neighboring principal layers only have nearest neighbor couplings between them. Therefore, every quasi-1D system with a confined width can be cast into the form of equation (C.5).

In a periodic crystal, the Bloch theorem states that the wavefunctions of two neighboring principal layers differ by a phase factor determined by the translational vector between the two principal layers,

$$\phi_{n+1} = e^{ika}\phi_{n} \tag{C.6}$$

$$\phi_{n-1} = e^{-ika}\phi_{n} \tag{C.7}$$

Notice that the direction determines the sign of the exponential factor. Here, we define the right-going wave as having a positive $k$. Inserting equations (C.6) and (C.7) into (C.5),

$$H_{n}C_{n} - T_{n,n+1}C_{n+1} - T_{n,n-1}C_{n-1} = EC_{n} \tag{C.8}$$

$$\left(H_{n} - T_{n,n+1}e^{ika} - T_{n,n-1}e^{-ika}\right)\phi_{n} = E\phi_{n} \tag{C.9}$$

Equation (C.9) is the usual bandstructure equation. There are two unknowns in this equation, the injection energy $E$ and the wavevector $k$. We insert one into the equation and obtain the other. Inserting discretized values of $k$ turns equation (C.9) into a regular eigenvalue problem: the values of $E$ and their corresponding eigenfunctions are found as the eigenvalues and eigenvectors of the left-hand-side (LHS) matrix.

The wavevector $k$ can be complex, but the energy $E$ can only be real. The real part of $k$ denotes a propagating wave; the imaginary part denotes a decaying wave. In an infinite crystal, one therefore usually ignores any $k$ with an imaginary part, because these waves eventually decay away in an infinitely large crystal. At a crystal interface, however, the decaying states become important. Although decaying, they can spill over the interface. This is the case when electrons are injected into a device from the contacts with a certain energy, which is the inverse of a bandstructure calculation. If we wish to know which wavevectors an electron can have at a certain injection energy $E$, we have to solve equation (C.9) for the unknown $k$. Equation (C.9) in its current form is not convenient for this purpose. Therefore, we need to transform equation (C.9) into another form that can be solved for $k$.

With the help of another neighboring principal layer, we can transform (C.9):

$$\left(H_{n} - T_{n,n-1}e^{-ika}\right)\phi_{n} - T_{n,n+1}\phi_{n+1} = E\phi_{n} \tag{C.10}$$

$$\left(H_{n} - E\right)\phi_{n} - T_{n,n+1}\phi_{n+1} = T_{n,n-1}e^{-ika}\phi_{n} \tag{C.11}$$

$$\phi_{n} = e^{-ika}\phi_{n+1} \tag{C.12}$$

or in matrix form, with $D_{n} = H_{n} - E$,

$$\begin{bmatrix} D_{n} & -T_{n,n+1} \\ I & 0 \end{bmatrix}\begin{bmatrix} \phi_{n} \\ \phi_{n+1} \end{bmatrix} = e^{-ika}\begin{bmatrix} T_{n,n-1} & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} \phi_{n} \\ \phi_{n+1} \end{bmatrix} \tag{C.13}$$

$$M_{left}\,\phi = e^{-ika}M_{right}\,\phi \tag{C.14}$$

The trick here is to factor out a term that exclusively contains $k$. We have once again turned the problem into an eigenvalue problem, but this time solving for $k$. In order to solve equation (C.13), we have to invert either $M_{left}$ or $M_{right}$. However, this is usually not possible unless every atom within a principal layer is coupled to both the left AND the right neighbors. Only in that exceptional case is there a non-zero number in every column of the matrices $T_{n,n-1}$ and $T_{n,n+1}$, which the structure of $M_{left}$ and $M_{right}$ requires for them to be invertible. Therefore, in general, we need to further transform equation (C.13) to avoid singularity problems.

A common trick is to write the Hamiltonian in such a way that one row carries a Bloch exponential factor with a positive sign and another row carries one with a negative sign. Each row must also have elements that do not contain the exponential factor. One can then divide the row with the negative exponential factor by that factor, so that the other elements in the row obtain a positive exponential factor. Then, one can decompose the matrix into two with the exponential factor in front, and invert them to solve for $k$.

The wavevector $k$ obtained this way can be complex,

$$k = a + ib \tag{C.15}$$

If

1. $a > 0$, $b = 0$: right propagating wave (able to travel far into another contact)
2. $a < 0$, $b = 0$: left propagating wave
3. $b > 0$: decaying right traveling wave (unable to travel far into another contact)
4. $b < 0$: decaying left traveling wave

Due to finite numerical accuracy, one needs to set up a tolerance range (in case a number is not exactly zero but very small) and group the obtained wavevectors into these four categories. Along with the wavevectors, one also obtains the wavefunctions of the waves.

C.3 Wave function formalism

What we have from the previous section is the knowledge of all possible $k$ at each energy $E$ in a perfect crystal. Now, this crystal is connected to a device region. The picture is as follows: electrons traveling in the perfect crystal (contact) are injected into the device region with a certain energy (injection energy), and we are interested in the response to this injection (or excitation) at the other contact. The idea can easily be generalized to a multi-contact picture.

As usual, we define the entire system (contacts and device region) as

$$N_{-2}\;\; N_{-1}\;\; N_{0}\;\; N_{1}\;\; \dots\;\; N_{S-1}\;\; N_{S}\;\; N_{S+1}\;\; N_{S+2} \tag{C.16}$$

The left and right contacts are semi-infinite crystals, so the wavefunctions are of Bloch type. We have solved the Schrödinger equation for this situation in the previous section, so we now know the wavevectors and their wavefunctions at each slab in the contacts. Assume we only inject from the left contact. To solve the Schrödinger equation in the device region, we can write

$$D_{0,0}C_{0} + T_{0,-1}C_{-1} + T_{0,1}C_{1} = 0 \tag{C.17}$$

$$D_{1,1}C_{1} + T_{1,0}C_{0} + T_{1,2}C_{2} = 0 \tag{C.18}$$

$$\vdots$$

$$D_{S,S}C_{S} + T_{S,S-1}C_{S-1} + T_{S,S+1}C_{S+1} = 0 \tag{C.19}$$

$$D_{S+1,S+1}C_{S+1} + T_{S+1,S}C_{S} + T_{S+1,S+2}C_{S+2} = 0 \tag{C.20}$$

As a reminder, what we solved in the previous section is a set of $k$ and their eigenfunctions at a certain $E$ satisfying

$$C_{n} = a_{n}e^{ikx}\phi_{n}^{+} \tag{C.21}$$

and also

$$C_{n} = b_{n}e^{-ikx}\phi_{n}^{-} \tag{C.22}$$

by time reversal symmetry. Any combination of the two is a general solution to the Schrödinger equation,

$$C_{n} = ae^{ikx}\phi_{n}^{+} + be^{-ikx}\phi_{n}^{-} \tag{C.23}$$

Putting equation (C.23) into (C.17),

$$D_{0,0}\left(ae^{ikx}\phi_{0}^{+} + be^{-ikx}\phi_{0}^{-}\right) + T_{0,-1}\left(ae^{ikx}\phi_{-1}^{+} + be^{-ikx}\phi_{-1}^{-}\right) + T_{0,1}C_{1} = 0 \tag{C.24}$$

With Bloch's theorem,

$$\phi_{n+1} = e^{ika}\phi_{n} \tag{C.25}$$

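$$\phi_{n-1} = e^{-ika}\phi_{n} \tag{C.26}$$

equation (C.17) further becomes

$$D_{0,0}\left(a\phi_{0}^{+} + b\phi_{0}^{-}\right) + T_{0,-1}\left(ae^{ikx}e^{-ika}\phi_{0}^{+} + be^{-ikx}e^{ika}\phi_{0}^{-}\right) + T_{0,1}C_{1} = 0 \tag{C.27}$$

Conveniently, we set $x = 0$ at principal layer 0,

$$D_{0,0}\left(a\phi_{0}^{+} + b\phi_{0}^{-}\right) + T_{0,-1}\left(ae^{-ika}\phi_{0}^{+} + be^{ika}\phi_{0}^{-}\right) + T_{0,1}C_{1} = 0 \tag{C.28}$$

$$\left(D_{0,0} + T_{0,-1}e^{-ika}\right)a\phi_{0}^{+} + \left(D_{0,0} + T_{0,-1}e^{ika}\right)b\phi_{0}^{-} + T_{0,1}C_{1} = 0 \tag{C.29}$$

$$\left(D_{0,0} + T_{0,-1}e^{ika}\right)b\phi_{0}^{-} + T_{0,1}C_{1} = -\left(D_{0,0} + T_{0,-1}e^{-ika}\right)a\phi_{0}^{+} \tag{C.30}$$

Also, the equation for $C_{1}$, (C.18), can be written as

$$D_{1,1}C_{1} + T_{1,0}\left(a\phi_{0}^{+} + b\phi_{0}^{-}\right) + T_{1,2}C_{2} = 0 \tag{C.31}$$

$$D_{1,1}C_{1} + T_{1,0}b\phi_{0}^{-} + T_{1,2}C_{2} = -T_{1,0}a\phi_{0}^{+} \tag{C.32}$$

For the right contact, we have

$$D_{S,S}C_{S} + T_{S,S-1}C_{S-1} + T_{S,S+1}C_{S+1} = 0 \tag{C.33}$$

$$D_{S+1,S+1}C_{S+1} + T_{S+1,S}C_{S} + T_{S+1,S+2}C_{S+2} = 0 \tag{C.34}$$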

and

$$C_{S+1} = ce^{ikx}\phi_{S+1}^{+} + de^{-ikx}\phi_{S+1}^{-} \tag{C.35}$$

$$C_{S+2} = ce^{ikx}\phi_{S+2}^{+} + de^{-ikx}\phi_{S+2}^{-} \tag{C.36}$$

so we then have for eq. (C.34)

$$D_{S+1,S+1}\left(ce^{ikx}\phi_{S+1}^{+} + de^{-ikx}\phi_{S+1}^{-}\right) + T_{S+1,S}C_{S} + T_{S+1,S+2}\left(ce^{ikx}\phi_{S+2}^{+} + de^{-ikx}\phi_{S+2}^{-}\right) = 0 \tag{C.37--C.39}$$

Invoking Bloch's theorem and letting the coefficients absorb the exponential factors,

$$D_{S+1,S+1}\left(c\phi_{S+1}^{+} + d\phi_{S+1}^{-}\right) + T_{S+1,S}C_{S} + T_{S+1,S+2}\left(ce^{ika}\phi_{S+1}^{+} + de^{-ika}\phi_{S+1}^{-}\right) = 0 \tag{C.40--C.42}$$

$$\left(D_{S+1,S+1} + T_{S+1,S+2}e^{ika}\right)c\phi_{S+1}^{+} + T_{S+1,S}C_{S} = -\left(D_{S+1,S+1} + T_{S+1,S+2}e^{-ika}\right)d\phi_{S+1}^{-} \tag{C.43--C.44}$$

Equation (C.33) becomes

$$D_{S,S}C_{S} + T_{S,S-1}C_{S-1} + T_{S,S+1}\left(c\phi_{S+1}^{+} + d\phi_{S+1}^{-}\right) = 0 \tag{C.45}$$

$$D_{S,S}C_{S} + T_{S,S-1}C_{S-1} + T_{S,S+1}c\phi_{S+1}^{+} = -T_{S,S+1}d\phi_{S+1}^{-} \tag{C.46}$$

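Therefore, the matrix discretizing the Schrödinger equation can now be written in the form

$$\begin{bmatrix}
L_{00} & L_{01} & & & & \\
L_{10} & D_{11} & T_{12} & & & \\
 & T_{21} & D_{22} & T_{23} & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & T_{S,S-1} & D_{S,S} & R_{S,S+1} \\
 & & & & R_{S+1,S} & R_{S+1,S+1}
\end{bmatrix}
\begin{bmatrix} C_{0} \\ C_{1} \\ C_{2} \\ \vdots \\ C_{S} \\ C_{S+1} \end{bmatrix}
=
\begin{bmatrix} I_{0} \\ I_{1} \\ 0 \\ \vdots \\ I_{S} \\ I_{S+1} \end{bmatrix} \tag{C.47a}$$

with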

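$$L_{0,0} = D_{0,0}\phi_{0}^{-} + T_{0,-1}\phi_{0}^{-}e^{ika} \tag{C.48}$$

$$\phantom{L_{0,0}} = D_{0,0}\phi_{0}^{-} + T_{1,0}\phi_{0}^{-}e^{ika} \tag{C.49}$$

$$L_{0,1} = T_{0,1} \tag{C.50}$$

$$L_{1,0} = T_{1,0}\phi_{0}^{-} \tag{C.51}$$

$$R_{S+1,S+1} = D_{S+1,S+1}\phi_{S+1}^{+} + T_{S+1,S+2}\phi_{S+1}^{+}e^{ika} \tag{C.52}$$

$$\phantom{R_{S+1,S+1}} = D_{S+1,S+1}\phi_{S+1}^{+} + T_{S,S+1}\phi_{S+1}^{+}e^{ika} \tag{C.53}$$

$$R_{S,S+1} = T_{S,S+1}\phi_{S+1}^{+} \tag{C.54}$$

$$R_{S+1,S} = T_{S+1,S} \tag{C.55}$$

$$I_{0} = -\left(D_{0,0}\phi_{0}^{+} + T_{0,-1}\phi_{0}^{+}e^{-ika}\right)a \tag{C.56}$$

$$\phantom{I_{0}} = -\left(D_{0,0}\phi_{0}^{+} + T_{1,0}\phi_{0}^{+}e^{-ika}\right)a \tag{C.57}$$

$$I_{1} = -T_{1,0}\phi_{0}^{+}a \tag{C.58}$$

$$I_{S} = -T_{S,S+1}\phi_{S+1}^{-}d \tag{C.59}$$

$$I_{S+1} = -\left(D_{S+1,S+1}\phi_{S+1}^{-} + T_{S+1,S+2}\phi_{S+1}^{-}e^{-ika}\right)d \tag{C.60}$$

$$\phantom{I_{S+1}} = -\left(D_{S+1,S+1}\phi_{S+1}^{-} + T_{S,S+1}\phi_{S+1}^{-}e^{-ika}\right)d \tag{C.61}$$

$$C_{0} = b \tag{C.62}$$

$$C_{S+1} = c \tag{C.63}$$

C.3.1 Alternative form

The matrix equation can in fact be further reduced by examining the first, second, second-to-last and last rows. For the left side contact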

$$L_{00}C_{0} + L_{01}C_{1} = I_{0} \tag{C.64}$$

$$C_{0} = L_{00}^{-1}\left(I_{0} - L_{01}C_{1}\right) \tag{C.65}$$

$$L_{10}C_{0} + D_{11}C_{1} + T_{12}C_{2} = I_{1} \tag{C.66}$$

$$L_{10}L_{00}^{-1}\left(I_{0} - L_{01}C_{1}\right) + D_{11}C_{1} + T_{12}C_{2} = I_{1} \tag{C.67}$$

$$L_{10}L_{00}^{-1}I_{0} - L_{10}L_{00}^{-1}L_{01}C_{1} + D_{11}C_{1} + T_{12}C_{2} = I_{1} \tag{C.68}$$

$$\left(D_{11} - L_{10}L_{00}^{-1}L_{01}\right)C_{1} + T_{12}C_{2} = I_{1} - L_{10}L_{00}^{-1}I_{0} \tag{C.69}$$

Similarly, for the right side contact

$$R_{S+1,S}C_{S} + R_{S+1,S+1}C_{S+1} = I_{S+1} \tag{C.70}$$

$$C_{S+1} = R_{S+1,S+1}^{-1}\left(I_{S+1} - R_{S+1,S}C_{S}\right) \tag{C.71}$$

$$T_{S,S-1}C_{S-1} + D_{S,S}C_{S} + R_{S,S+1}C_{S+1} = I_{S} \tag{C.72}$$

$$T_{S,S-1}C_{S-1} + D_{S,S}C_{S} + R_{S,S+1}R_{S+1,S+1}^{-1}\left(I_{S+1} - R_{S+1,S}C_{S}\right) = I_{S} \tag{C.73}$$

$$T_{S,S-1}C_{S-1} + D_{S,S}C_{S} + R_{S,S+1}R_{S+1,S+1}^{-1}I_{S+1} - R_{S,S+1}R_{S+1,S+1}^{-1}R_{S+1,S}C_{S} = I_{S} \tag{C.74}$$

$$T_{S,S-1}C_{S-1} + \left(D_{S,S} - R_{S,S+1}R_{S+1,S+1}^{-1}R_{S+1,S}\right)C_{S} = I_{S} - R_{S,S+1}R_{S+1,S+1}^{-1}I_{S+1} \tag{C.75}$$

By doing so, the matrix size has been reduced. The term containing three $L$s in front of $C_{1}$ is effectively a self-energy. Notice that this alternative form is commonly found in the literature [49], [50].
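The elimination in (C.64) to (C.69) is a block Gaussian elimination (Schur complement), and it can be checked numerically. The sketch below is illustrative only: it uses NumPy, a three-layer system, and randomly generated, well-conditioned blocks that merely stand in for the $L$, $D$, $T$ and $I$ quantities; none of the numbers are physical.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2  # block size (orbitals per principal layer), chosen arbitrarily


def block(diagonal_shift=0.0):
    """Random m x m block; a diagonal shift keeps the full matrix well conditioned."""
    return rng.random((m, m)) + diagonal_shift * np.eye(m)


L00, D11, D22 = block(5.0), block(5.0), block(5.0)
L01, L10, T12, T21 = block(), block(), block(), block()
I0, I1 = rng.random(m), rng.random(m)
Z = np.zeros((m, m))

# Full system with the contact unknown C0 kept explicitly.
full = np.block([[L00, L01, Z], [L10, D11, T12], [Z, T21, D22]])
C_full = np.linalg.solve(full, np.concatenate([I0, I1, np.zeros(m)]))

# Reduced system, equation (C.69): C0 is folded into a self-energy-like term.
sigma_left = L10 @ np.linalg.solve(L00, L01)        # L10 L00^-1 L01
rhs_1 = I1 - L10 @ np.linalg.solve(L00, I0)         # I1 - L10 L00^-1 I0
reduced = np.block([[D11 - sigma_left, T12], [T21, D22]])
C_reduced = np.linalg.solve(reduced, np.concatenate([rhs_1, np.zeros(m)]))

# C0 is recovered afterwards from equation (C.65).
C0 = np.linalg.solve(L00, I0 - L01 @ C_reduced[:m])

print(np.allclose(C_reduced, C_full[m:]), np.allclose(C0, C_full[:m]))
```

`np.linalg.solve` is used in place of an explicit inverse, in line with the remarks in Appendix B.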

C.3.2 Choice of k

As mentioned in the previous section, we obtain four types of $k$ as solutions satisfying the bulk Schrödinger equation. Now, we need to choose which of the four types are reasonable to insert into the matrix when solving for $C$. The four types of $k$ are

$$k = a + ib \tag{C.76}$$

If

1. $a > 0$, $b = 0$: right propagating wave (able to travel far into another contact)
2. $a < 0$, $b = 0$: left propagating wave
3. $b > 0$: decaying right traveling wave (unable to travel far into another contact)
4. $b < 0$: decaying left traveling wave

If we inject only from the left side, we have the following situation:

1. We assume here that only a propagating wave is able to reach from one contact to the other; decaying waves will not survive the device region. Therefore, we can ignore the waves having no propagating component, because the response at the other contact would be zero anyway. Also, the electron traveling from the left side starts with energy $E$, and it does not scatter in the device, so it still has energy $E$ upon reaching the right contact. This justifies using the same energy at the right and left contacts to look up the complex $k$.

2. At the left contact, the right-going wave component shall be propagating, and the left-going one can be of any type (decaying or propagating). Writing $k^{+}$ and $k^{-}$ for the wavevectors of right-going and left-going waves, and marking propagating waves with the subscript $p$,

$$C_{0} = ae^{ik_{p}^{+}x}\phi_{p0}^{+} + be^{ik^{-}x}\phi_{0}^{-} \tag{C.77}$$

3. At the right contact, the right-going wave component can be of any type (decaying or propagating), and the left-going one shall be propagating.

$$C_{S+1} = ce^{ik^{+}x}\phi_{S+1}^{+} + de^{ik_{p}^{-}x}\phi_{p,S+1}^{-} \tag{C.78}$$

Therefore, for each pair of a right-going propagating wave component and a left-going decaying/propagating wave component, there exists a solution for $C$. The complete matrix equation, showing only the first row, is

$$\left(D_{0,0} + T_{1,0}e^{ika}\right)\phi_{0}^{-}C_{0} + T_{0,1}C_{1} = -\left(D_{0,0} + T_{1,0}e^{-ika}\right)a\phi_{0}^{+} \tag{C.79}$$

(this form is wrong due to matrix ordering)

$$\left(D_{0,0}\phi_{0}^{-} + T_{1,0}\phi_{0}^{-}e^{ika}\right)C_{0} + T_{0,1}C_{1} = -\left(D_{0,0}\phi_{0}^{+} + T_{1,0}\phi_{0}^{+}e^{-ika}\right)a \tag{C.80}$$

For example, in case each principal layer has 4 unique orbitals, 3 left-traveling waves, and 2 right-going propagating waves, the equation becomes

$$\left(D_{0,0}\phi_{0}^{-} + T_{1,0}\phi_{0}^{-}e^{-ik^{-}a}\right)C_{0} + T_{0,1}C_{1} = -\left(D_{0,0}\phi_{p0}^{+} + T_{1,0}\phi_{p0}^{+}e^{-ik_{p}a}\right)a \tag{C.81--C.82}$$

Although the above equation holds, the resulting matrix cannot join the rest of the matrix equation due to its size. Therefore, we multiply each side by the complex conjugate of the left-going wavefunction. The complete equation then becomes

$$\left(\phi_{0}^{-\dagger}D_{0,0}\phi_{0}^{-} + \phi_{0}^{-\dagger}T_{1,0}\phi_{0}^{-}e^{-ik^{-}a}\right)C_{0} + \phi_{0}^{-\dagger}T_{0,1}C_{1} = -\left(\phi_{0}^{-\dagger}D_{0,0}\phi_{p0}^{+} + \phi_{0}^{-\dagger}T_{1,0}\phi_{p0}^{+}e^{-ik_{p}a}\right)a \tag{C.83--C.84}$$

Here $a$ is a diagonal matrix containing the injection probability of each injecting state (the Fermi distribution at that state). Notice that it is also acceptable to multiply each side with the complex conjugate of the right-going wavefunction. Time reversal symmetry ensures that the left-going and right-going wavefunction matrices have the same size (because the numbers of left-going and right-going states are the same). This is the final equation to be used in simulation.
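The four-way grouping of wavevectors with a numerical tolerance can be sketched for the simplest possible case. The example assumes a single-orbital chain, for which (C.9) reduces to $E = \epsilon - 2t\cos(k \cdot \text{spacing})$ and can be inverted in closed form; the names `wavevectors`, `classify`, `eps`, `t`, `spacing` and the tolerance value are choices made for this illustration, and a general multi-orbital layer requires the generalized eigenvalue problem discussed in section C.2 instead.

```python
import cmath


def wavevectors(E, eps=0.0, t=1.0, spacing=1.0):
    """Both solutions k of E = eps - 2 t cos(k * spacing) for a single-orbital chain."""
    k = cmath.acos((eps - E) / (2.0 * t)) / spacing
    return [k, -k]


def classify(k, tol=1e-9):
    """Group a complex wavevector k = a + ib into the four categories of section C.3.2."""
    if abs(k.imag) <= tol:
        # b = 0 within tolerance: the sign of the real part gives the direction.
        return "right propagating" if k.real > 0 else "left propagating"
    # e^{ikx} decays toward +x when b > 0 and toward -x when b < 0.
    return "decaying right-traveling" if k.imag > 0 else "decaying left-traveling"


for energy in (0.0, 3.0):   # inside and outside the band [-2t, 2t]
    for k in wavevectors(energy):
        print(energy, k, classify(k))
```

Inside the band the two solutions are a right and a left propagating wave; outside the band both acquire an imaginary part and only decaying waves remain.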

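C.3.3 Transmission from coefficient

Knowing the transmission coefficients from the previous calculations, one can obtain the transmission. The transmission from left to right is given by

$$T(E) = \sum_{n,m}\left|C_{S+1,n}(k_{m})\right|^{2}\,\frac{dE}{dk_{m}}\left(\frac{dE}{dk_{n}}\right)^{-1} \tag{C.85}$$

$dE/dk$ is calculated from equation (C.9):

$$\left(H_{n} - T_{n,n+1}e^{ika} - T_{n,n-1}e^{-ika}\right)\phi_{n} = E\phi_{n} \tag{C.86}$$

$$\left(\frac{d}{dk}H_{n} - \frac{d}{dk}T_{n,n+1}e^{ika} - \frac{d}{dk}T_{n,n-1}e^{-ika}\right)\phi_{n} = \frac{dE}{dk}\phi_{n} \tag{C.87}$$

$$\frac{dE}{dk}\phi_{n} = \left(0 - iaT_{n,n+1}e^{ika} + iaT_{n,n-1}e^{-ika}\right)\phi_{n} \tag{C.88}$$

$$\frac{dE}{dk}\phi_{n} = -iaT_{n,n+1}e^{ika}\phi_{n} + iaT_{n,n-1}e^{-ika}\phi_{n} \tag{C.89}$$

$$\phi_{n}^{\dagger}\frac{dE}{dk}\phi_{n} = -\phi_{n}^{\dagger}\,iaT_{n,n+1}e^{ika}\phi_{n} + \phi_{n}^{\dagger}\,iaT_{n,n-1}e^{-ika}\phi_{n} \tag{C.90}$$

$$\frac{dE}{dk} = -ia\left(\phi_{n}^{\dagger}T_{n,n+1}e^{ika}\phi_{n} - \phi_{n}^{\dagger}T_{n,n-1}e^{-ika}\phi_{n}\right) \tag{C.91}$$

Observe that ($k$ here is a scalar)

$$\left(\phi_{n}^{\dagger}T_{n,n+1}e^{ika}\phi_{n}\right)^{\dagger} = \phi_{n}^{\dagger}T_{n,n-1}e^{-ika}\phi_{n} \tag{C.92}$$

where we remind the reader that

$$(ABC)^{\dagger} = C^{\dagger}B^{\dagger}A^{\dagger} \tag{C.93}$$

Therefore, equation (C.91) becomes

$$\frac{dE}{dk} = -ia\left(\phi_{n}^{\dagger}T_{n,n+1}e^{ika}\phi_{n} - \left(\phi_{n}^{\dagger}T_{n,n+1}e^{ika}\phi_{n}\right)^{\dagger}\right) \tag{C.94}$$

$$\frac{dE}{dk} = -ia\cdot 2i\,\mathrm{imag}\left(\phi_{n}^{\dagger}T_{n,n+1}e^{ika}\phi_{n}\right) = 2a\,\mathrm{imag}\left(\phi_{n}^{\dagger}T_{n,n+1}e^{ika}\phi_{n}\right) \tag{C.95}$$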
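Equations (C.9) and (C.95) can be checked against each other numerically. The sketch below assumes the sign convention of (C.9) with a real wavevector, a hypothetical two-orbital principal layer (the matrices `H` and `T` are made-up numbers, not a GNR), and $T_{n,n-1} = T_{n,n+1}^{\dagger}$. It solves the eigenvalue problem for $E(k)$ with NumPy and compares the analytic $dE/dk$ with a central finite difference of the bands.

```python
import numpy as np

a = 1.0                                   # principal-layer spacing
H = np.array([[0.0, -1.0], [-1.0, 0.5]])  # H_n of a hypothetical two-orbital layer
T = np.array([[0.0, 0.0], [0.8, 0.0]])    # T_{n,n+1}; T_{n,n-1} is its Hermitian conjugate


def bloch_matrix(k):
    """Left-hand-side matrix of (C.9) for a real wavevector k."""
    phase = np.exp(1j * k * a)
    return H - T * phase - T.conj().T / phase   # 1/phase equals e^{-ika}


def bands(k):
    """Eigenvalues E(k) in ascending order and the eigenvectors as columns."""
    return np.linalg.eigh(bloch_matrix(k))


def group_velocity(k):
    """dE/dk of every band from (C.95): 2 a Im(phi^dagger T e^{ika} phi)."""
    _, phi = bands(k)
    z = np.einsum("in,ij,jn->n", phi.conj(), T * np.exp(1j * k * a), phi)
    return 2.0 * a * z.imag


k, h = 0.7, 1e-5
finite_difference = (bands(k + h)[0] - bands(k - h)[0]) / (2.0 * h)
print(group_velocity(k))
print(finite_difference)
```

The two printed vectors should agree closely for this example because its two bands never cross; near a degeneracy the eigenvectors are not unique and the comparison would need more care.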

Notice that formula (C.95) holds true for either the left or the right contact. The coupling matrix to be used is the same for both contacts. (For example, in [51], the factor $a$ is missing because, in the transmission formula, the multiplication of the group velocity and its inverse cancels such factors out.)

C.4 NEGF formalism

C.4.1 Contact Self-energy from wavefunction formalism

The self-energy used in the NEGF formalism is merely a term added to the Hamiltonian of the device region to account for the effects of the contacts. Here, I will use a simple explanation to derive a method for calculating the self-energy.

The self-energy term serves the same purpose as the previously explored injection wavefunctions; it describes the influence on the device of an electron injected from a contact at a certain energy. In fact, we have already derived the self-energy term in the alternative form section. The term modifying $D_{11}$ in (C.69),

$$\Sigma_{left} = L_{10}L_{00}^{-1}L_{01} \tag{C.96}$$

is the self-energy for the left contact. Similarly, for the right contact,

$$\Sigma_{right} = R_{S,S+1}R_{S+1,S+1}^{-1}R_{S+1,S} \tag{C.97}$$

Expanding the left side self-energy term as an example,

$$\Sigma_{left} = L_{10}L_{00}^{-1}L_{01} = T_{1,0}\phi_{0}^{-}\left(\phi_{0}^{-\dagger}D_{0,0}\phi_{0}^{-} + \phi_{0}^{-\dagger}T_{1,0}\phi_{0}^{-}e^{-ik^{-}a}\right)^{-1}\phi_{0}^{-\dagger}T_{0,1} \tag{C.98}$$

The three complex conjugate matrices, recall from the previous section, serve the sole purpose of regulating the final matrix size, so regardless of whether one chooses the left-going or the right-going wave, these terms cancel each other out, as can be seen from the equation.

C.4.2 Other methods of determining contact self-energy

We now present other methods of determining the contact self-energy, although the wavefunction approach presented previously (also known as the Quantum Transmitting Boundary Method (QTBM)) is potentially the most computationally efficient one.

Solution of self-energy by definition: the decimation and iterative techniques

By definition, the self-energy is a term inserted into the device Hamiltonian to take the infinitely large contacts into account, thus reducing a complete but infinitely large Hamiltonian matrix to a finite and solvable one. The contact effects are taken into account by the self-energy exactly. To illustrate the idea, a full system Hamiltonian matrix can be partitioned as follows:

$$\begin{bmatrix} G_{left} & G_{ld} & \\ G_{dl} & G_{device} & G_{dr} \\ & G_{rd} & G_{right} \end{bmatrix} = \begin{bmatrix} \left(E + i0^{+}\right)I - H_{left} & -H_{ld} & 0 \\ -H_{ld}^{\dagger} & \left(E + i0^{+}\right)I - H_{device} & -H_{rd}^{\dagger} \\ 0 & -H_{rd} & \left(E + i0^{+}\right)I - H_{right} \end{bmatrix}^{-1} \tag{C.99--C.100}$$

$$G_{left}\ \text{is the Green's function of the left contact} \tag{C.101}$$

The first step toward obtaining the self-energy is to find the lead Green's functions $G_{left}$ and $G_{right}$.

$$G_{left} = \left[\left(E + i0^{+}\right)I - H_{left}\right]^{-1} \tag{C.102}$$

$$G_{right} = \left[\left(E + i0^{+}\right)I - H_{right}\right]^{-1} \tag{C.103}$$

The dilemma is that $H_{left}$ and $H_{right}$ are infinitely large matrices, because the contacts are infinitely large. Therefore, a direct attempt to solve the above equations is not practical.

We will now follow a phenomenological approach toward determining the self-energy. We first have to understand the definition of a contact. In device simulation, a contact is defined as an infinitely large region. We can describe the contact in terms of principal layers, just as we did for the device region. Let us start with a single isolated principal layer and grow it into a contact by adding coupled principal layers to it.

Define a principal layer's retarded Green's function

$$G_{0} = \left[\left(E + i0^{+}\right)I - H_{0}\right]^{-1} \tag{C.104}$$

$$H_{0}\ \text{is the Hamiltonian of the principal layer} \tag{C.105}$$

This is an isolated layer. Now, we couple it to another principal layer added to its right side. With $\tau$ denoting the coupling matrix between neighboring principal layers, the retarded Green's function for this second principal layer is

$$G_{1} = \left[\left(E + i0^{+}\right)I - H_{0} - \Sigma_{01}\right]^{-1} \tag{C.106}$$

$$\Sigma_{01} = \tau G_{0}\tau^{\dagger} = \tau\left[\left(E + i0^{+}\right)I - H_{0}\right]^{-1}\tau^{\dagger} \tag{C.107}$$

$$\Sigma_{01}\ \text{is the self-energy due to the layer on the left} \tag{C.108}$$

Now, the second principal layer's retarded Green's function is written in an explicit form with every value known to us. We can keep adding principal layers in the same way and repeat the above procedure:

$$G_{2} = \left[\left(E + i0^{+}\right)I - H_{0} - \Sigma_{12}\right]^{-1} \tag{C.109}$$

$$\Sigma_{12} = \tau G_{1}\tau^{\dagger} \tag{C.110}$$

$$\phantom{\Sigma_{12}} = \tau\left[\left(E + i0^{+}\right)I - H_{0} - \Sigma_{01}\right]^{-1}\tau^{\dagger} \tag{C.111}$$

$$\phantom{\Sigma_{12}} = \tau\left[\left(E + i0^{+}\right)I - H_{0} - \tau\left[\left(E + i0^{+}\right)I - H_{0}\right]^{-1}\tau^{\dagger}\right]^{-1}\tau^{\dagger} \tag{C.112}$$

and finally

$$G_{n} = \left[\left(E + i0^{+}\right)I - H_{0} - \Sigma_{n-1,n}\right]^{-1} \tag{C.113}$$

$$\Sigma_{n-1,n} = \tau G_{n-1}\tau^{\dagger} = \tau\left[\left(E + i0^{+}\right)I - H_{0} - \tau G_{n-2}\tau^{\dagger}\right]^{-1}\tau^{\dagger} = \dots \tag{C.114}$$

We can now calculate a retarded Green's function for a contact composed of $n$ principal layers. However, how many principal layers must we include for this to be sufficient? We observe an important property of the contact: because the contact is ideally infinitely large, adding an additional principal layer to it does not change its properties. In other words, we need to include so many principal layers that $G_{n}$ and $G_{n-1}$ differ only negligibly. This outlines the principle of the decimation method for calculating the surface Green's function numerically: keep adding layers according to equations (C.113) and (C.114) until the difference between $G_{n}$ and $G_{n-1}$ is smaller than a chosen small number.

Another method using the same principle relies on a typical iterative scheme. In this method, we simply set $G_{n} = G_{n-1}$, and the equation becomes

$$G_{n} = \left[\left(E + i0^{+}\right)I - H_{0} - \Sigma_{n-1,n}\right]^{-1} \tag{C.115}$$

$$\phantom{G_{n}} = \left[\left(E + i0^{+}\right)I - H_{0} - \tau G_{n-1}\tau^{\dagger}\right]^{-1} \tag{C.116}$$

$$\phantom{G_{n}} = \left[\left(E + i0^{+}\right)I - H_{0} - \tau G_{n}\tau^{\dagger}\right]^{-1} \tag{C.117}$$

One gives an initial guess for $G_{n}$ and calculates the right-hand side (RHS). If the difference between the two sides of the equation is larger than a pre-set tolerance, one updates the guess for $G_{n}$ and calculates the RHS again. One keeps iterating like this until the solution converges.
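The layer-by-layer build-up of (C.113) and (C.114) can be tried out on a single-orbital chain, where every matrix is a scalar and the converged result can be compared with the closed-form solution of the fixed-point equation (C.117). The sketch below makes several assumptions for illustration: on-site energy `eps`, hopping `t` (playing the role of $\tau$), and a finite broadening `eta` standing in for $0^{+}$; the function names are hypothetical.

```python
import cmath


def surface_g_by_layer_addition(E, eps=0.0, t=1.0, eta=0.05, tol=1e-10,
                                max_layers=100000):
    """Grow a single-orbital contact layer by layer until G_n stops changing."""
    z = E + 1j * eta
    g = 1.0 / (z - eps)                     # isolated layer, equation (C.104)
    for n in range(1, max_layers + 1):
        sigma = t * g * t                   # self-energy of the layers added so far, (C.114)
        g_new = 1.0 / (z - eps - sigma)     # equation (C.113)
        if abs(g_new - g) < tol:
            return g_new, n
        g = g_new
    raise RuntimeError("surface Green's function did not converge")


def surface_g_exact(E, eps=0.0, t=1.0, eta=0.05):
    """Closed-form solution of g = 1 / (z - eps - t^2 g) with Im(g) < 0 (retarded)."""
    w = E + 1j * eta - eps
    root = cmath.sqrt(w * w - 4.0 * t * t)
    candidates = [(w - root) / (2.0 * t * t), (w + root) / (2.0 * t * t)]
    return min(candidates, key=lambda g: g.imag)


g_numeric, layers_needed = surface_g_by_layer_addition(0.5)
print(layers_needed, g_numeric, surface_g_exact(0.5))
```

The number of layers needed grows roughly in inverse proportion to `eta`, which illustrates why this straightforward scheme becomes expensive for the small broadenings used in practice.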

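Both methods mentioned in this section follow straightforwardly from the definition of the self-energy. They have the advantage of being easy to apply and understand. On the other hand, they are rarely used, since matrices need to be inverted repeatedly, which costs a great amount of computational time. One of the most popular alternative schemes is discussed in the following section.

Sancho-Rubio method

An alternative and computationally more efficient method was suggested by Sancho, Sancho, and Rubio [52]. The Sancho-Rubio method improves on the previously presented straightforward decimation technique by introducing the concept of an effective layer. We start from the overall Green's function of the semi-infinite contact,

$$G_{contact} = \begin{bmatrix}
G_{0,0} & G_{0,1} & G_{0,2} & G_{0,3} & \cdots & G_{0,n} \\
G_{1,0} & G_{1,1} & G_{1,2} & G_{1,3} & \cdots & G_{1,n} \\
G_{2,0} & G_{2,1} & G_{2,2} & G_{2,3} & \cdots & G_{2,n} \\
G_{3,0} & G_{3,1} & G_{3,2} & G_{3,3} & \cdots & G_{3,n} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
G_{n,0} & G_{n,1} & G_{n,2} & G_{n,3} & \cdots & G_{n,n}
\end{bmatrix} \tag{C.118}$$

$$\phantom{G_{contact}} = \begin{bmatrix}
D_{0,0} & -H_{0,1} & 0 & 0 & \cdots & 0 \\
-H_{0,1}^{\dagger} & D_{0,0} & -H_{0,1} & 0 & \cdots & 0 \\
0 & -H_{0,1}^{\dagger} & D_{0,0} & -H_{0,1} & \cdots & 0 \\
0 & 0 & -H_{0,1}^{\dagger} & D_{0,0} & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & D_{n,n}
\end{bmatrix}^{-1} \tag{C.119}$$

$$D_{0,0} = \left(E + i0^{+}\right)I - H_{0,0} \tag{C.120}$$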

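Here, since we are dealing with a contact, we have reasonably set the Hamiltonians of all principal layers, and the couplings between them, to be the same. $G_{0,0}$ is where the chain is cut off, so it is the surface Green's function. $G_{n,n}$ lies somewhere deep inside the semi-infinite chain, so if $n$ is large enough, it can be interpreted as the retarded bulk Green's function. Alternatively, we can write

$$\begin{bmatrix}
D_{0,0} & -H_{0,1} & 0 & 0 & \cdots & 0 \\
-H_{0,1}^{\dagger} & D_{0,0} & -H_{0,1} & 0 & \cdots & 0 \\
0 & -H_{0,1}^{\dagger} & D_{0,0} & -H_{0,1} & \cdots & 0 \\
0 & 0 & -H_{0,1}^{\dagger} & D_{0,0} & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & D_{n,n}
\end{bmatrix}
\begin{bmatrix}
G_{0,0} & G_{0,1} & G_{0,2} & G_{0,3} & \cdots & G_{0,n} \\
G_{1,0} & G_{1,1} & G_{1,2} & G_{1,3} & \cdots & G_{1,n} \\
G_{2,0} & G_{2,1} & G_{2,2} & G_{2,3} & \cdots & G_{2,n} \\
G_{3,0} & G_{3,1} & G_{3,2} & G_{3,3} & \cdots & G_{3,n} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
G_{n,0} & G_{n,1} & G_{n,2} & G_{n,3} & \cdots & G_{n,n}
\end{bmatrix}
=
\begin{bmatrix}
I & & & & \\
 & I & & & \\
 & & I & & \\
 & & & \ddots & \\
 & & & & I
\end{bmatrix} \tag{C.121--C.123}$$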

This gives us a system of equations for the first column of the Green's function matrix:

$$D_{0,0}G_{0,0} = I + H_{0,1}G_{1,0} \tag{C.124}$$

$$D_{0,0}G_{1,0} = H_{0,1}^{\dagger}G_{0,0} + H_{0,1}G_{2,0} \tag{C.125}$$

$$D_{0,0}G_{2,0} = H_{0,1}^{\dagger}G_{1,0} + H_{0,1}G_{3,0} \tag{C.126}$$

$$D_{0,0}G_{3,0} = H_{0,1}^{\dagger}G_{2,0} + H_{0,1}G_{4,0} \tag{C.127}$$

$$\vdots \tag{C.128}$$

$$D_{0,0}G_{n,0} = H_{0,1}^{\dagger}G_{n-1,0} + H_{0,1}G_{n+1,0} \tag{C.129}$$

We are highly interested in this column because it contains the surface Green's function $G_{0,0}$, which needs to be solved for. The essential trick of the Sancho-Rubio method is to let the even-index layers absorb the neighboring odd-index layers to form so-called effective layers. Let us demonstrate how this is done:

$$D_{0,0}G_{0,0} = I + H_{0,1}G_{1,0} \tag{C.130}$$

$$G_{1,0} = D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{0,0} + H_{0,1}G_{2,0}\right) \tag{C.131}$$

$$D_{0,0}G_{2,0} = H_{0,1}^{\dagger}G_{1,0} + H_{0,1}G_{3,0} \tag{C.132}$$

$$G_{3,0} = D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{2,0} + H_{0,1}G_{4,0}\right) \tag{C.133}$$

$$\vdots \tag{C.134}$$

$$G_{n-1,0} = D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{n-2,0} + H_{0,1}G_{n,0}\right) \tag{C.135}$$

$$D_{0,0}G_{n,0} = H_{0,1}^{\dagger}G_{n-1,0} + H_{0,1}G_{n+1,0} \tag{C.136}$$

$$G_{n+1,0} = D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{n,0} + H_{0,1}G_{n+2,0}\right) \tag{C.137}$$

$$n\ \text{here is assumed to be even} \tag{C.138}$$

Now merge the neighboring odd-index layers into the even ones:

$$D_{0,0}G_{0,0} = I + H_{0,1}D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{0,0} + H_{0,1}G_{2,0}\right) \tag{C.139}$$

$$D_{0,0}G_{2,0} = H_{0,1}^{\dagger}D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{0,0} + H_{0,1}G_{2,0}\right) + H_{0,1}D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{2,0} + H_{0,1}G_{4,0}\right) \tag{C.140--C.141}$$

$$\vdots \tag{C.142}$$

$$D_{0,0}G_{2n,0} = H_{0,1}^{\dagger}D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{2n-2,0} + H_{0,1}G_{2n,0}\right) + H_{0,1}D_{0,0}^{-1}\left(H_{0,1}^{\dagger}G_{2n,0} + H_{0,1}G_{2n+2,0}\right) \tag{C.143--C.144}$$

$$n\ \text{has been changed to}\ 2n\ \text{because we need an even index} \tag{C.145}$$

Cleaning up the equations into a more compact and general form,

$$D_{1s}G_{0,0} = I + \alpha_{1}G_{2,0} \tag{C.146}$$

$$D_{1}G_{2n,0} = \beta_{1}G_{2(n-1),0} + \alpha_{1}G_{2(n+1),0} \tag{C.147}$$

$$\alpha_{1} = H_{0,1}D_{0,0}^{-1}H_{0,1} \tag{C.148}$$

$$\beta_{1} = H_{0,1}^{\dagger}D_{0,0}^{-1}H_{0,1}^{\dagger} \tag{C.149}$$

$$D_{1s} = D_{0,0} - H_{0,1}D_{0,0}^{-1}H_{0,1}^{\dagger} \tag{C.150}$$

$$D_{1} = D_{0,0} - H_{0,1}D_{0,0}^{-1}H_{0,1}^{\dagger} - H_{0,1}^{\dagger}D_{0,0}^{-1}H_{0,1} \tag{C.151}$$

$$n > 0 \tag{C.152}$$

$$\text{index 1 stands for the 1st iteration loop} \tag{C.153}$$

What we have accomplished is that, while the surface Green's function $G_{0,0}$ is retained, the effective layers are now twice as large as the original principal layers. We can repeat what we did and combine the effective layers into another set of effective layers 4 times larger than the original principal layers. It is not hard to see that the size of the effective layer grows as 2 to the power of the number of iterations: after $v$ iterations, the effective layer is $2^{v}$ times the size of the original principal layer. After another iteration

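$$D_{1s}G_{0,0} = I + \alpha_{1}G_{2,0} \tag{C.154}$$

$$G_{2,0} = D_{1}^{-1}\left(\beta_{1}G_{0,0} + \alpha_{1}G_{4,0}\right) \tag{C.155}$$

$$D_{1}G_{4,0} = \beta_{1}G_{2,0} + \alpha_{1}G_{6,0} \tag{C.156}$$

$$G_{6,0} = D_{1}^{-1}\left(\beta_{1}G_{4,0} + \alpha_{1}G_{8,0}\right) \tag{C.157}$$

$$\vdots \tag{C.158}$$

$$G_{2(n-1),0} = D_{1}^{-1}\left(\beta_{1}G_{2(n-2),0} + \alpha_{1}G_{2n,0}\right) \tag{C.159}$$

$$D_{1}G_{2n,0} = \beta_{1}G_{2(n-1),0} + \alpha_{1}G_{2(n+1),0} \tag{C.160}$$

$$G_{2(n+1),0} = D_{1}^{-1}\left(\beta_{1}G_{2n,0} + \alpha_{1}G_{2(n+2),0}\right) \tag{C.161}$$

$$n > 0 \tag{C.162}$$

or, more compactly,

$$D_{2s}G_{0,0} = I + \alpha_{2}G_{4,0} \tag{C.163}$$

$$D_{2}G_{4n,0} = \beta_{2}G_{4(n-1),0} + \alpha_{2}G_{4(n+1),0} \tag{C.164}$$

$$\alpha_{2} = \alpha_{1}D_{1}^{-1}\alpha_{1} \tag{C.165}$$

$$\beta_{2} = \beta_{1}D_{1}^{-1}\beta_{1} \tag{C.166}$$

$$D_{2s} = D_{1s} - \alpha_{1}D_{1}^{-1}\beta_{1} \tag{C.167}$$

$$D_{2} = D_{1} - \alpha_{1}D_{1}^{-1}\beta_{1} - \beta_{1}D_{1}^{-1}\alpha_{1} \tag{C.168}$$

$$n > 0 \tag{C.169}$$

After $v$ iterations,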

\begin{align}
D_{v,s}\,G_{0,0} &= I + \alpha_v G_{2^v,0} \tag{C.170}\\
D_v\,G_{2^v n,0} &= \beta_v G_{2^v(n-1),0} + \alpha_v G_{2^v(n+1),0} \tag{C.171}\\
\alpha_v &= \alpha_{v-1} D_{v-1}^{-1}\alpha_{v-1} \tag{C.172}\\
\beta_v &= \beta_{v-1} D_{v-1}^{-1}\beta_{v-1} \tag{C.173}\\
D_{v,s} &= D_{v-1,s} - \alpha_{v-1} D_{v-1}^{-1}\beta_{v-1} \tag{C.174}\\
D_v &= D_{v-1} - \alpha_{v-1} D_{v-1}^{-1}\beta_{v-1} - \beta_{v-1} D_{v-1}^{-1}\alpha_{v-1} \tag{C.175}\\
n &> 0 \tag{C.176}
\end{align}

Recall that $\alpha_v$ is the effective coupling between the surface principal layer and a distant layer. Ideally, a contact is large enough that this coupling vanishes; that is, what happens at the distant end no longer affects the surface properties. This is the criterion for stopping the Sancho-Rubio iteration and deeming the solution converged. After convergence, the surface Green's function is obtained as

\begin{align}
G_{0,0} &= D_{v,s}^{-1} \tag{C.177}
\end{align}

From a geometric point of view, a decimation-based technique, whether the Sancho-Rubio method or the straightforward definition, is essentially a contact build-up approach. The speed of each technique depends on how quickly it can assemble enough principal layers to be called a contact. The Sancho-Rubio method covers $2^v$ principal layers after $v$ iterations, while the straightforward method adds one principal layer per iteration.

From the matrix-coupling point of view, a decimation-based technique looks for the coupling to neighboring layers, from the nearest toward more distant ones, along a direction starting from a principal layer. Looking in one direction gives the surface case; looking in both directions gives the bulk case. Each time a coupling is established between the starting layer and a distant one, the effect of the layers between them is taken into account as a self-energy for the starting layer, contained in the term $D$. This is repeated until the coupling between the starting principal layer and a distant one is almost zero, at which point the self-energy term can be deemed that of the surface or bulk.
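The recursion (C.170)-(C.177) can be illustrated numerically. The following Python/NumPy sketch is not the nanoMOS implementation. The function names, the starting values $D_{0,s} = D_0 = (E + i\eta)I - H_{00}$, $\alpha_0 = H_{01}$, $\beta_0 = H_{01}^\dagger$, the broadening `eta`, and the stopping tolerance `tol` are all assumptions made here for illustration. As a check, the result is compared with the closed-form surface Green's function of a single-orbital 1D chain.

```python
import numpy as np


def sancho_rubio_surface_gf(energy, h00, h01, eta=1e-4, tol=1e-12, max_iter=100):
    """Surface Green's function G_{0,0} of a semi-infinite stack of layers.

    h00: on-layer Hamiltonian block, h01: coupling to the next layer.
    Returns (G_00, v), where v is the number of decimation steps used.
    """
    h00 = np.atleast_2d(np.asarray(h00, dtype=complex))
    h01 = np.atleast_2d(np.asarray(h01, dtype=complex))
    z = (energy + 1j * eta) * np.eye(h00.shape[0])

    # Assumed starting values (level 0, nearest-neighbor layers).
    d_s = z - h00                # surface block D_{v,s}
    d = z - h00                  # bulk block D_v
    alpha = h01.copy()           # coupling alpha_v
    beta = h01.conj().T.copy()   # coupling beta_v

    for v in range(1, max_iter + 1):
        d_inv = np.linalg.inv(d)
        a_dinv = alpha @ d_inv
        b_dinv = beta @ d_inv
        # Fold the decimated layers into D_{v,s} and D_v, using the
        # couplings of the previous level, as in (C.174) and (C.175).
        d_s = d_s - a_dinv @ beta
        d = d - a_dinv @ beta - b_dinv @ alpha
        # Renormalized couplings, (C.172) and (C.173).
        alpha = a_dinv @ alpha
        beta = b_dinv @ beta
        # Stop once the coupling to the distant layer is negligible.
        if np.linalg.norm(alpha) < tol and np.linalg.norm(beta) < tol:
            return np.linalg.inv(d_s), v  # (C.177)
    raise RuntimeError("Sancho-Rubio iteration did not converge")


def chain_surface_gf_exact(energy, t, eta=1e-4):
    """Closed-form retarded surface Green's function of a 1D chain with
    zero on-site energy and hopping -t (used only as a check)."""
    z = energy + 1j * eta
    return (z - 1j * np.sqrt(4.0 * t**2 - z**2)) / (2.0 * t**2)


if __name__ == "__main__":
    t = 1.0
    g, steps = sancho_rubio_surface_gf(0.5, [[0.0]], [[-t]])
    print("decimation steps:", steps, "layers covered:", 2**steps)
    print("numerical:", g[0, 0], "closed form:", chain_surface_gf_exact(0.5, t))
```

Because the number of layers covered doubles at every step, the step count grows only logarithmically with the contact size needed for the coupling to vanish. A smaller `eta` therefore costs a few more steps, not proportionally more.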
