IOWA STATE UNIVERSITY - CPRE581, DECEMBER

A Survey of Phase Change Memory (PCM)

Heather Garnell, Matthew Kocsis, Matthew Weber

Abstract: Phase change memory (PCM) brings the opportunity to replace multiple existing technologies and enhance others. It fits well into the long-term need for scalable random access memory (RAM) and for a non-volatile memory replacement that resolves the limitations of existing memory parts and edges towards a universal memory. Options for using phase change memory as RAM and as non-volatile memory are investigated, along with hybrid approaches that resolve limitations of PCM as well as of existing memory technology. Furthermore, design and reliability are discussed with considerations for bringing this technology to fruition. Practical commercial applications of this technology will soon be available as a result of the research conducted over the past several decades.

Index Terms: Phase Change Memory, PCM, PRAM.

I. INTRODUCTION

With the proliferation of electronic devices that utilize flash storage and RAM technologies, significant advancements will be required to maintain future growth. Phase change memory, also known as Ovonic Unified Memory (OUM), looks to address power consumption, memory cell density, scalability, and access-time latencies, while maintaining compatibility with existing RAM and flash technology. It allows the system DRAM-like addressable access to the memory and easily adapts to flash's block mode. It seeks to be a universal memory. Since PCM can be classified as an emerging technology, most current research has focused on validating the physics, as well as on theories for performance and reliability. A few sample devices have been produced to date, but for the most part, data points for research into practical applications have been gathered through simulations.
This survey report is structured to provide an overview of the physical design of PCM, with additional information giving a historical background on how the technology has developed. It also dives into some practical applications and the benefits the technology can bring to existing systems and those not yet developed. Lastly, the topic of reliability is discussed, along with the software and hardware techniques that can be used to work around and improve upon the issues this technology presents.

In the 1960s the field of semiconductor electronics was expanding significantly due to the recent pioneering inventions of the transistor and the integrated circuit. During this time an American inventor named Stanford Ovshinsky worked to study the properties of thin-film electronic devices, which would later become fundamental concepts in the field of nanotechnology. Many of Ovshinsky's experimental structures contained elements found in group 16 (also known as 6A) of the periodic table. Group 16 elements are called both the oxygen family and the chalcogens. This group contains the elements oxygen, sulfur, selenium, tellurium, and polonium (ununhexium is also included; however, it does not occur in nature). Due to the chemical instability of these elements (each atom lacks two electrons to achieve a stable state), they often form into molecules called chalcogenides, which include a chalcogen ion and at least one other electropositive element (note: molecular bonding of this nature involving oxygen is commonly called an oxide rather than a chalcogenide). Chalcogenides can be formed into glasses, which are matrixed structures of covalently bonded chalcogens with other elements, typically those of groups 3A, 4A, and 6A [27]. In 1968, while studying the semiconducting properties of amorphous chalcogenide glasses, Ovshinsky reported the ability to rapidly transition the semiconductor between high and low impedance [27].
While in retrospect this study seems trivial, an interesting memory phenomenon was observed in the semiconductor during tests. It was found that the high or low impedance state of the chalcogenide glass semiconductor would remain even after the switching current was removed. The next year, a group at Iowa State University (led by Arthur Pohm and Charles Sie) further studied the memory effects of the chalcogenide glasses and found that the memory was a result of a physical change to the glass when crystals formed or melted [28]. The Iowa State team also laid a foundation for fabrication and control of the memory devices using diodes and transistors. Ovshinsky's discovery and the work at Iowa State laid the foundation for significant technological advances, including modern phase change memory.

II. PHYSICS

In the years after Ovshinsky, Pohm, and Sie's work on chalcogenide glass resistivity, phase change memory quickly fell behind other emerging storage and memory technologies. It wasn't until 1987 that a team in Osaka, Japan was able to speed up the phase change properties of chalcogenide compounds to the point where they were again competitive with other technologies [29]. Phase change cells store information by means of a physical change of their state, which changes the impedance of the cell. The physical change is non-volatile; therefore it does not require voltage, current, or power to maintain the state. The chalcogenide material that makes up the cell typically has two distinct states, high and low impedance. When the cell is in the high impedance state, the molecular structure of the cell is disorganized and considered amorphous. When the cell is in the low impedance state, the molecular structure is organized in a more crystalline manner [30]. Initial research on the molecular states of the chalcogenide glass materials concluded that the fundamental cause of the phase change was thermal; however, it is now more commonly understood that electronic effects are necessary to cause sufficient heat to precipitate the physical state change [26].

Temperature-induced state change is the primary means of chalcogenide glass memory storage. Although alternative temperature-inducing techniques have been employed, such as the lasers in the previously mentioned study by Yamada et al. [29], the primary method of heating the glass film for PCM applications is the application of electrical current. Several values for temperature, time, and voltage are important in understanding the phase change process. Each of these values changes based on the material type and physical properties (i.e., dimensions) of the phase change cell. The important values are the melting temperature, crystallization temperature, Reset time, Set time, Read time, Reset voltage, Set voltage, Read voltage, and Threshold voltage. Figure 1 illustrates how these values affect the phase change device cell. For a phase change memory bit to store a 0, the device must exist in the high impedance amorphous state (RESET). The amorphous state is achieved by heating the glass to a sufficiently high temperature to cause melting (the melting temperature, held for at least the Reset time), and then rapidly cooling the glass so it solidifies without forming a crystal structure. To cause this rapid heating, a voltage greater than the Reset voltage and the threshold voltage is applied for a length of time longer than the Reset time. This operation constitutes both the upper bound of required current and the upper bound of required voltage of the memory device. [2] For a phase change memory bit to store a 1, the device must exist in the low impedance crystal state (SET). The crystal state is achieved by heating the glass to a temperature greater than the crystallization temperature but less than the melting temperature for a time no less than the Set time.
This level of high but not extreme heat allows the glass to crystallize into a lattice and become highly conductive. To cause this heating, a voltage must be applied that is larger than the Set voltage but less than the threshold voltage, for a period of time no shorter than the Set time. This operation constitutes the upper bound of both energy consumption (power × time) and write time. [2] The state of the phase change device can be read by applying a voltage significantly smaller than the Set voltage for just enough time to determine whether the cell is in a high or low impedance state [2]. In order for the memory to retain data, it is crucial to be able to reliably distinguish between the high and low impedance states.

A. Temperature Dependence

Unlike conventional memory or storage that relies on voltages or magnetism to maintain data, phase change memory's dependence on temperature creates a new set of temperature-related drawbacks. These drawbacks exist primarily because temperature adds an environmental parameter that has the potential to degrade the performance, retention, or integrity of the data. Operations that are standard in electronics manufacturing, like soldering components, have the potential to erase or destroy data on a PCM device. Beyond extreme temperatures, operating and storage temperatures can also have an unwanted impact on PCM devices. Phase change cells undergo a gradual transition from the amorphous state to the crystalline state [31]. It is known that the rate of crystallization increases with temperature, and it is commonly believed that the relationship between temperature and crystallization follows the Arrhenius law [31]. Due to this phenomenon, increased temperatures will reduce the stability of the programmed state (tending towards lower resistance), thus affecting data retention lifetime. Other efforts to increase the data density focus on storing multiple bits per PCM cell.
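The Arrhenius temperature dependence just described can be made concrete with a short numeric sketch. The activation energy of 2.5 eV and the temperatures used here are illustrative assumptions (activation energies in roughly this range are commonly reported for chalcogenide crystallization), not values taken from [31]:

```python
import math

KB_EV = 8.617e-5  # Boltzmann constant in eV/K

def crystallization_rate(temp_k, ea_ev=2.5, prefactor=1.0):
    # Arrhenius law: rate = A * exp(-Ea / (kB * T))
    return prefactor * math.exp(-ea_ev / (KB_EV * temp_k))

def retention_penalty(temp_hot_k, temp_ref_k=300.0, ea_ev=2.5):
    # How many times faster the amorphous (RESET) state crystallizes,
    # i.e., how many times shorter retention becomes, at the hot
    # temperature relative to the reference temperature.
    return (crystallization_rate(temp_hot_k, ea_ev)
            / crystallization_rate(temp_ref_k, ea_ev))

# Going from room temperature (300 K) to an 85 C operating corner
# (358 K) accelerates crystallization by several orders of magnitude.
print(f"retention penalty at 358 K: {retention_penalty(358.0):.2e}x")
```

Under these assumed parameters, even a modest rise in ambient temperature multiplies the crystallization rate, and thus divides the retention time, by orders of magnitude, which is why soldering steps and hot operating corners are a concern.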
This is accomplished by defining multiple crystalline states between completely amorphous and completely crystalline that have discernible resistances. This approach, however, dramatically increases the effect of high temperatures due to the relatively small differences in resistance. Data on the resistance shift of a four-state (2-bit) PCM cell is shown in Figure 2 [2]. Reliability of PCM devices will be addressed later in this survey.

Fig. 2. Multi-bit resistance patterns of a PCM cell. The data shows that over time, the impedance of the specific states skews into the expected ranges of other states, showing the critical role that temperature plays in PCM reliability. Used from [2]

Fig. 1. Phase change cell operations

B. Threshold Voltage

Heat, the primary catalyst for state changes in PCM devices, can be produced by multiple techniques; however, as mentioned previously, the most common is electrical current. Heating effects are created when a sufficiently large voltage is applied across the PCM cell, inducing current flow and thus a heating effect. By Ohm's law and Joule heating, the power dissipated equals the voltage squared divided by the resistance of the material. The expected power consumption and required voltage needed to melt the chalcogenide glass become alarming when it is revealed that the difference between low and high impedance can be as high as six orders of magnitude [28]; the high impedance state is simply too resistive for electrical current to melt it economically. It is at this point that the chalcogenide glass shows an interesting property: at a certain threshold voltage, the glass exhibits negative resistance [33]. As the voltage applied to the glass increases, it reaches a threshold voltage where the resistance of the material suddenly begins to drop. At this point, the classic I/V curve shows a distinct S-shaped pattern where the voltage decreases and the current increases (see Figure 3). This threshold voltage enables achieving a sufficiently high current to heat the material without an extraordinary amount of power consumption. Even though the threshold switching property of chalcogenide glasses is not completely understood, it has long been recognized as an electronic process [32]. Researchers have been interested in this phenomenon since it was discovered, primarily because of the obvious power- and time-saving properties of the low resistance breakdown state. Modeling of this phenomenon is a primary method of gaining understanding of how both scaling and materials will affect the threshold properties. Modeling by Ielmini [32] supports evidence that the increased electric field caused by the voltage differential across the phase change material causes an electron energy gain, resulting in electron hopping between electron traps inherent in the amorphous material. Other studies by Redaelli et al. [34] go further to model the phase change cell's transition dynamics in response to the electronic carriers.

Fig. 3. Observed threshold voltage characteristics with additional modeled I/V curves. Used from [32]

C. Scaling

To be feasible, a new memory technology must survive beyond the first several generations; therefore physical issues, including scalability, must be addressed before the technology can seriously be considered for widespread use. Each of the primary properties must be considered as devices are driven to smaller sizes with lower power footprints. Research on large PCM cells showed that temperature properties are not affected by physical dimensions, and that only material composition played a role in determining crystallization and melting temperatures [22]; however, this research did not study devices at the micro and nano scale. Further research by IBM under Simone Raoux studied the properties of crystallization as film thickness was decreased to the single-nanometer level, and the group found that the crystallization temperatures did change based on thickness [35]. As the film's thickness was reduced on the nanometer order, the crystallization temperature increased. The study also found that as thickness is reduced, the semiconductors surrounding the phase change material play a larger role in determining crystallization temperatures. These results are promising, since the materials continue to show solid phase change properties even at nanoscale dimensions. Research by Wei et al. [36], however, reports that the overall time to crystallize PCM cells also increases as the devices are scaled to the nanometer order, in addition to the expected increase in crystallization temperature. Beyond crystallization temperatures, the effects of scaling on melting temperatures have also been studied. IBM's team led by Raoux reports research showing that melting temperatures exhibit the opposite effect when phase change cells are scaled: the temperature decreases [37]. This is beneficial to the technology because it leads to less energy consumption and less voltage required to drive the device into the RESET state. Research into other phase change cell parameters also supports the case that scaling of PCM cells is feasible. Lankhorst, Ketelaars and Wolters presented research in 2005 involving a different line cell design where the phase change material was arranged as a thin lead between two metal contacts [38]. This study revealed that the threshold switching voltage also scales linearly as a function of the bridge length, the distance across which the voltage is applied. These results again prove beneficial for power consumption as devices are created on the order of nanometers; however, continued scaling of the threshold voltage would be catastrophic if it ever reached the read voltage level. Ielmini's research shows, however, that the threshold voltage saturates around 10nm; therefore the voltage required to perform a read on the cell should not cause a switching effect [32].

D. Fabrication

Much like scalability, the physical design and fabrication of micro- and nano-scaled components must be researched, prototyped, and understood before phase change memory can be adopted for large-scale production and use. Much research has focused on applying fabrication and design techniques typically used for micro- and nano-scale electronic components to phase change memory cells. There are multiple design considerations, three of which dominate current research: phase change material size and shape; electrical contact design; and current sourcing devices.

1) Design Limitations: Many of the most important design limitations are due to the large current source required to drive PCM cells. If PCM cells are created on a lithographic scale, the current required to drive the cells demands large devices such as FETs or diodes [2]. Large current sourcing devices not only increase the power required to operate the memory, but also slow down the performance and limit the density of the storage. Research into sourcing devices has found that
BJTs can effectively power PCM cells created by a 90nm process [40], and vertically stacking diodes with 90nm PCM cells could also provide the required current to create a PRAM device as large as 512Mb [41]. It is suggested, however, that these current source devices will not scale as effectively, since models show that current sourcing capability falls off much faster with feature size than the current requirements of the PCM cell do [2]. Sub-lithographic phase change features, however, can overcome this limit by optimizing the phase change element instead [2]. It is worth noting that studies have been performed to investigate the feasibility of PCM arrays without access devices (transistors or diodes). Demonstrations by Chen et al. show that threshold switching can be used as a method of programming and reading the cells between a RESET state and an additional strong-RESET state [42]. Issues do arise with techniques like this, however, since the high and low resistances (RESET and strong RESET) can be too close to each other for fast read times, and there is a constant danger of performing an unwanted phase change operation while attempting a read.

Fig. 4. Mushroom-style phase change memory cell side view. The active region is illustrated, showing that only part of the phase change material changes state when heated. Used from [34]

Fig. 5. Current improvements for the volume-limited (planar) cell design. Heating regions are shown in the false-color images. Used from [47]

2) Cell Design: There are two primary designs for PCM cell construction: contact-minimized and volume-minimized. Both designs have advantages and disadvantages and are currently the topic of many research studies. Figure 6 illustrates the two primary designs. Contact-minimized cells are created where the contact between the conducting element has a known contact area and meets a larger volume of phase change material [43].
This design has advantages due to the simplicity of the materials involved, since the phase change material is not unusually formed, and it also defines the critical dimension of the cell in terms of the heater/contact element. It has been shown that reducing the edge contact size significantly lowers the required current, which benefits power consumption and density [44]. The primary disadvantage of this approach is the large phase change volume, and the greater current needed to heat the material. The volume-minimized cell approaches the design in the opposite fashion: the phase change material itself fills an open pore in the surrounding insulating layer, which limits the overall volume of phase change material that must be heated [43]. The obvious advantage of this design is the much reduced RESET current required for the device; however, the phase change material itself must be modified at a sub-lithographic scale.

Fig. 6. PCM cell design and voltage characteristics. (a) Voltage/current relationship for PCM cells in the low-impedance state (upper line) and high-impedance state (lower line). (b) Contact-minimized cell design. (c) Volume-minimized cell design. Used from [39].

There has been research into several
designs that attempt to form a hybrid of the two alternatives. A novel design by Chen et al. at IBM creates a phase change bridge memory cell where the phase change material consists of a narrow, thin line between two electrodes [45]. This design benefits from reduced phase change material volume, a lower threshold voltage, and well-defined contact dimensions. The bridge design has been shown to achieve the RESET state with less than 200 microamps of current. Pellizzer et al. present a similar approach, creating a micro-trench containing the phase change material, with many of the same benefits as the bridge design [46]. Another design creates an even smaller phase change region by depositing the material within a small dashed-contact, thus minimizing phase change area, current, and switching time [47].

Fig. 7. Memory hierarchy ranging from low-cost, slow-speed off-line storage to small amounts of high-performance memory paired with the CPU [2]

E. Processing

Processing methods for modern phase change memory devices are very similar to those employed in manufacturing other VLSI and circuit components. Most methods of PCM cell creation involve sputtering the PCM component onto a medium, typically an insulating material or on top of other prepared surfaces that lie under the phase change material [2]. Etching techniques have also been studied and show promise due to the consistency and linear nature of etch time versus etch depth [48]. A further processing enhancement presented by Zhong et al. shows that an additional mechanical polishing step dramatically increased reliability, improving programming endurance by an order of magnitude [49]. The polishing step deals with the irregularities that occur at material contact points; the smooth surface helped maintain consistent resistances, thus increasing the overall reliability of the PCM cells.

III.
PRACTICAL APPLICATIONS OF PCM

Phase change memory can primarily be categorized as a replacement for, or an enhancement of, existing memory technology. Research on practical applications has focused on creating implementations that use it as either random access memory (RAM) or flash/storage. When compared to current technology, PCM's limitations make the technology more suitable as a flash memory replacement than as RAM. [1] This, however, hasn't prevented numerous concepts from being proposed, with some coining PCM as Universal Memory (UM) or Storage Class Memory (SCM), with the eventual goal of reducing the cost/performance gap that exists between today's RAM and flash. [2]

A traditional computing system has a memory hierarchy similar to Figure 7 [2], where the latency increases and the cost decreases from top to bottom. Each level of components directly impacts the capability and performance of the system. The off-chip RAM layer consists of volatile memory that is continually refreshed and accessed by the on-chip controllers (caching, etc.). The on-line storage consists of non-volatile memory/storage where data is paged in, or stored and retrieved from a filesystem. PCM would allow the off-chip and on-line storage technologies (shown in Figure 7 [2]) to be replaced by a single technology or a hybrid, at a reduced cost and with reasonable performance. Figure 8 [3] shows how the different storage technologies compare. Note the gap between storage/flash (HDD, NOR, NAND) and DRAM/SRAM, where current PCM technology fits. This reduction in latency allows a substantial performance improvement for every system memory miss, since current technology has the hard drive performing 5 orders of magnitude slower than the system memory. [3]

A.
PCM as a RAM alternative

In applications where PCM is likely to be used as a RAM alternative, it must be capable of addressing long latencies, high-energy writes, and limited endurance. Current PCM research has already shown there's opportunity for increased density (capacity) and lower cost [4].

1) Traditional Architecture: SRAM - Static Random Access Memory is primarily used in low-latency configurations like caches. SRAM is a low-capacity, volatile, fast-access memory that usually runs at clock speeds similar to the CPU it's paired with. It provides the CPU with an efficient memory swap space for instructions and data during runtime. Replacing this technology with PCM is a challenge because of PCM's limitations; PCM is closer to a flash replacement than to RAM. At a high level, the issues are the latencies when doing memory accesses and the number of times the state of a cell can be accessed/changed. However, there are some definite benefits in capacity, because the cell size allows for denser designs. [2] The performance of SRAM is in the range of 10ns (or less) access times. Current PCM technology puts read accesses within reach, but writes consume significantly more time. [2] There are demonstrations of getting the write time down to 10ns, but not at a large scale. [2] The endurance of SRAM is effectively infinite, so an improvement of at least 10^8 in PCM endurance has been estimated as adequate to even make PCM an option. Some research has shown that by giving up the non-volatile (retention) feature of PCM, lower-energy writes could extend the achievable write cycles. [2]

DRAM System Memory - Dynamic Random Access Memory (DRAM) consists of arrays of capacitors that require a periodic refresh to maintain their contents, and also require a write after every read to restore what was read. With PCM
a read is non-destructive, with only a modification of the data requiring a write transaction. [4] The organization of PCM has a very similar layout to DRAM technology, consisting of banks, blocks, and subblocks, with the row and column address possibly decoded into local subblocks. [4] Figure 9 shows a diagram of this hierarchical internal organization.

Fig. 10. PCM performance when compared to 400MHz DRAM [4]

Fig. 9. DRAM is organized into banks, blocks, subblocks, rows and columns. [4]

Since DRAM has access times slower than SRAM, it's definitely possible that, depending on the application, PCM access latencies could be acceptable [2], with the obvious exceptions of more demanding applications like video processing or caching. DRAM is also similar to SRAM in that it has effectively infinite writes, so replacing DRAM with just PCM and no caching/buffering scheme would greatly affect the life expectancy of the system. [2] That being said, there are applications where a system has a defined life and purpose, and this memory may be an adequate fit for those requirements. When it comes to energy consumption, PCM has an energy-intense write process that's tough to compare to DRAM. The number of DRAM banks, memory refresh, and having to rewrite to each bank after each read access could leave PCM as the lower-power option [2], especially since it inherently has lower static power leakage because it is non-volatile. [5] Some performance and energy usage research was done using a model of a 4GHz architecture running on a SESC simulator, testing against the SPLASH-2, SPEC, and NAS benchmarks. It was found that on average there was a 1.6x delay penalty and a 2.2x energy penalty when directly replacing DRAM with PCM, as seen in Figure 10. [4] The results, shown in Figure 11, also indicate that the 72% of non-destructive reads to PCM that don't require a write do help improve the delay and energy penalty. [4]

Fig. 11.
PCM writes for every read when compared to 400MHz DRAM [4]

One approach to getting better performance and energy usage is to add a buffer on each bank. It would act as an intermediate cache to prevent excessive swapping, which would otherwise incur delay and energy penalties. [4] Continuing the same research as above, the model was revisited to identify the optimum buffer for a PCM modeled after a 400MHz, 4-bank DRAM. The tests covered a range of buffer sizes to find the optimum configuration that got the timing and energy closest to, or better than, DRAM. The results reduced the delay penalty from 1.6x to 1.16x and the energy penalty from 2.2x to 1.0x. Almost all of the benchmarks, as seen in Figure 12, were within 5% of the DRAM results. Any benchmarks that didn't show improvement most likely execute with data that's consistently invalidated from the buffer (no locality).

2) Addressing Architecture Limitations: A few approaches have been considered to try to address some of the limitations of the technology.

Fig. 8. Typical access latency for different memory technologies with a processor running at 4GHz [3]

Memory Structure - Current memory approaches can be
Fig. 12. PCM performance improvement with buffering, when compared to 400MHz DRAM [4]

thought of as a two-dimensional matrix made up of rows and columns. A memory of a given size could be created with different combinations of row and column sizing. By using smaller rows, the read/write time required would go down (saving time and energy) while maintaining the same memory size. This would also exploit greater locality, because smaller segments of data would possibly align better with cache block sizing. Further optimization by matching the block sizing to the row size would provide maximum efficiency for cache transactions. [4]

Partial Writes - The second concept is to implement partial writes. The system would have to track memory modifications and write only modified cache lines or words. [4] One benefit of this approach is that it leverages the byte-addressing capability of PCM, which lowers the number of writes required when a single byte/word changes. It's estimated that by using a combination of the row sizing and partial write approaches, the endurance of a PCM part could be extended by 5.6 years. [4]

3) Hybrid Architecture: A Storage Class Memory (SCM) architecture would allow a system to blur the boundaries between volatile (memory) and non-volatile (storage) memory to gain benefits in performance and endurance. [2] It would allow intermediate storage solutions where a non-volatile memory with 1-3us of latency would be acceptable, as well as high-end storage solutions needing closer to a 300ns latency. The terms storage class and memory class have been coined to refer to these concepts respectively. [2] An example set of architectures is shown in Figure 13. A hybrid DRAM approach takes PCM and pairs it with some kind of DRAM buffering to create a multilevel memory. [6] It complements existing memory technology with new gap-filling technology that would lower static power usage and cost.
[2] Two ways to look at the physical design of the hybrid system would be to create PCM with a DRAM buffer built in, or to keep the DRAM and PCM separate (more like the current cache and main memory architecture). If combined, this could be considered a possible replacement for a mobile processor's RAM and flash combo Package On Package (POP) memory. [7]

Fig. 13. (a) traditional, (b) a flash-based disk cache (storage class architecture), (c) PCM as main memory, and (d) a hybrid PCM approach with DRAM as a buffer (memory class architecture) [3]

PCM Aware Paging - One approach to a hybrid design is to use the PCM as a medium to offload or page out from DRAM. It would bridge the gap between the latency of paging from DRAM to hard drive storage and the power/size/cost of getting faster, larger-capacity DRAM. [18] This design would enable parts of the DRAM to shut down and rely on the PCM to store state data until the system resumes and data is retrieved back into DRAM. [8] In order to achieve this design, there has to be logic either in the cache controller, the DRAM controller, or the OS. [7] This design suggests that the OS add additional data to its virtual memory management scheme to classify all data as either hot or cold, leading all infrequently updated cold data to leverage PCM as a very low-power storage alternative to DRAM or to spinning up a hard disk. [18] Any hot data would still be allocated into DRAM to avoid the PCM endurance and write latency issues.

Lazy Write - Another hybrid approach is the concept of a Lazy Write. The DRAM controller could act like a cache controller, with a page being the unit of granularity between it and the PCM [3], with the goal of reducing the number of writes to the PCM and working around the slow write speed. As shown in Figure 14, all inbound writes are queued/cached and not immediately written through. [3] This is done directly to address PCM write latency and prevent stalls.
In the scenario of servicing a page fault, the page would be fetched from the hard disk and only written to DRAM cache. The PCM

would still have a spot reserved for this data, but would not populate it until it's evicted from DRAM. When evicted, it would check whether the data had changed and only write out to PCM if different. On a DRAM miss, the page would be retrieved from PCM. This concept assumes that any tags used by the DRAM for handling the queuing and DRAM cache are located in SRAM for speed. [3] In summary, this buffer/cache technique would lower the access latencies of PCM through caching and also hide its high write latencies through queuing. As an added benefit, it would help with the inherent write-wear limit by reducing periodic/repeat writes to the same location.

Fig. 14. An example of a DRAM cache/buffer design that complements PCM. [3]

To further enhance the DRAM caching method, it would be possible to extend it to use smaller page sizes. A write could be tracked as part of the DRAM tagging, which would leave significantly fewer lines needing to be written for a single dirty piece of data. [3]

B. PCM as a storage alternative

1) Traditional Architecture: Today's commercial flash technologies primarily center on two types of flash (NAND and NOR). NOR flash provides a significantly faster random read access time (around 10 ns). [9] It's designed for environments where data is traditionally read-only or infrequently written, because of its 10 us write access time and its 100k-write limitation. [9] The write access time ends up longer because of NOR's larger reprogramming currents and slower programming throughput. These limitations have prevented NOR capacity from matching the density of NAND and will also prevent it from scaling for future use. The NOR memory layout is closer to the hierarchy of DRAM and can support execution of instructions directly from memory. Replacing this memory has been the first real look at a use for PCM.
The required endurance is within reach; PCM can offer the scaling/density that's needed, and PCM actually improves on the write access time, while still providing sub-block addressability to the memory. [2] NAND flash has higher density because of its smaller cell size. The challenge for PCM density is to overcome NAND, which already has approaches and products implementing 1-4 bit per cell technology (multi-level cell, MLC). [2] If PCM is to do block writes to get faster write data rates, then a power-efficient method will need to be developed. NAND does use a block architecture to achieve faster programming throughput (about 200 us per 2000 bytes) than NOR. [9] Since NAND is block based, it can't be addressed for direct program execution as if it were system memory. For performance, block accesses do make up for NAND's access times in some cases, but the overall read access times (about 25 us) are slower than NOR flash. Its primary application has been mass storage (like the solid state drive in the MacBook Air [3]), where the efficiency of the design is focused on large data transfers (multiple blocks) rather than sub-blocks. [2] [9] When comparing NOR and NAND, each one has its place, one as a program memory and the other as storage (respectively). PCM offers a solution that could take the best of both technologies and solve the scalability and access time issues.

2) Filesystem Approaches: Most flash filesystems have been designed for use with NAND. They have performance overhead issues, and their mounting time and memory usage don't scale. The way the system handles metadata can also cause unnecessary writes and latency. PCM could be leveraged in this situation for its ability to be byte addressable on a high density device. This opens up the opportunity for a larger flash device that can do in-place writes instead of block accesses as with NAND.
This would yield fewer unnecessary writes to each word, since with NAND block access you'd blindly write a number of words in order to accomplish a single write. [10] For example, when using a filesystem on top of your storage media (flash, hard disk, PCM), a single update can trigger a number of write accesses, especially if your filesystem is journaling or using shadow paging (i.e., a journal is written, then the data; or a shadow/duplicate of the file tree is made on each update/write, Figure 15).

Fig. 15. Reallocating new space in which to place filesystem changes. [15]

For NAND flash, this results in a number of block accesses and degrades the life of the flash with unnecessary writes to words in the blocks that don't change. Here are a few approaches to this problem and how they could be addressed on PCM.

Short Circuit Shadow Paging

One approach is using Short Circuit Shadow Paging to modify an individual byte in place. It would use 64-bit writes to store the data and metadata at the same time to a specific location. If something changes, an in-place update like that in Figure 16 would only access the bytes or pointer needing to be changed. [15] This also holds true for in-place appends (Figure 17)

and partial writes. All modifications are made by byte addressing and minor changes to pointers, rather than large duplications or index/journal updates.

Fig. 16. Short Circuit Shadow Paging - in-place update/write. [15]
Fig. 17. Short Circuit Shadow Paging - in-place append. [15]
Fig. 18. Flash filesystem layouts. [11]

PCM Flash Filesystem (PFFS)

PFFS takes a different approach to metadata and memory access size when compared to YAFFS2 (currently used for NAND filesystems). It's estimated that these changes would yield a 25% performance increase for small file writes when PFFS is used instead of YAFFS2, while maintaining comparable mount time and memory usage. [11] A traditional NAND flash read/write would be a 2 kB page with a 64 B reserved section. If the transaction is a write, an additional erase operation has to occur prior to the write. Current practice is to out-place that write access by using a block that was erased previously and abandoning the current block; at some later point, garbage collection cleans up the previously used block. Several filesystems have been created to work around the limitations of NAND flash. [11] JFFS2 is a log-structured filesystem that stores sequential nodes containing data and metadata. It does not consider NAND flash limitations (how spare regions and read/write units need to be tracked). [12] YAFFS2, shown in Figure 18, is designed to align with NAND limitations to make it more efficient than JFFS2. [13] Both have scaling issues because of how they store the location of an updated page at a location in the NAND flash: when the filesystem is mounted, it must be scanned to build a table of those locations in main memory. Thus, as the flash gets larger and holds more content, the memory and time required to build that table increase linearly.
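This mount-time scaling problem can be made concrete with a toy model (hypothetical structures, not JFFS2/YAFFS2 or PFFS code): a NAND-style mount must scan every page to rebuild its lookup table, while an index kept persistently in byte-addressable PCM is usable immediately:

```python
# Toy illustration of mount-time scaling: rebuilding the logical-to-
# physical page map by scanning all of flash (cost grows with device
# size) versus reading a persistent index kept in PCM (constant cost).

def mount_by_scan(nand_pages):
    """Scan every page to rebuild the map; newest copy of a page wins."""
    table, reads = {}, 0
    for phys, page in enumerate(nand_pages):
        reads += 1
        if page is not None:
            logical, _data = page
            table[logical] = phys
    return table, reads

def mount_from_pcm(pcm_index):
    """The map already lives in non-volatile PCM; nothing to scan."""
    return pcm_index, 0

# A flash device with 1000 pages, only 3 of them in use.
nand = [None] * 1000
nand[10] = (0, "data0")
nand[500] = (1, "data1")
nand[900] = (1, "data1-new")                 # updated copy of page 1

table, reads = mount_by_scan(nand)
assert reads == 1000                         # every page touched
assert table == {0: 10, 1: 900}              # newest location kept
_, reads_pcm = mount_from_pcm({0: 10, 1: 900})
assert reads_pcm == 0
```

Doubling the flash size doubles `reads` in the scan-based mount but leaves the PCM-resident mount unchanged, which is exactly the linear-versus-constant behavior described above.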
[11] CFFS is an additional filesystem that stores metadata separately from data and reduces the area needing to be scanned at mount. [14] Less area also means faster garbage collection, but it causes additional writes to modify index pointer values on each write. PFFS tries to resolve the issues with YAFFS2, and at the center of its improvement is how a PFFS system uses NAND for storing the data and PCM for storing the metadata. Breaking the design into these two parts allows a couple of key advantages. The first is lowering the overhead of updating the metadata. Currently, when YAFFS2 has an update to its metadata, it can result in a 512 B to 2 kB write when the actual data being updated is just a few bytes. With frequent updates, this leads to a lot of page invalidations that degrade the performance of the NAND flash. Instead, with PFFS, the metadata is updated at the 2-byte word granularity for more efficient access. It also doesn't require any garbage collection for the metadata blocks as YAFFS2 does, since those blocks are not mixed in with data stored in NAND. The second advantage eliminates using main memory to store the entire directory structure and data indexes, as YAFFS2 does. This allows a consistent mounting time and memory usage. The PFFS's PCM would contain all of this directory and data index pointer information, so the NAND part does not need to be scanned on every mount to find every file and construct a tree. [11] The metadata structure, shown in Figure 19, is very similar to Ext2. It consists of a super block that stores a basic description of the filesystem and two bitmaps that identify whether an inode or data page is free. Each file is managed through data stored in the inode region, but the data block information is used a little differently: PFFS uses the data block region to store information to manage the NAND flash blocks where the data resides.
Lastly, the file name field stores the file or directory name that this entry represents.

Fig. 19. PFFS metadata structure. [11]

Data Indexing

PFFS stores a total of 48 page pointers for indexing the data:
44 data page pointers that point to the first 44 pages of a file
2 indirect index page pointers - each pointer references a page of additional data page pointers
2 double indirect index page pointers - each references pages of indirect index pointers; an index page holds 128 or 512 page pointers depending on the NAND page size
For example, if the PFFS page size is 2 KB, this method of indexing allows it to address 88 KB using just the data page pointers in PRAM. [11] The remaining data is accessed using pages of index pointers stored in NAND. The reason the pages of index pointers are stored this way is PCM access speed: PCM can't be written fast enough compared to NAND doing block accesses. Per the research, it's estimated that 1-2% overhead is required for the metadata in this design (i.e., 40 MB of PCM for 4 GB of NAND, etc.). This design scales well, allowing PFFS to store files larger than 1 GB. If PFFS instead allocated all the data index pointers in PCM, rather than relocating them into NAND using indirect index pointers, 20% space overhead would have to be tacked onto the previously mentioned 1-2%. Since each PFFS inode only needs 192 bytes of PCM, just 32 MB of PRAM would be required to store 60k files of at most 1 GB each. [11] Further research has shown that placement of the data index pointers is most efficient in NAND, because writing several hundred bytes of data to PRAM is not faster. [11] For example, if a larger file is being copied, all of its data index pointers would need to be updated; that process won't be efficient if all those pointers are stored in PCM. Instead, up to a point, it's efficient for small files to have their pointers stored in PCM, and past a specific file size the pointers are stored in pages in NAND for write efficiency.

Directory Structure

The complete PFFS directory structure is maintained within PCM.
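The indexing arithmetic above can be sanity-checked with a short script. Pointer size is an assumption here: 4-byte pointers make one 2 KB index page hold the 512 pointers the text mentions.

```python
# Checking the PFFS indexing arithmetic, assuming 2 KB NAND pages and
# 4-byte pointers (so one 2 KB index page holds 512 pointers). The
# 44/2/2 pointer split is as described for PFFS [11].

PAGE = 2 * 1024                      # bytes per NAND page (assumed)
PTRS_PER_INDEX_PAGE = PAGE // 4      # 512 pointers per index page

direct = 44 * PAGE                               # 44 direct pointers
indirect = 2 * PTRS_PER_INDEX_PAGE * PAGE        # 2 indirect pointers
double = 2 * PTRS_PER_INDEX_PAGE**2 * PAGE       # 2 double-indirect

assert direct == 88 * 1024           # the 88 KB figure from the text
print(f"direct:           {direct // 1024} KB")
print(f"+ indirect:       {(direct + indirect) // 2**20} MB")
print(f"+ double indirect: {(direct + indirect + double) // 2**30} GB")
```

Under these assumptions the double-indirect tier alone covers 2 x 512 x 512 x 2 KB = 1 GiB, consistent with the claim that PFFS can index files larger than 1 GB.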
The directory structure stores a set of pointers that create a linked list of the filesystem layout. In order to traverse this list effectively, PFFS creates a hash key for each filename and saves it in the inode of the respective file. PFFS then sorts the directories by the hash key values; this presorting allows quick lookups later.

Garbage Collection

PFFS still requires the process of reclaiming dead pages back into live pages for use, but only in NAND. Using the metadata that PFFS stores in PCM, it can eliminate the hot-data issue that arises when metadata is stored in the same pages as data in NAND. That scenario always leaves a NAND-only approach with a large amount of hot data because of regular metadata updates. PFFS eliminates this issue since the metadata is stored solely in the PCM, leaving the NAND with only cold or warm data. This difference decreases the overhead of garbage collection and would substantially increase the life of NAND blocks.

Wear-leveling Approach

If it's assumed that PCM provides a more durable substitute for flash and can incur a larger number of writes before failure, then wear-leveling may not be an issue for PFFS. However, since PCM stores all the metadata that maintains the filesystem, it could be argued that some approach to guarantee the integrity of that data is required. The challenge comes from PFFS having a high number of irregular write patterns due to the in-place (byte-addressable) update method; those transactions are hard to track because of the granularity being exploited. One approach to accurate wear-leveling of PCM is to control the number of writes and define a fixed write-access size. Hypothetically, PFFS could virtually divide the PCM into 4 KB segments. Those segments would be mapped to a physical segment that could be relocated as necessary.
A segmentation table could be mapped into a fixed location in PCM and would maintain all the mappings of virtual to physical metadata locations, an in-use bit, and extra usage count information. Using that table, the wear-leveling algorithm could then move data around based on how actively it's being written, swapping the coldest and hottest data in order to slow the wearing out of a segment. An approach is also proposed to shift the data within a segment at word granularity, to level out sub-segment words that have been written more often than other words in the same segment. [11]

Experimental results showing performance, garbage collection, mounting time, and memory usage have been gathered for PFFS using an ARM development kit. Read/write performance was estimated using the PostMark benchmark program. Some of the results can be seen in Figure 20. The graph shows that total execution time was lower for the PFFS approach, mainly because of the reduced number of NAND page writes once metadata was relocated to PCM.

Fig. 20. Comparison of PFFS vs. YAFFS2 access performance. [11]

PFFS garbage collection for the performance test above is shown in Figure 21. After the test's transactions, the number of pages active in YAFFS2 is notably greater than in PFFS. Since PFFS doesn't have metadata changes invalidating pages, there is less garbage collection to do. One of the weak points of PFFS performance is writing large files: since the access time to update metadata in PFFS isn't terribly fast in current PCM prototypes, there was a 6% write-time penalty for storing the metadata in PCM until that is resolved. [11] As mentioned earlier, the mount time and memory usage are fixed and won't grow with the flash size and contents. Tables I and II show a series of writes that compare YAFFS2 and PFFS. This shows that PFFS is scalable, O(1) in both cases, and won't levy an unreasonable

burden on a low-power, limited-memory embedded system. [11] By separating the bookkeeping and data storage, PFFS has resolved some of the standing issues with NAND flash memory that lead to large mapping tables in main memory, preventing the lengthy scan to build that table at mount time and the memory it consumed. It has also yielded a 25% performance increase when working with small files, and it matches the performance of YAFFS2 when working with larger files.

Fig. 21. Comparison of PFFS vs. YAFFS2 garbage collection performance. [11]

Test environment             PFFS   YAFFS2
No file write                 (s)      (s)
After a 50MB file write       (s)      (s)
After a 100MB file write      (s)      (s)
After a 200MB file write      (s)      (s)
After a 12800KB file write    (s)      (s)
After a 25600KB file write    (s)      (s)
TABLE I. A TABLE COMPARING MOUNT TIMES BETWEEN PFFS AND YAFFS2. [11]

A Flash Translation Layer (FTL) with PCM

This concept uses roughly the same memory configuration as PFFS. It pairs NAND with PCM and uses PCM for its in-place updating and lower access latencies. The hybrid FTL consists of a byte-addressable interface with a FAT filesystem and uses the PCM to store the metadata. This configuration provides fast translation table and metadata access, garbage collection efficiency, and less dependency on system RAM. [17] The FTL is a block layer that adapts the flash memory to be usable by a higher-level OS facility like FAT.

Fig. 22. Hybrid FTL page map with PCM as storage. [17]

Keeping this metadata in non-volatile memory also resolves the dependency on main memory for NAND filesystem use, which can be challenging in an embedded device. The way the memory is allocated also differs slightly, with the creation of virtual blocks that try to allocate logical pages in the same physical block. This allows a victim block to be merged into another physical block that is still part of the same virtual block, which helps to reduce fragmentation. [17]
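At its core, a flash translation layer maintains a logical-to-physical page map and performs out-of-place updates. A toy sketch (illustrative only, not the design from [17]); storing `page_map` in PCM instead of DRAM is what lets the map survive power loss in the hybrid design:

```python
# A toy flash translation layer: logical pages map to physical NAND
# pages; every rewrite goes out-of-place to a clean page, and the map
# entry is patched to point at the new copy.

class ToyFTL:
    def __init__(self, n_phys):
        self.flash = [None] * n_phys     # physical NAND pages
        self.free = list(range(n_phys))  # erased-page pool
        self.page_map = {}               # logical -> physical (in PCM)

    def write(self, logical, data):
        phys = self.free.pop(0)          # out-of-place: take a clean page
        self.flash[phys] = data
        old = self.page_map.get(logical)
        self.page_map[logical] = phys
        if old is not None:
            self.flash[old] = None       # old copy is now garbage
            # (a real FTL defers this reclamation to garbage collection)

    def read(self, logical):
        return self.flash[self.page_map[logical]]

ftl = ToyFTL(8)
ftl.write(0, "v1")
ftl.write(0, "v2")                       # rewrite lands on a new page
assert ftl.read(0) == "v2"
assert ftl.page_map[0] == 1              # moved from phys 0 to phys 1
```

With the map in volatile DRAM, a power cut strands the data on flash with no way to find the newest copies short of a full scan; a PCM-resident map avoids that.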
Shown in Figure 22, this concept uses PCM as storage for the page map table that is normally stored in DRAM. This provides protection in some sudden power failure scenarios, since the table is stored in non-volatile memory.

Test environment             PFFS   YAFFS2
No file write                64KB     48KB
After a 50MB file write      64KB     84KB
After a 100MB file write     64KB    124KB
After a 200MB file write     64KB    300KB
After a 12800KB file write   64KB   2572KB
After a 25600KB file write   64KB   5228KB
TABLE II. A TABLE COMPARING MEMORY BOOKKEEPING USAGE BETWEEN PFFS AND YAFFS2. [11]

Another key aspect of this design is how data can be paged in PCM, depending on whether the filesystem chooses to take NAND blocks and page them into PCM for fast, smaller-than-block-size periodic accesses. This also allows that data to be accessed via a memory access (byte-addressable) rather than a storage access (block). The feature is called the Redundant Sector Area. [17] The enhanced FAT filesystem has been modified to improve write performance while maintaining compatibility with the FAT filesystem standard. Since 30-60% of all writes to flash are related to metadata updates, this design proposes moving that data to PCM, where it can be byte addressed instead of causing multiple NAND block accesses. [17] This leads to a key performance improvement of less merging and garbage collection overhead, since that overhead is caused by random sub-block-size writes of mixed user data and metadata, as shown in Figure 23. It has been measured that, on average, only 11% of a 512-byte metadata sector is repeatedly modified, so byte-addressable access to parts of the metadata has a measurable impact. [17]

Fig. 23. Hybrid FAT metadata and user data layout compared to a traditional NAND filesystem. [17]

With the metadata stored in PCM, this creates an opportunity to speed up the filename search process. In a normal NAND configuration, you'd have to search across flash sector boundaries to find a file.
PCM could be used to store a hash table of extended file attributes that would reduce the number of sector accesses and filename checks.
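A minimal sketch of such a lookup structure (hypothetical, not the scheme from [17]): a PCM-resident hash table maps a filename hash to its directory entry, so only hash-colliding names ever need a full comparison, instead of one check per directory entry scattered across flash sectors.

```python
# Sketch of the filename-lookup speedup: a hash table kept in PCM maps
# a filename hash to candidate directory entries. Structures and names
# here are hypothetical illustrations.

import zlib

def name_key(name: str) -> int:
    return zlib.crc32(name.encode())      # any cheap, stable hash works

class PcmNameIndex:
    def __init__(self):
        self.table = {}                   # hash -> list of (name, inode)

    def insert(self, name, inode):
        self.table.setdefault(name_key(name), []).append((name, inode))

    def lookup(self, name):
        # At most a handful of full-name checks (hash collisions),
        # instead of scanning every directory entry in flash.
        for cand, inode in self.table.get(name_key(name), []):
            if cand == name:
                return inode
        return None

idx = PcmNameIndex()
for i, n in enumerate(["a.txt", "b.txt", "c.txt"]):
    idx.insert(n, i)
assert idx.lookup("b.txt") == 1
assert idx.lookup("missing") is None
```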

Test config.   # of files   File size   # of trans.   Total write
PM1            40           2K-4KB      750           1,075MB
PM2                         KB-4KB                    MB
PM3            60           1MB-4MB     750           1,297MB
PM4                         KB-512KB    1.5k          369MB
TABLE III. A TABLE OF TEST HYBRID FTL/FAT CONFIGURATIONS. [17]

This solution was evaluated with an actual PCM part and an ARM microprocessor, against the test configurations shown in Table III. The results centered on a 56% average reduction in execution time when just the FTL layer was optimized by separating out the metadata to reduce garbage collection time. Additional optimizations to the FAT filesystem on top of that FTL layer lead to an 83% average reduction. Figure 24 compares a commercial FTL (PMFTL) and a normal FAT filesystem against a configuration with just the hybrid FTL, and against a configuration with both the hybrid FTL and FAT.

Fig. 24. Test results of hybrid FTL and FAT filesystem execution time. [17]

3) Hybrid Architecture: Expanding on the Storage Class Memory discussion above, storage class PCM memories would have slower access speed, but would still fill a hole between existing DRAM and hard drive storage. It has been shown through simulation (Figure 13) that a PCM device, when used as a disk precache, yields a 41% improvement in disk energy utilization with a relatively small amount of flash, allowing the hard drive to maximize idle time. [16] In future high performance devices, there is the possibility of using flash to further bridge the gap between magnetic storage and DRAM. With all of flash's access latency benefits compared to a hard drive, and the opportunity to reduce the size of the DRAM, PCM is very attractive for fixing the scalability (power, size) and cost issues of large capacity DRAM. [3]

IV. RELIABILITY

A considerable amount of research has been done to improve, predict, and understand the reliability issues of phase-change memory.

Recognition and Skepticism. Phase-change memory has received widespread recognition as a promising candidate to scale down beyond 22 nm. This is attributable to the material's improved reliability, quick programming operation, and good potential for scaling [?]. However, there are still those who are skeptical that the reliability of PCM has improved enough for widespread use [24].

A. Data Retention

Phase change memory must retain the state of data for a prolonged time. Meta-stable amorphous state. The meta-stable amorphous state of PCM cells is a major reason for phase-change memory's higher potential for wear-out [20] [26]. Degradation of Material [20].

B. Data Endurance

Phase change memory has been shown to fail over time; the failure of a PCM cell is proportional to the number of rewrites. Much research has been done to increase the lifetime of cells.

Fig. 25. Cycling endurance shown as a function of pulse energy, illustrating that PCM endurance drops quickly after extended exposure to high temperatures. Reprinted from Reference [2].

Short mode failure. Short mode failure occurs when the device is permanently stuck in a highly conductive state [24].

DC effect. A paper from 1980 [50] describes a phenomenon similar to the short-mode failure described above, referred to as the dc effect: a situation where a meta-stable ON state persists for hours after removal of the dc bias.

Wear and Endurance. Writing to PCM is the main mechanism that leads to wear within the material. Thermal expansion and contraction occur when current is injected into the material for state transitions. This degradation occurs at the electrode storage contact and prevents programming currents from being reliably injected into the cells [25].
Other potential causes of cell failure are:
Void formation in the material, since rewriting bits in PCM requires a large current to be applied [20]
Contamination of the heating element [20]
Interface quality of the heater-GST system [24]
Inter-diffusion between GST and adjacent material [24]

Process-related damage: A number of cell failures observed in fully integrated PCM chips have been attributed to process-related damage. [2]

Fig. 28. Illustration of the programming disturb phenomenon as a schematic. Reprinted from Reference [24].
Fig. 26. Trench PCM cell showing accelerated failure (decreased RESET resistance as a function of time). Reprinted from Reference [2].
Fig. 27. Comparison of SET and RESET resistances to illustrate the differences between a failure caused by a stuck-set and by a stuck-reset condition. Reprinted from Reference [2].

Stuck-SET: cell is stuck in the low resistance state. Stuck-RESET: cell is stuck in the high resistance state.

C. Disturb immunity

Disturb immunity refers to loss of data during spurious voltage transients [24]. Gleixner characterizes the reliability of PCM in [20], particularly the data retention and disturb risk of PCM as being mostly limited to the chalcogenide's amorphous state due to its meta-stable nature.

Temperature and Proximity disturbance. This paper discusses temperature concerns when scaling PCM. Proximity of PCM cells is a large factor in the temperature of the device, so that topic is also discussed [22].

D. Other aspects of reliability research

Thermal Crosstalk. Thermal crosstalk between cells has the potential to cause degradation of material resistance [24]. This degradation has been a concern, and research has been done to prevent it from happening [2]. However, research has shown that it is unlikely for a cell to be reprogrammed by a neighboring cell. [24]

Polarity Issues. Bias-polarity-dependent effects have been observed. Potential factors for the unexpected polarity behavior of cells have been hypothesized and researched. These factors seem to contribute to the ability to keep and maintain the expected polarity. They include overlapping temperature gradients, current flow direction, and the polarity of the applied voltage. Figure 29
shows a polarity shift of amorphous material in a phase change device in the direction of the positive anode. The effect was attributed to uniform overlapping temperature gradients with unidirectional current, causing the hot spot in PCM cells to shift in a direction dependent on the direction of the applied voltage (the thermo-electric Thomson effect). The design of the PCM with a certain polarity (asymmetric or symmetric bridges) has been known to cause undesired effects such as being stuck in a high resistance state or requiring polarity alteration to produce reliable SET states [2].

Relationship between Polarity and Endurance. Figure 30 shows a relationship between a stuck-set failure and reverse polarity pulses. Applying a pulse of current in the opposite direction can allow cycling to start again, resolve the stuck-set failure, and allow continued operation for another 10^5 cycles with the original RESET pulse conditions [2].

E. Simulation Studies

Phase-change RAM using CMOS technology. In 2003, Hwang et al. integrated a phase-change RAM using a 0.24 um CMOS technology and then tested the reliability of the device, specifically studying the data retention time, write endurance, read disturbance, and imprint effect. Using data retention time statistics from reading data from storage at varying temperatures and allowing 1% fail bits, the group showed that the chip (a PRAM device based on Ge2Sb2Te5 chalcogenide material and 0.24 um CMOS transistors) met the test's requirements (<1% failure) for 100 hours at 125 degrees Celsius, 10 hours at 150 degrees Celsius,

and for 1 hour at 175 degrees Celsius. Along with this data retention time test, the write endurance, read disturbance, and imprint effect were characterized. The imprint effect refers to situations where cells become stuck in an amorphous or crystalline state. By writing to the PCM with 10^5 pulses of the same current, they demonstrated that the number of failures does not increase; rather, in their experiment, they see an initial decrease in the number of failures due to the first-fire effect, where the threshold voltage shows an initial decrease until stabilizing [21].

Fig. 29. Illustration of a polarity shift of the amorphous plug toward the positive anode. Reprinted from Reference [2].
Fig. 30. Using reverse polarity pulses on a PCM device showing stuck-set failure, to enable continued use of the device for 10^5 more SET-RESET cycles. Reprinted from Reference [2].

Hybrid PRAM cache. In 2008, Mangalagiri and Sarpatwari conducted tests and provided reliability conclusions for consideration during development of a hybrid PRAM cache. They also describe the reliability of a PCM cell in terms of mean time to failure, which depends on the processor frequency (f), the frequency of write operations (f_w, expressed per cycle), and the cell cycle endurance (N_fail): MTTF_years = N_fail / (f * f_w * T_year), where T_year is the number of seconds in a year. The testing showed that the write frequency had to be constrained to achieve a lifetime of years for the PRAM cache [?].

Fig. 31. Implementation of redundant bit-write removal and row shifting. Added hardware is circled.

F. Mitigation to Improve Reliability

Process Integration. Optimizing process integration can help to reduce variation in material properties across the cells [20].

Cell programming. Optimizing cell programming techniques can minimize cell damage caused by the high current used to heat the cell during rewrites [20].

Material Composition.
Modification of the material composition of the chalcogenide alloy can help to reduce the potential for short-mode failures [24].

Thermal Effects. Thermal effects, caused by power dissipation from use of the PCM, can be reduced through material selection and the geometrical design of the device and the cells within it [26]. Thermal boundary resistance techniques can help to reduce the currents needed [23].

Other. Figure 32 shows the lifetime of PCM after implementing three techniques to increase the endurance of the material: eliminating redundant bit-writes, row shifting, and segment swapping.

V. FUTURE RESEARCH

With current research focused on simulating potential configurations and applications of PCM, some papers have already started to discuss prototypes for the future, allowing hard metrics to be gathered. [17] Using these metrics, there is potential for issues and opportunities to emerge that current simulation techniques have been unable to expose.

Fig. 32. The lifetime of PCM after eliminating redundant bit-writes, row shifting, and segment swapping.

Through research for this paper, it became apparent that formal writings have not been published about investigation into the following issues, which would need to be addressed to make this technology feasible for widespread use. The first concern is the logistics of how a system would boot with PCM as DRAM. A method would need to be devised to create a low-impact initialization scheme that lowers the number of writes required at power on, since the device would be in a state of brief preparation before an instant-on would occur. That discussion could then lead into an investigation of how the power management resume or standby modes could be tweaked to leverage the PCM as a way to get a memory-retention low power mode without the refresh, enabling even more power savings in that state. Another area of research involves the possibility of using PCM to provide a locked write capability similar to NOR flash. This would allow for a protected, provable read-only state. Focusing on software, it will be beneficial to investigate methods of writing code (or having a compiler generate it) to be more efficient for execution specifically with PCM in mind. The same discussion could take place for data storage and retrieval from PCM devices. In addition to these previously un-researched topics, continued work is required on the issues and enhancements presented earlier in this paper. These areas include power management, scalability, data retention, and endurance, as well as the use of PCM as an enhancement or replacement for existing technology.

VI. CONCLUSION

The concepts that form the foundation of phase-change memory technology were discovered over 40 years ago, but only during the last decade has most of the research into phase change material's potential occurred. The need for both a main memory replacement and an alternative non-volatile, high density, high capacity storage are among the primary driving factors for the renewed focus on PCM. Research in all areas of PCM, including construction and integration, continues to enhance the cost and performance of current and future devices. In addition, much work has been done to improve how well PCM devices are integrated into current and future end systems. These improvements will enable PCM to compete with current technologies and become a viable next generation solution for a scalable memory.

REFERENCES

[1] B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, D. Burger, Phase-Change Technology and the Future of Main Memory. IEEE Computer Society, January/February.
[2] G. W. Burr, M. J. Breitwisch, M. Franceschini, D. Garetto, K. Gopalakrishnan, B. Jackson, B. Kurdi, C. Lam, L. A. Lastras, A. Padilla, B. Rajendran, S. Raoux, R. S. Shenoy, Phase change memory technology. Journal of Vacuum Science and Technology B, volume 28, issue 2, March/April.
[3] M. K. Qureshi, V. Srinivasan, J. A. Rivers, Scalable High Performance Main Memory System Using Phase-Change Memory Technology. IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598; ISCA, Austin, Texas, USA, June.
[4] B. C. Lee, E. Ipek, O. Mutlu, D. Burger, Architecting Phase Change Memory as a Scalable DRAM Alternative. ISCA, Austin, Texas, USA, June.
[5] P. Zhou, B. Zhao, J. Yang, Y. Zhang, A Durable and Energy Efficient Main Memory Using Phase Change Memory Technology. ISCA, Austin, Texas, USA, June.
[6] M. Ekman, P. Stenstrom, A Case for Multi-Level Main Memory. WMPI, Munich, Germany.
[7] W. Zhang, T. Li, Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast, and Durable Memory Architecture. Intelligent Design of Efficient Architecture Lab, University of Florida.
[8] M. Ekman, P. Stenstrom, A Cost Effective Main Memory Organization for Future Servers.
IEEE Computer Society, IPDPS [9] Micron, Technical Note - NAND Flash 101: An Introduction to NAND Flash and How to Design it into Your Next Product [10] B. C. Lee, Phase Change Memory: An Architecture and Systems Perspective. Duke University Fall [11] Y. Park, S. Lim, C. Lee, K. Park, PFFS: A Scalable Flash Memory File System for the Hybrid Architecture of Phase-change RAM and NAND Flash. SAC, Fortaleza, Ceara, Brazil, [12] D. Woodhouse, JFFS: The Journaling Flash File System. Proceedings of the 8th IEEE Workshop on Hot Topics in Operating Systems (HotOS- VIII), [13] YAFFS A Flash le system for embedded use.. [14] S. Lim, K. Park, An Ecient NAND Flash File System for Flash Memory Storage. IEEE Transaction on Computers, vol. 5(no. 7):pp , July [15] E. Nightingale, J. Condit, C. Frost, E. Ipek, B. Lee, D. Burger, D. Coetzee, Better IO through Byte-Addressable, Persistent Memory. Microsoft Research Presentation. [16] F. Chen, S. Jiang, X. Zhang, SmartSaver: Turning Flash Drive into a Disk Energy Saver for Mobile Computers. ISLPED, Tegernsee, Germany, [17] H. Lee, High-Performance NAND and PRAM Hybrid Storage Design for Consumer Electronics. Samsung Advanced Institute of Technology, Samsung Electronics Co Ltd, Yong-in, Korea, January, [18] X. Wu, J. Li, L. Zhang, E. Speight, R. Rajamony, Y. Xie, Hybrid Cache Architecture with Disparate Memory Technologies. ISCA, Austin, Texas, USA, June, [19] Boniardi, M., Ielmini, D., et al. Impact of Material Composition on Write Performance of PCM. Memory Workshop (IMW), 2010 IEEE International, [20] Gleixner, B. and Pellizzer, F. and Bez, R. Reliability Characterization of Phase Change Memory. Non-Volatile Memory Technology Symposium (NVMTS), [21] Hwang, Y.N., Hong, J.S., Lee, S.H., Ahn, S.J., Jeong, G.T. and Koh, G.H. Full Integration and Reliability Evaluation of phase-change ram based on 0.24m-CMOS technologies. Symposium on VLSI Technology, Digest of Technical Papers, , [22] Kim.Kim, S. and Wong, H.-S.P. 
Analysis of Temperature in Phase Change Memory Scaling. IEEE Electron Device Letters, vol. 28, no. 8, pg , 2007.
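To make the wear-reduction techniques listed in Fig. 32 concrete, the following is a minimal sketch of redundant bit-write elimination. The function name and data values are illustrative, not from the cited work: before a line is written, the controller reads the stored value and programs only the cells whose bits actually differ.

```python
# Toy model of redundant bit-write elimination: XOR the stored and
# incoming values, then program only the cells marked by 1-bits.

def diff_write(stored: int, incoming: int) -> tuple[int, int]:
    """Return (new_stored_value, number_of_cells_programmed)."""
    changed = stored ^ incoming              # 1-bits mark cells that must flip
    return incoming, bin(changed).count("1")

# Rewriting a line with identical data programs zero cells ...
_, cost_same = diff_write(0xDEADBEEF, 0xDEADBEEF)
# ... while flipping a single bit programs exactly one cell.
_, cost_one = diff_write(0b1010, 0b1011)
```

Since many main-memory writes store data similar to what is already present, skipping unchanged bits reduces both cell wear and write energy.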
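Row shifting, the second technique in Fig. 32, can be sketched in the same spirit. The class name and shift period below are hypothetical: after a fixed number of writes, the logical-to-physical byte mapping within a row rotates by one position, so a repeatedly written logical byte is spread across many physical cells.

```python
# Toy model of row shifting for wear leveling within one memory row.

class ShiftedRow:
    SHIFT_PERIOD = 4                         # writes between rotations (illustrative)

    def __init__(self, size: int = 8):
        self.size = size
        self.shift = 0                       # current rotation offset
        self.writes = 0
        self.wear = [0] * size               # per-physical-cell write counts

    def write_byte(self, logical_off: int) -> None:
        phys = (logical_off + self.shift) % self.size
        self.wear[phys] += 1
        self.writes += 1
        if self.writes % self.SHIFT_PERIOD == 0:
            self.shift = (self.shift + 1) % self.size

row = ShiftedRow()
for _ in range(32):                          # hammer a single logical byte
    row.write_byte(0)
# wear is now spread evenly across the row instead of one hot cell
```

Segment swapping applies the same idea at a coarser granularity, periodically exchanging heavily and lightly written segments.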
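The low-impact boot initialization discussed above could plausibly exploit PCM's non-volatility as sketched below. The marker value, region layout, and write counts are invented for illustration: firmware detects an already-initialized region via a marker word and skips the re-initialization writes entirely on warm boots.

```python
# Sketch of a low-write boot path with PCM as main memory.

MAGIC = 0x50434D31                           # hypothetical "PCM1" marker word

def init_pcm_region(region: dict) -> int:
    """Initialize a PCM-backed region; return the number of writes issued."""
    if region.get("magic") == MAGIC:
        return 0                             # warm boot: contents survived power-off
    region["magic"] = MAGIC                  # cold boot: lay down the structures
    region["free_list"] = []
    return 2                                 # writes issued (marker + free list)

region = {}                                  # stands in for the raw PCM array
cold_writes = init_pcm_region(region)        # first power-on pays the writes
warm_writes = init_pcm_region(region)        # every later boot pays none
```

The same marker check would let a standby state power down the memory entirely, since no refresh is needed to preserve its contents.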


More information

Understanding endurance and performance characteristics of HP solid state drives

Understanding endurance and performance characteristics of HP solid state drives Understanding endurance and performance characteristics of HP solid state drives Technology brief Introduction... 2 SSD endurance... 2 An introduction to endurance... 2 NAND organization... 2 SLC versus

More information

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems 1 Any modern computer system will incorporate (at least) two levels of storage: primary storage: typical capacity cost per MB $3. typical access time burst transfer rate?? secondary storage: typical capacity

More information

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access? ECE337 / CS341, Fall 2005 Introduction to Computer Architecture and Organization Instructor: Victor Manuel Murray Herrera Date assigned: 09/19/05, 05:00 PM Due back: 09/30/05, 8:00 AM Homework # 2 Solutions

More information

RAM & ROM Based Digital Design. ECE 152A Winter 2012

RAM & ROM Based Digital Design. ECE 152A Winter 2012 RAM & ROM Based Digital Design ECE 152A Winter 212 Reading Assignment Brown and Vranesic 1 Digital System Design 1.1 Building Block Circuits 1.1.3 Static Random Access Memory (SRAM) 1.1.4 SRAM Blocks in

More information

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING OBJECTIVE ANALYSIS WHITE PAPER MATCH ATCHING FLASH TO THE PROCESSOR Why Multithreading Requires Parallelized Flash T he computing community is at an important juncture: flash memory is now generally accepted

More information

Important Differences Between Consumer and Enterprise Flash Architectures

Important Differences Between Consumer and Enterprise Flash Architectures Important Differences Between Consumer and Enterprise Flash Architectures Robert Sykes Director of Firmware Flash Memory Summit 2013 Santa Clara, CA OCZ Technology Introduction This presentation will describe

More information

Spatial Data Management over Flash Memory

Spatial Data Management over Flash Memory Spatial Data Management over Flash Memory Ioannis Koltsidas 1 and Stratis D. Viglas 2 1 IBM Research, Zurich, Switzerland iko@zurich.ibm.com 2 School of Informatics, University of Edinburgh, UK sviglas@inf.ed.ac.uk

More information

How To Write On A Flash Memory Flash Memory (Mlc) On A Solid State Drive (Samsung)

How To Write On A Flash Memory Flash Memory (Mlc) On A Solid State Drive (Samsung) Using MLC NAND in Datacenters (a.k.a. Using Client SSD Technology in Datacenters) Tony Roug, Intel Principal Engineer SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA.

More information

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System

More information

Survey of Filesystems for Embedded Linux. Presented by Gene Sally CELF

Survey of Filesystems for Embedded Linux. Presented by Gene Sally CELF Survey of Filesystems for Embedded Linux Presented by Gene Sally CELF Presentation Filesystems In Summary What is a filesystem Kernel and User space filesystems Picking a root filesystem Filesystem Round-up

More information

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7)

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7) Overview of Computer Science CSC 101 Summer 2011 Main Memory vs. Auxiliary Storage Lecture 7 July 14, 2011 Announcements Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if

More information