Estimating MLC NAND Flash Endurance: A Genetic Programming Based Symbolic Regression Application

Damien Hogan, Computer Science & Information Systems, University of Limerick, Ireland. damien.t.hogan@ul.ie
Tom Arbuckle, Computer Science & Information Systems, University of Limerick, Ireland. tom.arbuckle@ul.ie
Conor Ryan, Computer Science & Information Systems, University of Limerick, Ireland. conor.ryan@ul.ie

ABSTRACT
NAND Flash memory is a multi-billion dollar industry which is projected to continue to show significant growth until at least 2017. Devices such as smart-phones, tablets and Solid State Drives use NAND Flash since it has numerous advantages over Hard Disk Drives including better performance, lower power consumption, and lower weight. However, storage locations within Flash devices have a limited working lifetime, as they slowly degrade through use, eventually becoming unreliable and failing. The number of times a location can be programmed is termed its endurance, and can vary significantly, even between locations within the same device. There is currently no technique available to predict endurance, resulting in manufacturers placing extremely conservative specifications on their Flash devices. We perform symbolic regression using Genetic Programming to estimate the endurance of storage locations, based only on the duration of program and erase operations recorded from them. We show that the quality of estimations for a device can be refined and improved as the device continues to be used, and investigate a number of different approaches to deal with the significant variations in the endurance of storage locations. Results show this technique's huge potential for real-world application.
Categories and Subject Descriptors
I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search: Heuristic methods; B.3.1 [Semiconductor Memories]

General Terms
Experimentation, Performance, Reliability

Keywords
Genetic programming, Symbolic regression, Flash memory, NAND, Endurance.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO '13, July 6-10, 2013, Amsterdam, The Netherlands. Copyright 2013 ACM 978-1-4503-1963-8/13/07...$15.00.

1. INTRODUCTION
Flash memory is the fastest growing product in the history of semiconductor technology [7]. The Flash market was forecast to surpass $30 billion in 2012 [15], with the market for NAND Flash (the type of Flash investigated in this research) projected to grow by 14% each year between 2012 and 2017. NAND Flash [21] is a key component in personal devices such as smart-phones, tablets, and digital cameras, and is also the core component in Solid State Drives (SSDs) [22], with SSD-based Ultrabooks providing one of the fastest growing areas for the NAND Flash market. NAND Flash storage provides many advantages over hard-disk drives (HDDs) including faster performance, lower power consumption, greater durability (due to the absence of moving parts), lower noise emission, and lower weight. However, Flash remains significantly more expensive per gigabyte than traditional forms of storage. A key operational characteristic of Flash memory is the degradation of storage locations through repeated use. This results in these storage locations having a finite lifetime, with the number of times they can be programmed being referred to as their endurance.
The endurance is specified in terms of p/e cycles, or the number of times a block (the smallest erasable area, see Section 2) can be programmed and then erased. Both program and erase operations are part of this measure because locations must first be erased before they can be programmed. The endurance of Flash memory poses difficulties for manufacturers since locations degrade at different rates, both from chip to chip and even within the same Flash chip. Currently, no technique is available to successfully estimate the varying endurance values encountered across chips. This results in manufacturers placing extremely conservative endurance specifications on their devices in order to account for the worst performing storage blocks. A feature of the degradation of storage locations through repeated use is that they become easier (lower voltage required) to program and erase as they degrade (see Section 2). This leads to the duration of the program and erase operations decreasing (because current flows for a shorter time) as the storage location continues to be p/e cycled. We show it is possible to use this characteristic of Flash memory to generate solutions which use the program and erase times to estimate the actual endurance of a storage block. In collaboration with partners in industry, Genetic Programming (GP) [17] is used to perform symbolic regression, producing expressions which map the supplied input data to
an estimated endurance. The data used by GP for training and testing the evolved expressions was accumulated over the course of three months through the destructive testing of NAND Flash chips. The primary objective of this research is to investigate the potential to evolve accurate expressions which take, as inputs, the durations of program and erase operations recorded from test blocks and produce, as an output, an estimation of the block's endurance in p/e cycles. A secondary goal is to examine whether the accuracies of the estimates for each block improve as additional p/e cycles are completed, further into the lifetime of the block. Finally, the third objective focuses on the required generality of the solutions. A number of approaches are compared, including the possibility of using a single expression to estimate endurance for blocks from all chips. Another approach considered is the use of an ensemble of expressions, with each member responsible for dealing with particular kinds of blocks, such as those from a particular production batch. The final approach proposes the use of classification and symbolic regression together and reports on the potential shown by the regression phase of this technique. The remainder of this paper comprises the following sections. Section 2 provides background information about Flash memory while Section 3 introduces other research related to this topic. Section 4 details the hardware phase of the research and the GP configuration, with Section 5 outlining the results achieved. Section 6 details our plans for future work and Section 7 concludes the paper.

2. BACKGROUND
Flash is currently the dominant non-volatile memory technology, although there are several others competing to replace it [18]. Flash memory stores information in cells. The main operational component of each cell is a particular type of transistor.
Current through these transistors is controlled by the presence or absence of charge on a gate, called a floating gate [20], which is electrically isolated by an oxide layer from the rest of the transistor. The transistors' principle of operation relies on the placing of charge on, and its removal from, the floating gate. Fowler-Nordheim [8] tunnelling, a quantum mechanical effect, is used to force electrons through the insulating oxide layer and on to the gate (programming) or off the gate (erasing), the quantity of charge on the gate then being a representation of the information stored in the cell. A difficulty with Flash memory arises because the oxide layer that maintains the charge on the floating gate degrades as the cell is repeatedly programmed and then erased [27]. With use, it becomes easier to program and erase the cells and each of these operations therefore requires less time. However, eventually the gate can no longer retain its charge and the memory location becomes unreliable. In NAND Flash memory, the type under study in this paper, bits are represented by charge stored in cells. Read and program operations operate on groups of cells known as a page, while erase operations take place on groups of pages referred to as blocks. Blocks must be erased before their pages can be programmed. Early implementations of Flash memory stored one bit per cell. A threshold is used to decide whether a gate stores sufficient charge to be considered programmed. Such cells are termed Single-Level Cells (SLC). Current Flash technologies use three thresholds so that a single gate can store four patterns representing two bits. These are termed Multi-Level Cells (MLC), and seven-threshold/three-bit chips are also available as Triple-Level Cells (TLC). This increase in the number of bits stored per cell increases the difficulty in correctly placing precise amounts of charge and in determining the bit pattern stored in the cell.
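As a small illustration of the three-threshold MLC scheme just described, the following sketch decodes a sensed cell voltage into a two-bit pattern. The threshold voltages and the state-to-bit-pattern ordering here are hypothetical, chosen only to show the mechanism; real devices calibrate these internally.

```python
from bisect import bisect_left

# Hypothetical read thresholds (volts) separating the four charge states
# of an MLC cell; actual values are device-specific.
THRESHOLDS = [1.0, 2.0, 3.0]

# Bit patterns for the four states, ordered from least to most stored
# charge. The Gray-coded ordering shown is illustrative only.
STATES = ["11", "10", "00", "01"]

def decode_mlc(cell_voltage):
    """Map a sensed cell voltage to its two-bit pattern via the three thresholds."""
    return STATES[bisect_left(THRESHOLDS, cell_voltage)]
```

With three thresholds, any sensed voltage falls into one of four intervals, so a single cell distinguishes four patterns; a TLC cell extends the same idea to seven thresholds and eight patterns.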
In addition, the time for which the pattern can be stored without the charge leaking away (its retention) decreases. To compensate for these difficulties, Flash memory (on-chip) controllers implement Error Correcting Codes (ECC) [23]. Supplementary information is retained in extra cells (unavailable to users) that make it possible to detect and correct errors in the data stored. Once the number of errors passes a critical threshold, however, errors can no longer be corrected and the controller marks the block as bad and removes it from service. ECC is expensive, both in terms of silicon real estate and processing time, so there is a limit to how much can be implemented practically. The devices used in this work are capable of detecting and correcting up to 12 bit errors in each 512-byte sector, or part of a page. Equipment manufacturers who use Flash in their devices need to know and rely on their endurance specifications. However, determining the endurance of a chip is both expensive and time consuming because blocks on the chip need to be repeatedly p/e cycled to destruction. Chips with the same nominal specification but manufactured from different wafers or in different manufacturing runs have differing behaviours. This leads chip manufacturers to place extremely conservative specifications on their chips. More accurate specifications of chip endurance would save users and manufacturers money. Moreover, the effective capacity of devices using Flash chips would be increased since less additional capacity to allow for the decreasing number of available blocks (over-provisioning) would be required.

3. RELATED WORK
Desnoyers [6] and Grupp et al. [10] provide good overviews of NAND performance and also include observations of the change in program and erase operation durations as storage locations are cycled. Other related papers include analysis of endurance [4] and proposals for improved performance including endurance [25, 24].
There have recently been a small number of publications reporting the use of Evolutionary Algorithms with Flash memory. Sullivan and Ryan [26] reported on their use of Genetic Algorithms to extend the endurance of NOR Flash memory by optimising some of the Flash device's internal control parameters. Hogan et al. constructed a successful technique to perform binary classification on NAND Flash blocks according to their endurance [12, 13], that is, predicting whether or not a block would survive past a particular number of cycles. This group also examined the potential use of retention classifiers [14]. Arbuckle et al. provide overviews of this and related work [2, 1] and also compare a number of statistical and machine learning techniques for the purpose of endurance classification [3]. Our research is concerned with performing symbolic regression using GP to estimate the endurance values of NAND Flash blocks. We have not found any previous reports of this approach to the endurance problem in the literature. Finally, as an alternative view to that in which Flash market dominance will soon come to an end [9], we note that
Chiu [5] has reported endurance of up to 100 million cycles, which may signify a new beginning for Flash memory.

4. EXPERIMENTAL SETUP
In order to generate endurance estimation expressions, GP was utilised together with data acquired from the destructive cycling of blocks on a number of MLC NAND Flash chips. Data recorded from each block included the durations of program and erase operations at regular intervals, and also the total number of p/e cycles successfully completed by the block prior to failure. The primary goal of this research was to evolve expressions to estimate the number of p/e cycles that a block would complete based only on its program and erase timing information. We will now provide an overview of the data acquisition process, followed by an analysis of the accumulated data, and a discussion of the GP configuration used in these experiments.

4.1 Data Acquisition
A distinguishing characteristic of this research is the use of data collected from the destructive testing of Flash chips. Flash test platforms were acquired which facilitated precise chip control, such as reading, programming, and erasing particular locations. Each platform contained the MLC NAND Flash chip under test, and also a single-board computer running a lightweight Linux distribution, accepting commands in the form of plain text strings from a test server. Blocks were p/e cycled by programming all bits in each of their 128 pages to contain 0 and then immediately erasing the entire block so that all bits were reset to 1. Every 1,000 cycles, the condition of the block was evaluated by calculating the number of errors occurring in data stored within the block. This process was performed by first programming each page within the block to contain a specific data pattern. The data stored in all of the block's pages was then immediately read, counting the number of bit differences occurring between the retrieved data and the programmed information.
The number of bit differences is referred to as the error count. Error counts were obtained every 1,000 cycles until the number of errors exceeded the maximum number of errors correctable through ECC. The number of p/e cycles completed at the point of failure was the actual endurance of the block. As well as calculating the error counts and determining the endurance, the durations of program and erase operations were also recorded at intervals of 1,000 cycles. Since each of the 128 pages needed to be programmed for each block, the program time stored was the average of the 128 recorded values, while the erase time was simply the duration of the single erase operation required to reset all cells within the block (because all pages are erased in a single operation). In total, 96 blocks were p/e cycled to destruction on each of six chips, with three chips being tested in parallel through the use of three Flash test platforms. These chips were all produced by the same manufacturer to the same specification. Three chips from each of two different production batches, or manufacturing runs, were examined. This data acquisition phase of the research was extremely costly in terms of the time taken, since it took almost three months to accumulate just 576 data points. Data points are required to train and test supervised machine learning systems by providing inputs to the system (operation durations) together with the corresponding output (endurance).

Table 1: Endurance Statistics. The minimum, mean, maximum, and standard deviations for blocks evaluated on each individual chip are listed, together with combined information for groups of chips.

Chip     Min      Mean     Max      Std. Dev.
1 (A)    22,000   29,417   46,000   6,044
2 (A)    23,000   28,771   39,000   3,598
3 (A)    31,000   41,188   57,000   7,545
4 (B)    64,000   85,240   119,000  12,404
5 (B)    60,000   77,073   114,000  10,632
6 (B)    115,000  158,031  225,000  31,312
Batch A  22,000   33,125   57,000   8,249
Batch B  60,000   106,781  225,000  41,715
All      22,000   69,953   225,000  47,544
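The destructive test loop described above can be sketched as follows. The chip-control callables (`program_block`, `erase_block`, `readback_errors`) are hypothetical stand-ins for the test platform's commands, and the limit of 12 correctable bit errors is taken from Section 2; how the per-sector ECC limit maps to a whole-block failure criterion is simplified here.

```python
def count_bit_errors(written, read_back):
    """Count bit differences between programmed and retrieved data (the error count)."""
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))

def cycle_to_failure(program_block, erase_block, readback_errors,
                     check_interval=1000, ecc_limit=12):
    """p/e cycle a block, checking its error count every `check_interval`
    cycles, until the count exceeds the ECC limit; returns the endurance."""
    cycles = 0
    while True:
        for _ in range(check_interval):
            program_block()   # program all pages to 0
            erase_block()     # reset all bits to 1
        cycles += check_interval
        if readback_errors() > ecc_limit:
            return cycles     # cycles completed at the point of failure
```

In the actual setup, the recorded program time at each interval was the average over the 128 page programs, and the erase time was the single block-erase duration.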
Each data point represented a single block, and contained the durations of program and erase operations at 1,000 p/e cycle intervals up to 15,000 p/e cycles, together with the endurance value, or number of p/e cycles completed by the block.

4.2 Data Analysis
The error counts and timing data recorded from the test blocks change in a predictable way as each block is p/e cycled. Through the degradation of the insulating oxide layer in each cell (see Section 2), errors become more likely, increasing the error count as the block is cycled. The degradation of the oxide also makes it easier to program and (initially, see below) erase the cells, resulting in the durations of these operations decreasing through repeated cycling. The difficulty posed in evolving an expression to estimate the endurance of test blocks is illustrated in Table 1. This table highlights the fact that even though all chips are of the same make and model, the endurance of their blocks varies significantly. The range of endurance values across all chips is 203,000 cycles with a standard deviation of 47,544 cycles, making the task of estimating a particular block's endurance very difficult. Figures 1 and 2 display violin plots [11] which demonstrate the change in program and erase durations as the number of cycles completed increases. The violin plots show the distribution of data, with the widest areas of the plot, shaded grey in this case, showing the areas where most data points occur. The top of the plot represents the maximum value while the bottom is the minimum point. The black rectangle marks all points occurring within the second and third quartiles while the white dot marks the median of the data. The durations of program operations slowly decrease as blocks are cycled. After 1,000 cycles, the fastest block had an average page program time of approximately 716µs while the slowest block required almost 830µs to program each page.
As well as initially having a wide range of program durations, these times decrease at different rates, as can be seen in the violin plots in Figure 1. After 15,000 cycles, the difference between the minimum and maximum values clearly exceeds its initial value, meaning that the range of times recorded from blocks has increased. The durations of erase operations initially decrease by up to 35%, then remain around a minimum value, before later sharply increasing to values up to 30% longer than the initial recorded times. However, this research will only use timing data recorded during the first 15,000 p/e cycles.

[Figure 1: Distributions of Program Durations. The duration of program operations decreases as the number of p/e cycles increases. (Violin plots of Program Time (µs) against p/e Cycles Completed, 1,000 to 13,000.)]

[Figure 2: Distributions of Erase Durations. The duration of erase operations decreases (until it reaches a minimum value) as the number of cycles increases. (Violin plots of Erase Time (µs) against p/e Cycles Completed, 1,000 to 13,000.)]

This range of cycles was chosen since it is large enough to facilitate 15 different tests while also being less than 22,000 cycles, which is the point at which the first blocks fail. This means the previously described increase in erase time will not be applicable here since it occurs later in the blocks' lifetimes. Having quickly decreased to a minimum value, or fastest erase time, the duration of the erase operation then remains relatively static for a significant number of cycles. This phenomenon is apparent in the violin plots representing higher numbers of cycles in Figure 2. The quick decrease in erase time is easily identifiable but it is also noticeable from the violin plots that many of the erase times decrease towards a particular minimum value. Initially, after 1,000 cycles the bottom of the violin plot forms a point, but this later becomes a flatter, wider surface as the same erase times are recorded from more and more blocks. It must be emphasised that the program and erase times change at different rates from block to block, with stronger, higher endurance blocks showing slower changes and weaker, lower endurance blocks decreasing more quickly. As discussed above, the erase durations of many blocks had decreased to a minimum value within the 15,000 cycles illustrated in the violin plot.
These lower erase times are recorded from the weaker blocks since the erase durations recorded from blocks with higher endurance values do not decrease to the minimum value until later in their lifetime.

4.3 Genetic Programming
The objective of this research is to establish a technique to estimate the endurance of blocks within Flash chips. In particular, the aim is to show that the quality of each estimation can be improved as more timing data becomes available from the relevant test block. GP was used to perform symbolic regression to evolve expressions that took program and erase times as inputs and produced an estimated endurance value as output. The fitness of each GP individual was evaluated by calculating the mean error between the actual and estimated endurance values for each data point in the data set. The data set for use by GP comprised 576 data points and was created using the process described in Section 4.1. Each data point represented the information captured from a single block and contained the durations of program and erase operations recorded at 1,000 cycle intervals up to 15,000 cycles. Combinations of this timing data formed the inputs for each generated expression throughout the evolution process. The total number of cycles successfully completed by the block was included in each data point and represented the output, or target result, for each expression. In order to evaluate the ability of the evolved expressions to generalise to previously unseen data, five-fold cross validation was used, requiring the division of the data set into five equally sized parts. However, the first fold contained an extra data point since 576 is not divisible by five. Each single GP run then used four folds as training data (during evolution), and reserved the fifth fold for use as test data to evaluate the generalisation ability of the best performing expression. This resulted in an approximately 80% training / 20% testing split for each GP run.
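The fitness measure and fold construction just described can be sketched compactly. The paper reports the fitness as the mean error between estimated and actual endurance; whether absolute or percentage error was used during training is not stated, so mean absolute error is assumed here, and the data-point field names are illustrative.

```python
import random

def mean_error(expr, data_points):
    """Fitness: mean absolute error between estimated and actual endurance
    across all data points. `expr` is an evolved expression (a callable)."""
    return sum(abs(expr(p["inputs"]) - p["endurance"])
               for p in data_points) / len(data_points)

def five_folds(data_points, seed=0):
    """Divide the data points into five roughly equal folds; the first fold
    absorbs the remainder (116 + 4 x 115 for the 576 points used here)."""
    pts = list(data_points)
    random.Random(seed).shuffle(pts)
    size = len(pts) // 5
    first = len(pts) - 4 * size
    return [pts[:first]] + [pts[first + i * size: first + (i + 1) * size]
                            for i in range(4)]
```

Each GP run then trains on four folds and tests the single best-of-run expression on the held-out fifth fold.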
This cross validation process allowed five different compositions of the data set since five different folds could be used as the testing portion of the data set. Thirty GP runs were performed on each different data set composition, giving 150 runs in total for each set of inputs. During initialisation of the population and throughout the evolutionary process, expressions which resulted in invalid values were discarded as proposed by Keijzer [16], rather than using guarded operations. The ephemeral random constant, also proposed in Keijzer's paper, which randomly chooses constants with a mean of 0 and a standard deviation of 5, was also used. GP runs that evolved expressions producing results greater than five standard deviations from the mean test error for each set of inputs were removed from the result set. This filtering was performed since these anomalous results significantly affected the mean
results achieved, and accounted for approximately 1.07% of the total runs performed. Fifteen different sets of inputs to the GP system were evaluated, each utilising timing information following the completion of an increased number of cycles, as can be seen in Table 2. Each set of inputs was tested across 150 GP runs. This facilitated investigation into the change in quality of the estimations as timing information recorded later in the lifetime of blocks was supplied. The sets of inputs were formed, in turn, from the program and erase times recorded after 1,000 cycles together with those recorded at each of the remaining cycling intervals up to 15,000 cycles. The first of the 15 tests comprised just two inputs (the program and erase times after 1,000 cycles) while, for example, the inputs for the fifth test were formed from the times after 1,000 cycles together with the times recorded after 5,000 cycles. Providing the GP system with two pairs of program and erase inputs, recorded at different intervals, allowed the potential for learning from the rate of change of the program and erase times.

Table 2: Sets of Inputs Used. Experiments were performed using data recorded at intervals of 1,000 cycles up to the completion of 15,000 cycles.

Test  Input 1  Input 2  Input 3   Input 4
1     pt1000   et1000
2     pt1000   et1000   pt2000    et2000
...
10    pt1000   et1000   pt10000   et10000
...
15    pt1000   et1000   pt15000   et15000

Table 3: Tableau showing GP parameters.

Parameter       Details
Objective       Accurately estimate the endurance for each test block.
Function Set    +, -, *, /
Terminal Set    Program and erase timing information as listed in Table 2. Keijzer ERC, where the ephemeral random constant (R) has a mean of 0 and a standard deviation of 5.
Fitness         The mean error between estimated endurance and actual endurance across all points in the data set.
Generations     100
Population      1,000
Crossover       0.9
Mutation        0.1
Elitism         2
Max Tree Depth  10
Selection       Tournament selection, size = 7.
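Two pieces of experimental bookkeeping described above, building the input sets of Table 2 and filtering anomalous runs, can be sketched as follows. The dictionary layout of the timing data is an assumption for illustration, and whether the five-standard-deviation threshold was computed before or after removal is not stated, so a single pass is assumed.

```python
from statistics import mean, stdev

def input_set(timings, test_number):
    """Inputs for a given test (1-15): the program/erase pair recorded after
    1,000 cycles plus, for tests 2-15, the pair recorded after
    test_number * 1,000 cycles. `timings` maps a cycle count to a
    (program_time, erase_time) pair."""
    pt1000, et1000 = timings[1000]
    if test_number == 1:
        return [pt1000, et1000]
    ptn, etn = timings[test_number * 1000]
    return [pt1000, et1000, ptn, etn]

def drop_anomalous_runs(test_errors):
    """Remove runs whose test error lies more than five standard deviations
    from the mean test error across runs."""
    mu, sigma = mean(test_errors), stdev(test_errors)
    return [e for e in test_errors if abs(e - mu) <= 5 * sigma]
```

Supplying two timing pairs recorded at different intervals is what gives the evolved expressions access to the rate of change of the program and erase times.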
ECJ, a Java-based evolutionary computation research system [19], was utilised to perform the GP runs described above. Initially, trial runs were performed to assess GP parameter settings such as the function set operators, maximum tree depth, and a number of crossover/mutation rate combinations. Table 3 provides details of the resulting GP parameters which were used throughout the experiments. A population of 1,000 individuals evolves for 100 generations, using tournament selection together with sub-tree crossover and point mutation.

[Figure 3: Batch Level Results. Each point represents the mean best test error for runs upon completion of increasing numbers of p/e cycles. (Mean Percent Error, 0 to 30%, against p/e Cycles Completed, for All Chips, Batch A Chips, and Batch B Chips.)]

5. RESULTS
Each batch of tests examined 15 different sets of inputs in order to establish how the accuracy of the endurance estimation changes as more cycles are completed. Initially, the data set comprised all available data points from the six test chips. Following this, the same experiments were performed, but with smaller data sets, in which the data was divided according to the production batch of the chips. Finally, the potential for combining the use of classification and symbolic regression was considered, with regression performed on three groups of data points of similar endurance. As mentioned in Section 4.3, prior to starting the GP runs, the entire data set was randomly subdivided into five partitions, or folds, ensuring that each fold contained roughly the same number of data points from each chip. Care was also taken to ensure that each fold contained a similar distribution of endurance values. The results presented in the following sections refer to the mean percentage error achieved on the test data by the single best training expression from each GP run.
5.1 Data Set Comprising All Data
Ideally, a single expression would be evolved to estimate the endurance of any block on any chip and from any production batch. The first set of tests examined the potential for achieving this, since the data from all six test chips was included in the data set. Referring back to Table 1, we can see that this data set spanned a range of 203,000 cycles in endurance and had a standard deviation of 47,544 cycles. As can be seen in Figure 3, having initially reported a mean error for the endurance estimation of approximately 25%, the accuracy of the evolved solutions continually improves, achieving a minimum value of around 15%. Figure 4 plots the mean best training fitness per generation for four tests performed using all data points. The plots for only four of the 15 tests are provided since it becomes difficult to distinguish them from each other as more are added to the diagram. These plots confirm that the evolved
expressions are improving as the number of generations increases, meaning that GP is learning as evolution proceeds.

[Figure 4: Mean Best Training Fitness. The mean data for four sample tests (1,000, 5,000, 10,000, and 15,000 p/e cycles) is presented, showing that GP initially learns at a fast pace before later slowing. (Mean Percentage Error, 0 to 50%, against Generations Completed, 0 to 100.)]

[Figure 5: Block Level Results. Each point represents the mean best test error for runs upon completion of increasing numbers of p/e cycles. (Mean Percent Error, 0 to 30%, against p/e Cycles Completed, for All Blocks, Endurance <= 30k, 30k < Endurance <= 60k, and 60k < Endurance <= 90k.)]

The huge performance difference between blocks within this data set had a significant effect on the results achieved. In order to investigate further, the next step in this research divided the available data points into two data sets; one comprising only chips from production batch A, with the other only containing data from production batch B.

5.2 Batch Level Division of Data
The data set comprising only blocks from batch A had a standard deviation of 8,249 cycles, implying that the blocks performed more similarly than those in the previous data set. However, batch B had a significantly larger standard deviation of 41,715 cycles, since chip six hugely outperformed the other two chips sampled from that production batch. The results of the batch level tests are plotted in Figure 3. As expected, based on the standard deviations of the two data sets, more accurate expressions were evolved for the data points from batch A than from batch B. It is also significant that both sets of tests outperformed those from the initial data set. The results for batch A decreased from a mean error of 16.57% after 1,000 cycles to 9.10% after 15,000 cycles.
Meanwhile, the more varied, and thus more difficult, batch B data set reported an initial mean error of 20.07%, but this decreased to 10.17% after 15,000 cycles. These batch level results are a significant improvement on the experiments comprising all available data. However, this approach requires evolution of expressions for all (including future) production batches, requiring the time consuming data acquisition phase (see Section 4.1) to be performed on chips from each extra batch. It would be preferable to evolve a fixed number of expressions, rather than having to evolve new expressions for each additional production batch. With the aim of evolving a fixed number of endurance estimation expressions capable of working with all production batches, the next stage of this research divided the original data set into a number of smaller, more targeted sets.

5.3 Block Level Division of Data
The block level division of data approach is based on the use of both classifiers (such as those proposed by Hogan et al. [12, 13]) and symbolic regression. Firstly, a collection of binary classifiers is required to group the data points, or test blocks, by endurance, using the timing information recorded early in the block's lifetime. Secondly, expressions should be evolved using symbolic regression for each of the groups created in the first step. These groups of relatively similar blocks could then be treated as distinct data sets, with symbolic regression providing more refined endurance estimations and each expression targeting a specific category of blocks. Having completed these initial two steps, test blocks from the entire data set can be classified using an ensemble of binary classifiers. This step essentially predicts that the endurance of the block lies between two boundaries. Finally, the corresponding regression expression, for the data points lying within these boundaries, can be used to provide an accurate estimation of the endurance.
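The proposed two-stage pipeline can be sketched as follows. The classifier predicates and per-group regression expressions here are hypothetical placeholders for the evolved ones, and the data layout is an assumption for illustration.

```python
def estimate_endurance(timings, group_classifiers, group_regressors):
    """Classify a block into an endurance group, then apply that group's
    regression expression. `group_classifiers` is a list of
    (upper_bound, is_in_group) pairs ordered by increasing bound, and
    `group_regressors` maps each bound to its evolved expression."""
    for upper_bound, is_in_group in group_classifiers:
        if is_in_group(timings):                          # stage 1: binary classification
            return group_regressors[upper_bound](timings)  # stage 2: group-specific regression
    raise ValueError("block falls outside all modelled endurance groups")
```

The attraction of this structure is that each regression expression only needs to be accurate within a narrow endurance band, at the cost of any classification error propagating into the final estimate.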
Since the focus of this research is the application of symbolic regression, only step two is performed in order to evaluate the potential of this hybrid technique. Three similarly sized data groups were created at boundaries of 30,000 cycles, since this resulted in relatively equal numbers of blocks in each division. The first group contained all blocks with an endurance less than or equal to 30,000 cycles. The second group contained all blocks with endurance values between 31,000 and 60,000 cycles, while the third group comprised those blocks with an endurance of between 61,000 and 90,000 cycles. Blocks with endurance above 90,000 cycles were not used for this initial, proof-of-concept experiment. Since the classification phase of this approach has not been completed, and would likely introduce some classification error, the results cannot be directly compared with those from the earlier two approaches. However, they can be used as an indication of the potential of this technique. The results plotted in Figure 5 show that all three block-level groups perform better than the original, larger data set. In particular, the 30,000 and 90,000 groups perform extremely well, achieving mean estimation errors of just 7.14% and 8.18% respectively after the completion of only 1,000 cycles. These values remain relatively consistent as the number of cycles completed increases, decreasing by only 1%. Upon examination of the division of the data sets, the first subgroup (up to 30,000 cycles) is expected to perform well: its data points come from just two chips, and all contain very similar values, since the lowest endurance in the entire data set is 22,000 cycles. The data points in the third subgroup also come from just two chips. However, unlike the first subgroup, these data points do vary across the available range from 61,000 to 90,000 cycles. Taking this into account, this subset of the data performs extremely well. Finally, the second set of data points, those achieving between 31,000 and 60,000 cycles, initially performs adequately, although noticeably not as well as the other two block-based subgroups. However, as the number of cycles completed increases, the results show significant improvement, achieving almost the same accuracy as the other two subgroups after 13,000 cycles. Having examined the program and erase times (inputs) for these tests, we found no clear explanation for this difference in performance between the second subgroup and the others.

5.4 Discussion

The characteristic described in Section 4.2, whereby the erase time passes through three distinct operational phases, affects the results achieved. The duration of the erase operation decreases (quickly in weaker blocks) to a minimum level, remains relatively static for a significant number of cycles (of the order of thousands), and then sharply increases to a maximum level.
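This three-phase behaviour can be mimicked with a synthetic erase-time curve; the shape, durations, and values below are invented for illustration and are not measured data:

```python
# Illustrative synthetic erase-time curve over a block's lifetime:
# phase 1: rapid decrease to a minimum; phase 2: long flat plateau;
# phase 3: sharp rise toward a maximum before failure.
# All numbers are invented for illustration.

def synthetic_erase_time(cycle, lifetime=30_000):
    if cycle < 2_000:                          # phase 1: decreasing
        return 3.0 - 1.0 * (cycle / 2_000)
    if cycle < lifetime - 2_000:               # phase 2: plateau
        return 2.0
    # phase 3: sharp increase over the final cycles
    return 2.0 + 1.5 * ((cycle - (lifetime - 2_000)) / 2_000)

# Rate of change per 1,000 cycles: informative in phases 1 and 3,
# but zero across the plateau, where the cycling level cannot be
# inferred from the erase time alone.
def rate_of_change(cycle, step=1_000):
    return synthetic_erase_time(cycle + step) - synthetic_erase_time(cycle)

print(rate_of_change(1_000))    # phase 1: negative slope
print(rate_of_change(15_000))   # phase 2: zero -> no signal
print(rate_of_change(29_000))   # phase 3: positive slope
```

The flat middle phase is the region where, as discussed next, the erase time carries no usable signal for the GP system.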
During the period when the erase time settles around its minimum value, the GP system has no way of differentiating between different levels of cycling based on the erase time and, in turn, cannot learn from its rate of change. This explains the decreased rate of improvement as more cycles are completed, especially in the tests using the complete data set, and also the slight increase toward the right-hand side (13,000 to 15,000 cycles) of a number of the results plots. The block-level division of data (the classification/regression hybrid approach) suggests that, unsurprisingly, having more specialised regression expressions yields better results. Since Hogan et al. [12] have shown that it is possible to classify blocks as belonging to a particular performance group, this type of partitioning of the data can also be performed in practice. The pilot investigation into this approach focused only on the initial training of the symbolic regression expressions, and did not attempt to train or test the system using the required classifiers. However, this hybrid technique has shown significant potential and will be comprehensively evaluated in future work.

6. FUTURE WORK

We propose to continue our collaboration with partners in industry and construct a larger data set, providing more data points for training and testing and allowing the construction of more robust endurance estimation tools. Having accumulated a larger data set, the research will proceed to verify that the classification and regression approaches can be used together to produce the most accurate estimations. Initially, a series of binary classifiers will be evolved using GP to predict whether each test block will successfully complete a predefined number of cycles. Each classifier will be trained to identify a different portion of the data set. As with the regression results reported in this paper, we expect that the classifications will become more accurate as more timing data is supplied.
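The planned classifier stage can be pictured as a bank of binary predictors, one per cycle threshold, whose votes combine into an endurance bracket. The decision rule below is a hypothetical placeholder for the GP-evolved classifiers:

```python
# Illustrative bank of binary classifiers, one per cycle threshold.
# Each predicts whether a block will complete its threshold number of
# cycles. The decision rule is a hypothetical placeholder; the real
# classifiers are to be evolved with GP from program/erase timings.

def make_threshold_classifier(threshold_cycles, erase_cutoff):
    def will_complete(erase_times):
        # Placeholder rule: blocks whose erase time stays high early in
        # life are (hypothetically) predicted to last longer.
        return sum(erase_times) / len(erase_times) >= erase_cutoff
    will_complete.threshold = threshold_cycles
    return will_complete

bank = [make_threshold_classifier(t, c)
        for t, c in [(30_000, 1.8), (60_000, 2.1), (90_000, 2.4)]]

def predicted_bracket(erase_times):
    """Highest threshold the block is predicted to survive (0 if none),
    combining the binary votes into an endurance bracket."""
    survived = 0
    for clf in bank:                 # thresholds in ascending order
        if clf(erase_times):
            survived = clf.threshold
        else:
            break
    return survived

print(predicted_bracket([2.2, 2.3, 2.1]))   # predicted to exceed 60,000
```

Each classifier in the bank identifies a different portion of the data set, so the bracket narrows as thresholds are passed, which is the behaviour the regression stage would then exploit.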
GP-based symbolic regression will then be utilised, together with a block-level division of data points, to evolve a series of expressions estimating the endurance of blocks within each of the classification groups. This process will result in the evolution of a number of classification and regression expressions. We will then evaluate the entire process, firstly classifying each block in order to select the most suitable regression expression, and then estimating the number of cycles the block will complete. Following this comprehensive set of tests, we will be in a position to directly compare the results achieved with those from the approaches discussed in Sections 5.1 and 5.2 above.

7. CONCLUSIONS

The overall goal of this research was to examine the potential of utilising symbolic regression through GP to generate expressions that estimate endurance, using only the program and erase timing information recorded from the block under test. In response to the primary research question posed in Section 1, we have shown that symbolic regression can be successfully used to estimate the actual endurance of blocks, and that this can be achieved after the completion of relatively small numbers of cycles. Previous approaches to this problem (discussed in Section 3) had focused on classification, but our technique achieves far greater accuracy by predicting the point of failure using symbolic regression. This is due to the continual refinement of estimations as more cycles are completed, and to the fact that regression is not limited to classifying within predefined boundaries such as increments of 50,000 cycles. Secondly, the accuracy of the estimates was also shown to improve as data is recorded from increasing numbers of cycles, essentially allowing the system to continually refine its estimation for any test block as more data becomes available.
The rate of refinement is fast over the first few thousand cycles before slowing down, which we expect is related to the way the erase time changes as the number of program/erase cycles increases. The third, and final, research question queried the ability, or necessity, to create a general expression capable of working with all chips. Results show that a general solution is possible, producing reasonable estimations with a mean error of around 25% after only 1,000 cycles, decreasing to around 15% as more cycles are completed. These results are acceptable considering the range of endurance values and the variance of the data set. Evolving expressions for each production batch was also investigated, but the area that showed the most potential was division of the data set at block level. The idea behind this approach was to combine the classification and symbolic regression approaches: each block would first be classified, based on the program and erase information, in order to select the most suitable symbolic-regression-generated expression. In this paper, only the initial symbolic regression training was performed, with the entire classification/regression process to be evaluated in future work.
The techniques described in this paper show huge potential for real-world application. Currently, the endurance specifications for Flash chips are extremely conservative estimates based on the worst-case performance of blocks. The research reported here will allow manufacturers to increase the specified endurance of Flash chips, since they can now predict the quality of blocks within chips after the completion of a relatively small number of cycles.

8. ACKNOWLEDGMENTS

The authors would like to thank the reviewers for their helpful comments. This research was funded by an Enterprise Ireland Innovation Partnership Grant, contract number IP/2008/0591.

9. REFERENCES

[1] T. Arbuckle, D. Hogan, and C. Ryan. Optimising Flash memory for differing usage scenarios: Goals and approach. In Proc. Int. Conference on Convergence and Hybrid Information Technology, pages 137-140, August 2012.
[2] T. Arbuckle, D. Hogan, and C. Ryan. Optimising Flash non-volatile memory using machine learning: A project overview. In Proc. 5th Balkan Conference on Informatics, pages 235-238, September 2012.
[3] T. Arbuckle, D. Hogan, and C. Ryan. Learning predictors for Flash memory endurance: A comparative study of alternative classification methods. International Journal of Computational Intelligence Studies, 2013. To appear.
[4] S. Boboila and P. Desnoyers. Write endurance in Flash drives: Measurement and analysis. In Proc. 8th USENIX Conference on File and Storage Technologies, pages 115-128, February 2010.
[5] Y.-T. Chiu. Forever Flash. IEEE Spectrum, 49(12):11-12, 2012.
[6] P. Desnoyers. Empirical evaluation of NAND Flash memory performance. SIGOPS Oper. Syst. Rev., pages 50-54, March 2010.
[7] Flash Memory Summit. Flash memory backgrounder. http://www.flashmemorysummit.com/english/Conference/FM Backgrounder.html. Accessed January 28th, 2013.
[8] R. H. Fowler and L. Nordheim. Electron emission in intense electric fields. Proceedings of the Royal Society of London, Series A, 119(781):173-181, 1928.
[9] L. Grupp, J. Davis, and S. Swanson. The bleak future of NAND Flash memory. In Proc. 10th USENIX Conference on File and Storage Technologies, pages 17-24, February 2012.
[10] L. M. Grupp, A. M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. H. Siegel, and J. K. Wolf. Characterizing Flash memory: Anomalies, observations, and applications. In Proc. 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 24-33, 2009.
[11] J. L. Hintze and R. D. Nelson. Violin plots: A box plot-density trace synergism. The American Statistician, 52(2):181-184, 1998.
[12] D. Hogan, T. Arbuckle, and C. Ryan. Evolving a storage block endurance classifier for Flash memory: A trial implementation. In Proc. 11th IEEE Int. Conference on Cybernetic Intelligent Systems, pages 12-17, August 2012.
[13] D. Hogan, T. Arbuckle, and C. Ryan. How early and with how little data? Using genetic programming to evolve endurance classifiers for MLC NAND Flash memory. In Proc. 16th European Conference on Genetic Programming, pages 253-264, April 2013.
[14] D. Hogan, T. Arbuckle, C. Ryan, and J. Sullivan. Evolving a retention period classifier for use with Flash memory. In Proc. 4th Int. Conf. on Evolutionary Computation Theory and Applications, pages 24-33, October 2012.
[15] IC Insights. Total Flash memory market will surpass DRAM for first time in 2012. http://www.icinsights.com/news/bulletins/total-Flash-Memory-Market-Will-Surpass-DRAM-For-First-Time-In-2012, December 2012. Accessed January 27th, 2013.
[16] M. Keijzer. Improving symbolic regression with interval arithmetic and linear scaling. In Proc. 6th European Conference on Genetic Programming, pages 70-82, 2003.
[17] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, 1992.
[18] H. Li and Y. Chen. An overview of non-volatile memory technology and the implication for tools and architectures. In Design, Automation & Test in Europe, pages 731-736, April 2009.
[19] S. Luke. ECJ 20: A Java-based evolutionary computation research system. http://cs.gmu.edu/~eclab/projects/ecj/, October 2010.
[20] F. Masuoka and H. Iizuka. Semiconductor memory device and method for manufacturing the same, 1980. US Patent 4,531,203.
[21] R. Micheloni, L. Crippa, and A. Marelli. Inside NAND Flash Memories. Springer, 2010.
[22] R. Micheloni, A. Marelli, and K. Eshghi. Inside Solid State Drives (SSDs), volume 37 of Springer Series in Advanced Microelectronics. Springer, 2012.
[23] R. Micheloni, A. Marelli, and R. Ravasio. Error Correction Codes for Non-Volatile Memories. Springer, 2010.
[24] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. R. Stan. How I learned to stop worrying and love Flash endurance. In Proc. 2nd USENIX Conference on Hot Topics in Storage and File Systems, 2010.
[25] Y. Pan, G. Dong, and T. Zhang. Exploiting memory device wear-out dynamics to improve NAND Flash memory system performance. In Proc. 9th USENIX Conference on File and Storage Technologies, 2011.
[26] J. Sullivan and C. Ryan. A destructive evolutionary algorithm process. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 15:95-102, 2011.
[27] J. Witters, G. Groeseneken, and H. Maes. Degradation of tunnel-oxide floating-gate EEPROM devices and the correlation with high field-current-induced degradation of thin gate oxides. IEEE Transactions on Electron Devices, 36(9):1663-1682, September 1989.