Correlating Time-Domain Noise to Detect Fraudulent Credit Cards


School of Engineering and Applied Science
Electrical and Systems Engineering Department

ESE 498
Correlating Time-Domain Noise to Detect Fraudulent Credit Cards

By Wenyuan Liu and Andrew Wiens
Supervisor: Robert Morley

Submitted in Partial Fulfillment of the Requirements for the BSEE Degree, Electrical and Systems Engineering Department, School of Engineering and Applied Science, Washington University in St. Louis

May 2013

Table of Contents

List of Figures and Tables
Student Statement
Abstract
Acknowledgment
Problem Statement
Problem Formulation
Project Specifications
Concept Synthesis
Detailed Engineering Analysis and Design Presentation
Conclusions
References

List of Figures and Tables

Figure 1 Noise region in the data track
Figure 2 Mean smoothing filter
Figure 3 Output from mean smoother
Figure 4 Our LabVIEW design
Figure 5 Raw, unprocessed waveform
Figure 6 Find Peaks sub-VI
Figure 7 Accept distribution before and after adding false peak discrimination
Figure 8 FSM sub-VI that finds the swipe direction
Figure 9 State diagram of the FSM sub-VI
Figure 10 Normalize block
Figure 11 Example normalized waveform
Figure 12 Filter block
Figure 13 Filtered waveform
Figure 14 Trim Edges sub-VI
Figure 15 Normalized and filtered waveform with data peaks trimmed
Figure 16 Decimate and sample subset
Figure 17 Example waveform after decimation
Figure 18 Quantization
Figure 19 Example Magneprint
Figure 20 Correlation block sub-VI
Figure 21 Accept and reject distributions generated from Dr. Morley's card database
Figure 22 Block diagram of live mode VI
Figure 23 Accept and reject distributions generated in live mode
Figure 24 Optimizer VI
Table 1 Best M, B, filter half-width, and spacing (delta) parameters

Student Statement

We, the designers, have applied ethics to the design process and in the selection of the final proposed design. We have complied with the WUSTL Honor Code.

Abstract

In this project, we used LabVIEW to develop a system that detects fraudulent credit cards from the noise between the data peaks on a card's magnetic stripe. First, we determined the direction of each card swipe and flipped backward swipes so that all waveforms run forwards. The amplitudes of the data peaks were then normalized to one, and the number of samples between adjacent data peaks was normalized to 266 for zeros and 133 for ones. The normalized waveform was high-pass filtered with an FIR moving average filter and broken up from peak to peak; 64 of the 266 samples between peaks were kept (zero bits only). We then formed a Magneprint of the card by decimating, quantizing, and choosing a subset of the samples. Finally, we found the filter, decimation, sample-count, and sample-spacing parameters for Magneprint sizes of 384, 768, 1024, and 2048 bits that minimize the rates of false positive and false negative swipes. The resulting system reliably distinguished fraudulent cards from authentic ones.

Acknowledgment

We thank Dr. Morley and Ed Richter for their help with this project.

Problem Statement

Today's market is dominated by credit card transactions. There are many ways to counterfeit credit cards, which creates problems for credit card security. A system to detect fraudulent credit cards is needed, and it must be robust, consistent, and reliable while taking cost into consideration.

Problem Formulation

We have proposed a credit card security system whose task is to create a unique fingerprint, called a Magneprint, of each individual card. Counterfeits can then be identified by comparing their Magneprints to that of the authentic card. For the system to be robust, consistent, cost-efficient, and reliable, it must have minimal rates of false positives and false negatives as well as small Magneprint sizes.

Project Specifications

The Magneprint sizes allowed for performance testing are 384, 768, 1024, and 2048 bits. The rates of false positives and false negatives must be minimized; for our scope and initial stages of testing they must be less than 0.5%.

Concept Synthesis

Literature Review

The Magneprint patents provide much information about the nature of the noise found on magnetic cards, as well as implementations that generate a fingerprint from the noise. Figure 1 appears in U.S. Patent No. 7,478,751 and shows an example of the magnetic noise on a credit card as captured by an analog-to-digital converter while the card is swiped through a card reader. High-frequency noise can be found between the data peaks.

Figure 1 Noise region in the data track

In order to isolate the high-frequency noise and create a fingerprint of the credit card, the low-frequency components of the waveform have to be removed. The same patent describes a method to remove low frequencies from the waveform using a mean smoother (moving average filter). The mean smoother subtracts the average of the n neighboring samples from each sample. This filter is useful for creating Magneprints because it has a steep response in the frequency domain. Also, because the filter has a finite impulse response, energy from data peaks in the waveform does not affect the noise region. Figure 2 shows the block diagram for the mean smoother and Figure 3 shows an example waveform produced by it, as they appear in the patent.

Figure 2 Mean smoothing filter

Figure 3 Output from mean smoother

Concept Generation and Reduction

An alternative to the moving average filter for processing the noise between the data peaks is to linearly normalize the slope of the noise by dividing every sample by the slope calculated from the first and last samples. We considered this alternative but decided that the moving average offered greater simplicity and is easier to optimize by sweeping the filter length.

Detailed Engineering Analysis and Design Presentation

We chose LabVIEW 2012 as the tool to build the Magneprint system for its utility and popularity in academia, as well as on the recommendation of our advisor, Dr. Morley. The scheme of the system is to capture unique fingerprints made of noise from individual cards and build a database. Known authentic and inauthentic cards are correlated and the results displayed on histograms to check system performance; the details are discussed later in this section.

Top

The general mechanism of our system is to first capture a waveform of a card, which contains both data and noise. Since our database contains only forward-swiped Magneprints, we flip the waveforms from all backward swipes. Then we normalize the waveforms so that they can be properly correlated. The waveform is then put through various signal processing steps, taking only specified chunks of the whole waveform, a moving average filter, decimation, subsampling, and quantization, to create the Magneprint. The parameters for the Filter, Decimate (and subsample), and Quantize VIs are determined with a separate optimization VI explained later in this section. In the end, each card is correlated with the rest and the results are displayed on histograms. The top-level LabVIEW VI is shown in Figure 4 below. It shows our implementation at the highest level: a pipeline architecture operating on a 1D array of waveforms. Part of the credit goes to Dr. Morley for his code for acquiring a raw waveform from DAT files.

Figure 4 Our LabVIEW design

Figure 5 shows a raw waveform captured by swiping a Washington University student ID through a Magtek card reader provided by Dr. Morley. The waveform is not processed in any way; as depicted, it contains around 85k samples.

Figure 5 Raw, unprocessed waveform

Find Peaks

Below is the sub-VI we created to find the peaks in the raw waveform. It uses a peak detector block to find all peaks in the waveform.

Figure 6 Find Peaks sub-VI

At first we used the raw peaks from the peak detector. However, we found that it picked up not only the data peaks but also false peaks due to noise. This was a problem for the accept distribution because the Magneprints must line up correctly. The sub-VI we designed detects and discards false peaks by checking both the number of samples between peaks and the amplitude of the peaks:

1. First, the Find Peaks sub-VI determines the number of samples between each consecutive pair of peaks.
   a. If the number of samples between the two peaks is within +/- 15 percent of the previous peak spacing, the peak is considered true and the data bit is a zero.
   b. Otherwise:
      i. If the percent difference is between -35 and -70 percent, that is, there are 35 to 70 percent fewer samples between the two peaks than for the last zero bit, then:
         1. If the percent difference in amplitude since the last peak is within +/- 60 percent, the peak is considered true and the data bit is a one.
         2. Otherwise, the peak is considered noise and discarded.
      ii. Otherwise, the peak is considered noise and discarded.

To find the percentages above, we made a few swipes and plotted histograms of the peak-to-peak change in separation and in amplitude. The histograms revealed that a user is generally not capable of a bit-to-bit change in swipe speed greater than about 15 percent. After implementing the Find Peaks sub-VI, our accept distribution was much improved: with fewer false peaks, the Magneprints lined up better than before. Figure 7 shows the accept distribution before and after implementing false peak discrimination.

Figure 7 Accept distribution before and after adding false peak discrimination
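The decision rules above can be sketched in Python as follows. The thresholds (15 percent, 35 to 70 percent, 60 percent) come from the report; seeding the state from the first two peaks is our own simplification and not taken from the LabVIEW sub-VI:

```python
def classify_peaks(positions, amplitudes):
    """Classify detected peaks as data bits or noise using the
    spacing/amplitude rules above. Returns (position, bit) pairs
    for the peaks accepted as true data peaks."""
    zero_spacing = positions[1] - positions[0]  # assume stripe starts with zero bits
    prev_pos, last_amp = positions[1], amplitudes[1]
    bits = []
    for pos, amp in zip(positions[2:], amplitudes[2:]):
        spacing = pos - prev_pos
        pct = 100.0 * (spacing - zero_spacing) / zero_spacing
        if abs(pct) <= 15:
            # spacing matches the last zero bit: a true peak, bit = 0
            bits.append((pos, 0))
            zero_spacing, prev_pos, last_amp = spacing, pos, amp
        elif -70 <= pct <= -35:
            # roughly half the zero spacing: candidate one bit,
            # accepted only if the amplitude is also consistent
            if abs(100.0 * (amp - last_amp) / last_amp) <= 60:
                bits.append((pos, 1))
                prev_pos, last_amp = pos, amp
        # anything else is a false peak from noise: discard it
    return bits
```

A noise peak that falls outside both spacing windows is skipped without updating the running state, so it cannot corrupt the spacing estimate for later bits.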

Figure 8 FSM sub-VI that finds the swipe direction

Figure 9 State diagram of the FSM sub-VI (states 0 through 11; state 10 indicates a backward swipe and state 11 a forward swipe)
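The FSM's behavior can be sketched as a bit-pattern matcher. The start-sentinel bit pattern used below, 11010, is our assumption (it is the ISO track-2 start sentinel); the report does not spell out the exact pattern its 12-state machine matches:

```python
def build_fsm(pattern):
    """Precompute FSM transitions: state = number of sentinel bits
    matched so far; on each input bit we move to the length of the
    longest sentinel prefix that is a suffix of the bits seen."""
    table = []
    for state in range(len(pattern)):
        row = {}
        for bit in (0, 1):
            seen = list(pattern[:state]) + [bit]
            k = min(len(seen), len(pattern))
            while k > 0 and seen[-k:] != list(pattern[:k]):
                k -= 1
            row[bit] = k
        table.append(row)
    return table

def find_direction(bits, sentinel=(1, 1, 0, 1, 0)):
    """Return 'forward' if the bit stream contains the start sentinel,
    'backward' if the reversed stream does, else None."""
    fsm = build_fsm(sentinel)

    def matches(stream):
        state = 0
        for bit in stream:
            state = fsm[state][bit]
            if state == len(sentinel):
                return True
        return False

    bits = list(bits)
    if matches(bits):
        return "forward"
    if matches(reversed(bits)):
        return "backward"
    return None
```

The LabVIEW sub-VI folds both directions into a single state machine; splitting the search into a forward and a reversed pass is a simplification for readability.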

Normalize

In order to correlate the time-domain signals, each swipe's waveform must be normalized; otherwise the speed of the swipe will change the correlation. The Normalize block uses the Find Peaks block to split the waveform into pieces corresponding to the peaks in the data. It throws out ones, keeping only zero bits, because ones have a higher frequency than zeros and their noise sections are too short to be very useful for building Magneprints. After splitting the waveform, it resamples each sub-array to 266 samples. Then it normalizes the amplitude of the entire waveform so that the data peaks are 1. It does this by taking the absolute amplitude of the data peaks, linearly interpolating between them, and dividing each sample by the interpolated value. Figure 10 shows the VI for the Normalize block and Figure 11 shows a sample waveform of a normalized swipe.

Figure 10 Normalize block
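The two normalization steps can be sketched in NumPy (our illustration; the function names and the choice of linear interpolation for resampling are assumptions, not details from the LabVIEW block):

```python
import numpy as np

BIT_LEN = 266  # samples per zero bit after resampling

def resample_segment(segment, n=BIT_LEN):
    """Linearly resample one peak-to-peak segment to n samples,
    removing the effect of swipe speed on segment length."""
    segment = np.asarray(segment, dtype=float)
    old_axis = np.linspace(0.0, 1.0, len(segment))
    new_axis = np.linspace(0.0, 1.0, n)
    return np.interp(new_axis, old_axis, segment)

def normalize_amplitude(wave, peak_idx):
    """Divide each sample by the linear interpolation of the absolute
    peak amplitudes, so that every data peak comes out at 1."""
    wave = np.asarray(wave, dtype=float)
    peak_idx = np.asarray(peak_idx)
    envelope = np.interp(np.arange(len(wave)), peak_idx,
                         np.abs(wave[peak_idx]))
    return wave / envelope
```

Dividing by the interpolated peak envelope, rather than by a single global maximum, compensates for the slow amplitude drift across the swipe that the figure below exhibits.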

Figure 11 Example normalized waveform

Filter

This block implements the mean smoothing filter described in the patent. The sub-VI pads the beginning of the input waveform with n zeros, where n is the half-width of the moving average filter. It then filters the original input waveform with a moving average block, subtracts the filtered waveform from the padded waveform, and removes n samples from the beginning. Figure 12 shows the block diagram and Figure 13 shows a sample waveform filtered with the mean smoother.
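We read the pad, filter, subtract, trim sequence as aligning a causal moving average so that it acts as a centered one. A minimal NumPy sketch of that end result (our illustration, not the LabVIEW block itself; `mode="same"` supplies the zero padding implicitly):

```python
import numpy as np

def mean_smooth_highpass(x, half_width):
    """Subtract a centered moving average (window 2*half_width + 1)
    from each sample; edges are implicitly zero-padded, so the
    output keeps the input's length."""
    window = 2 * half_width + 1
    kernel = np.full(window, 1.0 / window)
    smoothed = np.convolve(x, kernel, mode="same")
    return x - smoothed
```

Because the subtracted average is a finite-window FIR operation, a data peak can only influence samples within half_width of it, which is why the trimmed noise region stays uncontaminated.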

Figure 12 Filter block

Figure 13 Filtered waveform

Trim Edges

This sub-VI removes the data portion of the waveform, leaving the noise portion. Our implementation removes the first and last 101 samples from each 266-sample data bit, leaving 64 samples of Magneprint per bit. Figure 14 shows the block diagram and Figure 15 shows a sample Magneprint waveform.
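The arithmetic is simple (266 - 2 * 101 = 64 noise samples per zero bit), and the trim can be sketched in one line (a Python illustration; the function name is ours):

```python
def trim_noise(bit_segments, cut=101):
    """Drop the first and last `cut` samples of each 266-sample bit,
    keeping only the middle 266 - 2*cut = 64 noise samples."""
    return [seg[cut:len(seg) - cut] for seg in bit_segments]
```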

Figure 14 Trim Edges sub-VI

Figure 15 Normalized and filtered waveform with data peaks trimmed

Decimate and Sample Subset

Figure 16 shows our Decimate and Sample Subset sub-VI. The input waveform is first converted to an array. The array is then decimated by taking every delta-th element and building a new array from those elements. When the desired number of samples is reached, the inner loop that builds the new array stops. In addition, if the separation (delta) is large enough that the end of the input array is reached before the desired number of samples has been taken (signaled by the greater-than-or-equal comparison block), the loop also stops. This further ensures the robustness of the whole system.

Figure 16 Decimate and sample subset
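Both stop conditions can be sketched as follows (a Python illustration of the loop; the function name is ours):

```python
def decimate_subset(samples, delta, m):
    """Take every delta-th sample until m samples are collected or
    the input runs out, mirroring the two stop conditions of the
    LabVIEW loop."""
    out = []
    i = 0
    while len(out) < m and i < len(samples):
        out.append(samples[i])
        i += delta
    return out
```

The second condition is what keeps an over-large delta from reading past the end of a short Magneprint instead of crashing the pipeline.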

Figure 17 shows the resulting waveform for a sample separation of 67 samples and 96 total samples taken. Compared to the previous waveform (Figure 15), the number of samples has greatly decreased, from around 11k down to 96.

Figure 17 Example waveform after decimation

Quantize

Figure 18 shows our Quantization sub-VI. The input waveform is quantized based on B, the number of bits. Quantization represents a waveform in discrete levels; if B is 4, the number of levels is 2^4 = 16. The higher the number of bits, the more levels the waveform can be represented in, and so the higher the resolution. The drawback of a high bit count is a larger Magneprint, which is why we optimize B to find the smallest value that still maintains system performance. Part of this quantization sub-VI came from Dr. Morley.
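A minimal sketch of uniform quantization over the waveform's own range (our illustration; the actual sub-VI, which came in part from Dr. Morley, may scale or clip differently):

```python
import numpy as np

def quantize(wave, b):
    """Round each sample to the nearest of 2**b uniform levels
    spanning the waveform's own min..max range."""
    wave = np.asarray(wave, dtype=float)
    lo, hi = wave.min(), wave.max()
    if hi == lo:                      # degenerate flat input
        return wave.copy()
    step = (hi - lo) / (2 ** b - 1)
    return lo + np.round((wave - lo) / step) * step
```

With b = 4 every sample lands on one of 16 levels, producing the flat runs visible in the quantized waveform below.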

Figure 18 Quantization

Here the waveform is quantized into 16 levels since B is 4. There are some flat portions of the waveform, which is characteristic of quantization: samples with amplitudes near a level are all rounded to that level (shown by the circles).

Figure 19 Example Magneprint

Correlate

The Correlate block shown in Figure 20 finds the correlation between every pair of cards in the input 1D array of waveforms. Each waveform carries attributes giving the card number, swipe number, swipe direction, and average speed. The accept distribution is populated with correlation coefficients from pairs of cards that share the same card number but have different swipe numbers. The reject distribution is populated with correlation coefficients from pairs of cards that have different card numbers. A distribution of average speeds is also populated, using the average speed attribute added to each waveform by the Normalize block.

Figure 20 Correlation block sub-VI

Figure 21 shows the results of running the top-level VI on a database of swipes provided by Dr. Morley. There were 264 swipes in the database, covering 75 different cards with around 3 re-swipes per card, plus some additional backward swipes. The parameters used were Filter = 17, Delta = 89, M (subsamples taken) = 96, and B (number of bits) = 4. The reject distribution is a near-perfect Gaussian centered around 0. The histogram is also fairly tight, meaning the standard deviation is small, with values ranging from about -0.4 to 0.4. The accept distribution also looks excellent, with no correlations below 0.9. The two distributions therefore do not overlap, so there are no false positives (a card deemed authentic when it is not) and no false negatives (a card deemed inauthentic when it is authentic). This further demonstrates the robustness and accuracy of our system.

Figure 21 Accept and reject distributions generated from Dr. Morley's card database

Live Mode

Figure 22 shows the top-level VI for live mode operation. The user swipes cards using the card reader, and the program builds the accept and reject histograms in real time. Pressing the Authentic button puts the next swipe into the accept histogram; un-pressing it puts the swipe into the reject histogram. The Save button saves all the current swipes in memory into the directory given by Save Directory. The DAQ Assistant acquires the waveform at a sampling rate of 500 kHz. The VI is broken into three sections that run sequentially to avoid timing issues. It is not much different from the file mode VI.

Figure 22 Block diagram of live mode VI

Figure 23 shows our results, with the parameters used shown at the top. We used 11 different cards to build the reject histogram and 8 swipes of a Washington University ID to build the accept histogram. Although the reject distribution does not look as symmetric as the one taken from the database, it is still centered about 0 and still fairly tight; with only 11 swipes it cannot look perfectly Gaussian. The accept histogram looks just like the one taken from the database: every swipe pair has a correlation coefficient above 0.9, a nearly perfect match. As explained before, the rates of false positives and false negatives remain minimal.

Figure 23 Accept and reject distributions generated in live mode

Optimize VI

Separation is defined as:

Where
μ_A = mean of the accept distribution
σ_A = standard deviation of the accept distribution

σ_R = standard deviation of the reject distribution

Separation S is used as a rating of our system's performance. The higher the separation, the smaller the rates of false positives and false negatives, because the accept and reject distributions overlap less. This lets us pick an authenticity threshold that does not touch the tails of either distribution. Figure 24 shows the optimizer VI, which sweeps the parameter values and calculates the separation for each parameter combination. The combination of parameters that gives the largest S is chosen and displayed on the LabVIEW front panel for M*B sizes of 384, 768, 1024, and 2048 bits. Delta can take any value that does not overflow the smallest Magneprint in the database, while Filter can take the values 17, 22, 27, 32, and 37. M and B can be any factor pair of 384, 768, 1024, or 2048 for each M*B.

Figure 24 Optimizer VI

This VI is run using swipes from the database. The parameters achieving the highest separation are shown in Table 1 below.
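The sweep can be sketched as a brute-force grid search. Because the report's separation equation did not survive transcription, the sketch assumes a common definition, S = (mu_A - mu_R) / (sigma_A + sigma_R); the report's actual formula may differ. `run_system` is a hypothetical callback standing in for the whole swipe-processing pipeline:

```python
import itertools
import statistics

def separation(accept, reject):
    """Separation between accept and reject correlation distributions.
    This (mu_A - mu_R) / (sigma_A + sigma_R) form is an assumption;
    the report's exact formula did not survive transcription."""
    mu_a, mu_r = statistics.mean(accept), statistics.mean(reject)
    sigma_a, sigma_r = statistics.pstdev(accept), statistics.pstdev(reject)
    return (mu_a - mu_r) / (sigma_a + sigma_r)

def optimize(run_system, filters=(17, 22, 27, 32, 37),
             sizes=(384, 768, 1024, 2048), deltas=range(1, 200)):
    """Sweep filter half-width, delta, and every (M, B) factorization
    of each Magneprint size. run_system(f, delta, m, b) must return
    (accept, reject) lists of correlation coefficients. Returns the
    best-scoring parameters per Magneprint size."""
    best = {}
    for size in sizes:
        factor_pairs = [(m, size // m) for m in range(1, size + 1)
                        if size % m == 0]
        for f, delta, (m, b) in itertools.product(filters, deltas,
                                                  factor_pairs):
            s = separation(*run_system(f, delta, m, b))
            if size not in best or s > best[size][0]:
                best[size] = (s, {"filter": f, "delta": delta,
                                  "M": m, "B": b})
    return best
```

The exhaustive product over filters, deltas, and factor pairs is what makes the real optimizer computationally heavy.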

Table 1 Best M, B, filter half-width, and spacing (delta) parameters

This optimization program is very computationally intensive; it took around two hours to complete on a quad-core PC.

Cost Analysis

Since this project is almost purely software, we discuss the cost associated with running our code. Running our code efficiently in LabVIEW, especially the optimization VI, requires at least a mid-tier quad-core PC, which costs around $500. Further improvements to our code will require programmer man-hours at a cost of at least $15 per hour.

Bill of Materials

Where applicable, we have estimated the cost to program and test our project:

Magtek card reader: $75
LabVIEW 2012: $2700
NI ELVIS II: $3625
PC workstation: $600

Hazards and Failure Analysis

Our software implementation poses no threat to the public or the environment.

Conclusions

We were able to determine the parameters giving the best separation for each M*B size, as detailed in Table 1, with the separation being at least 0.642, which is excellent for reducing the rates of false positives and false negatives. Live swiping of cards confirmed our working system, which identified authentic and fraudulent cards based on a correlation threshold of 0.8. Our system is robust, reliable, and an accurate way to distinguish authentic cards from copied cards. Future areas of exploration include a hardware implementation of the system, optimizing cost versus robustness for commercial use, and, after that, commercial testing. In addition, our implementation could use less computing time and memory by processing the cards on the fly.

References

R. E. Morley, R. S. DeLand, E. C. Limtao, E. J. Richter, and S. R. Wood, "Method and apparatus for authenticating a magnetic fingerprint signal using a filter capable of isolating a remanent noise related signal component," U.S. Patent No. 7,478,751, January 20, 2009.

T. C. McGeary and R. S. DeLand, Jr., "Magnetic stripe card verification system," U.S. Patent No. 6,098,881, August 8, 2000.