Cyberspace Security Use Keystroke Dynamics. Alaa Darabseh, B.S. and M.S. A Doctoral Dissertation In Computer Science


Cyberspace Security Use Keystroke Dynamics

by

Alaa Darabseh, B.S. and M.S.

A Doctoral Dissertation In Computer Science

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Approved:
Dr. Akbar Siami Namin, Committee Chair
Dr. Rattikorn Hewett
Dr. Donald Jones
Dr. Susan Mengel

Dr. Mark Sheridan, Dean of the Graduate School

August, 2015

Copyright 2015, Alaa Darabseh

ACKNOWLEDGMENTS

First, I would like to thank my advisor, Dr. Akbar Siami Namin, for his great support and for the long hours he spent with me. The research presented in this dissertation would not have been possible without his help. I would also like to express my appreciation to my committee members (Drs. Rattikorn Hewett, Donald Jones, and Susan Mengel) for their endless support in making this work possible. I feel privileged and honored to have them as members of my dissertation committee. None of this work could have been done without the support of my father, who always encouraged me to be successful. Without his prayers and love I would never have reached this moment. Finally, I would like to dedicate this achievement to my mother's soul.

TABLE OF CONTENTS

Acknowledgments
Abstract
List of Tables
List of Figures
1. Introduction
   Biometrics: The Focus of This Dissertation
   Motivation
   Research Questions
   Contributions
2. Background
   User Authentication in Computer Security
      Something You Know
      Something You Have
      Something You Are
      Combinations of Authentication Factors
   Biometric Systems
      Performance Measures
   Keystroke Dynamics
      Static and Dynamic Authentication Systems
      Keystroke Features
   Literature Review
      Static Verification
      Continuous Verification
3. Motivational Challenges and Contributions of the Thesis
   Research Motivations
   Research Contributions
4. Experimental Setup
   Data Collection
   Features and their Extractions
5. Keystroke Feature Selection Using Statistical Methods in Hypotheses Testing
   Motivations
   Research Questions
   Data Analysis Methods
      Mann-Whitney Statistics Test
      Effect Sizes
      R's Statistics Package
   Analyses and Results
      Mann-Whitney Test Analyses
      Mann-Whitney Test Results
      Effect-Size Analyses
      Effect-Size Results
      Subset Feature Performance
   Conclusion
6. Keystroke Feature Selection Using Machine Learning-Based Classifications
   Research Questions
   Machine Learning Methods
      Support Vector Machine
      Linear Discriminant Classifier
      K-Nearest Neighbors
      Naïve Bayes
      One-Class SVM
      R Packages
   Machine Learning Analyses and Results
      Feature Items Performance
      Features Performance
   Conclusion
7. Keystroke Feature Selection Using Wrapper-Based Feature Selections
   Filter Approach
   Wrapper Approach
   Wrapper-Based Feature Subset Selection Techniques
      Greedy Algorithms
      Best-First Search Algorithm
      Genetic Algorithm
      Particle Swarm Optimization
   Weka Analyses Tools
   Results
8. Continuous User Authentication Based on Sequential Change-Point Analysis
   Motivations
   Research Questions
   Intrusion Detection
   Change-Point Detection
      Batch Change-Point Detection
      Sequential Change-Point Detection
   CPM R Package
   Analyses
      Data Set
      Analyses Methods
      Analyses Results
   Conclusion
9. Conclusion and Future Work
Bibliography

ABSTRACT

Most current computer systems authenticate a user's identity only at the point of entry to the system (i.e., at login). An effective authentication system, however, monitors the identity of the user continuously or frequently to ensure that the identity remains valid throughout a session. Such a system is called a continuous authentication system. An authentication system with this security scheme protects against certain attacks, such as session hijacking, that can be performed by a malicious user. Recently, keystroke analysis has gained popularity as one of the main behavioral biometric techniques that can be used to continuously authenticate users. There are several advantages to applying keystroke analysis: First, keystroke dynamics are practical, since every user of a computer types on a keyboard. Second, keystroke analysis is inexpensive because it does not require any additional components (such as special video cameras) to sample the corresponding biometric feature. Third, and most importantly, typing rhythms remain available even after the authentication stage has been passed. A major challenge in keystroke analysis is the identification of the major factors that influence the accuracy of the keystroke authentication detector. Two of the most influential factors are the classifier employed and the choice of features. Currently, there is insufficient research that addresses the impact of these factors in continuous authentication analysis; the majority of existing studies in keystroke analysis focus primarily on their impact in static authentication analysis. Understanding the impact of these factors will contribute to improving the performance of continuous authentication systems based on keystroke dynamics. Furthermore, most existing keystroke analysis schemes require predefined typing models for either legitimate users or impostors. However, it is difficult or even impossible in some situations to obtain typing data of the users (legitimate or impostors) in advance.

For instance, consider a personal computer that a user carries to a college or to a cafe. In this case, only the computer owner (the legitimate user) is known in advance. As another instance, consider a computer that has a guest account in a public library; in this case, none of the system users are known in advance. Thus, a new automated and flexible technique is needed that can authenticate the user without requiring any prior user typing model. This dissertation focuses on improving continuous user authentication systems (based on keystroke dynamics) designed to detect malicious activity caused by another person (an impostor) whose goal is to take over the active session of a valid user. The research is carried out by 1) studying the impact of the selected features on the performance of keystroke continuous authentication systems; 2) proposing new timing features, based on the most frequently used English words (e.g., "the", "and", "for"), that can be useful in distinguishing between users in continuous authentication systems; 3) comparing the performance of keystroke continuous authentication systems under different algorithms; 4) investigating the possibility of improving the accuracy of continuous user authentication systems by combining more than one feature; and 5) proposing a new detector that does not require predefined typing models from either legitimate users or impostors.

LIST OF TABLES

4.1: Number of occurrences of the feature items in the English passages
Average of effect sizes for all feature items using 28 users' data, sorted ascendingly
Item performance using SVM
Item performance using OC-SVM
Item performance using KNN
Item performance using NB
Performance comparison of four different keystroke features on SVM, LDC, NB, and KNN
Comparison of features selected
Performance of detection system using different ARL of letter E
Letter performance using sequential change-point detection when employing Lepage statistical tests, with the change occurring after 50 observations
Letter performance using sequential change-point detection when employing Lepage statistical tests, with the change occurring after 100 observations
Letter performance using sequential change-point detection when employing different statistical test techniques

LIST OF FIGURES

2.1: Traditional biometrics system
Keystroke dynamics enrollment and authentication schemes
Security cycles of static and continuous keystroke authentication systems
Keystroke timing information
Main window of the application used to collect the data from users
An example of four keystroke features extracted for the word "the"
A comparison between the performances of four different keystroke features: (a) results obtained using the Mann-Whitney statistical test; (b) results obtained using the Mann-Whitney statistical test and a voting schema
A comparison between the performance of four different keystroke feature items using 28 participants' data with the Mann-Whitney statistical test
The performance of different combinations of keystroke features using 8 participants' data
FAR values for the 5, 10, and 15 most discriminative items based upon the two approaches applied (p-values and effect sizes) using 28 users' data
FAR values for the 5, 10, and 15 most frequent items of each feature using 28 users' data
Performance comparison of four keystroke features used independently against the ten different combinations performed on SVM
Performance comparison of four keystroke features used independently against the ten different combinations performed on KNN
Three main categories of feature selection
Feature subset selection using the wrapper approach
Forward and backward selection wrapper approach algorithm
Best-first search wrapper approach algorithm
Genetic search wrapper approach algorithm
Particle swarm optimization wrapper approach algorithm
An illustration of change-point detection
8.2: An illustration of sequential change-point detection; the detection delay is measured in the amount of data a test needs to signal a change point in the sequence
Performance of detection system using different ARL of letter E

CHAPTER 1
INTRODUCTION

In modern society, the combination of a username and password is the most widely used authentication scheme for controlling access to sensitive and important resources, particularly in computer security and related fields. In this scheme, the user claims an identity by providing her username and then proves ownership of the claimed identity by providing a password. However, such traditional protection mechanisms that rely on passwords are unsatisfactory and vulnerable to theft. Moreover, they verify the user's identity only at the point of entry to the system (i.e., at login). An effective authentication system, by contrast, monitors the identity of the user continuously or frequently to ensure that the identity remains valid throughout a session. Such a system is called a continuous authentication system. An authentication system with this security scheme protects against certain attacks, such as session hijacking, that can be performed by a malicious user. There are numerous conceivable applications and scenarios that require a continuous user authentication approach. For instance, consider a student who takes online quizzes or tests. This is an important application given that the number of students taking online classes is increasing and teachers are becoming more concerned about true assessment and academic integrity. The threat in this case includes substitution of the valid student who has already been authenticated at the start of the exam. As another example, consider an employee who works for an organization. In this case, threats include an insider intruder who can take over an active session. Recently, keystroke analysis has gained popularity as one of the main behavioral biometric techniques that can be used to continuously authenticate users. There are several advantages to applying keystroke analysis: First, keystroke dynamics are practical, since every user of a computer types on a keyboard. Second, keystroke analysis is inexpensive because it does not require any additional components (such as special video cameras) to sample the corresponding biometric feature.

Third, and most importantly, typing rhythms remain available even after the authentication stage has been passed. The intention that forms the basis of this dissertation is to improve continuous keystroke biometric user authentication systems designed to detect malicious activity caused by another person (an impostor) who intends to take over the active session of a valid user. Section 1.1 provides an overview of biometrics. The motivation of this dissertation is provided in Section 1.2. The main questions that will be addressed in this dissertation are provided in Section 1.3. Dissertation contributions are provided in Section 1.4.

1.1 Biometrics: The Focus of This Dissertation

Authentication and identification of individuals are among the most prominent problems that arise in daily operations. Since the beginning of human interactions, humans have used facial features and voices to recognize others. In this context, biometrics refers to the human characteristics and traits that make each individual unique. For a long time, humans have used fingerprints as a biometric feature to authenticate the identity of others. Handwriting, signature, voice, and hand geometry have also been used since the 1980s. During the 1990s, commercial software capable of facial recognition and iris scanning entered the market. In a modern society that heavily depends on computers, biometrics plays a critical role in nearly every aspect of our lives. Government agencies, transportation, healthcare, financial institutions, education, and security are increasingly reliant on biometrics to ensure the quality and, more importantly, the integrity and security of daily activities. Biometrics can be used to prevent unauthorized access to ATMs, smart cards, cellular phones, desktop PCs, and computer networks. Unlike traditional authentication mechanisms, such as keys or passwords, biometrics cannot be lost, stolen, or forgotten. Biometrics is inherently secure by nature and cannot be socially engineered or shared with others, unless it is forged.

In terms of computer security, biometrics refers to authentication techniques that rely on the human characteristics that make each individual unique, and often involves personal characteristics that can be used to uniquely verify a person's identity. Biometrics-based techniques are mainly classified into physiological biometric features, such as fingerprint, face, or iris, and behavioral biometric features, such as gait, handwritten signatures, and keystroke dynamics. Biometric techniques based on physiological features are considered more successful than those based on behavioral characteristics (Ashbourn 2014). This relative success is probably due to the fact that physiological features are more stable and generally do not change over time, whereas behavioral features can be affected by a temporary state, such as stress or illness. However, biometric techniques based on physiological features usually require an additional special sampling tool (such as a video camera). Therefore, in the case of access to computers, the need for an additional special tool leads to increased cost. In contrast, biometric techniques based on behavioral features, such as keystroke dynamics, can be a natural choice to increase computer access security when used in conjunction with traditional authentication methods. The aim of this dissertation is to advance active user authentication using keystroke dynamics. Throughout this dissertation, we assess the performance and influence of various keystroke features on keystroke dynamics authentication systems. Furthermore, this dissertation proposes a new detector that does not require predefined typing models from either legitimate users or impostors.

1.2 Motivation

Keystroke dynamics are defined as a behavioral biometric characteristic that involves analyzing a computer user's habitual typing pattern when interacting with a computer keyboard (Ashbourn 2014). In other words, keystroke biometric systems assume that the typing characteristics of an individual are unique and difficult to replicate.

A biometric keystroke authentication system consists of two phases: the enrollment phase, which includes capturing typing data, filtering, feature extraction, and pattern learning; and the verification phase, which includes capturing typing data, filtering, feature extraction, and performing the comparison with the biometric pattern. The amount of typing data required to build a user pattern in the enrollment phase, and the amount of typing data that must be collected before the comparison occurs in the verification phase, are important issues in a keystroke authentication system. An efficient keystroke authentication system should be able to build the user pattern in minimal time and to achieve the quickest possible detection while maintaining good accuracy. However, maintaining high detection accuracy and a short detection time is difficult; to some extent these are conflicting requirements that must be balanced for optimal detection. To tackle this problem, we can reduce the amount of data that needs to be collected without losing the accuracy of the prediction model. This can be achieved by reducing the number of features that need to be learned by the classifier. In fact, proper feature selection plays a key role in enhancing the accuracy of keystroke authentication detectors (Killourhy and Maxion 2010). Throughout this dissertation, we assess the performance and influence of various keystroke features in keystroke dynamics authentication systems. In particular, this dissertation investigates the performance of keystroke features on various subsets of the 20 most frequently used English alphabet letters, the 20 most frequently appearing pairs of English alphabet letters, and the 20 most frequently appearing English words. The performance of four features, namely key duration, flight time latency, digraph time latency, and word total time duration, is analyzed. Experiments are conducted to measure the performance of each feature individually and of different subset combinations of these features. This dissertation focuses on improving continuous user authentication systems (based on keystroke dynamics) designed to detect malicious activity caused by another person (an impostor) whose goal is to take over the active session of a valid user.

The research is carried out by 1) studying the impact of the selected features on the performance of keystroke continuous authentication systems; 2) proposing new timing features, based on the most frequently used English words (e.g., "the", "and", "for"), that can be useful in distinguishing between users in continuous authentication systems; 3) comparing the performance of keystroke continuous authentication systems under different algorithms; 4) investigating the possibility of improving the accuracy of continuous user authentication systems by combining more than one feature; and 5) proposing a new detector that does not require predefined typing models from either legitimate users or impostors.

1.3 Research Questions

The main research questions addressed in this dissertation are as follows:

1. Which keystroke timing feature(s) among key duration, flight time latency, digraph time latency, and word total time duration performs better in keystroke dynamics? This question is discussed in Chapters 5 and 6.

2. Does any combination of timing features improve the accuracy of an authentication scheme? This question is discussed in Chapters 5 and 6.

3. Which feature items contribute more to the accuracy of the model, where feature items are defined as the exact instances of letters in each feature, e.g., "a", "th", etc.? This question is discussed in Chapters 5 and 6.

4. How does the word total time duration feature perform in comparison with key duration, flight time latency, and digraph time latency? This question is discussed in Chapters 5 and 6.

5. Which authentication algorithm(s) among SVM, LDC, NB, and KNN performs better in keystroke dynamics? This question is discussed in Chapter 6.

6. Can we detect an impostor who takes over from the legitimate user during a computer session when no predefined typing models are available in advance for either legitimate users or impostors? This question is discussed in Chapter 8.

7. Is it possible to detect an impostor in the early stages of an attack? If so, what amount of typing data does a system need in order to successfully detect the impostor? This question is discussed in Chapter 8.

1.4 Contributions

The aim of this dissertation is to advance active user authentication technology that utilizes keystroke dynamics. This dissertation focuses on various keystroke features as they affect the performance of keystroke dynamics authentication systems. The present study also considers the possibility of improving performance by combining four keystroke features: key duration, flight time latency, digraph time latency, and word total time duration. The major contribution of this dissertation is the utilization of the most frequently used English words in determining the identity of users typing on the keyboard. A comparison is conducted between a less common timing feature for keywords, namely the total time to type a whole word, and other more common features, such as the digraph and flight times of letters. Another important contribution of this dissertation is a novel approach based on sequential change-point methods for the early detection of an impostor in computer authentication. There are two main advantages of sequential change-point methods. First, they can be implemented online and hence enable the building of continuous user authentication systems without the need for any user model in advance. Second, they minimize the average delay of attack detection while maintaining an acceptable detection accuracy rate.

The key contributions of this dissertation are as follows:

- Introduction of new features that can distinguish between users in continuous authentication systems based on keystroke dynamics. The new features are based on the utilization of the most frequently used English words in deciding the identity of users typing on the keyboard.
- Extraction of various types of keystroke features and investigation of which one is the most efficient in keystroke dynamics.
- Adaptation of a new anomaly detection algorithm for analyzing typist data. The new algorithm is based on sequential change-point analysis for the early detection of an impostor in computer authentication.
- Performance comparison of several anomaly detection techniques in terms of their ability to distinguish between users in continuous authentication systems.
- Application of machine learning-based feature selection techniques useful in selecting the most influential features in continuous authentication systems based on keystroke dynamics, which maintain sufficient classification accuracy of the system and decrease the training and testing time required.

CHAPTER 2
BACKGROUND

This chapter provides an overview of authentication concepts. It also provides an overview of different kinds of biometric authentication methods, focusing on keystroke authentication methods. This chapter is organized as follows. Section 2.1 provides an overview of authentication methods. Section 2.2 discusses biometric authentication systems. Section 2.3 discusses keystroke dynamics systems in detail. Section 2.4 provides a brief summary of extant studies of keystroke analysis.

2.1 User Authentication in Computer Security

Authentication can be defined as the process that verifies whether someone is, in fact, who he or she claims to be. In other words, it provides proof of ownership of the claimed identity. For instance, passwords are usually required to gain access to computers, and PIN codes are required to withdraw money from ATMs. There are several methods that provide the ability to authenticate a user. Traditionally, these methods are categorized into three main types:

- Something you know and do not share, e.g., passwords, PINs, or pass-phrases.
- Something you have and keep safe, e.g., smart cards.
- Something you are and cannot share, e.g., biometric features such as fingerprints.

In the following subsections we provide a brief description of each type.

2.1.1 Something You Know

In this type, the user who needs to gain access to the system must simply provide knowledge of a secret. For example, PIN codes are required to withdraw money from ATMs, and passwords are usually required to gain access to computers.

The main advantages of this method are that it is a fast authentication mechanism with lower cost compared to other methods, and that it is easy to implement in the real world. However, this method has some limitations: namely, secrets are easy to forget, or they can be stolen when the user writes them down.

2.1.2 Something You Have

In this type, the user who wants access to the system is required to provide a unique piece of hardware (a key, a smart card, a SIM card) that can be matched to a user identity. The main advantage of this method is that the user does not need to remember any secrets to gain access to the system, as in the previous class. However, this method is more expensive, since it requires special pieces of hardware for the equipment as well as for the user. Moreover, this method requires taking action when the hardware is lost or stolen.

2.1.3 Something You Are

An authentication process of this type is based on the utilization of biometric features such as fingerprints or an iris pattern. It is based on the assumption that biometric features are unique from one person to another. For instance, fingerprints are unique even in the case of identical twins. The main advantage of this method is that biometric features cannot be lost, stolen, or forgotten. Biometrics is inherently secure by nature and cannot be socially engineered or shared with others, unless it is forged.

2.1.4 Combinations of Authentication Factors

In this type, to access a digital system, the user must provide at least two authentication factors from the above three types. One example of such a combination is the withdrawal of money from an ATM, where two things must be provided: the bank card and the PIN code. In this case, the user provides two factors: something he knows (a PIN code) and something he has (a bank card) to access his bank account. It is apparent that the purpose of this sort of combination is to increase system security.

2.2 Biometric Systems

According to Ashbourn (2014), a biometric system is "the automated verification of human identity through repeatable measurement of physiological and/or behavioral characteristics." The operation of biometric authentication systems consists of two phases (Ashbourn 2014):

I. The enrollment phase: this phase includes data capture, data filtering, feature extraction, and pattern learning.
II. The verification phase: this phase includes data capture, feature extraction, and the comparison of the new sample with the biometric pattern.

The main scheme of a biometric authentication system, as outlined by Ratha et al. (2001), is depicted in Figure 2.1.

Figure 2.1: Traditional biometrics system.

A biometric authentication system is based on the utilization of human characteristics (features) that make each person unique. However, in order for those features to be used in building practical and effective biometric systems, they need to satisfy, to some degree, the following essential properties:

1. Universality: the characteristic (feature) should exist for each person.
2. Distinctiveness: no two people share the same characteristic.
3. Permanence: the characteristic should be constant over time.
4. Collectability: data on the characteristic must be quantitatively measurable.
5. Performance: the characteristic needs to provide good accuracy.
6. Acceptability: the characteristic must be accepted by most people.
7. Circumvention: the characteristic cannot be easily imitated or replicated.

Generally, biometric characteristics should meet all of these properties in order for the biometric system to be feasible and effective. However, it is clear that no characteristic can possibly satisfy all of these properties to the highest degree. The degree to which a candidate characteristic satisfies these properties usually determines the type or level of biometric security application it can be used for.

2.2.1 Performance Measures

In the previous section, we saw that the operation of a biometric authentication system consists of two phases, the enrollment phase and the verification phase. In the enrollment phase, the system tries to learn user characteristics and build a pattern. In the verification phase, the system compares the new sample with the stored pattern and generates a similarity or dissimilarity score. The generated score is then matched against some predetermined threshold. The threshold represents the dividing line between the legitimate user and the impostor. All scores that fall below the threshold are considered to characterize the legitimate user, and all scores that fall above the threshold are considered to characterize an impostor. A biometric authentication system requires the person being identified to make a claim to an identity. Given that the claimant may or may not be a genuine user, there are four possible outcomes, two of which describe false claims:

1. An impostor tries to authenticate and is denied (true negative).
2. A legitimate user tries to authenticate and is accepted (true positive).
3. An impostor tries to authenticate and is accepted (false positive).
4. A legitimate user tries to authenticate and is denied (false negative).

Typically, the performance of biometric systems is measured in terms of various error rates. The most commonly used error rates are the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). FAR refers to the percentage of impostors who are mistakenly accepted by the system. In statistics, this error is referred to as a Type II error, defined as:

FAR = (Number of Accepted Impostor Attempts) / (Total Number of Impostor Match Attempts)

FRR refers to the percentage of authorized users whose identities are mistakenly rejected by the system. In statistics, this error is referred to as a Type I error, defined as:

FRR = (Number of Rejected Legitimate User Attempts) / (Total Number of Legitimate Match Attempts)

In an ideal case, both of these error rates would equal 0%. However, there is a trade-off between these two metrics; i.e., decreasing one might not be possible without increasing the other. The choice of threshold is a critical issue in biometric authentication systems, and it may depend heavily on the security level of the application. For instance, in domains where high security is required, FAR must be as low as possible so as to detect as many impostors as possible. Another important error metric that is often used to compare different biometric systems is the Equal Error Rate (EER). The EER can be defined as the point where the FAR value equals the FRR value. In the next section, we describe keystroke dynamics authentication in detail as one of the behavioral biometric types.
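To make the relationship between the threshold and these error rates concrete, the following minimal sketch (written in Python purely for illustration; it is not part of the dissertation's tooling) computes FAR, FRR, and an approximate EER from dissimilarity scores, assuming that scores at or below the threshold are accepted as the legitimate user:

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR and FRR for one decision threshold on dissimilarity scores.

    Scores at or below the threshold are accepted as the legitimate user;
    scores above it are rejected.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = np.mean(impostor <= threshold)   # impostors mistakenly accepted (Type II)
    frr = np.mean(genuine > threshold)     # legitimate users mistakenly rejected (Type I)
    return far, frr

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    candidates = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    best = min(candidates,
               key=lambda t: abs(np.subtract(*far_frr(genuine_scores, impostor_scores, t))))
    return float(np.mean(far_frr(genuine_scores, impostor_scores, best)))
```

Lowering the threshold drives FAR down and FRR up (and vice versa), which is exactly the trade-off described above.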

2.3 Keystroke Dynamics

Keystroke dynamics are defined as a behavioral biometric characteristic that involves analyzing a computer user's habitual typing pattern when interacting with a computer keyboard (Ashbourn 2014). In other words, keystroke biometric systems assume that the typing characteristics of an individual are unique and difficult to replicate. There are several advantages to keystroke analysis: First, keystroke dynamics are practical, since every computer user types on a keyboard. Second, it is an inexpensive method because it does not require any additional components. Third, and most important, typing rhythms remain available even after the authentication phase has passed. The concept behind keystroke dynamics is inspired by much older work that differentiated telegraph operators by their tapping rhythm over telegraph lines. This capability was improved and used during World War II by the US military to distinguish allies from enemies. In the mid-1970s, Spillane (1975), in an IBM technical report, was the first to suggest using a computer keyboard's typing rhythms to identify users. In the early 1980s, Gaines et al. (1980) conducted a feasibility study on the timing of keystroke patterns as an authentication method. Since that time, keystroke dynamics has been an active research area. Keystroke dynamics research is also known by other names, such as keystroke analysis, typing rhythms, and typing biometrics. Several studies (Joyce and Gupta 1990, Bleha et al. 1990, Obaidat and Sadoun 1997, Sang et al. 2005) conclude that users tend to have consistent behavior patterns when they type on a keyboard. Thus, keystroke dynamics can be used for authentication. As with other biometric authentication systems, a keystroke authentication system consists of two phases: the enrollment phase, which includes capturing typing data, filtering, feature extraction, and pattern learning; and the verification phase, which includes capturing typing data, filtering, feature extraction, and performing the comparison with the biometric pattern. The main scheme of the keystroke dynamics system, as outlined by Romain et al. (2011), is pictured in Figure 2.2.

A user types on the keyboard, and the timing features of typing are extracted and compared with the user's stored typing pattern by the matcher. The matcher decides whether the current user is a legitimate user.

Figure 2.2: Keystroke dynamics enrollment and authentication schemes.

2.3.1 Static and Dynamic Authentication Systems

Keystroke analysis techniques can be primarily classified into two main categories: static and dynamic (or continuous) analysis. Static analysis means that the analysis is executed at certain points in the system (i.e., at log-in time). This type of analysis ordinarily involves short typing samples such as those seen at log-in time; for example, user IDs, passwords, names, and/or pass-phrases. This method is often used to add additional security to the system and to address some limitations inherent in traditional authentication techniques. With such a security scheme, during an authentication process (at log-in time), the verification system attempts to verify two issues: first, are the user's credentials correct? Second, is the manner of typing the password similar to the user's profile? Therefore, if an attacker is able to steal the user's credentials, he/she will be rejected by the verification system because he/she will not type in the same pattern as the legitimate user.
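As a concrete illustration of the second check (is the typing manner similar to the profile?), the sketch below builds a per-feature mean and standard deviation profile from enrollment samples and accepts a new attempt if enough of its timings fall inside a tolerance band. The class name, the 1.5-standard-deviation band, and the 60% acceptance ratio are illustrative assumptions that echo the classic schemes reviewed in Section 2.4, not the method developed in this dissertation:

```python
import numpy as np

class MeanStdVerifier:
    """Toy static keystroke verifier based on a mean/std timing profile."""

    def __init__(self, tolerance=1.5, min_valid_ratio=0.6):
        self.tolerance = tolerance              # band width, in standard deviations
        self.min_valid_ratio = min_valid_ratio  # fraction of timings that must fall in the band

    def enroll(self, samples):
        """samples: 2-D array, one row per enrollment sample, one column per timing feature."""
        samples = np.asarray(samples, dtype=float)
        self.mean_ = samples.mean(axis=0)
        self.std_ = samples.std(axis=0) + 1e-9  # avoid division by zero

    def verify(self, sample):
        """Accept the attempt if enough timings lie within the tolerance band."""
        z = np.abs((np.asarray(sample, dtype=float) - self.mean_) / self.std_)
        return float(np.mean(z <= self.tolerance)) >= self.min_valid_ratio
```

At log-in time such a check would run only after the credentials themselves have been verified, so the typing rhythm acts as an additional factor on top of the password.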

With a static security scheme, the authentication process is performed only at the point of entry to the system (i.e., log-in). An effective authentication system, however, continuously verifies the identity of the user by gathering typing data throughout the user's session to ensure that the identity remains valid. An authentication system with such a security scheme can protect against certain attacks, such as session hijacking performed later by a malicious user. In contrast to static analysis, dynamic analysis includes continuous or frequent monitoring of one's keystroke behavior. It is first checked during the log-in session and continues after the initial authentication. In this case, larger typing samples are usually necessary in order to build an individual model. Typing samples can be collected directly, by requiring individuals to type some predefined long text several times, or indirectly, by monitoring their keystroke activities (e.g., while they are writing e-mails or using a word processor). Figure 2.3 shows the security life cycles of static and continuous keystroke authentication systems. It is obvious that continuous keystroke authentication systems provide greater security.

Figure 2.3: Security cycles of static and continuous keystroke authentication systems.

Additionally, static authentication analysis can be utilized only in systems where there is no need for additional text entry (e.g., checking bank accounts online). By contrast, there are numerous conceivable applications of the keystroke biometric for dynamic authentication analysis.

One such application is the verification of the identity of students taking online quizzes and tests. This is an important application in light of the fact that the number of students taking online classes is increasing, leading to growing concern among education professionals about true assessment and academic integrity. Another application is the use of dynamic biometrics to prevent insider attacks by installing key loggers on each employee's computer to monitor his/her keystroke activities.

2.3.2 Keystroke Features

When a person types on a keyboard, two main events occur: 1) the key-down event, when a person presses a key, and 2) the key-up event, when a person releases a key. Timestamps of each event are usually recorded to keep track of when a key is pressed or released. A variety of timing features can then be extracted from this timing information. The two most commonly used features are 1) the duration of the key, which is the time the key is held down, and 2) the keystroke latency, which is the time between two successive keystrokes. Latency can be calculated by several different methods. The most commonly used are:

- Press-to-press (PP) latency, the time interval between consecutive key presses; PP is also called digraph time.
- Release-to-press (RP) latency, the time interval between releasing a key and pressing the next one; RP is also called flight time.
- Release-to-release (RR) latency, the time interval between the releases of two consecutive keys.

It is also possible to capture other keystroke dynamics information, such as the time it takes to write a word, two letters (a digraph), or three letters (a tri-graph). Figure 2.4 presents the most popular features that can be extracted from keystroke timing information.
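These timing features can be read directly off the recorded press and release timestamps. The sketch below assumes keystrokes arrive as (key, press_ms, release_ms) tuples, which is a hypothetical format for illustration rather than the exact one produced by the data collection tool described in Chapter 4:

```python
def timing_features(events):
    """Extract hold durations and PP/RP/RR latencies from keystroke events.

    events: list of (key, press_ms, release_ms) tuples in typing order.
    """
    durations = [(key, release - press) for key, press, release in events]
    pp, rp, rr = [], [], []
    for (k1, p1, r1), (k2, p2, r2) in zip(events, events[1:]):
        pair = k1 + k2
        pp.append((pair, p2 - p1))   # press-to-press: digraph time
        rp.append((pair, p2 - r1))   # release-to-press: flight time (can be negative if keys overlap)
        rr.append((pair, r2 - r1))   # release-to-release
    return durations, pp, rp, rr
```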

Other features, such as the difficulty of the typed phrase, keystroke pressure, and the frequency of word errors, can also be used in keystroke analysis. However, not all features are practical, since some of them require extra tools, as in the case of keystroke pressure. Therefore, this dissertation focuses on the keystroke timing information of the user when typing on the keyboard.

Figure 2.4: Keystroke timing information.

2.4 Literature Review

Many reports can be found on keystroke analysis based on static authentication; fewer works exist on keystroke analysis with continuous authentication. In this section, we present a brief summary of extant studies on both types of keystroke analysis.

2.4.1 Static Verification

Joyce and Gupta (1990) performed a study on 33 users, of whom six represented legitimate users and 27 represented impostors. All users were asked to type their log-in name, password, first name, and last name eight times in one session. The mean reference signature was computed by calculating the mean of the eight values. Digraph latencies that fall within 1.5 standard deviations of the reference signature's mean are considered to belong to a valid user. Using absolute distance, the researchers achieved a FAR of 0.25% and a FRR of 16.67%.

Bleha et al. (1990) conducted a study similar to that of Joyce and Gupta (1990), using digraphs to distinguish between legitimate users and impostors. They performed experiments on 39 users, of whom 14 represented legitimate users and 25 represented impostors. Using a minimum distance classifier, Bleha et al. achieved a FAR of 8.1% and a FRR of 2.8%. Brown and Rogers (1993) used the search strategies introduced in Artificial Neural Networks (ANNs) as a pattern classifier for keystroke dynamics data. They were the first researchers to use keystroke duration as a feature to differentiate between legitimate users and impostors. They performed experiments on 61 users, of whom 46 represented legitimate users and 15 represented impostors. Each user provided 41 and 30 samples of eight characters in length. This study reported a FAR of 0.0 percent and achieved a FRR of percent. Obaidat and Sadoun (1997) used keystroke duration and latency together as features to differentiate between legitimate users and impostors. Their experiment involved 15 participants who were asked to type their names 225 times each day over a period of eight weeks. Using a neural network as a classifier, Obaidat and Sadoun were able to achieve a FAR of 0.0 percent and a FRR of percent. Cho et al. (2000) collected the required data online for their experiments. In the study, 21 participants represented legitimate users and 15 represented impostors, providing 275 and 75 samples of 8 characters in length, respectively. Cho et al. used the multi-layer perceptron (MLP) ANN for classification, reporting a FAR of 0.0 and a FRR of. Sang et al. (2005) performed an experiment similar to that of Obaidat and Sadoun, using duration and latency together as features to differentiate between legitimate users and impostors. A support vector machine (SVM) was used to classify ten user profiles. This study achieved very accurate results of 0.02% FAR and 0.1% FRR. Araújo et al. (2005) conducted seven experiments on 30 users.

They attempted to improve regular log-in/password authentication performance by combining more than one feature. Four features (key code, two keystroke latencies, and key duration) were analyzed. A statistical distance classifier was used for classification. The best results were achieved when all features were used, yielding a FAR of 1.89% and a FRR of 1.45%. Revett et al. (2007) performed an experiment on 50 participants using a probabilistic neural network (PNN). In this study, 20 participants represented legitimate users and 30 represented impostors, who provided 13 and 30 samples of text ranging from 6 to 15 characters in length. Keystroke event times were recorded in milliseconds with an accuracy of ±1 ms. They reported an average FAR/FRR of percent.

2.4.2 Continuous Verification

As previously mentioned, fewer studies are to be found on the subject of continuous verification. Some papers tested the possibility of continuously verifying the user during a complete typing session. Gaines et al. (1980) conducted a feasibility study on the use of timing patterns of keystrokes as an authentication method. The experiment involved six professional typists who were required to type three passages consisting of 300 to 400 words, twice each, over a period of four months. A statistical t-test was applied under the hypothesis that the means of the digraph times were the same in both settings, and with the assumption that the two variances were equivalent. It was demonstrated that the number of digraph values that passed the test was typically between 80 and 95 percent. The five most frequent digraphs that appeared as distinguishing features were "in", "io", "no", "on", and "ul". Umphress and Williams (1985) conducted an experiment in which 17 participants were asked to provide two typing samples. The first sample, which was used for training, included about 1,400 characters, and the second, which was used for testing, included about 300 characters.

Digraph latencies that fall within 0.5 standard deviations of their mean are considered to belong to a valid user. They achieved a FAR of 6% and a FRR of 12%. John and Williams (1988) performed experiments on 36 participants who were asked to type the same text of 537 characters twice, in two separate sessions over a month. The first sample was used for training and building an authentication model of the users, and the second sample was used for testing. A test digraph latency was counted as valid if it fell within 0.5 standard deviations of the mean reference digraph latency, and a sample was accepted if the ratio of valid digraph latencies to total latencies was more than 60 percent. A False Acceptance Rate (FAR) of 5.5% and a False Rejection Rate (FRR) of 5% were achieved. Monrose and Rubin (2000) performed a study on both static and dynamic keystroke analyses. Overall, 31 users were asked to type a few sentences from a list of available phrases and/or enter a few free sentences. Three different methods were used to measure similarities and differences between typing samples: normalized Euclidean distance, weighted maximum probability, and non-weighted maximum probability measures. About 90% correct classification was achieved when fixed text was used, which was later enhanced to 92.14% with expansion of the analyses to 63 users (Monrose and Rubin 1997). However, it was reported that when different texts were used, accuracy collapsed to 23% correct classification in the best case. Dowland et al. (2001) monitored the normal activities of four users for some weeks on computers running Windows NT, which means there were no constraints on the users. Different statistical techniques were applied. Only digraphs that the users typed less frequently were used to build the users' profiles. A new sample was compared with the users' profiles in two steps: first, each digraph in the new sample was compared to the mean and standard deviation of its corresponding digraph in the users' profiles and marked accepted if it fell within some defined interval.

Second, the new sample was assigned to the user whose profile provided the largest number of accepted digraphs. A 50% correct classification rate was achieved. Bergadano et al. (2002) used typing errors and the intrinsic variability of typing as features to differentiate between legitimate users and impostors. Their experiment involved 154 participants, of whom 44, as legitimate users, were asked to type a fixed text 683 characters long five times over a period of one month, while 110 users were asked to provide only one sample each, to be used as impostor samples. The degree of disorder within tri-graph latencies was used as a dissimilarity metric, and a statistical method was used for classification to compute the average difference between the units in the array. This approach was able to achieve a FAR of 0.0% and a FRR of 2.3%. Curtin et al. (2006) conducted three identification experiments in which subjects were asked to type three texts 10 times. The first two texts were 600 characters in length and the third was 300 characters in length. The first text was used for training. The nearest neighbor classification technique employing Euclidean distance was used in these experiments. A 100% identification accuracy was achieved for eight users typing the same text. However, this accuracy diminished with 30 subjects typing different texts and with gradually decreasing text length. It was concluded that the best performance could be achieved under these conditions: sufficient training and testing text length, a sufficient number of enrollment samples, and the same keyboard type used for enrollment and testing. Gunetti and Picardi (2005) conducted an experiment on 205 participants. Their work focuses on long free-text passages. They used 40 participants to represent legitimate users, who provided 15 samples each, and 165 participants to represent impostors, who provided only one sample each. They developed a method for comparing two typing samples based on the distance between typing times. They reported a FAR below 5% and a FRR below 0.005%.

Hu et al. (2008) used 19 participants to represent legitimate users, who provided 5 typing samples each, and 17 participants to represent impostors, who provided 27 samples. Typing environment conditions were not controlled in the experiment. The typing samples were used to build users' profiles by creating averaged vectors from all training samples. A K-nearest neighbor technique was employed to cluster the users' profiles based on the distance measure. A correct classification rate of 66.7% was achieved.

CHAPTER 3
MOTIVATIONAL CHALLENGES AND CONTRIBUTIONS OF THE THESIS

A major challenge in keystroke analysis is the identification of the major factors that influence the accuracy of the keystroke authentication detector. Two of the most influential factors are the classifier employed and the choice of features. Currently, there is insufficient research that addresses the impact of these factors in continuous authentication analysis; the majority of existing studies in keystroke dynamics focus primarily on their impact in static authentication analysis. Understanding the impact of these factors will contribute to improving the performance of continuous authentication systems. While Chapter 1 provides an overview of the motivations and contributions of this dissertation, this chapter provides a more detailed account. The motivation of this dissertation is discussed in Section 3.1, and the dissertation contributions are presented in Section 3.2.

3.1 Research Motivations

The extant literature on keystroke dynamics demonstrates conflicting results regarding which timing feature is the most effective for distinguishing between users in the keystroke dynamics domain. Some experiments show that hold times are much more important than latency times (Pin et al. 2012, Robinson et al. 1998). It has also been observed that using tri-graph times offers better classification results than using digraphs or higher-order n-graphs (Killourhy and Maxion 2010). Revett et al. (2007) also reported that digraph and tri-graph times were more effective compared to hold time and flight time. Accordingly, recent studies combine more than one of these features (Pin et al. 2012, Araújo et al. 2005). Research has found that the use of all three types of features (i.e., hold times, digraph times, and flight times) produces better results (Araújo et al. 2005).

In contrast, it has also been reported that, when hold times and either digraph times or flight times are included, the particular combination has a trivial effect (Pin et al. 2012). Another observation taken from the literature on keystroke dynamics is that existing schemes reuse features (i.e., hold times, digraph times, and flight times) from static authentication keyboard-based systems to represent user typing behavior in continuous authentication keyboard-based systems. These features can be used in static authentication keyboard-based systems for successful user authentication. However, they do not guarantee strong statistical significance in continuous authentication keyboard-based systems. Hence, more informative timing features must be added to the extant literature to guarantee successful discrimination between users in continuous authentication keyboard-based systems. On the other hand, other research demonstrates that the employed algorithm (classifier) also plays an important role in enhancing the performance of keystroke dynamics systems (Killourhy et al. 2009, Zhao 2006). Killourhy et al. (2009) benchmarked and compared the performance of 14 algorithms on a single data set. The techniques varied from simple statistical classifiers, such as the mean and standard deviations of typing times, to more complex pattern-recognition classifiers, such as neural networks and support vector machines. The study reported that different algorithms have different error rates. Zhao (2006) used seven machine learning methods to build models to differentiate user keystroke patterns and found that certain classifiers are more accurate than others. However, even though these existing works are helpful in deciding which classifiers provide more accurate rates, their focus is the comparison of algorithm performance on static authentication keyboard-based systems. Hence, more work is required to compare algorithm performance in continuous authentication keyboard-based systems.

From a different angle, another important conclusion that can be drawn from the existing literature on keystroke dynamics is that most existing schemes require predefined typing models for either legitimate users or impostors. It is difficult or even impossible in some situations to obtain typing data of the users (legitimate or impostors) in advance. For instance, consider a personal computer that a user carries to a college or to a cafe. In this case, only the computer owner (the legitimate user) is known in advance. As another instance, consider a computer that has a guest account in a public library; in this case, none of the system users are known in advance. Thus, a new automated and flexible technique is needed that can authenticate the user without the need for any prior user typing model.

3.2 Research Contributions

This dissertation focuses on the impact of the selected features on the performance of keystroke dynamics authentication systems. The major contribution of this dissertation is the utilization of the most frequently used English words in ascertaining the identity of users typing on the keyboard. Another important contribution of this dissertation is the proposal of a novel approach based on sequential change-point methods for the early detection of an impostor in computer authentication. There are two main advantages of sequential change-point methods: First, they can be implemented online and hence enable the creation of continuous user authentication systems without the need for any user typing model in advance. Second, they minimize the average delay of attack detection while maintaining an acceptable detection accuracy rate. The key contributions of this dissertation are as follows:

- Identification of two main factors that may influence the error rates of a continuous keystroke authentication system. In particular, this dissertation focuses on the impact of the choice of features and of the algorithm employed on the error rates of a continuous keystroke authentication system.

- Introduction of a new anomaly detection approach based on sequential change-point methods for the early detection of attacks in computer authentication. The developed algorithms are self-learning, which allows the construction of continuous keystroke authentication systems without the need for any user typing model in advance.
- Introduction of new features useful in distinguishing between users in continuous authentication systems based on keystroke dynamics. The new features are based on the utilization of the most frequently used English words to ascertain the identity of the users typing on the keyboard.
- Introduction of feature selection techniques useful in selecting the most influential features in continuous authentication systems based on keystroke dynamics, which maintain sufficient classification accuracy of the system and decrease the training and testing times required. The proposed techniques use a wrapper approach based on different machine learning classifiers, such as SVM, for feature evaluation. The process of feature subset selection is performed by various search algorithms, such as genetic algorithms and greedy algorithms.
- Investigation of the possibility of improving the performance of a keystroke continuous authentication system by forming various combinations of the extracted keystroke features. In particular, this dissertation compares the independent performances of four keystroke features, namely i) key duration, ii) flight time latency, iii) digraph time latency, and iv) word total time duration, against ten different combinations of these keystroke features.
- Extraction of various types of keystroke features and determination of the most efficient in keystroke dynamics. In particular, the performance of four features, i) key duration, ii) flight time latency, iii) digraph time latency, and iv) word total time duration, is extracted and analyzed in order to find which of these four features performs best in keystroke dynamics.

- Introduction and comparison of the performance of several anomaly detection techniques in terms of their ability to distinguish between users in continuous authentication systems. In particular, four machine learning techniques are adapted for keystroke authentication in this dissertation: Support Vector Machine (SVM), Linear Discriminant Classifier (LDC), K-Nearest Neighbors (K-NN), and Naive Bayes (NB). The adapted techniques are selected because of their prevalence in the literature. Furthermore, these techniques are accompanied by a good and comprehensive set of methods and implementations, enabling various comparisons of less common timing features used for keywords, such as the total time to type a whole word, with other more common features, such as the digraph and flight times of letters.
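To give a flavor of the sequential change-point idea behind the proposed detector, the following generic one-sided CUSUM sketch accumulates evidence that incoming keystroke timings deviate from the behavior observed so far and raises an alarm once a threshold is crossed. It is a simplified stand-in, with assumed parameter values, for the Lepage-based sequential tests developed in Chapter 8, not the dissertation's actual detector:

```python
import numpy as np

def cusum_detector(stream, warmup=30, drift=0.5, threshold=8.0):
    """Signal a change point in a stream of keystroke timings.

    The first `warmup` observations act as an unlabeled baseline, so no
    typing model is required in advance; afterwards, standardized deviations
    are accumulated and an alarm is raised once the statistic exceeds
    `threshold`. Returns the index of the alarm, or None if no change is found.
    """
    stream = list(stream)
    baseline = np.asarray(stream[:warmup], dtype=float)
    mu, sigma = baseline.mean(), baseline.std() + 1e-9
    s = 0.0
    for i, x in enumerate(stream[warmup:], start=warmup):
        z = abs(x - mu) / sigma          # deviation from the baseline behavior
        s = max(0.0, s + z - drift)      # CUSUM recursion
        if s > threshold:
            return i                     # possible session takeover signaled here
    return None
```

The gap between the true change point and the returned index corresponds to the detection delay that the sequential analysis in Chapter 8 seeks to minimize while keeping false alarms acceptable.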

CHAPTER 4
EXPERIMENTAL SETUP

This chapter describes the experimental procedures, data collection, and keystroke features used in this dissertation. A set of experiments is executed based on the collected data. Those experiments were based on fixed rather than free text, and were designed to collect keystroke timing data from users with different typing experience levels. This chapter is organized as follows. Section 4.1 provides data collection procedures. Section 4.2 discusses feature extraction.

4.1 Data Collection

A VB.NET Windows Forms application was developed to collect raw keystroke data samples. For each keystroke, the time (in milliseconds) the key was pressed and the time the key was released were recorded. Participants had the ability to use all keys on the keyboard, including special keys such as the Shift and Caps Lock keys. More importantly, they were also able to use the Backspace key to correct their typing errors. Participants had the choice of either using our laptop computer or downloading the application and collecting the data on their own machines. The main window of the application was split into two sections. The top section displayed the text that participants were required to type, i.e., fixed text. The bottom section provided a space to allow the participant to enter the text. The texts used in our experiment were English sentences randomly selected from a randomly chosen pool of English passages. Figure 4.1 shows a screen shot of the main window of the application used to collect the data from users. Once the participant finished typing the text, the participant was asked to hit the "collect data" button. The collected raw timing data was sent to an SQL database that was developed to save participant timing data. The raw timing data file that was recorded by

the application for each participant contains the following information for each keystroke entry: the key character that was pressed, the time the key was pressed in milliseconds, and the time the key was released in milliseconds. The keystroke timing experiment involved 28 participants. The process of collecting data from participants involved two stages: in the first stage, data were collected from only 8 participants; in the second stage, data were collected from another 20 participants. The participants were undergraduate and graduate students with different majors. All participants were asked to type the same prepared text (5000 characters) one time. Furthermore, 6 of the 28 participants were asked to provide two additional samples of the same text at different times. We were thus able to have two data sets: the first data set contained one sample for each of the 28 users; the second data set contained two additional samples for only the 6 users who played the role of the owner of the device and whose active authentications were of interest.

Figure 4.1: Main window of the application used to collect the data from users.

4.2. Features and their Extractions

The collected raw data from both data sets were used to extract the following features for our experiment:

1. Duration (F1) of the key presses for the 20 most frequently appearing English alphabet letters (e, a, r, i, o, t, n, s, h, d, l, c, u, m, w, f, g, y, p, b) (Gaines 1956).

2. Flight Time Latency (F2) for the 20 most frequently appearing pairs of English alphabet letters (in, th, ti, on, an, he, at, er, re, nd, ha, en, to, it, ou, ea, hi, is, or, te) (Gaines 1956).

3. Digraph Time Latency (F3) for the 20 most frequently appearing pairs of English alphabet letters (in, th, ti, on, an, he, at, er, re, nd, ha, en, to, it, ou, ea, hi, is, or, te) (Gaines 1956).

4. Word Total Duration (F4) for the 20 most frequently appearing English words (for, and, the, is, it, you, have, of, be, to, that, he, she, this, they, will, I, all, a, him) (Fry and Kress 2012).

Every feature (F1, F2, F3, and F4) consisted of 20 data items. For instance, F1 contained 20 alphabet letters, where each letter represented an item. Figure 4.2 illustrates an example of the four keystroke features extracted for the word "the."

Figure 4.2: An example of four keystroke features extracted for the word "the."

Table 4.1 reports the number of instances a feature item occurs in the English passages. The items are grouped into one letter, two letters, and words with more than

two letters. These items are reported to be the most frequent English letters and words (Gaines 1956; Fry and Kress 2012).

Table 4.1: Number of occurrences of the feature items in the English passages

Item  Ave Frequency | Item  Ave Frequency | Item  Ave Frequency
A 388 | AN  62 | AND  27
E 568 | AT  61 | FOR  27
I 418 | ED  30 | HAVE 27
N 222 | ER  53 | IS   49
O 412 | HE 136 | IT   37
R 236 | IN  37 | OF   28
S 320 | ON  22 | THE  29
T 472 | TH 125 | YOU  28
H 350 | EA  28 | ALL  32
D 159 | EN  29 | BE   28
L 249 | HA  81 | HE   32
C  68 | HI  71 | I    50
U 124 | IS 107 | SHE  28
M 138 | IT  64 | THAT 27
W 129 | ND  45 | THEY 28
F 103 | OR  51 | THIS 34
G  86 | OU  49 | TO   34
Y 120 | RE  55 | WILL 29
P  58 | TE  27 | A    59
B  59 | TO  54 | HIM  28
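To make the extraction step concrete, the following R sketch derives the four timing features from a raw press/release log for the word "the". The column names and timestamps are hypothetical, and one common convention for flight and digraph times is assumed; the exact variants used in this dissertation are the ones illustrated in Figure 4.2.

# Raw log: one row per keystroke, press/release times in milliseconds
keylog <- data.frame(
  key     = c("t", "h", "e"),
  press   = c(1000, 1180, 1335),
  release = c(1085, 1250, 1410)
)
n <- nrow(keylog)

# F1: key duration = release time - press time of the same key
duration <- keylog$release - keylog$press

# F2: flight time latency = press of the next key - release of the previous key
flight <- keylog$press[-1] - keylog$release[-n]

# F3: digraph time latency = press of the next key - press of the previous key
digraph <- keylog$press[-1] - keylog$press[-n]

# F4: word total duration = release of the last key - press of the first key
word_total <- keylog$release[n] - keylog$press[1]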

CHAPTER 5
KEYSTROKE FEATURE SELECTION USING STATISTICAL METHODS IN HYPOTHESES TESTING

This chapter focuses on the comparison of features to determine which are most effective in distinguishing users. The individual performance of four features is analyzed, including key duration, flight time latency, digraph time latency, and word total time duration. Experiments are conducted to measure the performance of each feature individually and the results from the different subsets of these features. In addition, this chapter reports how the word total time duration feature performs compared to key duration, flight time latency, and digraph time latency in determining the identity of users when typing.

This chapter is organized as follows. Section 5.1 provides the chapter motivation. Section 5.2 provides the research questions to be addressed in this chapter. Section 5.3 describes and explains the methods of analysis to be used. Section 5.4 explains the analyses conducted and the results obtained. Section 5.5 provides a conclusion and answers to the questions posed.

5.1. Motivations

Section 3.1 showed that the literature of keystroke dynamics demonstrates conflicting results regarding which feature is the most effective timing feature in the keystroke dynamics domain. Section 3.1 also showed that existing schemes in the continuous authentication keyboard area use some features from static authentication keyboard-based systems to represent user typing behavior. However, these features do not guarantee strong statistical significance in continuous authentication keyboard-based systems. Hence, a new, informative timing feature needs to be added to the existing literature to guarantee successful performance of continuous authentication keyboard-based systems.

44 5.2. Research Questions The main research questions that will be addressed in this chapter are: Q1. Which keystroke timing feature(s) among key duration, flight time latency, digraph time latency, and word total time duration timing performs better in keystroke dynamics? Q2. Does any combination of timing features improve the accuracy of authentication scheme? Q3. Which feature items contribute more to the accuracy of the model, where feature items are defined as the exact instances of letters in each feature, e.g. a, Th, etc.? Q4. How does the word total time duration feature perform compared with key duration, flight time latency, and digraph time latency? 5.3. Data Analysis Methods In order to answer the research questions posed in the previous section, two statistical methodologies are applied. Statistical methodologies are based on finding statistical parameters (mean and standard deviation) differences between the typing samples of the users. In particular, Mann-Whitney statistics test and effect sizes are applied in this chapter. The following subsections describe these methodologies and their application to the collected data Mann-Whitney Statistics Test This statistical test is one of the most powerful and common non-parametric tests for comparing the mean values of two populations. If one of the two random variables is randomly greater than the other, the test evaluates the differences in mean values between the two underlying populations. 33

The Mann-Whitney test is used to test the null hypothesis H0: μ1 = μ2, where μ1 and μ2 are the mean values of the two populations; the null hypothesis assumes that the two samples have been generated from the same population. It is tested against the alternative hypothesis, which assumes that there is a statistically significant difference between the mean values of the two samples compared. The decision criterion is then derived from the estimated p-value with 95% confidence, formulated as follows:

If p-value >= 0.05, do not reject H0; otherwise, reject H0.

Thus a p-value is simply a measure of the strength of evidence against H0.

5.3.2. Effect-Sizes

Statistical significance testing normally relies on the null hypothesis and p-values to interpret results. That is, a null hypothesis test assesses the likelihood that any effect, such as a correlation or a difference in means between two groups, may have occurred by chance. As indicated earlier, we calculate a p-value, the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true, and thus provide evidence against the null hypothesis. However, sole reliance on statistical testing in formulating a hypothesis raises the following issues. First, the statistical significance test does not tell the size of a difference between two measures; it only provides evidence against the null hypothesis. In particular, with this process we spend all our time testing whether or not the p-values reach the 0.05 alpha value, such that a p-value just below 0.05 is reported as statistically significant while one just above it is not, even though the practical difference between the two cases may be negligible. Thus, even very small effects may reach statistical significance. Second, a statistical significance test depends

mainly on the sample size. Therefore, even the most trivial effect will become statistically significant if we test a large enough sample. To account for this, we also report the effect size, i.e., the magnitude of the difference between two sample mean values.

Effect size is defined as the magnitude or size of an effect. In general, effect size measures either the size of a correlation or the size of the difference between two groups. The most common effect-size measures are Pearson's r (the correlation coefficient) and Cohen's d (also known as the standardized mean difference). Pearson's r measures the strength of the relationship or association between two variables. On the other hand, Cohen's d can be used when comparing two groups' means, as does the Mann-Whitney statistic test described in the previous section. This dissertation focuses on and reports the Cohen's d effect-size values. Cohen's d is calculated by taking the difference between the two groups' means divided by their pooled standard deviation, as shown in the following equation:

Cohen's d = (μ1 - μ2) / s_pooled,   where   s_pooled = sqrt((s1^2 + s2^2) / 2)

where μ1 is the mean of the first sample, μ2 is the mean of the second sample, and s1 and s2 are the standard deviations of the first and second samples, respectively.

The interpretation of Cohen's d depends on the research question. It covers the whole range from no relationship (0) to a perfect relationship (3 or -3), with -3 indicating a perfect negative relation and 3 indicating a perfect positive relation. Thus, the meaning of an effect size varies by situation.
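As a minimal illustration of the two measures just described, the comparison of one feature item between two users could be carried out in R as follows. The timing vectors are hypothetical; the functions used for the actual analyses are named in the next subsection.

library(lsr)   # provides cohensD()

# Hypothetical key-duration samples (in ms) of the letter "e" for two users
user1_e <- c(95, 102, 98, 110, 97, 105, 99, 101)
user2_e <- c(120, 131, 118, 140, 125, 133, 127, 129)

# Mann-Whitney (Wilcoxon rank-sum) test: a small p-value rejects H0
mw <- wilcox.test(user1_e, user2_e)
mw$p.value < 0.05

# Cohen's d effect size for the same pair of samples
cohensD(user1_e, user2_e)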

Cohen suggests that d = 0.2 be considered a 'small' effect size, 0.5 a 'medium' effect size, and 0.8 a 'large' effect size (Cohen 2013). This means that a d of 0.2 shows that the two groups' means differ by 1/5 of a standard deviation, a d of 0.5 shows that the two groups' means differ by 1/2 of a standard deviation, a d of 0.8 shows that the groups' means differ by 8/10 of a standard deviation, and so on.

5.3.3. R's Statistics Package

The Mann-Whitney statistic test and the effect-size analyses were conducted using R software packages. In particular, the wilcox.test function implemented in R's base stats package was used to carry out the Mann-Whitney test analyses. The cohensD function from the lsr R package was used to carry out the effect-size analyses (Navarro 2014).

5.4. Analyses And Results

This section describes the analyses that were conducted based on the Mann-Whitney statistic test and the effect-size test. This section also reports the results obtained from both approaches.

5.4.1. Mann-Whitney Test Analyses

Analysis 1: The aim of this analysis is to compare features in terms of relative effectiveness at distinguishing users. In this analysis, the Mann-Whitney statistic test is used to compare the mean values of the feature items (i.e., letters) for each pair of users. Each feature item of the first user is compared to its related feature item of the second user, and the p-values measured by the statistical test are recorded. This analysis was conducted on 8 users (whose data were collected in the first stage of the data collection process) and 8 items for each feature, namely the 8 most frequently appearing items of each feature. In this case, since there were 8 users,

48 there were 28 unique statistic tests and thus 28 p-values for each feature item. Since there were 8 items for each feature, there were 224 unique statistic tests and thus 224 p-values for each feature. In this analysis, the accuracy of each feature calculated as the total number of p-values greater than 0.05 divided by the total number of statistical tests for the underlying feature. The p-value >= 0.05 indicates that there is no evidence to reject the underlying null hypothesis, and thus an indication of no statistically significant differences between twosamples, i.e., users. Therefore, features that have greater numbers of p-values less than 0.05 are supposed to be more discriminative. Analysis 2: The aim of this analysis is to compare features effectiveness in distinguishing users with involvement of additional feature items and users. This analysis was conducted on 28 users (who collected the data from at second stage of the collecting the data process) and 20 items for each feature, the 20 most frequently appearing English alphabet items for each feature. In order to compare the data sets collected for two pairs of users, Mann Whitney statistic test is used to compare the mean values of the feature item (i.e. letters) for each of two pairs of users. Each feature item of the first user was compared to the related feature item of the second user, and the p-values measured by the statistical test recorded. Based on the generated p-values, each item will have one decision value, either "reject" or "do not reject". Since there were 20 items for each feature, there were 20 comparisons for each feature and thus 20 p-values for each feature with one p-value for each item comparison. Next, vote counting schemes were applied on these p-values to get a consensus and a finalized decision as to whether users were the same person or not; that is, either "reject" or "do not reject. In particular, two pairs of samples were accepted as belonging to the 37

same person if the ratio of accepted feature items to total feature items was more than 50 percent. For this analysis, we were interested in small p-values, i.e., the significance level of 0.05. A p-value <= 0.05 indicates that the null hypothesis is rejected and is thus an indication of statistically significant differences between the two populations, i.e., users. Since there were 28 users, there were 378 imposter match attempts and thus 378 finalized decisions for each feature. Thus, the performance of each feature was measured as the percentage of imposter attempts mistakenly accepted by the system.

This analysis had two main benefits: First, the ability to compare the performance of individual features by measuring the accuracy of each feature individually. Second, because it measured the accuracy of each feature item individually, we could determine which items contributed more to the accuracy of the feature. In particular, the accuracy of each feature item was calculated as the total number of p-values greater than 0.05 divided by the total number of statistical tests for the underlying item. A p-value >= 0.05 indicates that there is no evidence to reject the underlying null hypothesis and is thus an indication of no statistically significant differences between the two samples, i.e., users. Therefore, items that have greater numbers of p-values less than 0.05 are supposed to be more discriminative.

Analysis 3: The purpose of this analysis is to evaluate the performance of keystroke dynamics on the selected features against same-user authentication, i.e., the legitimate users of a computer system. For this analysis, only the data samples collected from 6 users were studied; these 6 users provided three typing samples of the same text. The three samples gathered from each

50 user were pooled and treated as one large sample in order to obtain sufficient frequencies for the feature items concerned. To evaluate the performance of the proposed method on the same user, some unseen data were introduced. For this purpose, the processed data of each user from the second data set were partitioned into two sets: training and test data sets. The methods of generating these two data sets could be any partitioning mechanism often employed in a typical cross-validation assessment, e.g., leave-one-out cross validation, random sampling, and K-fold cross validation. We used K-fold cross-validation in our experiments, where a data set was split into K partitions. The major advantage of K-Fold cross validation is that all the instances in the data set are ultimately used for both training and testing purposes. The K-fold cross validation creates K partitions of the data set and uses K-1 partitions to form training data; the remaining one is used for testing and evaluating of the developed model. Popular choices for K-Fold cross validation are K=5 and K=10. For each item of the selected features of the target user, we ran K-fold cross validation choosing K=5, then for each i=1 to K we ran Mann-Whitney statistic test on the randomly generated partitions. In other words, each feature item had five unique statistical tests producing five p-values for each feature item. In the end, each feature item was tested five times against the same user, which produced five legitimate user match attempts for each feature of the same user. Since there were six legitimate users, there were 30 legitimate match attempts and thus 30 finalized decisions for each feature. Thus, the performance of the each feature was measured as the percentage of legitimate user attempts mistakenly rejected by the system, i.e., false negatives. 39
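A minimal sketch of this per-item cross-validation loop is shown below, with hypothetical data and K = 5 as in the experiments; each held-out partition is tested against the remaining partitions of the same user with the Mann-Whitney test.

set.seed(1)
# Hypothetical pooled durations (ms) of one feature item for a legitimate user
item_times <- rnorm(50, mean = 100, sd = 10)

K <- 5
folds <- sample(rep(1:K, length.out = length(item_times)))

p_values <- sapply(1:K, function(i) {
  test_part  <- item_times[folds == i]    # held-out partition
  train_part <- item_times[folds != i]    # remaining K-1 partitions
  wilcox.test(train_part, test_part)$p.value
})

# A p-value < 0.05 would falsely reject the legitimate user on this item
mean(p_values < 0.05)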

51 Analysis 4: The purpose of this analysis is to evaluate and study the influence of various keystroke feature combinations on the performance accuracy of the keystroke dynamics system. This analysis was conducted on 8 users (who collected the data from at first stage of the collecting the data process) and 8 items for each feature, the most 8 frequently appearing English alphabet items for each feature. In this case, since there were 8 users, there were 28 unique statistic tests and thus 28 p-values for each feature item. Since there were 8 items for each feature, there were 224 unique statistic tests and thus 224 p-values for each feature. In this analysis, the accuracy of combined features was calculated as the total number of p-values greater than 0.05 divided by the total number of statistical tests for the underlying features. The p-value >= 0.05 indicates that there is no evidence to reject the underlying null hypothesis, that is, the mean differences between the two-samples, and thus an indication of no statistically significant differences between two-samples, i.e., users. Therefore, combined features that have greater numbers of p-values less than 0.05 are supposed to be more discriminative Mann-Whitney Test Results Results of Analyses 1 and 2: This section reports the results for two cases: first, a comparison between the performances of four different keystroke features using data of 8 users and 8 items for each feature. In this case, the FAR value of each feature was calculated as the total number of p-values greater than 0.05 divided by the total number of statistical tests for the underlying feature. Second, a comparison between the performances of four different keystroke features with data of 28 users and 20 items for each feature. In this case, the FAR value of each feature was measured as the percentage of imposter attempts mistakenly accepted by the system based on the generated p-values and a voting schema. Figure 5.1 shows the results obtained for the two cases, respectively. 40
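The aggregation of p-values into a FAR estimate, together with the vote-counting decision used in Analysis 2, can be sketched as follows. The p-value matrix here is hypothetical; in the real analyses the p-values come from the Mann-Whitney tests described above.

set.seed(2)
# Hypothetical p-values: one row per imposter comparison, one column per item
p <- matrix(runif(378 * 20), nrow = 378, ncol = 20)

# Analyses 1 and 4: FAR as the share of tests that fail to reject H0
far_itemwise <- mean(p >= 0.05)

# Analysis 2: vote counting -- an imposter pair is mistakenly accepted when
# more than 50% of its feature items fail to reject H0
accepted   <- rowMeans(p >= 0.05) > 0.5
far_voting <- mean(accepted)

c(far_itemwise, far_voting)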

52 For the first case, observing Figure 5.1 (a), we notice that by using F1, we are able to obtain a better result compared to the other keystroke features (F2, F3, and F4). The FAR value for F1 is slightly above 6% followed by F4 with FAR value of 10%. This result is consistent with those of other published studies (Pin et al. 2012, Robinson et al. 1998). Of particular interest in Figure 5.1(a) is that F4 is shown to perform better than F2 and F3. The FAR values measured for F1, F2, F3, and F4 were 6.57%, 17.1%, 18%, and 10.28%, respectively. The FAR value of F4 was 10.28%, which shows that the total typing time for the most frequently used English words might contain valuable timing information that could be utilized in order to differentiate users. For the second case, in Figure 5.1 (b), we notice that by using F3 feature, we are able to obtain a better result compared to the other keystroke features (F1, F2, and F4). The FAR value for F3 is slightly above 3% followed by F2 with FAR value of 6%. Thus, based on this experiment, we can conclude that latency times are a more effective timing feature than hold times. (a) 8 users data and 8 items for each feature. (b) 28 users data and 20 items for each feature. Figure 5.1: A comparison between the performances of four different keystroke features. (a) Results obtained using Mann-Whitney statistic test. (b) Results obtained using Mann- Whitney statistic test and Voting Schema. 41

Figure 5.2 illustrates the FAR values for all feature items individually based on the 28 users' data. The FAR value of each feature item was calculated as the total number of p-values greater than 0.05 divided by the total number of statistical tests for the underlying item. We can observe that some items performed better than others. For instance, the FAR value measured for E was 5%, while the FAR value observed for M was 28%. Moreover, as we can observe from Figure 5.2 (a), letters with higher frequencies performed better than letters with lower frequencies.

Figure 5.2: A comparison between the performance of four different keystroke feature items using 28 participants' data with the Mann-Whitney statistic test. (a) F1 items. (b) F2 items. (c) F3 items. (d) F4 items.

Analysis 3 Results: For the third analysis, designed to evaluate the performance of keystroke dynamics on the selected features against same-user authentication, i.e., the legitimate users of a computer system, all four features achieved 0% FRR for all

legitimate users, indicating that the underlying features and the use of the most frequently used letters and words accurately distinguished the 6 users.

Analysis 4 Results: We have already reported that F1 showed the best result among the four keystroke features. Our experiment results show that the combinations of features did not improve the performance of keystroke authentication. The FAR of combined features was calculated as the total number of p-values greater than 0.05 divided by the total number of statistical tests for the underlying combined features. As shown in Figure 5.3, the FAR values increase for combinations of all of the features. The most interesting finding we can note from Figure 5.3 is that combining F1 with F4 leads to the best result among all the combinations. Figure 5.3 reports the performance of different combinations of keystroke features performed on eight users' data. While the combination of F1 and F4 produces the best performance among all four keystroke feature combinations, F1 offers the optimal result if used independently.

Figure 5.3: The performance of different combinations of different keystroke features performed using 8 participants' data.

5.4.3. Effect-Size Analyses

Analysis: The aim of this analysis is to measure the accuracy of each feature item (i.e., letter) individually, based on its average effect sizes.

55 In order to apply this approach on our data sets, we performed the following: First, for each single item (i.e., letter), we calculated the effect size for each pair of users with the formula in section Since there were 28 users, there were 378 unique effect- size values for each item. Second, we calculated the average effect sizes of each item individually by using these 378 values. These averages provide us a summary estimate of multi-dimensional outcomes for feature s among all 28 users. Therefore, items that have larger averages of effect sizes were supposed to be more discriminative than those with smaller averages Effect-Size Analyses Results This subsection reports the effect sizes for feature items based on 28 users, so we can determine which feature items have a greater effect in terms of the difference between two mean measures. Table 5.1 shows the averages of effect sizes for all feature items based on data from 28 users. We notice that the feature F3 items have bigger effect sizes compared to the other keystroke features, i.e., F1, F2, and F4. The item s effect-size values for feature F3 range from 0.54 to 1.39, followed by feature F2 with the items effect-size values range from 0.14 to 1. In particular, we demonstrate the averages of effect sizes for feature item as follows: F3 item effect sizes range from a medium effect size to a large effect size, with the majority of these items having large effect sizes. F2 item effect sizes range from a small effect size to a large effect size, with the majority of these items having medium effect sizes. The majority of F1 item have medium effect sizes. F4 item effect sizes range from a small effect size to a medium effect size, with the majority of these items having medium effect sizes. 44

Table 5.1: Average of effect sizes for all feature items using 28 users' data, sorted by effect size.

Item ES | Item ES | Item ES | Item ES
W 0.79 | ON 1.02 | OU 1.39 | HIM 0.9
A 0.70 | HI 1.01 | ON 0.98 | FOR 0.78
C 0.67 | IN 0.92 | HI 0.97 | HAVE 0.69
P 0.67 | TO 0.9  | EA 0.9  | OF 0.68
B 0.66 | EA 0.88 | IN 0.85 | SHE 0.67
S 0.66 | AT 0.8  | TO 0.84 | AND 0.64
H 0.65 | OR 0.8  | OR 0.74 | TO 0.6
U 0.65 | EN 0.74 | IT 0.73 | THAT 0.6
E 0.63 | RE 0.74 | AT 0.72 | THEY 0.55
F 0.61 | IT —    | EN 0.71 | THE 0.53
N 0.61 | ER 0.7  | RE 0.66 | THIS 0.53
D 0.59 | IS 0.68 | IS 0.65 | BE 0.52
I 0.59 | HA 0.67 | TH 0.64 | ALL 0.52
O 0.59 | AN 0.66 | HA 0.63 | YOU 0.48
Y 0.59 | ND —    | ND 0.62 | IT 0.45
M 0.54 | TH 0.62 | AN 0.6  | WILL 0.41
R 0.52 | HE 0.56 | ED 0.59 | IS 0.39
T 0.52 | TE 0.4  | ER 0.57 | A 0.34
L 0.50 | ED 0.36 | HE 0.54 | HE 0.31
G 0.49 | OU —    | TE 0.36 | I 0.26
AVG — | — | — | —

5.4.5. Subset Feature Performance

This section reports each feature's FAR values for the 5, 10, and 15 most discriminative items based upon the two approaches applied (p-values and effect sizes). Moreover, this section reports the FAR values of the 5, 10, and 15 most frequent items of each feature. This finding is important because it indicates that some feature items may be more dependable than others, because they originated from a larger sample set or had a generally higher recurrence in the written language. For instance, a, e, and t ought to carry more weight than q or z. Figure 5.4 illustrates the trade-off of the FAR values for various subsets of feature items based on the applied approaches (p-values and effect sizes). The performance of all

features improved using both approaches. For instance, the FAR value of feature F4 was reduced from 12% to 6% when using the 15 most discriminative items based on the effect-size approach, and to 5% based on the p-values approach.

Figure 5.4: FAR values for the 5, 10, and 15 most discriminative items based upon the two approaches applied, using 28 users' data. (a) Based on p-values. (b) Based on effect-size values.

Figure 5.5 shows the trade-off of the FAR values for the 5, 10, and 15 most frequent items of each feature. As shown, the performance of features F1, F2, and F4 somewhat improved, but that of feature F3 did not.

Figure 5.5: FAR values for the 5, 10, and 15 most frequent items of each feature using 28 users' data.

58 5.5. Conclusion We return to the questions posed in Section 5.2 and intend to address them: Q1. Which keystroke timing feature(s) among key duration, flight time latency, digraph time latency, and word total time duration timing performs better in keystroke dynamics? The results reported variation with reference to this question based on the method employed and the number of participants involved in the analysis. When using Mann Whitney statistics test applied to 8 users and 8 items of each feature, key duration (F1) obtained a better result among the four features with the FAR value equal to 6.5%, followed by word total time duration (F4) with FAR value of 10%. On the other hand, when applying a voting schema on p-values generated using Mann Whitney with data of 28 users and 20 items of each feature, the digraph time latency (F3) feature obtained a better result among the four features, with the FAR value equal to 3%, followed by F2, with FAR value of 6%. Both approaches achieved 0% FRR for all features. Therefore, we can conclude that when applying voting schema on p-values generated using Mann Whitney, the digraph time latency (F3) feature obtains a better result among the four features with the FAR value equal to 3% and FRR value equal to 0%. Q2. Does any combination of timing features improve the accuracy of authentication scheme? The results reported that using Mann-Whitney statistics test applied on 8 users and 8 items of each feature, the combination of features does not improve the accuracy of authentication. 47

Q3. Which feature items contribute more to the accuracy of the model, where feature items are defined as the exact instances of letters in each feature, e.g., "a", "th", etc.?

The results reported in Section 5.4.2 show that some items perform better than others. For the F1 feature, the items E, A, I, T, and O performed the best among the 20 items of F1. For the F2 feature, the items OF, OU, IN, TH, and TO performed the best among the 20 items of F2. For the F3 feature, the items OF, OU, IT, IN, and TH performed the best among the 20 items of F3. For the F4 feature, the items OF, IS, AND, ALL, and WILL performed the best among the 20 items of F4.

Q4. How does the word total time duration feature perform compared with key duration, flight time latency, and digraph time latency?

The results reported in Section 5.4.2 show that F4 performed about the average of the other features. When the Mann-Whitney statistic test is applied to 8 users and 8 items of each feature, word total time duration (F4) ranked second after key duration (F1), with a FAR value of 10% and an FRR value of 0%.

60 CHAPTER 6 KEYSTROKE FEATURE SELECTION USING MACHINE LEARNING-BASED CLASSIFICATIONS While chapter 5 focuses on the application of statistical methods based on finding statistical parameters (mean and standard deviation) differences between two-samples in order to find out which features are most effective in distinguishing users, this chapter focuses on the application of Machine learning-based Classification Methods in order to compare feature performance in terms of distinguishing users. More specifically, five machine learning techniques are adapted for keystroke authentication in this chapter. The selected classification methods are Support Vector Machine (SVM), One Class Support Vector Machine (OC-SVM), Linear Discriminate Classifier (LDC), K-Nearest Neighbors (K-NN), and Naive Bayesian (NB). The adapted techniques are selected because of their prevalence in the literature. Furthermore, these techniques are accompanied by a good and comprehensive set of methods and implementations that enables the conducting of various forms of classification analysis. This chapter is organized as follows. Section 6.1 provides research questions to be addressed in this chapter. Section 6.2 describes and explains the analysis methods used. Section 6.3 explains the conducted experiments and their obtained results. Section 6.4 provides a conclusion and answers to questions posed Research Questions In addition to the research questions addressed in the previous chapter (outlined in section 5.2), this chapter will address the following research question: Q1. Which authentication algorithm(s) among (SVM, LDC, NB, and KNN) performs better in keystroke dynamics? This is a critical question to address since the algorithm employed plays an important role in influencing of classifier error rates of the continuous authentication detector [5]. 49

6.2. Machine Learning Methods

Five machine learning techniques are adapted for keystroke authentication in this chapter. The selected classification methods are Support Vector Machine (SVM), One Class Support Vector Machine (OC-SVM), Linear Discriminate Classifier (LDC), K-Nearest Neighbors (K-NN), and Naive Bayesian (NB). The following subsections describe these methodologies and how they were applied to the collected data.

6.2.1. Support Vector Machine (SVM)

Support vector machines (SVMs) are an excellent technique for data classification. The SVM is a discriminative classifier based on the concept of a separating hyperplane. In other words, given a set of labeled training data, the SVM objective is to find an optimal separating hyperplane that separates the training data set by a maximal margin. Given training data labeled with two classes, (+1) or (-1), the SVM outputs an optimal hyperplane that categorizes new observations.

Generally, SVM classification is a two-phase process. In the first phase, the training phase, the SVM classifier builds a model using the training vectors. The algorithm projects the two classes of training data into a higher dimensional space and then finds a linear separator between the two classes. Thus, all vectors located on one side of the hyperplane are labeled as +1, and all vectors located on the other side are labeled as -1. In the second phase, the testing phase, the testing vectors are mapped into the same high-dimensional space and their category is predicted according to which side of the hyperplane they fall on.

In addition to traditional linear classification, SVMs can effectively perform non-linear classification using what are called kernel functions. The two most used kernel functions with SVM are the polynomial kernel and the radial basis function (RBF) kernel.
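As a minimal sketch of the two-phase process, a two-class SVM can be trained and applied in R with the e1071 package. The 20-dimensional timing vectors below are hypothetical, and the RBF kernel shown is one of the two kernels discussed here.

library(e1071)

set.seed(3)
# Hypothetical training data: 20-dimensional timing vectors for two users
x_train <- rbind(matrix(rnorm(200, mean = 100, sd = 10), ncol = 20),  # user labeled +1
                 matrix(rnorm(200, mean = 125, sd = 10), ncol = 20))  # user labeled -1
y_train <- factor(rep(c(1, -1), each = 10))

# Training phase: fit an SVM with an RBF (radial basis) kernel
model <- svm(x_train, y_train, type = "C-classification", kernel = "radial")

# Testing phase: new vectors are assigned to the side of the hyperplane they fall on
x_test <- matrix(rnorm(40, mean = 100, sd = 10), ncol = 20)
predict(model, x_test)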

While the polynomial kernel produces polynomial boundaries of degree d in the original space, the radial basis function (RBF) kernel produces boundaries by putting weighted Gaussians upon key training instances. The RBF kernel nonlinearly projects training data samples into a higher dimensional space and can therefore deal with cases in which the relation between class labels and attributes is nonlinear. Thus, this kernel is distinct from the linear kernel.

6.2.2. Linear Discriminate Classifier (LDC)

The Linear Discriminate Classifier (LDC) is a simple and mathematically strong classification method based on the concept of searching for a linear combination of variables which can best separate two or more classes. LDC tries to find a linear combination of variables (predictors) that maximizes the difference between the means of the data while at the same time minimizing the variation within each group of the data. Therefore, LDC usually looks for a projection where samples from the same class are projected close to each other and, at the same time, the projected means are as separate as possible.

Assume that we have C classes, each with mean vector μi. We are then interested in finding a scalar y by projecting the samples x onto a line:

y = Wk^T x + W0

where x is a sample vector, Wk is a weight vector that represents the coefficients, and W0 is the bias, also called the threshold, which determines the decision surface. In order to find the optimum W, first the within-class scatter matrix SW is computed by the following equation:

SW = Σ_{i=1..C} Si,   where   Si = Σ_{x in class i} (x − μi)(x − μi)^T

Here Si represents the scatter matrix of every class and μi is the class mean vector. Similarly, the between-class scatter matrix is computed by the following equation:

SB = Σ_{i=1..C} Ni (Mi − m)(Mi − m)^T

where m is the overall mean, and Mi and Ni are the sample mean and size of the respective classes. Finally, the linear discriminants can be expressed as the weight vectors W that maximize the ratio of between-class to within-class scatter:

J(W) = (W^T SB W) / (W^T SW W)

6.2.3. K-Nearest Neighbors

The k-nearest neighbor classifier is one of the simplest and most well-known classifiers in pattern recognition. The main principle behind it is to calculate the distance from a query point to all of its neighbors in the training set and to select the k closest ones. Specifically, in the training phase, the KNN classifier simply stores the labeled training vectors. In the testing phase, the distances between the query point and all its neighbors are calculated. A majority vote is then used to classify a new point to the most prevalent class among its k nearest neighbors. In other words, KNN classifies new points based on a similarity measure (e.g., distance functions). In general, the distance can be any metric measure; however, the Euclidean distance is the most frequent choice. The Euclidean distance between two points x and y is given by the following equation:

d(x, y) = sqrt( Σ_i (xi − yi)^2 )
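A corresponding KNN sketch in R classifies each test vector by a majority vote among its k nearest training vectors under the Euclidean distance. The data are hypothetical, and the knn function shown here is the one provided by the class package.

library(class)

set.seed(4)
# Hypothetical digraph-latency vectors (20 items per vector) for two users
train  <- rbind(matrix(rnorm(200, mean = 150, sd = 15), ncol = 20),  # user A
                matrix(rnorm(200, mean = 190, sd = 15), ncol = 20))  # user B
labels <- factor(rep(c("A", "B"), each = 10))

test <- matrix(rnorm(40, mean = 150, sd = 15), ncol = 20)

# Each test vector receives the majority class of its k = 3 nearest neighbors
knn(train, test, cl = labels, k = 3)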

6.2.4. Naïve Bayes

The Naive Bayes classifier technique is a well-known machine learner based on Bayes's rule of conditional probability with a naive independence assumption between every pair of variables. Specifically, the Naive Bayes classifier provides a method of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c), as seen in the following equation:

P(c|x) = P(x|c) P(c) / P(x)

where:
P(c|x) is the posterior probability of the target class given the predictor (attribute).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, which is the probability of the predictor given the class.
P(x) is the prior probability of the predictor.

6.2.5. One Class SVM (OC-SVM)

One Class SVM (OC-SVM) is an extension of SVM that can train a classification model from only one class, without the need for negative samples. Basically, this SVM variant was developed for the purpose of anomaly detection. It maps the data from one class into a high-dimensional space and finds a linear separator between the mapped data and the origin. In the training phase, the detector builds only a one-class model using the training samples. In the testing phase, the testing vector is mapped into the same

high-dimensional space. The anomaly score is then calculated as the sign-inverted distance between the testing vector and the linear separator.

6.2.6. R Packages

Analyses of the machine learning-based classification methods were conducted using R scripts. In particular, the lda function from the MASS R package and the knn function from the class R package (Venables and Ripley 2013) were used to carry out the Linear Discriminate Classifier (LDC) and K-Nearest Neighbors (K-NN) analyses, respectively. The svm and naiveBayes functions from the e1071 R package were used to carry out the Support Vector Machine (SVM) and Naive Bayesian (NB) analyses, respectively.

6.3. Machine Learning Analyses And Results

This section explains the analyses and results that were obtained based on the machine learning methods. Two types of analyses were conducted in this setting. In the first type, the accuracy of each feature item is measured as the percentage of positive and negative testing cases that were correctly classified by the classifier. In the second type, the accuracy of each feature is measured in terms of the false positive and false negative error rates. In the first type of analysis, the selected classification methods are the support vector machine (SVM), one-class support vector machine (OC-SVM), k-nearest neighbors classifier (K-NN), and Naive Bayesian (NB). In the second type of analysis, the selected classification methods are the support vector machine (SVM), linear discriminate classifier (LDC), k-nearest neighbors classifier (K-NN), and Naive Bayesian (NB).

6.3.1. Feature Items' Performance

Analysis: The aim of this analysis is to measure the accuracy of each feature item individually, so we can determine which items contribute more to the precision of the features. For this analysis, all feature items from the data set were compared using the four proposed methods: SVM, OC-SVM, K-NN, and NB. For each user, the first 25

typed timing repetitions of all feature items were selected to build the user file. The first 20 repetitions of each feature item (of the 25) for each user were chosen for the training phase, and the remaining 5 repetitions of each feature item were used for the testing phase. For each pair of users, classifiers were trained and tested to measure the accuracy of each feature item individually according to its ability to classify users. The accuracy of each feature item is measured as the percentage of positive and negative testing cases correctly classified by the classifier. In other words, the accuracy is measured as the percentage of patterns that were correctly mapped to their true users, and is evaluated by the following formula:

Accuracy = T / N

where T is the number of sample cases correctly classified and N is the total number of sample cases.

Tables 6.1 to 6.4 illustrate the accuracy values for all feature items based on using SVM, OC-SVM, K-NN, and NB, respectively. We can observe that some items perform better than others. For instance, the accuracy value measured using SVM for B was 90%, while the accuracy value measured for R was 73%. The high accuracy values for some feature items suggest that these items might be good indicators for distinguishing users. Moreover, in Tables 6.1 to 6.4 it is observed that some feature items appear more than once among the topmost items of all classifiers' results. For instance, for the F1 feature, the items B, H, T, I, and Y appear among the 10 topmost items. For the F2 feature, the items IN, OU, ED, EA, and TH appear among the 10 topmost items. For the F3 feature, the items IN, OU, ED, HI, and EA appear among the 10 topmost items. For the F4 feature, the items WILL, ALL, A, AND, and HAVE appear among the 10 topmost items. The feature items that appear in the topmost positions suggest that these items are good indicators for distinguishing users.

67 The more interesting finding demonstrated in tables is that F4 items perform better than F2 and F3 items, which shows that the total time of typing the most frequently used English words might contain valuable timing information that could be utilized to differentiate users. Apart from the accuracy values of feature items, Tables 6.1 to 6.4 report the overall accuracy of each feature individually by calculating the average of its items accuracy values. We use these averages to compare the four different types of keystroke features. In tables 6.1 to 6.4 it is shown that we were able to obtain a better result using F1 compared to the other keystroke features (F2, F3, and F4). Three classifiers (SVM, OC- SVM, and KNN) out of the four used classifiers show that F1 is performed better than other features, followed by F4. For instance, the average accuracy value measured using SVM for F1 was 84%, followed by F4, with the accuracy value of 81%. Thus, we can conclude that hold times are a more effective timing feature than latency times. This result is consistent with existing work [20] Table 6.1: Item performance using SVM Item Accuracy Item Accuracy Item Accuracy Item Accuracy B 0.90 IN 0.85 IN 0.86 WILL 0.87 H 0.88 OU 0.80 HI 0.82 ALL 0.86 P 0.88 ED 0.79 ED 0.81 A 0.85 T 0.88 HI 0.79 OU 0.81 AND 0.84 Y 0.88 EA 0.78 EA 0.8 HAVE 0.84 I 0.86 TH 0.77 HE 0.8 THE 0.84 N 0.86 ER 0.76 TH 0.8 TO 0.84 U 0.86 RE 0.73 TO 0.8 IS 0.83 W 0.86 TO 0.73 EN 0.79 OF 0.83 G 0.85 AT 0.70 IS 0.79 BE 0.83 L 0.85 HE 0.70 ER 0.78 FOR 0.81 M 0.84 AN 0.68 HA 0.78 IT 0.81 O 0.82 EN 0.68 OR 0.76 SHE 0.81 S 0.82 IS 0.67 RE 0.76 THIS 0.80 C 0.81 IT 0.67 AN 0.75 HE 0.79 D 0.81 ND 0.67 IT 0.75 I 0.79 F 0.80 HA 0.66 AT 0.74 YOU

68 Table 6.1 Continued Item Accuracy Item Accuracy Item Accuracy Item Accuracy A 0.79 OR 0.66 ND 0.72 THEY 0.77 E 0.76 ON 0.64 ON 0.7 THAT 0.74 R 0.73 TE 0.62 TE 0.7 HIM 0.73 AVG Table 6.2: Item performance using OC-SVM Item Accuracy Item Accuracy Item Accuracy Item Accuracy Y 0.80 OU 0.73 ED 0.72 TO 0.79 L 0.79 EA 0.71 IN 0.71 HAVE 0.77 M 0.79 IN 0.70 EA 0.69 OF 0.77 T 0.79 HI 0.68 OU 0.69 THE 0.76 G 0.78 TH 0.68 HA 0.65 A 0.76 H 0.77 RE 0.67 OR 0.65 ALL 0.75 I 0.77 ER 0.65 TH 0.65 AND 0.74 N 0.77 ED 0.63 TO 0.65 YOU 0.74 O 0.77 AN 0.60 HE 0.64 WILL 0.73 U 0.76 HA 0.60 HI 0.64 FOR 0.72 W 0.75 ND 0.60 IS 0.64 IS 0.72 S 0.74 TO 0.60 IT 0.64 IT 0.72 B 0.73 IS 0.59 ND 0.64 BE 0.72 C 0.73 IT 0.59 TE 0.62 HE 0.72 F 0.73 AT 0.58 EN 0.61 THIS 0.72 D 0.72 EN 0.57 AN 0.60 I 0.72 A 0.71 TE 0.57 RE 0.59 SHE 0.71 R 0.71 HE 0.54 AT 0.58 THEY 0.69 P 0.70 ON 0.53 ER 0.58 HIM 0.69 E 0.69 OR 0.53 ON 0.54 THAT 0.64 AVG

69 Table 6.3: Item performance using KNN Item Accuracy Item Accuracy Item Accuracy Item Accuracy B 0.86 IN 0.86 IN 0.86 A 0.91 M 0.86 OU 0.85 IS 0.84 WILL 0.9 H 0.85 EA 0.81 TH 0.83 ALL 0.88 L 0.85 ED 0.81 ED 0.82 BE 0.87 O 0.85 TH 0.81 HE 0.82 TO 0.87 A 0.84 HI 0.8 HI 0.82 I 0.87 I 0.84 RE 0.8 OU 0.82 AND 0.86 S 0.84 ER 0.79 EN 0.81 OF 0.86 T 0.84 TO 0.74 TO 0.81 THE 0.86 Y 0.84 AT 0.73 EA 0.80 HAVE 0.84 C 0.83 AN 0.72 AN 0.78 IS 0.83 D 0.83 ND 0.71 ER 0.78 IT 0.83 F 0.83 EN 0.7 RE 0.78 FOR 0.82 G 0.83 HE 0.7 AT 0.77 YOU 0.82 N 0.83 IT 0.7 HA 0.77 HE 0.82 P 0.83 HA 0.69 IT 0.77 SHE 0.82 R 0.83 IS 0.69 OR 0.76 THIS 0.82 E 0.82 OR 0.69 ND 0.74 THEY 0.82 U 0.82 ON 0.65 TE 0.71 THAT 0.74 W 0.82 TE 0.65 ON 0.70 HIM 0.74 AVG Table 6.4: Item performance using NB Item Accuracy Item Accuracy Item Accuracy Item Accuracy F 0.71 IN 0.82 IN 0.83 BE 0.79 B 0.7 TH 0.74 ED 0.78 TO 0.79 C 0.7 ED 0.73 TH 0.78 WILL 0.79 P 0.7 ER 0.73 HE 0.77 ALL 0.79 H 0.69 HI 0.73 HI 0.77 HAVE 0.78 O 0.69 EA 0.7 TO 0.76 THE 0.78 T 0.69 HE 0.7 EA 0.75 OF 0.77 I 0.68 OU 0.7 OU 0.75 A 0.77 G 0.67 RE 0.68 HA 0.74 AND 0.76 R 0.67 TO 0.67 EN 0.73 IS 0.76 S 0.67 HA 0.65 ER 0.73 THIS 0.75 U 0.66 IS 0.65 IS 0.73 HIM 0.74 L 0.65 AN 0.63 AN 0.71 FOR 0.73 W 0.65 IT 0.63 IT 0.71 IT

Table 6.4 Continued
Item Accuracy | Item Accuracy | Item Accuracy | Item Accuracy
Y 0.65 | OR 0.63 | OR 0.71 | YOU 0.73
D 0.64 | AT 0.62 | AT 0.69 | SHE 0.73
M 0.64 | EN 0.61 | RE 0.69 | I 0.71
A 0.63 | ND 0.61 | ND 0.67 | THAT 0.7
E 0.63 | ON 0.59 | ON 0.66 | HE 0.7
N 0.61 | TE 0.59 | TE 0.65 | THEY 0.66
AVG

6.3.2. Features' Performance

Analyses: The aim of this analysis is to compare the features in terms of relative effectiveness at distinguishing users. In this analysis, authentication system performance was evaluated by assigning each of the 28 users in the training data set the role of a genuine user. Then, each of the remaining users played the role of an imposter. Classifiers were then trained and tested to measure their ability to classify genuine users and imposters. On each repetition, we kept track of the number of false positives and false negatives produced by the classifier. The false acceptance rate (FAR) and false rejection rate (FRR) were then computed as follows:

FAR = (number of imposter attempts accepted as genuine) / (total number of imposter attempts)
FRR = (number of genuine attempts rejected as imposters) / (total number of genuine attempts)

We ran two analyses in this context. The aim of the first analysis was to compare features individually in order to find out which features are most effective in distinguishing users. The purpose of the second analysis was to evaluate and study the influence of various keystroke feature combinations on the performance accuracy of the keystroke dynamics authentication system. Specifically, we compared the performance of the four features used individually against ten different feature combinations using the proposed classifiers.
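A condensed sketch of this evaluation loop for one genuine/imposter pair is given below. The data are hypothetical, and the SVM used as the classifier here could be swapped for any of the other methods from Section 6.2.6 in the same way.

library(e1071)   # svm(); lda (MASS), knn (class) and naiveBayes (e1071) slot in similarly

set.seed(5)
# Hypothetical 20-dimensional timing vectors: 25 repetitions per user
genuine  <- matrix(rnorm(25 * 20, mean = 100, sd = 10), ncol = 20)
imposter <- matrix(rnorm(25 * 20, mean = 125, sd = 10), ncol = 20)

# First 20 repetitions for training, remaining 5 for testing
x_train <- rbind(genuine[1:20, ], imposter[1:20, ])
y_train <- factor(rep(c("genuine", "imposter"), each = 20))
x_test  <- rbind(genuine[21:25, ], imposter[21:25, ])
y_test  <- factor(rep(c("genuine", "imposter"), each = 5))

pred <- predict(svm(x_train, y_train), x_test)

# FAR: imposter attempts accepted as genuine; FRR: genuine attempts rejected
FAR <- mean(pred[y_test == "imposter"] == "genuine")
FRR <- mean(pred[y_test == "genuine"] == "imposter")
c(FAR = FAR, FRR = FRR)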

For both analyses, all features from the data set were compared using the four proposed methods: SVM, LDC, K-NN, and NB. The first 25 typed timing repetitions of all feature items (20 items of each feature) were selected to create a user profile. Then, the first 20 repetitions of each feature item (of the 25) for each user were chosen for the training phase, and the remaining 5 repetitions of each feature item were used for the testing phase.

We compared the four different keystroke features (F1, F2, F3, and F4) based on the four classification methods. Table 6.5 illustrates the FAR and FRR values for all features based on the use of SVM, LDC, NB, and KNN. We observe that by using F4, we are able to obtain a better result compared to the other keystroke features (F1, F2, and F3). In particular, the results are noticeably better with the use of the KNN classifier. The FAR and FRR values when F4 was used were 3% and 2%, followed by F3 with 5% and 4% FAR and FRR, respectively. The low FAR and FRR values for F4 and F3 suggest that these two timing features might be good indicators for distinguishing users.

Table 6.5: Performance comparison of four different keystroke features on SVM, LDC, NB, and KNN (values are FAR, FRR; dashes mark values not recoverable from the source).
Feature | TC-SVM | LDC | NB | KNN
F1 | 0.08, — | —, — | —, — | —, 0.08
F2 | 0.14, — | —, — | —, — | —, 0.10
F3 | 0.09, — | —, — | —, — | —, 0.04
F4 | 0.14, — | —, — | —, — | —, 0.02

Since KNN and SVM outperformed the other classification methods, we used only KNN and SVM in the second analysis, which aims to evaluate and study the influence of various keystroke feature combinations on the performance accuracy of the keystroke dynamics authentication system. It can be seen from Figures 6.1 and 6.2 that, for both methods, the FAR and FRR error rates were reduced by combining more than one feature. In particular, using the KNN classifier, we were able to achieve 2% FAR and 1% FRR when combining the four features. More notably, while using the KNN classifier,

the results show that the combination of F4 with any of the other three features produced a better result as compared to other combinations without F4. Furthermore, the worst results were obtained from combining F2 and F3 for both classifiers. It is possible that, because the latency times are less stable than duration times, combining the two latency times might prove insufficient for distinguishing users.

Figure 6.1: Performance comparison of the four keystroke features used independently against the ten different combinations, performed on SVM (FAR and FRR).

Figure 6.2: Performance comparison of the four keystroke features used independently against the ten different combinations, performed on KNN (FAR and FRR).

6.4. Conclusion

We return to the questions posed in Section 6.1 and intend to address them; questions 1-4 are taken from Section 5.2 of the previous chapter:

73 Q1. Which keystroke timing feature(s) among key duration, flight time latency, digraph time latency, and word total time duration timing performs better in keystroke dynamics? The results reported that when applying the four classification methods (SVM, LDC, NB, and KNN) on 28 users and 20 items of each feature, word total time duration (F4) obtains a better result among the four features. In particular, the results are noticeably better while using the KNN classifier. The FAR and FRR values when using F4 achieved were 3% and 2% followed by F3 with 5% and 4% of FAR and FRR, respectively. Q2. Does any combination of timing features improve the accuracy of authentication scheme? The experiment results show that, when using KNN and SVM classification methods using 28 users and 20 items of each feature, the combination of features enhances the performance accuracy. While the best FAR and FRR values obtained by F4 among the four features when they used individually were 3% and 2% respectively, the combinations of the four features achieved 2% of FAR and 1% of FRR when using KNN classifier. Q3. Which feature item contributes more to the accuracy of the model? Where feature item are defined as the exact instances of letters in each feature, e.g. a, Th, etc.? The results reported in section show that some items are performing better than others. In Tables it is observed that some feature item appear more than once in the topmost of each classifier results. However, to answer this question, the 5 feature item appear in the 10 topmost-appearing items obtained for each feature in all classifiers are selected. Based on this approach, we conclude that, for F1 feature, H, T, I, B, Y items are performed the best among the 20 items of F1 feature item. For F2 feature, IN, OU, ED, 62

74 HI, EA are performed the best among the 20 items of F2 feature item. For F3 feature, IN, OU, ED, HI, EA are performed the best among the 20 items of F3 feature item. For F4 feature, WILL, ALL, A, AND, HAVE items are performed the best among the 20 items of F4 feature item. Q4. How word total time duration feature performs comparing to key duration, flight time latency, and digraph time latency? The results reported in Section shows that F4 obtains a better result among the four features. In particular, the results are noticeably better using the KNN classifier. The FAR and FRR values achieved when using F4 were 3% and 2%, respectively. Q5. Which authentication algorithm(s) among (SVM, LDC, NB, and KNN) performs better in keystroke dynamics? For comparison of the classification algorithms mentioned in section 6.3.2, each of these learning algorithms was applied on the processed user patterns. The digraph latency (F3) was used to compare the classification algorithms performance since it is the feature most utilized by researchers in literature. By observing Table 6.5, we notice that KNN and SVM perform the best among the four proposed classifiers. However, using KNN classifier, we are able to obtain a better result compared to the other learning classifiers (SVM, LDC, and NB) for authentication error rates of FAR and FRR. The FAR and FRR values achieved were 5% and 4%, respectively. 63

75 CHAPTER 7 KEYSTROKE FEATURE SELECTION USING WRAPPER-BASED APPROACHES While the main focus in the last two chapters (chapters 5 and 6) was on comparison of features in terms of effectiveness at distinguishing users, this chapter focuses on reducing the dimensionality of features by identifying a small subset of all possible features that represents the most discriminating features, thereby decreasing the processing time. In particular, wrapper-based feature subset selection approach is used in this chapter to reduce the dimensionality by identifying a small subset of features that represent the most discriminating features in the keystroke dynamics environment. The main advantage of wrapper approaches is that it takes into account the interaction between feature search and model selection, which can yield better classification performance because the feature selection procedure is integrated by the classification algorithm. Experimental results have shown that the wrapper methods can obtain good performance, albeit with the disadvantage of high computational cost. Therefore, wrapper-based feature subset selection approach is used in this dissertation. This chapter is organized as follows. Section 7.1 provides an overview of feature selection methods. Section 7.2 describes and explains the analysis methods to be used. Section 7.3 explains obtained results Feature Selections Feature selection is a major problem that has been addressed in many areas including statistics, pattern recognition, and data mining. There are two main purposes for feature selection: reducing the dimensionality by identifying a small subset of all possible features that represent the most discriminating features, i.e. subset selection, or ranking the features according to their importance, i.e. feature ranking. 64

Ultimately, the main objective of feature selection is to choose a small subset of features from a large pool of features to reduce the cost and execution time of a given system, as well as to improve its accuracy. There are three main categories or approaches of feature selection, as shown in Figure 7.1 (Guyon and Elisseeff 2003):

1. Filter-based feature selection: this category ranks the features according to their importance and eliminates all features that do not achieve a sufficient score.

2. Wrapper-based feature selection: in this category, the process of the feature selection search is integrated with the classification algorithm. Thus, the importance of features depends on the classifier model used.

3. Embedded-based feature selection: this category performs feature selection as part of the training process.

Figure 7.1: Three main categories of feature selection.

7.1.1. Filter Approach

In the filter approach, the feature subset selection is performed independently of the classification algorithm. A computed weight score is assigned to each feature, and then those features with better weight scores are selected (Khushaba et al. 2008). The selected features are then presented as input to the classifier algorithm. In this approach, the process of feature selection is performed only once, and then the selected features can be evaluated using different classifiers. This selection approach may be regarded as a preprocessing step. Thus, this approach requires no feedback from classifiers.
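In contrast to the filter approach, a wrapper search repeatedly retrains a classifier on candidate subsets and keeps the subset that scores best. The following is a minimal greedy forward-selection sketch with hypothetical data and a cross-validated SVM as the evaluation function; the dissertation's own experiments rely on the Weka implementations discussed later in this chapter.

library(e1071)

set.seed(6)
# Hypothetical data: 40 candidate timing features, binary user label
x <- matrix(rnorm(100 * 40), ncol = 40)
y <- factor(sample(c("genuine", "imposter"), 100, replace = TRUE))

# Wrapper evaluation: 5-fold cross-validated accuracy of an SVM on a feature subset
cv_accuracy <- function(cols) {
  folds <- sample(rep(1:5, length.out = nrow(x)))
  mean(sapply(1:5, function(i) {
    fit <- svm(x[folds != i, cols, drop = FALSE], y[folds != i])
    mean(predict(fit, x[folds == i, cols, drop = FALSE]) == y[folds == i])
  }))
}

# Greedy forward search: at each step add the feature that most improves accuracy
selected <- integer(0)
for (step in 1:5) {
  remaining <- setdiff(seq_len(ncol(x)), selected)
  scores    <- sapply(remaining, function(f) cv_accuracy(c(selected, f)))
  selected  <- c(selected, remaining[which.max(scores)])
}
selected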
