2010 Seventh International Conference on Information Technology

Continuous Biometric User Authentication in Online Examinations

Eric Flior, Kazimierz Kowalski
Department of Computer Science, California State University Dominguez Hills
1000 Victoria Street, Carson, CA 90747
eflior1@cp.csudh.edu, kkowalski@csudh.edu

Abstract

Online examinations pose a unique problem for distance-based education, in that it can be very difficult to provide true user authentication. Due to the inherent anonymity of being online, compared to taking an examination in a classroom environment, students may attempt to artificially boost their scores in online examinations by having another individual take the exam for them, which a typical user/password authentication scheme cannot detect. This paper presents a method for providing continuous biometric user authentication in online examinations via keystroke dynamics.

Key Words: correlation, cosine, dynamics, keystroke, multi-factored, signature

1. Introduction

When giving an online examination, there are security factors to consider beyond simple password authentication for access to the examination. It is not unreasonable to assume that a student may willingly give their password to someone else, with the intent that the other person will take the examination for the student. With this in mind, a system must be developed to determine that the person taking the examination is, in fact, the student registered to take the examination. While it may be infeasible to guarantee with 100% confidence that the person taking the examination is the student, there are methods which can be used to estimate how certain it is that the person taking the examination is who they claim to be. One way to accomplish this is a biometric method in which we monitor the keystroke dynamics of the person taking the examination.
Characteristics of keystroke dynamics vary from person to person, and are thought to be as individual as a signature. By measuring the flight time, or the time it takes the user to go from one key-down event to another, a profile can be built of a user's typing signature. When we compare this recorded signature to the keystroke dynamics of the person taking the examination, we can determine whether or not the person taking the test is the registered user.

This paper presents information about using keystroke dynamics to obtain biometric authentication of a user, and a software project which uses HTML, PHP, MySQL and JavaScript to implement an online examination where keystroke dynamics are used to authenticate the user.

2. Authentication Methods

Currently, there are four primary methods of user authentication: 1) knowledge factors, or something unique that the user knows; 2) ownership factors, or something unique that the user has; and inherence factors, which comprise 3) something unique that the user is and 4) something unique that the user does [1]. However, when considering online examinations, each of these methods has a number of drawbacks.

2.1 Knowledge factors

Authentication by something unique that the user knows requires the user to know a unique sequence of numbers or characters. In an environment where a user does not want an unauthorized user to access their account, for instance, in online banking, implementing a strong password policy can help provide authentication security for the user. However, if the user will freely give their password away, no password policy, however strong, can prevent an unauthorized user from gaining access.

2.2 Ownership factors

In the same vein, if the user is required to have some token, such as an ATM card, dongle, or key, an unscrupulous user can easily transfer this token to the unauthorized user, circumventing the authentication scheme.
978-0-7695-3984-3/10 $26.00 © 2010 IEEE. DOI 10.1109/ITNG.2010.250
2.3 Inherence factors

The next methods, known collectively as inherence factors, provide a very accurate means of authentication. They do, however, have drawbacks in that they can be unreasonably intrusive, expensive, and difficult to implement.

2.3.1. Something the user is. The third method provides a very reliable means of authenticating a user, as most metrics used, such as fingerprint, voiceprint, and retinal pattern, are relatively difficult to duplicate. In the case of online examinations, however, there is an inherent difficulty in implementing these authentication methods due to the hardware requirement [2]. At the time of this writing, while fingerprint readers are becoming more popular, many are still prohibitively expensive. Other biometrics, such as DNA sampling, are simply too intrusive, expensive, and time-consuming to consider for online authentication. Considering that many computers have built-in microphones, voice recognition is promising; however, it may be rather difficult to distinguish a live user from a recording [3].

2.3.2. Something the user does. The final method, something unique that the user does, is perhaps the most promising of the four methods for providing continuous user authentication in online examinations. Examples of something unique that the user does include a user's handwriting, walking gait, or typing rhythm. Authentication via handwriting, or sometimes simply a signature, requires that the exam taker have access to a tablet device, which can be cost-prohibitive. In addition, given the wide variation of handwriting font, style, and size, and considering the variations which can be displayed by a single individual, developing a fast and efficient computer program for handwriting authentication is relatively difficult.

Since most computers have a keyboard as an input device, it is rather natural to examine the typing rhythm, or keystroke dynamics, of a particular user in order to perform authentication.
This method, unlike many of the ones discussed, has the unique advantage that it can be applied continuously throughout the examination. This helps prevent a situation in which a user accesses a system by legitimately authenticating themselves, and then gives access to an unauthorized user.

3. Keystroke Dynamics

There are a number of factors to be considered when performing biometric user authentication via keystroke dynamics. D. Gunetti and C. Picardi, of the University of Torino, claim that "Keystroke dynamics, unlike other biometric information, convey an unstructured and very small amount of information. From two consecutive keystrokes we may just extract the digraph latency, and the amount of time each key is held down (the keystroke duration), a pretty shallow kind of information. Moreover, this information may vary not only because of the intrinsic instability of behavioral characteristics, but because different keyboards can be used, different environmental conditions exist, and, above all, because typing rhythms also depend on the entered text." [4]

In order to combat this problem of limited information, the role of keystroke dynamics in biometric authentication is often limited to fixed sections of text. This can be quite limiting in the domain of an online examination, as each exam taker is expected to provide a unique test answer. In Section 4, we present a method for providing continuous user authentication with unique trial texts.

Despite these limitations, there are a number of metrics which can be recorded and used for user verification. These include, but are not limited to: typing speed; keystroke seek-time; flight-time; characteristic sequences of keystrokes; and examination of characteristic errors [5]. We discuss these metrics in the following sections.

3.1. Typing speed

A typical user has a maximum typing speed, which is directly related to their typing skill.
Typing speed is typically measured in Words per Minute (WPM), which represents the number of 5-character sequences a user can type in one minute. For the purposes of user authentication, it is preferable to determine a user's maximum Keystrokes per Minute rather than their WPM. Depending on a user's skill and experience with typing, this maximum Keystrokes per Minute represents an upper bound on the number of keystrokes a user will typically type in one minute. That is to say, a user may, and often will, enter fewer than their maximum Keystrokes per Minute, but it is rather unlikely that they will enter more. Thus, if a user provides keystrokes at a rate considerably higher than their recorded maximum Keystrokes per Minute, this may indicate that the user is not who they claim to be.
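The rate check described in this section can be sketched as follows. This is a minimal illustration, not part of the proof of concept system: the function names, the sliding 60-second window, and the `tolerance` margin are all assumptions introduced here.

```javascript
// Peak typing rate, in keystrokes per minute, over any sliding 60-second window.
// `timestamps` holds key-down times in milliseconds, in ascending order.
function peakKeystrokesPerMinute(timestamps) {
  const WINDOW_MS = 60000;
  let peak = 0;
  let start = 0;
  for (let end = 0; end < timestamps.length; end++) {
    // Advance the window start until the window spans less than one minute.
    while (timestamps[end] - timestamps[start] >= WINDOW_MS) start++;
    peak = Math.max(peak, end - start + 1);
  }
  return peak;
}

// Returns true when the observed peak rate exceeds the recorded maximum by
// more than `tolerance` (a hypothetical margin; 0.2 means 20%).
function exceedsRecordedMaximum(timestamps, recordedMaxKpm, tolerance = 0.2) {
  return peakKeystrokesPerMinute(timestamps) > recordedMaxKpm * (1 + tolerance);
}
```

A margin is used because a user may occasionally type a short burst slightly faster than their recorded maximum; only a rate considerably above it is treated as suspicious.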
3.2. Keystroke seek-time

Depending on each user's mastery of typing, different keys will take different amounts of time for the user to locate and press. This can be rather unique, as a typical keyboard has 105 keys, which gives at most 105! potential orderings of the keys by seek-time, assuming the seek-time for each key is different. Given that there are so many different potential combinations of seek-time, a dramatic difference in key seek-times can suggest the presence of an unauthorized user.

3.3. Flight-time

Flight-time, which is the time between two key-up or two key-down events, is another metric which can be used to determine a profile of a user. Flight-time also includes the amount of time that a user holds a key down, known as hold-time. Flight-time varies greatly from one user to another, as it is closely related to the physiological makeup of the user's hands. A right-handed user may, for instance, have a shorter hold-time on keys on the right half of the keyboard when compared with their hold-time for keys on the left half of the keyboard. Injuries and other physical abnormalities may also express themselves through the flight-time metric. Due to the physiological nature of variations in flight-time, we will focus on flight-time as the metric used for user authentication in our proof of concept system.

3.4. Characteristic sequences of keystrokes

In a given language which can be typed on a keyboard, there are sequences of keys which are repeatedly typed. In the English language, these include short words such as "the," which are typed while requiring very little thought from a user. In addition, there are a number of frequently typed sequences of keys which are not words, but form common parts of words; for instance, many words begin with the same prefix, or end with the same suffix. Likewise, commonly typed words, for instance, the name of the user, are deeply ingrained in the user's typing pattern.
These sequences of keystrokes, if captured, can provide another method of verifying a user's identity.

3.5. Examination of characteristic errors

In addition to having characteristic sequences of keystrokes, a user may also make a number of characteristic errors. These may include holding the Shift key for too long, resulting in backspacing, or simply common typographical errors. If these common errors can be recorded, they also provide a reference against which the user's identity can be checked.

4. Implementing Continuous Keystroke Dynamic Authentication

The first step in performing biometric identification using keystroke dynamics requires determining a profile of the user. This is much like storing a signature card at a bank, and provides a reference against which later tests can be made. R. Joyce and G. Gupta describe the process: "To obtain a reference signature, we follow an approach similar to that used by the banks and other financial institutions. A new user goes through a session where he/she provides a number of digital signatures by typing in the four strings several times. Note that in the present environment the digital signature has four components, one component for each string that the user types. The system requires a new user to provide eight reference signatures by typing his/her username, password, first name and last name eight times. The number 8 was chosen to provide sufficient data to obtain an accurate estimation of the user's mean digital signature as well as information about the variability of his/her signatures." [6]

These digital signatures are then processed and stored for later use. Once the signature has been recorded and processed, the data is compared against a new signature generated at the time of verification. As such, we must be able to determine the correlation between the newly created signature and the recorded and stored signature.

4.1. Cosine Correlation

One method of comparing new data against the recorded signature was developed by the noted Polish mathematician Hugo Steinhaus, co-founder of the Lwów School of Mathematics. In implementing this method, called the cosine correlation, we attempt to determine the correlation between the current trial signature and the reference signature. The correlation, $r$, is determined as follows:

$$ r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}, $$

where $x$ is a vector of length $n$ which stores the flight times between keystrokes in the reference signature, and $y$ is a vector of length $n$ which stores the flight times between keystrokes in the trial signature. Each $x_i$ or $y_i$ refers to the flight time between two keystrokes. A low $r$ value implies a positive correlation, and should result in the user being authenticated.
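As an illustration, the comparison of a reference signature with a trial signature might be sketched in JavaScript as below. The sketch assumes the correlation is the standard cosine of the angle between the two flight-time vectors; the function name `cosineCorrelation` and the handling of empty or all-zero input are choices made here, not details taken from the system described in this paper.

```javascript
// Cosine of the angle between two equal-length vectors of flight times (ms).
// `reference` is the stored signature, `trial` the newly captured one.
function cosineCorrelation(reference, trial) {
  if (reference.length !== trial.length || reference.length === 0) {
    throw new Error("signatures must be non-empty and of equal length");
  }
  let dot = 0;
  let referenceSquares = 0;
  let trialSquares = 0;
  for (let i = 0; i < reference.length; i++) {
    dot += reference[i] * trial[i];
    referenceSquares += reference[i] * reference[i];
    trialSquares += trial[i] * trial[i];
  }
  const denominator = Math.sqrt(referenceSquares) * Math.sqrt(trialSquares);
  // An all-zero vector has no direction; report 0 rather than dividing by zero.
  return denominator === 0 ? 0 : dot / denominator;
}
```

Under this standard definition, identical or proportionally scaled signatures score 1 and orthogonal signatures score 0. How the score is compared against the acceptance threshold is left to the caller.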
4.2 Proof of Concept

We have developed a proof of concept software system which incorporates HTML, PHP, MySQL, and JavaScript to administer an online examination in which keystroke dynamics are used to authenticate the user. In order to implement continuous authentication via keystroke dynamics, the system uses PHP and JavaScript embedded in HTML. JavaScript is used to record the time between key presses, and also to calculate the cosine correlation between the recorded signatures and the trial signatures. PHP is used to provide an interface to the MySQL database, and to allow information to be passed from one page to another.

When the user provides their signature on the registration.php page, JavaScript is used to record the time between key-down events. The sample text provided must be at least 500 characters long. The signature requires that the user not backspace or delete during the registration; doing so forces the user to begin the registration again. The time between successive keystrokes is stored as an element of an array, which provides the basis for the signature. Upon completing the registration, the array, in a comma-delimited string representation, and its length are sent via PHP to registration2.php.

At registration2.php, PHP is used to turn the string from registration.php back into an array, and the array is divided into 10 discrete signatures of 50 characters each. These signatures are then stored in MySQL as comma-delimited strings.

When the examination begins, exam.php retrieves the 10 signatures associated with the user which were previously generated and stored in MySQL. It then passes each of those signatures to JavaScript, which stores each signature as an integer array containing the keystroke dynamics. When the user enters the answer to their exam question, the system monitors the user's keystrokes.
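The registration steps described above (recording the time between key-down events, dividing the result into signatures, and storing each as a comma-delimited string) might look roughly like the following sketch. All function names are hypothetical, the division into blocks is shown in JavaScript rather than PHP for brevity, and the browser wiring at the end is only one possible way to collect timestamps.

```javascript
// Converts ascending key-down timestamps (ms) into the flight times
// between successive keystrokes.
function toFlightTimes(keyDownTimes) {
  const flights = [];
  for (let i = 1; i < keyDownTimes.length; i++) {
    flights.push(keyDownTimes[i] - keyDownTimes[i - 1]);
  }
  return flights;
}

// Divides the flight times into discrete signatures of `blockSize` entries.
// An incomplete trailing block is dropped (an assumption made here).
function splitIntoSignatures(flights, blockSize = 50) {
  const signatures = [];
  for (let i = 0; i + blockSize <= flights.length; i += blockSize) {
    signatures.push(flights.slice(i, i + blockSize));
  }
  return signatures;
}

// Comma-delimited representation used for transport and storage.
const serializeSignature = (signature) => signature.join(",");
const parseSignature = (text) => text.split(",").map(Number);

// Browser-only wiring: collect key-down times, restarting on a correction.
if (typeof document !== "undefined") {
  const keyDownTimes = [];
  document.addEventListener("keydown", (event) => {
    if (event.key === "Backspace" || event.key === "Delete") {
      keyDownTimes.length = 0; // registration must begin again
      return;
    }
    keyDownTimes.push(Date.now());
  });
}
```

With 501 recorded key-down events this yields 500 flight times, and therefore 10 signatures of 50 entries each.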
When the user has completed a series of 50 keystrokes with no deletion or significant pauses, the system uses JavaScript to determine the cosine correlation between the signature of the last 50 keystrokes and all 10 stored signatures. If the average value lies over a certain threshold, a counter containing the number of failed authentications is incremented. When the user continues to the next question, both the answer and the number of failed authentications are passed to the next page.

At exam2.php, the answer from the previous question and the number of failed authentications from the previous question are stored in the MySQL database. The user is presented with a new question, and the process of recording the keystroke dynamics for this trial is repeated. This process is repeated until the examination is completed.

4.2.1. Proof of concept system testing. For testing purposes, a PHP script is run, which simulates the work which would be done prior to administering the signature generation. A database is created in MySQL, which contains the following tables: students, which contains information about the student; dynamics, which contains the keystroke signatures; and answers, which contains information about the students' responses and any failed authentications. In addition, virtual students are randomly generated, and assigned random and unique student ID numbers. This information is stored in the students table of the database.

Once the database has been created and populated with student information, we simulate the generation of keystroke dynamic signatures. In their class, students would be directed to the index page, index.html. There, they encounter a PHP script, and are required to enter their unique testing identification number. This is a number separate from their student identification number, and acts as a password for access to the testing system, providing multi-factored user authentication.
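The continuous check described at the start of Section 4.2 (a 50-keystroke window compared against all stored signatures, with a failure counter) can be sketched as follows. Several details are assumptions made for illustration: the names `createMonitor` and `maxPauseMs`, the 2-second pause limit, and in particular the scoring. Because the text treats a low value as a match and counts a failure when the average lies over the threshold, each comparison here is scored as 1 minus the cosine similarity; that mapping is not stated in the paper.

```javascript
// Standard cosine similarity between two equal-length numeric arrays.
function cosineSimilarity(a, b) {
  let dot = 0, aSquares = 0, bSquares = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    aSquares += a[i] * a[i];
    bSquares += b[i] * b[i];
  }
  return dot / (Math.sqrt(aSquares) * Math.sqrt(bSquares));
}

// Builds a monitor for one exam page. `storedSignatures` are the reference
// arrays of flight times; `threshold` is the assumed cut-off on the average
// score, where a lower score means a closer match.
function createMonitor(storedSignatures, threshold, maxPauseMs = 2000) {
  const blockSize = storedSignatures[0].length;
  let block = [];
  let lastTime = null;
  let failedAuthentications = 0;

  return {
    // Call on every key-down with the key name and a timestamp in ms.
    onKeyDown(key, time) {
      if (key === "Backspace" || key === "Delete") {
        block = [];        // a deletion invalidates the current block
        lastTime = null;
        return;
      }
      if (lastTime !== null) {
        const flight = time - lastTime;
        if (flight > maxPauseMs) block = []; // significant pause: start over
        else block.push(flight);
      }
      lastTime = time;
      if (block.length === blockSize) {
        const scores = storedSignatures.map((s) => 1 - cosineSimilarity(s, block));
        const average = scores.reduce((sum, s) => sum + s, 0) / scores.length;
        if (average > threshold) failedAuthentications++;
        block = [];
      }
    },
    get failedAuthentications() {
      return failedAuthentications;
    },
  };
}
```

Averaging over all stored signatures, rather than taking the best match, follows the description in the text; the counter value would then be passed along with the answer when the user moves to the next question.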
The first time the user logs in, they are directed to a PHP script where they are asked to copy a pre-determined text into a text box. The system uses JavaScript to record the keystroke dynamics of the user in 10 discrete 50-keystroke blocks. Upon completion of entering the text, the user is directed to another PHP script. On this page, the keystroke dynamic information is stored in the proper field of the dynamics table, and the user is notified that their information has been recorded. At this time, the user can log off the system, and their biometric information will be ready for comparison at the time of the examination.

When the user loads index.html at the time of the examination, the system recognizes that their biometric information is already contained in the system, and directs them to the first page of the examination. The examination can be set up either to be hard-coded with the examination question, or to select a random question from a database of questions.

When the student logs on to exam.php, they are presented with an essay-style question, and a text box in which to enter their answer. The system records the answer, and generates a signature every 50 keystrokes, which is compared against the 10 signatures stored in the system. The cosine correlation is determined, and if the values lie above a certain threshold, an alert is generated and stored in the MySQL database. After completing the essay question, the user is directed to the successive question in the examination
where the essay answer for the previous question is recorded, and the process repeats.

A failed authentication can be made visible or transparent to the user. One method of making a failed authentication visible to the user is to generate a JavaScript event which turns the background of the page red to notify the user that they have failed an authentication. Knowing ahead of time that the system will be determining whether or not the student is actually answering the question provides a deterrent effect, impressing on the students that the work must be their own. However, there is a downside to this: any false alert generated while a legitimate student is answering an exam question may cause the student to lose concentration or become confused or upset. In addition, knowing that the exam is using keystroke dynamics to authenticate the user may simply cause the user to circumvent the system by having a collaborator dictate an answer while the student types the essay.

Upon completion of the exam, the student is directed to a completion page, where the final answer is recorded. The student is notified of the system's recognition that they have successfully completed the exam, and is allowed to log off.

The administrator of the exam can look at the answers table after the exam has finished, and extract the answers recorded by each student. It is expected that each student will generate a small number of alerts during the process of taking the examination, but an abnormally high number of alerts will give the administrator reason to suspect that the person who wrote the examination is not the student registered in the class.

5. Conclusion

While our proof of concept system used HTML, PHP, JavaScript and MySQL, there are a number of programming technologies which can be used to gather data regarding keystroke dynamics. We found that using keystroke dynamics for biometric authentication of a user taking an online examination is feasible for multi-factor user authentication. Steinhaus's method of cosine correlation gives us a way to perform continuous user authentication via keystroke dynamics in an online examination scenario. The problem of requiring a fixed text for authentication via keystroke dynamics can be overcome by generating multiple signatures from one set of text, and using the average value of the cosine correlation. In this manner, variations from one signature to another are diminished, which can give a more accurate correlation between the trial signature and the recorded signature. This allows us to provide a level of certainty that a user who sits an online examination is, in fact, the one who was supposed to take the examination.

6. References

[1] R. Anderson, Security Engineering: A Guide to Building Dependable Systems, Wiley Publishing, Inc., Indianapolis, IN, 2008.
[2] Y. Levy, M. Ramin, A Theoretical Approach for Biometrics Authentication of e-exams, http://telempub.openu.ac.il/users/chais/2007/morning_1/m1_6.pdf
[3] T. Kinnunen, V. Hautamaki, P. Franti, On the Fusion of Dissimilarity-Based Classifiers for Speaker Identification, 8th European Conference on Speech Communication and Technology, pp. 2641-2644, 2003.
[4] D. Gunetti, C. Picardi, Keystroke analysis of free text, ACM Transactions on Information and System Security (TISSEC), v.8 n.3, p.312-347, August 2005.
[5] J. Ilonen, Keystroke Dynamics, Lecture in Advanced Topics in Information Processing, http://www.it.lut.fi/kurssit/03-04/010970000/seminars/ilonen.pdf
[6] R. Joyce, G. Gupta, Identity authentication based on keystroke latencies, Communications of the ACM, v.33 n.2, p.168-176, Feb. 1990.