Sign Language Phoneme Transcription with Rule-based Hand Trajectory Segmentation


J Sign Process Syst (2010) 59

W. W. Kong · Surendra Ranganath

Received: 21 May 2008 / Revised: 31 July 2008 / Accepted: 23 September 2008 / Published online: 16 October
© Springer Science + Business Media, LLC. Manufactured in The United States

Abstract  A common approach to extract phonemes of sign language is to use an unsupervised clustering algorithm to group the sign segments. However, simple clustering algorithms based on distance measures usually do not work well on temporal data, and more complex algorithms are typically required. In this paper, we present a simple and effective approach to extract phonemes from American sign language sentences. We first apply a rule-based segmentation algorithm to segment the hand motion trajectories of signed sentences. We then extract feature descriptors based on principal component analysis to represent the segments efficiently. The segments are clustered by k-means using these high-level features to derive phonemes. 25 different continuously signed sentences from a deaf signer were used to perform the analysis. After phoneme transcription, we trained Hidden Markov Models to recognize the sequence of phonemes in the sentences. Overall, our automatic approach yielded 165 segments, and 58 phonemes were obtained based on these segments. The average number of recognition errors was 18.8 (11.4%). In comparison, completely manual trajectory segmentation and phoneme transcription, involving considerable labor, yielded 173 segments and 57 phonemes, and the average number of recognition errors was 33.8 (19.5%).

W. W. Kong · S. Ranganath (corresponding author)
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore
S. Ranganath: elesr@nus.edu.sg
W. W. Kong: g @nus.edu.sg

Keywords  American sign language (ASL) · Phoneme transcription · Trajectory segmentation · Principal component analysis (PCA) · Hidden Markov models (HMM)

1 Introduction

One of the key issues in sign language recognition is scalability to large vocabulary. In speech recognition, this problem is handled by using phoneme-based modeling. Back in 1959, Fry [1] and Denes [2] built the first phoneme-based speech recognizer to recognize four vowels and nine consonants, and this approach is in common use for speech recognition today. The same strategy also has potential for sign language recognition. However, the major difficulty in sign language is that, unlike speech phonemes, which are linguistically well defined and can be used without ambiguity to transcribe speech sentences, there is no consistent phoneme lexicon in sign language. There are approximately 40 phonemes in the English language, such as \i\, \f\, \aa\, etc., but phonemes in sign language are usually loosely defined, and depend on the modeling approach and features used in different sign language recognition systems. Also, there is no standard way of defining phonemes in sign language, and different schemes have been used for phoneme transcription.

Sign language communication involves manual and non-manual gestures. The latter consist of facial expressions, and head and torso movements, while manual gestures use the hand and arm to convey lexical meaning. Our focus is on manual signs, and in this context, sign linguists such as Stokoe [3] and Liddell and Johnson [4] offer some guidelines to model the phonemes by distinguishing the basic components of a sign gesture as consisting of the handshape, hand orientation, location, and movement.

Stokoe emphasized the simultaneous organization of these components, while Liddell and Johnson's Movement-Hold model emphasized sequential organization.

An automatic phoneme transcription procedure is an essential step towards building practical sign language recognition systems that scale well with vocabulary size. As there is no phoneme lexicon or a standard approach to transcribe phonemes for sign language, it is important to devise an efficient strategy for consistent phoneme transcription, including an automatic segmentation procedure to work with continuously signed sentences. Towards this end, we propose an effective automatic procedure to transcribe phonemes from continuous American sign language (ASL) sentences which are signed naturally by a deaf signer rather than from textbook definitions of signs. The lexical meaning of a sign is inferred by recognizing the four parallel components of handshape, location, orientation and movement, and thus, different types of phonemes can be defined for each of the components. Sign language sentences can then be labeled with a sequence of these phonemes. In this paper, we consider transcribing phonemes for the hand movement trajectory only, as the other three components, handshape, hand orientation, and position, are simpler to deal with, and corresponding phonemes can be defined easily by using simple clustering algorithms.

Currently, there are two main approaches for phoneme transcription, viz., 1) transcription based on sign language models defined by sign linguists, and 2) transcription which is dependent on the data collected and features used. In the first approach, the sign components are quantized into limited categories and sign language models such as Stokoe's or Liddell's are used to label the signs. Vogler and Metaxas [5] adopted this approach and defined the phonemes for movement and location, using Liddell's model to recognize 22 ASL signs based on these phonemes. The small vocabulary size makes the transcription rather straightforward. Wang et al. [6] defined a phoneme as the smallest unit that has meaning and can distinguish one sign from another. They performed an extensive study of Chinese sign language (CSL) and explicitly defined about 2400 phonemes for CSL. However, the transcription process and the resulting phonemes are not clearly described. In this strategy, a manual transcription process can be very laborious and time-consuming, and when the vocabulary size is large this approach is unreliable and impractical. Hence, it is important to devise an automatic method to define phonemes for recognizing SL sentences.

In the second approach, many works use unsupervised learning to obtain phonemes automatically without using any sign language models. Walter et al. [7] adopted a mixture density-based clustering approach for transcribing phonemes from gesture trajectory segments. Mixture parameters were determined using expectation maximization, and minimum description length was used as the criterion to automatically determine the number of clusters. Bauer and Kraiss [8] used k-means clustering to self-organize trajectory segments into fenones. In this approach, the fenones formed usually do not relate to phonetic concepts. Also, temporal segments may not be properly aligned when segments are obtained from continuously signed sentences.
This poses problems for clustering algorithms such as k-means which use the Euclidean distance measure. Hence, a complex clustering algorithm is often required to handle the problem. Wang et al. [9] adopted dynamic programming to segment the data streams, and a hybrid of neural networks and k-means was used to cluster the segments. Fang et al. [10] proposed a temporal clustering algorithm to group segments using concatenated handshape, position and orientation features of both hands. The temporal clustering algorithm was based on a modified k-means algorithm proposed by Wilpon and Rabiner [11]. However, these approaches are complex and computationally expensive.

In this paper, we propose an automatic procedure to perform phoneme transcription. The automatic procedure saves the time and intensive labor required for manual transcription. There are two steps in transcribing phonemes from continuously signed sentences, viz., segmentation of the hand trajectories, followed by phoneme transcription. Several works have considered automatic trajectory segmentation for various purposes. Sagawa and Takeuchi [12] considered minima of hand velocity and large changes in hand motion trajectory angle as candidates for segment boundaries. Wang et al. [13] also used a similar method for trajectory segmentation. Gibet and Marteau [14] identified boundary points where the radius of curvature became small and there was a decrease in velocity. They used the product of velocity and curvature as the measure in their segmentation algorithm to detect boundary points. Rao et al. [15] used the spatio-temporal curvature of the motion trajectory to describe a dynamic instant, which is taken to be an important change in motion characteristics such as speed, direction and acceleration. These changes were captured by identifying maxima of spatio-temporal curvature. Walter et al. [7] used a two-step segmentation algorithm for 2-D hand motion. They first detected rest and pause positions by identifying points where the velocity dropped below a pre-set threshold. After this, they identified discontinuities in orientation to recover strokes (movement and hold) by applying Asada and Brady's Curvature Primal Sketch [16].

For phoneme transcription, we extracted principal component analysis (PCA)-based features and clustered them. Our approach alleviates some of the problems in [7–10] by using PCA features, which allows us to use simple k-means, rather than complex algorithms, to cluster the temporal segments. Further, unlike the phonemes obtained in [8], the phonemes yielded by our approach are related to phonetic concepts which are more meaningful for describing sign language. In related work, Vogler [5] used the first and the second eigenvalues from PCA to differentiate between lines and curves and used them as global features for sign language recognition. However, this was not explored further. We believe that this is a good starting point to facilitate phoneme transcription. Several other works have also used PCA-based features to perform gesture or sign language recognition. Nam and Wohn [17] projected the 3-D hand trajectory to a plane found by PCA, and used a chain encoding scheme for describing the hand movement path for recognition.

We used Sagawa and Takeuchi's [12] approach of using minimum velocity and maximum change of directional angle as the basis for segmentation, but found that it over-segmented the hand trajectories. However, the true segmentation points are a subset of this initial segmentation, and we devised rules based on the characteristics of the boundary points obtained to identify the true segmentation points and minimize the false alarms. Next, we extracted feature descriptors from these segments using PCA. Even though the hand trajectory of a complete sentence may be a complex 3-D curve, we can expect that the segments obtained will correspond to lines or planar curves. Hence, PCA of these segments will directly yield the directions of the lines and the planes of the curves. We apply PCA to each segment, and cluster the features by k-means to specify the phonemes and give geometric meaning to them. After the phonemes are automatically defined, they are used to label the sentences and train Hidden Markov Models (HMMs) for recognition. Of course, once the HMMs are trained, input sentences are implicitly segmented and recognized.

The remainder of this paper is organized as follows. In Section 2, the automatic rule-based segmentation algorithm is presented. Phoneme specification and transcription are described in Section 3. Recognition by HMMs is explained in Section 4. Experimental results are presented and discussed in Section 5. Section 6 gives the conclusions.

2 Automatic Rule-based Trajectory Segmentation

Automatic trajectory segmentation is performed in two steps. First, we obtain initial segmentation points based on minimal velocity and maximal change of directional angle. Typically, the trajectories are over-segmented by this procedure. These over-segmented trajectories are then processed by rules to identify the true segmentation points and minimize false alarms.

2.1 Initial Segmentation

Temporal segmentation is implemented by detecting points of minimal velocity and maximal change of directional angle. The continuous raw 3-D hand trajectory data is first interpolated and smoothed using splines. Figure 1 shows an example of the original and splined hand trajectories of a sentence.
This step is useful for more accurate and reliable velocity and directional angle computation. The velocity v_t is estimated as

    v_t = ||p_{t+1} - p_t||                                             (1)

where p_t = (x_t, y_t, z_t) is the 3-D position at time t.

Figure 1  Original and splined trajectories.
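As a concrete illustration of this preprocessing step, the sketch below smooths a raw 3-D trajectory with splines, computes the velocity of Eq. (1), and collects local velocity minima as candidate boundary points. It is only a sketch under our own assumptions: the (n, 3) array layout, the function names, and the smoothing and upsampling parameters are illustrative choices, not values reported in the paper.

```python
# Hypothetical helpers illustrating the smoothing and velocity step;
# parameter values are placeholders, not the paper's settings.
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.signal import argrelextrema

def smooth_trajectory(traj, upsample=2, s=1.0):
    """Interpolate and smooth a raw (n, 3) hand trajectory with a spline."""
    tck, _ = splprep(traj.T, s=s)                  # fit a smoothing spline in 3-D
    u_new = np.linspace(0.0, 1.0, upsample * len(traj))
    return np.array(splev(u_new, tck)).T           # (m, 3) smoothed positions

def velocity(traj):
    """Eq. (1): v_t = ||p_{t+1} - p_t||."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)

def velocity_minima(v):
    """Indices of local velocity minima, i.e. candidate segment boundaries."""
    return argrelextrema(v, np.less)[0]

# Usage sketch:
# smoothed = smooth_trajectory(raw_xyz)            # raw_xyz: (n, 3) tracker samples
# candidates = velocity_minima(velocity(smoothed))
```

Maxima of the directional angle change, defined in Eq. (2) below, would be gathered in the same way and merged with these candidates.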

Figure 2  Directional angle.

The directional angle change, θ, is computed as the angle between two vectors formed by three consecutive 3-D positions, as shown in Fig. 2. Thus

    cos(θ) = (u_1 · u_2) / (||u_1|| ||u_2||)                            (2)

where u_1 = p_t - p_{t-1} and u_2 = p_{t+1} - p_t. The initial segment boundaries are marked at the points of local velocity minima and maxima of directional angle change. These are processed by rules to minimize the false boundary points.

2.2 Rule Formulation

The rules are specified by observation and by using features that characterize the boundary points adequately; these features are summarized in Table 1. minvel and maxang are binary features to indicate a point of minimal velocity and a point of maximal change of directional angle, respectively. normvel is the normalized velocity with respect to the peak and lies in [0, 1], and dirang is the absolute directional angle change of a point in [0°, 180°]. lftvalley (P_vl/H_vl) and rgtvalley (P_vr/H_vr) characterize the valley associated with a velocity minimum. Similarly, lftpeak (P_al/H_al) and rgtpeak (P_ar/H_ar) characterize the peak associated with an angle maximum. Figure 3 illustrates the idea.

Table 1  Features characterizing velocity minima and maxima of directional angle change.

Feature     Description
minvel      Point is a local minimum of velocity or not.
maxang      Point is a local maximum of directional angle change or not.
normvel     Normalized velocity values.
dirang      Absolute angle values.
lftvalley   P_vl/H_vl (see Fig. 3).
rgtvalley   P_vr/H_vr (see Fig. 3).
lftpeak     P_al/H_al (see Fig. 3).
rgtpeak     P_ar/H_ar (see Fig. 3).

Figure 3  Definition of parameters for the features described in Table 1.

The rules are summarized in Table 2. Rule 1 checks if a boundary point corresponds to a local minimum of velocity and a maximum change of directional angle, and indicates a strong potential boundary point if both are true. Rules 2, 3 and 4 examine the characteristics of the valley of the minimal velocity and the peak of the maximal change of directional angle. A true detection should be characterized by a deep valley, while a shallow valley is possibly a false alarm. A true maximal angle change is characterized by a relatively sharp peak. Rule 5 checks the values of the normalized velocity and directional angle change. A point with a high velocity value and a low directional angle change is likely to be a false alarm, while a point with a low velocity value and a high directional angle change is a potential boundary point. However, we relax these conditions, and accept a point with a very low velocity (T_5) but moderately high directional angle change (T_6) as a true boundary point. On the other hand, if this condition is not met, but the point exhibits a very high directional angle change (T_7) and moderately low velocity (T_8), we also consider it as a true boundary point. The threshold values (T_i) are found empirically as described in Section 5.

Table 2  Formulated rules.

Rule 1: if (minvel = TRUE) and (maxang = TRUE), check Rule 2
        elseif (minvel = TRUE) and (maxang = FALSE), check Rule 3
        else check Rule 4
Rule 2: if (lftvalley > T_1 or rgtvalley > T_2) and (lftpeak > T_3 or rgtpeak > T_4), detection = TRUE POINT
        else detection = FALSE ALARM
Rule 3: if (lftvalley > T_1 or rgtvalley > T_2), check Rule 5
        else detection = FALSE ALARM
Rule 4: if (lftpeak > T_3 or rgtpeak > T_4), check Rule 5
        else detection = FALSE ALARM
Rule 5: if (normvel <= T_5 and dirang >= T_6) or (dirang >= T_7 and normvel <= T_8), detection = TRUE POINT
        else detection = FALSE ALARM

Note: T_i, i = 1, 2, ..., 8, are thresholds found empirically, with T_5 < T_8 and T_7 > T_6. The condition (minvel = FALSE) and (maxang = FALSE) cannot occur.
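To make the cascade concrete, the sketch below expresses the decision logic of Table 2 as a single function. The interface is our own assumption: each candidate point arrives as a dictionary keyed by the Table 1 feature names, and the thresholds T1 to T8 are placeholders to be set empirically as described in Section 5.

```python
# Sketch of the Table 2 rule cascade; feature names follow Table 1 and the
# threshold dictionary T (keys "T1".."T8") is assumed to be given, set empirically.
def is_true_boundary(f, T):
    """Return True if the candidate point f is accepted as a segment boundary."""
    deep_valley = f["lftvalley"] > T["T1"] or f["rgtvalley"] > T["T2"]
    sharp_peak = f["lftpeak"] > T["T3"] or f["rgtpeak"] > T["T4"]
    # Rule 5: very low velocity with moderately high angle change, or
    # very high angle change with moderately low velocity (T5 < T8, T7 > T6).
    rule5 = ((f["normvel"] <= T["T5"] and f["dirang"] >= T["T6"]) or
             (f["dirang"] >= T["T7"] and f["normvel"] <= T["T8"]))

    if f["minvel"] and f["maxang"]:      # Rule 1 -> Rule 2: both cues present
        return deep_valley and sharp_peak
    elif f["minvel"]:                    # Rule 1 -> Rule 3: velocity cue only
        return deep_valley and rule5
    else:                                # Rule 1 -> Rule 4: angle cue only
        return sharp_peak and rule5
```

Because Rules 2 to 4 only decide which evidence (valley depth, peak sharpness, or both) must back up the candidate, the whole table collapses into the three branches of Rule 1.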

3 Phoneme Transcription

The segments can have different lengths, locations, orientations and directions of motion in the 3-D signing space. Also, the segments obtained from continuous sentence trajectories can be noisy, in the sense that the segmentation algorithm does not always give the exact boundary points, and slight deviations in segment boundaries are usually obtained. Movement epenthesis in naturally signed continuous sentences also contributes to this. Movement epenthesis is an extra segment that occurs between signs and contains no information related to the sign meaning. For example, Fig. 4 shows a segment which is essentially a straight line, but has a small extraneous part that arises from movement epenthesis.

Figure 4  Straight line segment with a small portion arising from movement epenthesis.

3.1 Descriptors for Trajectory Segments

There are too many variations in the segments of naturally signed sentences, which makes it difficult to cluster them directly. Hence, we suggest a better representation that will enable the use of simple clustering algorithms. We characterize a trajectory segment by the plane in which it lies, and by its shape, direction of motion, size and position. Curves are described by all the above features, while lines are described only by their direction, size and position. PCA can easily differentiate lines (1-D) from curves (2-D) based on eigenvalues. For a line, the first eigenvalue (when ordered from largest to smallest) greatly exceeds the second, and we use this fact to easily separate lines and curves. Based on the normalized eigenvalues

    E_i = λ_i / (λ_1 + λ_2 + λ_3),   i = 1, 2, 3                        (3)

a segment is determined to be a line if E_1 > 0.95, and a 2-D curve otherwise. Following this determination, a set of features is extracted as described below.

Plane of the Trajectory Segment

The normal to the plane in which the curve lies in 3-D space can be obtained by the vector cross product

    n^i = e_1^i × e_2^i                                                 (4)

where n^i is the normal to the plane, and (e_1^i, e_2^i) are the first and second eigenvectors of the i-th segment. As there are two possible directions for n^i in 3-D, we adopt a fixed convention to choose its direction. Since two combinations of ±e_1^i and ±e_2^i correspond to the normal direction chosen, we use one of these pairs as our first and second eigenvectors.

Direction of Motion

We use the dominant motion direction to describe direction for lines, and the clockwise/anticlockwise sense for circles. For arcs, both are used.

Dominant Direction  Though the direction of a line can be simply computed from the starting position to the ending position of the trajectory segment, to reduce sensitivity to noise, the dominant direction is obtained from the first eigenvector, e_1^i, which is along the direction of the largest variance in the data. As both e_1^i and -e_1^i can be considered to be valid directions of maximum variance, we resolve this ambiguity as follows:

i) Compute a unit vector from the starting point to the ending point of the segment as

    w^i = (p_n^i - p_1^i) / ||p_n^i - p_1^i||                           (5)
where p_1^i and p_n^i are the starting and ending points of the i-th segment, respectively.

ii) Compute

    θ_1 = cos^{-1}(w^i · e_1^i)                                         (6)
    θ_2 = cos^{-1}(w^i · (-e_1^i))                                      (7)

The dominant direction is chosen to point in the eigenvector direction that is closer to w^i, by choosing e_1^i if θ_1 is smaller than θ_2, and -e_1^i otherwise.

Clockwise and Anticlockwise Motion  We use the projected 2-D curves to determine whether the motion is clockwise or anticlockwise, as follows:

i) The first turning point, q, of the curve is located, for example, as in Fig. 5a or b. The curve is then rotated so that q lies on the positive horizontal axis. The corresponding rotated trajectories are as shown in Fig. 5c and d, respectively.

ii) The clockwise or anticlockwise motion sense can then be found from the joint signs of the changes in the rotated x- and y-coordinates as the curve is traversed:

    motion = clockwise      if (x, y) vary with one pair of sign patterns as t increases
             anticlockwise  if (x, y) vary with the complementary pair of sign patterns   (8)

Shape

Both arcs and circles are initially classified as curves, but they need to be distinguished based on the shape of the segments in the 2-D principal subspace. This is done with Fourier descriptors, which are extracted following the steps below.

i) The trajectory segment is resampled to a fixed number of samples, N, equally spaced in arc length. N is chosen to be a power of 2 to facilitate the application of the fast Fourier transform. We used N = 64.

ii) The projected 2-D curve coordinates are used to define a complex signal

    z_t = x_t + i·y_t,   t = 0, 1, ..., N - 1                           (9)

where x and y are the x- and y-coordinates in the projected plane.

iii) The motion direction of the projected trajectory segment (clockwise or anticlockwise) affects the ordering of the Fourier descriptors. Hence, to remove this sensitivity, we re-ordered the projected segment from the last sample to the first if its motion sense was found to be anticlockwise.

Figure 5  (a, b) Projected trajectories and (c, d) corresponding rotated trajectories.

iv) The DFT of z = [z_0, z_1, ..., z_{N-1}] is obtained as F = [f_0, f_1, ..., f_{N-1}].

v) Invariance to translation is obtained by removing the first element (the DC component) in F. Rotation invariance is achieved by removing the phase information, i.e. using only the absolute values of the f_i. Scale normalization is obtained by dividing the Fourier coefficients by |f_1|. The final Fourier descriptors are given as

    F̄ = [ |f_2|/|f_1|, |f_3|/|f_1|, ..., |f_{N-1}|/|f_1| ]              (10)

For discriminating only between circles and arcs, the first and last k elements in F̄ were used, and k = 5 was found to be sufficient.

Size and Position

The maximum range in each of the x-, y-, and z-coordinates is found, and the largest range is taken to represent the size. Position is described by using only the starting and ending positions of the segments. As the segments obtained are noisy, we represent the start and end positions of a segment by the mean values of the first and last 5% of the segment points.

3.2 Defining Phonemes with K-means

There are two alternatives for defining phonemes by clustering. We can either concatenate the extracted features and cluster these vectors, or cluster each feature separately. We adopt the latter approach as it is simpler and allows simple geometric labeling of the clusters. Figure 6 shows the transcription procedure. The 3-D trajectory segments are first segregated into lines or curves based on the principal eigenvalue found by PCA of each segment. The features used for lines are dominant direction, size and position. All the features in Section 3.1 are used for arcs and circles, with the exception of dominant direction for circles. The individual features are clustered by k-means. Table 3 summarizes the possible clusters for each feature and serves as a guideline to determine the number of clusters for each feature. The actual number of clusters is found empirically.

Table 3  Possible clusters for the descriptors.

Descriptor           Clusters
Plane                xy-, yz-, xz-, ±45°-planes
Shape                circles and arcs
Dominant direction   up, down, left, right, away, toward
Motion sense         clockwise, anticlockwise
Size                 large, small
Position             12 positions (refer to [3])

The phonemes are then defined by grouping the trajectory segments which have the same geometric feature descriptions. For example, all the trajectory segments which are identified as lines with Dominant Direction = down, Size = small, and Position = mouth are considered as one cluster (phoneme).

Figure 6  Phoneme transcription procedure.
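The sketch below ties together the Section 3.1 descriptors for a single segment: the line/curve decision from the normalized eigenvalues (Eq. 3), the plane normal (Eq. 4), the dominant direction (Eqs. 5-7), the size and position measures, and the Fourier descriptors of Eqs. (9)-(10). The 0.95 eigenvalue threshold, N = 64, k = 5 and the 5% end-point windows follow the text; the function names and the dictionary interface are our own assumptions.

```python
# Illustrative descriptor extraction for one (n, 3) trajectory segment.
import numpy as np

def pca(seg):
    """Eigen-decomposition of the segment covariance, sorted by decreasing eigenvalue."""
    centered = seg - seg.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]           # eigenvalues, column eigenvectors

def describe_segment(seg):
    vals, vecs = pca(seg)
    E = vals / vals.sum()                        # Eq. (3): normalized eigenvalues
    desc = {"shape": "line" if E[0] > 0.95 else "curve"}

    # Dominant direction (Eqs. 5-7): pick the sign of e1 closer to the chord
    # from the start point to the end point (ignored later for circles).
    chord = seg[-1] - seg[0]
    w = chord / np.linalg.norm(chord)
    e1 = vecs[:, 0]
    desc["direction"] = e1 if np.dot(w, e1) >= np.dot(w, -e1) else -e1

    if desc["shape"] == "curve":
        desc["plane_normal"] = np.cross(vecs[:, 0], vecs[:, 1])   # Eq. (4)

    # Size: largest coordinate range; position: means of first/last 5% of points.
    desc["size"] = np.ptp(seg, axis=0).max()
    k = max(1, len(seg) // 20)
    desc["start"], desc["end"] = seg[:k].mean(axis=0), seg[-k:].mean(axis=0)
    return desc

def fourier_descriptors(xy, N=64, k=5):
    """Eqs. (9)-(10): invariant shape descriptors of a projected 2-D curve."""
    # Resample to N points equally spaced in arc length.
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(xy, axis=0), axis=1))]
    t = np.linspace(0.0, d[-1], N)
    z = np.interp(t, d, xy[:, 0]) + 1j * np.interp(t, d, xy[:, 1])
    F = np.fft.fft(z)
    mags = np.abs(F[1:]) / np.abs(F[1])          # drop DC, drop phase, scale by |f1|
    fd = mags[1:]                                # [|f2|, ..., |f_{N-1}|] / |f1|
    return np.r_[fd[:k], fd[-k:]]                # first and last k descriptors
```

In the per-feature clustering of Section 3.2, each of these fields would then be quantized separately (for example with k-means on the direction vectors and positions), and a phoneme corresponds to one combination of the quantized labels.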

4 Continuous Sentence Recognition with HMMs

We train HMMs [18, 19] using continuous sentences labeled by the transcribed phonemes. Raw 3-D trajectory positions are used as the observation sequences to train the model parameters, M = (π, A, B), of left-to-right HMMs. Each sentence is modeled as a sequence of phonemes. Each phoneme is modeled by 3-5 states, and each state is represented by a single Gaussian with a full covariance matrix. We employ Viterbi training and decoding, where the phoneme boundaries are detected implicitly.

Initialization of the HMMs is done as follows. As our signed sentences always start at the same spatial position, the initial state's prior probability is set to 1. The transition probabilities are set to be equi-probable, except for the invalid transitions, whose probabilities are set to zero. For the case when the phonemes are obtained by automatic transcription, the segments obtained with our segmentation algorithm are used to compute the initial Gaussian parameters. On the other hand, when the phonemes are defined manually, the entire sentence trajectory is divided equally into segments according to the number of phonemes in the sentence. As each segment is represented by 3-5 states, the segment is equally divided into 3-5 sub-segments, from which the Gaussian parameters are estimated to initialize each state.
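A minimal sketch of the initialization described above, using only numpy, is given below. The choice of three states per phoneme and the data layout are our own illustrative assumptions; Viterbi training and decoding themselves are not shown.

```python
# Left-to-right HMM initialization sketch (numpy only); values are illustrative.
import numpy as np

def left_to_right_transitions(n_states):
    """Equi-probable self/forward transitions; all other (invalid) transitions are zero."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = A[i, i + 1] = 0.5
    A[-1, -1] = 1.0
    return A

def init_state_gaussians(segment, n_states=3):
    """Divide a segment's (n, 3) observations equally among the states and fit
    one full-covariance Gaussian per state."""
    params = []
    for chunk in np.array_split(segment, n_states):
        mu = chunk.mean(axis=0)
        sigma = np.cov(chunk.T) + 1e-6 * np.eye(3)   # small ridge for stability
        params.append((mu, sigma))
    return params

def sentence_prior(n_total_states):
    """All prior mass on the first state: every sentence starts at the same position."""
    pi = np.zeros(n_total_states)
    pi[0] = 1.0
    return pi
```

A sentence model is then the concatenation of its phoneme HMMs, with the per-phoneme transition blocks chained along the diagonal of the sentence-level transition matrix.
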
5 Experiments and Results

A Polhemus FASTRACK electromagnetic motion tracking system was used for capturing the 3-D hand trajectories. It senses the position (x, y, z) coordinates and orientation (azimuth, elevation, and roll) of objects, and consists of one transmitter and four trackers. Two trackers were attached to the right and left hands, respectively, and two other trackers were attached to the waist and the head of the signer to provide reference values. The sampling rate for each tracker is 30 samples/s. In this work, we only used the right hand data for the experiments. In addition, a video camera, which was synchronized with the trackers, was used to record the frontal view of the signer signing the sentences. The video clips obtained were used to facilitate the manual segmentation and phoneme transcription procedures.

We conducted several experiments to evaluate the automatic rule-based trajectory segmentation procedure and phoneme transcription process. The evaluations are based on 25 ASL sentences signed 5-10 times by a deaf signer. Various subsets of this data were used for different experiments.

5.1 Automatic Trajectory Segmentation

To compare the performance of automatic trajectory segmentation with manual segmentation, and to have labeled training data, segment points were marked by an expert signer. In the segmentation algorithm, the initial segments were obtained from all samples of the 25 sentences by automatically locating the points of minimal velocity and maximal change of directional angle. This yielded 1996 initial segment boundary points. In order to use the rules of Table 2 for processing this initial segmentation, threshold values (T_i) are needed for the features. To obtain these thresholds, we picked two training samples each from 13 randomly picked sentences and labeled their initial segment points as true segment points or false alarms in relation to the manually marked points. The features were then extracted from this training data, and the threshold value for each feature was set by examining its distribution.

We conducted two experiments to compare results from rule-based segmentation on the initial 1996 segment points. In Experiment 1, we used the rules in Table 2, while in Experiment 2, we used only a subset of the features, viz., minvel, maxang, normvel, dirang, and the rules in Table 4.

Table 4  Simplified rules.

Rule 1: if (minvel = TRUE) and (maxang = TRUE), detection = TRUE POINT
        else, check Rule 2
Rule 2: if (normvel <= T_5 and dirang >= T_6) or (dirang >= T_7 and normvel <= T_8), detection = TRUE POINT
        else detection = FALSE ALARM

Note: T_i, i = 5, 6, 7, 8, are the same thresholds as in Table 2. The condition (minvel = FALSE) and (maxang = FALSE) cannot occur.

Table 5 summarizes the results obtained by the two experiments.

Table 5  Detection accuracy by rules.

                                                Experiment 1   Experiment 2
Total no. of points from initial segmentation       1996           1996
Manually labeled boundary points                      805            805
Detected true boundary points                    722 (89.7%)    769 (95.5%)
False alarms                                     140 (11.8%)    462 (38.8%)
Missed points                                     83 (10.3%)     36 (4.5%)

The accuracy of segment point detection in Experiment 1 was 89.7%, with 11.8% false alarms; the corresponding results for Experiment 2 were 95.5% and 38.8%, respectively. Though Experiment 2 detects the true segmentation points about 6% better than Experiment 1, it is about 27% worse in its ability to discard false alarms. We note here that we use the terms "true boundary point" and "false alarm" only in relation to points manually marked by an expert signer. Manual segmentation involves difficult judgements and guesses, and it would be optimistic to label this as ground truth.

5.2 Phoneme Transcription

For purposes of comparison, we obtained phoneme transcriptions in three different ways, and used them for recognizing signed sentences by HMMs.

In Experiment A, trajectory segmentation was done manually, and results were compared between manual and automatic phoneme transcription. For this experiment, an expert signer attempted to define the trajectory segments in one sample of each of the 25 sentences according to sign linguistics, in conjunction with the initial segments obtained at points of velocity minima and/or maxima of directional angle change. A frontal video of the signer, which was closely synchronized with the trackers, was also used to facilitate this manual segmentation process. Based on this collective information, the expert signer identified 173 segments in the 25 sentences. As a basis for comparison with the automatic phoneme transcription procedure of Section 3, the expert signer also manually transcribed these 173 trajectory segments into phonemes by visual observation. The same video clips that were used to facilitate manual segmentation were used with an ASL dictionary for manual transcription.

The 173 segments obtained represent sign information as well as movement epentheses. As movement epenthesis is a connecting segment without sign meaning, the segments corresponding to it should be represented as simply as possible. In addition, as the movement epentheses are usually signed more loosely, clustering them with the sign phonemes may cause the phoneme clusters to be less representative. Hence, the trajectory segments were first separated into movement epenthesis segments and sign segments, based on analysis of the sentence structure and the segments obtained. We then described the movement epentheses by their start and end positions and direction only, and described the sign phonemes by the features used for lines, arcs and circles as shown in Section 3. This manual approach yielded 57 phonemes, which included 33 phonemes corresponding to signs and 24 phonemes corresponding to movement epenthesis. When the same 173 segments were automatically transcribed by the procedure of Section 3, we obtained 56 clusters (phonemes), with 36 sign phonemes and 20 movement epenthesis phonemes.

The phoneme clusters obtained by both approaches were checked by plotting the trajectories of the cluster members. We observed that the clusters obtained by the automatic procedure were generally more consistent and the cluster members were closer in appearance. On the other hand, some phoneme clusters specified by manual transcription were poorly formed. This can be expected, as it is difficult to maintain consistency in manual transcription.
Also, relying on the video clips during this process could have led to errors when there were visual occlusions. In the automatic transcription process, on the other hand, the PCA process separates the segments into lines, or planar curves and circles, and each feature of these categories is individually clustered. This simplifies checking the validity of the clusters, as the number of clusters obtained for each feature is greatly reduced. Figure 7a and b show one of the clusters (phonemes) obtained by the automatic and manual phoneme transcription processes, respectively. It can be seen that the cluster formed by automatic phoneme labeling is more consistent. Another attractive benefit of automatic phoneme transcription is the significant reduction in time and labor as compared to manual transcription.

In Experiment B, the entire phoneme transcription process was automatic, using the trajectory segmentation procedure of Section 2 followed by the transcription process of Section 3. For this, we made use of the segment points obtained from all samples of all sentences from Experiment 1 to derive the final segment boundary points in each sentence for automatic transcription in a consistent manner. This was done by selecting only the consistently occurring points in all samples of a sentence to form the final segment boundary points. This yielded a total of 165 segments in the 25 sentences to be used for automatic transcription. Of these, 25 points corresponded to the starting location of each sentence, which was assumed known. Of the remaining, 128 points corresponded to manually labeled points, while 12 were false alarms.

Figure 7  Clusters obtained by the automatic (a) and manual (b) phoneme transcription processes (trajectories are normalized).

The automatic trajectory segmentation and phoneme transcription procedures make no assumptions about the movement epenthesis segments and sign segments, and hence they are not distinguished from each other. All the trajectory segments obtained are clustered according to the transcription procedure described in Section 3. The automatic transcription process based on these 165 segments yielded 58 phonemes. We then used the transcribed phonemes from Experiments A and B to recognize the sequence of phonemes in the 25 sentences using HMMs.

5.3 Recognition with HMMs

We used four samples of each of the 25 sentences to train and one sample to test the HMMs, in a full round-robin procedure, i.e. in each trial 75 sentences were used for training and 25 for testing. After decoding a test sentence, the sequence of phonemes detected in the sentence was checked for errors. Three kinds of phoneme recognition errors are possible, viz. insertion, deletion and substitution.

Table 6 compares the average number of errors in Experiment A over all the test sentences when the training sentences were labeled by phonemes transcribed manually and automatically. On the 173 segments in the 25 sentences, the average numbers of errors were 33.8 and 24.0 for sentences that were labeled manually and by automatically transcribed phonemes, respectively. It can be seen that recognition performance improves when sentences are labeled with automatically transcribed phonemes. This suggests that consistent phoneme transcription is important for better recognition performance, and that manual transcription contributes to inconsistency in grouping the segments. When segments of different kinds are grouped together, the data distribution is inconsistent, and this degrades the recognition performance of the HMMs.

Table 6  Average HMM recognition errors on 25 test sentences containing 173 segments (Experiment A). Errors are broken down into insertion, deletion and substitution; the totals were 33.8 with manual phoneme transcription and 24.0 with automatic phoneme transcription. The trajectory segmentation was manual; the comparison is between manual and automatic phoneme transcription.

We used the same experimental procedure to evaluate the recognition rate for the case when the phoneme transcription process was fully automatic, as in Experiment B. On the 165 segments obtained for the 25 sentences, the average number of errors was 18.8, as shown in Table 7, which is less than that obtained with manual segmentation in Table 6. Thus, in terms of recognition performance, phonemes transcribed by the fully automatic procedure give better results than when manual segmentation and/or manual transcription is involved. This is fortunate, since manual processing is tedious and extremely time consuming.

Table 7  Average HMM recognition errors on 25 test sentences containing 165 segments obtained by the rule-based segmentation algorithm and automatic transcription (Experiment B).

Error type     Errors in fully automatic phoneme transcription
Insertion      4.4
Deletion       8.4
Substitution   6.0
Total          18.8
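One common way to obtain insertion, deletion and substitution counts like those in Tables 6 and 7 is a minimum-edit-distance alignment between the decoded and reference phoneme sequences. The sketch below is our own illustration of such a scoring step, with made-up phoneme labels, and is not necessarily the exact procedure used in the paper.

```python
# Edit-distance scoring sketch for decoded vs. reference phoneme sequences.
import numpy as np

def error_counts(ref, hyp):
    """Return (insertions, deletions, substitutions) for hyp aligned against ref."""
    n, m = len(ref), len(hyp)
    D = np.zeros((n + 1, m + 1), dtype=int)      # D[i, j]: edit cost ref[:i] vs hyp[:j]
    D[:, 0] = np.arange(n + 1)                   # deleting all of ref[:i]
    D[0, :] = np.arange(m + 1)                   # inserting all of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            D[i, j] = min(sub, D[i - 1, j] + 1, D[i, j - 1] + 1)
    ins = dels = subs = 0
    i, j = n, m                                  # backtrack to classify the errors
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i, j] == D[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and D[i, j] == D[i, j - 1] + 1:
            ins, j = ins + 1, j - 1
        else:
            dels, i = dels + 1, i - 1
    return ins, dels, subs

# Example with hypothetical labels:
# error_counts(["line-down", "arc-cw", "line-up"], ["line-down", "line-up"])
# returns (0, 1, 0), i.e. one deleted phoneme.
```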

We note here that the accuracy of automatic trajectory segmentation directly affects the recognition accuracy. False alarms lead to too many segments in the trajectory, while missed points cause segments to be merged, so they may not represent the smallest units, leading to many variations in the segments. In both cases, more clusters (phonemes) will be needed to group the segments consistently. If we use more phonemes to represent a sentence, the recognition accuracy may be better, but at the same time the size and complexity of the HMM decoding network will increase. On the other hand, if we compromise by having fewer clusters (phonemes) with larger cluster variances, sentence representation may be less accurate and thus the recognition accuracy may drop. Hence, we need to segment the trajectories as accurately as possible for phoneme transcription.

6 Conclusions

We devised an automatic segmentation procedure to perform temporal segmentation of naturally signed ASL hand trajectories, and used a set of rules to detect true segmentation points and eliminate false alarms from a simple initial segmentation. We also devised an automatic phoneme transcription scheme which relies on effective feature representation. PCA is used to simplify the problem significantly by projecting 3-D hand trajectory segments to 1-D (lines) or 2-D (curves). High-level features which describe the geometry of the segments are extracted in the projected space. These feature descriptors have proven to be useful for phoneme transcription in our experiments. The experimental results show that our automatic approach is more accurate than manual trajectory segmentation and phoneme transcription, while providing significant savings on the time-consuming human labor required in the manual approach. An automatic approach will be even more important for large vocabulary systems where manual transcription is impractical. Overall, the average number of phoneme recognition errors on 25 sentences containing 165 segments, based on our automatic rule-based segmentation and phoneme transcription procedure, was 18.8 (11.4%).

The rules and thresholds used in this work are based on observation and found empirically. We are currently working on an approach to automate this procedure by using a Bayesian network. Also, in further work, we will extend the automatic segmentation and phoneme transcription approach to all the components of manual signs, for complete recognition of signed sentences.

References

1. Fry, D. B. (1959). Theoretical aspects of mechanical speech recognition. Journal of the British Institution of Radio Engineers, 19(4).
2. Denes, P. (1959). The design and operation of the mechanical speech recognizer at University College London. Journal of the British Institution of Radio Engineers, 19(4).
3. Stokoe, W. C. (1978). Sign language structure: An outline of the visual communication system of the American deaf. Studies in linguistics: Occasional papers 8. Silver Spring: Linstok.
4. Liddell, S. K., & Johnson, R. E. (1989). American sign language: The phonological base. Sign Language Studies, 64.
5. Vogler, C., & Metaxas, D. (1999). Towards scalability in ASL recognition: Breaking down sign into phonemes. In Gesture workshop. Gif-sur-Yvette, France, March.
6. Wang, C., Gao, W., & Shan, S. (2002). An approach based on phonemes to large vocabulary Chinese sign language recognition. In Proceedings of the fifth IEEE international conference on automatic face and gesture recognition. Washington, DC, USA, May.
7. Walter, M., Psarrou, A., & Gong, S. (2001). Auto clustering for unsupervised learning of atomic gesture components using minimum description length. In Proceedings of the IEEE ICCV workshop on recognition, analysis, and tracking of faces and gestures in real-time systems. Vancouver, Canada, July.
8. Bauer, B., & Kraiss, K.-F. (2001). Towards an automatic sign language recognition system using subunits. In Gesture workshop. London, UK, April.
9. Wang, C., et al. (2000). An approach to automatically extracting the basic units in Chinese sign language recognition. In Proceedings of the 5th international conference on signal processing. Beijing, China, August.
10. Fang, G., et al. (2004). A novel approach to automatically extracting basic units from Chinese sign language. In Proceedings of the 17th international conference on pattern recognition. Cambridge, UK, August.
11. Wilpon, J., & Rabiner, L. (1985). A modified k-means clustering algorithm for use in isolated word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(3).
12. Sagawa, H., & Takeuchi, M. (2000). A method for recognizing a sequence of sign language words represented in a Japanese sign language sentence. In Proceedings of the fourth IEEE international conference on automatic face and gesture recognition. Grenoble, France, March.
13. Wang, T.-S., et al. (2001). Unsupervised analysis of human gestures. In Proceedings of the second IEEE Pacific Rim conference on multimedia: Advances in multimedia information processing. Beijing, China, October.
14. Gibet, S., & Marteau, P.-F. (2007). Approximation of curvature and velocity using adaptive sampling representations: Application to hand gesture analysis. In Gesture workshop. Lisbon, Portugal, May.
15. Rao, C., Yilmaz, A., & Shah, M. (2002). View-invariant representation and recognition of actions. International Journal of Computer Vision, 50(2).
16. Asada, H., & Brady, M. (1984). The curvature primal sketch. Technical Report 758, MIT AI memo.

17. Nam, Y., & Wohn, K. (1996). Recognition of space-time hand-gestures using hidden Markov model. In Proceedings of the ACM symposium on virtual reality software and technology. Hong Kong, July.
18. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2).
19. Bilmes, J. (2006). What HMMs can do. IEICE Transactions on Information and Systems, E89-D(3).

W. W. Kong received the B.Eng. (Honors) degree in 2000 and the M.Eng. degree in 2005, both in Electrical Engineering, from the National University of Singapore. During 2001, she was with the R&D Division of Singapore Epson Industrial Pvt. Ltd., where she worked on scanner software development and testing. Currently, she is working toward the Ph.D. degree in the Department of Electrical and Computer Engineering at the National University of Singapore. Her research interests are in human gesture understanding applications and also include human-computer interaction, machine learning, and computer vision.

Surendra Ranganath received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (Kanpur), the M.E. degree in Electrical Communication Engineering from the Indian Institute of Science (Bangalore), and the Ph.D. degree in Electrical Engineering from the University of California (Davis). From 1982 to 1985, he was with the Applied Research Group at Tektronix, Inc., Beaverton, OR, where he worked in the area of digital video processing for enhanced and high definition TV. From 1986 to 1991, he was with the medical imaging group at Philips Laboratories, Briarcliff Manor, NY. In 1991, he joined the Department of Electrical and Computer Engineering at the National University of Singapore, where he is currently an Associate Professor. His research interests are in digital image processing, computer vision, and machine learning, with a focus on human-computer interaction and video understanding applications.


More information

Reconstructing 3D Pose and Motion from a Single Camera View

Reconstructing 3D Pose and Motion from a Single Camera View Reconstructing 3D Pose and Motion from a Single Camera View R Bowden, T A Mitchell and M Sarhadi Brunel University, Uxbridge Middlesex UB8 3PH richard.bowden@brunel.ac.uk Abstract This paper presents a

More information

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor A Genetic Algorithm-Evolved 3D Point Cloud Descriptor Dominik Wȩgrzyn and Luís A. Alexandre IT - Instituto de Telecomunicações Dept. of Computer Science, Univ. Beira Interior, 6200-001 Covilhã, Portugal

More information

Binary Image Scanning Algorithm for Cane Segmentation

Binary Image Scanning Algorithm for Cane Segmentation Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch ricardo.castanedamarin@pg.canterbury.ac.nz Tom

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

Solutions to old Exam 1 problems

Solutions to old Exam 1 problems Solutions to old Exam 1 problems Hi students! I am putting this old version of my review for the first midterm review, place and time to be announced. Check for updates on the web site as to which sections

More information

Using Lexical Similarity in Handwritten Word Recognition

Using Lexical Similarity in Handwritten Word Recognition Using Lexical Similarity in Handwritten Word Recognition Jaehwa Park and Venu Govindaraju Center of Excellence for Document Analysis and Recognition (CEDAR) Department of Computer Science and Engineering

More information

A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

More information

Going Big in Data Dimensionality:

Going Big in Data Dimensionality: LUDWIG- MAXIMILIANS- UNIVERSITY MUNICH DEPARTMENT INSTITUTE FOR INFORMATICS DATABASE Going Big in Data Dimensionality: Challenges and Solutions for Mining High Dimensional Data Peer Kröger Lehrstuhl für

More information

Understanding Purposeful Human Motion

Understanding Purposeful Human Motion M.I.T Media Laboratory Perceptual Computing Section Technical Report No. 85 Appears in Fourth IEEE International Conference on Automatic Face and Gesture Recognition Understanding Purposeful Human Motion

More information

HANDS-FREE PC CONTROL CONTROLLING OF MOUSE CURSOR USING EYE MOVEMENT

HANDS-FREE PC CONTROL CONTROLLING OF MOUSE CURSOR USING EYE MOVEMENT International Journal of Scientific and Research Publications, Volume 2, Issue 4, April 2012 1 HANDS-FREE PC CONTROL CONTROLLING OF MOUSE CURSOR USING EYE MOVEMENT Akhil Gupta, Akash Rathi, Dr. Y. Radhika

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

Accelerometer Based Real-Time Gesture Recognition

Accelerometer Based Real-Time Gesture Recognition POSTER 2008, PRAGUE MAY 15 1 Accelerometer Based Real-Time Gesture Recognition Zoltán PREKOPCSÁK 1 1 Dept. of Telecomm. and Media Informatics, Budapest University of Technology and Economics, Magyar tudósok

More information

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow , pp.233-237 http://dx.doi.org/10.14257/astl.2014.51.53 A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow Giwoo Kim 1, Hye-Youn Lim 1 and Dae-Seong Kang 1, 1 Department of electronices

More information

Simultaneous Gamma Correction and Registration in the Frequency Domain

Simultaneous Gamma Correction and Registration in the Frequency Domain Simultaneous Gamma Correction and Registration in the Frequency Domain Alexander Wong a28wong@uwaterloo.ca William Bishop wdbishop@uwaterloo.ca Department of Electrical and Computer Engineering University

More information

Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

More information

CS231M Project Report - Automated Real-Time Face Tracking and Blending

CS231M Project Report - Automated Real-Time Face Tracking and Blending CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, slee2010@stanford.edu June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android

More information

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

More information

Tutorial on Markov Chain Monte Carlo

Tutorial on Markov Chain Monte Carlo Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,

More information

Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks

Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks Welcome to Thinkwell s Homeschool Precalculus! We re thrilled that you ve decided to make us part of your homeschool curriculum. This lesson

More information

SOLID MECHANICS TUTORIAL MECHANISMS KINEMATICS - VELOCITY AND ACCELERATION DIAGRAMS

SOLID MECHANICS TUTORIAL MECHANISMS KINEMATICS - VELOCITY AND ACCELERATION DIAGRAMS SOLID MECHANICS TUTORIAL MECHANISMS KINEMATICS - VELOCITY AND ACCELERATION DIAGRAMS This work covers elements of the syllabus for the Engineering Council exams C105 Mechanical and Structural Engineering

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Jiří Matas. Hough Transform

Jiří Matas. Hough Transform Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian

More information

AUTOMATIC EVOLUTION TRACKING FOR TENNIS MATCHES USING AN HMM-BASED ARCHITECTURE

AUTOMATIC EVOLUTION TRACKING FOR TENNIS MATCHES USING AN HMM-BASED ARCHITECTURE AUTOMATIC EVOLUTION TRACKING FOR TENNIS MATCHES USING AN HMM-BASED ARCHITECTURE Ilias Kolonias, William Christmas and Josef Kittler Centre for Vision, Speech and Signal Processing University of Surrey,

More information

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer A Segmentation Algorithm for Zebra Finch Song at the Note Level Ping Du and Todd W. Troyer Neuroscience and Cognitive Science Program, Dept. of Psychology University of Maryland, College Park, MD 20742

More information

PHYSIOLOGICALLY-BASED DETECTION OF COMPUTER GENERATED FACES IN VIDEO

PHYSIOLOGICALLY-BASED DETECTION OF COMPUTER GENERATED FACES IN VIDEO PHYSIOLOGICALLY-BASED DETECTION OF COMPUTER GENERATED FACES IN VIDEO V. Conotter, E. Bodnari, G. Boato H. Farid Department of Information Engineering and Computer Science University of Trento, Trento (ITALY)

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

How To Analyze Ball Blur On A Ball Image

How To Analyze Ball Blur On A Ball Image Single Image 3D Reconstruction of Ball Motion and Spin From Motion Blur An Experiment in Motion from Blur Giacomo Boracchi, Vincenzo Caglioti, Alessandro Giusti Objective From a single image, reconstruct:

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

COMPUTING CLOUD MOTION USING A CORRELATION RELAXATION ALGORITHM Improving Estimation by Exploiting Problem Knowledge Q. X. WU

COMPUTING CLOUD MOTION USING A CORRELATION RELAXATION ALGORITHM Improving Estimation by Exploiting Problem Knowledge Q. X. WU COMPUTING CLOUD MOTION USING A CORRELATION RELAXATION ALGORITHM Improving Estimation by Exploiting Problem Knowledge Q. X. WU Image Processing Group, Landcare Research New Zealand P.O. Box 38491, Wellington

More information

Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

Normalisation of 3D Face Data

Normalisation of 3D Face Data Normalisation of 3D Face Data Chris McCool, George Mamic, Clinton Fookes and Sridha Sridharan Image and Video Research Laboratory Queensland University of Technology, 2 George Street, Brisbane, Australia,

More information

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode

More information

Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006. Principal Components Null Space Analysis for Image and Video Classification

1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006. Principal Components Null Space Analysis for Image and Video Classification 1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006 Principal Components Null Space Analysis for Image and Video Classification Namrata Vaswani, Member, IEEE, and Rama Chellappa, Fellow,

More information

An Iterative Image Registration Technique with an Application to Stereo Vision

An Iterative Image Registration Technique with an Application to Stereo Vision An Iterative Image Registration Technique with an Application to Stereo Vision Bruce D. Lucas Takeo Kanade Computer Science Department Carnegie-Mellon University Pittsburgh, Pennsylvania 15213 Abstract

More information

On Correlating Performance Metrics

On Correlating Performance Metrics On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are

More information

CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER

CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER 93 CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER 5.1 INTRODUCTION The development of an active trap based feeder for handling brakeliners was discussed

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Path Tracking for a Miniature Robot

Path Tracking for a Miniature Robot Path Tracking for a Miniature Robot By Martin Lundgren Excerpt from Master s thesis 003 Supervisor: Thomas Hellström Department of Computing Science Umeå University Sweden 1 Path Tracking Path tracking

More information

Thresholding technique with adaptive window selection for uneven lighting image

Thresholding technique with adaptive window selection for uneven lighting image Pattern Recognition Letters 26 (2005) 801 808 wwwelseviercom/locate/patrec Thresholding technique with adaptive window selection for uneven lighting image Qingming Huang a, *, Wen Gao a, Wenjian Cai b

More information

An Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network

An Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network Proceedings of the 8th WSEAS Int. Conf. on ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING & DATA BASES (AIKED '9) ISSN: 179-519 435 ISBN: 978-96-474-51-2 An Energy-Based Vehicle Tracking System using Principal

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan

ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan Handwritten Signature Verification ECE 533 Project Report by Ashish Dhawan Aditi R. Ganesan Contents 1. Abstract 3. 2. Introduction 4. 3. Approach 6. 4. Pre-processing 8. 5. Feature Extraction 9. 6. Verification

More information

Myanmar Continuous Speech Recognition System Based on DTW and HMM

Myanmar Continuous Speech Recognition System Based on DTW and HMM Myanmar Continuous Speech Recognition System Based on DTW and HMM Ingyin Khaing Department of Information and Technology University of Technology (Yatanarpon Cyber City),near Pyin Oo Lwin, Myanmar Abstract-

More information

degrees of freedom and are able to adapt to the task they are supposed to do [Gupta].

degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. 1.3 Neural Networks 19 Neural Networks are large structured systems of equations. These systems have many degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. Two very

More information

How To Segmentate An Image

How To Segmentate An Image Edge Strength Functions as Shape Priors in Image Segmentation Erkut Erdem, Aykut Erdem, and Sibel Tari Middle East Technical University, Department of Computer Engineering, Ankara, TR-06531, TURKEY, {erkut,aykut}@ceng.metu.edu.tr,

More information

Lecture L6 - Intrinsic Coordinates

Lecture L6 - Intrinsic Coordinates S. Widnall, J. Peraire 16.07 Dynamics Fall 2009 Version 2.0 Lecture L6 - Intrinsic Coordinates In lecture L4, we introduced the position, velocity and acceleration vectors and referred them to a fixed

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

Bayesian Network Modeling of Hangul Characters for On-line Handwriting Recognition

Bayesian Network Modeling of Hangul Characters for On-line Handwriting Recognition Bayesian Network Modeling of Hangul haracters for On-line Handwriting Recognition Sung-ung ho and in H. Kim S Div., EES Dept., KAIS, 373- Kusong-dong, Yousong-ku, Daejon, 305-70, KOREA {sjcho, jkim}@ai.kaist.ac.kr

More information

The Visual Internet of Things System Based on Depth Camera

The Visual Internet of Things System Based on Depth Camera The Visual Internet of Things System Based on Depth Camera Xucong Zhang 1, Xiaoyun Wang and Yingmin Jia Abstract The Visual Internet of Things is an important part of information technology. It is proposed

More information