Behavior Grouping based on Trajectories Mining
  Shoji Hirano, Shusaku Tsumoto
  Department of Medical Informatics, Shimane University, School of Medicine, Japan
Outline
  Introduction
    Background, Objective, Approach
  Method
    Multiscale comparison and grouping of trajectories
  Experimental Results
    Australian Sign Language data
    Hospital Management
  Conclusions
Temporal Data Mining
  One-Dimensional Time Series: Chronological Behavior of One Variable
  Two-Dimensional Time Series (Trajectory): Behavior of Two Variables
  Grouping of Temporal Sequences
    Capture the dynamic behavior of temporal variables
    2D: Detection of co-variant variables
    Disease grouping, ...
Discoveries from Hepatitis Data
  [Figure: ALB-PLT trajectories. Left: ALB, PLT covariant. Right: ALB, PLT non-covariant. Panels: #170 (C5;F4), #602 (C5;F4), #558 (C15;F1), #636 (C15;F3). Axes: ALB, PLT.]
  Two Groups of Disease Progression of Liver Fibrosis
    Group 1: ALB, PLT: decreasing
    Group 2: PLT: decreasing, ALT: stable
Trajectory Mining Process
  Segmentation and Generation of Multiscale Trajectories
  Segment Hierarchy Trace and Matching
  Calculation of Dissimilarities
  Clustering of Trajectories
Multiscale Structural Comparison
  Represent trajectories using multiscale description
  Search the best correspondences of partial trajectories throughout all scales (cf. Ueda et al. (1990))
  [Figure: Trajectory A and Trajectory B, each drawn in the Attr.1-Attr.2 plane from t=0 at Scale 0, Scale 1, and Scale 2, with a segment marked]
Multiscale Description
  Represent convex/concave structure of trajectories on various observation scales
  Trajectory representation
    $c(t) = (ex_1(t), ex_2(t), \ldots, ex_I(t))$
    $ex_i(t)$, $i \in I$: time series of test $i$
  Trajectory at scale $\sigma$ (cf. Mokhtarian et al. (1986))
    $C(t, \sigma) = (EX_1(t, \sigma), EX_2(t, \sigma), \ldots, EX_I(t, \sigma))$
    $EX_i(t, \sigma) = ex_i(t) \otimes g(t, \sigma) = \sum_{n=-\infty}^{\infty} e^{-\sigma} I_n(\sigma)\, ex_i(t - n)$
    $I_n$: modified Bessel function of order $n$
  $\sigma$ = large: global feature of the trajectory
  $\sigma$ = small: local feature of the trajectory
  [Figure: trajectory $C(t, \sigma)$ at large $\sigma$ and at small $\sigma$, down to the original trajectory $C(t, 0)$]
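The smoothing at scale σ can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`bessel_i`, `discrete_gaussian`, `smooth`, `trajectory_at_scale`), the kernel truncation radius, the renormalization of the truncated kernel, and the edge handling (repeating the first and last values) are all choices made for this example.

```python
import math


def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind I_n(x) for integer n,
    evaluated by its power series (adequate for small to moderate x)."""
    n = abs(n)  # I_{-n}(x) = I_n(x) for integer n
    half = x / 2.0
    term = 1.0
    for j in range(1, n + 1):  # first series term: (x/2)^n / n!
        term *= half / j
    total = term
    for k in range(1, terms):  # term_k = term_{k-1} * (x/2)^2 / (k * (k + n))
        term *= half * half / (k * (k + n))
        total += term
    return total


def discrete_gaussian(sigma, radius):
    """Kernel values exp(-sigma) * I_n(sigma) for n = -radius..radius."""
    return [math.exp(-sigma) * bessel_i(n, sigma)
            for n in range(-radius, radius + 1)]


def smooth(series, sigma, radius=None):
    """EX_i(t, sigma): one attribute convolved with the truncated kernel."""
    if sigma == 0:
        return [float(v) for v in series]  # scale 0 is the original series
    if radius is None:
        radius = int(4 * math.sqrt(sigma)) + 4  # assumed truncation rule
    kernel = discrete_gaussian(sigma, radius)
    norm = sum(kernel)  # renormalize after truncation
    last = len(series) - 1
    out = []
    for t in range(len(series)):
        acc = 0.0
        for n, w in zip(range(-radius, radius + 1), kernel):
            idx = min(max(t - n, 0), last)  # assumed edge handling: clamp
            acc += w * series[idx]
        out.append(acc / norm)
    return out


def trajectory_at_scale(trajectory, sigma):
    """C(t, sigma) for a trajectory given as a list of attribute tuples."""
    attributes = list(zip(*trajectory))
    smoothed = [smooth(a, sigma) for a in attributes]
    return list(zip(*smoothed))
```

Calling `trajectory_at_scale(points, sigma)` for a list of increasing `sigma` values gives one trajectory per scale; larger values flatten local detail and keep the global shape.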
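Segment Matching based on Concave/convex Structures
  Segment: partial trajectory between inflection points
  Curvature at scale $\sigma$ (2D case) (cf. Ueda et al. (1990))
    $K(t, \sigma) = \frac{EX_1' EX_2'' - EX_1'' EX_2'}{(EX_1'^2 + EX_2'^2)^{3/2}}$
    $EX_i^{(m)}(t, \sigma) = \frac{\partial^m EX_i(t, \sigma)}{\partial t^m} = ex_i(t) \otimes g^{(m)}(t, \sigma)$
  Inflection point $c_j(t, \sigma)$: a point where the curvature changes sign between adjacent time points, i.e. $K(t', \sigma) \cdot K(t, \sigma) < 0$ for $t'$ adjacent to $t$
  Segment representation
    $A^{(\sigma)} = \{ a_i^{(\sigma)} \mid i = 1, 2, \ldots, N \}$
  [Figure: at large $\sigma$, inflection points $c_j(t, \sigma)$ and segment set $A^{(\sigma)}$; at small $\sigma$, segments $a_1^{(0)}$, $a_2^{(0)}$ of the set $A^{(0)}$]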
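Splitting a 2D trajectory at its inflection points can be sketched as below. This is an illustration only: it assumes the standard signed curvature of a planar curve, approximates the derivatives with finite differences rather than by convolution with derivative kernels, and detects an inflection as a sign change between consecutive samples. All function names are hypothetical.

```python
import math


def derivative(values):
    """Finite-difference derivative with respect to the sample index
    (central inside the series, one-sided at both ends)."""
    n = len(values)
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((values[hi] - values[lo]) / (hi - lo))
    return out


def curvature(x, y):
    """Signed curvature K(t) of the planar curve (x(t), y(t))."""
    dx, dy = derivative(x), derivative(y)
    ddx, ddy = derivative(dx), derivative(dy)
    ks = []
    for x1, y1, x2, y2 in zip(dx, dy, ddx, ddy):
        denom = (x1 * x1 + y1 * y1) ** 1.5
        ks.append((x1 * y2 - x2 * y1) / denom if denom > 0 else 0.0)
    return ks


def inflection_points(ks):
    """Indices t where the curvature changes sign between t - 1 and t."""
    return [t for t in range(1, len(ks)) if ks[t - 1] * ks[t] < 0]


def split_into_segments(n_points, inflections):
    """(start, end) index pairs of the partial trajectories between
    consecutive inflection points, including both trajectory ends."""
    bounds = [0] + list(inflections) + [n_points - 1]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Toy example: a sine curve changes from concave to convex near x = pi.
xs = [0.1 * i for i in range(63)]
ys = [math.sin(v) for v in xs]
segments = split_into_segments(len(xs), inflection_points(curvature(xs, ys)))
print(segments)
```

Running the same functions on the trajectory smoothed at each scale would give one segment set per scale, with fewer segments as the scale grows.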
Multiscale Structural Comparison
  Global Matching Criteria
    Minimization of total segment dissimilarity
    Complete match: the original trajectory must be formed without gaps/overlaps by concatenating the segments
  Dissimilarity $d(a_i^{(k)}, b_j^{(h)})$ between two segments $a_i^{(k)}$, $b_j^{(h)}$
    $d(a_i^{(k)}, b_j^{(h)}) = \sqrt{(g_{a_i}^{(k)} - g_{b_j}^{(h)})^2 + (\theta_{a_i}^{(k)} - \theta_{b_j}^{(h)})^2} + |v_{a_i}^{(k)} - v_{b_j}^{(h)}| + \gamma (c(a_i^{(k)}) + c(b_j^{(h)}))$
    $g$: gradient, $\theta$: rotation angle, $v$: velocity, $\gamma (c(a_i^{(k)}) + c(b_j^{(h)}))$: replacement cost
    $v_{a_i}^{(k)} = l_{a_i}^{(k)} / n_{a_i}^{(k)}$ ($l$: length, $n$: # of points)
  [Figure: segment $a_i^{(k)}$ with $\theta_{a_i}^{(k)}$, $g_{a_i}^{(k)}$, $v_{a_i}^{(k)}$ and segment $b_j^{(h)}$ with $\theta_{b_j}^{(h)}$, $g_{b_j}^{(h)}$, $v_{b_j}^{(h)}$]
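The segment dissimilarity can be written out as a short function. This is a sketch under assumptions: the gradient and rotation-angle differences are combined as a Euclidean norm, the velocity difference is taken as an absolute value, and the `Segment` class with its field names is hypothetical. How the gradient, rotation angle, and replacement cost of a segment are measured is left outside the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    gradient: float        # g
    rotation_angle: float  # theta
    length: float          # l
    n_points: int          # n, number of points in the segment
    cost: float = 0.0      # replacement cost c(.)

    @property
    def velocity(self):
        """v = length / number of points."""
        return self.length / self.n_points


def segment_dissimilarity(a, b, gamma=1.0):
    """Shape term + velocity term + weighted replacement costs."""
    shape = math.hypot(a.gradient - b.gradient,
                       a.rotation_angle - b.rotation_angle)
    speed = abs(a.velocity - b.velocity)
    return shape + speed + gamma * (a.cost + b.cost)


# Toy values chosen only to make the arithmetic easy to follow.
a = Segment(gradient=0.0, rotation_angle=0.0, length=10.0, n_points=5)
b = Segment(gradient=3.0, rotation_angle=4.0, length=10.0, n_points=10)
print(segment_dissimilarity(a, b))
```

The weight `gamma` controls how strongly replaced segments are penalized relative to differences in shape and velocity.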
Value-based Dissimilarity of Trajectories
  After structural matching, calculate the value-based dissimilarity for each pair of matched segments
  Attribute 1 dissimilarity: $dv_1(a_p, b_p)$ = peak difference + (left diff. + right diff.)/2
  Attribute 2 dissimilarity: $dv_2(a_p, b_p)$ = peak difference + (left diff. + right diff.)/2
  $d_{val}(a_p^{(0)}, b_p^{(0)}) = \sqrt{dv_1^2 + dv_2^2} + \mathrm{cost}$
  $D_{val}(A, B) = \frac{1}{P} \sum_{p=1}^{P} d_{val}(a_p^{(0)}, b_p^{(0)})$
  [Figure: Trajectory A and Trajectory B in the Attr.1-Attr.2 plane, with CoG marked]
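The value-based dissimilarity can be sketched as follows. The example assumes that the two attribute dissimilarities of a matched pair are combined as a Euclidean norm before the cost is added, and that the peak, left, and right differences are supplied as already-measured non-negative numbers. The function names and the input layout are hypothetical.

```python
import math


def attribute_dissimilarity(peak_diff, left_diff, right_diff):
    """dv = peak difference + (left diff. + right diff.) / 2."""
    return peak_diff + (left_diff + right_diff) / 2.0


def matched_pair_dissimilarity(dv1, dv2, cost=0.0):
    """d_val for one matched segment pair (assumed Euclidean combination)."""
    return math.hypot(dv1, dv2) + cost


def trajectory_dissimilarity(pairs):
    """D_val(A, B): mean of d_val over the P matched pairs.

    pairs is a list of (dv1, dv2, cost) tuples, one per matched pair."""
    if not pairs:
        raise ValueError("at least one matched pair is required")
    return sum(matched_pair_dissimilarity(*p) for p in pairs) / len(pairs)


# Toy numbers only; they do not come from any dataset.
dv1 = attribute_dissimilarity(peak_diff=1.0, left_diff=2.0, right_diff=4.0)
print(dv1)
print(trajectory_dissimilarity([(3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]))
```

Averaging over the number of matched pairs keeps trajectories with many segments comparable to trajectories with few.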
Experiment 1: ASL Data
  Dataset: Australian Sign Language dataset in the UCI KDD Archive
    Time-series data on hand positions (3D) collected from 5 signers during performance of sign language.
    Used for experimental validation by Vlachos et al. in ICDE02 (as 2D trajectories) and Keogh et al. in KDD00 (as 1D time series).
    For each signer, two to five sessions were conducted. In each session, five sign samples were recorded for each of the 95 words.
    The length of each sample differed; a sample typically contained about 50-150 time points.
  [Figure: data hierarchy: signer A to signer E; session 1 to session n per signer; word 1 to word 95 per session; sample 1 to sample 5 per word. Examples of "Norway".]
Experiment 1: ASL Data
  Experimental Procedure
    1. Out of the 95 signs (words), select the following 10 signs: Norway, cold, crazy, eat, forget, happy, innocent, later, lose, spend.
    2. Select a pair of words such as {Norway, cold}. For each word, there exist 5 sign samples; therefore a total of 10 samples are selected.
    3. Calculate the dissimilarities for each pair of the 10 samples by the proposed method.
    4. Construct two groups by applying average-linkage hierarchical clustering.
    5. Evaluate whether the samples are grouped correctly.
  Apply this procedure to every pair of the 10 words (total 45 pairs/session).
  [Figure: word 1 (Norway), samples 1-5, and word 2 (cold), samples 1-5 -> pairwise comparison & grouping (into two clusters) -> evaluate whether groups are correct or not]
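The evaluation loop of this procedure can be sketched in plain Python. The sketch takes the dissimilarity as a function argument, so the proposed method itself is not implemented here; the toy data and the absolute-difference stand-in at the bottom are hypothetical and exist only to make the example runnable. All function names are assumptions.

```python
from itertools import combinations


def average_linkage(dist, n_clusters):
    """Average-linkage agglomerative clustering on a full symmetric
    dissimilarity matrix; returns clusters as lists of sample indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best  # a < b, so deleting b keeps index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def grouped_correctly(samples_a, samples_b, dissimilarity):
    """True if two-cluster grouping separates the two words exactly."""
    samples = list(samples_a) + list(samples_b)
    n = len(samples)
    dist = [[dissimilarity(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    found = {frozenset(c) for c in average_linkage(dist, 2)}
    expected = {frozenset(range(len(samples_a))),
                frozenset(range(len(samples_a), n))}
    return found == expected


def correct_pairs(words, dissimilarity):
    """words maps each word to its samples.
    Returns (# of correctly grouped word pairs, # of word pairs)."""
    pairs = list(combinations(sorted(words), 2))
    correct = sum(grouped_correctly(words[w1], words[w2], dissimilarity)
                  for w1, w2 in pairs)
    return correct, len(pairs)


# Hypothetical toy data: 10 "words", 5 scalar "samples" each.
toy_words = {f"w{k}": [10.0 * k + 0.1 * j for j in range(5)]
             for k in range(10)}
print(correct_pairs(toy_words, lambda p, q: abs(p - q)))
```

With 10 words the loop visits 45 word pairs, matching the per-session count in the procedure.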
Experiment 1: ASL Data
  Results
    Session     # of correct pairs   Ratio
    andrew2     26/45                0.578
    john2       34/45                0.756
    john3       29/45                0.644
    john4       30/45                0.667
    stephen2    38/45                0.844 (best)
    stephen4    29/45                0.644
    waleed1     33/45                0.733
    waleed2     36/45                0.800
    waleed3     25/45                0.556 (worst)
    waleed4     26/45                0.578
  According to Vlachos et al., the results by the Euclidean distance, DTW, and LCSS were 0.333 (15/45), 0.444 (20/45), and 0.467 (21/45). Signer/session information was not available in the paper.
Background for the 2nd Experiment
  Hospital Information System (1980s-)
    Computerization of all hospital information
    Large-scale databases
  Data: order and its record
    1 order: 3 to 5 Trans.
    All clinical actions are described as orders
  [Figure: Prescription: Doctor (Order), Pharmacist. Laboratory Examination: Doctor (Order), Laboratory.]
Background: HIS (2)
  Hospital Information System
    Computerization of orders
    Results of orders
    Data for clinical actions
  Reuse of Stored Data
    Laboratory examinations, prescriptions, ...: they are results from orders
    History of orders: history of clinical actions
  Data-centric Hospital Management
Background: HIS (3)
  How many orders are made every day?
  A case: Shimane University Hospital
    616 beds, 1000 for outpatient clinic
    #Orders: about 8000
      Prescription: 700, Injection: 700
    Actions (Doctors & Nurses): 4300
    Storage of data: 100 MB/day, 30 GB/year (cf. Image: 2.5 TB/year)
Chronology of #Orders (2008.6.1~6.7)
  [Figure: number of orders over the week; x-axis labeled by day of the week]
Chronology of #Orders (2008.6.2)
  [Figure: number of orders over the day; labeled series: Descriptions, Documents, Nursery]
#Login (2008/6/2~2008/6/7)
  [Figure: number of logins over time; labeled series: Wards, Outpatient Clinic]
Reuse of Data
  Understanding Dynamic Behavior of Hospital, Doctors and Patients: Temporal Data Mining
  Reuse of Orders
    Analysis of clinical actions
    Data mining for temporal behaviors of hospital or medical staff
  New type of Hospital Management
Co-occurrence of #Orders (2008.6.2)
  [Figure labels: Reservations, Prescription, Examinations, Records; Morning, Afternoon]
Experiment 2: Data of #Orders
  Data
    # of orders for each day (2008.6.2~6.7)
  Objective
    Find groups of similar trajectories
    Analyze the relationships between the grouped trajectories
  Method
    Generate a dissimilarity matrix using the proposed method
    Perform cluster analysis using dendrograms generated by a hierarchical clustering method
  Results
    2 major groups: Outpatient/Ward + Ward
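The step from a dissimilarity matrix to a dendrogram and its major groups can be sketched as below. The slide does not name the linkage used in this experiment, so average linkage is an assumption here, as are the function names. The labels and matrix values are toy numbers, not the hospital data.

```python
def build_dendrogram(labels, dist):
    """Agglomerative clustering (average linkage assumed).

    Returns the merge history as (members_a, members_b, height) tuples,
    which is the information a dendrogram plot displays."""
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        merges.append((sorted(labels[i] for i in clusters[a]),
                       sorted(labels[i] for i in clusters[b]),
                       height))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


def cut(labels, merges, n_groups):
    """Groups left when only the first len(labels) - n_groups merges
    are applied (labels are assumed to be unique)."""
    groups = [[label] for label in labels]
    for left, right, _ in merges[:len(labels) - n_groups]:
        groups = [g for g in groups if g != left and g != right]
        groups.append(sorted(left + right))
    return sorted(groups)


# Toy dissimilarity matrix for four hypothetical series.
toy_labels = ["A", "B", "C", "D"]
toy_dist = [[0, 1, 8, 9],
            [1, 0, 7, 8],
            [8, 7, 0, 2],
            [9, 8, 2, 0]]
history = build_dendrogram(toy_labels, toy_dist)
print(history)
print(cut(toy_labels, history, 2))
```

Reading the merge heights from the bottom up shows which series behave alike; a large jump before the final merge indicates two well-separated major groups.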
Clustering Results
Visualization for Clusters
Records + Reservations
  [Figure labels: Reservations, Records; Morning, Afternoon; Outpatient, Wards]
  Prescriptions, Examinations, Radiology, Reservations
Records and Nursery (Wards)
  [Figure labels: Nursery, Records; Morning, Afternoon; Wards, Outpatient]
  Nursery and Injections
Conclusions
  Presented a new method for trajectory mining
    Trajectory representation -> multiscale, structural comparison -> value-based dissimilarity -> clustering
  Application to the Australian Sign Language dataset
    Correct grouping ratio: 0.556 (worst), 0.844 (best)
    High robustness to noise
  Application to hospital data
    Two groups of behavior of #Orders: Outpatient, Ward
    Captured the macroscopic behavior of the university hospital
  Future work
    Extension to multidimensional trajectories
Preliminary Results (3D)
  Matching Results for 3-D Trajectories