Behavior Grouping based on Trajectories Mining. Department of Medical Informatics Shimane University, School of Medicine, Japan

Behavior Grouping based on Trajectories Mining
  Shoji Hirano, Shusaku Tsumoto
  Department of Medical Informatics, Shimane University, School of Medicine, Japan

Outline
  Introduction: background, objective, approach
  Method: multiscale comparison and grouping of trajectories
  Experimental results: Australian Sign Language data, hospital management
  Conclusions

Temporal Data Mining
  One-dimensional time series: chronological behavior of one variable
  Two-dimensional time series (trajectory): behavior of two variables
  Grouping of temporal sequences
    Capture the dynamic behavior of temporal variables
    2D: detection of covariant variables, disease grouping, ...

Discoveries from Hepatitis Data
  Left: ALB, PLT covariant
  Right: ALB, PLT non-covariant
  [Figure: four trajectory plots with axes ALB and PLT; panels #170 (C5; F4), #602 (C5; F4), #558 (C15; F1), #636 (C15; F3)]
  Two groups of disease progression of liver fibrosis
    Group 1: ALB, PLT: decreasing
    Group 2: PLT: decreasing, ALT: stable

Trajectory Mining Process
  1. Segmentation and generation of multiscale trajectories
  2. Segment hierarchy trace and matching
  3. Calculation of dissimilarities
  4. Clustering of trajectories

Multiscale Structural Comparison
  Represent trajectories using a multiscale description
  Search for the best correspondences of partial trajectories throughout all scales
  (cf. Ueda et al. (1990))
  [Figure: trajectories A and B in the Attr.1-Attr.2 plane, each starting at t=0 and shown at scales 0, 1, and 2, with segments marked]

Multiscale Description
  Represent the convex/concave structure of trajectories on various observation scales
  (cf. Mokhtarian et al. (1986))
  Trajectory representation
    $c(t) = (ex_1(t), ex_2(t), \ldots, ex_I(t))$
    $ex_i(t)$, $i \in I$: time series of test $i$
  Trajectory at scale $\sigma$
    $C(t,\sigma) = (EX_1(t,\sigma), EX_2(t,\sigma), \ldots, EX_I(t,\sigma))$
    $EX_i(t,\sigma) = ex_i(t) \otimes g(t,\sigma) = \sum_{n=-\infty}^{\infty} e^{-\sigma} I_n(\sigma)\, ex_i(t-n)$
    $\otimes$: convolution; $I_n$: modified Bessel function of order $n$
  $\sigma$ large: global feature of the trajectory
  $\sigma$ small: local feature of the trajectory
  [Figure: trajectory $C(t,\sigma)$ drawn from large $\sigma$ down to small $\sigma$, ending at the original trajectory $C(t,0)$]
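The smoothing step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the kernel weights are $e^{-\sigma} I_n(\sigma)$ applied to $ex_i(t-n)$, truncates the infinite sum at a finite radius, and clamps indices at the ends of the series. The function names (`bessel_i`, `scale_kernel`, `smooth`, `trajectory_at_scale`), the truncation rule, and the boundary handling are all assumptions made for the example.

```python
import math

def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind I_n(x) for integer n,
    evaluated from its power series (adequate for small to moderate x)."""
    n = abs(n)  # I_{-n}(x) = I_n(x) for integer n
    half = x / 2.0
    term = half ** n / math.factorial(n)  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + n))
        total += term
    return total

def scale_kernel(sigma, radius):
    """Weights e^{-sigma} I_n(sigma) for n = -radius..radius."""
    return [math.exp(-sigma) * bessel_i(n, sigma)
            for n in range(-radius, radius + 1)]

def smooth(series, sigma, radius=None):
    """Smooth one test's time series ex_i(t) to scale sigma.
    Assumptions: truncated kernel, renormalized weights, clamped borders."""
    if sigma == 0:
        return list(series)
    if radius is None:
        radius = int(4 * math.sqrt(sigma)) + 4
    weights = scale_kernel(sigma, radius)
    norm = sum(weights)
    last = len(series) - 1
    out = []
    for t in range(len(series)):
        acc = 0.0
        for n, w in zip(range(-radius, radius + 1), weights):
            idx = min(max(t - n, 0), last)  # clamp t - n into the series
            acc += w * series[idx]
        out.append(acc / norm)
    return out

def trajectory_at_scale(c, sigma):
    """c holds one series per test, (ex_1, ..., ex_I);
    the result holds (EX_1, ..., EX_I) at scale sigma."""
    return [smooth(ex, sigma) for ex in c]
```

Calling `trajectory_at_scale(c, sigma)` for a ladder of increasing `sigma` values gives the multiscale description: small values keep local detail, large values leave only the global shape.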

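Segment Matching based on Concave/Convex Structures
  Segment: partial trajectory between inflection points
  Curvature at scale $\sigma$ (2D case) (cf. Ueda et al. (1990))
    $K(t,\sigma) = \dfrac{EX_1' EX_2'' - EX_1'' EX_2'}{(EX_1'^2 + EX_2'^2)^{3/2}}$
    $EX_i^{(m)}(t,\sigma) = \dfrac{\partial^m EX_i(t,\sigma)}{\partial t^m} = ex_i(t) \otimes g^{(m)}(t,\sigma)$
    $EX_i'$ and $EX_i''$ denote $EX_i^{(1)}(t,\sigma)$ and $EX_i^{(2)}(t,\sigma)$
  Inflection point: $t$ such that $K(t-1,\sigma)\, K(t,\sigma) < 0$
  Segment representation
    $A^{(\sigma)} = \{ a_i^{(\sigma)} \mid i = 1, 2, \ldots, N \}$
  [Figure: a trajectory split into segments at large $\sigma$ and at small $\sigma$; at scale 0 the segments $a_1^{(0)}, a_2^{(0)}, \ldots$ form $A^{(0)}$]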
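As a rough sketch of the segmentation step, the code below estimates the curvature of a 2D trajectory and cuts it where the curvature changes sign. It uses the standard signed curvature of a planar curve and plain finite differences in place of convolution with derivative kernels. Both choices, and the helper names (`derivative`, `curvature`, `inflection_points`, `split_segments`), are assumptions for illustration only.

```python
def derivative(xs):
    """Finite-difference derivative: central in the interior,
    one-sided at the two ends. Requires at least two points."""
    n = len(xs)
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((xs[hi] - xs[lo]) / (hi - lo))
    return out

def curvature(ex1, ex2):
    """Signed curvature K(t) of the planar curve (ex1(t), ex2(t))."""
    d1, d2 = derivative(ex1), derivative(ex2)
    dd1, dd2 = derivative(d1), derivative(d2)
    ks = []
    for a1, a2, b1, b2 in zip(d1, d2, dd1, dd2):
        denom = (a1 * a1 + a2 * a2) ** 1.5
        # Stationary points have no defined curvature; use 0 there.
        ks.append((a1 * b2 - b1 * a2) / denom if denom > 0 else 0.0)
    return ks

def inflection_points(ks):
    """Indices t where K(t-1) * K(t) < 0, i.e. the curvature changes sign."""
    return [t for t in range(1, len(ks)) if ks[t - 1] * ks[t] < 0]

def split_segments(n_points, inflections):
    """(start, end) index pairs of the segments between inflection points.
    Neighboring segments share their boundary point."""
    bounds = [0] + list(inflections) + [n_points - 1]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

# Usage: an S-shaped curve that bends one way, then the other.
xs = [float(t) for t in range(-5, 5)]
ys = [(x + 0.5) ** 3 for x in xs]
segments = split_segments(len(xs), inflection_points(curvature(xs, ys)))
```

Running the same routine on progressively smoother versions of a trajectory yields fewer, longer segments at the larger scales.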

Multiscale Structural Comparison
  Global matching criteria
    Minimization of total segment dissimilarity
    Complete match: the original trajectory must be formed without gaps or overlaps by concatenating the segments
  Dissimilarity $d(a_i^{(k)}, b_j^{(h)})$ between two segments $a_i^{(k)}$ and $b_j^{(h)}$
    $d(a_i^{(k)}, b_j^{(h)}) = \sqrt{ \left( g_{a_i^{(k)}} - g_{b_j^{(h)}} \right)^2 + \left( \theta_{a_i^{(k)}} - \theta_{b_j^{(h)}} \right)^2 } + \left| v_{a_i^{(k)}} - v_{b_j^{(h)}} \right| + \gamma \left( c(a_i^{(k)}) + c(b_j^{(h)}) \right)$
    $g$: gradient; $\theta$: rotation angle; $v$: velocity; $c$: replacement cost
    $v_{a_i^{(k)}} = l_{a_i^{(k)}} / n_{a_i^{(k)}}$ (length / number of points)
  [Figure: segment $a_i^{(k)}$ annotated with $g_{a_i^{(k)}}$ and $\theta_{a_i^{(k)}}$; segment $b_j^{(h)}$ annotated with $g_{b_j^{(h)}}$, $\theta_{b_j^{(h)}}$, and $v_{b_j^{(h)}}$]
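The segment dissimilarity can be sketched as follows. The example assumes each segment is already summarized by its gradient, rotation angle, length, number of points, and replacement cost. It also assumes that the gradient and rotation differences combine as a Euclidean distance and that the velocity difference enters as an absolute value. The `Segment` class, its field names, and the default `gamma` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    gradient: float     # g: overall direction of the segment
    rotation: float     # theta: rotation angle along the segment
    length: float       # l: length of the segment
    n_points: int       # n: number of points in the segment
    cost: float = 0.0   # c: replacement cost accumulated across scales

    @property
    def velocity(self):
        """v = l / n (length divided by number of points)."""
        return self.length / self.n_points

def segment_dissimilarity(a, b, gamma=1.0):
    """Shape difference + velocity difference + weighted replacement costs."""
    shape = math.sqrt((a.gradient - b.gradient) ** 2
                      + (a.rotation - b.rotation) ** 2)
    speed = abs(a.velocity - b.velocity)
    return shape + speed + gamma * (a.cost + b.cost)

# Usage: two segments that differ in direction, rotation, and speed.
a = Segment(gradient=0.0, rotation=0.0, length=10.0, n_points=5)
b = Segment(gradient=3.0, rotation=4.0, length=10.0, n_points=10, cost=0.5)
d = segment_dissimilarity(a, b)
```

The global matching step would then pick, among all scale combinations, the set of segment pairs that covers both trajectories completely while minimizing the sum of these values.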

Value-based Dissimilarity of Trajectories
  After structural matching, calculate the value-based dissimilarity for each pair of matched segments
  Attribute 1 dissimilarity
    $dv_1(a_p, b_p)$ = peak difference + (left diff. + right diff.)/2
  Attribute 2 dissimilarity
    $dv_2(a_p, b_p)$ = peak difference + (left diff. + right diff.)/2
  Dissimilarity of a matched segment pair
    $d_{val}(a_p^{(0)}, b_p^{(0)}) = \sqrt{dv_1^2 + dv_2^2} + \mathrm{cost}$
  Dissimilarity of two trajectories
    $D_{val}(A, B) = \dfrac{1}{P} \sum_{p=1}^{P} d_{val}(a_p^{(0)}, b_p^{(0)})$
  [Figure: trajectories A and B in the Attr.1-Attr.2 plane, with the center of gravity (CoG) marked]
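A small sketch of the value-based step follows. It assumes each matched segment is summarized, per attribute, by its left-end, peak, and right-end values, and that the two attribute dissimilarities combine as a Euclidean norm before the cost is added. The tuple layout and function names are hypothetical, chosen only to make the arithmetic concrete.

```python
import math

def attribute_dissimilarity(a, b):
    """a, b: (left, peak, right) values of one attribute on matched segments.
    Returns peak difference + (left diff. + right diff.) / 2."""
    left = abs(a[0] - b[0])
    peak = abs(a[1] - b[1])
    right = abs(a[2] - b[2])
    return peak + (left + right) / 2.0

def d_val(seg_a, seg_b, cost=0.0):
    """seg_a, seg_b: (attribute-1 triple, attribute-2 triple)."""
    dv1 = attribute_dissimilarity(seg_a[0], seg_b[0])
    dv2 = attribute_dissimilarity(seg_a[1], seg_b[1])
    return math.sqrt(dv1 ** 2 + dv2 ** 2) + cost

def D_val(matched_pairs):
    """Average d_val over the P matched pairs (seg_a, seg_b, cost)."""
    return sum(d_val(a, b, c) for a, b, c in matched_pairs) / len(matched_pairs)

# Usage: one pair that differs in both attributes, one identical pair.
seg_a = ((0.0, 2.0, 0.0), (0.0, 1.0, 0.0))
seg_b = ((1.0, 4.0, 1.0), (2.0, 3.0, 2.0))
total = D_val([(seg_a, seg_b, 0.0), (seg_a, seg_a, 0.0)])
```

Averaging over the matched pairs keeps the result comparable between trajectories that were split into different numbers of segments.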

Experiment 1: ASL Data
  Dataset: Australian Sign Language dataset in the UCI KDD Archive
    Time-series data on hand positions (3D) collected from 5 signers during the performance of sign language
    Used for experimental validation by Vlachos et al. in ICDE02 (as 2D trajectories) and Keogh et al. in KDD00 (as 1D time series)
    For each signer, two to five sessions were conducted
    In each session, five sign samples were recorded for each of the 95 words
    The length of each sample differed; samples typically contained about 50-150 time points
  [Figure: data hierarchy: signers A to E; sessions 1 to n per signer; words 1 to 95 per session; samples 1 to 5 per word]
  [Figure: examples of "Norway"]

Experiment 1: ASL Data
Experimental Procedure
  1. Out of the 95 signs (words), select the following 10 signs: Norway, cold, crazy, eat, forget, happy, innocent, later, lose, spend.
  2. Select a pair of words such as {Norway, cold}. For each word there are 5 sign samples, so a total of 10 samples are selected.
  3. Calculate the dissimilarities for each pair of the 10 samples by the proposed method.
  4. Construct two groups by applying average-linkage hierarchical clustering.
  5. Evaluate whether the samples are grouped correctly.
  Apply this procedure to every pair of the 10 words (45 pairs in total per session).
  [Figure: word 1 ("Norway"), samples 1 to 5, and word 2 ("cold"), samples 1 to 5 -> pairwise comparison and grouping into two clusters -> evaluate whether the groups are correct]
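The evaluation loop can be sketched with a plain average-linkage routine. This is an illustrative stand-in: the dissimilarity matrix would come from the proposed trajectory comparison, which is not reproduced here, and a grouping is counted as correct when each of the two clusters contains samples of a single word. The function names and the toy matrix are assumptions.

```python
from itertools import combinations

WORDS = ["Norway", "cold", "crazy", "eat", "forget",
         "happy", "innocent", "later", "lose", "spend"]

def average_linkage(dist, n_clusters=2):
    """Agglomerative clustering on a square dissimilarity matrix:
    repeatedly merge the two clusters with the smallest average
    pairwise dissimilarity until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[p] for j in clusters[q]]
                avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

def grouped_correctly(clusters, labels):
    """True when every cluster holds samples of only one word."""
    return all(len({labels[i] for i in c}) == 1 for c in clusters)

# Every unordered pair of the 10 selected words is evaluated once.
word_pairs = list(combinations(WORDS, 2))

# Toy matrix for one word pair (3 samples each instead of 5):
# small dissimilarity within a word, large across words.
labels = ["Norway"] * 3 + ["cold"] * 3
toy = [[0.0 if i == j else (1.0 if labels[i] == labels[j] else 5.0)
        for j in range(6)] for i in range(6)]
ok = grouped_correctly(average_linkage(toy), labels)
```

The per-session score is then the number of word pairs for which `grouped_correctly` holds, divided by the number of word pairs.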

Experiment 1: ASL Data
Results
  Session     # of correct pairs   Ratio
  andrew2     26/45                0.578
  john2       34/45                0.756
  john3       29/45                0.644
  john4       30/45                0.667
  stephen2    38/45                0.844 (best)
  stephen4    29/45                0.644
  waleed1     33/45                0.733
  waleed2     36/45                0.800
  waleed3     25/45                0.556 (worst)
  waleed4     26/45                0.578
  According to Vlachos et al., the results for the Euclidean distance, DTW, and LCSS were 0.333 (15/45), 0.444 (20/45), and 0.467 (21/45), respectively. Signer/session information was not available in their paper.

Background for 2nd Experiment
  Hospital Information System (1980s-)
    Computerization of all hospital information
    Large-scale databases
  Data: order and its record (1 order: 3 to 5 transactions)
  All clinical actions are described as orders
    Prescription: Doctor (order) -> Pharmacist
    Laboratory examination: Doctor (order) -> Laboratory

Background: HIS (2)
  Hospital Information System
    Computerization of orders
    Results of orders: data for clinical actions
  Reuse of stored data
    Laboratory examinations, prescriptions, ...: they are results from orders
    History of orders: history of clinical actions
  Data-centric hospital management

Background: HIS (3)
  How many orders are made every day?
  A case: Shimane University Hospital
    616 beds, 1000 for outpatient clinic
    # of orders: about 8000
      Prescription: 700, Injection: 700
    Actions (doctors & nurses): 4300
    Storage of data: 100 MB/day, 30 GB/year (cf. images: 2.5 TB/year)

Chronology of #Orders (2008.6.1~6.7)
  [Figure: number of orders over time, one curve per day; legend: Sun, Mon, Tue, Wed, Thu, Fri, Sat]

Chronology of #Orders (2008.6.2)
  [Figure: number of orders over the day by order type; labeled curves: Descriptions, Documents, Nursery]

#Login (2008.6.2~6.7)
  [Figure: number of logins over time; curves: Wards, Outpatient Clinic]

Reuse of Data
  Understanding the dynamic behavior of the hospital, doctors, and patients: temporal data mining
  Reuse of orders
    Analysis of clinical actions
    Data mining for temporal behaviors of the hospital or medical staff
  New type of hospital management

Co-occurrence of #Orders (2008.6.2)
  [Figure: co-occurrence plot of order counts; labels: Reservations, Prescription, Examinations, Records; time-of-day annotations: Morning, Afternoon]

Experiment 2: Data of #Orders
  Data
    # of orders for each day (2008.6.2~6.7)
  Objective
    Find groups of similar trajectories
    Analyze the relationships between the grouped trajectories
  Method
    Generate a dissimilarity matrix using the proposed method
    Perform cluster analysis using dendrograms generated by a hierarchical clustering method
  Results
    2 major groups: Outpatient/Ward + Ward

Clustering Results

Visualization for Clusters

Records + Reservations
  [Figure: trajectory plot; labels: Reservations, Records, Morning, Afternoon, Outpatient, Wards]
  Prescriptions, Examinations, Radiology, Reservations

Records and Nursery (Wards)
  [Figure: trajectory plot; labels: Nursery, Records, Morning, Afternoon, Wards, Outpatient]
  Nursery and Injections

Conclusions
  Presented a new method for trajectory mining
    Trajectory representation -> multiscale, structural comparison -> value-based dissimilarity -> clustering
  Application to the Australian Sign Language dataset
    Correct grouping ratio: 0.556 (worst), 0.844 (best)
    High robustness to noise
  Application to hospital data
    Two groups of behavior of #Orders: Outpatient, Ward
    Captured the macroscopic behavior of the university hospital
  Future work
    Extension to multidimensional trajectories

Preliminary Results (3D)
  Matching results for 3-D trajectories
