MOTION CAPTURE ASSISTED ANIMATION: TEXTURING AND SYNTHESIS



MOTION CAPTURE ASSISTED ANIMATION: TEXTURING AND SYNTHESIS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF PHYSICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Katherine Ann Pullen
July 2002

© Copyright by Katherine Ann Pullen 2002. All Rights Reserved.

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Christoph Bregler (Principal Adviser)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Patricia Burchat (Physics)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Gene Alexander (Mechanical Engineering)

Approved for the University Committee on Graduate Studies:

Abstract

This thesis discusses methods for using the information in motion capture data to assist in the creation of life-like animations. To address the problem of developing better methods for collecting motion capture data, we focus on video motion capture. A new factorization method is presented that allows one to solve for the model of the subject's skeleton from a series of video images. No markers or special suits are required.

The rest of this thesis discusses new techniques for flexible use of motion capture data after it has been collected. For the case of cyclic motions such as walking, we demonstrate a technique for complete synthesis. It begins with an analysis phase, in which the data is divided into features such as frequency bands and correlations among joint angles, and is represented with multidimensional kernel-based probability distributions. These distributions are then sampled in a synthesis phase, and optimized to yield the final animation.

We also demonstrate methods for performing texturing and synthesis of widely varying motions based on motion capture data. First we present results using principal components analysis as a basis for the texturing and synthesis. Second we discuss our most successful technique for motion capture assisted animation, in which a simple matching algorithm is used. These methods allow an animator to sketch an animation by setting a small number of keyframes on a fraction of the possible degrees of freedom. Motion capture data is then used to enhance the animation. Detail is added to degrees of freedom that were keyframed, a process we call texturing. Degrees of freedom that were not keyframed are synthesized. The methods take advantage of the fact that joint motions of an articulated figure are often correlated, so that given an incomplete data set, the missing degrees of freedom can be predicted from those that are present.

Finally, we discuss the various techniques and results, and suggest approaches for future improvements.

Acknowledgements

There are so many people to thank, people without whom this project would not have been possible. First of all, there is the Stanford Movement group: Gene Alexander, Rich Bragg, Chris Bregler, Ajit Chaudhari, Erika Chuang, Hrishi Despande, James Davis, Lorie Loeb, Lorenzo Torresani, Kingsley Willis, and Danny Yang. These people have provided many wonderful discussions and insights about my work, and often believed that what I was working on was going somewhere even when I didn't think it was. My advisors Pat Burchat and Chris Bregler have been extremely supportive throughout my graduate career, giving sound advice but always respecting the path I chose even when it was unconventional.

I am eternally grateful to the Stanford Dance Department, for completely altering the course of my life in a positive way. In particular I wish to thank the faculty members Kristine Elliot, Emilie Flink, Diane Frank, Tony Kramer, Theresa Maldenado, and Robert Moses.

I owe my sanity to the many friends I have interacted with while at Stanford, especially Wendy Adams, Debbie Bielawski, Cindy Chen, Tasha Fairfield, Anna Friedlander, Leslie Ikemoto, Karyn Ishimoto, Susie Judd, Miriam Kaplan, Elizabeth Koenig, Nadya Mason, Amit Mehta, Georg Petschnigg, Allison Rockwell, Uma Sundaram, Doug and Melissa Thomas, Melissa Starovasnik, and Esther Yuh.

I also wish to thank my family, who have always been there for me and encouraged me to pursue my dreams. It was great being in the Bay Area with the whole Tapia clan, spawned by my aunt Marjeen and Uncle Stino, who were always a comforting presence. My dear Grandma and Grandpa and Aunt Diane were always in my thoughts despite the long distances. Finally, I am especially grateful to my immediate family: Mom, Dad, and brother Walter.

Contents

Abstract
Acknowledgements

1 Introduction
  1.1 Project Motivation
  1.2 Application to Animation
  1.3 Motion Capture Methods
    1.3.1 Overview
    1.3.2 Optical
    1.3.3 Magnetic
    1.3.4 Video
  1.4 Animation Methods
    1.4.1 Keyframe Animation
    1.4.2 Physical Simulation
    1.4.3 Motion Capture
  1.5 Thesis Outline

2 Video Motion Capture
  2.1 Motivation
  2.2 Background Theory and Previous Work
    2.2.1 Overview
    2.2.2 Video-based Tracking: Gradient Formulation
    2.2.3 Example Motion Model: Affine Tracking
    2.2.4 Twist Motion Model
  2.3 Solving for Joint Locations
  2.4 Experiments

3 Sampling Kernel-Based Probability Distributions
  3.1 Overview
  3.2 Related Work
    3.2.1 Sampling Probability Densities
    3.2.2 Signal Processing
  3.3 Methods
    3.3.1 Analysis
    3.3.2 Synthesis
  3.4 Experiments
  3.5 Discussion

4 Principal Components Analysis Based Methods
  4.1 Overview
  4.2 Review of Principal Components Analysis
  4.3 Application to Motion Data
  4.4 Discussion

5 Fragment Based Methods
  5.1 Overview
  5.2 Related Work
  5.3 Methods
    5.3.1 Frequency Analysis
    5.3.2 Matching
    5.3.3 Path finding
    5.3.4 Joining
  5.4 Experiments
    5.4.1 Walking
    5.4.2 Otter Character
    5.4.3 Modern Dance
  5.5 Discussion

6 Conclusions and Discussion

Bibliography

List of Tables

2.1 Symbols used in derivations

List of Figures

1.1 Keyframe animation of a computer model. Here we illustrate the example of animating one of the spine angles of a human character to cause her to bend forward at the waist. The animator sets the upright start pose, shown at the left of the figure with the character colored red. The animator also sets the end pose, bent forward at the waist, shown to the right of the figure in red. The animator specifies how many frames should occur between these two poses, and the computer fills in the missing frames by interpolating between the two key poses. Another way to represent this process is with a graph, shown below the images of the character. The plot is of the spine angle as a function of time, and each point corresponds to one of the images at the top of the figure. Key positions are indicated with red dots, and positions interpolated by the computer are indicated with blue dots.

1.2 Comparison of keyframed data and motion capture data for root y translation for walking. (a) Keyframed data, with keyframes indicated by red stars. The points computed by the computer are indicated with blue dots. In this example, the keyframed data has been created by setting the minimum possible number of keys to describe the motion. (b) Motion capture data. Here all the points are specified by the data, and are shown with black dots. Notice that while the keyframed data is very smooth and sinusoidal, the motion capture data shows irregularities and variations. These natural fluctuations are inherent to live motion. A professional keyframe animator would achieve such detail by setting more keys.

2.1 Hopping wallaby with acquired kinematic model overlaid.

3.1 Example of a set of 4 phases during a walk cycle. The phases are as follows: (a) right foot flat on the floor; (b) right heel lifts, right toe still contacting floor; (c) left foot flat on the floor; (d) left heel lifts, left toe contacting floor. Note this is a simplified model; for example, in reality there is a moment when the left toes are on the floor at the same time the right heel is touching the floor. However, we found this simplified model gave good results in the synthesis process.

3.2 Hip angle data with the phases marked. Right foot flat, green circle; right toe in contact, magenta triangle; left foot flat, blue star; left toe in contact, red square. Note how the data has a very particular structure within each phase.

3.3 Example of decomposing data into frequency bands. Shown is the left hip angle data; higher frequencies are at the top, lower at the bottom. A Laplacian pyramid decomposition was used for this plot.

3.4 Plot of the knee angle vs. the hip angle at each point in time for two walk styles. (a) normal walk (b) funky walk.

3.5 Contour plot of a 2-D kernel-based probability distribution for the hip and knee angle, the same data as shown in the correlations plot in figure 3.4a. Four different sigmas were used, as a fraction of the standard deviation of the angle data. (a) 1/40 (b) 1/10 (c) 1/5 (d) 1/2.

3.6 Plots of hip angle data after sampling and optimization. Shown is the 5th lowest frequency band in a Laplacian pyramid decomposition. (a) Motion capture data; (b) synthetic data after sampling; (c) the same synthetic data as in figure 3.6b after optimization.

3.7 Correlation plots of hip angle data after sampling and optimization. Shown is the 5th lowest frequency band in a Laplacian pyramid decomposition, the same data as in figure 3.6. The plot is of each point at time t-2 versus the point at time t. (a) Motion capture data is shown with black circles, sampled data is shown with blue squares. Note how some of the blue squares fall outside the range one would expect them to be in based on the distribution of black circles. (b) Motion capture data is again shown with black circles, and the sampled data after being optimized is shown with magenta stars. Now the synthetic data falls within a range predicted by the real data.

3.8 Example frame from one of the output animations. Each of the characters was animated with a different set of synthetic data; note how they vary in their motions.

5.1 Illustration of the difference between texturing and synthesis. All plots are of the lowest spine x angle. (a) A keyframed curve is shown with a dashed blue line. Here we did not illustrate the key positions in this figure, only the resulting curve after the computer interpolates between them. Note its smooth appearance, as is common with computer generated curves. The solid magenta line is the result after texturing. (b) In this plot, we consider a case in which the spine angle was not animated at all, as indicated by the dashed blue line which does not change with time. This degree of freedom was synthesized, and the result is shown with the solid magenta line. (c) A plot of motion capture data is shown here for comparison. Note that its overall appearance is similar to the textured and synthesized curves. Compare also figure 1.2.

5.2 Correlation between joint angles. Shown is the left knee x angle versus the left hip x angle for each point in time for human walking data. Data points are indicated with blue circles, and points that are consecutive in time are connected by black lines. The fact that this plot has a definite form demonstrates that the angles are related to each other. (Also see figure 3.4.)

5.3 Choosing the matching angles from the keyframed data. Shown are plots of joint angle as a function of time for some of the degrees of freedom from a keyframed sketch of a humanoid character. (Not all of the degrees of freedom of this particular character are shown, to save space.) In this sketch, only the lower body degrees of freedom were animated, as can be seen by the fact that only the joint angles from the legs show any change with time. We choose some of the degrees of freedom that were animated to serve as the matching angles that will drive the rest of the animation. In this example we use the left hip x and knee x angles, as indicated with red dashed lines in the figure.

5.4 Matching angles in the motion capture data. In these plots we show the same degrees of freedom as in figure 5.3, but for the motion capture data. Here we can see that the motion for all degrees of freedom, including the upper body, is specified. The matching angles selected from the keyframed data are again indicated here by dashed red lines. These selected degrees of freedom will be compared to the keyframed data to find similar regions as described in the text.

5.5 Frequency analysis. Shown are bands 2-7 (where lower numbers refer to higher frequencies) of a Laplacian pyramid decomposition of the left hip x angle for dance motions from both keyframing and motion capture. Higher frequency bands are shown at the top of the figure, lower frequency bands at the bottom. Adding all the bands together yields the original signal. One band, shown with a red dashed line, is chosen for the matching step.

5.6 Breaking data into fragments. The bands of the keyframed data and motion capture data shown with red dashed lines in figure 5.5 are broken into fragments where the sign of the first derivative changes. (a) keyframed data. (b) motion capture data. (c) keyframed data broken into fragments. (d) motion capture data broken into fragments.

5.7 Matching. (a) A longer segment of the low frequency band of the hip x angle data (a matching angle) from figures 5.5 and 5.6 is shown here again in black, broken into fragments. To the left in blue is the first keyframed fragment from figure 5.6c. Note the position of this fragment is arbitrary here; it is shown only for purposes of comparison to the motion capture curve. We wish to find fragments of the motion capture data that are similar to it, and some possibilities are shown with dashed magenta lines. (b) The spine x angle motion capture data from the same locations in time is shown, broken into fragments at the same location as the matching angle data. If the animator wished to synthesize the spine angle data, the fragments of spine angle data from the same locations in time where the matching hip angle data was chosen would be saved, as indicated by dashed magenta lines. (c) If on the other hand the animator had already keyframed a sketch of the spine angle motion and wished to texture the result, only the high frequency bands of the spine angle data would be selected. Shown is a plot of the sum of bands 2 and 3 of a Laplacian pyramid decomposition of spine x angle motion capture data, and the chosen fragments after matching are again indicated by the dashed magenta lines.

5.8 Close-up of the matching process. Each keyframed fragment is compared to all of the motion capture fragments, and the K closest matches are kept. Shown is the process of matching the first fragment shown in figure 5.6c. (a) The keyframed fragment to be matched. (b) The keyframed fragment, shown in a thick blue line, compared to all of the motion capture fragments, shown in thin black lines. (c) Same as figure 5.8b, but the motion capture fragments have been stretched or compressed to be the same length as the keyframed fragment. (d) Same as figure 5.8c, but only the 5 closest matches are shown.

5.9 Matching and synthesis. (a) The five closest matches for a series of fragments of keyframed data are shown. The keyframed data is shown with a thick blue line; the matching motion capture fragments are shown with thin black lines. (b) An example of one of the angles being synthesized is shown, the lowest spine joint angle rotation about the x axis. The five fragments for each section come from the spine motion capture data from the same location in time as the matching hip angle fragments shown in figure 5.9a. (c) An example of a possible path through the chosen spine angle fragments is shown with a thick red line.

5.10 Texturing. Shown are bands 2-7 of a Laplacian pyramid decomposition of the lowest spine x angle for a keyframe animation of dance motion. On the left is the original keyframed data. On the right is the result after texturing, in which bands 2-3, shown in magenta, have been replaced by joined fragments of the corresponding bands of the motion capture data as described in the text.

5.11 Choosing a path by maximizing the instances of consecutive fragments. In the table we show a hypothetical example of a case where four keyframed fragments were matched, and the K = 3 closest matches of motion capture fragments were kept for each keyframed fragment. The matches at the tops of the columns are the closest of the 3 matches. Blue lines are drawn between fragments that were consecutive in the motion capture data, and the cost matrices between each set of possible matches are shown below.

5.12 Joining the ends of selected fragments. (a) Four fragments of spine angle data that were chosen in the matching step are shown. Note this graph is a close-up view of the first part of the path illustrated in figure 5.9c. There are significant discontinuities between the first and second fragments, as well as between the third and fourth. (b) The original endpoints of the fragments are marked with black circles; the new endpoints are marked with blue stars. The second and third fragments were consecutive in the motion capture data, so the new and old endpoints are the same. (c) For each fragment, the line between the old endpoints (black dashes) and the line between the new endpoints (blue solid line) are shown. (d) For each fragment, the line between the old endpoints is subtracted, and the line between the new endpoints is added, to yield the curve of joined fragments. The new endpoints are again marked with blue stars.

5.13 Smoothing at the join point. A close-up of the join between fragments 1 and 2 from figure 5.12 is shown with a red solid line. (a) The quadratic fit using the points on either side of the join point (as described in the text) is shown with a black dashed line. (b) The data after blending with the quadratic fit is shown with a blue dashed line.

5.14 Example frames from the walking animations. On the top row are some frames from the keyframed sketch, and on the bottom row are the corresponding frames after enhancement.

5.15 Example frames from animations of the otter character. On the top row are some frames from the original keyframed animation, while on the bottom are the corresponding frames after texturing.

5.16 Example frames from the dance animations. The blue character, on the left in each image, represents the keyframed sketch. The purple character, on the right in each image, shows the motion after enhancement.


Chapter 1

Introduction

1.1 Project Motivation

The ultimate goal of this project is to gain a better understanding of nuance in human and animal movement. In particular, we are interested in characterizing (1) variations in repetitive motions in a given individual and (2) differences in the same movement executed by different individuals. We use the term motion texture (a term originally suggested by Ken Perlin, a professor at New York University) to describe both of these aspects of live movement. Just as a piece of cloth has a certain texture defined by its look and feel, so does an individual's way of moving. For example, often you can recognize the identity of a person from far away without being able to see his face, just by how he is walking. Also in analogy to the case of a cloth texture is the fact that an integral part of a motion texture is the presence of stochastic properties. Just as the piece of cloth has irregularities such as slight variations in the size of each stitch, a person's motion will have variations within it. These variations may take place over relatively long time scales. For example, when we walk not every step is identical. Some steps may be slightly shorter or longer, and the upper body will respond slightly differently to each step. The variations may also occur over shorter time scales; for example, within one step of a walk cycle the motion is not perfectly smooth, but shows some natural high frequency fluctuation.

There are a number of fields of study that require a detailed understanding of the motion texture exhibited by each individual. One of the biggest applications of this concept is in biomechanics and medicine. In recent years, as the technology has improved both for the tracking of motion (see section 1.3) and in computer modelling [35], there have been many quantitative studies of human motion for the purposes of treatment and prevention of injuries [2]. Upon being injured, people will often alter their patterns of motion. Initially they may do so to avoid pain, but sometimes long after the pain has dissipated the altered movement patterns will remain. For example, such a phenomenon has been observed in patients who have torn their anterior cruciate ligaments in their knees [8]. The changes are usually subtle, and can only be detected by careful analysis of the gait of each patient. Since every individual moves with his or her own texture, the changes must be understood in the context of that individual's way of moving. It is important to understand these changes, because they often are not best for the long term health of the patient. By being able to identify these alterations in motion, a therapist may be able to recognize and help the patient overcome them more quickly.

Similarly, noting individual differences in how people move may lead to the ability to prevent certain injuries from occurring. For example, in patients with osteoarthritis of the knee, it has been shown that individual variations in the dynamic loading of the knee strongly affect the outcome of standard treatments [39]. Research is in progress to determine whether these variations also influence the tendency to develop osteoarthritis in the first place [2]. Again, these changes are subtle, and in diagnosis one must be able to distinguish a gait pattern that might lead to injury from the natural variations among the way in which different people walk.

Another application which requires a detailed understanding of live motion is in computer vision. There has been a great deal of interest in recent years in creating a fully video-based tracking system for articulated motions (see section 1.3.4). In such a system, the input to the computer would be a series of video images of the subject, and the algorithm would automatically detect the overall position and limb configuration at each frame. This problem is a difficult one, especially in the case of faster motion where the difference in position between frames may be significant. It has been proposed [15] to make these techniques more efficient with an initial coarse search stage. Such a search could be made more efficient by including probabilistic information about the motion of the subject [47].

In other words, if the algorithm could predict where the subject is likely to have moved to in the next frame based on the previous frames, it may be able to narrow down the search for the next position. As in the case of the biomechanics applications, such a search requires a detailed knowledge of the patterns of human motion.

A final application for the understanding of the way in which humans and animals tend to move is in animation. Animators are often particularly concerned with the subtle detail of a character's motion, because the nuance often reveals mood or personality. As improvements in technology have made data of live motion more readily available, there has been an increased interest in using the information in such data to assist an animator in the creation of a character's motion. This thesis focuses on the development of methods for analyzing and manipulating motion capture data for use in creating more life-like animations.

1.2 Application to Animation

Currently there are three main methods by which a computer animation of a character can be generated. Most commonly, keyframing is used, in which the animator specifies important key poses for the character at some frames, and the computer calculates what the frames between these keys should be with an interpolation technique. In a second method, physical simulation is used to drive the motion of the character. Due to the complexity of the required calculations, this method has not been used with much success for characters. Finally, in more recent years, motion capture has been used to animate characters. In motion capture, sensors are placed on a live person, and the data that describes his or her motion is collected and mapped onto the character.

As the technology for motion capture has improved and the cost has come down, there has been increasing interest in using it for character animation. The benefits of motion capture are many. Often an animator is particularly concerned with the subtle detail of how the character moves. However, achieving detail in a keyframed animation is extremely labor intensive. With motion capture data, all of the detail is immediately present, along with the nuance that gives personality to that individual's way of moving. In other words, the data contains the texture of the motion. The goal of a skilled animator is to reveal the personality or mood of the character through its motion texture.

One problem with motion capture data is that after it has been collected, it is difficult to change. In fact, partly because of this inflexibility, many animators have little interest in using motion capture data. Keyframing may be labor intensive, but one can make a character do exactly what one wants it to. It is often difficult to know exactly what motions are needed before entering a motion capture session, and afterwards when the animator sits down to create the scene, he or she may find that the data is not exactly what is needed. It could be a simple change; for example, perhaps the subject in the motion capture session walked in a straight line, but in the scene being created the character must walk in a curved path and stop at a particular point. To address this problem, many techniques have been developed to help edit motion capture data after it has been collected [23, 28, 38, 45]. However these methods may not always be sufficient. First, in editing the motion, one must be careful to not alter it in such a way that the detail is lost. Second, often the animator may want a completely different action than was captured in the data, so that it is not a simple matter of editing a motion that is already there.

It is this situation that we are especially interested in. The animator may not have the motions she wants in the data, but may have a number of other motions performed by the actor that have the style and life-like qualities that she wants for her animation. We ask the question: given the information that is present in the motion capture data, can the animator somehow generate other motions that have the texture of that motion? This question is, in fact, the central one addressed by this thesis.

The ultimate goal of this work is not to create a fully automatic method for creating animations. Instead, we are seeking methods that enable an animator to use live data to assist with the creative process of developing a character animation. The ideal situation would be as follows. The animator gets an idea about a particular style of motion she would like, but which might be difficult to animate. She has a friend who moves in that style, and can easily collect some data of that friend moving. She then sits down to create her animation, starting with the keyframe method she is familiar with, and which allows her to control the character. However, now that she has the motion capture data, she can texture her animation with the style of her friend's motions.

To achieve this dream scenario, there are several goals that must be met. First, to allow her to spontaneously collect data of her friend, the method of data collection must be simple and flexible. Current motion capture systems do not meet those criteria; they involve highly sophisticated, expensive equipment that one can only use in particular locations at prescheduled times. As a result, there is a great deal of interest in developing systems based totally on video data, which would be inexpensive and portable. Recently computer vision techniques have improved to the point that such a motion capture system may be possible. In this thesis we present another step toward achieving that goal.

The second goal that must be met is to develop new techniques for using the data after it has been collected. The methods must be flexible enough to allow the animator to have control over the results, and yet maintain the texture of the original motion capture data. The bulk of this thesis is devoted to discussing such methods.

In the remainder of this chapter, background information relevant to our work is provided. In section 1.3 we discuss currently available methods for motion capture, and the problems associated with each. In section 1.4 we discuss the advantages and disadvantages of the various methods of animation. Finally, in section 1.5 we provide a brief outline of the body of the thesis.

1.3 Motion Capture Methods

1.3.1 Overview

In general, the term motion capture refers to any method for obtaining data that describes the motion of a human or animal. Ultimately, to be useful for driving a computer generated character, this data must take the form of the angles of all the joints in the body being modelled, plus 6 more degrees of freedom for the overall rotations and translations of the body. However, the raw data may take different forms, depending on the method used for capturing the motion. Currently the two most common methods for obtaining motion capture data are optical and magnetic [32].
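To make the form of this data concrete, the sketch below shows how one frame of processed motion capture data might be stored: 6 root values plus 3 rotation angles per joint. This is an illustrative Python sketch, not the format of any particular capture system; the joint list assumes the minimal 19-joint human model described later in section 1.4.1.

    from dataclasses import dataclass
    from typing import Dict, Tuple

    # Illustrative joint set: 7 joints per side plus 5 spine joints = 19.
    JOINTS = [side + "_" + name
              for side in ("left", "right")
              for name in ("hip", "knee", "ankle", "ball", "shoulder", "elbow", "wrist")]
    JOINTS += ["spine_%d" % i for i in range(1, 6)]

    @dataclass
    class Pose:
        root_translation: Tuple[float, float, float]         # overall body position
        root_rotation: Tuple[float, float, float]            # overall body orientation
        joint_angles: Dict[str, Tuple[float, float, float]]  # per-joint x, y, z rotations

        def dof_count(self) -> int:
            # 6 root degrees of freedom plus 3 per joint (6 + 3 * 19 = 63 here)
            return 6 + 3 * len(self.joint_angles)

A captured motion is then simply a sequence of such poses, one per frame at the capture rate.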

1.3.2 Optical

In an optical system, retro-reflective markers are attached to the body of the subject. A system of cameras (the number varies widely, anywhere from 6 to over 500, depending on the particular system) surrounds the space where the subject moves. Each camera sends out a beam of infrared light, which is reflected back from the markers. After the marker positions are recorded as 2D frames, post-processing finds the 3D location of each marker at each point in time, and then solves for the joint configurations.

There are many advantages of using an optical system. The movement of the subject is relatively unencumbered, compared to other methods, and it is possible for the space in which the actions can take place to be relatively large. In addition, very high rates of data collection are possible, which is especially important for people in the biomechanics community doing detailed research on joint motions.

On the other hand, there are some disadvantages of optical systems, mainly caused by the intensive post-processing required. Usually the data cannot be collected in real time, and in fact it may be several hours or days before the final result can be viewed, which can be problematic if the user has only one day scheduled at the motion capture studio. An even greater problem is that of occluded markers. The multiple camera set-up is designed to minimize this problem, so that at any given moment, no matter what direction the subject is facing or what position he or she is in, the chances are good that each marker will be seen by at least some of the cameras. However in practice there are still often moments where markers are occluded from all of the cameras, usually by self-occlusion. For example, if the subject hunches forward too much, any markers on the front of the body will be covered. The software becomes confused when it loses sight of a marker, and a technician must spend time going through a data set fixing these moments by hand. A related problem is that it is difficult to capture more than one subject at a time, because when they get close together the markers of each overlap, and again the software becomes confused as to which marker belongs to which person.

1.3.3 Magnetic

The advantages of magnetic systems address many of the flaws of optical systems. In a magnetic system, a known magnetic field is set up, and the actor wears sensors that detect the location and orientation of each limb based on that magnetic field. This method allows for real-time data collection, and there are no problems with occlusion. On the other hand, one big drawback of this method is that it is very sensitive to the area it is performed in. Metal objects must not be nearby, and usually a field of high enough quality for data collection can only be created in a relatively small space. In addition, wires must be attached to each sensor, which makes many motions awkward for the subject. In most cases the wires run from the sensors to an external interface. In higher-end models, the system is wireless in that the wires all connect to a unit worn by the subject as a backpack. Such a system allows for much greater freedom of motion than having wires run to an external location, but still may encumber the motion of the subject, much more so than in an optical system in which the only objects attached to the person are the small reflectors.

1.3.4 Video

The technology surrounding both optical and magnetic systems is continuing to improve, and the motion capture data generated by such systems is becoming more readily available to animators interested in using it. In fact, there are numerous motion capture studios where one can have custom data collected. However, it is still a cumbersome process. The fees may be large, and a particular day must be scheduled ahead of time to collect the data. If the animator then finds the data is not exactly what is needed, another day must be scheduled, and another fee paid. An even greater problem may be that the data must be collected in a special studio. Even for an optical system, in practice the space is usually quite limited. The most dynamic motions are likely to be found in a different environment, for example an athlete in the midst of a game on the field or a dancer performing on stage. If you take the athletes off the field, have them wear special suits, stand around and be calibrated, and then ask them to perform their activity, the results will not be nearly as dynamic as if they were actually in competition.

As a result, it would be extremely useful to be able to get motion data by merely using a couple of video cameras. However, this technique is a difficult one. Standard computer vision tracking techniques do not work for an articulated figure, in which many of the motions cannot be defined by a simple affine transformation, but involve rotations about all the joints in a kinematic chain. Work has been done to address this problem by Bregler [14], in which a kinematic chain tracker was developed that can simultaneously extract the global translation and rotations of an articulated figure as well as the joint angles. A problem with this method is that an accurate model of a skeleton is required, but that information may not be known before beginning the experiments. In this thesis the tracking method is extended to allow one to solve for the joint positions, which provides the necessary information to begin tracking.
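To see why a kinematic chain defeats per-region affine tracking, consider a planar two-joint arm: the position of the forearm depends on both the shoulder and the elbow angles, so no single affine transform of the forearm region describes its motion once the shoulder swings. The following is a minimal illustrative sketch of such forward kinematics, not the tracker's actual motion model.

    import math

    def two_link_fk(shoulder_angle, elbow_angle, upper_len=1.0, lower_len=1.0):
        """Planar forward kinematics for a two-joint chain (e.g. an arm).

        Returns the elbow and wrist positions. The wrist position depends
        on *both* angles: rotating the shoulder sweeps the whole chain,
        which no single affine transform of the forearm alone captures.
        """
        elbow = (upper_len * math.cos(shoulder_angle),
                 upper_len * math.sin(shoulder_angle))
        wrist = (elbow[0] + lower_len * math.cos(shoulder_angle + elbow_angle),
                 elbow[1] + lower_len * math.sin(shoulder_angle + elbow_angle))
        return elbow, wrist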

1.4 Animation Methods

There are three main methods by which computer animations are created: (1) key frame interpolation; (2) physical simulation; and (3) motion capture. Each of these methods has its advantages and disadvantages and is appropriate in different situations. In this thesis we are mainly concerned with using motion capture data, but to put the work in context, in the following sections we review the advantages and disadvantages of each of the animation methods in more detail.

1.4.1 Keyframe Animation

Keyframe animation was used by traditional animators (animators who draw the frames of the animation by hand) long before the advent of computers. In traditional animation, one normally draws the extremes, or important landmarks in the motion, called keyframes, and then draws the intermediate frames using the keyframes as a guide. With the advent of computers and 3D graphics, people began using the computer as a tool to assist in creating an animation. A 3D model of a character is created in the computer, and the animator again specifies keyframes, this time not by drawing, but by posing the model in the computer. In this case the animator does not have to create the intermediate frames. Instead, the computer calculates them based on the keyframes, usually by interpolating between the key positions to create the motion curves that drive the action of the modelled character. This process is illustrated in figure 1.1.

On the surface, this use of the computer may appear to be a great savings in labor. However, in reality the use of computer models has its own set of difficulties associated with it. As a result, creating a computer animation can be just as labor intensive as making a traditional animation. One main reason for the high labor cost of a computer animation is that a typical articulated figure model such as a humanoid character usually has at least 50 degrees of freedom. For example, a minimal model of a human may have a left and right hip, knee, ankle, ball of foot, shoulder, elbow, and wrist, as well as 5 joints for the spinal column. Each of these joints has 3 degrees of freedom (rotation about the x, y, and z axes). In addition we must include the 6 degrees of freedom for the root translations and rotations, for a total of 63 degrees of freedom (19 joints x 3 = 57, plus 6 for the root). Making the model more realistic by adding hand and finger joints or more spine joints would further increase the complexity. The animator must then painstakingly animate each of these degrees of freedom, one at a time.

Another problem with keyframe animation as used on a computer is the interpolation process. If too few keyframes are set, the motion may be lacking in the detail we are used to seeing in live motion (figure 1.2). The curves that are generated between key poses by the computer are usually smooth splines or other forms of interpolation, which do not represent the way a live human or animal moves. Live motion contains variations at high frequencies that splines do not. An animator may achieve a high level of detail by setting more and more keyframes, even to the point of specifying the position at every time, but at the expense of more time and effort.

In fact, many other researchers before us have made the observation that part of what gives a texture its distinctive look, be it in cloth or in motion, are the variations within the texture. These variations are often referred to as noise, and one of the earliest papers to address this topic was in image texture synthesis, where random variability was added to textures with the Perlin-noise function [36]. These ideas were later applied to animations [37]. Other researchers have created motion of humans running using dynamical simulations [25] and applied hand crafted noise functions [10].
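As a concrete illustration of the interpolation problem, the sketch below interpolates a joint angle between sparse keys with a smooth cubic ease curve; the output is exactly the kind of featureless curve shown in figure 1.2a, with none of the high-frequency variation of live data. (A production package would use splines with adjustable tangents; this simplified stand-in is only meant to show the principle.)

    def ease_interpolate(key_times, key_values, t):
        """Interpolate a joint angle at time t from sparse keyframes, using
        a smoothstep (cubic ease-in/ease-out) between each pair of keys.
        The result is smooth by construction -- it cannot reproduce the
        natural high-frequency fluctuations of live motion.
        """
        if t <= key_times[0]:
            return key_values[0]
        if t >= key_times[-1]:
            return key_values[-1]
        # find the pair of keys surrounding time t
        for (t0, v0), (t1, v1) in zip(zip(key_times, key_values),
                                      zip(key_times[1:], key_values[1:])):
            if t0 <= t <= t1:
                u = (t - t0) / (t1 - t0)
                u = u * u * (3.0 - 2.0 * u)   # smoothstep easing
                return v0 + u * (v1 - v0)

    # e.g. the bend-forward spine angle of figure 1.1, keyed at frames 0 and 30:
    # ease_interpolate([0, 30], [0.0, 60.0], 15)  ->  30.0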

Figure 1.1: Keyframe animation of a computer model. Here we illustrate the example of animating one of the spine angles of a human character to cause her to bend forward at the waist. The animator sets the upright start pose, shown at the left of the figure with the character colored red. The animator also sets the end pose, bent forward at the waist, shown to the right of the figure in red. The animator specifies how many frames should occur between these two poses, and the computer fills in the missing frames by interpolating between the two key poses. Another way to represent this process is with a graph, shown below the images of the character. The plot is of the spine angle as a function of time, and each point corresponds to one of the images at the top of the figure. Key positions are indicated with red dots, and positions interpolated by the computer are indicated with blue dots.

[Figure: two stacked plots, "(a) Keyframed Data" and "(b) Motion Capture Data"; vertical axis "translation in inches", horizontal axis "time in seconds".]

Figure 1.2: Comparison of keyframed data and motion capture data for root y translation for walking. (a) Keyframed data, with keyframes indicated by red stars. The points computed by the computer are indicated with blue dots. In this example, the keyframed data has been created by setting the minimum possible number of keys to describe the motion. (b) Motion capture data. Here all the points are specified by the data, and are shown with black dots. Notice that while the keyframed data is very smooth and sinusoidal, the motion capture data shows irregularities and variations. These natural fluctuations are inherent to live motion. A professional keyframe animator would achieve such detail by setting more keys.

The disadvantage of these methods is that they require one to (1) decide on a noise function; (2) tune the function to optimize the look of each animation; and (3) accept that the resulting animations, while improved from the original, still may not look correct, or truly life-like. Our work differs in that we extract a motion texture, which inherently contains variations, from live data rather than trying to develop an artificial noise function that must be tuned before being added to the animation.

1.4.2 Physical Simulation

In order to reduce the burden on the animator, there has been a large amount of research to develop techniques based on physical simulation. These methods have been most useful for animating cloth deformations [7, 20], rigid objects [5, 6, 33], or fluids [21, 22], where the physics of the situation that determines the motion can clearly be specified. However, the problem of creating a complete physics-based model of an articulated figure for the purposes of artistic animation has not yet been solved. Most work in physical simulation so far has taken place by researchers seeking highly accurate models for use in biomechanical studies. Such models require one to account for complexities such as the fact that more than one muscle usually controls each joint; muscles exert forces on tendons, which may have non-linear properties; and joints usually are not simple hinges, but may have complex kinematics, involving sliding as well as rotating about multiple axes [19]. This type of modelling is not practical for animation, in which one wants to be able to quickly compose a wide variety of motions. It is extremely unlikely that the animator will know the proper configuration of muscles and bones or the internal energy required to move them to create the desired motion. In fact the animator probably will not know even basic starting points such as the masses of the limbs of the character, which may or may not even resemble a human if it is a fantasy creature.

However, a number of researchers have developed clever methods to make use of physics in their work. Some of the most successful such animations of articulated figures using physics based methods were done using the method of spacetime constraints [49]. The animator specifies the physical parameters of the character, for example the masses of each limb and the spring constants of the joints. High level controls such as key positions and hard constraints at contact points are also specified. The motion is determined by solving a constrained optimization problem over all of the time points at once. The solver seeks to minimize the energy of the system while maintaining the constraints set by the animator.
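In schematic form, the optimization being described looks like the following (a generic statement of the idea for orientation only, not the exact objective used in [49]):

    \min_{x(t),\,f(t)} \int_{t_0}^{t_1} \lVert f(t) \rVert^2 \, dt
    \quad \text{subject to} \quad
    M \ddot{x}(t) = f(t) + f_{\mathrm{ext}}(x(t)), \qquad
    C_j\bigl(x(t_j)\bigr) = 0,

where x(t) is the vector of the character's degrees of freedom over time, f(t) are the internal (muscle) forces whose total effort is minimized, the first constraint enforces the physics, and the C_j encode the animator's key poses and contact points. Because x and f are discretized over every time sample and solved for simultaneously, the problem size grows rapidly with the number of degrees of freedom, which is the scaling difficulty described next.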

In practice the most complicated character this technique was applied to was the Luxo Lamp (from Pixar Studios). This character had just four joints, each of which had one degree of freedom, plus two translational degrees of freedom. In other words, the motion was restricted to a plane. The resulting animations were quite life-like and appealing, but extending this method to a human model is not practical. A simple animation of the lamp making a jump took just under 10 minutes to compute. When one considers the far more complex motion of a human, for example the presence of two legs and the weight shifting back and forth between them, or the difficulty of even correctly specifying the masses and spring constants for all of the joints, it is clear that the computations would be excessively long if they could be performed at all.

Because of these difficulties, the most successful attempts at using rigorous physical simulation of human motion have been modelled after work in robotics research [42]. The problem of making a physical robot walk is analogous to generating a simulation of a walk. The legs move with input of energy from a power source, but it is difficult to know exactly how much energy to put into each leg at what time points to have the robot walk forward in a coordinated manner without falling down. The problem is complicated by the fact that it is likely that the robot is being controlled at a high level by the user, for example with a remote control that allows for the specification of direction and velocity. Solving the equations of motion and predicting the proper initial conditions to move such a system in real time would be extremely difficult if not impossible. As a result, most robots incorporate some form of feedback control into their movement. A sensor detects whether the robot is falling, moving too slowly, etc., and then responds by increasing or decreasing the energy input to the appropriate joints. The same principle can be applied to animation. Now instead of a physical robot, we have a simulated model of a human in the computer.
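The simplest version of such a feedback rule is a proportional-derivative servo at each joint. The sketch below is a generic illustration of the principle, not the structure or gains of the published controllers cited next.

    def pd_torque(theta, theta_dot, theta_desired, kp=300.0, kd=30.0):
        """Proportional-derivative feedback control: a torque that drives a
        simulated joint toward a target angle. kp pushes against position
        error, kd damps velocity to limit overshoot; the gains here are
        arbitrary illustrative values.
        """
        return kp * (theta_desired - theta) - kd * theta_dot

    # Inside a simulation loop, a higher-level controller (e.g. a state
    # machine) would set theta_desired for each joint, and the physics
    # integrator would apply the returned torque at each time step.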

Jessica Hodgins and her colleagues have developed a method of applying control systems to virtual humans [43] and applied it to create animations of humans performing athletic events such as running, biking, and a gymnast vaulting [25]. The animations were successful in that the characters clearly performed the activity being simulated. However, they are clearly simulated; one can see immediately that the motions are not life-like. In addition, each specific motion had to be treated differently. For example, for running a state machine was used; the dynamics are different depending on whether the character is in flight or has one foot in contact with the ground, which in turn differs from the gymnast who vaults off of her hands or the cyclist who is constantly in contact with the bike seat. In fact this problem of being unable to generalize the model is inherent to rigorous physical simulation; it will always be difficult to find a model and control system applicable to any motion an animator might dream up.

As a result, we felt that in developing a method for creating life-like animations, it would be more effective to use a statistical analysis of live data to generate the motion curves. Correlations among various features in the data can be modelled with probability distributions. For example, we create multidimensional probability distributions based on correlations among various features of the data that we can sample from to create new motions (chapter 3). In other work, we break the real and keyframed data into fragments, and seek the closest matches for use in texturing and synthesis (chapter 5).
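To give a flavor of the first approach (a toy sketch only; chapter 3 develops the actual method, which works on frequency-band features and adds an optimization step), sampling a kernel-based density built from observed pairs of correlated joint angles amounts to picking a stored data point at random and perturbing it by the kernel width:

    import random

    def sample_kernel_density(data_points, sigma):
        """Draw one sample from a Gaussian-kernel density estimate built on
        observed feature vectors, e.g. (hip, knee) angle pairs. Equivalent
        to picking a data point uniformly at random, then adding Gaussian
        noise with the kernel width sigma.
        """
        point = random.choice(data_points)
        return tuple(x + random.gauss(0.0, sigma) for x in point)

    # hip_knee = [(h1, k1), (h2, k2), ...]   # observed correlated angles
    # new_pair = sample_kernel_density(hip_knee, sigma=0.5)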

and motion capture data has become much more readily available. As a result, there has been increased interest in using it for creating computer animations.

The advantages of using motion capture data are clear. It immediately provides motion data for all degrees of freedom at a very high level of detail. If the motion is unusual or if extreme realism is required, it may be difficult for an animator to accurately create the motion, so it may appear much easier to simply capture the actions of a live actor performing the motions and then map the data onto a model. However, there are some distinct disadvantages to this method. One of the largest is that the data may not be exactly what the animator wants. Motion capture sessions are still costly and labor intensive, so it is best not to have to repeat them. Yet it is often difficult to know exactly what motions are desired before the session.

As a result, a great deal of research in recent years has been aimed at devising methods for editing motion capture data after it has been collected. Many of these methods allow one to vary the motions to adapt to different constraints while preserving the style of the original motion. For example, Witkin and Popovic [50] developed a method in which the motion data is warped between keyframe-like constraints set by the animator. In other work, Rose and his colleagues developed a method that uses radial basis functions and low order polynomials to interpolate between example motions while maintaining inverse kinematic constraints [44]. A number of techniques for editing motion capture data have been based on the method of spacetime constraints [49], discussed in detail in section 1.4.2. For example, Gleicher [23] developed a method which allows one to begin with an animation and interactively reposition the character. A spacetime constraints solver then finds a new motion by minimizing the distance from the old motion, subject to constraints specified by the animator. A similar method was used to allow for retargetting of motions to characters of different dimensions [23]. To make the solution tractable, dynamics were not taken into consideration at all. Dynamics was added in the method of Popovic and Witkin [38], in which the editing is performed in a reduced dimensionality space.

A more subtle problem with motion capture data is that it is not an intuitive way to begin constructing an animation. Animators are usually trained to use keyframes, and will often build an animation by first making a rough pass with few keyframes to sketch out

the motion, and then add complexity and detail on top of that. It is not easy or convenient for an animator to start creating an animation with a detailed motion he or she did not create and does not know every aspect of. In fact, in many cases what an animator may want from motion capture data is not the action, but the style of it, what we call the texture of the motion. The work in this thesis strives to create a method that allows one to either create an animation from scratch or alter an existing animation using texture information from motion capture data. Our work differs from that of many other researchers in that rather than starting with the live data, we start with a sketch created by the animator of what the final result should be, and fit the motion capture data onto that framework. As a result, it can be used to create motions substantially different from what was in the original data.

1.5 Thesis Outline

In chapter 2 we describe our work toward achieving a video motion capture system. A new theoretical framework for determining the joint positions of an articulated figure is presented, along with the background necessary to understand it. In chapters 3-5 we describe work toward developing better methods for creating animations. In chapter 3, we demonstrate a method for complete synthesis of the motion. It begins with an analysis phase, in which the data is divided into features such as frequency bands and correlations among joint angles, and is represented with multidimensional kernel-based probability distributions. These distributions are then sampled in a synthesis phase, and optimized to yield the final animation. Chapter 4 presents results using principal components analysis as a basis for the texturing and synthesis. In chapter 5 we discuss our most successful technique for motion capture assisted animation, in which a matching algorithm is used. These methods allow an animator to sketch an animation by setting a small number of keyframes on a fraction of the possible degrees of freedom. Motion capture data is then used to enhance the animation. Detail is added to degrees of freedom that were keyframed, a process we call texturing. Degrees of freedom that were not keyframed are synthesized. The methods take advantage of the fact that joint motions of an articulated figure are often correlated, so that given an incomplete data set, the missing degrees of freedom can be predicted from those

that are present. Finally, in chapter 6 we discuss the various techniques and results, and suggest approaches for future improvements.


Chapter 2

Video Motion Capture

2.1 Motivation

Our earliest experiments with motion capture assisted animation were performed with data of a wallaby hopping on a treadmill. A wallaby is a small species of kangaroo, and because it hops with its legs together, it is a good approximation to model the motion as occurring in a plane. Note that for just about any other animal motion, such as human walking, the fact that the legs move one at a time causes a twist in the hips, which in turn makes a 2D approximation of the motion appear quite unnatural.

The data of the hopping wallaby was in the form of a video of the animal, which had white markers painted onto each of its joints. Tracking the position of the markers, either by standard computer vision techniques or by hand, would be a straightforward task. However, it quickly became apparent that using the markers directly would be ineffective; the distance between the markers varied by up to 50 percent over one hop cycle. The reason for this variation is that the markers do not stay over the joints, due to deformations of the muscle and skin as it slides over the bones.

In fact, this problem of not knowing where the joints are in a figure being tracked is a common issue one must deal with in motion capture, no matter what tracking method is used. Any method that attaches markers to a subject will exhibit these problems, and other researchers have sought methods to overcome them. For example, the point-cluster method reviewed in [1] and discussed in more detail in [3] makes use of a cluster of markers

attached to each limb to more accurately determine the joint motion. We could not make use of more markers because the data had already been collected, so we turned to a markerless video-based tracking technique [14]. However, this method assumes that an accurate model of the skeleton is known before beginning the tracking process. We did not have access to that information, and so sought a method to extract it from the data.

In this chapter we describe the theoretical framework for a new factorization technique that allows one to determine the kinematic chain model for an articulated figure. For the simplified case of 2D tracking, the method was applied to the wallaby data. In section 2.2 we describe the mathematical background necessary to understand the method; in section 2.3 we present the method itself; and in section 2.4 we describe our experiments and results. This work is published in [15].

2.2 Background Theory and Previous Work

2.2.1 Overview

The new method we have developed is an extension of the video motion capture method developed by Bregler [14]. That technique was created to allow recovery of the motion of a high degree-of-freedom articulated figure from video sequences, without requiring the subject to wear any special markers or suits. Given a high-speed video sequence of a human and an accurate model of the subject, meaning all of the joint locations and angles for some initial configuration, the method returns the change in angle for each joint for each frame of the video sequence. It makes use of twists and the product of exponential maps, which allow for a linear approximation that can be used to find a robust solution for the kinematic degrees of freedom. The following sections discuss the background necessary to understand that method and our extension of it.

2.2.2 Video-based Tracking: Gradient Formulation

Many tracking techniques make use of the gradient-based formulation of Lucas and Kanade [31]. This method models the change in intensity between two frames in a video sequence

as

I(x + u_x(x,y,\phi),\; y + u_y(x,y,\phi),\; t+1) = I(x,y,t)    (2.1)

where I is the image intensity at pixel position (x,y) at time t. The motion change u(x,y,\phi) = [u_x(x,y,\phi), u_y(x,y,\phi)]^T gives the pixel displacement as a function of location (x,y) and model parameters \phi. The first order Taylor expansion of equation 2.1 yields

I_x u_x(x,y,\phi) + I_y u_y(x,y,\phi) = -I_t(x,y)    (2.2)

where I_t(x,y) is the temporal image gradient, and [I_x(x,y), I_y(x,y)]^T is the spatial gradient at pixel location (x,y). In the work presented in this thesis, u(x,y,\phi) is derived from a twist based representation of articulated figure motion. Before discussing that method, it is easiest to first consider a simpler model, affine tracking, discussed in the next section.

2.2.3 Example Motion Model: Affine Tracking

The motion of many rigid objects in a series of video frames can be described by a simple affine transformation with K = 6 degrees of freedom, represented by the parameters \phi = [a_1, a_2, a_3, a_4, d_x, d_y]^T. In that case the motion model is as follows:

u(x,y,\phi) = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \end{bmatrix}    (2.3)

Putting equation 2.3 into equation 2.2, we get

x I_x a_1 + y I_x a_2 + I_x d_x + x I_y a_3 + y I_y a_4 + I_y d_y = -I_t(x,y)    (2.4)

Now define the vector B_i at pixel location (x_i, y_i) as

B_i = [\, x_i I_{x,i} \;\; y_i I_{x,i} \;\; I_{x,i} \;\; x_i I_{y,i} \;\; y_i I_{y,i} \;\; I_{y,i} \,]    (2.5)

Here the spatial derivatives I_{x,i} and I_{y,i} are taken at pixel i.

If the object we are tracking covers N pixels, we have N equations of the form of equation 2.5. We define the N \times K matrix B and the N \times 1 vector z as follows:

B = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_N \end{bmatrix}, \qquad z = \begin{bmatrix} -I_t(x_1,y_1) \\ -I_t(x_2,y_2) \\ \vdots \\ -I_t(x_N,y_N) \end{bmatrix}    (2.6)

Then we can write the over-constrained set of N equations as follows:

B \phi = z    (2.7)

The least squares solution to equation 2.7 is

\phi = (B^T B)^{-1} B^T z    (2.8)

Because the parameters we derive from this solution are obtained from a first-order Taylor approximation (recall equation 2.2), in practice we use an iterative procedure. First we find \phi, and then warp the image at time t+1 using these parameters. Based on the warped image, we then re-compute the image gradients, and repeat the whole process until convergence. The parameters found in each step of the iteration are then combined to find the final value for \phi.
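To make the least-squares construction concrete, the following sketch assembles B and z and performs one solve of equation 2.8 in Python/NumPy. It is a minimal illustration, not the tracking code used in this work: gradients are taken with np.gradient, every pixel of the frame is used rather than a tracked region, and the iterative warping loop is omitted.

```python
import numpy as np

def affine_tracking_step(I0, I1):
    """One least-squares solve for the 6-DOF affine motion model
    (equations 2.3-2.8). Returns phi = [a1, a2, a3, a4, dx, dy].
    The column order of B here matches the returned parameter order,
    a cosmetic difference from the ordering of equation 2.5."""
    Iy, Ix = np.gradient(I0.astype(float))      # spatial gradients
    It = I1.astype(float) - I0.astype(float)    # temporal gradient
    ys, xs = np.mgrid[0:I0.shape[0], 0:I0.shape[1]]
    x, y = xs.ravel(), ys.ravel()
    Ixf, Iyf = Ix.ravel(), Iy.ravel()
    # One row per pixel: coefficients of [a1, a2, a3, a4, dx, dy].
    B = np.stack([x * Ixf, y * Ixf, x * Iyf, y * Iyf, Ixf, Iyf], axis=1)
    z = -It.ravel()
    phi, *_ = np.linalg.lstsq(B, z, rcond=None)  # solves B phi = z
    return phi
```

In practice one would warp the image at t+1 toward the image at t with the recovered parameters, recompute the gradients, and repeat until the parameters converge, exactly as described above.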

2.2.4 Twist Motion Model

To track the motion of an articulated figure such as a human, the affine model is insufficient because it does not allow for a straightforward description of the motion of a chain of joints. Instead, it is more natural to use the twist motion model, as described in detail by Bregler [14]. The following section is a review of that method in enough detail to make the new factorization method described in section 2.3 clear.

In the following derivations we will use homogeneous coordinates, and will refer to the point q_o = [x_o, y_o, z_o, 1]^T in the object frame and q_c = [x_c, y_c, z_c, 1]^T in the camera frame. The orientation of a rigid body relative to the camera frame can be represented by the transformation

q_c = G q_o    (2.9)

where G is the familiar transformation

G = \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & d_x \\ r_{2,1} & r_{2,2} & r_{2,3} & d_y \\ r_{3,1} & r_{3,2} & r_{3,3} & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} R & d \\ 0 & 1 \end{bmatrix}    (2.10)

The 3D translation d = [d_x, d_y, d_z]^T can be arbitrary, but the rotation matrix R \in SO(3) actually has only 3 degrees of freedom. Thus G has 6 degrees of freedom.

Now define a unit vector \omega that points in the direction of the rotation axis, and a vector q that gives the location of the origin of rotation. Further define the vector v = -(\omega \times q) = [v_1, v_2, v_3]^T. It can be shown [34] that an equivalent representation for G is

G = e^{\hat{\xi}\theta}    (2.11)

where \theta is the angle of rotation about the vector \omega and the matrix \hat{\xi} is defined as

\hat{\xi} = \begin{bmatrix} 0 & -\omega_z & \omega_y & v_1 \\ \omega_z & 0 & -\omega_x & v_2 \\ -\omega_y & \omega_x & 0 & v_3 \\ 0 & 0 & 0 & 0 \end{bmatrix}    (2.12)

The matrix \hat{\xi} is referred to as a twist, and the representation of equation 2.11 is referred to as an exponential map. This representation is convenient because, as will be shown in the following sections, for small changes the linear approximation has a simple form. Furthermore, it remains simple when applied to a kinematic chain such as the limbs of a human.
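The twist construction can be summarized in a few lines. The sketch below is ours, not from the thesis: it builds the matrix of equation 2.12 from an axis and a point on that axis, then evaluates the exponential map of equation 2.11 with SciPy's numerical matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

def twist(omega, q):
    """The 4x4 twist xi_hat of equation 2.12, for rotation about the
    unit axis omega through the point q, with v = -(omega x q)."""
    wx, wy, wz = omega
    v1, v2, v3 = -np.cross(omega, q)
    return np.array([[0.0, -wz,  wy,  v1],
                     [ wz, 0.0, -wx,  v2],
                     [-wy,  wx, 0.0,  v3],
                     [0.0, 0.0, 0.0, 0.0]])

# Equation 2.11: the rigid transformation G is the matrix exponential
# of the twist scaled by the rotation angle theta.
omega = np.array([0.0, 0.0, 1.0])    # rotation about the z axis
q = np.array([1.0, 0.0, 0.0])        # axis passes through (1, 0, 0)
G = expm(twist(omega, q) * (np.pi / 2))
```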

If we now consider a chain of rigid bodies linked by joints that can rotate about a fixed point in 3D (a reasonable approximation for a human arm or leg), we can represent the transformation of a point q_o on link k of the chain by a product of exponentially mapped twists. We will use s to represent the overall scale factor. The first twist \hat{\xi}_p \theta_p gives the parameters for the overall translation and rotation, often referred to as the pose. The terms \hat{\xi}_1 \theta_1, \hat{\xi}_2 \theta_2, \ldots, \hat{\xi}_k \theta_k give the rotations about joints 1 through k.

q_c = s\, e^{\hat{\xi}_p \theta_p} e^{\hat{\xi}_1 \theta_1} e^{\hat{\xi}_2 \theta_2} \cdots e^{\hat{\xi}_k \theta_k} q_o    (2.13)

The point q_o in the object frame is projected into the image location under scaled orthographic projection as follows:

\begin{bmatrix} x_{im} \\ y_{im} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} s\, e^{\hat{\xi}_p \theta_p} e^{\hat{\xi}_1 \theta_1} e^{\hat{\xi}_2 \theta_2} \cdots e^{\hat{\xi}_k \theta_k} q_o    (2.14)

The motion model u(x,y,\phi) can then be developed by using a Taylor expansion of the exponential maps and a linear approximation, as demonstrated in the following steps.

\begin{bmatrix} u_x \\ u_y \end{bmatrix} = \begin{bmatrix} x_{im}(t+1) - x_{im}(t) \\ y_{im}(t+1) - y_{im}(t) \end{bmatrix}    (2.15)

\begin{bmatrix} u_x \\ u_y \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \left[ s(t+1)\, e^{\hat{\xi}_p \theta_p(t+1)} e^{\hat{\xi}_1 \theta_1(t+1)} \cdots e^{\hat{\xi}_k \theta_k(t+1)} - s(t)\, e^{\hat{\xi}_p \theta_p(t)} e^{\hat{\xi}_1 \theta_1(t)} \cdots e^{\hat{\xi}_k \theta_k(t)} \right] q_o    (2.16)

Let s(t+1) = s + \Delta s and \theta(t+1) = \theta + \Delta\theta, and note that e^{\hat{\xi}\theta(t+1)} = e^{\hat{\xi}(\theta + \Delta\theta)} = e^{\hat{\xi}\theta} e^{\hat{\xi}\Delta\theta}. Let

P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}

Then

\begin{bmatrix} u_x \\ u_y \end{bmatrix} = P \left[ (s + \Delta s)\, e^{\hat{\xi}_p \theta_p} e^{\hat{\xi}_p \Delta\theta_p} e^{\hat{\xi}_1 \theta_1} e^{\hat{\xi}_1 \Delta\theta_1} \cdots e^{\hat{\xi}_k \theta_k} e^{\hat{\xi}_k \Delta\theta_k} - s\, e^{\hat{\xi}_p \theta_p} e^{\hat{\xi}_1 \theta_1} \cdots e^{\hat{\xi}_k \theta_k} \right] q_o    (2.17)

Recall q_c = s\, e^{\hat{\xi}_p \theta_p} e^{\hat{\xi}_1 \theta_1} e^{\hat{\xi}_2 \theta_2} \cdots e^{\hat{\xi}_k \theta_k} q_o, and note that \Delta\theta \ll 1, so e^{\hat{\xi}\Delta\theta} \approx 1 + \hat{\xi}\Delta\theta. Then

\begin{bmatrix} u_x \\ u_y \end{bmatrix} \approx P \left[ \frac{\Delta s}{s} I + \hat{\xi}_p \Delta\theta_p + \hat{\xi}_1 \Delta\theta_1 + \hat{\xi}_2 \Delta\theta_2 + \cdots + \hat{\xi}_k \Delta\theta_k \right] q_c    (2.18)

In the last step we have kept only terms that are first order in \Delta\theta and \Delta s.

Now that we have the motion model u(x,y,\phi), we follow exactly the same procedure as we did for the example affine model described above. In this case we will have K = 6 + k parameters to solve for, and \phi = [\Delta s / s,\, \omega_x^p \Delta\theta_p,\, \omega_y^p \Delta\theta_p,\, \omega_z^p \Delta\theta_p,\, v_1^p \Delta\theta_p,\, v_2^p \Delta\theta_p,\, \Delta\theta_1,\, \Delta\theta_2,\, \ldots,\, \Delta\theta_k]^T. Here we have used the superscript p to indicate twist and angle parameters associated with the pose. We will define a matrix B analogous to that of the affine case (equation 2.6). It is convenient to break B into two parts: one for the pose, in which we solve for [\Delta s / s,\, \omega_x^p \Delta\theta_p,\, \omega_y^p \Delta\theta_p,\, \omega_z^p \Delta\theta_p,\, v_1^p \Delta\theta_p,\, v_2^p \Delta\theta_p], which we will call H; and one for the joint angles, in which we solve for [\Delta\theta_1, \Delta\theta_2, \ldots, \Delta\theta_k]^T, which we will call J. Thus we will have B = [H \;\; J].

For the pose we have:

\begin{bmatrix} u_x \\ u_y \end{bmatrix}_{pose} = P \left( \hat{\xi}_p \Delta\theta_p + \frac{\Delta s}{s} I \right) q_c    (2.19)

\begin{bmatrix} u_x \\ u_y \end{bmatrix}_{pose} = \begin{bmatrix} \frac{\Delta s}{s} x - \omega_z^p \Delta\theta_p\, y + \omega_y^p \Delta\theta_p\, z + v_1^p \Delta\theta_p \\ \omega_z^p \Delta\theta_p\, x + \frac{\Delta s}{s} y - \omega_x^p \Delta\theta_p\, z + v_2^p \Delta\theta_p \end{bmatrix}    (2.20)

For the joint angles we have:

\begin{bmatrix} u_x \\ u_y \end{bmatrix}_{joint\,k} = P\, \hat{\xi}_k \Delta\theta_k\, q_c    (2.21)

\begin{bmatrix} u_x \\ u_y \end{bmatrix}_{joint\,k} = \begin{bmatrix} (-\omega_z^k y + \omega_y^k z + v_1^k)\, \Delta\theta_k \\ (\omega_z^k x - \omega_x^k z + v_2^k)\, \Delta\theta_k \end{bmatrix}    (2.22)

Again, note that for the joint angles there is only one unknown for each joint, the angle change \Delta\theta_k, while for the pose there are 6 unknowns: \Delta s / s, \omega_x^p \Delta\theta_p, \omega_y^p \Delta\theta_p, \omega_z^p \Delta\theta_p, v_1^p \Delta\theta_p, and v_2^p \Delta\theta_p. Now if we make use of equations 2.2, 2.20, and 2.22, and follow the same procedure we did for the affine case, we can define

B_i = [\, H_i \;\; J_i \,]    (2.23)

where

H_i = [\, I_{x,i} x_i + I_{y,i} y_i \;\; -I_{y,i} z_i \;\; I_{x,i} z_i \;\; I_{y,i} x_i - I_{x,i} y_i \;\; I_{x,i} \;\; I_{y,i} \,]    (2.24)

J_i = [\, J_i^1 \;\; J_i^2 \;\; \ldots \;\; J_i^k \,]    (2.25)

J_i^k = (-\omega_z^k y_i + \omega_y^k z_i + v_1^k) I_{x,i} + (\omega_z^k x_i - \omega_x^k z_i + v_2^k) I_{y,i}    (2.26)

Finally we solve for \phi in exactly the same way as we did for the affine case, using equation 2.8.
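For concreteness, the per-pixel rows of equations 2.24 and 2.26 can be written out directly. The sketch below uses our own variable names and assumes the twist parameters of joint k are already known from the initialization step, as the text describes.

```python
import numpy as np

def pose_row(Ix, Iy, x, y, z):
    """H_i of equation 2.24: coefficients of the six pose unknowns
    [ds/s, wx dth_p, wy dth_p, wz dth_p, v1 dth_p, v2 dth_p]
    for a single pixel with spatial gradients (Ix, Iy)."""
    return np.array([Ix * x + Iy * y, -Iy * z, Ix * z,
                     Iy * x - Ix * y, Ix, Iy])

def joint_entry(Ix, Iy, x, y, z, omega, v):
    """J_i^k of equation 2.26: the single coefficient of dtheta_k,
    given the known twist parameters (omega, v) of joint k."""
    wx, wy, wz = omega
    return (-wz * y + wy * z + v[0]) * Ix + (wz * x - wx * z + v[1]) * Iy
```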

2.3 Solving for Joint Locations

In describing the above methods, we assumed that we knew the parameters \omega and v for each joint. Given only one time frame, there is not enough information to solve for the twist itself, as these parameters form a product with the angles (see for example equation 2.18). That is why we leave the angles combined with the parameters for the pose in equation 2.20, and provide the twist parameters in an initialization step for finding the angles in the kinematic chain (equation 2.22). However, if we consider more than just two frames, we can find a robust solution to the problem of finding the twist parameters as well as the angles. In the following explanation, we will define quantities analogous to those in sections 2.2.3 and 2.2.4, but will label them with a tilde here. The notation may become cluttered, so table 2.1 shows what the various symbols stand for.

Table 2.1: Symbols used in derivations.

  symbol   meaning
  i        which pixel
  j        which frame
  k        which joint
  p        pose
  N        number of pixels
  t        number of frames
  K        number of degrees of freedom

First, we need to rewrite equation 2.22 in a manner analogous to equation 2.20, so that now there will be 5 unknowns: \omega_x^k \Delta\theta_k, \omega_y^k \Delta\theta_k, \omega_z^k \Delta\theta_k, v_1^k \Delta\theta_k, v_2^k \Delta\theta_k.

\begin{bmatrix} u_x \\ u_y \end{bmatrix}_{joint\,k} = \begin{bmatrix} 0 & -\omega_z^k \Delta\theta_k & \omega_y^k \Delta\theta_k & v_1^k \Delta\theta_k \\ \omega_z^k \Delta\theta_k & 0 & -\omega_x^k \Delta\theta_k & v_2^k \Delta\theta_k \end{bmatrix} q_c    (2.27)

Now we use this form for u to redefine J_i and J_i^k (see equations 2.25 and 2.26) for this situation as \tilde{J}_i and \tilde{J}_i^k, which take the form

\tilde{J}_i = [\, \tilde{J}_i^1 \;\; \tilde{J}_i^2 \;\; \ldots \;\; \tilde{J}_i^k \,]    (2.28)

\tilde{J}_i^k = [\, -I_{y,i} z_i \;\; I_{x,i} z_i \;\; I_{y,i} x_i - I_{x,i} y_i \;\; I_{x,i} \;\; I_{y,i} \,]    (2.29)

Here the significance of the superscript k on \tilde{J}_i^k is only that we assume that the known quantities I_{x,i}, I_{y,i}, x, y, and z for the ith pixel are taken from a region of the image that is affected by the motion of joint k. Also note that the form of this equation is the same as that for H_i (equation 2.24), as it should be, except for the first term of H_i, which is associated with the overall scale factor.

Next, we need to write these equations for multiple frames. Let the subscript j represent the jth frame. Then we can write \tilde{B}_{ij} = [\tilde{H}_{ij} \;\; \tilde{J}_{ij}], where \tilde{H}_{ij} is the same as equation 2.24 but specified for the jth frame, and \tilde{J}_{ij} is the same as equation 2.29 but specified for the jth frame. Finally, define the quantities \tilde{\phi}, \tilde{\phi}_j, \tilde{\phi}_j^p for the pose, and \tilde{\phi}_j^k for joint k:

\tilde{\phi} = [\, \tilde{\phi}_1 \;\; \tilde{\phi}_2 \;\; \ldots \;\; \tilde{\phi}_t \,]    (2.30)

\tilde{\phi}_j = [\, \tilde{\phi}_j^p \;\; \tilde{\phi}_j^1 \;\; \tilde{\phi}_j^2 \;\; \ldots \;\; \tilde{\phi}_j^k \,]    (2.31)

\tilde{\phi}_j^p = [\, \frac{\Delta s}{s} \;\; \omega_x^p \Delta\theta_{p,j} \;\; \omega_y^p \Delta\theta_{p,j} \;\; \omega_z^p \Delta\theta_{p,j} \;\; v_1^p \Delta\theta_{p,j} \;\; v_2^p \Delta\theta_{p,j} \,]    (2.32)

\tilde{\phi}_j^k = [\, \omega_x^k \Delta\theta_{k,j} \;\; \omega_y^k \Delta\theta_{k,j} \;\; \omega_z^k \Delta\theta_{k,j} \;\; v_1^k \Delta\theta_{k,j} \;\; v_2^k \Delta\theta_{k,j} \,]    (2.33)

Now we solve for \tilde{\phi}_j over multiple frames, again using an equation of the form of 2.7. We redefine B and z (compare to equation 2.6) for this situation as

\tilde{B} = \begin{bmatrix} \tilde{B}_1 \\ \tilde{B}_2 \\ \vdots \\ \tilde{B}_N \end{bmatrix}    (2.34)

\tilde{B}_i = [\, \tilde{B}_{i1} \;\; \tilde{B}_{i2} \;\; \ldots \;\; \tilde{B}_{it} \,]    (2.35)

\tilde{z}_i = \begin{bmatrix} -I_t(x_i,y_i)_1 \\ -I_t(x_i,y_i)_2 \\ \vdots \\ -I_t(x_i,y_i)_t \end{bmatrix}    (2.36)

\tilde{z} = \begin{bmatrix} \tilde{z}_1 \\ \tilde{z}_2 \\ \vdots \\ \tilde{z}_N \end{bmatrix}    (2.37)

Now finally we can solve the equation analogous to equation 2.7:

\tilde{B} \tilde{\phi} = \tilde{z}    (2.38)

The desired parameters can be factored from the angles using a singular value decomposition (SVD) of the result, treating one angle at a time. To illustrate the process, consider

joint k. There are 5 parameters at t times, so we can form the 5 \times t matrix A as follows:

A = \begin{bmatrix} \omega_x^k \Delta\theta_{k,1} & \omega_x^k \Delta\theta_{k,2} & \cdots & \omega_x^k \Delta\theta_{k,t} \\ \omega_y^k \Delta\theta_{k,1} & \omega_y^k \Delta\theta_{k,2} & \cdots & \omega_y^k \Delta\theta_{k,t} \\ \omega_z^k \Delta\theta_{k,1} & \omega_z^k \Delta\theta_{k,2} & \cdots & \omega_z^k \Delta\theta_{k,t} \\ v_1^k \Delta\theta_{k,1} & v_1^k \Delta\theta_{k,2} & \cdots & v_1^k \Delta\theta_{k,t} \\ v_2^k \Delta\theta_{k,1} & v_2^k \Delta\theta_{k,2} & \cdots & v_2^k \Delta\theta_{k,t} \end{bmatrix}    (2.39)

This matrix is actually of rank 1, and can be written

A = \begin{bmatrix} \omega_x^k \\ \omega_y^k \\ \omega_z^k \\ v_1^k \\ v_2^k \end{bmatrix} [\, \Delta\theta_{k,1} \;\; \Delta\theta_{k,2} \;\; \ldots \;\; \Delta\theta_{k,t} \,]    (2.40)

To get equation 2.40 from 2.39, we find [U, S, V] = \mathrm{SVD}(A). The first column of U returns c\,[\omega_x^k \; \omega_y^k \; \omega_z^k \; v_1^k \; v_2^k]^T, where c is a constant. This constant can be found by noting that (\omega_x^k)^2 + (\omega_y^k)^2 + (\omega_z^k)^2 = 1.
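In code, this rank-1 factorization amounts to one SVD and a normalization. The sketch below assumes A has already been estimated from the multi-frame solve; as with any SVD-based factorization, the recovered quantities are determined only up to a global sign.

```python
import numpy as np

def factor_twist_parameters(A):
    """Factor the 5 x t matrix A of equation 2.39 into the constant
    twist parameters [wx, wy, wz, v1, v2] and the per-frame angle
    changes dtheta (equation 2.40)."""
    U, S, Vt = np.linalg.svd(A)
    twist = U[:, 0]                  # c * [wx, wy, wz, v1, v2]
    dtheta = S[0] * Vt[0, :]         # angle changes, scaled by 1/c
    # The constant c follows from the unit rotation axis:
    # wx^2 + wy^2 + wz^2 = 1.
    c = np.linalg.norm(twist[:3])
    return twist / c, dtheta * c
```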

2.4 Experiments

As an initial test of this fitting technique, we used video data of a wallaby (a small species of kangaroo) hopping on a treadmill. The animal had markers placed on its joints, as the data was originally intended for biomechanical studies of the forces on its joints. However, it was clear that measuring the locations of the markers and computing the angles directly from that data would not be accurate, as the distance between any given pair of consecutive markers (for example, the hip and knee markers) varied by up to 50% over one hop cycle due to the soft deformations of the skin and muscle. As a result, this is a situation where a method such as ours, which can actually determine the kinematic structure of the animal, would be valuable.

Equation (2.38) is greatly simplified in 2D, because \omega_x and \omega_y are zero. Because the wallaby hops with its legs together, it is a valid approximation to assume the motion occurs in a plane. The rate of the data was 250 frames per second, yielding roughly 80 frames per hop cycle. As an initial guess for the kinematic model at each time, the markers on the joints were used. Then 8-10 successive frames were used to solve for the twist parameters. Results are shown in figure 2.1, in which we have overlaid the resulting model on the images.

Figure 2.1: Hopping wallaby with the acquired kinematic model overlaid.

Chapter 3

Sampling Kernel-Based Probability Distributions

3.1 Overview

As a step toward our goal of enabling an animator to use motion capture data to enhance an animation, we investigated whether it would be possible to synthesize motion with minimal input from the animator. Our ultimate goal was actually to create a more interactive method, in which the animator may specify a great deal about the motion. However, we felt that by demonstrating a method for complete synthesis, we would uncover a deeper understanding of how to model a motion texture. In particular, we wanted our synthesis method to accurately represent not only the style of the original motion, but the variations within it.

Here we specifically focus on cyclic motions such as walking, because this restriction simplifies the problem. We felt that walking motions would be a good test ground for our methods, because walking is such a familiar motion that a large amount of information about mood and personality is conveyed by exactly how the character is moving. In fact, because of its familiarity, walking is one of the most challenging motions to truly animate well. Any flaw in the motion is immediately noticeable to an observer.

We were also interested in the cyclic nature of walking. When using motion capture data for a cyclic motion, it is common practice to cyclify the motion. In other words, the

animator uses one step of the walk cycle and repeats it over and over again. However, in real life people do not repeat the same step over and over again; each is slightly different, due to variations inherent in the way a live being moves. Some steps may be slightly shorter or longer, and the head and upper body usually move around differently with each step. The loss of these variations during cyclification may cause the resulting animation to lose some of the life-like quality that an animator may want to achieve. In fact, we consider these variations to be part of the texture we wish to capture in our synthesis method.

We propose a new method for creating animations that addresses these issues, an earlier version of which is published in [40]. We use the motion capture data as a source of soft constraints that determine the personality, or texture, of the motion. The animator then specifies hard constraints, such as places the foot should contact the floor, or other intermediate positions. Given this information, our algorithm can synthesize motions that both capture the style of the original motion capture data and perform the exact actions that the animator desires.

To achieve such a result, we analyze the motion capture data by noting that it can be characterized by features such as correlations among joint angles, the frequency spectrum of each set of joint angle data, and the hard constraints present in the original data. We use these facts to create statistical probability distributions that represent the data, and which can be sampled to generate new animations to the specifications of the animator. The remainder of this chapter describes the details of the method.

3.2 Related Work

There has been a great deal of past research in a number of different areas that are related to our project. We divide this work into two main categories, described below.

3.2.1 Sampling Probability Densities

In our work we use a statistical representation of our motion data, and make use of sampling techniques to determine likely outcomes. Other projects in animation and speech recognition have also made use of these ideas. A Markov chain Monte Carlo algorithm was used to sample multiple animations that satisfy constraints for the case of multi-body collisions of

inanimate objects [17]. In other projects, a common method of representing data has been to use mixtures of Gaussians and hidden Markov models. Bregler [14] has used them to recognize full body motions in video sequences, and Brand [12] has used them to synthesize facial animations from example sets of audio and video. Brand and Hertzmann [13] have also used hidden Markov models along with an entropy minimization procedure to learn and synthesize motions with particular styles. Our method differs from these projects in that we want to keep as much of the information in the original data as possible, and so we have chosen to use kernel-based probability distributions to represent our data as a way to generalize it while keeping all of the fine detail.

3.2.2 Signal Processing

There are a number of earlier studies in which researchers in both image texture synthesis and motion studies have found it useful to look at their data in frequency space. In image texture synthesis, one of the earliest such approaches divided the data into multi-level Laplacian pyramids, and synthetic data was created by using a histogram matching technique [24]. This work was further developed by DeBonet [11], in which the synthesis takes into account the fact that the higher frequency bands tend to be conditionally dependent upon the lower frequency bands. We incorporate a similar approach, but applied to motion data. In animation, Unuma et al. [48] use Fourier analysis to manipulate motion data by performing interpolation, extrapolation, and transitional tasks, as well as to alter the style. Bruderlin and Williams [16] apply a number of different signal processing techniques to motion data to allow editing. Lee and Shin [29] develop a multiresolution analysis method that guarantees coordinate invariance for use in motion editing operations such as smoothing, blending, and stitching. Our work relates to these animation papers in that we also use frequency bands as a useful feature of the data, but we use them to synthesize motion data.

3.3 Methods

In this section we describe the method used to create our animations. There are two main aspects to the process: analysis and synthesis. In the analysis phase, we decide

which aspects of the real motion data are important to preserve in the final animation, and represent the data accordingly. In the synthesis phase, we use the database created in the analysis phase, as well as additional information and constraints input by the animator, to produce the final product.

3.3.1 Analysis

One of the most important questions we seek to answer with our work is: which features of the original data are important to fully describe the texture of a motion? Although the original data and the final result are in the form of joint angles and translations, we find that other features may be more useful during the process of defining and applying a motion texture to an animation. In particular, we make use of phases, frequency bands, and correlations, each of which is described below along with the reason we feel it is an important aspect of the motion to preserve during synthesis. A second problem we seek to solve is how to represent these features in such a way as to be the most useful for the synthesis process. We chose to use kernel-based probability distributions as a balance between generalizing the data and keeping all of the fine detail in the motion.

Features

Phases. By a phase we mean a segment of time during which a particular set of hard constraints is satisfied. For example, during a walk cycle, there is a phase where the right foot is flat on the floor, another phase where the right heel lifts while the right toe stays on the floor, then the left heel touches the floor, and so on (figure 3.1). The points of initiation of these phases correspond to what traditional animators often use as key frames in their animations, which is why we felt they were important to take note of in our method. In addition, just by knowing which phase the motion is in, the angle data becomes much more constrained (figure 3.2). For example, when the left foot is on the floor, the hip and knee angles are likely to fall within a different range than when the left leg is not touching the floor. If there are no hard constraints in effect, we could classify this situation as a phase as well (this is rarely the case; it occurs only if the character is airborne, as in jumping, or briefly during each step of running).

Figure 3.1: Example of a set of 4 phases during a walk cycle. The phases are as follows: (a) right foot flat on the floor; (b) right heel lifts, right toe still contacting floor; (c) left foot flat on the floor; (d) left heel lifts, left toe contacting floor. Note this is a simplified model; for example, in reality there is a moment when the left toes are on the floor at the same time the right heel is touching the floor. However, we found this simplified model gave good results in the synthesis process.

Figure 3.2: Hip angle data with the phases marked (left hip angle in degrees vs. time in seconds). Right foot flat, green circle; right toe in contact, magenta triangle; left foot flat, blue star; left toe in contact, red square. Note how the data has a very particular structure within each phase.

Figure 3.3: Example of decomposing data into frequency bands (left hip angle in degrees vs. time in seconds). Shown is the left hip angle data; higher frequencies are at the top, lower at the bottom. A Laplacian pyramid decomposition was used for this plot.

Frequency Bands. In most cases we divide the angle and translation data into frequency bands before using it for synthesis (figure 3.3). The decomposition can be made with any standard technique such as wavelets or the Laplacian pyramid, with similar results. We choose this representation because it often simplifies the form of the data, for example by separating the smooth sinusoidal overall motion of a walk cycle from the high frequency jitter associated with live motion. These two aspects of the motion are important in different ways. Variations in the lower frequency bands are associated with the large scale motions, such as the stride length or overall motion. On the other hand, we perceive variations in the higher frequency bands as jitter or wiggling around. Both forms of fluctuation are present in live motion, and they are important to preserve in any synthesis or texturing method that is to capture the essence of the original motion.

Figure 3.4: Plot of the knee angle vs. the hip angle at each point in time for two walk styles: (a) normal walk; (b) funky walk.

Correlations. In coordinated human or animal motion, the angle and translation data for each joint are related to each other. For example, when the hip angle has a certain value, the knee angle is most likely to fall within a certain range that depends upon the hip angle. Another type of correlation can be found if we look at the relationship between angle values at a given time and those at past and future times. In other words, the hip angle at time t will be related to the hip angle at times t-1 and t-2, because of the dynamics of live motion. One way to visualize such correlations is with a plot such as that in figure 3.4, where we show the knee angle vs. the hip angle for each point in time of motion capture data of two different walk styles. Notice that the shapes of the plots are similar, but not exactly the same, both appearing as a skewed horseshoe shape. The shape of such a correlation plot contains the personality information. In addition, such a plot contains information on how the data is likely to vary within a given style. Neither plot is an exact shape, but allows for some variation; given a hip angle, the knee angle may fall within a certain range. For clarity in this example we have plotted a two dimensional correlation, but in the synthesis method we usually use more than two dimensions, looking at joint probability distributions of up to 8 features at once.

Representation of the features

Merely noting the correlations among the data features is not enough, because the data are still discrete points. We really want a smooth distribution, so that we could potentially sample not just values that are actually in the data, but any of an infinite number of values that are likely to occur given the data. Probability distributions are commonly created by fitting a function such as a Gaussian to the data. However, in this situation finding such a function would be difficult if not impossible because of the complex shape of the distribution. In addition, the data may be sparse, especially if there is not much motion data available and we are looking in several dimensions at once. We want to preserve all of the information present in the original data, yet not lose any of the subtleties that create the motion texture. As a result, we chose to represent the data with a kernel-based probability distribution, in which a kernel function is placed over each of the data points and all of the kernels are summed to create a smooth distribution [9]. We used a Gaussian kernel function because of its simplicity and because it gave good results, but one could use any of a number of standard kernels.

Using the example of the correlation between knee and hip angles, we can mathematically represent the corresponding (unnormalized) two dimensional kernel-based joint probability distribution as

P(\theta, \phi) = \sum_i e^{-\left(\frac{\theta - \theta_i}{2\sigma_\theta}\right)^2} e^{-\left(\frac{\phi - \phi_i}{2\sigma_\phi}\right)^2}    (3.1)

where P(\theta, \phi) is the probability of finding hip angle \theta and knee angle \phi together, \theta_i is the hip angle at the ith point in time in the original data, \phi_i is the knee angle at the ith point in time, and \sigma_\theta and \sigma_\phi are the sigmas corresponding to the Gaussian kernels used for the hip and knee angle, respectively. In figure 3.5 we show a plot of such a distribution, again for the case of the knee vs. hip angle.

The user must choose the width of the kernel, which in our case is the sigma of the Gaussian. In the plot we show several different choices of the sigma. In plot (a) the sigma is too small to generalize the data, and in plot (d) it is too wide to capture the specifics, whereas in the intermediate plots the sigmas allow for a reasonable representation of the data. In practice we choose the sigmas automatically based on the spread of the data, usually about 1/10 the standard deviation, which corresponds to plot (b).
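Evaluating the distribution of equation 3.1 is a direct sum over the data. The sketch below is our own minimal version; the variable names and the synthetic stand-in data are hypothetical, not taken from the thesis.

```python
import numpy as np

def kernel_probability(theta, phi, theta_i, phi_i, sig_t, sig_p):
    """Unnormalized kernel-based joint probability of equation 3.1:
    one Gaussian kernel per data point (theta_i[n], phi_i[n]),
    summed over the whole data set."""
    kt = np.exp(-((theta - theta_i) / (2.0 * sig_t)) ** 2)
    kp = np.exp(-((phi - phi_i) / (2.0 * sig_p)) ** 2)
    return np.sum(kt * kp)

# Kernel widths chosen from the spread of the data, as in the text:
# roughly 1/10 of the standard deviation of each angle.
hip = np.random.randn(512) * 15.0                     # stand-in data
knee = 40.0 + 0.8 * hip + np.random.randn(512) * 5.0
p = kernel_probability(0.0, 40.0, hip, knee,
                       hip.std() / 10.0, knee.std() / 10.0)
```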

Figure 3.5: Contour plot of a 2-D kernel-based probability distribution for the hip and knee angle (in degrees), the same data as shown in the correlation plot in figure 3.4a. Four different sigmas were used, as a fraction of the standard deviation of the angle data: (a) 1/40; (b) 1/10; (c) 1/5; (d) 1/2.

3.3.2 Synthesis

The goal of the synthesis phase is to start with the distributions created during the analysis, as well as the constraints set by the animator, and create the final animation. We achieve this result by first sampling the kernel-based distributions, and then optimizing the result as described below.

Sampling

To begin synthesis, the animator specifies the hard constraints (in our work so far, the hard constraints are always foot positions on the floor, but other constraints such as intermediate leg positions could also be used) and how long each constraint should be satisfied. Given this information, we create the first angle by sampling based on phase, frequency band, and previous points in time. For example, suppose the first phase we are synthesizing is the left foot flat on the floor, and the first angle data we are synthesizing is the left hip x angle, which we will represent as \theta. To get the first synthetic angle value, a 1-D kernel-based probability distribution of likely values in the lowest frequency band at the beginning of the first phase is constructed from the data and sampled. The second point is sampled from a 2-D kernel-based conditional probability distribution of the value at time t given the value at time t-1. We fix the value of the first point, which is now time t-1, and create a conditional distribution P(\theta_t | \theta_{t-1}), where only data from the relevant phase and frequency band is used, from which we sample \theta_t, the hip x angle at time t. The third value is similarly obtained by sampling from the 3-D distribution P(\theta_t | \theta_{t-1}, \theta_{t-2}), and so on until we have N points. From then on, each subsequent point is sampled from an (N+1)-dimensional conditional distribution P(\theta_t | \theta_{t-1}, \theta_{t-2}, \ldots, \theta_{t-N}). In most cases, N = 4 gives good results. We sample to the end of that phase, and then continue sampling into the next phase, using data from the new phase, and so on until we reach the end of the time specified by the animator. This whole process is repeated for the other frequency bands, and then all of the bands are summed to yield the final sampled angle data.
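With Gaussian kernels, the conditional distribution P(\theta_t | \theta_{t-1}, \ldots, \theta_{t-N}) is itself a mixture: each window of data contributes a 1-D Gaussian component weighted by how well its past matches the already-sampled history. The sketch below draws one sample under that observation; the windowing scheme and variable names are ours, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next(history, windows, sigma):
    """Draw theta_t from the kernel-based conditional distribution.
    `windows` has one row per data window, [theta_{t-N}, ...,
    theta_{t-1}, theta_t], taken from the relevant phase and frequency
    band; `history` holds the N previously sampled values."""
    past, present = windows[:, :-1], windows[:, -1]
    # Mixture weight of each window: kernel similarity of its past
    # to the sampled history.
    d2 = np.sum(((history - past) / (2.0 * sigma)) ** 2, axis=1)
    w = np.exp(-d2)
    w /= w.sum()
    # Choose a component, then sample from its 1-D kernel. The kernel
    # exp(-(x / 2 sigma)^2) is a Gaussian with std sigma * sqrt(2).
    n = rng.choice(len(present), p=w)
    return rng.normal(present[n], sigma * np.sqrt(2.0))
```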

Now that we have one angle, we can use that information in our conditional probability distributions when we synthesize further angles. For example, suppose we now want to synthesize the hip y angle, which we will represent here as \alpha. We would sample as we did for the hip x angle, but now the distribution would be P(\alpha_t | \alpha_{t-1}, \alpha_{t-2}, \ldots, \alpha_{t-N}, \theta_t), where \alpha_t represents the hip y angle at time t, and \theta_t is again the hip x angle. We continue this process throughout the whole skeleton until all angles have been synthesized. In all cases we start from the center of the body and move outward when deciding which angles to include in the conditional distributions. The hips and overall rotations and translations are sampled first; then the knee angles are sampled from the phases, previous points, and hip angles; the ankles are sampled from the phases, previous points, and knee angles; and so on. We set up the sampling in this manner because motion is often initiated from the center of the body, and it gave good results.

Optimization

Now that we have a set of sampled data, the resulting animation will be close to what the animator wants. However, the hard constraints are probably not fully satisfied, because they entered the initial sampling only by determining which phase to sample from. In addition, it is necessary to bin the data in order to achieve the sampling, which leads to some roughness in the final result. To remove these problems, we use a gradient-based method to optimize the synthetic data.

In a sense, we have two sets of constraints. The kernel-based probability distributions can be thought of as soft constraints on the data: there is a range of possible values for all of the degrees of freedom of the synthetic data that will satisfy the distributions specified by the original data. On the other hand, the constraints specified by the animator, in our case foot positions on the floor at particular times, are hard constraints that must be satisfied exactly. We want our optimization procedure to force the hard constraints to be satisfied while not pushing the data beyond reasonable values allowed by the soft constraints.

To optimize the data based on the hard constraints, we use a gradient descent method, in which we allow the angles of the hip, knee, and ankle of the leg that is supposed to be constrained to the floor to vary. The function we want to minimize is

F_{hard} = (T x_o - x_c)^T (T x_o - x_c)    (3.2)

where T is the full set of transformation matrices that describe the motion of the leg, x_o is the initial position of the foot, and x_c is the desired constrained position that the animator has specified. The form of T is actually quite complex, as it includes rotation matrices for the x, y, and z axes for each joint. For example, for joint k, the transformation looks as follows:

T_{joint\,k} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix} \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix} \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}    (3.3)

The optimization was performed over the hip, knee, and ankle joints, so T is a product of nine matrices. As a result, to keep the computations from being unreasonably slow, the derivatives of the rotation matrices are taken numerically to choose the step direction. In other words, we compute the derivatives using

\frac{\partial F}{\partial x} = \frac{F(x + h) - F(x)}{h}    (3.4)

and similarly for y and z. The results did not depend upon the choice of h as long as it was small enough, as would be expected. We used a value of 0.001 radians for h in most cases. To choose the step direction and size, we fixed a step size of 0.05 radians and multiplied this value by the normalized gradient. Again the results were not highly dependent on the step size, as long as it was small enough. About 10-20 iterations were required for convergence.
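The hard-constraint optimization is an ordinary gradient descent with a numerically estimated gradient. The following sketch uses the values quoted above (h = 0.001 radians, a fixed step of 0.05 radians along the normalized gradient); the error function f, which maps the hip, knee, and ankle angles to F_hard of equation 3.2, is assumed to be supplied by the forward kinematics code.

```python
import numpy as np

def descend_hard(f, angles, h=0.001, step=0.05, iters=20):
    """Minimize the hard-constraint error F_hard (equation 3.2) over
    the leg angles, with the gradient taken numerically as in
    equation 3.4."""
    angles = np.asarray(angles, dtype=float)
    for _ in range(iters):
        f0 = f(angles)
        grad = np.array([(f(angles + h * e) - f0) / h
                         for e in np.eye(len(angles))])
        norm = np.linalg.norm(grad)
        if norm < 1e-12:                # gradient vanished; done
            break
        angles = angles - step * grad / norm
    return angles
```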

To optimize based on the soft constraints, we represent the data with the same kernel-based probability distributions that were used in the sampling, except that now we also include N points in the future, \theta_{t+1}, \theta_{t+2}, \ldots, \theta_{t+N}. We found that including these points reduced the number of iterations required in the optimization. Here we want to stay near a local maximum of an equation of the form of equation 3.1. Consider the example of optimizing the hip y angle, using as the other features in the distribution the hip x angle and N = 1 point on either side of the time point being optimized (in practice, we used N = 8). We will use the same notation as above, letting \theta_t represent the hip x angle at time t, and \alpha_t represent the hip y angle at time t. The sum is over all of the motion capture data points, represented by the index i, while t is the time point of the synthetic data being optimized. We write the function below in equation 3.5.

P(\alpha_t, \alpha_{t-1}, \alpha_{t+1}, \theta_t) = \sum_i e^{-\left(\frac{\alpha_t - \alpha_i}{2\sigma_\alpha}\right)^2} e^{-\left(\frac{\alpha_{t-1} - \alpha_{i-1}}{2\sigma_\alpha}\right)^2} e^{-\left(\frac{\alpha_{t+1} - \alpha_{i+1}}{2\sigma_\alpha}\right)^2} e^{-\left(\frac{\theta_t - \theta_i}{2\sigma_\theta}\right)^2}    (3.5)

We take the derivative of this distribution with respect to \alpha_t, and take a step in the direction that maximizes the probability of occurrence. The step sizes were chosen automatically to be 1/20-1/40 the size of the standard deviation of the data being optimized. This process is repeated for each time t in the synthetic data for the hip y angle \alpha. Roughly 10-20 iterations were required for convergence. A similar optimization process is repeated for the other degrees of freedom, including the appropriate other joints in the probability distribution, the same ones that were used for sampling. The results of the sampling and optimization process are shown in figures 3.6 and 3.7.

In practice, we alternate optimizing for the hard constraints with optimizing for the soft. After the initial sampling, we begin by optimizing for the hard constraints, then the soft, and repeat. In the final round of optimizing for the soft constraints, we only optimize in phases where the hard constraints would not be disturbed. For example, if the phase were such that the left foot were in contact with the floor, we would only optimize the angles of the right leg, and not the overall translations, overall rotations, or left leg angles. This final round was useful for smoothing over small discontinuities that sometimes arose at the boundary between phases after optimizing for the hard constraints. The angles in the upper body only underwent one round of optimization for the soft constraints, since they are not affected by the hard constraints that we used.
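One soft-constraint update can likewise be written compactly. The sketch below differentiates a simplified form of equation 3.5 (the \alpha factors only; the coupled \theta factor would enter the weights in the same way) with respect to \alpha_t and takes a small step up the gradient. The names and the step fraction are ours, chosen to match the 1/20-1/40 range quoted above.

```python
import numpy as np

def soft_step(alpha, t, data, sigma, frac=1.0 / 30.0):
    """Move alpha[t] toward a local maximum of the kernel distribution
    built from `data` (motion capture values of the same angle),
    using windows (alpha_{t-1}, alpha_t, alpha_{t+1})."""
    a_prev, a_mid, a_next = data[:-2], data[1:-1], data[2:]
    w = (np.exp(-((alpha[t - 1] - a_prev) / (2.0 * sigma)) ** 2)
         * np.exp(-((alpha[t] - a_mid) / (2.0 * sigma)) ** 2)
         * np.exp(-((alpha[t + 1] - a_next) / (2.0 * sigma)) ** 2))
    # dP/d(alpha_t): only the middle factor depends on alpha[t].
    grad = np.sum(w * -(alpha[t] - a_mid) / (2.0 * sigma ** 2))
    alpha[t] += frac * data.std() * np.sign(grad)
    return alpha
```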

3.4 Experiments

To demonstrate this method, we worked with motion capture data of two styles of walk, one of which was rather stylized and will be referred to as the funky walk, the other of which will be called the normal walk. We used 33 degrees of freedom to represent the character: 27 joint angles, 3 overall rotations, and 3 overall translations. In each case we used 512 time points of real data, which corresponded to about 12 steps.

Figure 3.6: Plots of hip angle data after sampling and optimization, showing the 5th lowest frequency band in a Laplacian pyramid decomposition (hip angle in degrees vs. time in seconds). (a) Motion capture data; (b) synthetic data after sampling; (c) the same synthetic data as in (b) after optimization.

Figure 3.7: Correlation plots of hip angle data after sampling and optimization, showing the 5th lowest frequency band in a Laplacian pyramid decomposition, the same data as in figure 3.6. Each point at time t-2 is plotted against the point at time t. (a) Motion capture data is shown with black circles, sampled data with blue squares. Note how some of the blue squares fall outside the range one would expect based on the distribution of black circles. (b) Motion capture data is again shown with black circles, and the sampled data after optimization is shown with magenta stars. Now the synthetic data falls within a range predicted by the real data.

For each walk we were able to synthesize large amounts of data that had the characteristics we desired. In other words, the final animations (1) had the style of the original motion capture data; (2) were not exactly the same as the original data, but showed the variation we wanted for a life-like feeling (this aspect is especially clear with the funky walk, where the original data had a large amount of fluctuation in it); and (3) satisfied the hard constraints on foot positions specified at the beginning of the synthesis.

The results are especially well illustrated for the case of the normal walk. For example, if we animate three characters offset in space but with the same motion capture data, the result looks artificial because they all move exactly the same. However, if we animate two of the characters with synthetic data created with step sizes roughly equal to that of the original data, the animation is much more convincing: even though the characters are marching in step with each other, their movements vary a bit. Furthermore, an observer cannot tell by looking which is the real data and which is the synthetic data.

For the case of the funky walk, the results are less convincing. The motion and variations are still present, but one can easily tell the synthetic data apart from the real data due to occasional high frequency glitches in the motion. These arise because our initial sampling relies upon the correlations between joints over the whole walk cycle, which are less constrained for the funky walk than for the normal walk (figure 3.4). We find that for motions with more variation, relying more on correlations to points at past and future times improves the result, but also greatly increases the computation time. The method of chapter 5, which considers correlations between joints on a more local time scale, overcomes these limitations.

The importance of the variations is especially noticeable if we create a crowd of characters from one data set. If the characters are all animated with the original data, the motion looks artificial. Even if the data is shifted in phase, an observer can still pick up on the repeating patterns, especially if not much data is available in the first place. However, with our method we can synthesize an unlimited supply of data to animate a crowd. Figure 3.8 shows an example output image from such an animation. We can create even more variability by specifying hard constraints in ways that are unlikely to be found in the data. For example, if we make the distance between steps very small, the character prances in place with a style like that of the original motion.

Figure 3.8: Example frame from one of the output animations. Each of the characters was animated with a different set of synthetic data; note how they vary in their motions.

3.5 Discussion

In this work, we aimed to create a method that gives the animator control over the result, while keeping the fine detail of live data. We address the problem of giving the animator control by dividing the data into features that may be meaningful to an animator. It is intuitive for an animator to think in terms of key frames and the length of time between them. A key frame can be thought of as a hard constraint, and we have included the specification of hard constraints in our method. Keeping motion coordinated often means that the joints are moving properly with respect to one another, which we have represented by looking at correlations among the joint angles as a function of time, and ensuring these correlations are maintained while the hard constraints are also satisfied. Finally, we suspect that much of what we perceive as texture occurs in the mid to high frequency range, which is why it is useful to divide the motion data into frequency bands. By having access to the frequency information, we may be better able to abstract the part of the texture that we want. The disadvantage of dividing the data up in this way is that it may require more effort on the part of the animator, as he or she must specify how to use the features. However, this may be exactly what an experienced animator wants in order to have full control over the result.

We address the problem of capturing fine detail with a limited data set by using kernel-based probability distributions to represent the features of interest. Such distributions allow all of the data to contribute without losing any of the fine structure. The disadvantage of this representation is that it leads to much slower computations than a method that generalizes the data more, such as mixtures of Gaussians. However, if the goal is to create the highest quality animations possible, then speed may not be a primary concern; in such cases, the issue is not speed at all, but whether or not the desired result can be achieved at all. We plan to speed up the computation by implementing an algorithm that only uses Gaussian kernels in the proximity of the current state.

Chapter 4

Principal Components Analysis Based Methods

4.1 Overview

In this chapter, we discuss some work that ultimately served as a bridge between the work presented in chapter 3 and in chapter 5. Although it was not refined as much as the work in those chapters, we include it here for two reasons. First, to answer a common question regarding our project: why not use a machine learning technique to reduce the dimensionality of the space we are working in? Second, to provide background and ideas to others who may wish to carry on this work.

Since our goal is to enable an animator to start with a few degrees of freedom and create the rest using the motion capture data, it may appear that using one of the many automatic techniques available for reducing the dimensionality of a high degree of freedom system would be an appropriate approach. Among the most common methods for reducing the dimensionality of a data set is principal components analysis (PCA). In the following we describe the approaches we took using PCA and the results we obtained, and discuss why they may not be as useful as other methods for our particular goals.

4.2 Review of Principal Components Analysis

In this section we present a brief review of the method of PCA; a more thorough description can be found in [9]. Suppose we have a data set of N vectors x, each of dimension D. In our case, N would represent the number of time points of motion data, and D the number of degrees of freedom in the skeleton we are working with. We could easily describe each vector x(t) as a sum of D basis vectors of dimension D, for example the unit vectors in each direction. If we let z_i represent basis vector i, and a_i(t) represent the corresponding coefficient at time t, we could then write x as follows:

x(t) = \sum_{i=1}^{D} a_i(t)\, z_i    (4.1)

However, that may be more information than is needed. Often when there is strong correlation between the degrees of freedom of the data set, as we expect for the joint angles of motion data, a good representation of each vector x(t) can be achieved by using a set of M basis vectors of length D, where M < D. The goal of PCA is to find the set of such vectors that provides the best possible representation of the data. In other words, we want to be able to write x(t) as

x(t) = \sum_{i=1}^{M} a_i(t)\, z_i + \sum_{i=M+1}^{D} b_i\, z_i    (4.2)

where the b_i are constants.

It can be shown that the best approximation is achieved by choosing the basis vectors to be the eigenvectors of the covariance matrix C of the data set with the M largest eigenvalues. In other words, if we let \bar{x} represent the mean of the data set, and let

C = \sum_{t=1}^{N} [x(t) - \bar{x}][x(t) - \bar{x}]^T    (4.3)

then we find the D eigenvectors z_i of C such that

C z_i = \lambda_i z_i    (4.4)

Finally, to find the best possible representation of the space with only M basis vectors, we choose the M vectors z_i with the M largest eigenvalues \lambda_i. The coefficients a_i for representing x as a sum of these M vectors may be found by using

a_i(t) = z_i^T x(t)    (4.5)

The constants b_i may be found using

b_i = z_i^T \bar{x}    (4.6)
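As a concrete reference, the sketch below computes the basis of equations 4.3-4.4 and the rank-M representation of equation 4.2. It is our own minimal version; the data matrix here is a random stand-in, not motion data.

```python
import numpy as np

def pca_basis(X):
    """PCA basis of equations 4.3-4.4. X is N x D, one pose vector
    x(t) per row. Returns the eigenvectors of the covariance matrix
    as columns of Z, sorted by decreasing eigenvalue."""
    xbar = X.mean(axis=0)
    Xc = X - xbar
    C = Xc.T @ Xc                      # covariance matrix, eq. 4.3
    lam, Z = np.linalg.eigh(C)         # eigh: C is symmetric
    order = np.argsort(lam)[::-1]      # largest eigenvalues first
    return Z[:, order], lam[order], xbar

# Rank-M representation of eq. 4.2, with a_i(t) = z_i^T x(t) (eq. 4.5)
# and b_i = z_i^T xbar (eq. 4.6). X is a hypothetical data set.
X = np.random.randn(512, 33)           # stand-in: 512 frames, 33 DOFs
Z, lam, xbar = pca_basis(X)
M = 8
A = X @ Z[:, :M]                       # coefficients a_i(t)
b = Z[:, M:].T @ xbar                  # constants b_i
X_hat = A @ Z[:, :M].T + Z[:, M:] @ b  # reconstruction, eq. 4.2
```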

4.3 Application to Motion Data

Initially, we wanted to determine whether it would be possible to represent motion data with a set of M basis vectors, where M is less than the number of degrees of freedom in our skeleton. The method of PCA was first applied to walking data, to see how the quality of the data would be affected by removing some of the eigenvectors with low eigenvalues. It was immediately apparent that the resulting animations would not be satisfactory. If even a small fraction of the eigenvectors is removed, such as 10%, the motion begins to lose some of the life-like quality that we wish to achieve. We did not pursue this simplistic approach to using PCA any further, because even if some of the eigenvectors could be removed, a greater problem is that it is still not clear how the remaining vectors would be useful to an animator. The eigenvectors represent overall modes in the motion that are not likely to be intuitive to use.

On the other hand, an animator creating an animation from scratch is already working in a space of reduced dimensionality. For example, suppose he animates a human figure with 63 degrees of freedom, and initially sketches out a forward walking motion by animating the rotation about the x-axis of the hips and knees. He has provided information for only 4 degrees of freedom, and as he builds up the animation, will want the rest of the body to move in synchrony with these joint motions. Thus, it is reasonable to consider methods that make use of this situation already set up by the animator. One approach we developed makes use of this information before subjecting the data to PCA.

One approach we developed makes use of this information before subjecting the data to PCA. Suppose the animator now wants to synthesize the remaining 57 degrees of freedom based on what he has created so far. The first step is to divide the data into frequency bands. We found that, as in the method of chapter 3, the results were much better with a multiresolution approach. In the following explanation the term data refers to one frequency band. At the end of the process, the bands are summed to yield the final angle data.

Let the vectors y(t) represent the keyframed data created by the animator. There are N such vectors y, each of length 4. We will use these vectors to synthesize the missing degrees of freedom one at a time. Suppose the first one we wish to synthesize is the lowest spine joint x angle. We organize the motion capture data into vectors x(t) with the degrees of freedom we know the animator will provide in the first 4 components:

    x(t) = [ left hip, left knee, right hip, right knee, spine x ]^T    (4.7)

The covariance matrix C of all N such vectors is created, and the 4 eigenvectors with the largest eigenvalues are found. We choose to create a basis set of 4 vectors of length 5 because we know that the animator will provide information for 4 degrees of freedom, so that is the number of coefficients we can find. To find the coefficients of the resulting basis vectors we wish to use an equation of the form of 4.5. However, y(t), the data provided by the animator, is only of length 4 while the z_i are of length 5. Thus we must temporarily truncate the z_i by removing component 5, which represents the missing degree of freedom to be synthesized. Let these truncated vectors be called \tilde{z}_i. We are making the assumption that by finding the coefficients that properly describe the first 4 degrees of freedom, we are also finding the right coefficients for the missing degree of freedom that we are trying to synthesize. This is a reasonable assumption in this particular application, because the animator has decided he wants to synthesize the rest of the degrees of freedom based on what he has already created for the hips and knees. We use an equation analogous to equation 4.5 to find the coefficients:

    a_i = \tilde{z}_i^T y                                           (4.8)

Once the coefficients are found, equation 4.2 is used to generate x at each time point, using the full-length eigenvectors.
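A sketch of this per-degree-of-freedom synthesis, applied to one frequency band; it reuses pca_basis from section 4.2, and the array names are ours:

    import numpy as np

    def synthesize_dof(mocap_known, mocap_missing, keyframed):
        """Synthesize one missing degree of freedom from 4 keyframed ones.

        mocap_known:   (N, 4) motion capture data for the animated DOFs
        mocap_missing: (N,)   motion capture data for the DOF to synthesize
        keyframed:     (T, 4) the animator's data y(t) for the same 4 DOFs
        Returns:       (T,)   synthesized curve for the missing DOF
        """
        X = np.hstack([mocap_known, mocap_missing[:, None]])  # eq. 4.7, (N, 5)
        x_bar, Z = pca_basis(X, M=5)        # all 5 eigenvectors, sorted
        Z4, z5 = Z[:, :4], Z[:, 4]          # top 4 kept, last one discarded
        Z_trunc = Z4[:4, :]                 # drop component 5: the z~_i
        A = keyframed @ Z_trunc             # a_i(t) = z~_i^T y(t)  (eq. 4.8)
        b5 = z5 @ x_bar                     # constant coefficient  (eq. 4.6)
        x_hat = A @ Z4.T + b5 * z5          # eq. 4.2 with full-length z_i
        return x_hat[:, 4]                  # the synthesized 5th component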

An immediate question that may arise is: why not synthesize all of the missing degrees of freedom at once, so that equation 4.7 becomes a vector of length 63? In practice, the results are much better when one degree of freedom is synthesized at a time, because the reconstruction error is minimized. An example may help clarify this point. Continuing our example of using the 4 angles representing the hip and knee x angles as the degrees of freedom the animator has keyframed, suppose we wanted to synthesize the motion data for the first and second spine joint x angles. We would apply the method twice, each time starting with a vector of the form of equation 4.7, where the 5th degree of freedom would be either the first or the second spine x angle data. The 4 basis vectors that we keep each time would be similar, but slightly different, as each set is specialized for that particular angle. If we instead tried to apply the method to synthesize both spine angles at the same time, so that equation 4.7 is of length 6, we would be forcing both sets of angle data to be determined from the same set of 4 basis vectors obtained by keeping the eigenvectors with the 4 largest eigenvalues, resulting in greater error in synthesizing each angle.

4.4 Discussion

When the above method is applied to walking data, the resulting motions are quite good, comparable or better in quality to those of chapter 3, in that they are life-like and capture the essence of the motion capture data. However, when we applied the method to animations and data with more variety than walking, it failed to produce good results. The motions were small in amplitude and vague, as if they had been averaged from the rest of the data. This result makes sense, because PCA finds the best vectors globally over all of the data present, and we would not expect the joints to be correlated to each other in the same way at all times if the motion is not cyclic as in walking. Yet the joints are still correlated locally, over smaller time scales. We explored a couple of methods by which to apply PCA over local portions of data. In one, we break the data into fragments at intervals, and apply the PCA method described above to one fragment at a time. This method had promise, but there were often very large discontinuities where we switched from one fragment to another. To try to overcome this problem, we made use of a sliding window about each point.

In other words, at each point in time, we use the K points on either side to create a fragment over which to perform PCA, capturing the local correlations. This method avoided the large discontinuities we observed when breaking the data into fragments, but suffered from a large amount of high frequency noise. At this point, it became clear that breaking the data into fragments to exploit local correlations had promise. However, the use of an automatic method of dimensionality reduction was not simplifying the path to our goal of assisting an animator in her creative process, but complicating it. The best results were obtained by making use of the reduced dimensionality space already set up by the animator, and performing PCA over those degrees of freedom plus the angles to be synthesized. Furthermore, in the work in chapter 3 we clearly demonstrated that, due to the highly correlated nature of joint motion, one can construct motion curves for one joint based on those of other joints. Thus, we turned to developing a technique for texturing and synthesis based on these successes, described in chapter 5.

Chapter 5

Fragment Based Methods

5.1 Overview

The methods discussed in chapters 3 and 4, using kernel-based probability distributions and PCA, were quite successful for walking data. However, they were not as successful for motions with more variety in them, such as dance. Part of the problem with these methods is that they are point-by-point methods: one point of data is synthesized at a time based on the statistics of the entire data set. When the data contains a great deal of variety, we would not expect the same probability distribution to work at every time point. We expect that there will still be correlations among the joint motions, but they will occur over more local time scales. Both methods do address local correlations in that each point is created in relation to its neighbors. In the kernel-based method, correlations to neighboring points are explicitly specified in the probability distributions being used for sampling and optimization. In the PCA based method, the coefficients for the eigenvectors are found based on the keyframed data at one particular time point. The values at that time point depend on the values at neighboring time points, so the coefficients in turn will usually vary smoothly as long as the same basis vectors are used. However, for motions with more variety it is not appropriate to use the same basis vectors for the entire motion sequence, so discontinuities arise at the points in time where we change from one basis to another. In either case, these dependencies were not enough to create a well-defined curve for a data set with a large amount of variety in it; a large amount of noise tended to appear.

In both methods, it is possible to increase the dependence on neighboring points. In the kernel-based method, one can add more and more points to the probability distributions, and in the PCA based method one can add previous points into the vector of known data being used to synthesize the rest. Making these adjustments does indeed improve the results, but not to the point of being good enough for high quality animations, and in addition the computation time becomes quite lengthy, especially in the case of the kernel-based distributions. To solve this problem, it became clear that it would make sense to consider local correlations among whole fragments of data, not just single points. However, other complications arise from using fragments of data. One must decide how to break the data into fragments, and how to measure correlations among them. The kernel-based distributions of chapter 3 do not easily lend themselves to use with fragments of data. In addition, one must decide how to smoothly join the fragments, as discontinuities will inevitably arise. In this chapter we discuss our solutions to these problems. We were able to apply them successfully to data with a high amount of variation in it.

The goal of the project described in this chapter and published in [41] is to create a method for producing animations that combines the strengths of keyframe animation with those of using motion capture data. The animator begins by creating a rough sketch of the scene he or she is creating by setting a small number of keyframes on a few degrees of freedom. The information in motion capture data is then used to add detail to the degrees of freedom that were animated, if desired, a process we call adding texture to the motion. Degrees of freedom that were not keyframed at all are synthesized. The result is an animation that does exactly what the animator wants it to, but has the nuance of live motion. This idea is illustrated graphically in figure 5.1.

5.2 Related Work

Numerous other projects besides ours have addressed the problem of synthesizing motions or altering pre-existing motions to have a particular style. In work with similar goals to ours but applied to image-based graphics, other researchers [46] develop the concept of a video texture, which enables a user to begin with a short video clip and then generate an infinite amount of similar looking video.

Figure 5.1: Illustration of the difference between texturing and synthesis. All plots show the lowest spine x angle, in degrees, as a function of time in seconds. (a) A keyframed curve is shown with a dashed blue line. We did not illustrate the key positions in this figure, only the resulting curve after the computer interpolates between them. Note its smooth appearance, as is common with computer generated curves. The solid magenta line is the result after texturing. (b) In this plot, we consider a case in which the spine angle was not animated at all, as indicated by the dashed blue line, which does not change with time. This degree of freedom was synthesized, and the result is shown with the solid magenta line. (c) A plot of motion capture data is shown here for comparison. Note that its overall appearance is similar to the textured and synthesized curves. Compare also figure 1.2.

Monte Carlo techniques are used to address the stochastic nature of the texture, and appropriate transitions are found in the motion to create a loop. The method was applied to example motions that contain both a repetitive and a stochastic component, such as fire or a flag blowing in the wind. In other interesting work, Chi and her colleagues [18] presented work with similar goals to ours, in that they sought to create a method that allows animators to enhance the style of pre-existing motions in an intuitive manner. They made use of the principles of Laban Movement Analysis to create a new interface for applying particular movement qualities to a motion.

More recently, there have been a number of projects aimed at allowing an animator to create new animations based on motion capture data. For example, in the work of Li et al. [30], the data was divided into motion textons, each of which could be modelled by a linear dynamic system. Motions were synthesized by considering the likelihood of switching from one texton to the next. Other researchers developed a method for automatic motion generation at interactive rates [4]. Here the animator sets high level constraints and a random search algorithm is used to find appropriate pieces of motion data to fill in between. In closely related work, the concept of a motion graph is defined to enable one to control a character's locomotion [26]. The motion graph contains original motion and automatically generated transitions, and allows a user to have high level control over the motions of the character. In the work of [27], a new technique is developed for controlling a character in real time using several possible interfaces. The user can choose from a set of possible actions, sketch a path on the screen, or act out the motion in front of a video camera. Animations are created by searching through a motion database using a clustering algorithm. Any of the above techniques would be more appropriate than ours in the case where the user has a large database of motions and wants high level control over the actions of the character. Our project is geared more toward an animator who may have a limited set of data of a particular style, and who wants fine control over the motion using the familiar tools of keyframing.

5.3 Methods

In human and animal motion, there are many correlations between joint actions. These correlations are especially clear for a repetitive motion like walking. For example, as the right foot steps forward, the left arm swings forward; or when the hip angle has a certain value, the knee angle is most likely to fall within a certain range. We can see these correlations graphically in a plot such as that shown in figure 5.2, where we plot the knee angle as a function of hip angle for some human walking data. The fact that the plot has a specific shape, a skewed horseshoe shape in this case, indicates that there is a relationship between the angles. These relationships hold true for more complex motions as well, but may be more local in time, specific to a particular action within a motion data set. In our method we take advantage of these relationships to synthesize degrees of freedom that have not been animated. Similarly, we can add detail to a degree of freedom that has been animated by synthesizing only the higher frequency bands, a process we refer to as texturing.

To begin, the animator must provide the following information: (1) which joint angle data should be textured; (2) which joint angle data should be synthesized; and (3) which joint angle data should be used to drive the motion in each case. (See section 5.5 for a more thorough discussion of the parameters the animator has control over.) For example, suppose an animator sketches out a walk by animating only the legs and wants to synthesize the upper body motions. A good choice for the degrees of freedom to drive the animation would be the hip x and knee x angle data (where we define the x axis as horizontal, perpendicular to the direction of walking), because the hip and knee action defines the walking motion. These data are broken into fragments, and used to find fragments of the motion capture data with hip x and knee x angles similar to what has been created by keyframing. The corresponding fragments of motion capture data for the upper body motion can then be used to animate the upper body of the computer character. We will refer to the angles used to drive the rest of the animation as the matching angles. The process of choosing these degrees of freedom is illustrated in figures 5.3 and 5.4. (Note that one could also use one or more of the 3 overall translational degrees of freedom in the matching step, in which case the matching angle data actually represents a translation, not a rotation.)

To achieve this task, we require a method to determine what constitutes a matching region of data.

Figure 5.2: Correlation between joint angles. Shown is the left knee x angle versus the left hip x angle, both in degrees, at each point in time for human walking data. Data points are indicated with blue circles, and points that are consecutive in time are connected by black lines. The fact that this plot has a definite form demonstrates that the angles are related to each other. (Also see figure 3.4.)

Figure 5.3: Choosing the matching angles from the keyframed data. Shown are plots of joint angle as a function of time for some of the degrees of freedom (spine, neck, head, shoulder, elbow, wrist, hip, knee, and ankle rotations) from a keyframed sketch of a humanoid character. (Not all of the degrees of freedom of this particular character are shown, to save space.) In this sketch, only the lower body degrees of freedom were animated, as can be seen from the fact that only the joint angles of the legs show any change with time. We choose some of the animated degrees of freedom to serve as the matching angles that will drive the rest of the animation. In this example we use the left hip x and knee x angles, as indicated with red dashed lines in the figure.

Figure 5.4: Matching angles in the motion capture data. These plots show the same degrees of freedom as in figure 5.3, but for the motion capture data. Here we can see that the motion for all degrees of freedom, including the upper body, is specified. The matching angles selected from the keyframed data are again indicated by dashed red lines. These selected degrees of freedom will be compared to the keyframed data to find similar regions, as described in the text.

The problem is complicated by the fact that the keyframed data may be of a different time scale from the real data. In addition, the ends of the fragments we choose must join together smoothly to avoid high frequency glitches in the motion. We address these issues in our method, which we divide into the following steps: (1) frequency analysis, (2) matching, (3) path finding, and (4) joining. In the following explanation, as in figures 5.3 and 5.4, we will use the example of the left hip x and left knee x angles as the matching angles. The figures in the next sections will illustrate using these matching angles to synthesize data for one of the spine x angles. Also note that we define keyframed data as the data at every time point that has been generated in the animation after setting the keyframes (figure 1.2a).

5.3.1 Frequency Analysis

In order to separate different aspects of the motion, the first step is to divide the data (both keyframed and motion capture) into frequency bands (figure 5.5). For a joint that has already been animated, we may only want to alter the mid to high frequency range, leaving the overall motion intact. For a degree of freedom that has not been animated, we may wish to synthesize all of the frequency bands. In the work described here we have used the Laplacian pyramid decomposition to create the frequency bands, because of its simplicity. In particular we used the algorithm that is thoroughly described in reference [16]. In this thesis we will use the convention of numbering the frequency bands from highest frequency to lowest, so that band 1 is the highest frequency band. The lowest possible band that can be generated depends on the length of the data set [16]. Usually we used 8 bands, so band 8 would represent the lowest frequency band, and contain any constant offset in the data. In all of the following work, we omit band 1, because we found it often added only undesirable noise to the result.
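As a rough illustration, the band split can be sketched as repeated lowpass-and-subtract. This is not the pyramid of reference [16] (a true Laplacian pyramid also downsamples, and the Gaussian filter here is purely a stand-in lowpass), but the additive-band property is the one that matters for what follows:

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def frequency_bands(signal, n_bands=8):
        """Split a 1-D curve into n_bands additive bands, band 1 highest.

        The returned list sums back to the original signal; the last band
        carries the remaining low frequencies, including any constant offset.
        """
        bands = []
        residual = np.asarray(signal, dtype=float)
        for _ in range(n_bands - 1):
            low = gaussian_filter1d(residual, sigma=2.0)  # stand-in lowpass
            bands.append(residual - low)                  # band-pass detail
            residual = low
        bands.append(residual)
        return bands

In our numbering convention, band 1 is bands[0], the band we usually discard.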

Figure 5.5: Frequency analysis. Shown are bands 2-7 (where lower numbers refer to higher frequencies) of a Laplacian pyramid decomposition of the left hip x angle for dance motions from both keyframing and motion capture. Higher frequency bands are shown at the top of the figure, lower frequency bands at the bottom. Adding all the bands together yields the original signal. One band, shown with a red dashed line, is chosen for the matching step.

5.3.2 Matching

Matching is at the heart of our method. It is the process by which fragments of data from the keyframed animation are compared to fragments of motion capture data to find similar regions. To begin this step, a low frequency band of one of the joints is chosen, in our example the left hip x angle. The results are not highly dependent upon which frequency band is chosen, as long as it is low enough to provide information about the overall motion. For example, in figure 5.5 we illustrate choosing band 6 of the Laplacian pyramid, but choosing band 4 or 5 also yields good results. Band 7 is too low, as can be seen by the lack of structure in the curve, and band 3 is too high, as it does not reflect the overall motion well enough. We find the locations in time where the first derivative of the chosen band of one of the matching angles changes sign. The real and keyframed data of all of the matching angles of that band (the left hip x and left knee x angles, in our example) are broken into fragments at those locations (figure 5.6). Note that in the figures we illustrate the process for just one of the matching angles, the hip, but the process is actually applied to all of the matching angles simultaneously. We also match the first derivative of the chosen band of each of these angles.

Including the first derivatives in the matching helps choose fragments of real data that are more closely matched not only in value but in dynamics to the keyframed data. Note that the sign change of the first derivative of only one of the angles is used to determine where to break all of the data corresponding to the matching angles into fragments, so that all are broken at the same locations.

Figure 5.6: Breaking data into fragments. The bands of the keyframed data and motion capture data shown with red dashed lines in figure 5.5 are broken into fragments where the sign of the first derivative changes. (a) Keyframed data. (b) Motion capture data. (c) Keyframed data broken into fragments. (d) Motion capture data broken into fragments.

All of the fragments of keyframed data in the chosen frequency band and their first derivatives are stepped through one by one, and for each we ask which fragment of real data is most similar (figure 5.7a). To achieve this comparison, we stretch or compress the real data fragments in time by linearly resampling them to make them the same length as the keyframed fragment. In the motion capture data, there are often unnatural poses held for relatively long periods of time for calibration purposes. To avoid choosing these fragments, any real fragment that was originally more than 4 times as long as the fragment of keyframed data being matched is rejected.
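A sketch of the fragmentation and time-warping steps, under the same NumPy assumptions as before; the function names are ours:

    import numpy as np

    def fragment_bounds(band):
        """Fragment (start, end) index pairs, cut where the derivative changes sign."""
        d = np.diff(band)
        cuts = np.where(np.sign(d[1:]) != np.sign(d[:-1]))[0] + 1
        edges = np.concatenate(([0], cuts, [len(band) - 1]))
        # consecutive fragments share an endpoint, which the joining step uses
        return [(edges[k], edges[k + 1]) for k in range(len(edges) - 1)]

    def resample(fragment, length):
        """Linearly stretch or compress a fragment to the given length."""
        t_old = np.linspace(0.0, 1.0, len(fragment))
        t_new = np.linspace(0.0, 1.0, length)
        return np.interp(t_new, t_old, fragment)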

Figure 5.7: Matching. (a) A longer segment of the low frequency band of the hip x angle data (a matching angle) from figures 5.5 and 5.6 is shown here again in black, broken into fragments. To the left, in blue, is the first keyframed fragment from figure 5.6c. The position of this fragment is arbitrary here; it is shown only for comparison with the motion capture curve. We wish to find fragments of the motion capture data that are similar to it, and some possibilities are shown with dashed magenta lines. (b) The spine x angle motion capture data from the same locations in time is shown, broken into fragments at the same locations as the matching angle data. If the animator wished to synthesize the spine angle data, the fragments of spine angle data from the same locations in time where the matching hip angle data was chosen would be saved, as indicated by the dashed magenta lines. (c) If, on the other hand, the animator had already keyframed a sketch of the spine angle motion and wished to texture the result, only the high frequency bands of the spine angle data would be selected. Shown is a plot of the sum of bands 2 and 3 of a Laplacian pyramid decomposition of the spine x angle motion capture data; the chosen fragments after matching are again indicated by the dashed magenta lines.

Figure 5.8: Close-up of the matching process. Each keyframed fragment is compared to all of the motion capture fragments, and the K closest matches are kept. Shown is the process of matching the first fragment of figure 5.6c; angles are in degrees, and the time axis is in frames (24 per second). (a) The keyframed fragment to be matched. (b) The keyframed fragment, shown as a thick blue line, compared to all of the motion capture fragments, shown as thin black lines. (c) Same as (b), but the motion capture fragments have been stretched or compressed to the same length as the keyframed fragment. (d) Same as (c), but only the 5 closest matches are shown.

We find the sum of squared differences between the keyframed fragment being matched and each of the real data fragments, and keep the K closest matches (figure 5.8). As we save fragments of the matching angles, we also save the corresponding fragments for all of the angles to be synthesized or textured (figures 5.7b, 5.7c, and 5.9). We usually omit the very highest frequency band of a Laplacian pyramid decomposition of the data, as it often contributes only undesirable noise. If an angle is being synthesized, we keep fragments of all of the frequency bands, in other words the original data. If the angle is being textured, we keep only some of the upper frequency bands. The same upper frequency bands of the keyframed data will then be replaced with those that we create from the fragments of high frequency motion capture data (figure 5.10).
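The comparison itself can be sketched as follows. For clarity this scores a single matching angle, whereas the method described above stacks all matching angles and their first derivatives into the distance; the names and the max_stretch parameter are ours:

    def k_best_matches(kf_fragment, mocap_fragments, K=5, max_stretch=4):
        """Indices of the K mocap fragments with the smallest SSD to kf_fragment.

        Fragments more than max_stretch times the keyframed fragment's length
        are rejected first, to skip held calibration poses.
        """
        n = len(kf_fragment)
        scored = []
        for idx, frag in enumerate(mocap_fragments):
            if len(frag) > max_stretch * n:
                continue
            ssd = float(np.sum((resample(frag, n) - kf_fragment) ** 2))
            scored.append((ssd, idx))
        scored.sort()
        return [idx for _, idx in scored[:K]]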

Figure 5.9: Matching and synthesis. Angles are in degrees; the time axis is in frames (24 per second). (a) The five closest matches for a series of fragments of keyframed data are shown. The keyframed data is shown with a thick blue line; the matching motion capture fragments are shown with thin black lines. (b) An example of one of the angles being synthesized is shown, the lowest spine joint rotation about the x axis. The five fragments for each section come from the spine motion capture data at the same locations in time as the matching hip angle fragments shown in (a). (c) An example of a possible path through the chosen spine angle fragments is shown with a thick red line.

Figure 5.10: Texturing. Shown are bands 2-7 of a Laplacian pyramid decomposition of the lowest spine x angle for a keyframe animation of dance motion. On the left is the original keyframed data. On the right is the result after texturing, in which bands 2-3, shown in magenta, have been replaced by joined fragments of the corresponding bands of the motion capture data, as described in the text.

The number of bands to keep is a choice made by the animator. Choosing more bands may make the motion more lifelike, but may change the motion from what the animator originally intended, especially if there were no similar motions in the motion capture data. When using the Laplacian pyramid decomposition, we found that replacing bands anywhere from 2-3 to 2-7 (where lower numbered bands are higher frequencies) yields good results; in each case it is an artistic decision how many to use.

At this point, it is sometimes beneficial to include a simple scale factor. Let A be the m x n matrix of values in the keyframed data being matched, where m is the number of matching angles and n is the length of the fragments. Let M be the m x n matrix of one of the K choices of matching fragments. Then to scale the data, we look for the scale factor s that minimizes ||sM - A||. The factor s is then multiplied by all of the data being synthesized.
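Minimizing ||sM - A|| over the single scalar s has the usual least-squares closed form, s = <M, A> / <M, M> with Frobenius inner products; a one-line sketch:

    def scale_factor(M, A):
        """Least-squares s minimizing ||s*M - A||, for (m, n) arrays M and A."""
        return float(np.sum(M * A) / np.sum(M * M))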

In practice such a scale factor is useful only in a limited set of cases, because it assumes a linear relationship between the magnitude of the matching angles and the magnitude of the rest of the angles, which is not usually likely to be true. However, it can improve the resulting animations for cases in which the keyframed data is similar to the motion capture data, and the action is fairly constrained, such as walking.

More fragments than just the closest match are saved because there is more to consider than just how close the data fragment is to the original. We must take into consideration which fragments come before and after; we would like to encourage the use of consecutive chunks of data, as described in the next section.

5.3.3 Path finding

Now that we have the K closest matches for each fragment, we must choose a path through the possible choices to create a single data set. The resulting animation is usually more pleasing if there are sections in time where fragments that were consecutive in the motion capture data are used consecutively in the path. As a result, our algorithm considers the neighbors of each fragment, and searches for paths that maximize the use of consecutive fragments. For each join between fragments, we create a cost matrix whose ij-th component gives the cost for joining fragment i with fragment j. A score of zero is given if the fragments were consecutive in the original data, and one if they were not. We then find all of the possible combinations of fragments that go through the points of zero cost. This technique is easiest to explain using an example, which is diagrammed in figure 5.11. Suppose we had 4 fragments of keyframed data to match, and saved the 3 nearest matches for each. In the illustration we show that for fragment 1 of the keyframed data, the best matches were to fragments 4, 1, and 3 of the real data; for fragment 2 of the keyframed data the closest matches were to fragments 5, 7, and 2 of the real data; and so on. We have drawn lines between fragments to indicate joins of zero cost. Here there are three best choices. One is fragments 4, 5, 6, and 2 of the real data. In this case we choose fragment 2 of the real data to match the fourth fragment of keyframed data, rather than 8 or 5, because it was originally the closest match. A second possible path would be 4, 5, 4, and 5, and a third would be 1, 2, 4, 5. All three yield two instances of zero cost. An example of an actual path taken through fragments chosen by matching is shown in figure 5.9c.
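A brute-force sketch of the path search follows. It enumerates all K^(J+1) combinations, which is only viable for short animations (section 5.5 discusses the counts), whereas the method above links runs of zero cost directly; the consecutive predicate and names are ours:

    from itertools import product

    def best_paths(match_lists, consecutive, P=3):
        """Return the P paths using the most consecutive-fragment joins.

        match_lists: one list of the K kept mocap-fragment indices per
                     keyframed fragment, closest match first.
        consecutive: predicate (i, j) -> True if mocap fragment j immediately
                     followed fragment i in the original data.
        """
        scored = []
        for path in product(*match_lists):
            cost = sum(0 if consecutive(a, b) else 1
                       for a, b in zip(path, path[1:]))
            scored.append((cost, path))
        scored.sort(key=lambda item: item[0])  # fewest non-consecutive joins first
        return [path for _, path in scored[:P]]

With fragments numbered in temporal order, the predicate can be as simple as lambda i, j: j == i + 1. Ties among equally cheap paths are broken here by match rank, which is close to, though not exactly, the tie-breaking described above.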

    keyframed fragment:   1    2    3    4
    kept matches          4    5    9    2
    (closest first):      1    7    4    8
                          3    2    6    5

    cost matrices (0 = consecutive pair; rows index the matches of the
    earlier fragment, columns the matches of the later one):
    join 1-2:  [0 1 1; 1 1 0; 1 1 1]
    join 2-3:  [1 1 0; 1 1 1; 1 1 1]
    join 3-4:  [1 1 1; 1 1 0; 1 1 1]

Figure 5.11: Choosing a path by maximizing the instances of consecutive fragments. In the table we show a hypothetical example of a case where four keyframed fragments were matched, and the K = 3 closest matches of motion capture fragments were kept for each keyframed fragment. The matches at the tops of the columns are the closest of the 3 matches. Lines are drawn between fragments that were consecutive in the motion capture data, and the cost matrices between each set of possible matches are shown below the table.

Note that for z instances of zero cost, there can be no more than z paths to consider, and in fact there will usually be far fewer, because the instances can be linked up. In our example (figure 5.11) there were four instances of zero cost, but only three possible paths that minimize the cost. The P best paths (where P is a parameter set by the animator) are saved for the animator to look at. All are valid choices, and ultimately it is an artistic decision which is best.

5.3.4 Joining

Now that we have the best possible paths, the ends may still not quite line up in cases where the fragments were not originally consecutive. For example, figure 5.9c shows data after matching and choosing the paths. To take care of these discontinuities, we join the ends together by the following process. For fragment i, we define new endpoints. The new first point will be the mean of the first point of fragment i and the last point of fragment i - 1. (Note that there is overlap between the ends of the fragments; if the last point of fragment i is placed at time t, the first point of fragment i + 1 is also at time t.) The new last point of fragment i will be the mean of the last point of fragment i and the first point of fragment i + 1. The next step is to skew the fragment to pass through the new endpoints. To achieve this warping, we define two lines, one that passes through the old endpoints, and one that passes through the new endpoints. We subtract the line that passes through the old endpoints and add the line that passes through the new endpoints to yield the shifted fragment. The process is diagrammed in figure 5.12.

In order to further smooth any remaining discontinuity, a quadratic function is fit to the join region, from N points away from the join point to within 2 points of the join point, where N is a parameter. A smaller value of N keeps the data from being altered too greatly from what was in the motion capture data, and a larger value blends more effectively between different fragments. In practice we found an N of 5-20 to be effective, corresponding to 0.2-0.8 seconds. The resulting quadratic is blended with the original joined data using a sine squared function, as given in equations 5.1 and 5.2 below.
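A sketch of the endpoint adjustment just described; the warp subtracts one line and adds the other, and the name is ours:

    import numpy as np

    def shift_to_endpoints(fragment, new_first, new_last):
        """Skew a fragment so it passes through the new endpoint values."""
        ramp = np.linspace(0.0, 1.0, len(fragment))
        old_line = fragment[0] + (fragment[-1] - fragment[0]) * ramp
        new_line = new_first + (new_last - new_first) * ramp
        # subtract the line through the old endpoints, add the new one
        return fragment - old_line + new_line

For fragment i, one would call this with new_first equal to the mean of fragment i's first point and fragment i-1's last point, and likewise for new_last.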

Figure 5.12: Joining the ends of selected fragments. Angles are in degrees; the time axis is in frames (24 per second). (a) Four fragments of spine angle data that were chosen in the matching step are shown. Note this graph is a close-up view of the first part of the path illustrated in figure 5.9c. There are significant discontinuities between the first and second fragments, as well as between the third and fourth. (b) The original endpoints of the fragments are marked with black circles; the new endpoints are marked with blue stars. The second and third fragments were consecutive in the motion capture data, so the new and old endpoints are the same. (c) For each fragment, the line between the old endpoints (black dashes) and the line between the new endpoints (blue solid line) are shown. (d) For each fragment, the line between the old endpoints is subtracted and the line between the new endpoints is added, to yield the curve of joined fragments. The new endpoints are again marked with blue stars.

Figure 5.13: Smoothing at the join point. A close-up of the join between fragments 1 and 2 from figure 5.12 is shown with a red solid line. (a) The quadratic fit using the points on either side of the join point (as described in the text) is shown with a black dashed line. (b) The data after blending with the quadratic fit is shown with a blue dashed line.

Define the blend function f as

    f(t) = \left( \cos\frac{\pi t}{2N} + 1 \right) / 2              (5.1)

where t is the time, shifted to be zero at the join point. If we define q as the quadratic function obtained from the fit, and m as the data after matching, then the data s after smoothing is

    s(t) = f(t) q(t) + (1 - f(t)) m(t)                              (5.2)

An example of this process is shown in figure 5.13.
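A sketch of the fit-and-blend step. Equation 5.1 reaches zero 2N points from the join, so this version blends over that window while fitting the quadratic only on the +/-N region stated above; that blend extent is our reading of the text, not something it states explicitly:

    import numpy as np

    def smooth_join(joined, join_idx, N=10):
        """Blend a quadratic fit into the curve around the join at join_idx."""
        t_fit = np.arange(-N, N + 1)
        keep = np.abs(t_fit) >= 2                      # stay 2 points clear of the join
        window = joined[join_idx - N : join_idx + N + 1]
        coeffs = np.polyfit(t_fit[keep], window[keep], deg=2)
        t = np.arange(-2 * N, 2 * N + 1)               # blend region
        q = np.polyval(coeffs, t)                      # quadratic q(t)
        f = (np.cos(np.pi * t / (2 * N)) + 1.0) / 2.0  # eq. 5.1
        m = joined[join_idx - 2 * N : join_idx + 2 * N + 1]
        out = joined.copy()
        out[join_idx - 2 * N : join_idx + 2 * N + 1] = f * q + (1 - f) * m  # eq. 5.2
        return out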

5.4 Experiments

We tested our method in several different situations, three of which are described below.

5.4.1 Walking

A short animation of two characters walking toward each other, slowing to a stop, stomping, and crouching was created using keyframes. Keyframes were set only on the positions of the root (not the rotations) and the feet. Inverse kinematics was used on the feet at the ankle joint, as is customary in keyframe animation of articulated figures, and the joint angles for the hips and knees were read out afterwards for use in texturing and synthesis. Each character's motion was enhanced using a different motion capture data set. The two data sets each consisted of walking with roughly a single step size, but each exhibited a very different style of walking. One was a relatively normal walk, but rather bouncy; the other was the same funky walk used in chapter 3 and was quite stylized, containing unusual arm and head gestures. The length of each data set was 440 time points at 24 frames per second, or about 18 seconds' worth of data.

A Laplacian pyramid was used for the frequency analysis. The 4th highest band was used for matching. For texturing, bands 2-3 were synthesized; for synthesis, band 2 and all lower bands were used. The upper body degrees of freedom could be synthesized successfully using a number of different combinations of matching angles. For example, both hip x angles; the left hip x and left knee x angles; or the right hip x and right knee x angles all gave good results. The most pleasing results were obtained by using data from the left hip x and left knee x angles during the stomp (the character stomps his left foot) and data from both hips for the rest of the animation. Scaling after matching also improved the results in this case; for example, when the character slows down and comes to a stop, scaling caused the body and arm motions to diminish in coordination with the legs.

The method does not directly incorporate hard constraints, so we used the following approach to maintain the feet's contact with the floor. First the pelvis and upper body motions were synthesized. Since altering the pelvis degrees of freedom causes large scale motions of the body, inverse kinematic constraints were subsequently applied to keep the feet in place on the floor. This new motion was then used for texturing the lower body motion during times when the feet were not in contact with the floor.

Figure 5.14: Example frames from the walking animations. The top row shows frames from the keyframed sketch; the bottom row shows the corresponding frames after enhancement.

The motion of the characters was much more life-like after enhancement. The upper body moved in a realistic way and responded appropriately to the varying step sizes and the stomp, even though these motions were not in the motion capture data. In addition, the style of walking for each character clearly came from the data set used for the enhancement. Some example frames are shown in figure 5.14.

5.4.2 Otter Character

Although we have focused on the idea of filling in missing degrees of freedom by synthesis or adding detail by texturing, the method can also be used to alter the style of an existing animation that already has a large amount of detail in it. To test this possibility, we used an otter character that had been keyframe-animated to run. Using the motion capture sets of walking described above, we could affect the style of the character's run by texturing the upper body motions, using the hip and knee angles as the matching angles. The effect was particularly noticeable when using the funky walk for texturing, as the otter character picked up some of the head bobs and asymmetrical arm usage of the motion capture data. Some example frames are shown in figure 5.15.

Figure 5.15: Example frames from animations of the otter character. The top row shows frames from the original keyframed animation; the bottom row shows the corresponding frames after texturing.

5.4.3 Modern Dance

In order to investigate a wider range of motions than those related to walking or running, we turned to modern dance. Unlike other styles of dance such as ballet or other classical forms, modern dance does not have a set vocabulary of motions, and yet it uses the whole body at its full range of motion. Thus it provides a situation where the correlations between joints exist only extremely locally in time, and hence a stringent test of our method. A modern dance phrase was animated by sketching only the lower body and root motions with keyframes. Motion capture data of several phrases of modern dance was collected, and a total of 1097 time points (at 24 frames per second) from 4 phrases was used. The upper body was synthesized, and the lower body textured. The same method for maintaining the feet's contact with the floor that was described above for the walking experiments was used here. The frequency analysis was the same as for the walking, except that the 6th highest band was used for matching. A lower frequency band was used because the large motions in the dance data set tended to happen over longer times than the steps in walking.

Figure 5.16: Example frames from the dance animations. The blue character, on the left in each image, represents the keyframed sketch. The purple character, on the right in each image, shows the motion after enhancement.

The results were quite successful here, especially for synthesis of the upper body motions. The motions were full and well coordinated with the lower body, and looked like something the dancer who performed for the motion capture session could have done, but did not actually do. Some example frames are shown in figure 5.16. The best results were obtained by using all of the hip and knee angles as the matching angles, but some good animations could also be created using fewer degrees of freedom. In these experiments, the effects of choosing different paths through the matched data became especially noticeable. Because of the wide variation within the data, different paths yielded significantly different upper body motions, all of which were well coordinated with the lower body.

5.5 Discussion

Presently, the two main methods by which the motions for computer animations are created are keyframing and motion capture. The method of keyframes is labor intensive, but has the advantage of allowing the animator precise control over the actions of the character.

Motion capture data has the advantage of providing a complete data set with all the detail of live motion, but the animator does not have full control over the result. In this work we present a method that combines the advantages of both, by allowing the animator to control an initial rough sketch of an animation with keyframes, and then fill in missing degrees of freedom and detail using the information in motion capture data. Missing degrees of freedom are created by synthesis, and detail is added to keyframed degrees of freedom by texturing. The same method is used for both synthesis and texturing, but in texturing only the higher frequencies of the degree of freedom being enhanced are changed.

One drawback of this method as it currently stands is that it does not directly incorporate hard constraints. As a result the texturing cannot be applied directly to cases where the feet are meant to remain in contact with the floor, unless it is combined with an inverse kinematic solver in the animation package being used. Currently we are working to remedy this deficiency. Another active area of research is to determine a more fundamental method for breaking the data into fragments. In this work we used the sign change of the derivative of one of the joint angles used for matching, because it is simple to detect and often represents a change from one movement idea to another. The exact choice of where to break the data into fragments is not as important as it may seem. What is important is that both the keyframed and real data are broken at analogous locations, which is clearly the case with our method. The method could be made more efficient by detecting more fundamental units of movement that may yield larger fragments. In fact, our method of breaking the data into fragments that are likely to be smaller than fundamental units of motion, and then reassembling units by looking for consecutive series of fragments, is a way of achieving the same result as breaking the data into larger fragments. However, due to the complexity of human motion, the problem of finding the larger fundamental units of motion in the first place is a challenging one, and an ongoing topic of research.

At this point it is worth taking a moment to review the choices the animator must make when using this method: (1) which degrees of freedom to use as matching angles; (2) which degrees of freedom should be synthesized; (3) which degrees of freedom should be textured; (4) whether to use scaling; (5) which frequency band should be used in the matching step; (6) how many of the higher frequency bands should be used for degrees of freedom being textured; and (7) how many matches should be kept.

On the surface, having to make so many choices may seem daunting, but ultimately each is an artistic decision for the animator to make. The best results will depend on the particular motion being created, and so these choices should be left up to the artist rather than being made automatically by the computer.

If one has spent some time keyframing a character, choosing the matching angles for the case of synthesis is straightforward. The most simplistic approach is to use all of the degrees of freedom for which the animator has sketched out the motion with keyframes. In many cases, however, fewer degrees of freedom can be specified and equally good results obtained. If the motion has little variation, such as walking, the results will still be pleasing even if fewer angles are chosen as the matching angles. In fact it is fascinating how correlated the motions of the human body are: given only the data in two angles, such as the hip x and knee x angles of one leg, one can automatically create life-like upper body motions that are coordinated with the lower body. However, for a motion with more variation in it, such as dancing, it is better to include more angles, to ensure good choices during matching. If fewer joints are used for matching in this case, some of the resulting paths may still be good results, but others may appear somewhat uncoordinated with the full body.

For the case of texturing, the results are often better if one chooses matching angles from joints that are the same as or near the one whose motion is being textured. Notice that we can use a degree of freedom as a matching angle to texture itself. For example, suppose we were texturing the left elbow x angle. We could use the keyframed data of the left elbow x angle in the matching step. We may also want to include some of the other degrees of freedom that are nearby, such as the left shoulder angles. The reason the results are better if we use only nearby joints is that in the case of texturing we want to keep the motion close to what the animator has already created. By matching only left arm joints to texture left arm angles, we ensure that we are choosing fragments of motion capture data that specifically match what the left arm is doing. If the motion capture data set is not large, there may not be any places in the data that closely match what the animator has created for the whole body motion.

On the other hand, there are likely to be pieces of left elbow x angle data that are close to what is in the keyframed data, and thus appropriate for use in texturing.

Scaling is quite useful for some animations, but creates poor results in others, so we leave it up to the animator whether to use it. It is very useful if the keyframe animation contains motion similar to what is in the data, but at a different magnitude. For example, in the walking animation described in section 5.4.1, when the character walks with smaller and smaller steps until he comes to a complete stop, it made sense to have scaling turned on. Thus the upper body motions selected from the motion capture data of walking, for use in synthesizing the upper body motions of the character, would appropriately get smaller and smaller as he slowed to a stop. However, if one is synthesizing motion with more variety in it, such as the dance motions, the use of scaling sometimes creates inferior results. In the case of the dance motion capture data, the motion was often extreme; scaled even slightly up or down, it would often produce artifacts like arms passing through the body or uncoordinated motion. In addition, scaling can create problems during the path finding step. If we find a path with a number of consecutive fragments, but all of the fragments have been scaled to different degrees, they will no longer join together as nicely as if they had not been scaled. A possible solution would be to redo the scaling after selecting the paths, ensuring that any consecutive series of fragments has the same scale factor applied. Work is in progress to investigate this possibility.

Choosing which degrees of freedom to texture and which to synthesize is also straightforward. If the animator has set some keyframes on a degree of freedom, but wants the motion to have more detail, then it should be textured. In other words, the keyframed data is broken into frequency bands, and the lower bands are left unchanged, while the higher bands are altered as described in this chapter. On the other hand, if a degree of freedom was not keyframed at all, and the animator wants it to be animated, then it should be synthesized.

The next choice to be made is which frequency band to use in matching. The choice is not difficult to make, and in fact the results are not highly dependent upon it. Any low frequency band that provides information about the overall motion will give good results. The resulting animations sometimes vary from one another depending upon which frequency band is chosen, as slightly different regions of data are matched, but more often they are quite similar.

If too high a band is chosen, however, the resulting animation has an uncoordinated look, as the overall motion is not accurately represented in the matching step. It should also be noted that it is important to use a low frequency band that does not contain any constant offset or very slow baseline shift; i.e., the output of a lowpass filter will not be effective. The reason is that differences in offset between the data and the keyframed animation will cause the matching to be inaccurate.

For degrees of freedom that are to be textured, the animator must specify how many of the higher frequency bands to alter. This choice depends highly on the situation. If the animator wants the motion to remain very close to what he or she has created, then the texturing should alter only the very highest bands, say 2-3 of a Laplacian pyramid decomposition. (Recall that we usually omit the very highest band because it often adds undesirable noise.) In particular, if the motion capture data does not have motions similar to what the animator has created, then only a few of the top bands should be altered. On the other hand, if the animator doesn't mind the motion changing somewhat from what he or she animated, or if the motion capture data contains motions quite similar to what has been animated, then more bands can be included in the texturing. In fact we have produced good results using all of the bands except the constant offset.

Similarly, how many matches to keep is also an artistic decision. For the case of texturing, good results can often be obtained by keeping only one match, which will yield only one possible path. This result makes sense: in texturing the animator wants to keep the motion similar to what he or she has already created, and since only higher frequency bands are involved, discontinuities between fragments are not likely to be large compared to the magnitude of the overall signal, and our joining method easily creates smooth transitions between them. On the other hand, for synthesis, especially of motion with a great deal of variety in it, it is better to keep more matches. In practice we found that saving roughly 1/10 of the total number of motion capture fragments produced good results. Saving too many matches resulted in motions that were not coordinated with the rest of the body. Saving too few decreased the chances of finding a path through the fragments that included one or more series of fragments that were originally consecutive in the motion capture data.

A common question that arises regarding our method for choosing a path through the matches is: why, in the case of synthesis, do we not take into account the discontinuity between fragments that were not consecutive in the original data? We did in fact implement a version of the algorithm that took the distance between the ends of all fragments, consecutive or not, into account. In this version, for each join between fragments we create a K x K cost matrix (where K is the number of matches). The ij-th component gives the distance between the last point of match i of the first fragment and the first point of match j of the second fragment being joined. (Compare to the cost matrices shown in figure 5.11, where the cost was a binary decision depending on whether the matching fragments were consecutive or not.) We then search through all possible paths, and keep the P paths with the minimum cost, where P is a parameter selected by the animator. One is unlikely to set P much higher than 10, because most people will probably not take the time to look at more paths than that.

We found that this method of selecting the best paths was not as effective as the method of considering only whether a fragment was consecutive or not. First, it should be noted that in the case of non-consecutive fragments, our method of joining the ends using baseline shifting and blending with a quadratic function at the join point usually creates a smooth transition from one fragment to the next. In addition, the discontinuities are usually not too great. If we look at the matches for two consecutive fragments, we see that even if they weren't consecutive in the motion capture data, in many cases they appear almost as if they could have been (see figure 5.9). Even more importantly, in the case of synthesis, selecting fragments that have ends near each other doesn't necessarily improve the appearance of the output animation, whereas selecting consecutive fragments has an enormous impact, especially when the motion capture data has a large amount of variety in it. As a result, the animator would want most of all to see the different possible combinations of consecutive fragments. If we also consider the discontinuities between matched fragments that are not consecutive, the top P paths will probably all look very similar, with minor differences between regions of non-consecutive fragments. But what the animator would probably rather see is an example of another possible path that includes a different series of consecutive matches, even if the overall score is higher because of fewer consecutive matches or a particularly large discontinuity in one of the sets of non-consecutive matches.

An example may help clarify this idea. If we look back at figure 5.11, the 3 paths the animator would probably be most interested in looking at would be the three mentioned in section 5.3.3, which were 4, 5, 6, and 2; 1, 2, 4, and 5; and 4, 5, 4, and 5. However, if we consider all possible paths, there are K^{J+1}, where J is the number of joins. In this example, with 3 matches and 3 joins, there are 81 possible paths. Suppose there happened to be a rather large discontinuity between fragments 6 and 2. Then if the animator only looks at the top P paths under the distance-based cost, it is unlikely that he would ever see the path 4, 5, 6, and 2, which might actually be a pleasing one to look at because it has a series of 3 consecutive fragments in it. In a more realistic example, one might save 5 matches for an animation with 20 fragments, which means there would be on the order of 10^14 possible paths, but there might be only of the order of 5 or 10 unique combinations of consecutive fragments. In such a case it becomes even more important to pick out only the paths of most value to the animator.

All in all, we have deliberately left many decisions up to the animator. For example, consider again the choice of matching angles. The animator may want to try a couple of different combinations to see which gives the best results. It may even be best to use different matching angles in different parts of the animation. For example, in the walking animation described in section 5.4.1, there is a moment when the character stomps his left foot. The only input motion capture data in this case was of walking, and the animator would want upper body and arm motion similar to walking when the left leg is forward, but more extreme. Thus during the stomp it made sense to use the left hip and knee x angles as matching angles, with scaling turned on. In this case the animator used her knowledge of how she had created the motion of the character (by using the left hip x and knee x angles to cause him to stomp his foot) to decide what the matching angles should be, and whether to use scaling. It will often be the case that the animator will want to use her knowledge in this manner, and having the computer automatically make the choices would most likely interfere with her creative process. As a result, we leave many of the parameters of this texturing and synthesis method up to the animator.


Chapter 6

Conclusions and Discussion

The need for truly life-like animations that capture the fine detail and subtleties of motion has become even more pressing with the advent of photo-realism in computer graphics. In such situations, the observer expects the motion to reflect life, much more so than when the animated characters are more cartoon-like. Animators at studios that work in photorealistic settings, even if they are experienced professionals, find their jobs extremely challenging.

One solution to the problem of how to achieve life-like motion is to use motion capture data. The drawback of motion capture data is its lack of flexibility. It is difficult and expensive to collect, and once it has been collected, it may not be exactly what the animator needs. It can be edited, but it remains difficult to incorporate directly into the creative process of an animator trained to use keyframes. In fact, in many cases the animator may not want the particular action in a data set at all. Instead, he or she may want to use its motion texture to enhance an animation that has already been partially created.

One way to increase the flexibility of motion capture data is to make it easier to obtain in the first place. Part of the problem with current technology is that obtaining motion capture data requires a great deal of time, effort, and planning. A designated room with special lighting, equipped with special sensor technology, must be used, and days of data processing are often required. These factors reduce the flexibility of motion capture as an animation method and inhibit the creative process by discouraging spontaneity. As a result, we have been working toward a video-based motion capture system.

The ideal scenario would be for the animator to simply videotape a subject performing actions similar to those he is trying to animate, or moving in a style he would like to use, and then directly incorporate the motions captured on video into his animation. The work presented in chapter 2 is one step toward that goal: we have developed a method for obtaining the joint positions from a series of video frames, an important step in the process of collecting motion capture data from video.

The bulk of the work in this thesis has focused on creating methods that are more flexible and intuitive for the animator to use after the data has been collected. Our goal has been to create a method in which the animator retains control over the animation while capturing the nuance and style of a motion capture data set. There were two main problems to address: (1) which features of the data are most important for preserving the texture of a motion, and (2) how best to model those features. The features we have found most useful are the frequency characteristics of the motion data and the correlations among joints.

Working in frequency space proved useful for giving the animator control over different aspects of the motion. The low frequencies describe the general path of the motion, and are often what the animator creates as a first draft; detail and personality are often carried by the mid to high frequencies. In the synthesis methods described in chapters 3 and 4, the data was divided into frequency bands. One band was created at a time, and the bands were summed to yield the final result. In both cases, the methods were successful only with this multiresolution approach; applying them directly to the original data did not yield good results. In the fragment based method of chapter 5, it was also necessary to break the data into frequency bands, as accurate matching required considering the overall motion described by the lower frequency bands. Furthermore, whichever method is used, the frequencies involved are what distinguish texturing from synthesis. The same method, whether sampling from kernel-based probability distributions, using principal components analysis, or matching fragments, can be used for both: in texturing, only the higher frequencies of the motion are altered, whereas in synthesis the entire frequency range is constructed.
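A band decomposition of this kind is easy to sketch. The version below splits a single joint-angle curve into bands that sum back to the original by differencing successively smoothed copies. It is a stand-in under assumed parameters (Gaussian smoothing, the band count, and the name frequency_bands are choices made for the example), not the exact multiresolution filtering used in the thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def frequency_bands(signal, n_bands=4, sigma=2.0):
    """Split a joint-angle curve into bands that sum back to the original.

    Band 0 holds the highest frequencies; the last band is the smooth
    low-frequency residual describing the general path of the motion.
    """
    bands = []
    current = np.asarray(signal, dtype=float)
    for _ in range(n_bands - 1):
        low = gaussian_filter1d(current, sigma)
        bands.append(current - low)  # detail removed at this level
        current = low
        sigma *= 2.0                 # widen the filter for the next band
    bands.append(current)            # overall path of the motion
    return bands                     # bands sum back to the input signal
```

In this picture, texturing means replacing only the higher bands while keeping the animator's low-frequency band, whereas synthesis means constructing every band.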

The use of correlations among joints was also key to the methods we have developed. In recent years, there has been a great deal of interest in reducing the dimensionality of motion data, and in using automatic techniques from machine learning to achieve this simplification. We considered the use of PCA to simplify the data set, as discussed in chapter 4. However, we found the greatest success by taking the alternative approach of looking at how nature has already packaged motion. For example, motion often originates near the pelvis, so if we know the hip motion, we have a good idea of what the spine motion might be. Or, given a rough sketch of a motion curve in the form of its low frequency characteristics, we can guess possibilities for the more detailed higher frequency bands. This way of breaking down the motion also happens to be exactly the way an animator begins creating an animation. Usually one starts by making a rough draft of the motion, animating the overall translations and rotations to set where the character should go in space, and animating the legs to go with that motion. These first drafts are not animated in detail: a few keyframes are set on some of the degrees of freedom, which creates a smooth motion without a lot of high frequency detail. Thus, it makes sense to use the low frequency information in these initial animations to drive the rest of the animation.

We explored several different ways of using these features to assist in the creation of a life-like animation. In order to create a technique that is as flexible as possible, we first sought to develop methods for texturing and synthesis that could be applied point by point. In other words, each time point of the synthetic or textured data would be created one at a time, using the information in the motion capture data and some input from the animator. Such a method contrasts with the fragment-based method discussed in chapter 5, in which the data is broken into fragments and the synthetic data is created one fragment at a time. The advantages of a point-by-point method are that (1) no arbitrary decisions need to be made about where to break up the data, and (2) there are no problems with discontinuities between fragments.

In our earliest work, we modeled the data with multidimensional kernel-based probability distributions, which represented the joint probability of finding particular values of the angles of different joints together at the same time. We also found it useful to model the correlations between frequency bands and with previous and future points in time. The kernel-based representation was useful because it created a continuous distribution that could be sampled and used as a basis for optimization, and yet maintained the fine detail of the data set.
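Sampling from such a kernel estimate is simple enough to show inline. For Gaussian kernels it amounts to picking a stored data vector and perturbing it with kernel-scaled noise, which is why the fine detail of the data survives. The function name, the isotropic bandwidth, and the row layout of the data are assumptions of this sketch.

```python
import numpy as np

def sample_kde(data, bandwidth, n_samples=1, seed=None):
    """Draw samples from a Gaussian kernel density estimate.

    data: (N, D) array; each row is one observed vector of joint-angle
    values (possibly augmented with values from other frequency bands
    or from neighboring time points).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    idx = rng.integers(len(data), size=n_samples)        # pick kernels
    noise = rng.normal(scale=bandwidth,
                       size=(n_samples, data.shape[1]))  # kernel spread
    return data[idx] + noise
```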

This representation was successfully used as a basis for the complete synthesis of cyclic motions, as described in chapter 3. The resulting motions captured the style and nuance of the motion capture data, and maintained realistic variations between cycles. However, this method is not as successful when applied to motion with more variation in it. The problem is that for a complex data set, considering the probability point by point over the entire data set does not create sufficiently distinct distributions. In principle, if we considered more dimensions in our distribution, especially by correlating with more points at past and future times, we might still obtain reasonable results with this method. However, the computations become quite slow, limiting the usefulness of the method.

There are two solutions to this problem, both described in chapter 5. First, the animator specifies more information about what the animation should be. For a more complex, non-cyclic motion, the animator is likely to want more control over what the character does anyway; otherwise the synthesized motion will just be a random mix of what was in the motion capture data. The information the animator should give is a sparse set of keyframes on a few degrees of freedom, in other words, a sketch of the motion. Second, instead of looking at one point at a time, we consider small fragments of data, and select appropriate fragments through our matching process. These ideas led to our most successful method so far, and enabled the creation of a variety of life-like animations that combined the control of keyframing with the detail of motion capture.

In fact, the goal of this project was not to create a completely automatic method, but to give the animator another tool for incorporating the information in motion capture data into his or her creations. Different choices of the matching angles can yield different results and provide the animator with different possibilities to use in the final animation. Another source of different motions comes from examining different paths through the best matches. The animator has the option of looking at several possibilities and making an artistic decision about which is best. We hope that methods such as this one will further allow animators to take advantage of the benefits of motion capture data without sacrificing the control they are used to having when keyframing.

Ultimately, we would like to allow an animator to start with any motion whose style he or she likes, either from motion capture or from another computer animation, and create an animation that captures that style.
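The matching step at the heart of that second solution can be sketched as a nearest-neighbor search over the matching angles of the low-frequency band. Everything here is illustrative: fragments are assumed to be resampled to a common length, the distance is a plain Euclidean norm, and the real method additionally allows scaling and baseline shifting when comparing fragments.

```python
import numpy as np

def match_fragments(sketch_frag, database_frags, n_best=5):
    """Rank database fragments against one keyframed sketch fragment.

    sketch_frag:    (L, A) array over the A matching angles
    database_frags: list of (L, A) arrays cut from the motion capture data
    Returns indices and distances of the n_best closest fragments.
    """
    dists = np.array([np.linalg.norm(sketch_frag - frag)
                      for frag in database_frags])
    order = np.argsort(dists)[:n_best]
    return [(int(i), float(dists[i])) for i in order]
```

The per-fragment match lists produced this way are what the path selection of section 5.3.3 then chooses among, preferring runs of consecutive fragments.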

The animator should have options for how the new motion is created: either specifying a few key poses and letting the computer synthesize the rest, or starting from a complete set of angles and translations, such as one taken from another motion capture data set, and putting the texture of the desired style on top of it. This work is a step toward these goals, and provides more tools an animator can use to incorporate the information in motion capture data into his or her work in an intuitive manner.
