A Novel Multitouch Interface for 3D Object Manipulation

Oscar Kin-Chung Au
School of Creative Media, City University of Hong Kong
kincau@cityu.edu.hk

Chiew-Lan Tai
Department of Computer Science & Engineering, Hong Kong University of Science & Technology
taicl@cse.ust.hk

Introduction

Nowadays multitouch techniques are commonly used in various applications. Many digital devices, from large desktop computers to handheld mobile internet devices, are now equipped with touchscreens or touchpads that support multitouch operations. However, the use of multitouch input in real-world applications has so far been mainly for enhancing navigation and browsing functionality, for example, browsing images and maps on a virtual plane or navigating 3D spaces with multitouch rotation and zooming operations.

3D modeling has made a significant impact in the engineering, design and entertainment industries. However, the interfaces provided in commercial modeling systems are mostly designed for use with a keyboard and single-point input (mouse or pen devices). To cope with the massive complexity of modern 3D modeling software, designers rely on a large set of keyboard shortcuts and mode-switching buttons. This is particularly apparent for 3D manipulation, which, due to its high frequency of use, is usually allocated the most common buttons and keys. Ideally, the use of buttons and keys should be avoided on touch-based systems. While the traditional 3D manipulation widgets found in commercial modeling systems can be integrated into touch-based interfaces, the design of these widgets is based on a tool-switching metaphor that conflicts with the more seamless, tool-free philosophy of the multitouch paradigm. In addition, the low effective resolution of fingertip-blob input on touch devices makes the smaller widgets in standard interfaces difficult to operate.

There has been little research on using multitouch input for complex editing such as manipulation of multiple objects in 3D space. In fact, multitouch input contains rich orientation and transformation information, allowing users to provide multiple input values with a single multitouch action. This avoids tedious editing steps such as mode/tool switching and item selection required in traditional modeling environments. In this paper we present a novel multitouch interface for direct 3D object manipulation, which is widgetless and buttonless, and supports imprecise touch-based input without sacrificing control power and usability. Using carefully designed combinations of multitouch input, context-sensitive suggestions and gestural commands, our system supports most manipulation capabilities found in commercial 3D modeling interfaces. In addition to standard translation / rotation / scaling operations, our system includes quick free-form snapping and manipulation relative to arbitrary reference frames. The fluid transfer of reference frames between objects is enabled by our axis transfer interface, which further simplifies many 3D manipulation tasks among multiple objects.

Technical Details

1. Finger registration

Whenever the system detects five contact points, the finger registration procedure is invoked to determine which hand of the user is touching the screen and which contact point belongs to which finger. The process is performed in real time, and
our system supports placing of fingers at any arbitrary location and orientation. The system draws circles around the detected contact points (Figure 1 left).

Figure 1. Automatic finger registration. (Left) The palm and fingers are registered whenever the user's fingers touch the screen. (Right) Note that the thumb always has the largest spanning angle (red-colored angles).

The registration process is based on the relative positions of the fingertips when all five fingers are touching the screen in a natural pose. We compute the center point of all the contact points and measure the angles between the lines connecting each of the contact points with the center point (Figure 1 right). The spanning angle of each contact point is then defined as the sum of the two angles on either side of its connecting line. Since the thumb always has the largest spanning angle (when the fingers are in their natural pose), our system first identifies the thumb based on the spanning angle. The index finger is then detected as the contact point closest to the thumb. Next, we determine which hand is being used as follows. If the index finger appears before the thumb (assuming that the contact points are ordered anticlockwise around the center point), then it is the right hand; otherwise it is the left hand. The remaining contact points (middle, ring and little fingers) can then be easily determined in clockwise (resp. anticlockwise) order according to the identified right (resp. left) palm. To avoid activating the registration process through unintended touch input, we also check whether the contact points are within a reasonable distance from the center point and whether the spanning angle of each finger is within a reasonable range.

Concurrent use with 1-point and 2-point interfaces

Since finger registration is only activated when five contact points are detected, our framework can be used concurrently with other touch-based interfaces that do not require finger registration, such as 2D rotation and scaling with two fingers, or direct cursor tracking, scrolling or flicking with one finger.

Multi-user Support

For large touch tablet devices that can support more than 10 touch points (e.g., Microsoft Surface, FTIR setups, the SMART Table, and also our apparatus), it is possible to group the touch points based on their spatial distances, apply finger registration to each group of contact points, and use the orientation of each detected palm to distinguish the different users. In other words, our finger registration method can support simultaneous use of the touchscreen by multiple users, which is useful for multi-user applications such as interactive games.
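To make the procedure concrete, the following is a minimal sketch of the spanning-angle registration described above, assuming the contact points arrive as (x, y) screen positions. The helper names, the y-up angle convention, and the omission of the distance/angle sanity checks are simplifications of ours, not details of the actual system.

```python
import math

def spanning_angles(points):
    """For each contact point, return the sum of the angles to its two angular
    neighbours around the centroid of all contact points (the spanning angle)."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    # Angle of each point about the centroid, kept with its original index.
    # Assumes a y-up frame, where increasing angle means anticlockwise;
    # with y-down screen coordinates the handedness test below flips.
    polar = sorted((math.atan2(p[1] - cy, p[0] - cx), i) for i, p in enumerate(points))
    two_pi = 2.0 * math.pi
    spans, n = {}, len(polar)
    for k in range(n):
        prev_a = polar[(k - 1) % n][0]
        next_a = polar[(k + 1) % n][0]
        cur_a, idx = polar[k]
        spans[idx] = ((cur_a - prev_a) % two_pi) + ((next_a - cur_a) % two_pi)
    return spans, [idx for _, idx in polar]          # spans + anticlockwise order

def register_fingers(points):
    """Label five (x, y) contact points as thumb..little and report the hand."""
    spans, order = spanning_angles(points)
    thumb = max(spans, key=spans.get)                # thumb has the largest spanning angle
    index = min((i for i in spans if i != thumb),
                key=lambda i: math.dist(points[i], points[thumb]))  # closest to thumb
    pos_thumb = order.index(thumb)
    # One reading of "the index finger appears before the thumb" in the
    # anticlockwise ordering: the index immediately precedes the thumb.
    is_right = order[(pos_thumb - 1) % 5] == index
    step = -1 if is_right else 1                     # walk clockwise for the right palm
    seq = [order[(pos_thumb + step * k) % 5] for k in range(5)]
    return ("right" if is_right else "left",
            dict(zip(["thumb", "index", "middle", "ring", "little"], seq)))
```

In practice the result would additionally be rejected when a contact point lies unreasonably far from the centroid or a spanning angle falls outside the expected range, as described above.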
2. Palm Menu

In traditional desktop computing, menu systems are standard UI elements for navigating and accessing commands. Menu systems, in general, should provide efficient access and avoid occupying screen space of the main working area [Card82]. We present an efficient and intuitive command selection interface, which we call the Palm Menu, as an alternative to traditional command selection interfaces such as toolbars and popup menus. The basic idea of the design is to minimize hand and eye movement when the user selects a command using the Palm Menu.

Figure 2. The Palm Menu concept. A five-finger tap activates the menu buttons, which are located exactly at the finger touch points.

The Palm Menu is activated when the user performs a five-finger tap (touch and raise) action. First, the fingers are registered and then a set of popup buttons is defined at the contact points of the registered fingers (Figure 2). The user can then tap one of the popup buttons to select the desired command. This design is simple and intuitive since different commands are directly mapped to different fingers. Moreover, users do not need to displace their hand when using the Palm Menu: only a five-finger tap is needed to activate the Palm Menu and another in-place tap to select a command. Users do not even need to look at the popup buttons, because the buttons are located exactly at the finger contact points. This avoids switching focus between the object to be manipulated and the menu itself. We consider the object under the index finger as the one selected when the Palm Menu is activated. This integrates the selection and command activation steps into a single user action. In addition, the Palm Menu inherits the other benefits of dynamic interfaces enabled by our finger registration method; specifically, it provides popup visualization, is independent of the hand's location, orientation and scale, allows imprecise input, and supports two-hand manipulation.

The basic setting of the Palm Menu supports only up to five menu commands per hand, which is quite a limitation, so there is a need to extend the design to allow intuitive and efficient selection of more commands. We propose and examine two different extensions to allow more selectable commands, namely, the finger chord technique and the shifted buttons technique. The former uses multi-finger taps (finger chords) as additional command selectors (rather than single-finger taps, see Figure 2 middle), while the latter introduces more popup buttons for the extra commands. The extra buttons are shifted (upward in our case) from the basic popup buttons to facilitate tapping with the corresponding fingers of a slightly shifted hand (Figure 2 right).

3. 3D Manipulation UI Designs

To allow seamless browsing and manipulation in 3D space, our UI does not require mode switching between browsing and manipulation. Instead, we adopt a context-sensitive approach to determine the user's desired operation based on several factors, namely, the currently selected objects, the registered palm and fingers, and the motion and orientation of the contact points. We assign different operations to specific user actions in order to provide a natural and intuitive mapping between user action and 3D manipulation.

3.1 Global Browsing

Our UI adopts the virtual trackball interface as the rotation control tool. The virtual trackball is a common interface for 3D graphics applications that mimics the paradigm of holding an object in the hand and inspecting or examining it. Objects and the scene can be easily rotated about any axis. In our design, a start point and an end point specify the orientation and magnitude of the rotation (see Figure 3). The user activates the trackball interface by touching the screen with one finger (thus only one contact point), then drags that finger to the desired end
position, and finally raises the finger to finish the operation. A smooth transformation is obtained as the end point is moved by the user, providing instant and continuous feedback.

Figure 3. Global rotation.

For global panning (panning the viewing point/camera parallel to the screen), our UI uses translation of the right palm (with all five fingers touching the screen in a natural pose) to activate the operation. It treats the screen as a large sheet of paper on a table, and the panning operation simulates shifting the paper (see Figure 4). In order to avoid accidental activation of global panning, we rely on our palm and finger registration method to check that all the contact points are from the same palm.

Figure 4. Global panning.

For global zooming (moving the viewing point/camera forward and backward), our UI uses the radial movement of the right hand's fingers to activate the operation. It simulates pulling and pushing actions corresponding to the zoom-in and zoom-out operations (see Figure 5). Again, we use the finger registration to check that the contact points are from the same palm before applying the operation.

Figure 5. Global zooming.
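The exact trackball mapping used in Section 3.1 is not spelled out above, so the following is a minimal sketch of one common virtual trackball formulation: the start and end points of the one-finger drag are lifted onto a virtual sphere, and the rotation is taken about their cross product. The sphere radius and the hyperbolic fall-off outside the sphere are conventional choices, not values taken from our system.

```python
import numpy as np

def trackball_rotation(p0, p1, viewport_w, viewport_h, radius=0.8):
    """Map a one-finger drag from screen point p0 to p1 to an (axis, angle)
    rotation in camera coordinates, using a common virtual trackball mapping."""
    def to_sphere(p):
        # Normalize screen coordinates to roughly [-1, 1].
        x = (2.0 * p[0] - viewport_w) / viewport_w
        y = (viewport_h - 2.0 * p[1]) / viewport_h      # flip y (screen y points down)
        d = np.hypot(x, y)
        if d < radius * 0.70710678:                     # inside the sphere (d < r/sqrt(2))
            z = np.sqrt(radius * radius - d * d)
        else:                                           # outside: hyperbolic sheet
            z = radius * radius / (2.0 * d)
        return np.array([x, y, z])

    v0, v1 = to_sphere(p0), to_sphere(p1)
    axis = np.cross(v0, v1)
    n = np.linalg.norm(axis)
    if n < 1e-8:
        return np.array([0.0, 0.0, 1.0]), 0.0           # no rotation
    cosang = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    angle = np.arccos(np.clip(cosang, -1.0, 1.0))
    return axis / n, angle
```

Applying the resulting axis/angle to the camera on every touch-move event yields the smooth, continuous feedback described above.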
3.2 Object Selection

The user selects an individual object by tapping it with one finger. The selected object is highlighted in red, indicating that it is the focus object that will receive subsequent operations (Figure 6 left). Multiple object selection is also possible. To extend the current selection, the user performs a five-finger tap with the new object to be included under the index finger. The newly selected object then changes to blue, indicating that it is now in the current selection (see Figure 6 right), while the first selected object (which we call the focus object) remains red. We will present operations that are performed on multiple objects in the scene in later sections.

Figure 6. Object selection.

3.2.1 Centering Object

To facilitate manipulation of the focus object, we provide a fast way for the user to center it on the screen. When the user double-taps an object, it is selected and the viewing point is panned automatically such that the newly selected object is centered on the screen (Figure 7).

Figure 7. Automatic centering of the focus object.

3.3 Axis-based Manipulation

Most major commercial 3D modeling tools use 3D widget interfaces to provide detailed 3D manipulation. Generally, such widgets act as visual handles corresponding to specific axes for different operations, such as rotation about an axis or translation in a particular direction. Almost all of these 3D modeling tools are designed for single-pointer interfaces (e.g., mouse and pen devices), so the widgets they use contain numerous small elements corresponding to the different operations. Figure 8 shows the 3D transformation widgets of several 3D modeling systems. The small elements such as arrows, curves and cubes are designed for precise control of the different operations, namely translation, rotation and scaling along the predefined axes of the object.
Figure 8: 3D transformation widgets used in existing 3D modeling systems. Translation widgets from 3DS Max (a) and Blender (b), rotation widget from XSI (c), and combo widgets from Houdini (d), Modo (e), and Maya (f and g). While the visual design varies, the functionality is largely identical.

Our general goal is to design a widgetless 3D modeling environment powered by multitouch input. As multitouch input contains rich orientation and transformation (touch-point motion) information, it suits the needs of 3D editing applications, which require multi-dimensional input. With the power of finger and palm registration, we can define rich and easy-to-use gestures to replace the traditional complex widgets used in non-multitouch systems. We envision basic manipulation of 3D objects being achievable with two-point gestures, since such gestures are sufficient for specifying 2D position, orientation and transformation information in screen space for the editing process.

Since a 2D orientation in screen space cannot define a unique 3D axis or orientation, we predefine a set of candidate axes for the focus object or a group of objects. The candidate axes usually form an orthogonal frame; for example, they may be the principal axes or the face normals of the bounding box of the selected objects. The user uses a 2-point touch to specify a 2D orientation, which is then compared with the projected directions of the candidate axes. The axis with the most similar orientation is used for the current manipulation (Figure 9). Note that the axis selection process is activated when the user performs a 2-point touch while an object is in focus. The user can then apply manipulations immediately after the axis is selected, giving a seamless and smooth manipulation that integrates operation selection, axis selection and editing into a single action.

Figure 9: The axis for manipulation is chosen from the candidate axes based only on the angle between the 2D orientation specified by the 2-point touch (red circles) and the 2D projections of the axes. The selected axis is rendered as a thick red line. Note that the positions of the touch points and the axis are ignored.

3.3.1 Axis Translation

Once an axis of an object has been selected by a 2-point touch and the user moves the touching fingers along the axis direction (i.e., translates the contact points parallel to the projected axis orientation), axis translation is activated and the object is translated according to the distance moved by the contact points (Figure 10).
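The following is a minimal sketch of the axis selection and axis translation just described. It assumes a project_to_screen helper that maps a world-space direction to a 2D screen direction for the current camera, and a pixels_to_world scale factor; both names and the scale value are illustrative assumptions, not part of the actual implementation.

```python
import numpy as np

def select_axis(touch_dir_2d, candidate_axes_3d, project_to_screen):
    """Pick the candidate axis whose screen-space projection is most parallel
    to the 2D orientation defined by the two contact points. Positions are
    ignored; only the orientation matters (cf. Figure 9)."""
    t = np.asarray(touch_dir_2d, float)
    t /= np.linalg.norm(t)
    best_axis, best_score = None, -1.0
    for axis in candidate_axes_3d:
        a2d = project_to_screen(axis)            # 2D direction of the axis on screen
        a2d = a2d / np.linalg.norm(a2d)
        score = abs(np.dot(t, a2d))              # |cos| of the angle between them
        if score > best_score:
            best_axis, best_score = axis, score
    return best_axis

def axis_translate(obj_center, axis_3d, screen_disp, project_to_screen,
                   pixels_to_world=0.01):
    """Translate along the selected axis by the component of the contact
    points' screen displacement that is parallel to the projected axis."""
    a2d = project_to_screen(axis_3d)
    a2d = a2d / np.linalg.norm(a2d)
    amount = np.dot(np.asarray(screen_disp, float), a2d) * pixels_to_world
    return obj_center + amount * np.asarray(axis_3d, float)
```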
Similar to most 3D modeling tools, which consider the object center as the origin, the center of the selected object is taken as the origin of the translation in our setting. Recall that our system is widgetless and does not require the user to touch specific positions on the object or elements of a widget to perform editing. Only the orientation and motion given by the contact points are considered. This reduces the difficulty of editing with multitouch input, since there is no complex widget to handle and no precise tapping or selecting is required. The amount of translation is determined by the displacement of the contact points on the screen, allowing the whole screen space to be the input region and thus making precise control easier to achieve.

Figure 10. Axis translation.

3.3.2 Axis Rotation

Orienting 3D objects with a 2D input device is difficult for inexperienced users. This is one of the reasons why 3D modeling systems have steep learning curves. The virtual trackball technique provides intuitive control, but it is difficult to specify a precise orientation with it. There is quantitative evidence that users are significantly faster and more accurate when performing the relatively simple single-axis rotation. Single-axis rotation is a common manipulation, and commercial 3D modeling systems uniformly include constrained rotation widgets (see Figure 8). Expert users favor constrained rotations over free rotations with an unconstrained virtual trackball. In our UI design, axis rotation is activated when the two touching fingers are moved perpendicularly to the selected axis (Figure 11).

Figure 11. Axis rotation.

3.3.3 Axis Scaling

Like most 3D modeling systems, our UI supports two different types of scaling manipulation. The first is uniform scaling, which scales simultaneously in all three dimensions. Similar to global zooming, we use the radial movement of the left hand's fingers to activate this operation, but with the target object selected. For scaling in a particular axis direction, the user contracts or expands the distance between the two contact points along the selected axis (Figure 12). The amount of scaling (enlargement or shrinkage) is determined by the initial distance between the two contact points and their displacements.

Figure 12. Axis scaling.
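The choice between axis translation, axis rotation and axis scaling can be made from the relative motion of the two contact points. The sketch below shows one way to organize that decision, using an illustrative dead-zone threshold in the spirit of the buffer zone described in the next section; the threshold value and function names are assumptions of ours.

```python
import numpy as np

def classify_two_point_gesture(p0_start, p1_start, p0_now, p1_now,
                               axis_dir_2d, dead_zone=12.0):
    """Classify a two-finger motion, relative to the projected axis direction,
    as 'translate', 'rotate', 'scale', or None while still inside the buffer
    zone. All inputs are 2D screen positions / directions in pixels."""
    a = np.asarray(axis_dir_2d, float)
    a /= np.linalg.norm(a)

    mid_start = (np.asarray(p0_start, float) + np.asarray(p1_start, float)) / 2
    mid_now = (np.asarray(p0_now, float) + np.asarray(p1_now, float)) / 2
    disp = mid_now - mid_start                            # common displacement of the pair

    sep_start = np.linalg.norm(np.asarray(p1_start, float) - np.asarray(p0_start, float))
    sep_now = np.linalg.norm(np.asarray(p1_now, float) - np.asarray(p0_now, float))

    along = abs(np.dot(disp, a))                          # motion parallel to the axis
    across = abs(a[0] * disp[1] - a[1] * disp[0])         # motion perpendicular to the axis
    spread = abs(sep_now - sep_start)                     # change of finger separation

    # Ignore small movements: the buffer zone of Section 3.3.4.
    if max(along, across, spread) < dead_zone:
        return None
    if spread >= max(along, across):
        return "scale"
    return "translate" if along >= across else "rotate"
```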
3.3.4 Buffer Zone

The manipulation operations rely on the relative movement of the contact points to determine the user's desired operation, namely 2D translation (for axis translation and axis rotation) or 2D scaling (for axis scaling). For robustness, and to minimize triggering of wrong operations due to unstable touch input, our system ignores movements of the touch points within a given short distance, and only determines the transformation type and the desired operation when the contact points have displacements larger than the given threshold. This approach also allows the user to cancel the current manipulation by moving his/her fingers back to the initial touch positions, placing the contact points within the buffer zone so that no manipulation is applied.

3.3.5 Object Duplication

Object duplication is an important and basic operation in many 3D editing scenarios. Our multitouch modeling system supports object duplication in two ways. We call the first active duplication; it is similar to axis translation, but a new copy of the selected object is created and translated (Figure 13). Active duplication is activated by a 3-finger translation (with the thumb, index and middle fingers). The translation direction of the fingers decides the translation axis used in the duplication operation.

Figure 13. Active duplication.

The second way is called transformed duplication, which involves two objects: it creates a new copy of the selected object and transforms the copy to a new location based on the transformation between the two selected objects. This operation is supported by many commercial modeling systems and is sometimes referred to as advanced duplication or transformed duplication. We include this operation in our system to let users create more complex scenes. Note that the two selected objects need not be of the same type, since only the relative transformation between the objects is needed for the duplication.

Figure 14. Transformed duplication. (Left) Initial selection of two objects. (Middle to right) Duplication results.
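The relative transformation used by transformed duplication can be expressed with 4x4 object-to-world matrices. The exact formula is not stated above, so the following is a minimal sketch of one natural choice: the copy relates to the second selected object the way the second object relates to the focus object. The function name and the choice of which object is copied are assumptions of ours.

```python
import numpy as np

def transformed_duplicate(M_focus, M_other):
    """Given the 4x4 world transforms of the focus object (first selection)
    and the second selected object, return the world transform for a new copy,
    placed so that (copy relative to other) == (other relative to focus)."""
    relative = M_other @ np.linalg.inv(M_focus)   # maps the focus pose onto the other pose
    return relative @ M_other                     # apply the same step once more

# Repeated application yields a chain of copies, as in Figure 14:
#   M_copy1 = transformed_duplicate(M_A, M_B)
#   M_copy2 = transformed_duplicate(M_B, M_copy1)
```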
3.4 Snapping

Snapping is one of the most effective ways of specifying highly accurate transformations in 3D modeling. Interactive snapping techniques are extensively used in 3D modeling systems, especially engineering CAD systems, as they avoid tedious menu or dialog operations. We also provide automatic snapping in our system to reduce the user's interaction. We predefine the snapping planes and edges for each input object, which can be found by analyzing the surface normals of the object or simply by using the faces of the object's bounding box. The snapping planes and edges define the snapping conditions during manipulation. Two objects are snapped if any pair of their snapping planes or edges is close to each other and has similar orientation (Figure 15). When two or more snapping pairs are found, the system snaps the object according to the distances of the snapping pairs, and ignores snapping pairs that conflict with the previously applied snapping (Figure 16).

Figure 15. Automatic snapping with a single snapping condition.

Figure 16. Automatic snapping with multiple conflicting conditions; only the closer snapping pairs are considered.
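A minimal sketch of the snapping test described above, restricted to plane pairs: each snapping plane is represented by a point on the plane and an outward unit normal, and two planes are candidates when they are close and roughly facing each other. The distance and angle thresholds are illustrative assumptions.

```python
import numpy as np

def find_snap_pairs(planes_a, planes_b, max_dist=0.05, max_angle_deg=10.0):
    """Return candidate snapping pairs between two objects, nearest first.
    Each plane is a (point_on_plane, outward_unit_normal) tuple."""
    cos_thresh = np.cos(np.radians(max_angle_deg))
    pairs = []
    for pa, na in planes_a:
        for pb, nb in planes_b:
            facing = -np.dot(na, nb)                                 # 1 when exactly facing
            gap = abs(np.dot(np.asarray(pb) - np.asarray(pa), na))   # separation along na
            if facing >= cos_thresh and gap <= max_dist:
                pairs.append((gap, (pa, na), (pb, nb)))
    pairs.sort(key=lambda t: t[0])                                   # closer pairs first
    return pairs
```

Pairs further down the sorted list would then be skipped when they are incompatible with the snap already applied, as Figure 16 illustrates.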
3.4.1 Active Snapping

Automatic snapping is simple to use; however, it still requires the user to place the object at an approximate location and orientation to activate the snapping. It is useful and efficient to allow the user to directly snap or stack objects together without specifying an approximate transformation of the editing object. Our system provides such functionality by allowing the user to snap objects with a free touch path, which we call active snapping. By projecting the normals of the snapping planes of the objects into screen space, the user can easily select the snapping planes with a free touch path. The snapping plane of the editing object is selected if the starting orientation of the path drawn by the user's finger aligns with the plane normal (Figure 17), and the target object and its snapping plane are selected according to the ending location and orientation of the path (see examples in Figure 18). This operation avoids the otherwise tedious movement and rotation of the objects, giving a convenient and intuitive interface for efficient multi-object editing.

Figure 17. Selection of snapping plane.

Figure 18. Active snapping examples.
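A minimal sketch of the path-based plane selection used by active snapping. Only the selection step is shown, not the final alignment transform; project_to_screen, pick_object_at, the .normal and .snap_planes attributes, and the sign convention on the ending direction are all assumptions of ours.

```python
import numpy as np

def pick_plane_by_direction(direction_2d, planes, project_to_screen):
    """Return the snapping plane whose projected normal best aligns with a
    2D direction taken from the drawn touch path."""
    d = np.asarray(direction_2d, float)
    d /= np.linalg.norm(d)
    def score(plane):
        n2d = project_to_screen(plane.normal)
        return np.dot(d, n2d / np.linalg.norm(n2d))
    return max(planes, key=score)

def active_snap_selection(path, editing_obj, pick_object_at, project_to_screen):
    """path: list of 2D screen points drawn by the finger (at least two samples).
    The source plane comes from the starting direction of the path; the target
    object is picked at the path's end point, and its plane from the ending
    direction (negated so the two planes face each other -- an assumption)."""
    start_dir = np.asarray(path[1], float) - np.asarray(path[0], float)
    end_dir = np.asarray(path[-1], float) - np.asarray(path[-2], float)
    source_plane = pick_plane_by_direction(start_dir, editing_obj.snap_planes,
                                           project_to_screen)
    target_obj = pick_object_at(path[-1])              # object under the path's end point
    target_plane = pick_plane_by_direction(-end_dir, target_obj.snap_planes,
                                           project_to_screen)
    return source_plane, target_obj, target_plane
```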
Snapping interfaces assume the existence of salient features such as flat patches or sharp edges, which can be used for defining the snapping actions. For curved regions or freeform surfaces, no snapping planes or edges can be predefined automatically; however, it is possible to use the local normals on the editing objects (at the start point and end point of the path) to define the snapping action.

3.5 Axis Transfer

One limitation of most 3D editing systems is that, while it is easy to specify the canonical axes of the editing object, setting up arbitrary axes is often difficult and needs extra manipulation to set up a pivot or frame object. However, the user's desired axis is often a canonical axis of another object. For example, in Figure 19, to construct the ladder, the horizontal cylinder needs to be translated in the coordinate system of the vertical beams. Generally, allowing the transfer of axes between objects is helpful. We provide a general solution for this task using a two-hand operation. The user can pick the canonical axis set of any object in the scene by pressing it with one hand and manipulating the target object with the other hand (Figure 19 right). This approach also provides a solution to several other problems. For example, it allows the designer to construct arbitrary canonical axes for editing by placing a simple reference object as a proxy.

Figure 19. (Left) The user wants to duplicate the cylinder and translate it up the ladder, but it has no suitable axis. (Middle) The user presses the beam and selects one of its canonical axes for subsequent editing. (Right) Editing result.
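In implementation terms, axis transfer only changes where the candidate axes come from before the axis selection step of Section 3.3. A minimal sketch is given below, reusing the hypothetical select_axis helper from the earlier sketch; the press-and-hold detection and the canonical_axes attribute are assumptions, not details of the actual system.

```python
def candidate_axes_for(target_obj, pressed_obj=None):
    """Candidate axes normally come from the target object itself; while
    another object is pressed and held with the other hand, its canonical
    axes are used instead (axis transfer)."""
    source = pressed_obj if pressed_obj is not None else target_obj
    return source.canonical_axes          # e.g., the three bounding-box axes

# During a 2-point manipulation of `target`:
#   axes = candidate_axes_for(target, pressed_obj=held_object_or_None)
#   axis = select_axis(touch_dir_2d, axes, project_to_screen)
# Subsequent translation, rotation or scaling then proceeds along `axis`
# exactly as in Sections 3.3.1-3.3.3, but relative to the borrowed frame.
```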