Chapter 2: Getting Started

Once Partek Flow is installed, Chapter 2 takes the user to the next stage: it describes the user interface and, of note, defines a number of terms required to understand the implementation of Partek Flow. In conjunction with Chapter 3, the current chapter discusses the basic functions of Partek Flow and provides an introduction to next-generation sequencing analysis.

This chapter covers the following topics:

- Naming conventions used in the manual
- Flow interface
- Quick start guide (for advanced users)

2.1 Naming Conventions Used in the Manual

Compute cluster: A collection of network-connected computers that work together to improve performance and increase the processing speed of tasks.

Compute cluster administrator: The system administrator appointed by the customer's company/institution and responsible for installing, maintaining, and allocating computing resources to Partek Flow and the compute cluster. The Partek Flow administrator and the compute cluster administrator are distinct roles, usually held by two different individuals.

Compute node: A single computer that is part of a larger compute cluster. It is used to run jobs and can have a different amount of internal computing resources than the other computers within the same cluster.

Computing resource: The physical components required to do computational work including, but not limited to, the number of CPUs, RAM, and available hard disk storage.

Central processing units (CPU) and cores: A core is the basic computation unit of the CPU. At the hardware level, a CPU consists of one or more cores, where each core is a processing unit. Increasing the number of cores enables more jobs to run simultaneously and increases the amount of work completed in a given time. At the software level, there is no distinction between CPUs and cores; therefore, the total number of cores is reported as the number of CPUs. This is the case with Partek Flow, as all references to CPU counts are the total number of CPU cores in the computer.

Head node: The controlling computer in a compute cluster. The job scheduler and the Partek Flow server are both installed on the head node and accessible to users. In most cases, jobs are not processed on the head node, but on the compute nodes.

Jobs: Computational work requested by users of a compute cluster. Each job requires computing resources to perform properly.

Job queue: A list of computational work requested by all users of a compute cluster and managed by the job scheduler. This includes all non-Flow work. There can be several job queues on a compute cluster, each having a unique name and its own resource limits.

Job scheduler: The program responsible for managing jobs requested by users of a shared computing resource. Management is accomplished by prioritizing, running, and tracking the status of jobs based on available computational resources and per-user computing resource limits. Schedulers are commonly employed in compute clusters. Examples of job schedulers include SGE, Torque, and Slurm.

Job submission script: The computer code that specifies the resources required for a specific job, along with the directions for how to execute the job. When starting a remote worker, the job submission script is used to specify to the job scheduler that all available resources on a particular computer will be used, along with the command to start the remote worker. Typically, anyone using a cluster must manually write this code to interact with the cluster. Partek Flow performs this for users and allows them to interact with the cluster without writing code.

Linux user account: The account under which all work requested by a user of a compute cluster runs. The Partek Flow server and workers all run under a single Linux user named flow, and all work done by all Partek Flow users is done using this single Linux account.

Partek Flow administrator: A special Partek Flow user account that is used to configure the Partek Flow server and manage user accounts. This account has full permissions to complete any action inside Partek Flow, such as assigning user and project permissions and deleting user data.

Partek Flow internal worker: A Partek Flow worker that runs on the same computer that runs the Partek Flow server. There can be only one internal worker. This worker should be used only for single-server Partek Flow installations and disabled on compute clusters in order to keep the head node responsive.

Partek Flow queue: A list of Partek Flow jobs running or waiting to run. This list is usually sorted by priority or submission time. Partek Flow workers process the work in the Partek Flow queue.
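To make the job submission script entry above more concrete, the following Python sketch assembles the text of such a script for the Slurm scheduler. It is only an illustration under stated assumptions, not the script that Partek Flow generates: the job name, the queue name, and the worker start command are hypothetical placeholders. The `#SBATCH` options shown are standard Slurm directives, and `--exclusive` is used here to express that all resources of one compute node are requested.

```python
from dataclasses import dataclass


@dataclass
class WorkerJobSpec:
    """Settings for one remote-worker job (all values are hypothetical)."""
    job_name: str
    queue_name: str     # Slurm calls a job queue a "partition"
    start_command: str  # placeholder; the real start command is not given in this manual


def render_slurm_script(spec: WorkerJobSpec) -> str:
    """Return the text of a Slurm batch script that claims one whole compute node."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.job_name}",
        f"#SBATCH --partition={spec.queue_name}",
        "#SBATCH --nodes=1",    # a single compute node for this worker
        "#SBATCH --exclusive",  # do not share the node with other jobs
        "",
        spec.start_command,
    ]
    return "\n".join(lines) + "\n"


spec = WorkerJobSpec(
    job_name="flow-worker",
    queue_name="general",
    start_command="/path/to/start_flow_worker.sh",  # hypothetical path
)
script = render_slurm_script(spec)
print(script)
```

Writing a script of this kind by hand is the step that Partek Flow is described as performing on behalf of its users.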

Partek Flow remote worker: A Partek Flow worker that runs on a different computer than the Partek Flow server. There can be zero or more remote workers, which are managed by the Partek Flow server. Remote workers can be started using the cluster job scheduler. It is assumed that the Partek Flow remote worker can use all resources provided by a compute node; thus, only one remote worker per compute node should be running.

Partek Flow server: The program that schedules Partek Flow-related computational work and allows users to view and analyze Partek Flow-generated results. This program runs on the cluster head node and sends computational work requested by Partek Flow users to Partek Flow workers.

Partek Flow user account: A user account internal to Partek Flow. There is no equivalency between Partek Flow user accounts and Linux user accounts.

Partek Flow worker: A program that is installed on a single computer within a compute cluster and receives job requests from the Partek Flow server. The program determines whether the computer has the resources needed to complete the requested job and, based on the answer, denies or accepts the job and reports when the job is completed.

Server: Any program that users interact with in order to accomplish a specific task or to receive data.

2.2 Flow Interface

2.2.1 Home Page

The Partek Flow Home page is the first page displayed upon login. It provides a quick overview of recent actions and access to several system options (Figure 2-1); the exact content of the Home page depends on the user status (i.e. regular user vs. administrator; for the latter, please see the section on Main Icons later in this chapter).

Figure 2-1: Partek Flow Home page of a regular user

System Options

Icons and links in the upper right corner of the Home page (Figure 2-2) are not related to a project or a task, but to the Partek Flow application as a whole.

Figure 2-2: System options of Partek Flow

The left-most icon is the progress indicator, which summarizes the current status of the Partek Flow server. If no tasks are currently being processed, the icon is grey and static, and the idle message is shown upon mouse-over (Figure 2-3).

Figure 2-3: Progress indicator showing no tasks in progress (balloon displayed upon mouse-over)

If the server is running, the progress indicator depicts blue waves and, upon mouse-over, lists the number of running tasks (launched by the user), the estimated completion time, and the total number of queued tasks (launched by all the users) (Figure 2-4). Clicking on the icon opens the Data tab of the project with the currently running task.

Figure 2-4: Progress indicator showing an overview of current tasks (balloon displayed upon mouse-over)

The remaining options (and the avatar) are links to the following pages: My profile (Chapter 17.1.1), Help (Chapter 19), and Settings (Chapter 17.2).

Recent Projects, Recent Activities, and Status

Recent projects (Figure 2-1) lists up to seven projects with recent activity by the user. By default, the list contains all the projects that were created by the current user (i.e. the user is the project owner) or for which the user is a collaborator (for a discussion on collaborations, see Chapter 3.5). The list entries are links that automatically load the selected project. Older projects can be accessed by using the View all projects button, which opens the Project management page (Figure 2-5). Moreover, a new project can be created on the Project management page by selecting the New project link.

Figure 2-5: Project management page. For an administrator, the list contains all the projects created under the current instance of Partek Flow. For a regular user, the list contains only the projects involving the current user.

The Recent projects list can be filtered based on the role played by the current user. The filter is activated by clicking on the arrow to the right of the All my projects label. Two options are available: to show only the projects created by the user (Owned by me), or to show the projects created by the user in addition to the projects for which the user is a collaborator (All my projects) (Figure 2-6).

Figure 2-6: Filtering the Recent projects list based on the user's role

The Recent activity list (Figure 2-1) mirrors the Recent projects list: it provides an overview of individual activities (i.e. tasks) in the projects displayed within Recent projects. The list entries are links that load either the Task details page of the task or the project page (depending on the context). The Recent activity list shows up to seven entries; to look up an earlier task, use the View all activity link, which loads the Activity log page (Figure 2-7). The Activity log page displays all the tasks within the projects accessible to the user (either as the owner or as a collaborator), including the tasks launched by other users of the Partek Flow instance.

Figure 2-7: Activity log page shows all the tasks within the projects accessible to the user

The Display options enable filtering of the log. All activity shows all the tasks (irrespective of the task owner, i.e. the user who started the task), while My activity lists only the tasks started by the current user. In contrast to the latter, Collaborator's activity displays the tasks that are not owned by the current user (but to which the user has access as a collaborator).

The Activity log page also contains a search function that can help to find a particular task. The search can be performed through the entire log (All columns), or narrowed down to one of the columns (by using the drop-down list). The search terms should be typed in the search box (an example is shown in Figure 2-8).

Figure 2-8: A search term entered in the search box

The Status line summarizes the tasks launched by the current user that are either running or pending (Figure 2-9).

Figure 2-9: Status line summarizes the number of tasks queued by the current user (an example is shown)
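The three Display options and the search function of the Activity log can be read as simple filters over task records. The Python sketch below illustrates that reading only; it does not show how Partek Flow is implemented. The `Task` fields, the sample data, and the case-insensitive substring matching are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    project: str
    owner: str  # the user who started the task


def filter_activity(tasks, current_user, display_option):
    """Apply one of the three Display options to a list of accessible tasks."""
    if display_option == "All activity":
        return list(tasks)
    if display_option == "My activity":
        return [t for t in tasks if t.owner == current_user]
    if display_option == "Collaborator's activity":
        return [t for t in tasks if t.owner != current_user]
    raise ValueError(f"unknown display option: {display_option}")


def search_activity(tasks, term, column=None):
    """Search all columns, or a single column, for a term (assumed matching rule)."""
    term = term.lower()
    columns = [column] if column else ["name", "project", "owner"]
    return [t for t in tasks
            if any(term in getattr(t, c).lower() for c in columns)]


# Hypothetical log: every task belongs to a project the current user can access.
tasks = [
    Task("Align reads", "RNA-Seq pilot", "alice"),
    Task("Post-alignment QA/QC", "RNA-Seq pilot", "bob"),
    Task("Trim bases", "ChIP-Seq study", "alice"),
]

mine = filter_activity(tasks, "alice", "My activity")
others = filter_activity(tasks, "alice", "Collaborator's activity")
hits = search_activity(tasks, "align")
print([t.name for t in mine], [t.name for t in others], [t.name for t in hits])
```

In this reading, My activity and Collaborator's activity split the full log into two disjoint parts, and together they make up All activity.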

Main Icons

The main icons, shown at the top of the page, are quick links to commonly used tasks. A regular user will see two, New project and Queued tasks (illustrated by Figure 2-1), while an administrator will see the Resource management button as well (Figure 2-10). However, a regular user can access the same information via Settings > Resource Management. Note that the functions of New project and Queued tasks are described in Chapters 3.1 and 3.4, respectively.

Figure 2-10: Main icons of the Home page as seen by a Partek Flow administrator

2.2.2 Resource Management

The System resources page gives an insight into the hardware resources available for analysis (Figure 2-11).

Figure 2-11: System resources page (an example of a server with low workload is shown)

The Queues status section indicates the number of all currently Running tasks as well as the number of Pending tasks (a pending task waits for an upstream task to complete; for instance, post-alignment QA/QC needs to wait for the respective alignment task to finish). An estimated time of completion of all queued tasks is also given (Queue will empty by). A link to the Queued tasks page is provided as well.

The Licensing section summarizes the number of Available cores licenses and Available worker licenses (for a detailed discussion, see Chapter 17.4.3).

The Active workers table lists one active worker node per row (an example with a single worker is shown in Figure 2-11) and contains performance metrics for worker processes and server processes. The columns provide the following information:

- Name: worker's identifier (IP address or name);
- Worker CPU: the computational load of the worker process;
- Worker Memory: the memory usage of the worker process;
- Server CPU: the computational load of the machine running the worker process;
- Server Memory: the memory usage of the machine running the worker process;
- Uptime: the duration that the worker has been running;
- Type: type of the worker (see below);
- Stop: selecting the red square icon stops the worker and moves it to the list of Inactive workers (see below).

Partek Flow supports two types of workers: internal (a worker running on the same machine and in the same process as the Partek Flow server) and worker (a worker that connects to the Partek Flow system by means of running a worker process on the machine). If more than one worker is available to the given instance of Partek Flow, they will all be shown on the Resources management page, in the table that corresponds to their status. When a worker is no longer needed, it is considered inactive and moved from the Active workers table to the Inactive workers table. Figure 2-12 shows an example of two active workers, while an active and an inactive worker are depicted in Figure 2-13. An inactive worker is labeled as unmanaged (Type column) and can be removed from the list of Inactive workers by selecting the red cross icon (Delete column) (Figure 2-13).

Figure 2-12: Resources management page displaying two active workers

Figure 2-13: Resources management page displaying an active and an inactive worker

Finally, the Flow server utilization section of the Resource management page shows two pie charts, Cores and Memory, which summarize the information provided by the Active workers table.

2.3 Quick-Start Guide

1. Select the New project button on the Home page (Figure 2-1) to create a new project (detailed description in Chapter 3.1).

2. When the new project is created, use the Add Sample button on the Data tab to import samples to the project (Chapter 3.2.2).

3. Once imported, the samples should usually be annotated by attributes, which can be done by selecting the Manage attributes button on the Data tab (Chapter 3.2.4).

4. Switch to the Analyses tab. Selecting a node (either a data node or a task node) invokes the toolbar, which shows options related to that node. A detailed description of the Analyses tab is provided in Chapter 3.3 and an overview is in Figure 2-14.

Figure 2-14: Overview of the Analyses tab. A data set is analyzed by selecting data nodes and performing one of the tasks shown in the toolbox. A description of a task is contained within a task node. Data associated with a particular data node can be downloaded to a local computer with the help of the Download data link. The mouse-over balloon shows basic information about a (data) node, such as the total number and size of the associated files. Tasks performed using the same tool, but with different options, are organized into layers. When an analysis protocol is optimized, a pipeline can be created to save it for future use.

5. Monitor the progress and manage the tasks by using the Queue tab (Figure 2-15). A detailed discussion of the Queue tab is provided in Chapter 3.4.

Figure 2-15: Overview of the Queue tab. The current task is shown on the top row, with information on its progress and an estimate of the completion time. A task can be quickly tracked down by using either the sort icons or the search box.
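Section 2.1 notes that the Partek Flow queue is usually sorted by priority or submission time. The short Python sketch below shows one way such an ordering could be expressed, purely as an illustration: the `QueuedTask` fields, the convention that a larger number means higher priority, and the use of submission time as a tie-breaker are all assumptions, since the manual does not specify how the two criteria are combined.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueuedTask:
    name: str
    priority: int        # hypothetical convention: larger means more urgent
    submitted: datetime  # time at which the task entered the queue


def order_queue(tasks):
    """Sort by priority (highest first), then by submission time (earliest first)."""
    return sorted(tasks, key=lambda t: (-t.priority, t.submitted))


# Hypothetical queue content.
queue = [
    QueuedTask("Quantify to annotation model", 1, datetime(2024, 1, 5, 9, 30)),
    QueuedTask("Align reads", 2, datetime(2024, 1, 5, 10, 0)),
    QueuedTask("Trim bases", 1, datetime(2024, 1, 5, 9, 0)),
]

ordered = order_queue(queue)
for task in ordered:
    print(task.name, task.priority, task.submitted.isoformat())
```

With this ordering, the first element corresponds to the top row of the Queue tab, and tasks of equal priority run in the order in which they were submitted.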