CDD user guide. PsN 4.4.8. Revised 2015-02-23



Similar documents
CA Workload Automation Agent for Databases

Regression Clustering

Setting Up the Site Licenses

CAPIX Job Scheduler User Guide

DiskPulse DISK CHANGE MONITOR

CLC Server Command Line Tools USER MANUAL

UCINET Quick Start Guide

Configuring Backup Settings. Copyright 2009, Oracle. All rights reserved.

Adding a DNS Update Step to a Recovery Plan VMware vcenter Site Recovery Manager 4.0 and later

SAS Visual Analytics 7.2 for SAS Cloud: Quick-Start Guide

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS)

This presentation introduces you to the new call home feature in IBM PureApplication System V2.0.

Adobe Acrobat 9 Deployment on Microsoft Systems Management

Quick Tutorial for Portable Batch System (PBS)

MapGuide Open Source Repository Management Back up, restore, and recover your resource repository.

Changing Passwords in Cisco Unity 8.x

Package hive. January 10, 2011

MarkLogic Server. Connector for SharePoint Administrator s Guide. MarkLogic 8 February, 2015

Siebel Application Deployment Manager Guide. Siebel Innovation Pack 2013 Version 8.1/8.2 September 2013

Integrating VoltDB with Hadoop

InfiniteInsight 6.5 sp4

How to Configure the Workflow Service and Design the Workflow Process Templates

AIMMS Function Reference - Data Change Monitor Functions

dotdefender v5.12 for Apache Installation Guide Applicure Web Application Firewall Applicure Technologies Ltd. 1 of 11 support@applicure.

HP A-IMC Firewall Manager

About this release. McAfee Application Control and Change Control Addendum. Content change tracking. Configure content change tracking rule

HP IMC Firewall Manager

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

Maintaining the Content Server

IBM SPSS Direct Marketing 23

SUMO user guide. PsN Revised

Batch and Import Guide

Legal Notes. Regarding Trademarks KYOCERA Document Solutions Inc.

Cache Configuration Reference

IBM SPSS Direct Marketing 22

Designing and Implementing Forms 34

TIBCO ActiveMatrix BusinessWorks Plug-in for Big Data User s Guide

Note : It may be possible to run Test or Development instances on 32-bit systems with less memory.

SpatialWare. Version for Microsoft SQL Server 2008 INSTALLATION GUIDE

PaRFR : Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping. Version 1.0, Oct 2012

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Gentran Integration Suite. Archiving and Purging. Version 4.2

Mascot Search Results FAQ

LAB: Enterprise Single Sign-On Services. Last Saved: 7/17/ :48:00 PM

Future Technology Devices International Ltd. Mac OS X Installation Guide

A Computer Glossary. For the New York Farm Viability Institute Computer Training Courses

This document presents the new features available in ngklast release 4.4 and KServer 4.2.

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

Silect Software s MP Author

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

User's Guide - Beta 1 Draft

Migrating to vcloud Automation Center 6.1

PROJECTIONS SUITE. Database Setup Utility (and Prerequisites) Installation and General Instructions. v0.9 draft prepared by David Weinstein

How To Install Netbackup Access Control (Netbackup) On A Netbackups (Net Backup) On Unix And Non Ha (Net Backup) (Net Backups) (Unix) (Non Ha) (Windows) (

Kiko> A personal job scheduler

Getting Started with IntelleView POS Administrator Software

Siebel System Monitoring and Diagnostics Guide. Siebel Innovation Pack 2013 Version 8.1/8.2 September 2013

Netezza PureData System Administration Course

Wave Analytics External Data API Developer Guide

Postgres Plus xdb Replication Server with Multi-Master User s Guide

Exporting Client Information

User Document. Adobe Acrobat 7.0 for Microsoft Windows Group Policy Objects and Active Directory

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

EVALUATION ONLY. WA2088 WebSphere Application Server 8.5 Administration on Windows. Student Labs. Web Age Solutions Inc.

IBM SPSS Data Preparation 22

PingFederate. IWA Integration Kit. User Guide. Version 3.0

WS_FTP Professional 12

Monitoring Replication

IBM Sterling Control Center

Active-Active and High Availability

AXL Troubleshooting. Overview. Architecture

Siebel System Monitoring and Diagnostics Guide. Version 8.0, Rev. B December 2011

HYPERION DATA RELATIONSHIP MANAGEMENT RELEASE BATCH CLIENT USER S GUIDE

TREK GETTING STARTED GUIDE

Litigation Support connector installation and integration guide for Summation

IP Interface for the Somfy Digital Network (SDN) & RS485 URTSII

Structural Health Monitoring Tools (SHMTools)

RTI Database Integration Service. Getting Started Guide

SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

Table of Contents. The RCS MINI HOWTO

Sage ERP MAS 90 Sage ERP MAS 200 Sage ERP MAS 200 SQL. Installation and System Administrator's Guide 4MASIN450-08

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package Data Federation Administration Tool Guide

Creating Basic Custom Monitoring Dashboards Antonio Mangiacotti, Stefania Oliverio & Randy Allen

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Item Audit Log 2.0 User Guide

EMC Documentum Content Services for SAP iviews for Related Content

Package erp.easy. September 26, 2015

Outlook Express and Express Archiver to Backup and Retrieve at UW 1

Active-Active ImageNow Server

A QUICK OVERVIEW OF THE OMNeT++ IDE

FreeForm Designer. Phone: Fax: POB 8792, Natanya, Israel Document2

HP Operations Manager Software for Windows Integration Guide

Symantec ApplicationHA Agent for WebSphere Application Server Configuration Guide

Understanding Task Scheduler FIGURE Task Scheduler. The error reporting screen.

Exporting Contact Information

Project 5 Twitter Analyzer Due: Fri :59:59 pm

Uni Sales Analysis CRM Extension for Sage Accpac ERP 5.5

Transcription:

CDD user guide PsN 4.4.8 Revised 2015-02-23 1 Introduction The Case Deletions Diagnostics (CDD) algorithm is a tool primarily used to identify influential components of the dataset, usually individuals. The CDD works by identifying groups in the data set and creating one new data set for each member of the group, where that member has been removed. The model is run once with each new data set. The PsN implementation of the CDD can take any column as base for the grouping and all rows with the same value in that column will be considered a group as long as no individual contain multiple values in that column. One should take care that the grouping creates sensible individual records. PsN will renumber the ID column so that two individuals with the same ID will not end up next to each other. Examples cdd moxonidine.mod -case_column=1 cdd pheno.mod -case_column=age 2 Input and options 2.1 Required input A model file is required on the command-line. -case_column = name number The column on which the case-deletion is done. You can either 1

use the name of the column as specified in the $INPUT record in the model file or you can use the column number in the $INPUT record. Numbering starts with 1. 2.2 Optional input -bins = N Sets the number of databins, or cdd datasets, to use. If the number of unique values, or factors, in the case column is higher than N then one or more factors will be deleted in each cdd dataset. Specifying N as higher than the number of factors will have no effect. N is then reset to the number of factors. Default value = Number of unique values in the case column. -xv Default true. Run the cross-validation step (-xv) or not (-noxv). -selection_method = random consecutive Default consecutive. Specifies whether the factors selected for exclusion should be drawn randomly or consecutively from the datafile. -outside_n_sd_check = X Default 2. Mark the runs with CS-CR outside X standard deviations of the PCA. 2.3 Some important common PsN options There are many options that govern how PsN manages NONMEM runs, and those options are common to all PsN programs that run NONMEM. For a complete list see common_options.pdf, or psn_options -h on the commandline. -h or -? -help Print the list of available options and exit. With -help all programs will print a longer help message. If an 2

option name is given as argument, help will be printed for this option. If no option is specified, help text for all options will be printed. -directory = string Default cdd_dirn, where N will start at 1 and be increased by one each time you run the script. The directory option sets the directory in which PsN will run NONMEM and where PsNgenerated output files will be stored. You do not have to create the directory, it will be done for you. If you set -directory to a the name of a directory that already exists, PsN will run in the existing directory. -seed = string You can set your own random seed to make PsN runs reproducible. The random seed is a string, so both -seed=12345 and -seed=justinbieber are valid. It is important to know that because of the way the Perl pseudo-random number generator works, for two similar string seeds the random sequences may be identical. This is the case e.g. with the two different seeds 123 and 122. Setting the same seed guarantees the same sequence, but setting two slightly different seeds does not guarantee two different random sequences, that must be verified. -clean = integer Default 1. The clean option can take four different values: 0 Nothing is removed 1 NONMEM binary and intermediate files except INTER are removed, and files specified with option -extra_files. 2 model and output files generated by PsN restarts are removed, and data files in the NM_run directory, and (if option -nmqual is used) the xml-formatted NONMEM output. 3 All NM_run directories are completely removed. If the PsN tool has created modelfit_dir:s inside the main run directory, these will also be removed. 3

-nm_version = string Default is default. If you have more than one NONMEM version installed you can use option -nm_version to choose which one to use, as long as it is defined in the [nm_versions] section in psn.conf, see psn_configuration.pdf for details. You can check which versions are defined, without opening psn.conf, using the command psn -nm_versions -threads = integer Default 1. Use the threads option to enable parallel execution of multiple models. This option decides how many models PsN will run at the same time, and it is completely independent of whether the individual models are run with serial NONMEM or parallel NONMEM. If you want to run a single model in parallel you must use options -parafile and -nodes. On a desktop computer it is recommended to not set -threads higher the number of CPUs in the system plus one. You can specify more threads, but it will probably not increase the performance. If you are running on a computer cluster, you should consult your system administrator to find out how many threads you can specify. -version Prints the PsN version number of the tool, and then exit. -last_est_complete is optional and only applies with NONMEM7 and cdd option -xv. See common_options.pdf for details. 2.4 Auto-generated R-plots from PsN PsN can automatically generate R plots to visualize results, if a template file is available. The PsN installation package includes default template files for some tools, found in the R-scripts subdirectory of the installation directory, but R plots can be generated also for the other tools if the user provides the template file. Most of the default templates are quite basic, but the user can easily modify or replace them. 4

PsN will create a preamble with some run specific information, such as the name of the raw_results file, the parameter labels and the name of the tool results file. Please note that R plots generation might fail if there are special characters, such as µ, in the parameter labels. Some tools, for example vpc and randtest, also add tool-specific information to the preamble. Then the template file is appended. If the template file contains an old preamble, then that preamble will be replaced with the new one. This means the user can modify an old PsN-generated R script and then use this script as a new template, without having to remove the preamble. When a template file is available for the tool and option -rplots is 0 or positive, the R script will be generated and saved in the main run directory. If R is configured in psn.conf or command R is available and option -rplots is positive the script will also be run and a number of pdf-format plots be created. The default cdd template requires no special R libraries. If no pdf is generated, see the.rout file in the main run directory for error messages. -rplots = level -rplots<0 means R script is not generated -rplots=0 (default) means R script is generated but not run -rplots=1 means basic plots are generated -rplots=2 means basic and extended plots are generated -template_file_rplots = file When the rplots feature is used, the default template file PsN will look for is cdd_default.r. The user can choose a different template file by setting option -template_file_rplots to a different file name. PsN will look for the file in the template_directory_rplots directory, see the help text for that option. -template_directory_rplots = path PsN can look for the rplots template file in a number of places. PsN looks in the following places in the order they are listed: 1. template_directory_rplots from command-line, if set 2. calling directory (where PsN run is started from) 3. template_directory_rplots set in psn.conf 5

4. R-scripts subdirectory of the PsN installation directory -subset_variable_rplots = variable name Default not set. The user can specify a subset variable to be used with the -rplots feature. This variable will, if set, be set as subset.variable in the preamble, and can then be used in the plot code. The only default template that uses this variable is the execute template, but as mentioned before the user can create new templates in which this variable can be useful. Troubleshooting If no pdf was generated even if a template file is available and the appropriate options were set, check the.rout-file in the main run directory for error messages. 3 Known bugs/issues If NONMEM6 is used with the cdd and the S matrix is algorithmically singular (message in lst-file, checked also by sumo script) the Cook scores cannot be trusted. The cdd does not check for this error. 6