Additional File 1 - A model-based circular binary segmentation algorithm for the analysis of array CGH data




Fang-Han Hsu 1, Hung-I H Chen, Mong-Hsun Tsai, Liang-Chuan Lai 5, Chi-Cheng Huang 1,6, Shih-Hsin Tu 6, Eric Y Chuang* 1, and Yidong Chen*

1 Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan
2 Greehey Children's Cancer Research Institute, The University of Texas Health Science Center at San Antonio, San Antonio, TX 789, USA
3 Department of Epidemiology and Biostatistics, The University of Texas Health Science Center at San Antonio, San Antonio, TX 789, USA
4 Institute of Biotechnology, Center for Systems Biology and Bioinformatics, National Taiwan University, Taipei 106, Taiwan
5 Graduate Institute of Physiology, National Taiwan University, Taipei 100, Taiwan
6 Cathay General Hospital, Taipei 106, Taiwan

Contents
Comparison Platform
Typical Estimates of Skewness and Kurtosis from Real aCGH Data
Comparison of Performance between the Hybrid CBS and eCBS
The Validity of aCGH Data Simulation Using the Pearson System
Alternative Estimators of Skewness & Kurtosis

Supplementary Materials

Comparison Platform
Time consumption studies were carried out to compare the speed of the hybrid CBS and eCBS algorithms. The hardware used for the comparison is an IBM xSeries 5 with two Xeon .GHz CPUs and 1G RAM. As for software, DNAcopy version 1.16, distributed through Bioconductor/R [http://www.bioconductor.org/], was installed. All parameters of CBS were set as default, and the embedded smoothing function was used for removing outliers. Unless otherwise specified, the significance threshold of the maximal-t test was set as p-value < 0.01.

Typical Estimates of Skewness and Kurtosis from Real aCGH Data
Supplementary Figure 1 shows typical estimates of skewness and kurtosis from real aCGH data. These values are obtained from 10 breast cancer aCGH samples using the Agilent Human Genome CGH 105A and 11 human glioblastoma (GBM) aCGH samples (GSE9177) using the Agilent A human CGH arrays. As shown in the figure, aCGH data are typically skewed with -0. < skewness < 0. and heavy-tailed with .0 < kurtosis < .5.

Supplementary Figure 1. Estimates of skewness and kurtosis on real data. These values are obtained from (a) 10 breast cancer aCGH samples using the Agilent Human Genome CGH 105A and (b) 11 human glioblastoma (GBM) aCGH samples (GSE9177) using the Agilent A human CGH arrays. Pre-segmentation was applied before evaluating the estimates; this is to avoid estimation bias due to extremely large values in the data.

Comparison of Performance between the Hybrid CBS and eCBS
The simulated data, generated using the second model mentioned in the article, contain 1,500 probes (N = 1,500) and either one change-point near the edges or two change-points in the center of the chromosomes. The locations and amplitudes of change-points were controlled by m = cvI, where I is an indicator function, which equals 1 for segments between l < x < (l + k) and 0 otherwise. Parameter k refers to the width of the variation, and l refers to the location of the variation. Supplementary Table 1 shows the results.
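The simulation setup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it reads the amplitude formula as "c standard deviations times the indicator I", uses a baseline mean of 0 and a noise standard deviation of 1, and the function names `simulate_probes` and `elevated_positions` are hypothetical.

```python
import random


def elevated_positions(l, k):
    """Probe positions x where the indicator I equals 1, i.e. l < x < l + k."""
    return [x for x in range(l + 1, l + k)]


def simulate_probes(n=1500, l=700, k=5, c=3.0, sd=1.0, seed=0):
    """Simulate n normally distributed probes with one elevated segment.

    Assumption: probes inside the segment have mean c * sd and all other
    probes have mean 0 (baseline mean and noise level are not from the source).
    """
    rng = random.Random(seed)
    inside = set(elevated_positions(l, k))
    data = []
    for x in range(1, n + 1):
        indicator = 1 if x in inside else 0
        data.append(rng.gauss(c * sd * indicator, sd))
    return data


probes = simulate_probes()
print(len(probes), elevated_positions(700, 5))
```

Taking the strict inequality l < x < (l + k) literally, a segment of width k covers k - 1 integer probe positions; whether the original simulation treated the bounds as inclusive is not recoverable from the text.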

Algorithm 1: hybrid CBS (DNAcopy 1.16)
Algorithm 2: eCBS

Columns: k, c, method; then, for change-points (edge): Exact, 0, 1, 2, 3, 4, >=5; then, for change-points (center): Exact, 0, 1, 2, 3, 4, >=5.

5 hybrid CBS 1 97 17 9 0 0 0 1 971 0 9 0 0 0 eCBS 18 968 10 0 0 0 15 966 0 0 1 0 hybrid CBS 179 789 199 11 1 0 0 15 800 0 195 0 5 0 eCBS 18 77 1 11 1 0 0 180 767 0 8 0 5 0 hybrid CBS 6 16 668 7 8 1 0 55 99 0 59 0 8 0 eCBS 695 69 719 9 1 0 60 0 656 0 11 0 hybrid CBS 65 898 91 11 0 0 0 909 0 90 0 1 0 eCBS 71 888 99 1 0 0 0 5 897 0 101 0 0 hybrid CBS 9 9 589 10 7 1 0 9 06 0 58 0 10 0 eCBS 77 77 608 11 1 0 57 89 0 60 0 9 0 hybrid CBS 96 5 10 0 0 86 5 0 958 1 0 eCBS 0 0 97 0 0 866 7 0 956 0 17 0 hybrid CBS 117 801 187 11 1 0 0 11 76 0 0 0 eCBS 16 786 01 1 1 0 0 15 78 0 8 0 0 hybrid CBS 715 18 85 9 10 0 0 599 15 0 88 0 17 0 eCBS 76 11 865 9 0 1 60 19 0 857 0 1 0 hybrid CBS 99 1 98 5 11 0 0 888 0 980 0 18 0 eCBS 9 1 988 6 0 1 890 0 98 0 16 0 hybrid CBS 7 6 6 11 1 0 161 67 0 65 0 8 0 eCBS 59 600 8 1 1 1 165 61 0 78 0 10 0 hybrid CBS 80 950 7 10 0 0 667 0 0 95 1 0 eCBS 807 7 958 8 6 0 1 668 0 951 0 16 0 hybrid CBS 90 0 986 10 0 0 879 0 0 98 0 16 0 eCBS 90 0 986 10 0 1 87 0 0 978 0 0

Supplementary Table 1. The number of change-points detected by the hybrid CBS and eCBS. We applied these methods to 1,000 datasets; each of them contains 1,500 probes simulated from the normal distribution. The Exact columns count the number of cases in which the segmentation results exactly match the desired number (1 for edge and 2 for center) and locations of change-points. Here k is the width of the changed segment and c is the number of standard deviations between the two means. Each dataset had one elevated region ranging from to 5 points, and the elevated region varied from to SDs above the mean. The cutoff of p-value for the simulation was 0.01.

The Validity of aCGH Data Simulation Using the Pearson System
In the study, the Pearson system was assumed sufficient to simulate a wide range of aCGH data under the null condition (no change-points). To assess the validity of our assumption, we simulated several datasets using the Pearson system and compared the distribution of the simulated data to the distribution of real aCGH data using a two-sample Kolmogorov-Smirnov test (KS-test).

One of the 10 breast cancer and 11 glioblastoma (GBM) aCGH arrays described in Section Methods - Real aCGH Data was selected and pre-segmented; after the pre-segmentation process, the skewness and kurtosis of the array were estimated. Using the estimates of mean, standard deviation, skewness and kurtosis from the selected array as input parameters, we randomly generated 1,000 probes using the Matlab function pearsrnd() as the simulated data for hypothesis testing. Additionally, we randomly picked 1,000 probes from the selected array after pre-segmentation. This set of probes is the real data for hypothesis testing. We then have one simulated sample from the Pearson system and one real sample from the selected array. A two-sample KS-test with the null hypothesis - the two datasets under consideration are from the same continuous distribution - was applied. If the p-value is smaller than alpha = 0.01, we reject the null hypothesis.

Real Data | Size 1000 | Size 100 | Real Data | Size 1000 | Size 100
Array #10 0 0 GSM188 0
Array #19 0 1 GSM189 1 1
Array # 0 1 GSM1850
Array #8 1 1 GSM1851 0
Array # 0 1 GSM185 0
Array #5 1 GSM185 1 1
Array #8 0 0 GSM185 0 1
Array #65 0 0 GSM1855 0 1
Array #7 1 GSM1856 7 1
Array #78 0 1 GSM1857 0
GSM1858 1 1

Supplementary Table 2. The number of times among 100 that the p-values of the two-sample KS-test are smaller than 0.01. Real data are drawn from the aCGH data labeled in the column Real Data, while the simulated data are drawn from the Pearson system with the parameters (mean, standard deviation, skewness, and kurtosis) set to the estimates derived from the corresponding array. The column Size 1000 refers to the cases with 1,000 probes, and the column Size 100 refers to the cases with 100 probes.
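The repeated two-sample KS comparison described in this section can be sketched as below. This is an illustrative sketch only: the p-value uses the standard asymptotic Kolmogorov series rather than whatever Matlab computed for the authors, the helper names `ks_two_sample` and `count_rejections` are hypothetical, and, because the Python standard library has no Pearson-system sampler like `pearsrnd()`, a normal generator stands in for both the "real" array and the simulated data.

```python
import math
import random


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past x (this also handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the Kolmogorov series is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (t - 1) * math.exp(-2.0 * t * t * lam * lam)
                  for t in range(1, 101))
    return d, min(1.0, max(0.0, p))


def count_rejections(real, simulate, size=100, repeats=100, alpha=0.01, seed=0):
    """Count how often the KS-test rejects at level alpha over repeated draws."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repeats):
        real_sample = rng.sample(real, size)   # random probes from the array
        sim_sample = simulate(rng, size)       # stand-in for the Pearson system
        _, p_value = ks_two_sample(real_sample, sim_sample)
        if p_value < alpha:
            rejections += 1
    return rejections


# Stand-in data: a normal "array" compared against a matching normal generator.
source = random.Random(1)
array_probes = [source.gauss(0.0, 1.0) for _ in range(1000)]
print(count_rejections(array_probes, lambda r, s: [r.gauss(0.0, 1.0) for _ in range(s)]))
```

A small count, as reported in Supplementary Table 2, means the test rarely distinguishes the two samples; it does not by itself prove that the distributions are identical.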

We repeated the process 100 times per array and listed the number of times that the p-values were smaller than 0.01. Supplementary Table 2 shows the results. As shown in the table, whether the data size is large (size = 1000) or small (size = 100), variables drawn from the real data and variables drawn from the Pearson system did not lead to a statistically significant difference in distribution. This indicates that our assumption - the Pearson system can simulate aCGH data - is sound and most likely correct.

Alternative Estimators of Skewness & Kurtosis
To avoid estimation bias due to copy number alterations in data, we tried alternative estimators for the 2nd, 3rd, and 4th central moments as follows. Let the n probe values be independent and identically distributed (i.i.d.) random variables with mean μ. We are here interested in deriving the 2nd, 3rd, and 4th central moments. Assuming new random variables 1,,,,,, and, as 1,,,, 1,,,, where 1 n 1 for 1,, 1 n for,, 1 n for,, and 1 n for,.

An unbiased estimator for the 2nd central moment $\hat{m}_2$ has been proposed and is given by

mˆ E[( E[ E[ 1, ) 1,   (s1)

Similarly, an unbiased estimator for the 3rd central moment $\hat{m}_3$ is proposed and given by

mˆ E[( E[ E[ n ) E[ 1, 1,, 1 n,.   (s2)

and an unbiased estimator for the 4th central moment $\hat{m}_4$ is proposed and given by

mˆ E[( E[ E[ n 1 ) E[ 1, 1,,, n,, 6E[,,.   (s3)

Estimates of the skewness and kurtosis of aCGH data can theoretically be derived using $\hat{m}_2$, $\hat{m}_3$, and $\hat{m}_4$, which are given by

$$\text{skewness} = \frac{\hat{m}_3}{\hat{m}_2^{3/2}}, \qquad \text{kurtosis} = \frac{\hat{m}_4}{\hat{m}_2^{2}}.$$

The motivation for using the difference between neighboring probes, instead of the original data, comes from the observation (shown in Supplementary Figure 2) that bias due to copy number changes can be virtually removed. As shown in subplot (a), the original data contain a region of obvious copy number gain, while after conversion to neighboring-probe differences, as shown in subplot (b), regional noise was converted to point noise. Practically, the standard deviation of the differences can be easily obtained using the median of absolute deviation (MAD) to avoid the influence of point noise, or, 1.785 MAD( ). Using the data conversion and the MAD method, we can obtain robust estimates of the 2nd central moment, $\hat{m}_2$. However, for the 3rd and 4th central moments, we cannot simply apply the mean or the MAD operators to get robust estimates, for multiple reasons:

1) The input aCGH data may not satisfy the assumption of independence completely. While we may get good estimates of standard deviation, the derivation of the 3rd and 4th moments requires a much more stringent independence condition;
2) The MAD for estimating standard deviation requires a normal distribution of the neighboring-probe differences. This is not the case for the 3rd and 4th moments, where we cannot assume a symmetric distribution (skewness);
3) The mean operator is prone to point noise, whether the noise is due to the conversion to neighboring-probe differences or due to the intrinsic noise from array measurement.

Our experience with real aCGH data indicates that the estimates provided by Eqs. (s1, s2, s3) were not robust enough for the above reasons. Thus, we applied the pre-segmentation process to get accurate estimates of skewness and kurtosis in the study.

Supplementary Figure 2. Suppose a region (probe #51 to #100) of copy number gain lifts the sequential data by 1: (a) estimating central moments from the original aCGH data may lead to biased results due to regional noise from CNAs; (b) estimating central moments from differences between neighboring probes can result in minimized bias (point noise). Figure annotation: point noise due to conversion.
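The idea behind the neighboring-probe conversion can be illustrated with a short sketch. Several choices here are assumptions rather than the authors' method: the MAD scaling constant is the standard normal-consistency value 1.4826 (the constant printed in the source is not fully legible), the division by sqrt(2) relies on the variance of a difference of two independent probes being twice the single-probe variance, and `skewness_kurtosis` uses plain sample central moments instead of the estimators in Eqs. (s1)-(s3). All function names are hypothetical.

```python
import math
import statistics

MAD_TO_SD = 1.4826  # assumed normal-consistency constant, not taken from the source


def neighbor_differences(x):
    """Differences between neighboring probes, x[i] - x[i + 1]."""
    return [x[i] - x[i + 1] for i in range(len(x) - 1)]


def mad(values):
    """Median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def robust_sd(x):
    """Robust standard deviation of x from its neighboring-probe differences.

    For independent probes, Var(x[i] - x[i + 1]) = 2 * Var(x), hence sqrt(2).
    """
    return MAD_TO_SD * mad(neighbor_differences(x)) / math.sqrt(2.0)


def skewness_kurtosis(x):
    """Skewness m3 / m2**1.5 and kurtosis m4 / m2**2 from sample central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Noise-free step: the second half of the probes is lifted by 1 (a regional gain).
step = [0.0] * 50 + [1.0] * 50
print(statistics.pstdev(step), robust_sd(step))
```

On the noise-free step, the ordinary standard deviation is inflated by the regional shift, while the difference-based estimate sees only a single point of noise at the boundary and ignores it through the median.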