Supplementary Material for EpiDiff




Supplementary Text S1. Processing of raw chromatin modification data

To obtain the chromatin modification levels in each of the regions submitted by the user, the QDCMR module provides a processing pipeline for normalization of chromatin modification density. The chromatin modification reads in each data file ($i$) are mapped to each region ($r$). However, the read count ($\mathrm{Readcount\_i\_r}$) cannot represent the real chromatin modification density, because it depends on the region length ($\mathrm{Length\_r}$) and the total read number of the data file ($\mathrm{Filereads\_i}$). The read count can therefore be normalized by the total number of bases in the region and the total read number of the data file, as Wang et al. did in their work (1), to obtain the normalized chromatin modification density ($\mathrm{CMD}$). However, the density obtained by this method is a very small floating-point number close to 0, which is inconvenient for mathematical calculations and variance analysis. To overcome this shortcoming, we propose a new normalization algorithm here.

First, the total read number $\mathrm{Filereads\_i}$ of each ChIP-Seq data file is counted. Then the mean read number of all ChIP-Seq data files is calculated as

$$ \mathrm{MeanFilereads} = \frac{\sum_{i=1}^{N} \mathrm{Filereads\_i}}{N} \tag{1} $$

where $N$ is the number of ChIP-Seq data files. For each ChIP-Seq data file $i$, a weight is defined as

$$ \mathrm{ChIPSeqweight\_i} = \frac{\mathrm{Filereads\_i}}{\mathrm{MeanFilereads}}. \tag{2} $$

As a result, the normalized density of chromatin modification $i$ in region $r$ is obtained as

$$ \mathrm{CMD\_i\_r} = \frac{\mathrm{Readcount\_i\_r}}{\mathrm{Length\_r} \times \mathrm{ChIPSeqweight\_i}}. \tag{3} $$

The CMDs of multiple chromatin modifications in multiple regions are used for further analysis in QDCMR.

Supplementary Text S2. Quantification of epigenetic difference by entropy

To quantify the modification difference across samples, we proposed a new method based on Shannon entropy. Although entropy has been used previously to identify tissue-specific genes from gene expression data (2), we are the first to apply entropy to quantify epigenetic modification difference.
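The normalization of Supplementary Text S1 (Equations 1 to 3) can be illustrated with a minimal Python sketch. It is not the QDCMR implementation: the function names and the toy read counts are hypothetical, and Equation (3) is read here as the per-base read count divided by the file weight.

```python
from statistics import mean


def chip_seq_weights(file_reads):
    """Equations (1)-(2): weight of each ChIP-Seq file relative to the mean depth."""
    mean_file_reads = mean(file_reads)                         # Eq. (1)
    return [reads / mean_file_reads for reads in file_reads]   # Eq. (2)


def normalized_density(read_count, region_length, weight):
    """Equation (3), assuming the weight divides the per-base read count."""
    return read_count / (region_length * weight)


# Toy input (hypothetical): three files with 10, 20 and 30 million reads.
file_reads = [10_000_000, 20_000_000, 30_000_000]
weights = chip_seq_weights(file_reads)

# 150 reads of file 0 mapped to a 500-base region.
cmd = normalized_density(read_count=150, region_length=500, weight=weights[0])
```

Because the weights are ratios to the mean sequencing depth, the resulting densities stay on the scale of reads per base rather than shrinking towards 0, which is the motivation given above for replacing the earlier normalization.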
The modification vector $m_r$ of region $r$ across $N$ samples was defined as $m_r = (m_1, m_2, \ldots, m_s, \ldots, m_N)$, where $m_s$ represents the modification level in sample $s$. To quantify equally the modification difference of regions with hyper- or hypo-modification in a minority of samples, we calculated a one-step Tukey's biweight ($T_b$) for region $r$, as Kadota et al. did in the development of the ROKU method (3). The one-step Tukey biweight provides a robust weighted mean that is relatively insensitive to outliers (4).

The median $M$ of the modification levels in the $N$ samples of region $r$ was first computed. Then the absolute distance ($MAD_s$) of each $m_s$ from the median was calculated as $MAD_s = |m_s - M|$. Thirdly, the median of the absolute distances ($S$) from $M$ was determined. For each sample $s$, a uniform measure of distance from the center was defined as

$$ u_s = \frac{m_s - M}{c \times S + \varepsilon} \tag{4} $$

where $c$ is a tuning constant (default $c = 5$) and $\varepsilon$ is a very small value used to avoid a zero denominator (default $\varepsilon = 0.0001$). A weight for each sample was then calculated by the bisquare function:

$$ w(u_s) = \begin{cases} (1 - u_s^2)^2, & |u_s| \le 1 \\ 0, & |u_s| > 1. \end{cases} \tag{5} $$

For each sample $s$, the weight is reduced as a function of its distance from the median $M$, so outliers are discounted by a smooth function. When modification levels are very far from the median, their weights are reduced to zero. Finally, the one-step Tukey's biweight ($T_b$) for region $r$ was calculated as

$$ T_b = \frac{\sum_{s=1}^{N} w(u_s) \times m_s}{\sum_{s=1}^{N} w(u_s)}. \tag{6} $$

The processed modification level $m'_s$ for sample $s$ can then be calculated by using $T_b$ (a weighted mean) as

$$ m'_s = |m_s - T_b|. \tag{7} $$

The sum of the processed modification levels of region $r$ in the $N$ samples ($\sum_{s=1}^{N} m'_s$) was treated as a total modification value. The ratio of the modification level of region $r$ in sample $s$ to the total value was defined as the relative modification probability $p_{s/r} = m'_s / \sum_{s=1}^{N} m'_s$, which was then used to calculate the region's entropy as

$$ H_P = -\sum_{s=1}^{N} p_{s/r} \log_2(p_{s/r}). \tag{8} $$

Considering the range of variation of the modification data, the entropy for each region was adjusted by a modification weight, which was defined as

$$ w_r = \left| \log_2\!\left( \frac{\max(m_s) - \min(m_s)}{MAX - MIN} + \varepsilon \right) \right| \tag{9} $$

where $\max(m_s)$ and $\min(m_s)$ are the maximum and minimum modification levels of region $r$ in all samples, $MAX$ and $MIN$ are the highest and the lowest possible modification levels, and $\varepsilon$ is a small value used to avoid a zero argument in the logarithm (default $\varepsilon = 0.0001$). Then the entropy calculated from the processed modification vector was adjusted by the weight as

$$ H_Q = H_P \times w_r \tag{10} $$

where $H_Q$ represents the extent of modification difference across multiple samples. It ranges from zero, for regions differentially methylated in a single sample with the biggest range, to $\log_2 N \times |\log_2 \varepsilon|$, for regions with a uniform modification level in all samples considered. The maximum value of $H_Q$ depends on the number of samples $N$ and the value of $\varepsilon$.

Supplementary Text S3. Identification of differential regions by threshold

Based on the quantitative modification difference, DEMRs can be identified if a threshold can be appropriately defined. In this study, we determined the threshold for DEMRs from the modification probability model, as Schug et al. did in selecting tissue-specifically expressed genes from gene expression profiles (2). The random biological variability among samples was modeled based on the assumption that each region exhibits an average modification level across all samples. Compared with Schug's method, there were two major differences in this method. Firstly, the entropy in the current work is independent of the average modification across all samples, because it is derived from the modification values processed by $T_b$. Therefore the biological variability modeled in this approach exhibited the average modification level $Mean = \frac{1}{2}(MAX + MIN)$ across all samples. Secondly, the fold change between the sample-dependent difference from the average level and the theoretical maximum range of modification was defined as $\dfrac{m_s - Mean}{MAX - MIN}$.
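The entropy quantification of Supplementary Text S2 (Equations 4 to 10) can be sketched in a few lines of Python. This is an illustrative reading of the equations, not the EpiDiff source code: the function names are hypothetical, and the handling of a region whose levels are all identical (treated as a uniform distribution, so that $H_P = \log_2 N$) is an assumption made so that the stated upper bound of $H_Q$ is reached.

```python
import math
from statistics import median


def tukey_biweight(levels, c=5.0, eps=1e-4):
    """One-step Tukey biweight T_b of one region (Equations 4-6)."""
    m = median(levels)
    s = median(abs(x - m) for x in levels)      # S: median of absolute distances
    u = [(x - m) / (c * s + eps) for x in levels]                   # Eq. (4)
    w = [(1 - ui * ui) ** 2 if abs(ui) <= 1 else 0.0 for ui in u]   # Eq. (5)
    return sum(wi * x for wi, x in zip(w, levels)) / sum(w)         # Eq. (6)


def entropy_hq(levels, max_level=1.0, min_level=0.0, eps=1e-4):
    """Weighted entropy H_Q of one region (Equations 7-10)."""
    t_b = tukey_biweight(levels)
    processed = [abs(x - t_b) for x in levels]                      # Eq. (7)
    total = sum(processed)
    if total == 0:
        # Assumption: identical levels are treated as a uniform distribution.
        h_p = math.log2(len(levels))
    else:
        probs = [x / total for x in processed]
        h_p = sum(-p * math.log2(p) for p in probs if p > 0)        # Eq. (8)
    spread = (max(levels) - min(levels)) / (max_level - min_level)
    weight = abs(math.log2(spread + eps))                           # Eq. (9)
    return h_p * weight                                             # Eq. (10)


specific = entropy_hq([0.0, 0.0, 0.0, 1.0])   # one sample differs, full range
uniform = entropy_hq([0.5, 0.5, 0.5, 0.5])    # no difference across samples
graded = entropy_hq([0.2, 0.4, 0.6, 0.8])     # intermediate case
```

In this sketch the single-sample region gives $H_Q = 0$ and the uniform region gives $\log_2 4 \times |\log_2 0.0001|$, the two ends of the range described for Equation (10).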
It was assumed in this study that the fold change follows a normal distribution with mean equal to zero and some unknown but small standard deviation (SD) (Supplementary Figure S1). Thus SD can be used to indicate the degree of biological variation. If SD equals zero, the modification levels in all samples are the same and equal to the Mean. The larger the SD, the greater the modification difference across multiple samples. Note that different data types have different characteristics. For example, most DNA methylation data range from 0 to 1, while chromatin modification and gene expression levels are positive floating-point numbers. Thus, based on the statistical principle of the normal distribution and a large number of tests, we recommend SD = 0.07 for the DMR threshold and SD = 0.1 for chromatin modification data and gene expression data. In addition, users can define the threshold themselves based on the open source code of EpiDiff if the recommended values do not fit their specific biological problems.

Take the determination of the DMR threshold for 16 samples as an example. In total, 80 000 (5000 × 16) random values were generated from the normal distribution model with mean = 0 and SD = 0.07. The average methylation level is 0.5 ($Mean = \frac{1}{2}(MAX + MIN)$, with $MAX = 1$ and $MIN = 0$). In this way, 5000 uniformly methylated regions across 16 samples were modeled. Then the entropy for each of these regions was calculated. The value at p = 0.05 (one-sided) of the distribution of the 5000 entropies, which was normal, was taken as the threshold $H$ value. This process was repeated 10 times, producing 10 $H$ values with a mean (SD) of 5.326 (0.022). This mean was used as the threshold $H_{DMR}$ for DMR identification. Regions with entropy lower than $H_{DMR}$ are defined as DMRs, while the remaining regions are defined as non-differentially methylated regions (N-DMRs). With this method, the $H_{DMR}$ thresholds were produced for sample numbers from 2 to 100 and embedded in the EpiDiff software. Note that the threshold for chromatin modification and gene expression data is inferred from the data submitted by users according to the description above.

Supplementary Text S4. Measurement of sample specificity for differential regions

According to Shannon entropy theory, an increase in the number of variables would reduce uncertainty, while significant changes in individual variables would result in a substantial increase of uncertainty. The sample-specific modification levels were considered as the main individual factors that determine the modification differences across samples. For region $r$, the entropy $H_Q$ represents the modification difference across all samples.
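The threshold simulation described in Supplementary Text S3 can be sketched as follows. This is a hedged illustration, not the EpiDiff code: `simulate_threshold` and its parameters are hypothetical names, the empirical 5% quantile is taken as the sorted value at index `int(p * n_regions)`, and the entropy functions repeat the reading of Equations 4 to 10 given in Supplementary Text S2 (including the assumption that identical levels count as a uniform distribution).

```python
import math
import random
from statistics import mean, median


def tukey_biweight(levels, c=5.0, eps=1e-4):
    """One-step Tukey biweight T_b (Equations 4-6)."""
    m = median(levels)
    s = median(abs(x - m) for x in levels)
    u = [(x - m) / (c * s + eps) for x in levels]
    w = [(1 - ui * ui) ** 2 if abs(ui) <= 1 else 0.0 for ui in u]
    return sum(wi * x for wi, x in zip(w, levels)) / sum(w)


def entropy_hq(levels, max_level=1.0, min_level=0.0, eps=1e-4):
    """Weighted entropy H_Q (Equations 7-10)."""
    t_b = tukey_biweight(levels)
    processed = [abs(x - t_b) for x in levels]
    total = sum(processed)
    if total == 0:  # assumption: identical levels count as uniform
        h_p = math.log2(len(levels))
    else:
        h_p = sum(-(x / total) * math.log2(x / total) for x in processed if x > 0)
    spread = (max(levels) - min(levels)) / (max_level - min_level)
    return h_p * abs(math.log2(spread + eps))


def simulate_threshold(n_samples, n_regions=5000, repeats=10, sd=0.07,
                       max_level=1.0, min_level=0.0, p=0.05, seed=0):
    """Mean over `repeats` runs of the lower p-quantile of simulated entropies."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    mean_level = 0.5 * (max_level + min_level)
    span = max_level - min_level
    thresholds = []
    for _ in range(repeats):
        entropies = []
        for _region in range(n_regions):
            # Fold changes drawn from N(0, sd) around the average level.
            levels = [mean_level + rng.gauss(0.0, sd) * span
                      for _sample in range(n_samples)]
            entropies.append(entropy_hq(levels, max_level, min_level))
        entropies.sort()
        thresholds.append(entropies[int(p * n_regions)])  # one-sided lower tail
    return mean(thresholds)


# Reduced run for illustration; the text above uses 5000 regions and 10 repeats.
threshold_16 = simulate_threshold(n_samples=16, n_regions=1000, repeats=3)
```

The lower tail is used because DMRs are the regions with entropy below the threshold. The sketch is not guaranteed to reproduce the reported value of 5.326 for 16 samples, since details such as the quantile estimator and the random number generator are assumptions.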
For each sample $s$, the entropy $H_{Q/s}$ for the modification difference across the samples that do not include sample $s$ can also be calculated. Thus the contribution of sample $s$ to the whole modification difference can be reflected by the entropy difference between $H_Q$ and $H_{Q/s}$, which was defined as

$$ \Delta H_{r/s} = H_{Q/s} - H_Q. \tag{11} $$

When region $r$ is specifically methylated in sample $s$, $\Delta H_{r/s}$ is greater than 0. To further identify hyper-modification or hypo-modification in a region, the categorical sample-specificity ($CS_{r/s}$) was presented as

$$ CS_{r/s} = \begin{cases} \Delta H_{r/s} \times sign_s, & \Delta H_{r/s} > 0 \\ 0, & \Delta H_{r/s} \le 0 \end{cases} \tag{12} $$

where $sign_s$ is the sign of the difference between the modification level $m_s$ in sample $s$ and the median modification level of vector $m_r$ in region $r$. Thus the absolute value of $CS_{r/s}$ is associated with $\Delta H_{r/s}$, and the sign of $CS_{r/s}$ is the same as $sign_s$. When the value in sample $s$ is very close to the median, $CS_{r/s}$ equals zero.

Specific hyper-modification in sample $s$ will have $\Delta H_{r/s} > 0$ and $sign_s > 0$, so $CS_{r/s} > 0$. $CS_{r/s}$ reaches its maximum when a region is relatively high-modified in sample $s$, and decreases as either the number of samples high-modified in the region increases or the relative contribution of sample $s$ to the region's overall pattern decreases. Similarly, specific hypo-modification in sample $s$ will have $\Delta H_{r/s} > 0$ and $sign_s < 0$, so $CS_{r/s} < 0$. $CS_{r/s}$ reaches its minimum when a region is relatively low-modified in sample $s$, and increases as either the number of samples low-modified in the region increases or the relative contribution of sample $s$ to the region's overall pattern decreases.

REFERENCES

1. Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Peng, W., Zhang, M.Q. et al. (2008) Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet, 40, 897-903.
2. Schug, J., Schuller, W.P., Kappen, C., Salbaum, J.M., Bucan, M. and Stoeckert, C.J. Jr. (2005) Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol, 6, R33.
3. Kadota, K., Ye, J., Nakai, Y., Terada, T. and Shimizu, K. (2006) ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics, 7, 294.
4. Hubbell, E., Liu, W.M. and Mei, R. (2002) Robust estimators for expression analysis. Bioinformatics, 18, 1585-1592.
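As a worked illustration of the sample-specificity measure in Supplementary Text S4 (Equations 11 and 12), the following Python sketch computes $CS_{r/s}$ for every sample of one region. It is not taken from the EpiDiff source: `sample_specificity` is a hypothetical name, and the entropy functions repeat the reading of Equations 4 to 10 from Supplementary Text S2, including the assumption that identical levels are treated as a uniform distribution.

```python
import math
from statistics import median


def tukey_biweight(levels, c=5.0, eps=1e-4):
    """One-step Tukey biweight T_b (Equations 4-6)."""
    m = median(levels)
    s = median(abs(x - m) for x in levels)
    u = [(x - m) / (c * s + eps) for x in levels]
    w = [(1 - ui * ui) ** 2 if abs(ui) <= 1 else 0.0 for ui in u]
    return sum(wi * x for wi, x in zip(w, levels)) / sum(w)


def entropy_hq(levels, max_level=1.0, min_level=0.0, eps=1e-4):
    """Weighted entropy H_Q (Equations 7-10)."""
    t_b = tukey_biweight(levels)
    processed = [abs(x - t_b) for x in levels]
    total = sum(processed)
    if total == 0:  # assumption: identical levels count as uniform
        h_p = math.log2(len(levels))
    else:
        h_p = sum(-(x / total) * math.log2(x / total) for x in processed if x > 0)
    spread = (max(levels) - min(levels)) / (max_level - min_level)
    return h_p * abs(math.log2(spread + eps))


def sample_specificity(levels):
    """Categorical sample-specificity CS of each sample (Equations 11-12)."""
    h_all = entropy_hq(levels)
    med = median(levels)
    scores = []
    for s, level in enumerate(levels):
        others = levels[:s] + levels[s + 1:]        # leave sample s out
        delta = entropy_hq(others) - h_all          # Eq. (11)
        sign = (level > med) - (level < med)        # +1, 0 or -1
        scores.append(delta * sign if delta > 0 else 0.0)   # Eq. (12)
    return scores


hyper = sample_specificity([0.25, 0.25, 0.25, 0.25, 1.0])   # last sample high
hypo = sample_specificity([0.75, 0.75, 0.75, 0.75, 0.0])    # last sample low
```

In both toy regions only the last sample receives a non-zero score, positive for the hyper-modified case and negative for the hypo-modified case, which matches the sign convention described in Supplementary Text S4.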

Supplementary Figure S1. Genome annotations of regions in UCSC Genome Browser

[Figure: UCSC Genome Browser view (hg18) of a region of about 900 bases on chr6 (6p25.3, positions 655700 to 656600), with ENCODE Broad histone modification ChIP-seq peak and signal tracks (CTCF, H3K4me1, H3K4me3, H3K27ac, H3K27me3, H3K36me3) for GM12878 and K562, alongside UCSC Genes, RefSeq Genes, CpG islands, RepeatMasker, conserved transcription factor binding sites and UW predicted nucleosome occupancy tracks.]

This figure shows the genome annotations of the most differential chromatin modification region across ten histone modifications shown in Figure 3 of the EpiDiff paper.

Supplementary Figure S2. Distribution of differential regions on chromosomes

In this figure, the visualization module shows the distribution of differentially methylated regions across 16 tissues/cells on chromosomes.