Supplementary Material for EpiDiff

Supplementary Text S1. Processing of raw chromatin modification data

In order to obtain the chromatin modification levels in each of the regions submitted by the user, the QDCMR module provides a processing pipeline for normalization of chromatin modification density. The chromatin modification reads in each data file ($i$) are mapped to each region ($r$). However, the read count ($\mathrm{Readcount}_{i\_r}$) cannot represent the real chromatin modification density, because it is affected by the region length ($\mathrm{Length}_r$) and the total read number of the data file ($\mathrm{Filereads}_i$). Thus, the read count is normalized by the total number of bases in the region and the total read number of the data file, as Wang et al. did in their work (1), to obtain the normalized chromatin modification density (CMD). However, the normalized chromatin modification density obtained by this method is a very small floating-point number close to 0, which is inconvenient for mathematical calculations and variance analysis. To overcome this shortcoming, we propose a new normalization algorithm. Firstly, the total read number $\mathrm{Filereads}_i$ of each ChIP-Seq data file is counted. Then the mean read number of all ChIP-Seq data files is calculated as

$$\mathrm{MeanFilereads} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Filereads}_i \tag{1}$$

where $N$ is the number of ChIP-Seq data files. For each ChIP-Seq data file $i$, a weight is defined as

$$\mathrm{ChIPSeqweight}_i = \frac{\mathrm{Filereads}_i}{\mathrm{MeanFilereads}} \tag{2}$$

As a result, the normalized density of chromatin modification $i$ in region $r$ is obtained as

$$\mathrm{CMD}_{i\_r} = \frac{\mathrm{Readcount}_{i\_r}}{\mathrm{Length}_r \times \mathrm{ChIPSeqweight}_i} \tag{3}$$

The CMDs of multiple chromatin modifications in multiple regions are used for further analysis in QDCMR.

Supplementary Text S2. Quantification of epigenetic difference by entropy

In order to quantify modification difference across samples, we propose a new method based on Shannon entropy. Although entropy has been used previously to identify tissue-specific genes from gene expression data (2), we apply entropy for the first time to quantify epigenetic modification difference.
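As a brief illustration of why Shannon entropy is a natural measure of difference across samples, the following Python sketch computes $H = -\sum_s p_s \log_2 p_s$ for relative modification levels $p_s$ that sum to one. The function name `shannon_entropy` and the example profiles are hypothetical and are not taken from EpiDiff, and the sketch leaves out the robust centering and range weighting described in the rest of this section. A profile concentrated in a single sample gives an entropy of 0, whereas a profile spread evenly over $N$ samples gives the maximum, $\log_2 N$.

```python
import math

def shannon_entropy(levels):
    """Shannon entropy (base 2) of non-negative levels, normalized to sum to 1."""
    total = sum(levels)
    if total <= 0:
        raise ValueError("levels must have a positive sum")
    probs = [x / total for x in levels]
    # Terms with p == 0 are skipped: p * log2(p) tends to 0 as p tends to 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical modification profiles of one region across four samples.
uniform = [0.5, 0.5, 0.5, 0.5]    # same level in every sample
specific = [0.0, 0.0, 0.9, 0.0]   # signal in a single sample only
mixed = [0.1, 0.1, 0.7, 0.1]      # one sample dominates, others contribute

for name, profile in [("uniform", uniform), ("specific", specific), ("mixed", mixed)]:
    print(name, shannon_entropy(profile))
```

The intermediate profile falls between the two extremes, which is the behaviour that makes low entropy a useful indicator of sample-specific modification.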
The modification vector $m_r$ of region $r$ across $N$ samples was defined as $m_r = (m_1, m_2, \ldots, m_s, \ldots, m_N)$, where $m_s$ represents the modification level in sample $s$. In order to equally quantify the modification difference of regions with hyper- or hypomodification in a minority of samples, we calculated a one-step Tukey's biweight ($T_b$) for region $r$, as Kadota et al. did in the development of the ROKU method (3). The one-step Tukey biweight provides a robust weighted mean that is relatively insensitive to outliers (4). The median $M$ of the modification levels in the $N$ samples of region $r$ was first computed. Then the absolute distance ($\mathrm{MAD}_s$) of each $m_s$ from the median was calculated as $\mathrm{MAD}_s = |m_s - M|$. Thirdly, the median of the absolute distances ($S$) from $M$ was determined. For each sample $s$, a uniform measure of distance from the center was defined as

$$u_s = \frac{m_s - M}{cS + \varepsilon} \tag{4}$$

where $c$ is a tuning constant (default $c = 5$) and $\varepsilon$ is a very small value used to avoid a zero denominator (default $\varepsilon = 0.0001$). A weight for each sample was then calculated by the bisquare function:

$$w(u_s) = \begin{cases} (1 - u_s^2)^2, & |u_s| \le 1 \\ 0, & |u_s| > 1 \end{cases} \tag{5}$$

For each sample $s$, the weight is reduced as a function of its distance from the median $M$. Thus, outliers are effectively discounted by a smooth function. When modification levels are very far from the median, their weights are reduced to zero. Finally, the one-step Tukey's biweight ($T_b$) for region $r$ was calculated as

$$T_b = \frac{\sum_{s=1}^{N} w(u_s)\, m_s}{\sum_{s=1}^{N} w(u_s)} \tag{6}$$

The processed modification level $m'_s$ for sample $s$ can then be calculated using $T_b$ (a weighted mean) as

$$m'_s = |m_s - T_b| \tag{7}$$

The sum of the processed modification levels of region $r$ in the $N$ samples ($\sum_{s=1}^{N} m'_s$) was treated as a total modification value. The ratio of the modification level of region $r$ in sample $s$ to the total value was defined as the relative modification probability $p_{s/r} = m'_s / \sum_{s=1}^{N} m'_s$, which was then used to calculate the region's entropy as

$$H_P = -\sum_{s=1}^{N} p_{s/r} \log_2(p_{s/r}) \tag{8}$$

Considering the range of variation of the modification data, the entropy for each region was
adjusted by a modification weight, which was defined as

$$w_r = \left| \log_2\!\left( \frac{\max(m_s) - \min(m_s)}{\mathrm{MAX} - \mathrm{MIN}} + \varepsilon \right) \right| \tag{9}$$

where $\max(m_s)$ and $\min(m_s)$ are the maximum and minimum modification levels of region $r$ in all samples, respectively, MAX and MIN are defined as the highest and the lowest modification level, respectively, and $\varepsilon$ is a small value used to avoid a zero argument in the logarithm (default $\varepsilon = 0.0001$). Then the entropy calculated from the processed modification vector was adjusted by the weight as

$$H_Q = H_P \times w_r \tag{10}$$

where $H_Q$ represents the extent of modification difference across multiple samples. It ranges from zero, for regions differentially methylated in a single sample with the biggest range, to $\log_2(N) \times |\log_2(\varepsilon)|$, for regions with a uniform modification level in all samples considered. The maximum value of $H_Q$ depends on the number of samples and the value of $\varepsilon$.

Supplementary Text S3. Identification of differential regions by threshold

Based on the quantitative modification difference, DEMRs can be identified if a threshold can be appropriately defined. In this study, we determined the threshold for DEMRs from the modification probability model, as Schug et al. did in selecting tissue-specifically expressed genes from gene expression profiles (2). The random biological variability among samples was modeled on the assumption that each region exhibits an average modification level across all samples. Compared with Schug's method, there are two major differences in this method. Firstly, the entropy in the current work is independent of the average modification across all samples, because it is derived from the modification values processed by $T_b$. Therefore, the biological variability modeled in this approach exhibits the average modification level $\mathrm{Mean} = \frac{1}{2}(\mathrm{MAX} + \mathrm{MIN})$ across all samples. Secondly, the fold change between the sample-dependent difference from the average level and the theoretical maximum range of modification was defined as

$$\frac{m_s - \mathrm{Mean}}{\mathrm{MAX} - \mathrm{MIN}}$$

It was assumed in this study that the fold change follows a normal distribution with mean equal to zero and some unknown but small standard deviation (SD) (Supplementary Figure S1). Thus, SD can be used to indicate the degree of biological variation. If SD equals zero, the modification levels in all samples will be the same and equal to the Mean. The larger the SD, the greater the modification difference across multiple samples. Note that different data have different characteristics. For example, the range of most DNA methylation data is from 0 to 1, while
chromatin modification and gene expression levels are positive floating-point numbers. Thus, based on the statistical principle of the normal distribution and a large number of tests, we recommend SD=0.07 for the DMR threshold and SD=0.1 for chromatin modification data and gene expression data. In addition, users can define the threshold themselves based on the open source code of EpiDiff if the recommended values do not fit their specific biological problems.

Take the determination of the DMR threshold for 16 samples as an example. In total, 80 000 (5000 × 16) random values were generated from the normal distribution model with mean=0 and SD=0.07. The average methylation level is 0.5 ($\mathrm{Mean} = \frac{1}{2}(\mathrm{MAX} + \mathrm{MIN})$, with MAX=1 and MIN=0). In this way, 5000 uniformly methylated regions across 16 samples were modeled. Then the entropy for each of these regions was calculated. The value at p = 0.05 (one-sided) of the distribution of the 5000 entropies, which was normal, was determined as the threshold $H$ value. This process was repeated 10 times, producing 10 values of $H$ with mean (SD) equal to 5.326 (0.022). This mean was taken as the threshold $H_{DMR}$ for DMR identification. Regions with entropy lower than $H_{DMR}$ are defined as DMRs, while the remaining regions are not differentially methylated regions (N-DMRs). With this method, the $H_{DMR}$ thresholds were produced for sample numbers from 2 to 100 and embedded in the EpiDiff software. Note that the threshold $H_Q$ for chromatin modification and gene expression data is inferred from the data submitted by users according to the description above.

Supplementary Text S4. Measurement of sample specificity for differential regions

Based on Shannon entropy theory, an increase in the number of variables would reduce uncertainty, while significant changes in individual variables would result in a substantial increase of uncertainty. The sample-specific modification levels were considered as the main individual factors that determine the modification differences across samples. For region $r$, the entropy $H_Q$ represents the modification difference across all samples. For each sample $s$, the entropy $H_{Q/s}$ for the modification difference across the samples that do not include sample $s$ can also be calculated. Thus, the contribution of sample $s$ to the whole modification difference can be reflected by the entropy difference between $H_Q$ and $H_{Q/s}$, which was defined as

$$\Delta H_{r/s} = H_{Q/s} - H_Q \tag{11}$$

When region $r$ is specifically methylated in sample $s$, $\Delta H_{r/s}$ is greater than 0. To further identify hypermodification or hypomodification in a region, the categorical sample-specificity ($CS_{r/s}$) was
presented as

$$CS_{r/s} = \begin{cases} \Delta H_{r/s} \times \mathrm{sign}_s, & \Delta H_{r/s} > 0 \\ 0, & \Delta H_{r/s} \le 0 \end{cases} \tag{12}$$

where $\mathrm{sign}_s$ is the sign of the difference between the modification level $m_s$ in sample $s$ and the median modification level of vector $m_r$ in region $r$. Thus, the absolute value of $CS_{r/s}$ is associated with $\Delta H_{r/s}$, and the sign of $CS_{r/s}$ is the same as $\mathrm{sign}_s$. When the value in sample $s$ is very close to the median, $CS_{r/s}$ equals zero. Specific hypermodification in sample $s$ will have $\Delta H_{r/s} > 0$ and $\mathrm{sign}_s > 0$, so $CS_{r/s} > 0$. $CS_{r/s}$ reaches its maximum when a region is relatively highly modified in sample $s$, and decreases as either the number of samples highly modified in the region increases or the relative contribution of sample $s$ to the region's overall pattern decreases. Similarly, specific hypomodification in sample $s$ will have $\Delta H_{r/s} > 0$ and $\mathrm{sign}_s < 0$, so $CS_{r/s} < 0$. $CS_{r/s}$ reaches its minimum when a region is relatively lowly modified in sample $s$, and increases as either the number of samples lowly modified in the region increases or the relative contribution of sample $s$ to the region's overall pattern decreases.

REFERENCES

1. Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Peng, W., Zhang, M.Q. et al. (2008) Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet, 40, 897-903.
2. Schug, J., Schuller, W.P., Kappen, C., Salbaum, J.M., Bucan, M. and Stoeckert, C.J., Jr. (2005) Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol, 6, R33.
3. Kadota, K., Ye, J., Nakai, Y., Terada, T. and Shimizu, K. (2006) ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics, 7, 294.
4. Hubbell, E., Liu, W.M. and Mei, R. (2002) Robust estimators for expression analysis. Bioinformatics, 18, 1585-1592.
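As a companion to Supplementary Texts S2 and S4, the computation of the range-weighted entropy and of the categorical sample-specificity can be sketched in Python as follows. This is a minimal sketch and not the EpiDiff source code: the function names are hypothetical, MAX and MIN default to 1 and 0 as for DNA methylation data, the same MAX and MIN are reused when a sample is left out, and an exactly uniform region (where all processed levels are zero and the relative probabilities are undefined) is assumed to receive the maximum entropy $\log_2 N$.

```python
import math
from statistics import median

EPS = 0.0001  # default epsilon given in the text
C = 5.0       # default tuning constant given in the text

def tukey_biweight(m, c=C, eps=EPS):
    """One-step Tukey biweight of the levels m (robust weighted mean)."""
    M = median(m)
    S = median([abs(x - M) for x in m])  # median of the absolute distances
    weights = []
    for x in m:
        u = (x - M) / (c * S + eps)
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * x for w, x in zip(weights, m)) / sum(weights)

def entropy_hq(m, lo=0.0, hi=1.0, eps=EPS):
    """Entropy of the processed levels, multiplied by the range weight."""
    t_b = tukey_biweight(m, eps=eps)
    dev = [abs(x - t_b) for x in m]  # processed levels |m_s - T_b|
    total = sum(dev)
    if total == 0:
        # Assumption: an exactly uniform region gets the maximum entropy.
        h_p = math.log2(len(m))
    else:
        h_p = sum(-(d / total) * math.log2(d / total) for d in dev if d > 0)
    weight = abs(math.log2((max(m) - min(m)) / (hi - lo) + eps))
    return h_p * weight

def sample_specificity(m, lo=0.0, hi=1.0):
    """Categorical sample-specificity of every sample in the region."""
    h_q = entropy_hq(m, lo, hi)
    M = median(m)
    cs = []
    for s, x in enumerate(m):
        rest = m[:s] + m[s + 1:]                 # leave sample s out
        delta = entropy_hq(rest, lo, hi) - h_q   # entropy gained without s
        sign = (x > M) - (x < M)                 # +1 hyper, -1 hypo, 0 at median
        cs.append(delta * sign if delta > 0 else 0.0)
    return cs

# Hypothetical methylation levels: the last sample is hypermethylated.
levels = [0.25, 0.25, 0.25, 0.25, 0.75]
print(entropy_hq(levels))
print(sample_specificity(levels))
```

In this example only the last sample receives a positive specificity, because removing it leaves a uniform region; reversing the pattern (four high samples and one low sample) gives a negative value for the low sample instead.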
Supplementary Figure S1. Genome annotations of regions in UCSC Genome Browser

[Figure: UCSC Genome Browser view (hg18) of chr6, band 6p25.3, approximately positions 655,700 to 656,600. Displayed tracks: UCSC Genes; RefSeq Genes; Human mRNAs from GenBank; CpG Islands; Repeating Elements by RepeatMasker; Chromosome Bands; snoRNAs, scaRNAs and microRNAs from snoRNABase and miRBase; ENCODE Histone Mods Broad ChIP-seq Peaks and Signal for CTCF, H3K4me1, H3K4me3, H3K27ac, H3K27me3 and H3K36me3 in GM12878 and K562; HMR Conserved Transcription Factor Binding Sites; SwitchGear Genomics Transcription Start Sites; UW Predicted Nucleosome Occupancy (A375, Dennis, MEC).]

This figure shows the genome annotations of the most differential chromatin modification region across ten histone modifications, shown in Figure 3 of the EpiDiff paper.
Supplementary Figure S2. Distribution of differential regions on chromosomes

In this figure, the visualization module shows the distribution of differentially methylated regions across 16 tissues/cells on chromosomes.