PCA vs. Vamax otaton The goal of the otaton/tansfomaton n PCA s to maxmze the vaance of the new SNP (egensnp), whle mnmzng the vaance aound the egensnp. Theefoe the dffeence between the vaances captued n each egensnp s maxmzed. The constant, Γ' ΛΓ s dagonal, on the coeffcents of ognal SNPs and egensnps s a mathematcal convenence to make the coeffcents unque; howeve, t can complcate the poblem of ntepetaton. (See the scatte plot (fgue 1) of the coeffcents (table 2) fom the dataset (table 1) whee each SNP s epesented as a pont n the fst 2 dmensons of the egenspace.) The ntepetaton of the coeffcents s the most staghtfowad f each SNP s coelated hghly on at most one egensnp, and f all the coeffcent ae ethe lage o nea zeo, wth few ntemedate values. The SNPs ae then splt nto dsjont sets, each of whch s assocated wth one egensnp, pehaps some SNPs ae left ove. To acheve ths clea patten of coeffcents, we could otate the axes defned by PCA n any decton wthout changng the elatve locatons of the ponts to each othe n evey two dmensons; but the actual coodnates of the ponts would change. The otated solutons spam n the same geometc space as the ognal solutons and explan the same amount of vaance n the data as the ognal soluton, howeve the dffeence of the vaances captued n the otated axes s no longe maxmzed. Thee ae seveal analytcal choces of otaton that have been poposed n the past. One of them s the vamax method of othogonal otaton. The vamax otaton cteon maxmzes the sum of the vaances of the squaed coeffcents wthn each egenvecto, and the otated axes eman othogonal. Fgue 2 demonstates the otated soluton (table 3) afte a vamax otaton. Afte the coodnate axes ae otated clockwse by an angle about 45 degees, we obtan a clea patten of SNPs coespondng to otated egensnps. In ths smple example the oveall ntepetaton s the same whethe we otate the axes o no, but n moe complcated stuatons we could beneft moe. Zhen Ln zln@sm.stanfod.edu 1
Tables: snp1 snp2 snp3 snp4 ch1 1 1 0 1 ch2 1 0 1 1 ch3 0 0 0 0 ch4 0 1 0 1 ch5 1 0 0 0 Table 1. a small dataset of 4 SNPs fom 5 chomosomes Vaables E1 E2 Snp1 0.0978322 0.5690011 Snp2 0.647965-0.40432 Snp3 0.1198195 0.6968811 Snp4 0.745972 0.164681 Table 2. patal PCA esults (unotated) Vaables E1 E2 Snp1 0.00012428 0.5773503 Snp2 0.70744623-0.288827 Snp3 0.00015222 0.7071068 Snp4 0.70716891 0.2885229 Table 3. otated soluton Zhen Ln zln@sm.stanfod.edu 2
Fgues: Fgue 1. scatte plot of SNPs n the othogonal space (unotated) Fgue 2. scatte plot of SNPs n the otated othogonal space Zhen Ln zln@sm.stanfod.edu 3
R souce code: data <- c(1,1,0,0,1,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0) dm(data) <- c(5,4) snploadngs <- loadngs(pncomp(data, co=t)) plot(snploadngs[,1:2]) otated <- vamax(snploadngs[,1:2])$loadngs plot(otated) Zhen Ln zln@sm.stanfod.edu 4
An example of fndng htsnps fom a small SNP dataset usng the vamax otaton method. We stat wth a SNP dataset: SNP1 SNP2 SNP3 SNP4 SNP5 Chomosome1 1 1 1 0 1 Chomosome2 1 1 0 0 0 Chomosome3 0 0 0 0 1 Chomosome4 0 0 1 1 0 PCA esults, unotated, ae: e 1 e 2 e 3 e 4 e 5 SNP1-0.5532-0.3854 0-0.7385 0 SNP2-0.5532-0.3854 0 0.6155 0.4082 SNP3 0.2025-0.5265-0.7071 0.1231-0.4082 SNP4 0.5532-0.3854 0-0.2132 0.7071 SNP5-0.2025 0.5265-0.7071-0.1231 0.4082 PCA esults upon vamax oaton (Mada et al. 1979; Dunteman 1989) ae: e 1 e 2 SNP1 0 0 0-1 0 SNP2-1 0 0 0 0 SNP3 0-0.7071-0.7071 0 0 SNP4 0 0 0 0 1 SNP5 0 0.7071-0.7071 0 0 e 3 We compae the aveage coeffcent fo all k egensnps ( Γ ) to the one fo the est of (pk) egensnps ( γ ) fo each SNP; and select the SNP f Γ > γ, whch ndcates that ths SNP contbutes mostly to the k egensnp (Meng et al. 2003). Suppose k = 2, htsnp selectons ae: Γ γ htsnp SNP1 0 0.5 N SNP2 0.5 0 Y SNP3 0.3536 0.2357 Y SNP4 0 0.3333 N SNP5 0.3536 0.2357 Y e 4 e 5 Refeences: Dunteman GH (1989) Pncpal components analyss. Sage Publcatons, Newbuy Pak Mada KV, Kent JT, Bbby JM (1979) Multvaate analyss. Academc Pess, London ; New Yok Meng Z, Zaykn DV, Xu CF, Wagne M, Ehm MG (2003) Selecton of genetc makes fo assocaton analyses, usng lnkage dsequlbum and haplotypes. Am J Hum Genet 73:115-130 Zhen Ln zln@sm.stanfod.edu 5