arxiv: v1 [math.st] 23 Nov 2014
On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

arXiv:1411.6314 [math.ST]

Aaditya Ramdas, Aarti Singh, Sashank J. Reddi, Larry Wasserman, Barnabás Póczos
Department of Statistics and Machine Learning Department, Carnegie Mellon University
November 2014

Abstract

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests: those which are consistent without any assumptions about how the distributions may differ (general alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (mean-shift alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test's power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and also the first analysis of how tests designed for general alternatives perform when faced with easier ones.

1 Introduction

The central topic of this paper is nonparametric two-sample testing, in which we try to detect a difference between two d-dimensional distributions P and Q based on n samples from both, i.e. deciding whether two samples are drawn from the same distribution. We will be concerned with the following two settings, the first of which deals with general alternatives (GA), i.e.

$$H_0 : P = Q \quad \text{vs.} \quad H_1 : P \neq Q. \qquad \text{(GA)}$$

(Footnote: Both student authors had equal contribution.)
It is called nonparametric two-sample testing because no parametric assumptions are made about the form of P, Q (like Gaussianity or exponential families). We use the term general alternatives to mean that the difference between P, Q need not have a simple form. In contrast, the second setting that we are concerned about deals with mean-shift alternatives (MSA), i.e.

$$H_0 : \mu_P = \mu_Q \quad \text{vs.} \quad H_1 : \mu_P \neq \mu_Q \qquad \text{(MSA)}$$

where $\mu_P = E_{X \sim P}[X]$ and $\mu_Q = E_{Y \sim Q}[Y]$. It is still nonparametric two-sample testing, since we make no assumptions about P, Q, but it deals with easier alternatives, meaning that we specify the exact form in which P and Q differ, i.e. they differ in their means. Parametric two-sample testing (for example, when P, Q are Gaussian) is also important, but will be out of the scope of our discussion; see Lopes et al. (2011) for a recent example. We assume an equal number n of samples for simplicity; our results would also go through if $n_1/(n_1+n_2) \to c \in (0,1)$ as $n_1, n_2 \to \infty$.

1.1 Hypothesis testing terminology

Let $X_n = \{x_1, \dots, x_n\} \sim P$ and $Y_n = \{y_1, \dots, y_n\} \sim Q$ be the two sets of samples, where $x_i, y_j \in R^d$ for all $i, j \le n$. A test is any function or algorithm that takes $X_n, Y_n$ as input and outputs a value in {0, 1}, where 1 is interpreted to mean that it rejects the null hypothesis $H_0$, and 0 is interpreted to mean that there is insufficient evidence to reject $H_0$. A test is characterized by its false positive rate (or type-1 error) and its false negative rate (or type-2 error):

$$\alpha = P(\text{rejecting } H_0 \mid H_0 \text{ is true}), \qquad \beta = P(\text{not rejecting } H_0 \mid H_1 \text{ is true}).$$

There is usually a tradeoff involved: decreasing one error rate increases the other. Hence, one sometimes fixes α to some small value (say 0.01), and refers to $\varphi = 1 - \beta$ as the power of the test at α = 0.01. A test is classically called consistent if for any fixed α, the power $\varphi \to 1$ as $n \to \infty$ whenever $H_0$ is false. Many tests in the literature, including the ones we will consider, calculate a test statistic T as a function of $X_n, Y_n$, and reject the null hypothesis if $T > c_\alpha$, where the threshold $c_\alpha$ depends on the distribution of T under $H_0$ and on a pre-defined α.
See Lehmann & Romano (2006) for a detailed introduction.

1.2 Motivation

Our first motivation comes from the fact that there is a big difference between the classical setting of fixing d while letting $n \to \infty$, and the high-dimensional (HD) setting obtained when

$$(n, d) \to \infty. \qquad \text{(HD)}$$

A test would be called consistent under HD if for any fixed α, the power $\varphi \to 1$ as $(n, d) \to \infty$ whenever $H_0$ is false. It is of vital importance, both theoretically and practically, to understand the power of tests in such settings, and to characterize the rate at which n must grow as a function of d so that the test is still consistent. While classical tests were proposed for the low-dimensional settings, over the past two decades several tests have been proposed specifically for MSA and studied in the HD setting; see Subsection 1.3. However, to the best of our knowledge there has been no formal and precise characterization of the power of tests designed for GA in high dimensions.

Our second motivation comes from the observation that there is no literature on how tests designed for GA perform under MSA. In other words, while it is expected that tests designed for MSA will not be consistent against more general GA, it is unclear how exactly tests designed for general alternatives fare when faced with a mean-shift alternative.
1.3 Related Work (MSA)

It is well known (see Kariya 1981; Simaika 1941; Anderson 1958; Salaevskii 1971) that if P, Q are Gaussians, then the uniformly most powerful test in the fixed-dimension setting, under fairly general conditions, is the $T^2$ test by Hotelling (1931):

$$T_H := (m_P - m_Q)^T S^{-1} (m_P - m_Q)$$

where $m_P$, $m_Q$ and S are the usual empirical estimators of $\mu_P$, $\mu_Q$ and the joint covariance matrix Σ. In a seminal paper, Bai & Saranadasa (1996) showed that in the high-dimensional setting, the $T^2$ test performs quite poorly, specifically when $n, d \to \infty$ with $d/n \to 1 - \epsilon$ for small ε. This is intuitively because of the difficulty of estimating the $O(d^2)$ parameters of Σ with very few samples. Indeed, $S^{-1}$ is not even defined when d > n and is poorly conditioned when d is of similar order as n. To avoid this problem, they proposed to use the test statistic

$$T_{BS} := \|m_P - m_Q\|^2 - \mathrm{tr}(S)/n.$$

$T_{BS}$ has non-trivial power when $d/n \to c \in (0, \infty)$. Srivastava & Du (2008) proposed to use diag(S) instead of S in $T_H$, and showed its advantages in certain settings over $T_{BS}$. More recently, Chen & Qin (2010), henceforth called CQ, proposed a slight variant of $T_{BS}$, which is a U-statistic of the form

$$T_{CQ} := \frac{1}{n(n-1)} \sum_{i \neq j}^{n} x_i^T x_j + \frac{1}{n(n-1)} \sum_{i \neq j}^{n} y_i^T y_j - \frac{2}{n^2} \sum_{i,j}^{n} x_i^T y_j$$

that achieves the same power without explicit restrictions on d, n, but rather under conditions stated in terms of n, tr(Σ), $\mu_P - \mu_Q$. The settings under which these various statistics are consistent, or achieve non-trivial power, are slightly complicated to describe, and the reader is referred to their papers for details.

1.4 Related Work (GA)

There are many nonparametric test statistics for two-sample testing. One of the most popular tests is the kernel Maximum Mean Discrepancy, henceforth called MMD, proposed in Gretton et al. (2012). While the technical details of the kernel literature are unnecessary for the purposes of this paper, it suffices to say that the population statistic is

$$\mathrm{MMD} := \max_{\|f\|_H \le 1} \; E_P f(x) - E_Q f(y)$$

where H is a Reproducing Kernel Hilbert Space and $\|f\|_H \le 1$ is its unit norm ball.
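There are two related sample statistics, both of which can be shown to be unbiased estimators of $\mathrm{MMD}^2$. The first is a U-statistic

$$\mathrm{MMD}_u^2 := \frac{1}{n(n-1)} \sum_{i \neq j}^{n} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j}^{n} k(y_i, y_j) - \frac{2}{n^2} \sum_{i,j}^{n} k(x_i, y_j).$$

The second is a linear-time statistic

$$\mathrm{MMD}_l^2 := \frac{1}{n/2} \sum_{i=1}^{n/2} \left[ k(x_{2i-1}, x_{2i}) + k(y_{2i-1}, y_{2i}) - k(x_{2i-1}, y_{2i}) - k(y_{2i-1}, x_{2i}) \right].$$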
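The linear-time statistic above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `gaussian_kernel` and `mmd_linear` are hypothetical, and the kernel is assumed to be the Gaussian kernel $\exp(-\|x-y\|^2/\gamma^2)$ analyzed later in the paper.

```python
import numpy as np

def gaussian_kernel(a, b, gamma):
    """Row-wise Gaussian kernel exp(-||a_i - b_i||^2 / gamma^2) (assumed form)."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / gamma ** 2)

def mmd_linear(x, y, gamma):
    """Linear-time MMD^2 estimate: average of h over disjoint sample pairs.

    x, y: arrays of shape (n, d). Returns (estimate, per-pair h values).
    """
    m = (min(len(x), len(y)) // 2) * 2      # drop a trailing unpaired sample
    x_odd, x_even = x[0:m:2], x[1:m:2]      # x_{2i-1}, x_{2i}
    y_odd, y_even = y[0:m:2], y[1:m:2]      # y_{2i-1}, y_{2i}
    h = (gaussian_kernel(x_odd, x_even, gamma)
         + gaussian_kernel(y_odd, y_even, gamma)
         - gaussian_kernel(x_odd, y_even, gamma)
         - gaussian_kernel(x_even, y_odd, gamma))
    return h.mean(), h

# Illustrative data (arbitrary choices): two Gaussians with a large mean shift.
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((200, 5))
y_demo = rng.standard_normal((200, 5)) + 3.0
estimate, h_terms = mmd_linear(x_demo, y_demo, gamma=5.0)
```

Each sample is touched once, so the cost is O(nd), compared with O(n²d) for the U-statistic.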
Note that $T_{CQ}$ is just $\mathrm{MMD}_u^2$ under the linear kernel $k(x, y) = x^T y$. It is known that in the fixed d setting, the power of both $\mathrm{MMD}_l^2$ and $\mathrm{MMD}_u^2$ approaches 1 at the rate of $\Phi(\sqrt{n})$, where Φ is the standard normal cdf; see Gretton et al. (2012). However, nothing is formally known when d could be increasing with n. A recent related manuscript by Reddi et al. (2014) conducts detailed experiments that demonstrate that in the fixed n, increasing d setting, the power of MMD and distance correlation decay polynomially in high dimensions against fair alternatives. While the authors provide some initial insights into this phenomenon for specific examples, there is still no theoretical analysis of the power of MMD (or any statistic designed for GA) against MSA or GA or any other set of alternatives, in the high dimensional setting.

Another statistic called Energy Distance by Székely & Rizzo (2004) is closely tied to the MMD: indeed it has the same form as the MMD with the Euclidean distance instead of a kernel. Lyons (2013) showed that one can also use other metrics instead of the Euclidean distance, and Sejdinovic et al. (2013) showed that there is a close tie between metrics and kernels for these problems. There has been an initial attempt to characterize some properties of distance correlation (which is a related statistic for the related problem of independence testing) in high dimensions in Székely & Rizzo (2013), but no analysis of power is available or easily derivable. There also exist many other tests under GA, like the cross-match test by Rosenbaum (2005), but none of them have been analyzed under HD.

2 Power of $\mathrm{MMD}_l^2$ (fixed dimension)

Let us first review the basic argument from Gretton et al. (2012) showing the power in the fixed dimensional setting. It will then become clear what the main difficulties are in establishing results in the high-dimensional setting. The main tool needed is a simple convergence result of the sample statistic to the population quantity.
It becomes convenient to introduce the notation $z_i = (x_i, y_i)$ and $h_{ij} = h(z_i, z_j)$, where

$$h_{ij} := k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i).$$

Then we can rewrite our test statistic as

$$\mathrm{MMD}_l^2 = \frac{1}{n/2} \sum_{i=1}^{n/2} h(z_{2i-1}, z_{2i}).$$

Its expectation is $E_{z,z'} h(z, z') = \mathrm{MMD}^2$, and then Corollary 16 of Gretton et al. (2012) states that under both $H_0$ and $H_1$, we have

$$F := \frac{\sqrt{n}\,(\mathrm{MMD}_l^2 - \mathrm{MMD}^2)}{\sqrt{2V}} \rightsquigarrow N(0, 1) \qquad (3)$$

where $V = \mathrm{Var}_{z,z'}\, h(z, z')$ and $\rightsquigarrow$ means convergence in distribution as $n \to \infty$. Note that V is a constant independent of n, and so there exists a constant $z_\alpha$ such that $P(Z > z_\alpha) = \alpha$ when $Z \sim N(0, 1)$. Then, the corresponding test rejects $H_0$ whenever

$$\text{Test-MMD}_l: \quad \frac{\sqrt{n}\,\mathrm{MMD}_l^2}{\sqrt{2\hat{v}}} > z_\alpha \qquad (4)$$

where $2\hat{v}$ is twice the empirical variance of $h(z, z')$. If $\Pr_1$ denotes the probability under $H_1$, the power of this test is given by

$$\Pr\nolimits_1\!\left( \frac{\sqrt{n}\,\mathrm{MMD}_l^2}{\sqrt{2\hat{v}}} > z_\alpha \right) \qquad (5)$$
$$= \Pr\nolimits_1\!\left( F > \sqrt{\frac{\hat{v}}{V}}\, z_\alpha - \frac{\sqrt{n}\,\mathrm{MMD}^2}{\sqrt{2V}} \right) \qquad (6)$$
$$\to \Pr\!\left( Z > z_\alpha - \frac{\sqrt{n}\,\mathrm{MMD}^2}{\sqrt{2V}} \right) \qquad (7)$$
$$= 1 - \Phi\!\left( z_\alpha - \frac{\sqrt{n}\,\mathrm{MMD}^2}{\sqrt{2V}} \right) \qquad (8)$$
$$= \Phi\!\left( \frac{\sqrt{n}\,\mathrm{MMD}^2}{\sqrt{2V}} - z_\alpha \right) \qquad (9)$$

where Φ is the standard normal cdf. This behaves like $\Phi(\sqrt{n})$ since the population $\mathrm{MMD}^2$ and V are constants that are both independent of n.

2.1 The challenges in high dimensions

There are several significant difficulties in lifting this argument to the high-dimensional setting.

C1. The population $\mathrm{MMD}^2$ depends on dimension via the signal strength and bandwidth, as we later show, and one needs to explicitly account for this.

C2. The variance V also depends on dimension, the signal strength, and the bandwidth, as we later show, and again one needs to explicitly track this, especially its dependence on dimension.

C3. In the increasing d, n setting, the limiting distribution is no longer trivially normal, and one needs to establish conditions under which it is indeed normal. The most important question is whether the rate of convergence to normality depends on d.

C4. In the increasing d, n setting, one needs to characterize the rate at which $\hat{v}/V$ still tends to 1, so that $\sqrt{\hat{v}/V}\, z_\alpha$ converges to $z_\alpha$. Since $\hat{v}$ and V depend on d, the key question is again whether the rate of convergence depends on d or not.

We will have to account for each of these challenges explicitly, as we shall see in later sections. Let us first summarize and discuss our assumptions and contributions before we delve into the technical details.

3 Assumptions and Contributions

We are now in a position to clearly state our contributions. We focus on analyzing the power of $\mathrm{MMD}_l^2$ in the high-dimensional setting when $n, d \to \infty$, for the Gaussian kernel with bandwidth γ, i.e. $k(x, y) = \exp(-\|x - y\|^2/\gamma^2)$, in the mean-shift setting when P and Q differ in their means. Let us first outline our assumptions below; note that we comment about these assumptions in the next subsection.

A1. $x_i = U s_i + \mu_P$ and $y_i = U t_i + \mu_Q$, where $s_i, t_i$ are i.i.d. random vectors for $i \in \{1, \dots, n\}$, each having d i.i.d. zero-mean coordinates, and U corresponds to a $d \times d$ orthogonal rotation, i.e. $UU^T = I$.
A2. The k-th central moments of each i.i.d. coordinate of s, t exist for $k \le 6$.

Note that the coordinates of x, y need not be independent, and $E_{x \sim P}[X] = \mu_P$, $E_{y \sim Q}[Y] = \mu_Q$. Denote $\delta := \mu_P - \mu_Q$. Denote the second, third and fourth central moments of each i.i.d. coordinate of s, t by $\sigma^2, \mu_3, \mu_4$. Remember that $E\,h(z_i, z_j) = \mathrm{MMD}^2$ (see Section 2). Denote the second, third and fourth central moments of $h(z_i, z_j)$ by $V, \tau_3, \tau_4$. Let $\|\cdot\|$ represent the Euclidean norm. Our main contribution is:

Theorem 1. For the Gaussian kernel with bandwidth chosen as $\gamma = \Omega(\sqrt{d})$, under assumptions A1, A2, with $n, d \to \infty$ at any rate, Test-MMD$_l$ (Eq. (4)) has asymptotic type-1 error α and asymptotic power

$$\beta = \Phi\!\left( \frac{\sqrt{n}\,\|\delta\|^2}{\sqrt{8 d \sigma^4 + 8 \sigma^2 \|\delta\|^2}} - z_\alpha \right)$$

where Φ is the cdf of a standard Normal distribution and $z_\alpha$ is the $1 - \alpha$ quantile of the standard Normal distribution. For finite samples, the type-1 error behaves like $\alpha + 20/\sqrt{n}$ and the power like $\beta - 20/\sqrt{n}$.

The first remarkable point about this theorem is that the power is independent of the bandwidth γ, as long as $\gamma = \Omega(\sqrt{d})$. Such behavior has already been noted, but not explained, in the experiments of Reddi et al. (2014), and we will verify this carefully in our experiments section. While this may not hold true for other kernels, like the Laplace kernel $k(x, y) = \exp(-\|x - y\|/\gamma)$, or against more general alternatives, it is both surprising and interesting that this is the case for the Gaussian kernel under MSA. As discussed later, this theorem applies to the bandwidth chosen by the so-called median heuristic; see Schölkopf & Smola (2002). It implies that the median heuristic provides an arguably safe choice in the absence of further information, and also explains why it works reasonably well in practice and in simulations.

If we consider the signal to noise ratio (henceforth called SNR) to be defined as $\Psi := \|\delta\|/\sigma$, then focusing on the more important first term, the power behaves like

$$\Phi\!\left( \frac{\sqrt{n}\,\Psi^2}{\sqrt{8d + 8\Psi^2}} - z_\alpha \right).$$

From this, we get the following two corollaries. The first applies to the small SNR regime (which includes the fair alternative setting, see Reddi et al. (2014) for details), and the second applies when the SNR is large.

Corollary 1. When the signal to noise ratio Ψ is small, specifically $\Psi = o(d^{1/2})$, the power goes to 1 at the rate of $\Phi(\sqrt{n}\,\Psi^2/\sqrt{d})$.

Corollary 2. When the signal to noise ratio Ψ is large, specifically $\Psi = \omega(d^{1/2})$, the power goes to 1 at the rate of $\Phi(\sqrt{n}\,\Psi)$, independent of d.

Note that the switch in behavior between the two corollaries occurs at Ψ being on the order of $d^{1/2}$, and at this point the predictions of the two corollaries match. Hence one could use O, Ω instead of o, ω for describing the growth of Ψ in both corollaries.

3.1 Remarks about assumptions

Assumptions A1, A2 are general enough for the predictions made by our theorem to be accurate and representative of observed behavior. We will verify the predictions of the theorem, corollaries and later lemmas in our simulations.
A1. While the coordinates of x, y need not be independent, the first assumption does restrict their covariances to be $\sigma^2 I$. We note that Székely & Rizzo (2013) makes a more restrictive assumption of independent coordinates, while Assumption (a) in Bai & Saranadasa (1996) and Eq. (3.1) in Chen & Qin (2010) assume the same model as we do but do not require spherical covariance. However, our assumption is truly only for mathematical convenience; if we instead had $UD^{1/2}$ in A1, where D is a diagonal rescaling, all our calculations could still be carried out, but would be more tedious since the coordinates of $D^{1/2}s$ are still independent but not identically distributed, and we would need to track $\sigma_j^2, \mu_{3j}, \mu_{4j}$ in Appendix Sections 3-6.

A2. The existence of third and fourth moments is needed for calculating the population $\mathrm{MMD}^2$ and variance terms, as well as for the Berry-Esseen lemma to control the deviation from normality, and for the convergence of $\hat{v}$ to V. The existence of the sixth moment is needed to bound the Taylor expansion residual term in all our calculations. Note that CQ needs the existence of eighth moments, and BS assume the existence of fourth moments (see Eq. (3.1) in Chen & Qin (2010) and Assumption (a) in Bai & Saranadasa (1996)).

3.2 Remark about bandwidth choice

Remember that the power is independent of the bandwidth γ, as long as $\gamma = \Omega(\sqrt{d})$. This restriction of $\gamma = \Omega(\sqrt{d})$ is to allow us to control the residual term in the Taylor expansion of the Gaussian kernel. However, it is not very restrictive, since smaller γ typically leads to worse power. Specifically, we note that the experiments in Reddi et al. (2014) for mean-shift alternatives show convincingly that when γ is chosen to be $d^{\alpha}$ for α < 0.5 (including constant γ), the power of MMD is poor, while the highest power occurs for values $\alpha \ge 0.5$. Hence our choice covers most reasonable choices of bandwidth.
Furthermore, one of the most popular methods for bandwidth selection is called the median heuristic, see Schölkopf & Smola (2002), where one chooses the bandwidth as the median of distances between all pairs of points. A simple calculation shows $E_{x \sim P, y \sim Q} \|x - y\|^2 = 2\sigma^2 d + \|\mu_P - \mu_Q\|^2$, so generally speaking the median heuristic chooses γ of the same order as $\sigma\sqrt{d}$, or larger if $\|\mu_P - \mu_Q\|$ is large.

3.3 Comparisons to CQ

The assumptions in CQ, BS, SD are stated slightly differently from our results here. However, their results can broadly be compared to ours. We can summarize the most recent results, those of CQ, under A1 and A2 in the following two observations. The first observation follows from Eq. (3.11) in Chen & Qin (2010), which applies to the small SNR regime dictated by their Eq. (3.4).

Observation 1. When the signal to noise ratio Ψ is small, specifically $\Psi = o(\sqrt{d/n})$, the power goes to 1 at the rate of $\Phi(n\Psi^2/\sqrt{d})$.

We believe there is a mistake in the derivation of Eq. (3.12) in Chen & Qin (2010), which applies in the large SNR regime dictated by their Eq. (3.5). We describe this in more detail in Appendix Section 1, and just summarize the corrected resulting observation below.

Observation 2. When the signal to noise ratio Ψ is large, specifically $\Psi = \omega(\sqrt{d/n})$, the power goes to 1 at the rate of $\Phi(\sqrt{n}\,\Psi)$, independent of d.

Comparing these expressions with Corollaries 1 and 2, it is clear that CQ has an advantage over $\mathrm{MMD}_l^2$ in the low-SNR setting. For example, when n = d and the SNR Ψ is constant, the power of CQ can increase $\sqrt{n}$ times faster than that of $\mathrm{MMD}_l^2$, but when the SNR is $\omega(d^{1/2})$, the power of both methods scales in the same fashion. This advantage for low SNR might be wiped out by considering $\mathrm{MMD}_u^2$; ascertaining if this is the case is an important direction of future work. The main technical challenge is understanding the limiting distributions of general degenerate U-statistics in high dimensions (which in the fixed dimensional setting is an infinite sum of $\chi^2$s; see Serfling (2009)).

We now provide the proof of Theorem 1 and then verify all our claims in simulations, to convincingly show that these expressions are tight up to constant factors.

4 Proof of Theorem 1

We split the proof into four subsections, one for each of the challenges C1-C4. For C1 and C2, we need to calculate the first two moments of h, introduced in Section 2, for which the main tool we use is Taylor expansions (whose validity is explained in Appendix Section 2), following which the results follow after a sequence of tedious calculations and detailed book-keeping. For C3 and C4, we need to bound the third and fourth moments of h. The main tool used for C3 is a Berry-Esseen theorem, which helps us track the deviation from normality at finite samples, and C4 is tackled by Chebyshev's inequality once we have a handle on the variance of $\hat{v}$. Most of the details will be deferred to the Appendix, but we will outline the main steps of the derivations here.

4.1 The Population $\mathrm{MMD}^2$

The main takeaway point of the following lemma is the dependence of the population $\mathrm{MMD}^2$ on the bandwidth γ and the signal strength δ (recall $\delta := \mu_P - \mu_Q$). If p, q are the pdfs of P, Q, then note that the population $\mathrm{MMD}^2$ with the Gaussian kernel is given by

$$\mathrm{MMD}^2 = \int_{R^d \times R^d} e^{-\|x - y\|^2/\gamma^2} \big( p(x)p(y) + q(x)q(y) - 2p(x)q(y) \big)\, dx\, dy.$$

Lemma 1. Under A1, A2, and when $\gamma = \Omega(\sqrt{d})$, we have

$$\mathrm{MMD}^2 = \frac{2\|\delta\|^2}{\gamma^2}\,(1 + o(1)).$$

Proof. We defer details to Appendix Section 3. On using Taylor's expansion for the Gaussian kernel, the terms in the aforementioned $\mathrm{MMD}^2$ expression can be approximated by bounding higher order residual terms. We prove that the first $\mathrm{MMD}^2$ term is

$$\int_{R^d \times R^d} e^{-\|x - y\|^2/\gamma^2}\, p(x)p(y)\, dx\, dy \approx \left(1 - \frac{2\sigma^2}{\gamma^2}\right)^{d}.$$

Using similar techniques we can also deduce:

$$\int_{R^d \times R^d} e^{-\|x - y\|^2/\gamma^2}\, p(x)q(y)\, dx\, dy \approx \prod_{i=1}^{d} \left(1 - \frac{2\sigma^2 + \delta_i^2}{\gamma^2}\right).$$

Combining these, again using Taylor expansions, gives us our expression.

4.2 The Variance

As argued earlier, the variance of $\mathrm{MMD}_l^2$ is given by 2V/n, where $V = \mathrm{Var}_{z,z'}\, h(z, z')$.
The takeaway points of the following lemma are the identical dependence that $\sqrt{V}$ has on the bandwidth γ as the $\mathrm{MMD}^2$ (which then causes their ratio to be essentially independent of γ), and also the role played by dimension and the signal strength in determining the variance.

Lemma 2. Under A1, A2, and when $\gamma = \Omega(\sqrt{d})$, we have

$$V = \frac{16 d \sigma^4 + 16 \sigma^2 \|\delta\|^2}{\gamma^4}\,(1 + o(1)).$$

Proof. Note that $V = E_{z,z'}\, h^2(z, z') - \mathrm{MMD}^4$, since $\mathrm{MMD}^2 = E_{z,z'}\, h(z, z')$. Let us focus on the first term:

$$E_{z,z'}[h^2(z, z')] = E_{x,x' \sim P}\, k^2(x, x') + E_{y,y' \sim Q}\, k^2(y, y') + 2E_{x \sim P, y \sim Q}\, k^2(x, y) + 2E_{x,x' \sim P,\, y,y' \sim Q}\, k(x, x')k(y, y') + 2E_{x,x' \sim P,\, y,y' \sim Q}\, k(x, y')k(x', y) - 4E_{x,x' \sim P,\, y \sim Q}\, k(x, x')k(x, y) - 4E_{x \sim P,\, y,y' \sim Q}\, k(x, y)k(y, y').$$

Hence, there are five different kinds of terms to calculate (the first and last two are similar). Combining these gives us our solution. The details are tedious and hence are given in Appendix Section 4.

4.3 The Berry-Esseen Bound

Lemma 3. Under A1, A2, and when $\gamma = \Omega(\sqrt{d})$, we have

$$\sup_t \left| P\!\left( \frac{\sqrt{n/2}\,(\mathrm{MMD}_l^2 - \mathrm{MMD}^2)}{\sqrt{V}} \le t \right) - \Phi(t) \right| \le \frac{20}{\sqrt{n}}.$$

Proof. The Berry-Esseen Lemma (see for example Theorem 3.6 or 3.7 in Chen et al. (2010)), when translated to our problem, essentially yields the above lemma, except that the right hand side is

$$\frac{10\,\xi_3}{V^{3/2}\sqrt{n}} \qquad (10)$$

where $\xi_3 = E\big[\,|h(z, z') - E\,h(z, z')|^3\,\big]$, and the constant 10 is not optimal. Note that $\xi_3 \neq \tau_3$ (the third central moment of h) due to the absolute value sign. Given that we have the mean and second central moment of h ($\mathrm{MMD}^2$ and V respectively), one might imagine using similar techniques to calculate $\xi_3$. However, the absolute value poses a problem, and so we must take an alternate route. Specifically, tedious calculations in Appendix Section 5 prove that $\tau_4$ (the fourth central moment of h) is bounded as $\tau_4 \le 4V^2(1 + o(1))$, allowing us to bound $\xi_3$ as

$$\xi_3 \le \sqrt{\tau_4 V} \le 2V^{3/2}$$

since $E|X|^3 \le \sqrt{E[X^4]\,E[X^2]}$ by Cauchy-Schwarz. Substituting into Eq. (10) gives us our Lemma.

The main challenge involved is in proving that the ratio $\xi_3/V^{3/2}$ is independent of d. Note that a very crude bound of $|h - Eh| \le 4$ (since $e^{-z} \le 1$) gives us $\xi_3 \le 4V$, which would yield a dimension dependence due to an extra $1/\sqrt{V}$ factor. But because $\tau_4$, and hence $\xi_3$, has exactly the right scaling with V, the dependence on V (and hence, importantly, on the dimension) cancels out and our Lemma follows. This is only one of the reasons we needed a bound on $\tau_4$, the other appearing in the next lemma.
4.4 Bounding $\hat{v}/V$

Recall that $\hat{v}$ is the empirical estimator of V; it is an empirical average of n/2 unidimensional terms. The subtlety is that $\hat{v}$ depends on d since V depends on d. What matters is whether the rate of convergence of their ratio to 1 depends on d. Fortunately it does not.

Lemma 4. Under A1, A2, and when $\gamma = \Omega(\sqrt{d})$, we have

$$\hat{v}/V = 1 + O_P(1/n^{1/4}).$$

Proof. Using k = 2 in Theorem A of Section 2.2.3 in Serfling (2009), the bias of $\hat{v}$ is given by

$$E[\hat{v}] - V = -\frac{V}{n}$$

and its variance is given by

$$\mathrm{var}(\hat{v}) = \frac{\tau_4 - V^2}{n} \le \frac{3V^2}{n}$$

(both up to smaller order terms), where the inequality follows from the previous lemma. Then, it is easy to see that $|\hat{v} - V| = O_P(V/\sqrt{n})$. This is because for any ε > 0,

$$P\!\left( \frac{|\hat{v} - V|}{V/\sqrt{n}} > \frac{2\sqrt{3}}{\sqrt{\epsilon}} \right) \le P\!\left( |\hat{v} - E[\hat{v}]| > \frac{2\sqrt{3}\,V}{\sqrt{n\epsilon}} - \frac{V}{n} \right) \le P\!\left( |\hat{v} - E[\hat{v}]| > \frac{\sqrt{3}\,V}{\sqrt{n\epsilon}} \right) \le \mathrm{var}(\hat{v})\,\frac{n\epsilon}{3V^2} \le \epsilon$$

where we used Chebyshev's inequality, and the second inequality follows since $\frac{2\sqrt{3}\,V}{\sqrt{n\epsilon}} - \frac{V}{n} \ge \frac{\sqrt{3}\,V}{\sqrt{n\epsilon}}$.

At this point we have all the key elements of the proof of Theorem 1. Specifically, equations (5) to (9) follow exactly as written, with the exception of (7) holding even as $n, d \to \infty$. Note that this step allows n, d to grow at any relative rate precisely because the rate at which F converges to the standard normal Z (Berry-Esseen bound) and the rate at which $\hat{v}/V$ converges to 1 are both independent of d and only need $n \to \infty$. The dependence on d only enters through the $\mathrm{MMD}^2$ and its variance. This concludes the proof of Theorem 1.

One can also write down the finite sample type-1 error rate as being at most $\alpha + 20/\sqrt{n}$ and the finite sample power as being at least $\beta - 20/\sqrt{n}$, where the additional error is introduced due to the Berry-Esseen bound, whose constants we do not optimize but which could be tightened to about 5 instead of 20. We now confirm the tightness of all the predictions in this section by detailed simulations in the next section.
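To get a feel for the asymptotic power expression in Theorem 1, it can be evaluated directly in terms of the SNR Ψ. The sketch below uses only the Python standard library; the function name `predicted_power` and the default level α = 0.05 are illustrative assumptions, and the finite-sample 20/√n correction is deliberately left out.

```python
from math import sqrt
from statistics import NormalDist

def predicted_power(n, d, psi, alpha=0.05):
    """Asymptotic power Phi(sqrt(n) * psi^2 / sqrt(8d + 8 psi^2) - z_alpha).

    n: samples per distribution, d: dimension, psi: SNR ||delta|| / sigma.
    alpha is the nominal type-1 error (0.05 is an arbitrary default).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # upper-alpha critical value
    signal = sqrt(n) * psi ** 2 / sqrt(8.0 * d + 8.0 * psi ** 2)
    return std_normal.cdf(signal - z_alpha)

# Fixed n and psi: the predicted power falls as the dimension grows.
power_low_d = predicted_power(n=200, d=50, psi=2.0)
power_high_d = predicted_power(n=200, d=500, psi=2.0)
```

With Ψ = 0 the expression reduces to the nominal level α, and with n = d and Ψ fixed it is nearly constant in d, in line with Corollary 1.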
5 Experiments

Our aim in this section is to confirm the theoretical predictions made by our lemmas and theorems. The most important claims to address are that the Berry-Esseen bound is independent of d, that the null and alternate distributions are indeed normal even in the extreme case when n is fixed and d is increasing, that the ratio $\mathrm{MMD}^2/\sqrt{V}$ is essentially independent of the bandwidth, and finally that the final power expression is essentially independent of the bandwidth and has the exact predicted scaling as given by our expressions.

5.1 Berry-Esseen bound is independent of d

Since the calculations of $\tau_4$ are rather tedious, let us also verify the prediction made in Subsection 4.3 that $\xi_3/V^{3/2}$ is constant and independent of dimension (remember that the ratio involves population quantities). To verify this, we draw a large, fixed number of samples from P, Q, and calculate the empirical ratio over a range of dimensions d starting at 40 and increasing in fixed steps. We make 3 sets of choices for P, Q: standard normals with $\gamma = d^{0.75}$, the $t_4$ distribution with $\gamma = d^{0.5}$, and the $t_4$ distribution with $\gamma = d$. The reason we use $t_4$ (the t distribution with 4 degrees of freedom) is that it does not have a finite fourth moment. We find that in all 3 cases, the ratio is a constant of about 1.65, showing that our prediction is extremely accurate. Also, while our proof proceeded via bounding $\tau_4$, the result seems to hold true even when moments higher than 3 do not exist, since it holds for the $t_4$ distribution. The spikes are because we calculate a single empirical ratio at each d.

Figure 1: The empirical Berry-Esseen ratio $\xi_3/V^{3/2}$ vs dimension d at a fixed large n, for the distributions $t_4$, $t_4$ and normal, with bandwidths $d^{0.5}$, $d$, $d^{0.75}$ respectively.

5.2 Normality of null/alternate distributions

Let us now verify that the null and alternate distributions are indeed almost standard normal when n is held constant and d is increased. We do this by fixing n = 50, choosing $d \in \{50, 100, 200\}$, and calculating our test statistic $\sqrt{n}\,\mathrm{MMD}_l^2/\sqrt{2\hat{v}}$. We experimentally approximate the null and alternate distributions by repeating this process many times; the histogram obtained is compared to a normal by plotting a standard normal quantile-quantile plot. The overlapping straight lines indicate that each of the null and alternate distributions for three different d values is almost exactly standard normal even at a small value of n like 50. This agrees with our derivation that the Berry-Esseen constant is very small and normality is achieved soon.
Figure 2: A normal quantile-quantile plot (quantiles of input sample vs standard normal quantiles) of the null (left) and alternate (right) distributions of our test statistic for $d \in \{50, 100, 200\}$ at fixed n, over repeated trials.

Figure 3: A log-log plot of $\mathrm{MMD}^2/\sqrt{V}$ vs dimension for different bandwidth choices ($\gamma = d^{0.5}$, $d^{0.75}$, $d$, and the median heuristic) when Ψ is fixed and n is large. Note that the slope is -0.5, independent of γ.

5.3 $\mathrm{MMD}^2/\sqrt{V}$ is independent of bandwidth

Our first two lemmas together imply that the ratio $\mathrm{MMD}^2/\sqrt{V}$ is independent of γ as long as $\gamma = \Omega(\sqrt{d})$. To test this, we actually calculate this ratio for $\gamma = d^{0.5}, d^{0.75}, d$. Remember that these are population quantities; we estimate the ratio using sample quantities with a large n, for a fixed Ψ. We plot the obtained log-ratio against log-dimension in Figure 3, showing that the ratio (and hence the power) scales as $1/\sqrt{d}$ as predicted.

5.4 The scaling of power with n, d

Here are a few testable predictions of Theorem 1:

1. When n = 50 and Ψ is a fixed constant, the power should decrease as $1/\sqrt{d}$ (Corollary 1).
2. When n = 50 and $\Psi = d^{1/4}$, the power should be a constant (Corollary 1).
3. When n = d and Ψ is a fixed constant, the power should stay constant (Corollary 1).
4. When n = d and $\Psi = 0.3\,d^{1/2}$, the power should increase as d (Corollary 2).

From Figure 4, we infer that the precise form of Theorem 1 and Corollaries 1, 2 is extremely accurate, even at small n and significantly larger d, including that it is independent of the bandwidth γ as predicted, as long as $\gamma = \Omega(\sqrt{d})$.
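A small Monte Carlo experiment in the spirit of this section can be sketched as follows. It is an illustrative reconstruction, not the authors' experimental code: the function names, the seed, the number of repetitions, the choice $\gamma = d^{0.75}$, and spreading the mean shift evenly over coordinates are all assumptions, and P, Q are taken to be Gaussians with σ = 1.

```python
import numpy as np
from math import sqrt
from statistics import NormalDist

def linear_mmd_statistic(x, y, gamma):
    """Studentized linear-time statistic sqrt(n/2) * mean(h) / std(h)."""
    m = (len(x) // 2) * 2                      # assumes len(x) == len(y)
    xo, xe, yo, ye = x[0:m:2], x[1:m:2], y[0:m:2], y[1:m:2]
    def k(a, b):                               # Gaussian kernel, row-wise
        return np.exp(-np.sum((a - b) ** 2, axis=1) / gamma ** 2)
    h = k(xo, xe) + k(yo, ye) - k(xo, ye) - k(xe, yo)
    return sqrt(len(h)) * h.mean() / h.std(ddof=1)

def empirical_power(n, d, psi, alpha=0.05, reps=200, seed=0):
    """Fraction of repetitions in which the test rejects H_0."""
    rng = np.random.default_rng(seed)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    gamma = d ** 0.75                          # one choice with gamma = Omega(sqrt(d))
    delta = np.full(d, psi / sqrt(d))          # ||delta|| = psi since sigma = 1
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal((n, d))
        y = rng.standard_normal((n, d)) + delta
        rejections += int(linear_mmd_statistic(x, y, gamma) > z_alpha)
    return rejections / reps

def predicted_power(n, d, psi, alpha=0.05):
    """Theorem 1 prediction, without the finite-sample correction."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1.0 - alpha)
    return nd.cdf(sqrt(n) * psi ** 2 / sqrt(8 * d + 8 * psi ** 2) - z_alpha)

power_alt = empirical_power(n=400, d=50, psi=2.0)   # compare with predicted_power(400, 50, 2.0)
power_null = empirical_power(n=400, d=50, psi=0.0)  # should sit near alpha
```

Sweeping d (or tying n and Ψ to d as in the four settings above) and plotting `empirical_power` against `predicted_power` gives a rough version of the comparison reported in Figure 4.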
Figure 4: All plots show power vs d for different $\gamma \in \{\text{median}, d^{0.5}, d^{0.75}, d\}$ over a range of d starting at 40. From top left to bottom right are the settings 1-4, with P, Q being Gaussians. The power is estimated over repeated trials at each d.

6 Conclusion

This paper has two main novelties. The first is to precisely characterize how a nonparametric two sample test, which is consistent in fixed dimensions against general alternatives, performs against a mean-shift alternative; the second is to perform the analysis in the significantly more difficult high-dimensional regime. Future work involves understanding $\mathrm{MMD}_u^2$, but the limiting distributions of general U-statistics can be difficult to ascertain in high dimensions. Another direction involves the study of sparse alternatives, where δ is sparse, as done by Cai et al. (2014). Lastly, minimax lower bounds are required to understand the tradeoffs involved between Ψ, d, n.

References

Anderson, Theodore W. An introduction to multivariate statistical analysis. 1958.

Bai, Zhidong D and Saranadasa, Hewa. Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6:311-329, 1996.

Cai, Tony, Liu, Weidong, and Xia, Yin. Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):349-372, 2014.

Chen, Louis HY, Goldstein, Larry, and Shao, Qi-Man. Normal approximation by Stein's method. Springer, 2010.

Chen, Song Xi and Qin, Ying-Li. A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2), Apr 2010. doi: 10.1214/09-aos716.

Gretton, A., Borgwardt, K., Rasch, M., Schoelkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13:723-773, 2012.

Hotelling, Harold. The generalization of Student's ratio. Annals of Mathematical Statistics, 2(3), Aug 1931.

Kariya, Takeaki. A robustness property of Hotelling's T²-test. The Annals of Statistics, pp. 211-214, 1981.

Lehmann, Erich L and Romano, Joseph P. Testing statistical hypotheses. Springer, 2006.

Lopes, M.E., Jacob, L., and Wainwright, M.J. A more powerful two-sample test in high dimensions using random projection. In Advances in Neural Information Processing Systems 24. MIT Press, 2011.

Lyons, R. Distance covariance in metric spaces. Annals of Probability, 41(5), 2013.

Reddi, Sashank J., Ramdas, Aaditya, Póczos, Barnabás, Singh, Aarti, and Wasserman, Larry A. Kernel MMD, the median heuristic and distance correlation in high dimensions. CoRR, 2014.

Rosenbaum, Paul R. An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4), 2005.

Salaevskii, O.V. Minimax character of Hotelling's T² test. I. In Investigations in Classical Problems of Probability Theory and Mathematical Statistics. Springer, 1971.

Schölkopf, Bernhard and Smola, A. J. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K., et al. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5):2263-2291, 2013.

Serfling, Robert J. Approximation theorems of mathematical statistics, volume 162. John Wiley & Sons, 2009.

Simaika, JB. On an optimum property of two important statistical tests. Biometrika, 1941.

Srivastava, Muni S. and Du, Meng. A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 99(3):386-402, Mar 2008.

Székely, Gábor J and Rizzo, Maria L. Testing for equal distributions in high dimension. InterStat, 5, 2004.

Székely, G.J. and Rizzo, M.L. The distance correlation t-test of independence in high dimension. J. Multivariate Analysis, 117:193-213, 2013.
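A The Power of CQ for high SNR

Let us first briefly describe what we believe is an important mistake in Chen & Qin (2010); all notation, equation numbers and theorems in this paragraph refer to those in Chen & Qin (2010). Using the test statistic $T_n/\hat\sigma_{n1}$ defined below Theorem 2, we can derive the power under assumption (3.5) as

\[
\begin{aligned}
P\left(\frac{T_n}{\hat\sigma_{n1}} > \xi_\alpha\right)
&= P\left(\frac{T_n - \|\mu_1-\mu_2\|^2}{\hat\sigma_{n2}} > \frac{\hat\sigma_{n1}}{\hat\sigma_{n2}}\,\xi_\alpha - \frac{\|\mu_1-\mu_2\|^2}{\hat\sigma_{n2}}\right) \\
&\to \Phi\left(\frac{\|\mu_1-\mu_2\|^2}{\hat\sigma_{n2}}\right) \quad \text{(the denominator is not } \hat\sigma_{n1}\text{)} \\
&= \Phi\left(\frac{\sqrt{n}\,\|\mu_1-\mu_2\|^2}{\sqrt{(\mu_1-\mu_2)^T\Sigma(\mu_1-\mu_2)}}\right),
\end{aligned}
\]

which should be the expression for power that they derive in Eq. (3.12), the most important difference being the presence of $\sqrt{n}$ instead of $n$ in the numerator. They also do not have an explicit Berry-Esseen bound dealing with the deviation from normality.

B Remarks for this Appendix

B.1 Taylor Expansion

In all our calculations, we use the Taylor expansion for the function $e^{-x}$ around 0. More specifically, we have

\[
\int e^{-(u-v)^2/\gamma^2}\, p_i(u)\, q_i(v)\, du\, dv
= \int \left[1 - \frac{(u-v)^2}{\gamma^2} + e^{-\lambda (u-v)^2/\gamma^2}\,\frac{(u-v)^4}{2\gamma^4}\right] p_i(u)\, q_i(v)\, du\, dv,
\]

where $\lambda \in [0,1]$. The above equality follows from the exact formula for Taylor expansions having exact residuals. Note that $e^{-\lambda (u-v)^2/\gamma^2} \le 1$. When $\gamma = \Omega(\sqrt{d})$ and the fourth moments of the distributions $p_i$ and $q_i$ exist, the above integral becomes

\[
\int e^{-(u-v)^2/\gamma^2}\, p_i(u)\, q_i(v)\, du\, dv
= \int \left[1 - \frac{(u-v)^2}{\gamma^2}\right] p_i(u)\, q_i(v)\, du\, dv + o(1).
\]

Similarly, a higher order expansion can also be obtained by assuming existence of sixth order moments. For ease of exposition, we drop the $o(1)$ term throughout our calculations. To emphasize this issue, we use the symbol $\approx$ in our calculations to indicate that the $o(1)$ term is ignored.

B.2 Independent Coordinates

In our calculations, we assume that the coordinates of $x, y$ are independent and that their central moments are $\sigma^2, \mu_3, \mu_4$. In other words, we use $U = I$ in Assumption 1 to derive expressions in this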
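Appendix. However, this is only for ease of exposition and all our proofs hold even when $U \neq I$. This can be seen from the following argument.

\[
\|x-y\|^2 = \|Us + \mu_P - Ut - \mu_Q\|^2
= \|U[(s-t) + U^T(\mu_P-\mu_Q)]\|^2
= \|(s + U^T\mu_P) - (t + U^T\mu_Q)\|^2
= \|s'-t'\|^2,
\]

where $s' = s + U^T\mu_P$ and $t' = t + U^T\mu_Q$. Since $U^T\mu_P$ and $U^T\mu_Q$ are just rotated mean vectors, the coordinates of $s'$ and $t'$ are independent (since the coordinates of $s, t$ are independent in assumption A1), and $s', t'$ still have the same central moments as $s, t$. Using the above relation, we can rewrite our calculations involving $e^{-\|x-y\|^2/\gamma^2}$ in terms of $e^{-\|s'-t'\|^2/\gamma^2}$. Note that the distance between the means of the distributions on $x, y$ is $\|\mu_P-\mu_Q\|$ and that the distance between the means of the distributions on $s', t'$ is also $\|U^T\mu_P - U^T\mu_Q\| = \|\mu_P-\mu_Q\|$ since $U$ is orthogonal. So all the problem parameters remain the same, except that we shift from non-independent coordinates for $x, y$ to independent coordinates for $s', t'$.

C Proof of Lemma 1

First note that we can rewrite the population MMD as

\[
\begin{aligned}
\mathrm{MMD}^2 &= E_{x,x'\sim P}[k(x,x')] + E_{y,y'\sim Q}[k(y,y')] - 2E_{x\sim P,\,y\sim Q}[k(x,y)] \\
&= \int e^{-\|u-v\|^2/\gamma^2} p(u)p(v)\,du\,dv + \int e^{-\|u-v\|^2/\gamma^2} q(u)q(v)\,du\,dv - 2\int e^{-\|u-v\|^2/\gamma^2} p(u)q(v)\,du\,dv.
\end{aligned}
\]

We calculate each of these integrals in the following manner. Since the coordinates of $P$ and $Q$ are independent, we have

\[
\begin{aligned}
\int e^{-\|u-v\|^2/\gamma^2} p(u)p(v)\,du\,dv
&= \prod_{i=1}^{d} \int e^{-(u_i-v_i)^2/\gamma^2} p_i(u_i)p_i(v_i)\,du_i\,dv_i \\
&\approx \prod_{i=1}^{d} \left[1 - \int \frac{(u_i-v_i)^2}{\gamma^2}\, p_i(u_i)p_i(v_i)\,du_i\,dv_i\right] \\
&= \left(1 - \frac{2\sigma^2}{\gamma^2}\right)^{d}.
\end{aligned}
\]

The last two steps follow from the Taylor expansion and the definition of the second moments of the distributions $p_i$ and $q_i$ (see Section F.1 of the Appendix). Similarly, the corresponding term for distribution $Q$ is

\[
\int e^{-\|u-v\|^2/\gamma^2} q(u)q(v)\,du\,dv \approx \left(1 - \frac{2\sigma^2}{\gamma^2}\right)^{d}.
\]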
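For the final term, we have

\[
\begin{aligned}
\int e^{-\|u-v\|^2/\gamma^2} p(u)q(v)\,du\,dv
&= \prod_{i=1}^{d} \int e^{-(u_i-v_i)^2/\gamma^2} p_i(u_i)q_i(v_i)\,du_i\,dv_i \\
&\approx \prod_{i=1}^{d} \left[1 - \frac{1}{\gamma^2}\int (u_i-v_i)^2\, p_i(u_i)q_i(v_i)\,du_i\,dv_i\right] \\
&= \prod_{i=1}^{d} \left[1 - \frac{1}{\gamma^2}\int \left(\sigma^2 + (\mu_{P,i}-v_i)^2\right) q_i(v_i)\,dv_i\right] \\
&= \prod_{i=1}^{d} \left[1 - \frac{\sigma^2 + \sigma^2 + (\mu_{P,i}-\mu_{Q,i})^2}{\gamma^2}\right]
= \prod_{i=1}^{d} \left[1 - \frac{2\sigma^2+\delta_i^2}{\gamma^2}\right].
\end{aligned}
\]

The first step follows from independence of the coordinates. The second step follows from the Taylor expansion. The final few steps follow from the definition of the second moment of the distributions (see Section F.1 of the Appendix). Combining the above terms and keeping the leading order in $1/\gamma^2$, we have

\[
\mathrm{MMD}^2 \approx 2\prod_{i=1}^{d}\left(1 - \frac{2\sigma^2}{\gamma^2}\right) - 2\prod_{i=1}^{d}\left(1 - \frac{2\sigma^2+\delta_i^2}{\gamma^2}\right)
\approx 2\left(1 - \frac{2d\sigma^2}{\gamma^2}\right) - 2\left(1 - \frac{2d\sigma^2 + \|\delta\|^2}{\gamma^2}\right)
= \frac{2\|\delta\|^2}{\gamma^2}.
\]

D Proof of Lemma 2

The variance for the linear time MMD is given by

\[
\mathrm{var}_{z,z'}\big(h(z,z')\big) = E_{z,z'}[h^2(z,z')] - \big(E_{z,z'}\,h(z,z')\big)^2,
\]

where $h(z,z') = k(x,x') + k(y,y') - k(x,y') - k(x',y)$ with $x,x' \sim P$ and $y,y' \sim Q$, and $E_{z,z'}[h(z,z')] = \mathrm{MMD}^2$. Hence the second term is just $\big(E_{z,z'}\,h(z,z')\big)^2 = \mathrm{MMD}^4$. Let's concentrate on the first term:

\[
\begin{aligned}
E_{z,z'}[h^2(z,z')] ={}& E_{x,x'\sim P}\,k^2(x,x') + E_{y,y'\sim Q}\,k^2(y,y') + 2E_{x\sim P,\,y\sim Q}\,k^2(x,y) \\
& + 2E_{x,x'\sim P,\,y,y'\sim Q}\,k(x,x')k(y,y') + 2E_{x,x'\sim P,\,y,y'\sim Q}\,k(x,y')k(x',y) \\
& - 4E_{x,x'\sim P,\,y\sim Q}\,k(x,x')k(x,y) - 4E_{x\sim P,\,y,y'\sim Q}\,k(x,y)k(y,y').
\end{aligned}
\]

Hence, there are five kinds of terms to calculate:

1. $E_{x,x'\sim P}\,k^2(x,x')$, from which $E_{y,y'\sim Q}\,k^2(y,y')$ can follow.
2. $E_{x\sim P,\,y\sim Q}\,k^2(x,y)$
3. $E_{x,x'\sim P,\,y,y'\sim Q}\,k(x,x')k(y,y')$
4. $E_{x,x'\sim P,\,y,y'\sim Q}\,k(x,y')k(x',y)$
5. $E_{x,x'\sim P,\,y\sim Q}\,k(x,x')k(x,y)$, from which $E_{x\sim P,\,y,y'\sim Q}\,k(x,y)k(y,y')$ can follow.

Let's calculate these five terms in order.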
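D.1 Term 1: $E_{x,x'\sim P}\,k^2(x,x')$

\[
\begin{aligned}
E_{x,x'\sim P}\,k^2(x,x')
&= \int e^{-2\|x-x'\|^2/\gamma^2} p(x)p(x')\,dx\,dx' \\
&= \prod_{i=1}^{d}\int e^{-2(x_i-x'_i)^2/\gamma^2} p_i(x_i)p_i(x'_i)\,dx_i\,dx'_i \\
&\approx \prod_{i=1}^{d}\left[1 - \frac{4\sigma^2}{\gamma^2} + \frac{4\mu_4 + 12\sigma^4}{\gamma^4}\right] \\
&\approx 1 - \frac{4d\sigma^2}{\gamma^2} + \frac{4d\mu_4 + 12d\sigma^4}{\gamma^4} + \frac{8d(d-1)\sigma^4}{\gamma^4}.
\end{aligned}
\]

The third step follows from our calculations in Section F.1 of the Appendix. Note that the extra terms arise from considering all cross terms with denominator $\gamma^4$.

D.2 Term 2: $E_{x\sim P,\,y\sim Q}\,k^2(x,y)$

\[
\begin{aligned}
E_{x\sim P,\,y\sim Q}\,k^2(x,y)
&= \int e^{-2\|x-y\|^2/\gamma^2} p(x)q(y)\,dx\,dy \\
&= \prod_{i=1}^{d}\int e^{-2(x_i-y_i)^2/\gamma^2} p_i(x_i)q_i(y_i)\,dx_i\,dy_i \\
&\approx \prod_{i=1}^{d}\left[1 - \frac{4\sigma^2 + 2\delta_i^2}{\gamma^2} + \frac{4\mu_4 + 24\sigma^2\delta_i^2 + 12\sigma^4 + 2\delta_i^4}{\gamma^4}\right] \\
&\approx 1 - \frac{4d\sigma^2 + 2\|\delta\|^2}{\gamma^2} + \frac{4d\mu_4 + 24\sigma^2\|\delta\|^2 + 12d\sigma^4 + 2\|\delta\|^4}{\gamma^4} + \frac{8d(d-1)\sigma^4 + 8(d-1)\sigma^2\|\delta\|^2}{\gamma^4}.
\end{aligned}
\]

The third step follows from our calculations in Section F.1 of the Appendix.

D.3 Term 3: $E_{x,x'\sim P,\,y,y'\sim Q}\,k(x,x')k(y,y')$

\[
\begin{aligned}
E\,k(x,x')k(y,y')
&= \int e^{-\|x-x'\|^2/\gamma^2} e^{-\|y-y'\|^2/\gamma^2} p(x)p(x')q(y)q(y')\,dx\,dx'\,dy\,dy' \\
&= \prod_{i=1}^{d}\left[\int e^{-(x_i-x'_i)^2/\gamma^2} p_i(x_i)p_i(x'_i)\,dx_i\,dx'_i\right]\left[\int e^{-(y_i-y'_i)^2/\gamma^2} q_i(y_i)q_i(y'_i)\,dy_i\,dy'_i\right] \\
&\approx \prod_{i=1}^{d}\left[1 - \frac{2\sigma^2}{\gamma^2} + \frac{\mu_4 + 3\sigma^4}{\gamma^4}\right]^2
\approx \prod_{i=1}^{d}\left[1 - \frac{4\sigma^2}{\gamma^2} + \frac{2\mu_4 + 10\sigma^4}{\gamma^4}\right] \\
&\approx 1 - \frac{4d\sigma^2}{\gamma^2} + \frac{2d\mu_4 + 10d\sigma^4}{\gamma^4} + \frac{8d(d-1)\sigma^4}{\gamma^4}.
\end{aligned}
\]

The third step follows from our calculations in Section F.1 of the Appendix.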
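The passage from the per-coordinate factor to the $d$-dimensional expression in Terms 1 to 3 can be spelled out as a short worked expansion. This is only a sketch in shorthand introduced here for illustration ($a$ and $b$ are not notation from the paper): $a$ stands for the coefficient with denominator $\gamma^2$ and $b$ for the coefficient with denominator $\gamma^4$ in a single coordinate's factor, taken to be the same for every coordinate as in Term 1.

```latex
% Shorthand (assumed here, Term 1 values): a = 4\sigma^2/\gamma^2,  b = (4\mu_4 + 12\sigma^4)/\gamma^4.
\prod_{i=1}^{d} (1 - a + b)
  = 1 - d\,a + d\,b + \binom{d}{2} a^2 + (\text{terms with denominator } \gamma^6 \text{ or higher}),
\qquad
\binom{d}{2} a^2 = \frac{d(d-1)}{2}\cdot\frac{16\sigma^4}{\gamma^4} = \frac{8d(d-1)\sigma^4}{\gamma^4}.
```

The last quantity is the cross term that appears with denominator $\gamma^4$ in Terms 1 and 3. When the per-coordinate coefficient depends on $\delta_i$, as in Term 2, the same bookkeeping applies with $\sum_{i<j} a_i a_j$ in place of $\binom{d}{2}a^2$.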
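D.4 Term 4: $E_{x,x'\sim P,\,y,y'\sim Q}\,k(x,y')k(x',y)$

\[
\begin{aligned}
E\,k(x,y')k(x',y)
&= \int e^{-\|x-y'\|^2/\gamma^2} e^{-\|x'-y\|^2/\gamma^2} p(x)p(x')q(y)q(y')\,dx\,dx'\,dy\,dy' \\
&\approx \prod_{i=1}^{d}\left[1 - \frac{2\sigma^2 + \delta_i^2}{\gamma^2} + \frac{\mu_4 + 6\sigma^2\delta_i^2 + 3\sigma^4 + \delta_i^4/2}{\gamma^4}\right]^2 \\
&\approx \prod_{i=1}^{d}\left[1 - \frac{4\sigma^2 + 2\delta_i^2}{\gamma^2} + \frac{2\mu_4 + 16\sigma^2\delta_i^2 + 10\sigma^4 + 2\delta_i^4}{\gamma^4}\right] \\
&\approx 1 - \frac{4d\sigma^2 + 2\|\delta\|^2}{\gamma^2} + \frac{2d\mu_4 + 16\sigma^2\|\delta\|^2 + 10d\sigma^4 + 2\|\delta\|^4}{\gamma^4} + \frac{8d(d-1)\sigma^4 + 8(d-1)\sigma^2\|\delta\|^2}{\gamma^4}.
\end{aligned}
\]

The third step follows from our calculations in Section F.1 of the Appendix.

D.5 Term 5: $E_{x,x'\sim P,\,y\sim Q}\,k(x,x')k(x,y)$

\[
\begin{aligned}
E\,k(x,x')k(x,y)
&= \int e^{-\|x-x'\|^2/\gamma^2} e^{-\|x-y\|^2/\gamma^2} p(x)p(x')q(y)\,dx\,dx'\,dy \\
&= \prod_{i=1}^{d}\int e^{-(x_i-x'_i)^2/\gamma^2} e^{-(x_i-y_i)^2/\gamma^2} p_i(x_i)p_i(x'_i)q_i(y_i)\,dx_i\,dx'_i\,dy_i \\
&\approx \prod_{i=1}^{d}\left[1 - \frac{4\sigma^2 + \delta_i^2}{\gamma^2} + \frac{3\mu_4 + 8\sigma^2\delta_i^2 + 9\sigma^4 + \delta_i^4/2 + 2\mu_3\delta_i}{\gamma^4}\right] \\
&\approx 1 - \frac{4d\sigma^2 + \|\delta\|^2}{\gamma^2} + \frac{3d\mu_4 + 8\sigma^2\|\delta\|^2 + 9d\sigma^4 + \|\delta\|^4/2 + 2\mu_3\sum_i\delta_i}{\gamma^4} + \frac{8d(d-1)\sigma^4 + 4(d-1)\sigma^2\|\delta\|^2}{\gamma^4}.
\end{aligned}
\]

The second step follows from our calculations in Section F.2 of the Appendix, applied with the shared coordinate $x_i$ drawn from $p_i$. For $E_{x\sim P,\,y,y'\sim Q}\,k(x,y)k(y,y')$ the shared coordinate is drawn from $q_i$, which flips the sign of the $\mu_3$ term and leaves everything else unchanged.

Combining all the terms above, we get the following bound on the variance.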
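D.6 The bound on $E_{z,z'}[h^2(z,z')]$

\[
\begin{aligned}
E_{z,z'}[h^2(z,z')] ={}& E_{x,x'\sim P}\,k^2(x,x') + E_{y,y'\sim Q}\,k^2(y,y') + 2E_{x\sim P,\,y\sim Q}\,k^2(x,y) \\
& + 2E\,k(x,x')k(y,y') + 2E\,k(x,y')k(x',y) - 4E\,k(x,x')k(x,y) - 4E\,k(x,y)k(y,y') \\
\approx{}& \left[1 - \frac{4d\sigma^2}{\gamma^2} + \frac{4d\mu_4 + 12d\sigma^4 + 8d(d-1)\sigma^4}{\gamma^4}\right] \\
& + \left[1 - \frac{4d\sigma^2}{\gamma^2} + \frac{4d\mu_4 + 12d\sigma^4 + 8d(d-1)\sigma^4}{\gamma^4}\right] \\
& + 2\left[1 - \frac{4d\sigma^2 + 2\|\delta\|^2}{\gamma^2} + \frac{4d\mu_4 + 24\sigma^2\|\delta\|^2 + 12d\sigma^4 + 8d(d-1)\sigma^4 + 8(d-1)\sigma^2\|\delta\|^2 + 2\|\delta\|^4}{\gamma^4}\right] \\
& + 2\left[1 - \frac{4d\sigma^2}{\gamma^2} + \frac{2d\mu_4 + 10d\sigma^4 + 8d(d-1)\sigma^4}{\gamma^4}\right] \\
& + 2\left[1 - \frac{4d\sigma^2 + 2\|\delta\|^2}{\gamma^2} + \frac{2d\mu_4 + 16\sigma^2\|\delta\|^2 + 10d\sigma^4 + 8d(d-1)\sigma^4 + 8(d-1)\sigma^2\|\delta\|^2 + 2\|\delta\|^4}{\gamma^4}\right] \\
& - 4\left[1 - \frac{4d\sigma^2 + \|\delta\|^2}{\gamma^2} + \frac{3d\mu_4 + 8\sigma^2\|\delta\|^2 + 9d\sigma^4 + 2\mu_3\sum_i\delta_i + 8d(d-1)\sigma^4 + 4(d-1)\sigma^2\|\delta\|^2 + \|\delta\|^4/2}{\gamma^4}\right] \\
& - 4\left[1 - \frac{4d\sigma^2 + \|\delta\|^2}{\gamma^2} + \frac{3d\mu_4 + 8\sigma^2\|\delta\|^2 + 9d\sigma^4 - 2\mu_3\sum_i\delta_i + 8d(d-1)\sigma^4 + 4(d-1)\sigma^2\|\delta\|^2 + \|\delta\|^4/2}{\gamma^4}\right] \\
={}& \frac{4\|\delta\|^4 + 16d\sigma^4 + 16\sigma^2\|\delta\|^2}{\gamma^4}.
\end{aligned}
\]

Finally, using the bound derived above on $E_{z,z'}[h^2(z,z')]$ together with $\mathrm{MMD}^4 \approx 4\|\delta\|^4/\gamma^4$, the bound on the variance is

\[
\mathrm{var}_{z,z'}\big(h(z,z')\big) = E_{z,z'}[h^2(z,z')] - \big(E_{z,z'}\,h(z,z')\big)^2 \approx \frac{16d\sigma^4 + 16\sigma^2\|\delta\|^2}{\gamma^4}.
\]

E Proof of Lemma 3

E.1 Upper bound on $\tau_4$

We derive the upper bound on $\tau_4$ in this section. An upper bound on $E_{z,z'}\big[(h(z,z') - E_{z,z'}[h(z,z')])^4\big]$ can be obtained in the following manner. First note that

\[
\begin{aligned}
E_{z,z'}\big[(h(z,z') - E_{z,z'}[h(z,z')])^4\big]
&= E[h^4(z,z')] - 3\,\mathrm{MMD}^8 - 4E[h^3(z,z')]\,\mathrm{MMD}^2 + 6E[h^2(z,z')]\,\mathrm{MMD}^4 \\
&\approx 16\kappa_4 - \frac{48\|\delta\|^8}{\gamma^8} - \frac{64\kappa_3\|\delta\|^2}{\gamma^2} + \frac{96\|\delta\|^8}{\gamma^8} + \frac{384d\sigma^4\|\delta\|^4}{\gamma^8} + \frac{384\sigma^2\|\delta\|^6}{\gamma^8},
\end{aligned}
\]

where $\kappa_4$ and $\kappa_3$ denote the leading-order values of $E[h^4(z,z')]/16$ and $E[h^3(z,z')]/8$, respectively.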
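The first equality in the fourth-moment display of Section E.1 is the binomial expansion of a centered fourth moment. As a short worked derivation, writing $m = E[h] = \mathrm{MMD}^2$ (the symbol $m$ is shorthand introduced here, not notation from the paper):

```latex
E\big[(h-m)^4\big]
  = E[h^4] - 4m\,E[h^3] + 6m^2\,E[h^2] - 4m^3\,E[h] + m^4
  = E[h^4] - 4m\,E[h^3] + 6m^2\,E[h^2] - 3m^4 ,
\qquad m^4 = \mathrm{MMD}^8 .
```

The last step only uses $E[h] = m$, so $-4m^3 E[h] + m^4 = -3m^4$; substituting $m = \mathrm{MMD}^2$ gives the four terms used in the text.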
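Calculations for $\kappa_4$

We now calculate an upper bound on $E_{z,z'}[h^4(z,z')]$ in the following manner. With slight abuse of notation, we use $x_i$ to denote the $i$-th coordinate of $x$. We first note that

\[
\begin{aligned}
E_{z,z'}[h^4(z,z')]
&= E_{z,z'}\big[k(x,x') + k(y,y') - k(x,y') - k(x',y)\big]^4 \\
&\approx \frac{1}{\gamma^8}\,E_{z,z'}\big[-\|x-x'\|^2 - \|y-y'\|^2 + \|x-y'\|^2 + \|x'-y\|^2\big]^4 \\
&= \frac{16}{\gamma^8}\,E_{z,z'}\Big[\sum_{j=1}^{d}(x_j-y_j)(x'_j-y'_j)\Big]^4 \\
&= \frac{16}{\gamma^8}\sum_{k_1+\dots+k_d=4}\binom{4}{k_1,\dots,k_d}\prod_{i=1}^{d}E_{z,z'}\big[(x_i-y_i)^{k_i}(x'_i-y'_i)^{k_i}\big] \\
&= \frac{16}{\gamma^8}\sum_{k_1+\dots+k_d=4}\binom{4}{k_1,\dots,k_d}\prod_{i=1}^{d}\Big(E_{z}\big[(x_i-y_i)^{k_i}\big]\Big)^2 .
\end{aligned}
\]

The above summation splits into five different sums, based on the different ways to write $k_1+\dots+k_d=4$. We derive these terms using the calculations in Section F.1 and Section F.2, as well as some terms from the variance calculations in Section D, and explain in brackets which way to sum the $k_i$'s to 4 was used. Sums written with $i \ne j$, $i \ne j \ne k$ and so on range over tuples of distinct indices.

\[
\begin{aligned}
\kappa_4 \approx{}& \frac{1}{\gamma^8}\sum_{i}\big[2\mu_4 + 12\sigma^2\delta_i^2 + 6\sigma^4 + \delta_i^4\big]^2 && \text{(using } 4,0,0,\dots) \\
& + \frac{4}{\gamma^8}\sum_{i\ne j}\big(\delta_i^3 + 6\sigma^2\delta_i\big)^2\delta_j^2 && \text{(using } 3,1,0,\dots) \\
& + \frac{3}{\gamma^8}\sum_{i\ne j}\big(4\sigma^4 + \delta_i^4 + 4\sigma^2\delta_i^2\big)\big(4\sigma^4 + \delta_j^4 + 4\sigma^2\delta_j^2\big) && \text{(using } 2,2,0,\dots) \\
& + \frac{6}{\gamma^8}\sum_{i\ne j\ne k}\big(4\sigma^4 + \delta_i^4 + 4\sigma^2\delta_i^2\big)\delta_j^2\delta_k^2 && \text{(using } 2,1,1,0,0,\dots) \\
& + \frac{1}{\gamma^8}\sum_{i\ne j\ne k\ne l}\delta_i^2\delta_j^2\delta_k^2\delta_l^2 && \text{(using } 1,1,1,1,0,0,\dots)
\end{aligned}
\]

Expanding each of the above terms further, we get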
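\[
\begin{aligned}
\text{Term 1: }& \frac{1}{\gamma^8}\Big[4d\mu_4^2 + 144\sigma^4\sum_i\delta_i^4 + 36\sigma^8 d + \sum_i\delta_i^8 + 48\mu_4\sigma^2\|\delta\|^2 + 24d\mu_4\sigma^4 \\
& \qquad + 4\mu_4\sum_i\delta_i^4 + 144\sigma^6\|\delta\|^2 + 24\sigma^2\sum_i\delta_i^6 + 12\sigma^4\sum_i\delta_i^4\Big] \\
\text{Term 2: }& \frac{4}{\gamma^8}\Big[\sum_{i\ne j}\delta_i^6\delta_j^2 + 36\sigma^4\sum_{i\ne j}\delta_i^2\delta_j^2 + 12\sigma^2\sum_{i\ne j}\delta_i^4\delta_j^2\Big] \\
\text{Term 3: }& \frac{3}{\gamma^8}\Big[16d(d-1)\sigma^8 + \sum_{i\ne j}\delta_i^4\delta_j^4 + 8\sigma^4(d-1)\sum_i\delta_i^4 + 32\sigma^6(d-1)\|\delta\|^2 \\
& \qquad + 8\sigma^2\sum_{i\ne j}\delta_i^4\delta_j^2 + 16\sigma^4\sum_{i\ne j}\delta_i^2\delta_j^2\Big] \\
\text{Term 4: }& \frac{6}{\gamma^8}\Big[4\sigma^4(d-2)\sum_{i\ne j}\delta_i^2\delta_j^2 + \sum_{i\ne j\ne k}\delta_i^4\delta_j^2\delta_k^2 + 4\sigma^2\sum_{i\ne j\ne k}\delta_i^2\delta_j^2\delta_k^2\Big] \\
\text{Term 5: }& \frac{1}{\gamma^8}\Big[\sum_{i\ne j\ne k\ne l}\delta_i^2\delta_j^2\delta_k^2\delta_l^2\Big]
\end{aligned}
\]

Calculations for $\kappa_3$

Similar to the multinomial expansion for $\kappa_4$, we have

\[
\begin{aligned}
\kappa_3 \approx{}& \frac{1}{\gamma^6}\sum_i\big(\delta_i^6 + 36\sigma^4\delta_i^2 + 12\sigma^2\delta_i^4\big) && \text{(using } 3,0,0,0,\dots) \\
& + \frac{3}{\gamma^6}\sum_{i\ne j}\big(4\sigma^4 + \delta_i^4 + 4\sigma^2\delta_i^2\big)\delta_j^2 && \text{(using } 2,1,0,0,\dots) \\
& + \frac{1}{\gamma^6}\sum_{i\ne j\ne k}\delta_i^2\delta_j^2\delta_k^2 && \text{(using } 1,1,1,0,0,\dots)
\end{aligned}
\]

Using the above expansion, we get

\[
\begin{aligned}
\frac{\kappa_3\|\delta\|^2}{\gamma^2} \approx{}& \frac{1}{\gamma^8}\sum_{i\ne j}\big(\delta_i^6\delta_j^2 + 36\sigma^4\delta_i^2\delta_j^2 + 12\sigma^2\delta_i^4\delta_j^2\big)
+ \frac{1}{\gamma^8}\sum_i\big(\delta_i^8 + 36\sigma^4\delta_i^4 + 12\sigma^2\delta_i^6\big) \\
& + \frac{3}{\gamma^8}\sum_{i\ne j\ne k}\big(4\sigma^4\delta_j^2\delta_k^2 + \delta_i^4\delta_j^2\delta_k^2 + 4\sigma^2\delta_i^2\delta_j^2\delta_k^2\big) \\
& + \frac{3}{\gamma^8}\sum_{i\ne j}\big(4\sigma^4\delta_i^2\delta_j^2 + \delta_i^6\delta_j^2 + 4\sigma^2\delta_i^4\delta_j^2\big)
+ \frac{3}{\gamma^8}\sum_{i\ne j}\big(4\sigma^4\delta_j^4 + \delta_i^4\delta_j^4 + 4\sigma^2\delta_i^2\delta_j^4\big) \\
& + \frac{1}{\gamma^8}\sum_{i\ne j\ne k\ne l}\delta_i^2\delta_j^2\delta_k^2\delta_l^2
+ \frac{3}{\gamma^8}\sum_{i\ne j\ne k}\delta_i^4\delta_j^2\delta_k^2 .
\end{aligned}
\]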
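Also note the following expansions of $\|\delta\|^8$ and $\|\delta\|^6$.

\[
\begin{aligned}
\frac{\|\delta\|^8}{\gamma^8} &= \frac{1}{\gamma^8}\Big[\sum_i\delta_i^8 + 4\sum_{i\ne j}\delta_i^6\delta_j^2 + 3\sum_{i\ne j}\delta_i^4\delta_j^4 + 6\sum_{i\ne j\ne k}\delta_i^4\delta_j^2\delta_k^2 + \sum_{i\ne j\ne k\ne l}\delta_i^2\delta_j^2\delta_k^2\delta_l^2\Big] \\
\frac{\|\delta\|^6}{\gamma^6} &= \frac{1}{\gamma^6}\Big[\sum_i\delta_i^6 + 3\sum_{i\ne j}\delta_i^4\delta_j^2 + \sum_{i\ne j\ne k}\delta_i^2\delta_j^2\delta_k^2\Big]
\end{aligned}
\]

Putting all terms together

Using the above calculations for $\kappa_3$ and $\kappa_4$, we obtain the following bound on $E_{z,z'}\big[(h(z,z') - E_{z,z'}[h(z,z')])^4\big]$.

\[
\begin{aligned}
E_{z,z'}\big[(h - E_{z,z'}[h])^4\big]
={}& E[h^4(z,z')] - 3\,\mathrm{MMD}^8 - 4E[h^3(z,z')]\,\mathrm{MMD}^2 + 6E[h^2(z,z')]\,\mathrm{MMD}^4 \\
\approx{}& 16\kappa_4 - \frac{48\|\delta\|^8}{\gamma^8} - \frac{64\kappa_3\|\delta\|^2}{\gamma^2} + \frac{96\|\delta\|^8}{\gamma^8} + \frac{384d\sigma^4\|\delta\|^4}{\gamma^8} + \frac{384\sigma^2\|\delta\|^6}{\gamma^8} \\
={}& \frac{16}{\gamma^8}\Big[4d\mu_4^2 + 36\sigma^8 d + 24d\mu_4\sigma^4 + 48d(d-1)\sigma^8 + \big(96d\sigma^6 + 48\sigma^6 + 48\mu_4\sigma^2\big)\|\delta\|^2 \\
& \qquad + \big(24d\sigma^4 + 132\sigma^4 + 4\mu_4\big)\sum_i\delta_i^4 + \big(24d\sigma^4 + 144\sigma^4\big)\sum_{i\ne j}\delta_i^2\delta_j^2\Big] \\
& - \frac{64}{\gamma^8}\Big[\big(12d\sigma^4 + 24\sigma^4\big)\sum_i\delta_i^4 + \big(12d\sigma^4 + 24\sigma^4\big)\sum_{i\ne j}\delta_i^2\delta_j^2\Big] + \frac{384d\sigma^4\|\delta\|^4}{\gamma^8} \\
={}& \frac{1}{\gamma^8}\Big[64d\mu_4^2 + 576\sigma^8 d + 384d\mu_4\sigma^4 + 768d(d-1)\sigma^8 + \big(1536d\sigma^6 + 768\sigma^6 + 768\mu_4\sigma^2\big)\|\delta\|^2 \\
& \qquad + \big(576\sigma^4 + 64\mu_4\big)\sum_i\delta_i^4 + 768\sigma^4\sum_{i\ne j}\delta_i^2\delta_j^2\Big],
\end{aligned}
\]

where we substituted $\kappa_4$, $\kappa_3$ in the third equation, and the $\delta^6$ and $\delta^8$ terms perfectly cancel out.

F Helpful Calculations for Lemma 1, 2, 3, 4

Throughout this section, $f$ and $g$ are one-dimensional densities with means $\mu_f$ and $\mu_g$ and common central moments $\sigma^2, \mu_3, \mu_4$, and $\delta = \mu_f - \mu_g$.

F.1 Double Integrals

\[
\int e^{-(u-v)^2/\gamma^2} f(u)g(v)\,du\,dv
\approx \int\left[1 - \frac{(u-v)^2}{\gamma^2} + \frac{(u-v)^4}{2\gamma^4}\right] f(u)g(v)\,du\,dv
= 1 - \frac{2\sigma^2 + \delta^2}{\gamma^2} + \frac{\mu_4 + 6\sigma^2\delta^2 + 3\sigma^4 + \delta^4/2}{\gamma^4}
\]

because

\[
\gamma^2\int\frac{(u-v)^2}{\gamma^2} f(u)g(v)\,du\,dv
= \int (u - \mu_f + \mu_f - v)^2 f(u)g(v)\,du\,dv
= \int\big[\sigma^2 + (\mu_f - v)^2\big] g(v)\,dv
= 2\sigma^2 + \delta^2
\]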
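and

\[
\begin{aligned}
\gamma^4\int\frac{(u-v)^4}{\gamma^4} f(u)g(v)\,du\,dv
&= \int (u - \mu_f + \mu_f - v)^4 f(u)g(v)\,du\,dv \\
&= \int\big[(u-\mu_f)^4 + (\mu_f - v)^4 + 4(u-\mu_f)^3(\mu_f - v) + 4(u-\mu_f)(\mu_f - v)^3 + 6(u-\mu_f)^2(\mu_f - v)^2\big] f(u)g(v)\,du\,dv \\
&= \int\big[\mu_4 + (\mu_f - v)^4 + 4\mu_3(\mu_f - v) + 6\sigma^2(\mu_f - v)^2\big] g(v)\,dv \\
&= \mu_4 + \big[\mu_4 - 4\mu_3\delta + 6\sigma^2\delta^2 + \delta^4\big] + 4\mu_3\delta + 6\sigma^2\big[\sigma^2 + \delta^2\big] \\
&= 2\mu_4 + 12\sigma^2\delta^2 + 6\sigma^4 + \delta^4 .
\end{aligned}
\]

Finally, we have

\[
\begin{aligned}
\int (u-v)^3 f(u)g(v)\,du\,dv
&= \int (u - \mu_f + \mu_f - v)^3 f(u)g(v)\,du\,dv
= \int\big[\mu_3 + 3\sigma^2(\mu_f - v) + (\mu_f - v)^3\big] g(v)\,dv \\
&= \mu_3 + 3\sigma^2\delta + \big[\delta^3 + 3\sigma^2\delta - \mu_3\big]
= 6\sigma^2\delta + \delta^3 .
\end{aligned}
\]

F.2 Triple Integral

Here the variable $y$ is shared between the two exponentials and has density $g$.

\[
\begin{aligned}
\int e^{-(u-y)^2/\gamma^2} e^{-(v-y)^2/\gamma^2} f(u)g(v)g(y)\,du\,dv\,dy
&\approx \int\left[1 - \frac{(u-y)^2 + (v-y)^2}{\gamma^2} + \frac{\big((u-y)^2 + (v-y)^2\big)^2}{2\gamma^4}\right] f(u)g(v)g(y)\,du\,dv\,dy \\
&= 1 - \frac{2\sigma^2 + \delta^2}{\gamma^2} - \frac{2\sigma^2}{\gamma^2}
+ \frac{1}{2\gamma^4}\big[2\mu_4 + 12\sigma^2\delta^2 + 6\sigma^4 + \delta^4\big]
+ \frac{1}{2\gamma^4}\big[2\mu_4 + 6\sigma^4\big] \\
&\quad + \frac{1}{\gamma^4}\int\big[\sigma^2 + (\mu_f - y)^2\big]\big[\sigma^2 + (\mu_g - y)^2\big] g(y)\,dy \\
&= 1 - \frac{4\sigma^2 + \delta^2}{\gamma^2}
+ \frac{1}{2\gamma^4}\big[2\mu_4 + 12\sigma^2\delta^2 + 6\sigma^4 + \delta^4\big]
+ \frac{1}{2\gamma^4}\big[2\mu_4 + 6\sigma^4\big]
+ \frac{1}{\gamma^4}\big[3\sigma^4 + 2\sigma^2\delta^2 + \mu_4 - 2\mu_3\delta\big] \\
&= 1 - \frac{4\sigma^2 + \delta^2}{\gamma^2} + \frac{3\mu_4 + 8\sigma^2\delta^2 + 9\sigma^4 + \delta^4/2 - 2\mu_3\delta}{\gamma^4} .
\end{aligned}
\]

The second-to-last equality is obtained from the following:

\[
\begin{aligned}
\int\big[\sigma^2 + (\mu_f - y)^2\big]\big[\sigma^2 + (\mu_g - y)^2\big] g(y)\,dy
&= \sigma^4 + \sigma^2\int(\mu_g - y)^2 g(y)\,dy + \sigma^2\int(\mu_f - y)^2 g(y)\,dy + \int(\mu_f - y)^2(\mu_g - y)^2 g(y)\,dy \\
&= \sigma^4 + \sigma^4 + \sigma^2\big(\sigma^2 + \delta^2\big) + \big(\sigma^2\delta^2 - 2\mu_3\delta + \mu_4\big) \\
&= 3\sigma^4 + 2\sigma^2\delta^2 + \mu_4 - 2\mu_3\delta .
\end{aligned}
\]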
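The moment identities used in Sections F.1 and F.2 are exact (no Taylor remainder is involved), so they can be checked with rational arithmetic. The following is a minimal sketch, not part of the paper: the three-point distribution `g`, the shift `delta` and the helper names `central_moments` and `expect` are all hypothetical choices made for illustration. It assumes, as in this appendix, that `f` is a shifted copy of `g`, so both share the central moments $\sigma^2, \mu_3, \mu_4$ and $\delta = \mu_f - \mu_g$.

```python
from fractions import Fraction
from itertools import product


def central_moments(dist):
    """Return (mean, variance, third, fourth central moment) of a discrete law."""
    mean = sum(p * x for x, p in dist)
    var, m3, m4 = (sum(p * (x - mean) ** k for x, p in dist) for k in (2, 3, 4))
    return mean, var, m3, m4


def expect(fn, *dists):
    """Exact expectation of fn over independent draws, one from each law in dists."""
    total = Fraction(0)
    for combo in product(*dists):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        total += prob * fn(*[x for x, _ in combo])
    return total


# Hypothetical skewed three-point law g, given as (value, probability); mu_3 != 0.
g = [(Fraction(-1), Fraction(1, 2)),
     (Fraction(0), Fraction(1, 3)),
     (Fraction(3), Fraction(1, 6))]
delta = Fraction(2, 5)
f = [(x + delta, p) for x, p in g]  # same shape as g, mean shifted by delta

_, s2, m3, m4 = central_moments(g)

# Double integrals of Section F.1: u ~ f, v ~ g.
second = expect(lambda u, v: (u - v) ** 2, f, g)
third = expect(lambda u, v: (u - v) ** 3, f, g)
fourth = expect(lambda u, v: (u - v) ** 4, f, g)
# Cross term of Section F.2: u ~ f, v ~ g, shared y ~ g.
cross = expect(lambda u, v, y: (u - y) ** 2 * (v - y) ** 2, f, g, g)

checks = {
    "second": second == 2 * s2 + delta ** 2,
    "third": third == 6 * s2 * delta + delta ** 3,
    "fourth": fourth == 2 * m4 + 12 * s2 * delta ** 2 + 6 * s2 ** 2 + delta ** 4,
    "cross": cross == m4 + 3 * s2 ** 2 + 2 * s2 * delta ** 2 - 2 * m3 * delta,
}
print(checks)
```

A skewed law is used on purpose: with a symmetric one, $\mu_3 = 0$ and the sign of the $\mu_3\delta$ term in the cross moment would go untested.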
G Additional Experiments

[Figure 5: two panels, log(MMD²) versus log(d) and log(Variance) versus log(d); each panel shows one curve per bandwidth choice and one for the median heuristic.]

Figure 5: A plot for MMD² and Variance of the linear statistic at a fixed sample size n and fixed Ψ, for the Normal distribution with identity covariance, for bandwidths that are powers of d (including d^0.5 and d^0.75) and for the median heuristic. Note that these plots provide empirical verification for Lemma 1 and Lemma 2.
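An experiment of the kind summarized in Figure 5 can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function names, the seed, and the values of `d`, `n`, `sigma`, `delta` and both bandwidths are assumptions made here. It uses the kernel form $k(a,b) = e^{-\|a-b\|^2/\gamma^2}$ of this appendix and averages $h$ over disjoint pairs. For Gaussian data with covariance $\sigma^2 I$ it compares the estimate with the population value $2\,(1 + 4\sigma^2/\gamma^2)^{-d/2}\,\big(1 - e^{-\|\delta\|^2/(\gamma^2 + 4\sigma^2)}\big)$, which comes from a standard Gaussian integral and is not a formula taken from the paper.

```python
import math
import random


def gaussian_kernel(a, b, gamma):
    """k(a, b) = exp(-||a - b||^2 / gamma^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / gamma ** 2)


def h(x, x2, y, y2, gamma):
    """h(z, z') = k(x, x') + k(y, y') - k(x, y') - k(x', y)."""
    return (gaussian_kernel(x, x2, gamma) + gaussian_kernel(y, y2, gamma)
            - gaussian_kernel(x, y2, gamma) - gaussian_kernel(x2, y, gamma))


def linear_time_mmd(xs, ys, gamma):
    """Average h over disjoint consecutive pairs; return (estimate, standard error)."""
    vals = [h(xs[i], xs[i + 1], ys[i], ys[i + 1], gamma)
            for i in range(0, min(len(xs), len(ys)) - 1, 2)]
    m = len(vals)
    mean = sum(vals) / m
    var = sum((v - mean) ** 2 for v in vals) / (m - 1)
    return mean, math.sqrt(var / m)


def exact_gaussian_mmd2(delta, sigma, gamma):
    """Population MMD^2 for N(delta, sigma^2 I) versus N(0, sigma^2 I)."""
    d = len(delta)
    norm2 = sum(t * t for t in delta)
    shrink = (1.0 + 4.0 * sigma ** 2 / gamma ** 2) ** (-d / 2.0)
    return 2.0 * shrink * (1.0 - math.exp(-norm2 / (gamma ** 2 + 4.0 * sigma ** 2)))


def draw(n, mean, sigma, rng):
    """n independent Gaussian vectors with the given mean and covariance sigma^2 I."""
    return [[rng.gauss(m, sigma) for m in mean] for _ in range(n)]


rng = random.Random(0)
d, n, sigma = 10, 4000, 1.0          # illustrative values, not those of Figure 5
delta = [0.5] * d                    # mean shift mu_P - mu_Q
xs = draw(n, delta, sigma, rng)      # sample from P
ys = draw(n, [0.0] * d, sigma, rng)  # sample from Q

gamma = float(d)
est, se = linear_time_mmd(xs, ys, gamma)
exact = exact_gaussian_mmd2(delta, sigma, gamma)

# Leading-order value 2 ||delta||^2 / gamma^2, compared at a much larger bandwidth.
gamma_far = 100.0
exact_far = exact_gaussian_mmd2(delta, sigma, gamma_far)
approx_far = 2.0 * sum(t * t for t in delta) / gamma_far ** 2
print(est, se, exact, exact_far / approx_far)
```

In this sketch the leading-order value is compared with the exact one only at the larger bandwidth, because the neglected factor $(1 + 4\sigma^2/\gamma^2)^{-d/2}$ is close to one only when $\gamma^2$ is large relative to $d\sigma^2$.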
