# A Kernel Method for the Two-Sample-Problem


## Transcription

Arthur Gretton (MPI for Biological Cybernetics, Tübingen, Germany), Karsten M. Borgwardt (Ludwig-Maximilians-Univ., Munich, Germany), Malte Rasch (Graz Univ. of Technology, Graz, Austria), Bernhard Schölkopf (MPI for Biological Cybernetics, Tübingen, Germany), Alexander J. Smola (NICTA, ANU, Canberra, Australia)

### Abstract

We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

### 1 Introduction

We address the problem of comparing samples from two probability distributions, by proposing a statistical test of the hypothesis that these distributions are different (this is called the two-sample or homogeneity problem). This test has application in a variety of areas. In bioinformatics, it is of interest to compare microarray data from different tissue types, either to determine whether two subtypes of cancer may be treated as statistically indistinguishable from a diagnosis perspective, or to detect differences in healthy and cancerous tissue. In database attribute matching, it is desirable to merge databases containing multiple fields, where it is not known in advance which fields correspond: the fields are matched by maximising the similarity in the distributions of their entries.
In this study, we propose to test whether distributions $p$ and $q$ are different on the basis of samples drawn from each of them, by finding a smooth function which is large on the points drawn from $p$, and small (as negative as possible) on the points from $q$. We use as our test statistic the difference between the mean function values on the two samples; when this is large, the samples are likely from different distributions. We call this statistic the Maximum Mean Discrepancy (MMD).

Clearly the quality of MMD as a statistic depends heavily on the class $\mathcal{F}$ of smooth functions that define it. On one hand, $\mathcal{F}$ must be rich enough so that the population MMD vanishes if and only if $p = q$. On the other hand, for the test to be consistent, $\mathcal{F}$ needs to be restrictive enough for the empirical estimate of MMD to converge quickly to its expectation as the sample size increases. We shall use the unit balls in universal reproducing kernel Hilbert spaces [22] as our function class, since these will be shown to satisfy both of the foregoing properties. On a more practical note, MMD is cheap to compute: given $m$ points sampled from $p$ and $n$ from $q$, the cost is $O((m + n)^2)$ time.

We define two non-parametric statistical tests based on MMD. The first, which uses distribution-independent uniform convergence bounds, provides finite sample guarantees of test performance, at the expense of being conservative in detecting differences between $p$ and $q$. The second test is

based on the asymptotic distribution of MMD, and is in practice more sensitive to differences in distribution at small sample sizes. These results build on our earlier work in [6] on MMD for the two sample problem, which addresses only the second kind of test. In addition, the present approach employs a more accurate approximation to the asymptotic distribution of the test statistic.

We begin our presentation in Section 2 with a formal definition of the MMD, and a proof that the population MMD is zero if and only if $p = q$ when $\mathcal{F}$ is the unit ball of a universal RKHS. We also give an overview of hypothesis testing as it applies to the two-sample problem, and review previous approaches. In Section 3, we provide a bound on the deviation between the population and empirical MMD, as a function of the Rademacher averages of $\mathcal{F}$ with respect to $p$ and $q$. This leads to a first hypothesis test. We take a different approach in Section 4, where we use the asymptotic distribution of an unbiased estimate of the squared MMD as the basis for a second test. Finally, in Section 5, we demonstrate the performance of our method on problems from neuroscience, bioinformatics, and attribute matching using the Hungarian marriage approach. Our approach performs well on high dimensional data with low sample size; in addition, we are able to successfully apply our test to graph data, for which no alternative tests exist. Proofs and further details are provided in [13], and software may be downloaded from http://…

### 2 The Two-Sample-Problem

Our goal is to formulate a statistical test that answers the following question:

Problem 1. Let $p$ and $q$ be distributions defined on a domain $\mathcal{X}$. Given observations $X := \{x_1, \ldots, x_m\}$ and $Y := \{y_1, \ldots, y_n\}$, drawn independently and identically distributed (i.i.d.) from $p$ and $q$ respectively, is $p \neq q$?

To start with, we wish to determine a criterion that, in the population setting, takes on a unique and distinctive value only when $p = q$.
It will be defined based on [10, Lemma 9.3.2].

Lemma 1. Let $(\mathcal{X}, d)$ be a separable metric space, and let $p, q$ be two Borel probability measures defined on $\mathcal{X}$. Then $p = q$ if and only if $E_p(f(x)) = E_q(f(x))$ for all $f \in C(\mathcal{X})$, where $C(\mathcal{X})$ is the space of continuous bounded functions on $\mathcal{X}$.

Although $C(\mathcal{X})$ in principle allows us to identify $p = q$ uniquely, such a rich function class is not practical in the finite sample setting. We thus define a more general class of statistic, for as yet unspecified function classes $\mathcal{F}$, to measure the discrepancy between $p$ and $q$, as proposed in [11].

Definition 2. Let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \to \mathbb{R}$ and let $p, q, X, Y$ be defined as above. Then we define the maximum mean discrepancy (MMD) and its empirical estimate as

$$\mathrm{MMD}[\mathcal{F}, p, q] := \sup_{f \in \mathcal{F}} \left( E_{x \sim p}[f(x)] - E_{y \sim q}[f(y)] \right), \tag{1}$$

$$\mathrm{MMD}[\mathcal{F}, X, Y] := \sup_{f \in \mathcal{F}} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{i=1}^{n} f(y_i) \right). \tag{2}$$

We must now identify a function class that is rich enough to uniquely establish whether $p = q$, yet restrictive enough to provide useful finite sample estimates (the latter property will be established in subsequent sections). To this end, we select $\mathcal{F}$ to be the unit ball in a universal RKHS $\mathcal{H}$ [22]; we will henceforth use $\mathcal{F}$ only to denote this function class. With the additional restriction that $\mathcal{X}$ be compact, a universal RKHS is dense in $C(\mathcal{X})$ with respect to the $L_\infty$ norm. It is shown in [22] that Gaussian and Laplace kernels are universal.

Theorem 3. Let $\mathcal{F}$ be a unit ball in a universal RKHS $\mathcal{H}$, defined on the compact metric space $\mathcal{X}$, with associated kernel $k(\cdot, \cdot)$. Then $\mathrm{MMD}[\mathcal{F}, p, q] = 0$ if and only if $p = q$.

This theorem is proved in [13]. We next express the MMD in a more easily computable form. This is simplified by the fact that in an RKHS, function evaluations can be written $f(x) = \langle \phi(x), f \rangle$, where $\phi(x) = k(x, \cdot)$. Denote by $\mu[p] := E_{x \sim p(x)}[\phi(x)]$ the expectation of $\phi(x)$ (assuming that it exists;

a sufficient condition for this is $\|\mu[p]\|_{\mathcal{H}}^2 < \infty$, which is rearranged as $E_p[k(x, x')] < \infty$, where $x$ and $x'$ are independent random variables drawn according to $p$). Since $E_p[f(x)] = \langle \mu[p], f \rangle$, we may rewrite

$$\mathrm{MMD}[\mathcal{F}, p, q] = \sup_{\|f\|_{\mathcal{H}} \le 1} \langle \mu[p] - \mu[q], f \rangle = \|\mu[p] - \mu[q]\|_{\mathcal{H}}. \tag{3}$$

Using $\mu[X] := \frac{1}{m} \sum_{i=1}^{m} \phi(x_i)$ and $k(x, x') = \langle \phi(x), \phi(x') \rangle$, an empirical estimate of MMD is

$$\mathrm{MMD}[\mathcal{F}, X, Y] = \left[ \frac{1}{m^2} \sum_{i,j=1}^{m} k(x_i, x_j) - \frac{2}{mn} \sum_{i,j=1}^{m,n} k(x_i, y_j) + \frac{1}{n^2} \sum_{i,j=1}^{n} k(y_i, y_j) \right]^{1/2}. \tag{4}$$

Eq. (4) provides us with a test statistic for $p \neq q$. We shall see in Section 3 that this estimate is biased, although it is straightforward to upper bound the bias (we give an unbiased estimate, and an associated test, in Section 4). Intuitively we expect $\mathrm{MMD}[\mathcal{F}, X, Y]$ to be small if $p = q$, and the quantity to be large if the distributions are far apart. Computing (4) costs $O((m + n)^2)$ time.

Overview of Statistical Hypothesis Testing, and of Previous Approaches. Having defined our test statistic, we briefly describe the framework of statistical hypothesis testing as it applies in the present context, following [9, Chapter 8]. Given i.i.d. samples $X \sim p$ of size $m$ and $Y \sim q$ of size $n$, the statistical test $T(X, Y) : \mathcal{X}^m \times \mathcal{X}^n \mapsto \{0, 1\}$ is used to distinguish between the null hypothesis $H_0 : p = q$ and the alternative hypothesis $H_1 : p \neq q$. This is achieved by comparing the test statistic $\mathrm{MMD}[\mathcal{F}, X, Y]$ with a particular threshold: if the threshold is exceeded, then the test rejects the null hypothesis (bearing in mind that a zero population MMD indicates $p = q$). The acceptance region of the test is thus defined as any real number below the threshold. Since the test is based on finite samples, it is possible that an incorrect answer will be returned: we define the Type I error as the probability of rejecting $p = q$ based on the observed sample, despite the null hypothesis being true. Conversely, the Type II error is the probability of accepting $p = q$ despite the underlying distributions being different.
The level $\alpha$ of a test is an upper bound on the Type I error: this is a design parameter of the test, and is used to set the threshold to which we compare the test statistic (finding the test threshold for a given $\alpha$ is the topic of Sections 3 and 4). A consistent test achieves a level $\alpha$, and a Type II error of zero, in the large sample limit. We will see that both of the tests proposed in this paper are consistent.

We next give a brief overview of previous approaches to the two sample problem for multivariate data. Since our later experimental comparison is with respect to certain of these methods, we give abbreviated algorithm names in italics where appropriate: these should be used as a key to the tables in Section 5. We provide further details in [13]. A generalisation of the Wald-Wolfowitz runs test to the multivariate domain was proposed and analysed in [12, 17] (*Wolf*), which involves counting the number of edges in the minimum spanning tree over the aggregated data that connect points in $X$ to points in $Y$. The computational cost of this method using Kruskal's algorithm is $O((m + n)^2 \log(m + n))$, although more modern methods improve on the $\log(m + n)$ term. Two possible generalisations of the Kolmogorov-Smirnov test to the multivariate case were studied in [4, 12]. The approach of Friedman and Rafsky (*Smir*) in this case again requires a minimal spanning tree, and has a similar cost to their multivariate runs test. A more recent multivariate test was proposed in [20], which is based on the minimum distance non-bipartite matching over the aggregate data, at cost $O((m + n)^3)$. Another recent test was proposed in [15] (*Hall*): for each point from $p$, it requires computing the closest points in the aggregated data, and counting how many of these are from $q$ (the procedure is repeated for each point from $q$ with respect to points from $p$). The test statistic is costly to compute; [15] consider only tens of points in their experiments.
Yet another approach is to use some distance (e.g. $L_1$ or $L_2$) between Parzen window estimates of the densities as a test statistic [1, 3], based on the asymptotic distribution of this distance given $p = q$. When the $L_2$ norm is used, the test statistic is related to those we present here, although it is arrived at from a different perspective (see [13]): the test in [1] is obtained in a more restricted setting where the RKHS kernel is an inner product between Parzen windows. Since we are not doing density estimation, however, we need not decrease the kernel width as the sample grows. In fact, decreasing the kernel width reduces the convergence rate of the associated two-sample test, compared with the $(m + n)^{-1/2}$ rate for fixed kernels. The $L_1$ approach of [3] (*Biau*) requires the space to be partitioned into a grid of bins, which becomes difficult or impossible for high dimensional problems. Hence we use this test only for low-dimensional problems in our experiments.
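To make the empirical estimate in Eq. (4) concrete, the following is a minimal sketch of the biased MMD statistic. It assumes a Gaussian RBF kernel parameterised as $\exp(-\|a - b\|^2 / (2\sigma^2))$; the function names `gaussian_kernel` and `mmd_biased`, the kernel width, and the synthetic data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)) (assumed form)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd_biased(x, y, sigma=1.0):
    """Biased empirical MMD as in Eq. (4): three Gram-matrix means, then a square root."""
    kxx = gaussian_kernel(x, x, sigma)   # (1/m^2) sum_ij k(x_i, x_j) after .mean()
    kxy = gaussian_kernel(x, y, sigma)   # (1/(mn)) sum_ij k(x_i, y_j) after .mean()
    kyy = gaussian_kernel(y, y, sigma)   # (1/n^2) sum_ij k(y_i, y_j) after .mean()
    mmd_sq = kxx.mean() - 2.0 * kxy.mean() + kyy.mean()
    return float(np.sqrt(max(mmd_sq, 0.0)))  # guard against round-off below zero


# Illustrative data (hypothetical): same distribution vs. a shifted mean.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))
y_diff = rng.normal(1.0, 1.0, size=(200, 2))
print("same distribution:     ", mmd_biased(x, y_same))
print("different distribution:", mmd_biased(x, y_diff))
```

Forming the three Gram matrices explicitly mirrors the $O((m + n)^2)$ cost stated above; the statistic should come out noticeably larger for the shifted sample than for the sample drawn from the same distribution.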

### 3 A Test based on Uniform Convergence Bounds

In this section, we establish two properties of the MMD. First, we show that regardless of whether or not $p = q$, the empirical MMD converges in probability at rate $1/\sqrt{m + n}$ to its population value. This establishes the consistency of statistical tests based on MMD. Second, we give probabilistic bounds for large deviations of the empirical MMD in the case $p = q$. These bounds lead directly to a threshold for our first hypothesis test. We begin our discussion of the convergence of $\mathrm{MMD}[\mathcal{F}, X, Y]$ to $\mathrm{MMD}[\mathcal{F}, p, q]$.

Theorem 4. Let $p, q, X, Y$ be defined as in Problem 1, and assume $|k(x, y)| \le K$. Then

$$\Pr\left\{ \left| \mathrm{MMD}[\mathcal{F}, X, Y] - \mathrm{MMD}[\mathcal{F}, p, q] \right| > 2\left( (K/m)^{1/2} + (K/n)^{1/2} \right) + \epsilon \right\} \le 2 \exp\left( \frac{-\epsilon^2 mn}{2K(m + n)} \right).$$

Our next goal is to refine this result in a way that allows us to define a test threshold under the null hypothesis $p = q$. Under this circumstance, the constants in the exponent are slightly improved.

Theorem 5. Under the conditions of Theorem 4 where additionally $p = q$ and $m = n$,

$$\mathrm{MMD}[\mathcal{F}, X, Y] > \underbrace{m^{-1/2} \sqrt{2 E_p\left[ k(x, x) - k(x, x') \right]}}_{B_1(\mathcal{F}, p)} + \epsilon, \qquad \mathrm{MMD}[\mathcal{F}, X, Y] > \underbrace{2 (K/m)^{1/2}}_{B_2(\mathcal{F}, p)} + \epsilon,$$

both with probability less than $\exp\left( -\frac{\epsilon^2 m}{4K} \right)$ (see [13] for the proof).

In this theorem, we illustrate two possible bounds $B_1(\mathcal{F}, p)$ and $B_2(\mathcal{F}, p)$ on the bias in the empirical estimate (4). The first inequality is interesting inasmuch as it provides a link between the bias bound $B_1(\mathcal{F}, p)$ and kernel size (for instance, if we were to use a Gaussian kernel with large $\sigma$, then $k(x, x)$ and $k(x, x')$ would likely be close, and the bias small). In the context of testing, however, we would need to provide an additional bound to show convergence of an empirical estimate of $B_1(\mathcal{F}, p)$ to its population equivalent. Thus, in the following test for $p = q$ based on Theorem 5, we use $B_2(\mathcal{F}, p)$ to bound the bias.

Lemma 6. A hypothesis test of level $\alpha$ for the null hypothesis $p = q$ (equivalently $\mathrm{MMD}[\mathcal{F}, p, q] = 0$) has the acceptance region $\mathrm{MMD}[\mathcal{F}, X, Y] < 2\sqrt{K/m}\left( 1 + \sqrt{\log \alpha^{-1}} \right)$.
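The threshold in Lemma 6 can be read off Theorem 5: setting the tail probability $\exp(-\epsilon^2 m / (4K))$ equal to $\alpha$ gives $\epsilon = 2\sqrt{K/m}\sqrt{\log \alpha^{-1}}$, and adding the bias bound $B_2 = 2\sqrt{K/m}$ yields the acceptance region. A minimal sketch of that arithmetic follows, assuming equal sample sizes $m = n$ and a kernel bounded by a known $K$ (for a Gaussian RBF kernel, $K = 1$); the function names are hypothetical.

```python
import math


def mmd_acceptance_threshold(K, m, alpha):
    """Lemma 6 threshold: 2 * sqrt(K/m) * (1 + sqrt(log(1/alpha)))."""
    bias_bound = 2.0 * math.sqrt(K / m)                          # B_2(F, p)
    epsilon = 2.0 * math.sqrt(K / m) * math.sqrt(math.log(1.0 / alpha))
    return bias_bound + epsilon


def reject_null(mmd_value, K, m, alpha=0.05):
    """Reject p = q when the biased MMD falls outside the acceptance region."""
    return mmd_value >= mmd_acceptance_threshold(K, m, alpha)


# Illustrative values (assumptions): bounded kernel with K = 1, level 0.05.
for m in (100, 1000, 10000):
    print(m, round(mmd_acceptance_threshold(1.0, m, 0.05), 4))
```

Because the threshold shrinks only as $1/\sqrt{m}$ and does not depend on the data, it is loose for small samples, which is consistent with the conservative behaviour the text attributes to this test.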
We emphasise that Theorem 4 guarantees the consistency of the test, and that the Type II error probability decreases to zero at rate $1/\sqrt{m}$ (assuming $m = n$). To put this convergence rate in perspective, consider a test of whether two normal distributions have equal means, given they have unknown but equal variance [9, Exercise 8.41]. In this case, the test statistic has a Student-t distribution with $n + m - 2$ degrees of freedom, and its error probability converges at the same rate as our test.

### 4 An Unbiased Test Based on the Asymptotic Distribution of the U-Statistic

We now propose a second test, which is based on the asymptotic distribution of an unbiased estimate of $\mathrm{MMD}^2$. We begin by defining this test statistic.

Lemma 7. Given $x$ and $x'$ independent random variables with distribution $p$, and $y$ and $y'$ independent random variables with distribution $q$, the population $\mathrm{MMD}^2$ is

$$\mathrm{MMD}^2[\mathcal{F}, p, q] = E_{x, x' \sim p}\left[ k(x, x') \right] - 2 E_{x \sim p, y \sim q}\left[ k(x, y) \right] + E_{y, y' \sim q}\left[ k(y, y') \right] \tag{5}$$

(see [13] for details). Let $Z := (z_1, \ldots, z_m)$ be $m$ i.i.d. random variables, where $z_i := (x_i, y_i)$ (i.e. we assume $m = n$). An unbiased empirical estimate of $\mathrm{MMD}^2$ is

$$\mathrm{MMD}_u^2[\mathcal{F}, X, Y] = \frac{1}{m(m - 1)} \sum_{i \neq j}^{m} h(z_i, z_j), \tag{6}$$

which is a one-sample U-statistic with $h(z_i, z_j) := k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)$.
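The U-statistic in Eq. (6) can be sketched directly from the definition of $h$: build the matrix of $h(z_i, z_j)$ values, drop the diagonal, and average over the $m(m - 1)$ remaining pairs. The sketch below assumes a Gaussian RBF kernel of the form $\exp(-\|a - b\|^2 / (2\sigma^2))$ and equal sample sizes; `gaussian_kernel`, `mmd2_unbiased`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)) (assumed form)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of MMD^2 as in Eq. (6); requires m = n."""
    m = x.shape[0]
    if y.shape[0] != m:
        raise ValueError("Eq. (6) assumes equal sample sizes m = n")
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)   # kxy[i, j] = k(x_i, y_j)
    # h[i, j] = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    h = kxx + kyy - kxy - kxy.T
    np.fill_diagonal(h, 0.0)             # the sum runs over i != j only
    return float(h.sum() / (m * (m - 1)))


# Illustrative data (hypothetical).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))
y_diff = rng.normal(1.0, 1.0, size=(200, 2))
print("same distribution:     ", mmd2_unbiased(x, y_same))
print("different distribution:", mmd2_unbiased(x, y_diff))
```

Unlike the biased statistic of Eq. (4), this estimate can be slightly negative when the two samples come from the same distribution, since it is centred on zero in that case.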

The empirical statistic is an unbiased estimate of $\mathrm{MMD}^2$, although it does not have minimum variance (the minimum variance estimate is almost identical: see [21, Section 5.1.4]). We remark that these quantities can easily be linked with a simple kernel between probability measures: (5) is a special case of the Hilbertian metric [16, Eq. (4)] with the associated kernel $K(p, q) = E_{p,q} k(x, y)$ [16, Theorem 4]. The asymptotic distribution of this test statistic under $H_1$ is given by [21, Section 5.5.1], and the distribution under $H_0$ is computed based on [21, Section 5.5.2] and [1, Appendix]; see [13] for details.

Theorem 8. We assume $E(h^2) < \infty$. Under $H_1$, $\mathrm{MMD}_u^2$ converges in distribution (defined e.g. in [14, Section 7.2]) to a Gaussian according to

$$m^{1/2} \left( \mathrm{MMD}_u^2 - \mathrm{MMD}^2[\mathcal{F}, p, q] \right) \xrightarrow{D} \mathcal{N}\left( 0, \sigma_u^2 \right),$$

where $\sigma_u^2 = 4 \left( E_z\left[ (E_{z'} h(z, z'))^2 \right] - \left[ E_{z, z'}(h(z, z')) \right]^2 \right)$, uniformly at rate $1/\sqrt{m}$ [21, Theorem B, p. 193]. Under $H_0$, the U-statistic is degenerate, meaning $E_{z'} h(z, z') = 0$. In this case, $\mathrm{MMD}_u^2$ converges in distribution according to

$$m \, \mathrm{MMD}_u^2 \xrightarrow{D} \sum_{l=1}^{\infty} \lambda_l \left[ z_l^2 - 2 \right], \tag{7}$$

where $z_l \sim \mathcal{N}(0, 2)$ i.i.d., $\lambda_i$ are the solutions to the eigenvalue equation

$$\int_{\mathcal{X}} \tilde{k}(x, x') \psi_i(x) \, dp(x) = \lambda_i \psi_i(x'),$$

and $\tilde{k}(x_i, x_j) := k(x_i, x_j) - E_x k(x_i, x) - E_x k(x, x_j) + E_{x, x'} k(x, x')$ is the centred RKHS kernel.

Our goal is to determine whether the empirical test statistic $\mathrm{MMD}_u^2$ is so large as to be outside the $1 - \alpha$ quantile of the null distribution in (7) (consistency of the resulting test is guaranteed by the form of the distribution under $H_1$). One way to estimate this quantile is using the bootstrap [2] on the aggregated data. Alternatively, we may approximate the null distribution by fitting Pearson curves to its first four moments [18, Section 18.8]. Taking advantage of the degeneracy of the U-statistic, we obtain (see [13])

$$E\left( \left[ \mathrm{MMD}_u^2 \right]^2 \right) = \frac{2}{m(m - 1)} E_{z, z'}\left[ h^2(z, z') \right] \quad \text{and}$$

$$E\left( \left[ \mathrm{MMD}_u^2 \right]^3 \right) = \frac{8(m - 2)}{m^2 (m - 1)^2} E_{z, z'}\left[ h(z, z') E_{z''}\left( h(z, z'') h(z', z'') \right) \right] + O(m^{-4}). \tag{8}$$

The fourth moment $E\left( \left[ \mathrm{MMD}_u^2 \right]^4 \right)$ is not computed, since it is both very small ($O(m^{-4})$) and expensive to calculate ($O(m^4)$). Instead, we replace the kurtosis with its lower bound $\mathrm{kurt}\left( \mathrm{MMD}_u^2 \right) \ge \left( \mathrm{skew}\left( \mathrm{MMD}_u^2 \right) \right)^2 + 1$.

### 5 Experiments

We conducted distribution comparisons using our MMD-based tests on datasets from three real-world domains: database applications, bioinformatics, and neurobiology. We investigated the uniform convergence approach (MMD), the asymptotic approach with bootstrap ($\mathrm{MMD}_u^2$ B), and the asymptotic approach with moment matching to Pearson curves ($\mathrm{MMD}_u^2$ M). We also compared against several alternatives from the literature (where applicable): the multivariate t-test, the Friedman-Rafsky Kolmogorov-Smirnov generalisation (*Smir*), the Friedman-Rafsky Wald-Wolfowitz generalisation (*Wolf*), the Biau-Györfi test (*Biau*), and the Hall-Tajvidi test (*Hall*). Note that we do not apply the Biau-Györfi test to high-dimensional problems (see end of Section 2), and that MMD is the only method applicable to structured data such as graphs.

An important issue in the practical application of the MMD-based tests is the selection of the kernel parameters. We illustrate this with a Gaussian RBF kernel, where we must choose the kernel width

$\sigma$ (we use this kernel for univariate and multivariate data, but not for graphs). The empirical MMD is zero both for kernel size $\sigma = 0$ (where the aggregate Gram matrix over $X$ and $Y$ is a unit matrix), and also approaches zero as $\sigma \to \infty$ (where the aggregate Gram matrix becomes uniformly constant). We set $\sigma$ to be the median distance between points in the aggregate sample, as a compromise between these two extremes: this remains a heuristic, however, and the optimum choice of kernel size is an ongoing area of research.

Data integration. As a first application of MMD, we performed distribution testing for data integration: the objective is to aggregate two datasets into a single sample, with the understanding that both original samples are generated from the same distribution. Clearly, it is important to check this last condition before proceeding, or an analysis could detect patterns in the new dataset that are caused by combining the two different source distributions, and not by real-world phenomena. We chose several real-world settings to perform this task: we compared microarray data from normal and tumor tissues (Health status), microarray data from different subtypes of cancer (Subtype), and local field potential (LFP) electrode recordings from the Macaque primary visual cortex (V1) with and without spike events (Neural Data I and II). In all cases, the two data sets have different statistical properties, but the detection of these differences is made difficult by the high data dimensionality.

We applied our tests to these datasets in the following fashion. Given two datasets A and B, we either chose one sample from A and the other from B (attributes = different); or both samples from either A or B (attributes = same). We then repeated this process up to 1200 times. Results are reported in Table 1.
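Two practical ingredients of the asymptotic test, the median heuristic for $\sigma$ and a resampling estimate of the $1 - \alpha$ null quantile on the aggregated data, can be sketched together. This is a sketch under stated assumptions, not the authors' implementation: the resampling is done by randomly re-splitting the pooled sample (a permutation scheme swapped in here for simplicity, which may differ from the bootstrap of [2]), the Gaussian kernel is taken as $\exp(-\|a - b\|^2 / (2\sigma^2))$, and all function names and data are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off


def median_heuristic(x, y):
    """Median distance between distinct points of the aggregate sample."""
    z = np.vstack([x, y])
    d = np.sqrt(pairwise_sq_dists(z, z))
    upper = np.triu_indices(z.shape[0], k=1)  # count each unordered pair once
    return float(np.median(d[upper]))


def mmd2_u_from_gram(gram, ix, iy):
    """Unbiased MMD^2 (m = n) from the aggregate Gram matrix and two index sets."""
    m = len(ix)
    kxx = gram[np.ix_(ix, ix)]
    kyy = gram[np.ix_(iy, iy)]
    kxy = gram[np.ix_(ix, iy)]
    h = kxx + kyy - kxy - kxy.T
    np.fill_diagonal(h, 0.0)  # U-statistic: sum over i != j only
    return float(h.sum() / (m * (m - 1)))


def resampled_test(x, y, alpha=0.05, n_resamples=200, seed=0):
    """Return (statistic, estimated null quantile, sigma) for equal-size samples."""
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    sigma = median_heuristic(x, y)
    z = np.vstack([x, y])
    gram = np.exp(-pairwise_sq_dists(z, z) / (2.0 * sigma**2))  # computed once
    idx = np.arange(2 * m)
    stat = mmd2_u_from_gram(gram, idx[:m], idx[m:])
    null_stats = []
    for _ in range(n_resamples):
        perm = rng.permutation(2 * m)  # random re-split of the pooled data
        null_stats.append(mmd2_u_from_gram(gram, perm[:m], perm[m:]))
    threshold = float(np.quantile(null_stats, 1.0 - alpha))
    return stat, threshold, sigma


# Illustrative data (hypothetical): two Gaussians with different means.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=(100, 2))
y = rng.normal(1.0, 1.0, size=(100, 2))
stat, threshold, sigma = resampled_test(x, y)
print("sigma:", sigma, "statistic:", stat, "threshold:", threshold)
print("reject p = q:", stat > threshold)
```

Computing the aggregate Gram matrix once and re-indexing it for every resample keeps each resample at $O(m^2)$ work, in line with the cost the text quotes for the bootstrap approach.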
Our asymptotic tests perform better than all competitors besides *Wolf*: in the latter case, we have greater Type II error for one neural dataset, lower Type II error on the Health Status data (which has very high dimension and low sample size), and identical (error-free) performance on the remaining examples. We note that the Type I error of the bootstrap test on the Subtype dataset is far from its design value of 0.05, indicating that the Pearson curves provide a better threshold estimate for these low sample sizes. For the remaining datasets, the Type I errors of the Pearson and Bootstrap approximations are close. Thus, for larger datasets, the bootstrap is to be preferred, since it costs $O(m^2)$, compared with a cost of $O(m^3)$ for Pearson (due to the cost of computing (8)). Finally, the uniform convergence-based test is too conservative, finding differences in distribution only for the data with largest sample size.

| Dataset | Attr. | MMD | $\mathrm{MMD}_u^2$ B | $\mathrm{MMD}_u^2$ M | t-test | Wolf | Smir | Hall |
|---|---|---|---|---|---|---|---|---|
| Neural Data I | Same | | | | | | | |
| Neural Data I | Different | | | | | | | |
| Neural Data II | Same | | | | | | | |
| Neural Data II | Different | | | | | | | |
| Health status | Same | | | | | | | |
| Health status | Different | | | | | | | |
| Subtype | Same | | | | | | | |
| Subtype | Different | | | | | | | |

Table 1: Distribution testing for data integration on multivariate data. Numbers indicate the percentage of repetitions for which the null hypothesis ($p = q$) was accepted, given $\alpha = 0.05$. Sample size (dimension; repetitions of experiment): Neural I 4000 (63; 100); Neural II 1000 (100; 1200); Health Status 25 (12,600; 1000); Subtype 25 (2,118; …).

Attribute matching. Our second series of experiments addresses automatic attribute matching. Given two databases, we want to detect corresponding attributes in the schemas of these databases, based on their data-content (as a simple example, two databases might have respective fields Wage and Salary, which are assumed to be observed via a subsampling of a particular population, and we wish to automatically determine that both Wage and Salary denote the same underlying attribute).
We use a two-sample test on pairs of attributes from two databases to find corresponding pairs.¹ This procedure is also called table matching for tables from different databases. We performed attribute matching as follows: first, the dataset D was split into two halves A and B. Each of the $n$ attributes in A (and B, resp.) was then represented by its instances in A (resp. B). We then tested all pairs of attributes from A and from B against each other, to find the optimal assignment of attributes $A_1, \ldots, A_n$ from A to attributes $B_1, \ldots, B_n$ from B. We assumed that A and B contain the same number of attributes.

¹ Note that corresponding attributes may have different distributions in real-world databases. Hence, schema matching cannot solely rely on distribution testing. Advanced approaches to schema matching using MMD as one key statistical test are a topic of current research.

As a naive approach, one could assume that any possible pair of attributes might correspond, and thus that every attribute of A needs to be tested against all the attributes of B to find the optimal match. We report results for this naive approach, aggregated over all pairs of possible attribute matches, in Table 2. We used three datasets: the census income dataset from the UCI KDD archive (CNUM), the protein homology dataset from the 2004 KDD Cup (BIO) [8], and the forest dataset from the UCI ML archive [5]. For the final dataset, we performed univariate matching of attributes (FOREST) and multivariate matching of tables (FOREST10D) from two different databases, where each table represents one type of forest.

Both our asymptotic $\mathrm{MMD}_u^2$-based tests perform as well as or better than the alternatives, notably for CNUM, where the advantage of $\mathrm{MMD}_u^2$ is large. Unlike in Table 1, the next best alternatives are not consistently the same across all data: e.g. in BIO they are *Wolf* or *Hall*, whereas in FOREST they are *Smir*, *Biau*, or the t-test. Thus, $\mathrm{MMD}_u^2$ appears to perform more consistently across the multiple datasets. The Friedman-Rafsky tests do not always return a Type I error close to the design parameter: for instance, *Wolf* has a Type I error of 9.7% on the BIO dataset (on these data, $\mathrm{MMD}_u^2$ has the joint best Type II error without compromising the designed Type I performance). Finally, our uniform convergence approach performs much better than in Table 1, although surprisingly it fails to detect differences in FOREST10D.

A more principled approach to attribute matching is also possible.
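One way to picture such a principled matching is as a linear assignment problem: compute a cost $C_{ij}$ equal to a squared MMD between attribute $i$ of A and attribute $j$ of B, then pick the permutation with the smallest total cost. The sketch below is illustrative only. It uses the biased squared MMD with a Gaussian kernel of assumed form $\exp(-(u - v)^2 / (2\sigma^2))$ on univariate attributes, and it solves the assignment by brute force over permutations, which is practical only for a handful of attributes; the Hungarian method (available, for example, as `scipy.optimize.linear_sum_assignment`) would replace that step in practice. All function names and data are hypothetical.

```python
import itertools

import numpy as np


def mmd2_biased_1d(a, b, sigma=1.0):
    """Biased squared MMD between two univariate samples (Gaussian kernel assumed)."""
    def gram(u, v):
        return np.exp(-(u[:, None] - v[None, :]) ** 2 / (2.0 * sigma**2))
    return float(gram(a, a).mean() - 2.0 * gram(a, b).mean() + gram(b, b).mean())


def match_attributes(attrs_a, attrs_b, sigma=1.0):
    """Return (pi, cost) where pi[i] is the attribute of B assigned to attribute i of A."""
    n = len(attrs_a)
    cost = np.array([[mmd2_biased_1d(attrs_a[i], attrs_b[j], sigma)
                      for j in range(n)] for i in range(n)])
    # Brute-force search over all n! assignments (illustration only; the
    # Hungarian method finds the same minimiser in O(n^3) time).
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(best), cost


# Illustrative data (hypothetical): three attributes, stored in a different order in B.
rng = np.random.default_rng(3)
attrs_a = [rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 200), rng.normal(-3.0, 1.0, 200)]
attrs_b = [rng.normal(-3.0, 1.0, 200), rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 200)]
pi, cost = match_attributes(attrs_a, attrs_b)
print("assignment:", pi)
```

In this toy setup the attribute distributions are well separated, so the minimum-cost assignment should pair each attribute of A with the attribute of B drawn from the same distribution.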
Assume that $\phi(A) = (\phi_1(A_1), \phi_2(A_2), \ldots, \phi_n(A_n))$: in other words, the kernel decomposes into kernels on the individual attributes of A (and also decomposes this way on the attributes of B). In this case, $\mathrm{MMD}^2$ can be written $\sum_{i=1}^{n} \|\mu_i(A_i) - \mu_i(B_i)\|^2$, where we sum over the MMD terms on each of the attributes. Our goal of optimally assigning attributes from B to attributes of A via MMD is equivalent to finding the optimal permutation $\pi$ of attributes of B that minimizes $\sum_{i=1}^{n} \|\mu_i(A_i) - \mu_i(B_{\pi(i)})\|^2$. If we define $C_{ij} = \|\mu_i(A_i) - \mu_i(B_j)\|^2$, then this is the same as minimizing the sum over $C_{i, \pi(i)}$. This is the linear assignment problem, which costs $O(n^3)$ time using the Hungarian method [19].

| Dataset | Attr. | MMD | $\mathrm{MMD}_u^2$ B | $\mathrm{MMD}_u^2$ M | t-test | Wolf | Smir | Hall | Biau |
|---|---|---|---|---|---|---|---|---|---|
| BIO | Same | | | | | | | | |
| BIO | Different | | | | | | | | |
| FOREST | Same | | | | | | | | |
| FOREST | Different | | | | | | | | |
| CNUM | Same | | | | | | | | |
| CNUM | Different | | | | | | | | |
| FOREST10D | Same | | | | | | | | |
| FOREST10D | Different | | | | | | | | |

Table 2: Naive attribute matching on univariate (BIO, FOREST, CNUM) and multivariate data (FOREST10D). Numbers indicate the percentage of accepted null hypothesis ($p = q$) pooled over attributes. $\alpha = 0.05$. Sample size (dimension; attributes; repetitions of experiment): BIO 377 (1; 6; 100); FOREST 538 (1; 10; 100); CNUM 386 (1; 13; 100); FOREST10D 1000 (10; 2; 100).

We tested this Hungarian approach to attribute matching via $\mathrm{MMD}_u^2$ B on three univariate datasets (BIO, CNUM, FOREST) and for table matching on a fourth (FOREST10D). To study $\mathrm{MMD}_u^2$ B on structured data, we obtained two datasets of protein graphs (PROTEINS and ENZYMES) and used the graph kernel for proteins from [7] for table matching via the Hungarian method (the other tests were not applicable to this graph data). The challenge here is to match tables representing one functional class of proteins (or enzymes) from dataset A to the corresponding tables (functional classes) in B. Results are shown in Table 3. Except on the BIO dataset, $\mathrm{MMD}_u^2$ B made no errors.

### 6 Summary and Discussion

We have established two simple multivariate tests for comparing two distributions $p$ and $q$.
The test statistics are based on the maximum deviation of the expectation of a function evaluated on each of the random variables, taken over a sufficiently rich function class. We do not require density estimates as an intermediate step. Our method either outperforms competing methods, or is close to the best performing alternative. Finally, our test was successfully used to compare distributions on graphs, for which it is currently the only option.

| Dataset | Data type | No. attributes | Sample size | Repetitions | % correct matches |
|---|---|---|---|---|---|
| BIO | univariate | | | | |
| CNUM | univariate | | | | |
| FOREST | univariate | | | | |
| FOREST10D | multivariate | | | | |
| ENZYMES | structured | | | | |
| PROTEINS | structured | | | | |

Table 3: Hungarian Method for attribute matching via $\mathrm{MMD}_u^2$ B on univariate (BIO, CNUM, FOREST), multivariate (FOREST10D), and structured data (ENZYMES, PROTEINS) ($\alpha = 0.05$; % correct matches is the percentage of the correct attribute matches detected over all repetitions).

Acknowledgements: The authors thank Matthias Hein for helpful discussions, Patrick Warnat (DKFZ, Heidelberg) for providing the microarray datasets, and Nikos Logothetis for providing the neural datasets. NICTA is funded through the Australian Government's Backing Australia's Ability initiative, in part through the ARC. This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST.

### References

[1] N. Anderson, P. Hall, and D. Titterington. Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. Journal of Multivariate Analysis, 50:41-54.

[2] M. Arcones and E. Giné. On the bootstrap of U and V statistics. The Annals of Statistics, 20(2).

[3] G. Biau and L. Györfi. On the asymptotic properties of a nonparametric $L_1$-test statistic of homogeneity. IEEE Transactions on Information Theory, 51(11).

[4] P. Bickel. A distribution free version of the Smirnov two sample test in the p-variate case. The Annals of Mathematical Statistics, 40(1):1-23.

[5] C. L. Blake and C. J. Merz. UCI repository of machine learning databases.

[6] K. M. Borgwardt, A. Gretton, M. J. Rasch, H. P. Kriegel, B. Schölkopf, and A. J. Smola. Integrating structured biological data by kernel maximum mean discrepancy. In ISMB.

[7] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. V. N. Vishwanathan, A. J. Smola, and H. P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1):i47-i56.

[8] R. Caruana and T. Joachims. KDD Cup.

[9] G. Casella and R. Berger. Statistical Inference. Duxbury, Pacific Grove, CA, 2nd edition.

[10] R. M. Dudley. Real Analysis and Probability. Cambridge University Press, Cambridge, UK.

[11] R. Fortet and E. Mourier. Convergence de la répartition empirique vers la répartition théorique. Ann. Scient. École Norm. Sup., 70.

[12] J. Friedman and L. Rafsky. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 7(4).

[13] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two sample problem. Technical Report 157, MPI for Biological Cybernetics.

[14] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford University Press, Oxford, third edition.

[15] P. Hall and N. Tajvidi. Permutation tests for equality of distributions in high-dimensional settings. Biometrika, 89(2).

[16] M. Hein, T. N. Lal, and O. Bousquet. Hilbertian metrics on probability measures and their application in SVMs. In Proceedings of the 26th DAGM Symposium, Berlin, Springer.

[17] N. Henze and M. Penrose. On the multivariate runs test. The Annals of Statistics, 27(1).

[18] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Volume 1 (Second Edition). John Wiley and Sons.

[19] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97.

[20] P. Rosenbaum. An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society B, 67(4).

[21] R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York.

[22] I. Steinwart. On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res., 2:67-93, 2002.


### An Introduction to Machine Learning

An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### . (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma

Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minami-azabu

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### On Mardia s Tests of Multinormality

On Mardia s Tests of Multinormality Kankainen, A., Taskinen, S., Oja, H. Abstract. Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution.

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### Chapter 4: Non-Parametric Classification

Chapter 4: Non-Parametric Classification Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination

### BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

### Cheng Soon Ong & Christfried Webers. Canberra February June 2016

c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### The Variability of P-Values. Summary

The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### Variance estimation in clinical studies with interim sample size re-estimation

Variance estimation in clinical studies with interim sample size re-estimation 1 Variance estimation in clinical studies with interim sample size re-estimation Frank Miller AstraZeneca, Clinical Science,

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Large Margin DAGs for Multiclass Classification

S.A. Solla, T.K. Leen and K.-R. Müller (eds.), 57 55, MIT Press (000) Large Margin DAGs for Multiclass Classification John C. Platt Microsoft Research Microsoft Way Redmond, WA 9805 jplatt@microsoft.com

### HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

### Statistics Graduate Courses

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### 3. INNER PRODUCT SPACES

. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

### A methodology for estimating joint probability density functions

A methodology for estimating joint probability density functions Mauricio Monsalve Computer Science Department, Universidad de Chile Abstract We develop a theoretical methodology for estimating joint probability

### Bandwidth Selection for Nonparametric Distribution Estimation

Bandwidth Selection for Nonparametric Distribution Estimation Bruce E. Hansen University of Wisconsin www.ssc.wisc.edu/~bhansen May 2004 Abstract The mean-square efficiency of cumulative distribution function

### Non Parametric Inference

Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

### Online Classification on a Budget

Online Classification on a Budget Koby Crammer Computer Sci. & Eng. Hebrew University Jerusalem 91904, Israel kobics@cs.huji.ac.il Jaz Kandola Royal Holloway, University of London Egham, UK jaz@cs.rhul.ac.uk

### Gaussian Processes in Machine Learning

Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

### Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

### Learning with Local and Global Consistency

Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany

### CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

### 1 Covariate Shift by Kernel Mean Matching

1 Covariate Shift by Kernel Mean Matching Arthur Gretton 1 Alex Smola 1 Jiayuan Huang Marcel Schmittfull Karsten Borgwardt Bernhard Schölkopf Given sets of observations of training and test data, we consider

### Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

### Drug Store Sales Prediction

Drug Store Sales Prediction Chenghao Wang, Yang Li Abstract - In this paper we tried to apply machine learning algorithm into a real world problem drug store sales forecasting. Given store information,

### Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

### Department of Economics

Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

### Decompose Error Rate into components, some of which can be measured on unlabeled data

Bias-Variance Theory Decompose Error Rate into components, some of which can be measured on unlabeled data Bias-Variance Decomposition for Regression Bias-Variance Decomposition for Classification Bias-Variance

### Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

### Ranking on Data Manifolds

Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

### A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA

A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University Agency Internal User Unmasked Result Subjects

### The Optimality of Naive Bayes

The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

### NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

### Solution Using the geometric series a/(1 r) = x=1. x=1. Problem For each of the following distributions, compute

Math 472 Homework Assignment 1 Problem 1.9.2. Let p(x) 1/2 x, x 1, 2, 3,..., zero elsewhere, be the pmf of the random variable X. Find the mgf, the mean, and the variance of X. Solution 1.9.2. Using the

### Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c11 2013/9/9 page 221 le-tex 221 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial

### Learning with Local and Global Consistency

Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### A Learning Algorithm For Neural Network Ensembles

A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICET-UNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República

### FACULTY WORKING PAPER NO. 1021

330 3385 1021 COPY 2 STX '«««.2 3 J FACULTY WORKING PAPER NO. 1021 The Use of Linear Approximation to Nonlinear Regression Analysis Anil K. Bera an*** 6? r i ce anc Bus nes; Bureau of Economic and Business

### Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

### Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling

Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu Topics 1. Hypothesis Testing 1.

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Lecture 14: Fourier Transform

Machine Learning: Foundations Fall Semester, 2010/2011 Lecture 14: Fourier Transform Lecturer: Yishay Mansour Scribes: Avi Goren & Oron Navon 1 14.1 Introduction In this lecture, we examine the Fourier

### Applied Multivariate Analysis - Big data analytics

Applied Multivariate Analysis - Big data analytics Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Bayesian sample size determination of vibration signals in machine learning approach to fault diagnosis of roller bearings

Bayesian sample size determination of vibration signals in machine learning approach to fault diagnosis of roller bearings Siddhant Sahu *, V. Sugumaran ** * School of Mechanical and Building Sciences,

### Triangular Visualization

Triangular Visualization Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland tmaszczyk@is.umk.pl;google:w.duch http://www.is.umk.pl Abstract.

### Factorization Theorems

Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

### 3.6: General Hypothesis Tests

3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.

### The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

### TRANSFORMATIONS OF RANDOM VARIABLES

TRANSFORMATIONS OF RANDOM VARIABLES 1. INTRODUCTION 1.1. Definition. We are often interested in the probability distributions or densities of functions of one or more random variables. Suppose we have

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### MATH41112/61112 Ergodic Theory Lecture Measure spaces

8. Measure spaces 8.1 Background In Lecture 1 we remarked that ergodic theory is the study of the qualitative distributional properties of typical orbits of a dynamical system and that these properties

### Convex analysis and profit/cost/support functions

CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m

### Senior Secondary Australian Curriculum

Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero

### Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data

Chapter 7 Diagnosis and Prognosis of Breast Cancer using Histopathological Data In the previous chapter, a method for classification of mammograms using wavelet analysis and adaptive neuro-fuzzy inference

### Which Is the Best Multiclass SVM Method? An Empirical Study

Which Is the Best Multiclass SVM Method? An Empirical Study Kai-Bo Duan 1 and S. Sathiya Keerthi 2 1 BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798 askbduan@ntu.edu.sg

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

### Finding statistical patterns in Big Data

Finding statistical patterns in Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research IAS Research Workshop: Data science for the real world (workshop 1)

### NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document

### A study on the bi-aspect procedure with location and scale parameters

통계연구(2012), 제17권 제1호, 19-26 A study on the bi-aspect procedure with location and scale parameters (Short Title: Bi-aspect procedure) Hyo-Il Park 1) Ju Sung Kim 2) Abstract In this research we propose a

### Course 221: Analysis Academic year , First Semester

Course 221: Analysis Academic year 2007-08, First Semester David R. Wilkins Copyright c David R. Wilkins 1989 2007 Contents 1 Basic Theorems of Real Analysis 1 1.1 The Least Upper Bound Principle................

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### Business Statistics. Lecture 8: More Hypothesis Testing

Business Statistics Lecture 8: More Hypothesis Testing 1 Goals for this Lecture Review of t-tests Additional hypothesis tests Two-sample tests Paired tests 2 The Basic Idea of Hypothesis Testing Start

### LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as