Robust Regression on MapReduce


Xiangrui Meng, LinkedIn Corporation, 2029 Stierlin Ct, Mountain View, CA
Michael W. Mahoney, Department of Mathematics, Stanford University, Stanford, CA

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

Abstract

Although the MapReduce framework is now the de facto standard for analyzing massive data sets, many algorithms (in particular, many iterative algorithms popular in machine learning, optimization, and linear algebra) are hard to fit into MapReduce. Consider, e.g., the ℓ_p regression problem: given a matrix A ∈ R^{m×n} and a vector b ∈ R^m, find a vector x ∈ R^n that minimizes f(x) = ||Ax − b||_p. The widely-used ℓ_2 regression, i.e., linear least-squares, is known to be highly sensitive to outliers; and choosing p ∈ [1, 2) can help improve robustness. In this work, we propose an efficient algorithm for solving strongly over-determined (m >> n) robust ℓ_p regression problems to moderate precision on MapReduce. Our empirical results on data up to the terabyte scale demonstrate that our algorithm is a significant improvement over traditional iterative algorithms on MapReduce for ℓ_1 regression, even for a fairly small number of iterations. In addition, our proposed interior-point cutting-plane method can also be extended to solving more general convex problems on MapReduce.

1 Introduction

Statistical analysis of massive data sets presents very substantial challenges both to data infrastructure and to algorithm development. In particular, many popular data analysis and machine learning algorithms that perform well when applied to small-scale and medium-scale data that can be stored in RAM are infeasible when applied to the terabyte-scale and petabyte-scale data sets that are stored in distributed environments and that are increasingly common. In this paper, we develop algorithms for variants of the robust regression problem, and we evaluate implementations of them on data of up to the terabyte scale. In addition to being of interest since ours are the first algorithms for these problems that are appropriate for data of that scale, our results are also of interest since they highlight algorithm engineering challenges that will become more common as researchers try to scale up small-scale and medium-scale data analysis and machine learning methods. For example, at several points we had to work with variants of more primitive algorithms that were worse by traditional complexity measures but that had better communication properties.

1.1 MapReduce and Large-scale Data

The MapReduce framework, introduced by (Dean & Ghemawat, 2004) in 2004, has emerged as the de facto standard parallel environment for analyzing massive data sets. Apache Hadoop [1], an open source software framework inspired by Google's MapReduce, is now extensively used by companies such as Facebook, LinkedIn, Yahoo!, etc. In a typical application, one builds clusters of thousands of nodes containing petabytes of storage in order to process terabytes or even petabytes of daily data. As a parallel computing framework, MapReduce is well-known for its scalability to massive data. However, the scalability comes at the price of a very restrictive interface: sequential access to data, and functions limited to map and reduce. Working within this framework demands that traditional algorithms be redesigned to respect this interface. For example, the Apache Mahout [2] project is building a collection of scalable machine learning algorithms that includes algorithms for collaborative filtering, clustering, matrix decomposition, etc.
[1] Apache Hadoop, http://hadoop.apache.org/
[2] Apache Mahout, http://mahout.apache.org/
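To make the restrictive interface concrete, the following is a minimal sketch, in plain Python, of a single-pass job that computes the element-wise norm ||A||_p^p of a row-partitioned matrix using only a map and a reduce function. The mapper/reducer names and the local driver that emulates the shuffle are our own illustrative assumptions; they are not tied to any particular Hadoop API.

```python
import numpy as np

def mapper(row, p=1.0):
    """Map: one input record (a row of A) -> (key, partial sum of |a_ij|^p)."""
    yield ("norm_p", float(np.sum(np.abs(row) ** p)))

def reducer(key, values):
    """Reduce: sum the partial contributions that share a key."""
    yield (key, sum(values))

# A tiny local driver that emulates the shuffle between map and reduce.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 5))        # stand-in for a huge row-partitioned matrix
    shuffled = {}
    for row in A:                             # "map" phase: one sequential pass over the rows
        for key, val in mapper(row, p=1.0):
            shuffled.setdefault(key, []).append(val)
    for key, vals in shuffled.items():        # "reduce" phase
        for key, total in reducer(key, vals):
            print(key, total, "vs numpy:", np.abs(A).sum())
```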

Although some algorithms are easily adapted to the MapReduce framework, many algorithms (and in particular many iterative algorithms popular in machine learning, optimization, and linear algebra) are not. When the data are stored in RAM, each iteration is usually very cheap in terms of floating-point operations (FLOPs). However, when the data are stored on secondary storage, or in a distributed environment, each iteration requires at least one pass over the data. Since the cost of communication to and from secondary storage often dominates FLOP count costs, each pass can become very expensive for very large-scale problems. Moreover, there is generally no parallelism between iterations: an iterative algorithm must wait until the previous step gets completed before the next step can begin.

1.2 Our Main Results

In this work, we are interested in developing algorithms for robust regression problems on MapReduce. Of greatest interest will be algorithms for the strongly over-determined ℓ_1 regression problem [3], although our method will extend to more general ℓ_p regression. For simplicity of presentation, we will formulate most of our discussion in terms of ℓ_p regression; and toward the end we will describe the results of our implementation of our algorithm for ℓ_1 regression. Recall the strongly over-determined ℓ_p regression problem: given a matrix A ∈ R^{m×n}, with m >> n, a vector b ∈ R^m, a number p ∈ [1, ∞), and an error parameter ε > 0, find a (1 + ε)-approximate solution x̂ ∈ R^n to

    f* = min_{x ∈ R^n} ||Ax − b||_p,        (1)

i.e., find a vector x̂ such that

    ||Ax̂ − b||_p ≤ (1 + ε) f*,        (2)

where the ℓ_p norm is given by ||x||_p = (Σ_i |x_i|^p)^{1/p}.

[3] The ℓ_1 regression problem is also known as the Least Absolute Deviations or Least Absolute Errors problem.

A more robust alternative to the widely-used ℓ_2 regression is obtained by working with ℓ_p regression, with p ∈ [1, 2), where p = 1 is by far the most popular alternative. This, however, comes at the cost of increased complexity. While ℓ_2 regression can be solved with, e.g., a QR decomposition, ℓ_1 regression problems can be formulated as linear programs, and other ℓ_p regression problems can be formulated as convex programs. In those cases, iterated weighted least-squares methods, simplex methods, or interior-point methods are typically used in practice. These algorithms tend to require dot products, orthogonalization, and thus a great deal of communication, rendering them challenging to implement in the MapReduce framework.
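To illustrate the robustness claim, and the iterated weighted least-squares idea just mentioned, here is a minimal self-contained sketch that fits a line to data containing a few gross outliers, once by ordinary least squares (ℓ_2) and once by a simple IRLS approximation to ℓ_1 regression. The damping constant and iteration count are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def l2_fit(A, b):
    """Ordinary least squares: minimize ||Ax - b||_2."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def l1_fit_irls(A, b, iters=50, delta=1e-8):
    """Approximate l1 regression by iteratively reweighted least squares:
    reweight each row by 1/max(|residual|, delta) and re-solve."""
    x = l2_fit(A, b)
    for _ in range(iters):
        sw = np.sqrt(1.0 / np.maximum(np.abs(A @ x - b), delta))  # sqrt of IRLS weights
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x

rng = np.random.default_rng(42)
t = rng.uniform(0, 10, size=200)
A = np.column_stack([t, np.ones_like(t)])      # fit b ~ slope * t + intercept
b = 2.0 * t + 1.0 + rng.laplace(scale=0.5, size=t.size)
b[:5] += 100.0                                 # a handful of gross outliers

print("true coefficients:", [2.0, 1.0])
print("l2 fit:", l2_fit(A, b))                 # pulled far off by the outliers
print("l1 fit (IRLS):", l1_fit_irls(A, b))     # stays close to the true coefficients
```

On data like this the least-squares coefficients are dragged toward the outliers, while the ℓ_1 (IRLS) solution stays close to the true line; this is the robustness property exploited throughout the paper.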
In this paper, we describe an algorithm with better communication properties that is efficient for solving strongly over-determined ℓ_p regression problems to moderate precision on MapReduce [4]. Several aspects of our main algorithm are of particular interest:

Single-pass conditioning. We use a recently-developed fast rounding algorithm (which takes O(mn^3 log m) time to construct a 2n-rounding of a centrally symmetric convex set in R^n (Clarkson et al., 2013)) to construct a single-pass deterministic conditioning algorithm for ℓ_p regression.

Single-pass random sampling. By using a constrained form of ℓ_p regression (that was also used recently (Clarkson et al., 2013)), we show that the method of subspace-preserving random sampling (Dasgupta et al., 2009) can be (easily) implemented in the MapReduce framework, i.e., with map and reduce functions, in a single pass.

Effective initialization. By using multiple subsampled solutions from the single-pass random sampling, we can construct a small initial search region for interior-point cutting-plane methods.

Effective iterative solving. By performing in parallel multiple queries at each iteration, we develop a randomized IPCPM (interior-point cutting-plane method) for solving the convex ℓ_p regression program.

In addition to describing the basic algorithm, we also present empirical results from a numerical implementation of our algorithm applied to ℓ_1 regression problems on data sets of size up to the terabyte scale.

[4] Interestingly, both our single-pass conditioning algorithm and our iterative procedure are worse in terms of FLOP counts than state-of-the-art algorithms (developed for RAM) for these problems (see Tables 1 and 2), but we prefer them since they perform better in the very large-scale distributed settings that are of interest to us here.

1.3 Prior Related Work

There is a large literature on robust regression, distributed computation, MapReduce, and randomized matrix algorithms that is beyond our scope to review. See, e.g., (Rousseeuw & Leroy, 1987), (Bertsekas & Tsitsiklis, 1991), (Dean & Ghemawat, 2004), and (Mahoney, 2011), respectively, for details. Here, we will review only the most recent related work.

Strongly over-determined ℓ_1 regression problems were considered by (Portnoy & Koenker, 1997), who used a uniformly-subsampled solution for ℓ_1 regression to estimate the signs of the optimal residuals in order to reduce the problem size; their sample size is proportional to (mn)^{2/3}. (Clarkson, 2005) showed that, with proper conditioning, relative-error approximate solutions can be obtained from row norm-based sampling; and (Dasgupta et al., 2009) extended these subspace-preserving sampling schemes to ℓ_p regression, for p ∈ [1, ∞), thereby obtaining relative-error approximations. (Sohler & Woodruff, 2011) proved that a Cauchy Transform can be used for ℓ_1 conditioning and thus ℓ_1 regression in O(mn^2 log n) time; this was improved to O(mn log n) time with the Fast Cauchy Transform by (Clarkson et al., 2013), who also developed an ellipsoidal rounding algorithm (see Lemma 1 below) and used it and a fast random projection to construct a fast single-pass conditioning algorithm (upon which our Algorithm 1 below is based). (Clarkson & Woodruff, 2013) and (Meng & Mahoney, 2013) show that both ℓ_2 regression and ℓ_p regression can be solved in input-sparsity time via subspace-preserving sampling. The large body of work on fast randomized algorithms for ℓ_2 regression (and related) problems has been reviewed recently (Mahoney, 2011).

To obtain a (1 + ε)-approximate solution in relative scale, the sample sizes required by all these algorithms are proportional to 1/ε^2, which limits sampling algorithms to low-precision solutions, e.g., ε ≈ 10^{-2}. By using the output of the sampling/projection step as a preconditioner for a traditional iterative method, thereby leading to an O(log(1/ε)) dependence, this problem has been overcome for ℓ_2 regression (Mahoney, 2011). For ℓ_1 regression, the O(1/ε^2) convergence rate of the subgradient method of (Clarkson, 2005) was improved by (Nesterov, 2009), who showed that, with a smoothing technique, the number of iterations can be reduced to O(1/ε). Interestingly (and as we will return to in Section 3.3), the rarely-used ellipsoid method (see (Grötschel et al., 1981)) as well as IPCPMs (see (Mitchell, 2003)) can solve general convex problems and converge in O(log(1/ε)) iterations with extra poly(n) work per iteration.

More generally, there has been a lot of interest recently in distributed machine learning computations. For example, (Daumé et al., 2012) describes efficient protocols for distributed classification and optimization; (Balcan et al., 2012) analyzes communication complexity and privacy aspects of distributed learning; (Mackey et al., 2011) adopts a divide-and-conquer approach to matrix factorizations such as CUR decompositions; and (Zhang et al., 2012) develop communication-efficient algorithms for statistical optimization. Algorithms for these and other problems can be analyzed in models for MapReduce (Karloff et al., 2010; Goodrich, 2010; Feldman et al., 2010); and work on parallel and distributed approaches to scaling up machine learning has been reviewed recently (Bekkerman et al., 2011).

2 Background and Overview

In the remainder of this paper, we use the following formulation of the ℓ_p regression problem:

    minimize_{x ∈ R^n} ||Ax||_p   subject to c^T x = 1.        (3)

This formulation of ℓ_p regression, which consists of a homogeneous objective and an affine constraint, can be shown to be equivalent to the formulation of (1) [5].

[5] In particular, the new A is A concatenated with b, etc. Note that the same formulation is also used by (Nesterov, 2009) for solving unconstrained convex problems in relative scale, as well as by (Clarkson et al., 2013).

Denote the feasible region by Ω = {x ∈ R^n | c^T x = 1}, where recall that we are interested in the case when m >> n. Let X* be the set of all optimal solutions to (3) and x* be an arbitrary optimal solution. Then, let f(x) = ||Ax||_p, f* = ||Ax*||_p, and let

    g(x) = A^T [Ax]^{p−1} / ||Ax||_p^{p−1} ∈ ∂f(x),

where ([Ax]^{p−1})_i = sign(a_i^T x) |a_i^T x|^{p−1} and a_i is the i-th row of A, i = 1, ..., m. Note that g(x)^T x = f(x).
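The objective f(x) and the subgradient g(x) defined above are the only quantities that a pass over the data needs to produce for the iterative method described later. A minimal numpy sketch of the two formulas, with a numerical check of the identity g(x)^T x = f(x), might look as follows; the function name is ours.

```python
import numpy as np

def f_and_g(A, x, p=1.0):
    """Objective f(x) = ||Ax||_p and a subgradient
    g(x) = A^T [Ax]^(p-1) / ||Ax||_p^(p-1), with
    ([Ax]^(p-1))_i = sign(a_i^T x) |a_i^T x|^(p-1)."""
    r = A @ x
    f = np.sum(np.abs(r) ** p) ** (1.0 / p)
    signed_powers = np.sign(r) * np.abs(r) ** (p - 1)
    g = A.T @ signed_powers / f ** (p - 1)
    return f, g

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 4))
x = rng.standard_normal(4)
f, g = f_and_g(A, x, p=1.0)
print(f, g @ x)   # the two numbers agree: g(x)^T x = f(x)
```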
For simplicity, we assume that A has full column rank and c ≠ 0. Our assumptions imply that X* is a non-empty and bounded convex set and that f* > 0. Thus, given an ε > 0, our goal is to find an x̂ ∈ Ω that is a (1 + ε)-approximate solution to (3) in relative scale, i.e., such that f(x̂) < (1 + ε) f*.

As with ℓ_2 regression, ℓ_p regression problems are easier to solve when they are well-conditioned. The ℓ_p-norm condition number of A, denoted κ_p(A), is defined as

    κ_p(A) = σ_max,p(A) / σ_min,p(A),

where

    σ_max,p(A) = max_{||x||_2 = 1} ||Ax||_p   and   σ_min,p(A) = min_{||x||_2 = 1} ||Ax||_p.

This implies

    σ_min,p(A) ||x||_2 ≤ ||Ax||_p ≤ σ_max,p(A) ||x||_2   for all x ∈ R^n.

We use κ_p, σ_min,p, and σ_max,p for simplicity when the underlying matrix is clear. The element-wise ℓ_p norm of A is denoted by ||A||_p. We use E(d, E) = {x ∈ R^n | x = d + Ez, ||z||_2 ≤ 1} to describe an ellipsoid, where E ∈ R^{n×n} is a non-singular matrix. The volume of a full-dimensional ellipsoid E is denoted by |E|. We use S(S, t) = {x ∈ R^n | Sx ≤ t} to describe a polytope, where S ∈ R^{s×n} and t ∈ R^s for some s ≥ n + 1.
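As a rough illustration of these definitions (not an algorithm from the paper), σ_max,p and σ_min,p of a small matrix can be crudely bracketed by evaluating ||Ax||_p over many random unit vectors x; the number of trials below is an arbitrary choice, and the result is only a Monte Carlo estimate.

```python
import numpy as np

def estimate_kappa_p(A, p=1.0, trials=20000, seed=0):
    """Crude Monte Carlo estimate of sigma_max,p, sigma_min,p and
    kappa_p(A) = sigma_max,p / sigma_min,p over random unit vectors."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((trials, A.shape[1]))
    X /= np.linalg.norm(X, axis=1, keepdims=True)               # unit l2-norm rows
    vals = np.sum(np.abs(X @ A.T) ** p, axis=1) ** (1.0 / p)    # ||A x||_p for each x
    return vals.max(), vals.min(), vals.max() / vals.min()

A = np.diag([100.0, 1.0, 0.01])      # deliberately ill-conditioned
print(estimate_kappa_p(A, p=1.0))    # kappa_1 is large; conditioning would shrink it
```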

Given an ℓ_p regression problem, its condition number is generally unknown and can be arbitrarily large; thus one needs to run a conditioning algorithm before randomly sampling and iteratively solving. Given any non-singular matrix E ∈ R^{n×n}, let y* be an optimal solution to the following problem:

    minimize_{y ∈ R^n} ||AEy||_p   subject to c^T E y = 1.        (4)

This problem is equivalent to (3), in that we have x* = Ey* ∈ X*, but the condition number associated with (4) is κ_p(AE) instead of κ_p(A). So, the conditioning algorithm amounts to finding a non-singular matrix E ∈ R^{n×n} such that κ_p(AE) is small. One approach to conditioning is via ellipsoidal rounding. In this paper, we will modify the following result from (Clarkson et al., 2013) to compute a fast ellipsoidal rounding.

Lemma 1 ((Clarkson et al., 2013)). Given A ∈ R^{m×n} with full column rank and p ∈ [1, 2), it takes at most O(mn^3 log m) time to find a non-singular matrix E ∈ R^{n×n} such that

    ||y||_2 ≤ ||AEy||_p ≤ 2n ||y||_2   for all y ∈ R^n.

Finally, we call work online if it is executed on MapReduce, and offline otherwise. Online work deals with large-scale data stored on secondary storage, but it can be well distributed on MapReduce; offline work deals with data stored in RAM.

3 ℓ_p Regression on MapReduce

In this section, we describe our main algorithm for ℓ_p regression on MapReduce.

3.1 Single-pass Conditioning Algorithm

The algorithm of Lemma 1 for computing a 2n-rounding is not immediately applicable to large-scale ℓ_p regression problems, since each call to the oracle requires a pass over the data [6]. We can group n calls together within a single pass, but we would still need O(n log m) passes. Here, we present a deterministic single-pass conditioning algorithm that balances the cost-performance trade-off to provide a 2n^{2/p}-conditioning of A. See Algorithm 1.

[6] The algorithm takes a centrally-symmetric convex set described by a separation oracle, which here is a subgradient of ||Ax||_p; see (Clarkson et al., 2013) for details.

Algorithm 1 A single-pass conditioning algorithm.
Input: A ∈ R^{m×n} with full column rank and p ∈ [1, 2).
Output: A non-singular matrix E ∈ R^{n×n} such that ||y||_2 ≤ ||AEy||_p ≤ 2n^{2/p} ||y||_2 for all y ∈ R^n.
1: Partition A along its rows into sub-matrices of size n^2 × n, denoted by A_1, ..., A_M.
2: For each A_i, compute its economy-sized singular value decomposition (SVD): A_i = U_i Σ_i V_i^T.
3: Let Ã_i = Σ_i V_i^T for i = 1, ..., M, let C' = {x | (Σ_{i=1}^M ||Ã_i x||_2^p)^{1/p} ≤ 1}, and let Ã = [Ã_1; ...; Ã_M] (the Ã_i stacked vertically).
4: Compute Ã's SVD: Ã = Ũ Σ̃ Ṽ^T.
5: Let E_0 = E(0, E_0), where E_0 = n^{1/p−1/2} Ṽ Σ̃^{−1}. E_0 gives an (Mn^2)^{1/p−1/2}-rounding of C'.
6: With the algorithm of Lemma 1, compute an ellipsoid E = E(0, E) that gives a 2n-rounding of C'.
7: Return E.
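A condensed numpy sketch of the single-pass portion of Algorithm 1 (steps 1 to 5) is given below: it performs the block-wise SVDs and forms the initial rounding matrix E_0 = n^{1/p−1/2} Ṽ Σ̃^{−1}. The refinement of E_0 to a 2n-rounding of C' via Lemma 1 (step 6) is a separate offline routine and is omitted here, so this sketch should be read as an illustration of the data flow rather than as the full algorithm.

```python
import numpy as np

def single_pass_conditioning_E0(A, p=1.0):
    """Steps 1-5 of Algorithm 1 (sketch): block SVDs in one pass over the rows,
    then an SVD of the small stacked matrix A_tilde, returning the initial
    rounding matrix E0 = n^(1/p - 1/2) * V_tilde * diag(1/sigma_tilde).
    Step 6 (refining E0 to a 2n-rounding of C' via Lemma 1) is omitted."""
    m, n = A.shape
    block = n * n                        # block size n^2 as in Algorithm 1
    pieces = []
    for start in range(0, m, block):     # the "single pass": each block is touched once
        Ai = A[start:start + block]
        _, Si, Vti = np.linalg.svd(Ai, full_matrices=False)
        pieces.append(Si[:, None] * Vti)             # A_tilde_i = Sigma_i V_i^T
    A_tilde = np.vstack(pieces)                       # roughly (m/n) x n, kept offline
    _, S, Vt = np.linalg.svd(A_tilde, full_matrices=False)
    E0 = n ** (1.0 / p - 0.5) * (Vt.T / S)            # V_tilde @ diag(1/sigma_tilde)
    return E0

rng = np.random.default_rng(0)
A = rng.standard_normal((4000, 5)) * np.array([1e3, 1.0, 1.0, 1.0, 1e-3])
E0 = single_pass_conditioning_E0(A, p=1.0)
# A @ E0 is far better conditioned (in the l2 sense printed here) than A itself.
print(np.linalg.cond(A), np.linalg.cond(A @ E0))
```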
Our main result for Algorithm 1 is given in the following lemma.

Lemma 2. Algorithm 1 is a 2n^{2/p}-conditioning algorithm, and it runs in O((mn^2 + n^4) log m) time.

Proof. The idea is to use block-wise reduction in the ℓ_2 norm and to apply fast rounding to a small problem. The tool we need is simply the equivalence of vector norms. Let C = {x ∈ R^n | ||Ax||_p ≤ 1}, which is convex, full-dimensional, bounded, and centrally symmetric. Adopting notation from Algorithm 1, we first have n^{1−2/p} C' ⊆ C ⊆ C', because for all x ∈ R^n,

    ||Ax||_p^p = Σ_{i=1}^M ||A_i x||_p^p ≤ n^{2−p} Σ_{i=1}^M ||A_i x||_2^p = n^{2−p} Σ_{i=1}^M ||Ã_i x||_2^p

and

    ||Ax||_p^p = Σ_{i=1}^M ||A_i x||_p^p ≥ Σ_{i=1}^M ||A_i x||_2^p = Σ_{i=1}^M ||Ã_i x||_2^p.

Next we prove that E_0 gives an (Mn^2)^{1/p−1/2}-rounding of C'. For all x ∈ R^n, we have

    Σ_{i=1}^M ||Ã_i x||_2^p ≤ Σ_{i=1}^M ||Ã_i x||_p^p = ||Ãx||_p^p ≤ (Mn)^{1−p/2} ||Ãx||_2^p = (Mn)^{1−p/2} ||Σ̃ Ṽ^T x||_2^p,

and

    Σ_{i=1}^M ||Ã_i x||_2^p ≥ n^{p/2−1} Σ_{i=1}^M ||Ã_i x||_p^p = n^{p/2−1} ||Ãx||_p^p ≥ n^{p/2−1} ||Ãx||_2^p = n^{p/2−1} ||Σ̃ Ṽ^T x||_2^p.

Then, by choosing E_0 = n^{1/p−1/2} Ṽ Σ̃^{−1}, we get

    ||E_0^{−1} x||_2 ≤ (Σ_{i=1}^M ||Ã_i x||_2^p)^{1/p} ≤ (Mn^2)^{1/p−1/2} ||E_0^{−1} x||_2

for all x ∈ R^n, and hence E_0 gives an (Mn^2)^{1/p−1/2}-rounding of C'. Since n^{1−2/p} C' ⊆ C ⊆ C', we know that any 2n-rounding of C' is a 2n · n^{2/p−1} = 2n^{2/p}-rounding of C. Therefore, Algorithm 1 computes a 2n^{2/p}-conditioning of A.

Note that the rounding procedure is applied to a problem of size Mn × n ≈ (m/n) × n. Therefore, Algorithm 1 only needs a single pass through the data, with O(mn^2) FLOPs of online work and O((mn^2 + n^4) log m) FLOPs of offline work. The offline work requires O(m) RAM, which might be too much for large-scale problems. In such cases, we can increase the block size from n^2 to, for example, n^3. This gives us a 2n^{3/p−1/2}-conditioning algorithm that only needs m/n offline RAM and O((mn + n^4) log m) offline FLOPs. The proof follows similar arguments.

Table 1. Comparison of ℓ_1-norm conditioning algorithms on running time and conditioning quality.

                               running time        κ_1
(Clarkson, 2005)               O(mn^5 log m)       (n(n+1))^{1/2}
Lemma 1                        O(mn^3 log m)       2n
Lemma 2 & Algorithm 1          O(mn^2 log m)       2n^2
(Sohler & Woodruff, 2011)      O(mn^2 log n)       O(n^{3/2} log^{3/2} n)
(Clarkson et al., 2013)        O(mn log m)         O(n^{5/2} log^{1/2} n)
(Clarkson et al., 2013)        O(mn log n)         O(n^{5/2} log^{5/2} n)
(Meng & Mahoney, 2013)         O(nnz(A))           O(n^3 log^3 n)

See Table 1 for a comparison of the results of Algorithm 1 and Lemma 2 with prior work on ℓ_1-norm conditioning (and note that some of these results, e.g., those of (Clarkson et al., 2013) and (Meng & Mahoney, 2013), have extensions that apply to ℓ_p-norm conditioning). Although the Cauchy Transform (Sohler & Woodruff, 2011) and the Fast Cauchy Transform (Clarkson et al., 2013) are independent of A and require little offline work, there are several concerns with using them in our application. First, the constants hidden in κ_1 are not explicitly given, and they may be too large for practical use, especially when n is small. Second, although random sampling algorithms do not require σ_min,p and σ_max,p as inputs, some algorithms, e.g., IPCPMs, need accurate bounds on them. Third, these transforms are randomized algorithms that fail with certain probability. Although we can repeat trials to make the failure rate arbitrarily small, we don't have a simple way to check whether or not any given trial succeeds. Finally, although the online work in Algorithm 1 remains O(mn^2), it is embarrassingly parallel and can be well distributed on MapReduce. For large-scale strongly over-determined problems, Algorithm 1 with block size n^3 seems to be a good compromise in practice. This guarantees 2n^{3/p−1/2}-conditioning, and the O(mn^2) online work can be easily distributed on MapReduce.

3.2 Single-pass Random Sampling

Here, we describe our method for implementing the subspace-preserving sampling procedure with map and reduce functions. Suppose that after conditioning we have σ_min,p(A) = 1 and κ_p(A) = poly(n). (Here, we use A instead of AE for simplicity.) Then the following method of (Dasgupta et al., 2009) can be used to perform subspace-preserving sampling.

Lemma 3 ((Dasgupta et al., 2009)). Given A ∈ R^{m×n} that is (α, β, p)-conditioned [7] and an error parameter ε < 1/7, let

    r ≥ 16 (2^p + 2) (αβ)^p (n log(12/ε) + log(2/δ)) / (p^2 ε^2),

and let S ∈ R^{m×m} be a diagonal sampling matrix with random entries

    S_ii = 1/p_i^{1/p} with probability p_i, and S_ii = 0 otherwise,

where the importance sampling probabilities satisfy

    p_i ≥ min{1, r ||a_i||_p^p / ||A||_p^p},   i = 1, ..., m.

Then, with probability at least 1 − δ, the following holds for all x ∈ R^n:

    (1 − ε) ||Ax||_p^p ≤ ||SAx||_p^p ≤ (1 + ε) ||Ax||_p^p.        (5)

[7] See (Clarkson et al., 2013) for the relationship between κ_p(A) and the notion of (α, β, p)-conditioning.
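One quick way to see what Lemma 3 provides is to carry out the row sampling numerically on a small, well-conditioned matrix and check the two-sided bound (5) for a few random vectors x. The sketch below does exactly that; the sample size and the test vectors are arbitrary illustrative choices.

```python
import numpy as np

def lp_sample(A, r, p=1.0, seed=0):
    """Row sampling as in Lemma 3: keep row i with probability
    p_i = min(1, r * ||a_i||_p^p / ||A||_p^p) and rescale it by 1/p_i^(1/p),
    so that ||SAx||_p^p is an unbiased estimate of ||Ax||_p^p."""
    rng = np.random.default_rng(seed)
    row_norms = np.sum(np.abs(A) ** p, axis=1)            # ||a_i||_p^p
    probs = np.minimum(1.0, r * row_norms / row_norms.sum())
    keep = rng.random(A.shape[0]) < probs
    return A[keep] / probs[keep, None] ** (1.0 / p)        # rescaled sampled rows

rng = np.random.default_rng(1)
A = np.linalg.qr(rng.standard_normal((20000, 5)))[0]       # a well-conditioned tall matrix
SA = lp_sample(A, r=2000, p=1.0)
print("rows kept:", SA.shape[0])
for _ in range(3):
    x = rng.standard_normal(5)
    print("||Ax||_1 vs ||SAx||_1:",
          np.sum(np.abs(A @ x)), np.sum(np.abs(SA @ x)))   # close, as promised by (5)
```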
This subspace-preserving sampling lemma can be used, with the formulation (3), to obtain a relative-error approximation to the ℓ_p regression problem; the proof of the following lemma is immediate.

Lemma 4 ((Clarkson et al., 2013)). Let S be constructed as in Lemma 3, and let x̂ be the optimal solution to the subsampled problem

    minimize_{x ∈ R^n} ||SAx||_p   subject to c^T x = 1.

Then, with probability at least 1 − δ, x̂ is a (1 + ε)/(1 − ε)-approximate solution to (3).

It is straightforward to implement this algorithm in MapReduce in a single pass; this is presented in Algorithm 2.

Algorithm 2 A single-pass sampling algorithm.
Input: A ∈ R^{m×n} with σ_max,p(A) = κ_p, c ∈ R^n, a desired sample size r, and an integer N.
Output: N approximate solutions x̂_k, k = 1, ..., N.
1: function mapper(a: a row of A)
2:   Let q = min{r ||a||_p^p / (n κ_p^p), 1}.
3:   Emit (k, a / q^{1/p}) with probability q, independently for k = 1, ..., N.
4: end function
5: function reducer(k, {a_i})
6:   Assemble A_k from {a_i}.
7:   Compute x̂_k = arg min_{c^T x = 1} ||A_k x||_p.
8:   Emit (k, x̂_k).
9: end function
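A compact Python sketch of Algorithm 2 is given below, with the mapper emitting each rescaled row independently to each of the N subsamples and the reducer solving the small constrained problem for p = 1. The constraint-elimination step and the IRLS solver are our own illustrative stand-ins for step 7; a real implementation could use any LP or interior-point solver there.

```python
import numpy as np

def l1_constrained(Ak, c, iters=100, delta=1e-8):
    """Stand-in for step 7: solve min ||Ak x||_1 s.t. c^T x = 1 by writing
    x = x0 + B z with c^T x0 = 1, B an orthonormal basis of null(c^T),
    and running a small IRLS on the reduced unconstrained l1 problem."""
    x0 = c / (c @ c)                                  # minimal l2-norm feasible point
    B = np.linalg.svd(c[None, :])[2][1:].T            # n x (n-1) basis of null(c^T)
    z = np.zeros(c.size - 1)
    for _ in range(iters):
        sw = np.sqrt(1.0 / np.maximum(np.abs(Ak @ (x0 + B @ z)), delta))
        z = np.linalg.lstsq(sw[:, None] * (Ak @ B), -sw * (Ak @ x0), rcond=None)[0]
    return x0 + B @ z

def mapper(row, r, n, kappa_ub, N, rng, p=1.0):
    """Algorithm 2, mapper: emit the rescaled row to each of the N subsamples
    independently, with probability q = min(r * ||a||_p^p / (n * kappa^p), 1)."""
    q = min(r * np.sum(np.abs(row) ** p) / (n * kappa_ub ** p), 1.0)
    for k in range(N):
        if rng.random() < q:
            yield k, row / q ** (1.0 / p)

def reducer(k, rows, c):
    """Algorithm 2, reducer: assemble A_k and solve the subsampled problem."""
    return k, l1_constrained(np.vstack(rows), c)

# Tiny local driver emulating the shuffle (illustrative only, not a Hadoop job).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((20000, n))
c = np.array([1.0, 0.0, 0.0, 0.0])
kappa_ub = np.sqrt(n) * np.abs(A).sum(axis=0).max()   # crude upper bound on sigma_max,1
groups = {}
for row in A:
    for k, scaled in mapper(row, r=500, n=n, kappa_ub=kappa_ub, N=3, rng=rng):
        groups.setdefault(k, []).append(scaled)
for k in sorted(groups):
    print("subsampled solution", k, ":", reducer(k, groups[k], c)[1])
```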

Importantly, note that more than one subsampled solution can be obtained in a single pass. This translates to higher precision or a lower failure rate; and, as described in Section 3.3, it can also be used to construct a better initialization.

Several practical points are worth noting. First, n κ_p^p is an upper bound on ||A||_p^p, which makes the actual sample size likely to be smaller than r. For better control of the sample size, we can compute ||A||_p^p directly via one pass over A prior to sampling, or we can set a big r in the mappers and discard rows at random in the reducers if the actual sample size is too big. Second, in practice, it is hard to accept ε as an input and determine the sample size r based on Lemma 3. Instead, we choose r directly based on our hardware capacity and running time requirements. For example, suppose we use a standard primal-dual path-following algorithm (see (Nesterov & Nemirovsky, 1994)) to solve the subsampled problems. Then, since each problem needs O(rn) RAM and O(r^{3/2} n^2 log(r/ε)) running time for a (1 + ε)-approximate solution, where r is the sample size, this should dictate the choice of r. Similar considerations apply to the use of the ellipsoid method or IPCPMs.

3.3 A Randomized IPCPM Algorithm

A problem with a vanilla application of the subspace-preserving random sampling algorithm is accuracy: it is very efficient if we only need one or two accurate digits (see (Clarkson et al., 2013) for details), but if we are looking for moderate-precision solutions, e.g., those with ε ≈ 10^{-5}, then we are very quickly limited by the O(1/ε^2) sample size required by Lemma 3. For example, plugging p = 1, n = 10, αβ = 1000, and ε = 10^{-3} into Lemma 3, we get a sample size of approximately 10^{12}, which as a practical matter is certainly intractable for a subsampled problem. In this section, we describe an algorithm with an O(log(1/ε)) dependence, which is thus appropriate for computing moderate-precision solutions. This algorithm is a randomized IPCPM with several features specially designed for MapReduce. In particular, the algorithm takes advantage of the multiple subsampled solutions and of the parallelizability of the MapReduce framework.

As background, recall that IPCPMs are similar to the bisection method but work in a high-dimensional space. An IPCPM requires a polytope S_0 that is known to contain a full-dimensional ball B of desired solutions described by a separation oracle. At step k, a query point x_k ∈ int S_k is sent to the oracle. If the query point is not a desired solution, the oracle returns a half-space K_k which contains B but not x_k, and then we set S_{k+1} = S_k ∩ K_k and continue. If x_k is chosen such that |S_{k+1}| / |S_k| ≤ α for all k, for some α < 1, then the IPCPM converges geometrically. Such a choice of x_k was first given by (Levin, 1965), who used (but did not provide a way to compute) the center of gravity of S_k. (Tarasov et al., 1988) proved that the center of the maximal-volume inscribed ellipsoid also works; (Vaidya, 1996) showed that the volumetric center works, but he didn't give an explicit bound; and (Bertsimas & Vempala, 2004) suggest approximating the center of gravity by random walks, e.g., the hit-and-run algorithm (Lovász, 1999).

Table 2 compares IPCPMs with other iterative methods on ℓ_p regression problems.

Table 2. Iterative algorithms for ℓ_p regression: number of iterations and extra work per iteration.

                                         num. iter.               addl work per iter.
subgradient (Clarkson, 2005)             O(n^4 / ε^2)
gradient (Nesterov, 2009)                O(m^{1/2} log m / ε)
ellipsoid (Grötschel et al., 1981)       O(n^2 log(κ/ε))          O(n^2)
IPCPMs (see text for refs.)              O(n log(κ/ε))            poly(n)
Although they require extra work at each iteration, IPCPMs converge in the fewest number of iterations [8]. For completeness, we will first describe a standard IPCPM approach to ℓ_p regression; and then we will describe the modifications we made to make it work in MapReduce.

[8] It is for this reason that IPCPMs seem to be good candidates for improving subsampled solutions. Previous work assumes that data are in RAM, which means that the extra work per iteration is expensive. Since we consider large-scale distributed environments where data have to be accessed via passes, the number of iterations is the most precious resource, and thus the extra computation at each iteration is relatively inexpensive. Indeed, by using a randomized IPCPM, we will demonstrate that subsampled solutions can be improved in very few passes.

Assume that σ_min,p(A) = 1 and κ_p(A) = poly(n). Let f̂ always denote the best objective value we have obtained. Then for any x ∈ R^n, by convexity,

    g(x)^T x* = f(x) + g(x)^T (x* − x) ≤ f* ≤ f̂.        (6)

This subgradient gives us the separation oracle. Let x_0 be the minimal ℓ_2-norm point in Ω, in which case

    ||Ax_0||_p ≤ κ_p ||x_0||_2 ≤ κ_p ||x*||_2 ≤ κ_p ||Ax*||_p,

and hence x_0 is a κ_p-approximate solution. Moreover,

    ||x* − x_0||_2 ≤ ||x*||_2 ≤ ||Ax*||_p ≤ ||Ax_0||_p,        (7)

which defines the initial polytope S_0.
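To make the oracle concrete: once (f(x), g(x)) is known at a query point x, inequality (6) says that every optimal solution lies in the half-space {y : g(x)^T y ≤ f̂}, while x itself violates this inequality whenever f(x) > f̂. A minimal sketch of this oracle step for p = 1, together with the minimal ℓ_2-norm starting point x_0 = c / (c^T c) in Ω, is given below; the function name is ours.

```python
import numpy as np

def oracle_cut(A, x, f_best):
    """Separation oracle from (6), specialized to p = 1: returns f(x), a
    subgradient g(x), and the cutting half-space {y : g^T y <= f_best}
    encoded as the pair (g, f_best). The cut keeps every optimal solution
    and, whenever f(x) > f_best, cuts off the query point x itself."""
    r = A @ x
    f = np.abs(r).sum()                  # f(x) = ||Ax||_1
    g = A.T @ np.sign(r)                 # g(x) in the subdifferential of f at x
    return f, g, (g, f_best)

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 4))
c = np.array([1.0, 2.0, 0.0, 0.0])
x0 = c / (c @ c)                         # minimal l2-norm point with c^T x0 = 1
f0, g0, cut = oracle_cut(A, x0, f_best=np.abs(A @ x0).sum())
print("f(x0):", f0, "   g(x0)^T x0:", g0 @ x0)   # equal, as noted in Section 2
```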

Algorithm 3 A randomized IPCPM.
Input: A ∈ R^{m×n} with σ_min,p(A) ≥ 1, c ∈ R^n, a set of initial points, a number of iterations M, and N ≥ 1.
Output: An approximate solution x̂.
1: Choose K = O(n).
2: Compute (f(x), g(x)) for each initial point x.
3: Let f̂ = f(x̂) always denote the best objective value obtained so far.
4: for i = 0, ..., M − 1 do
5:   Construct S_i from the known (f, g) pairs and f̂.
6:   Generate random walks in S_i: z_1^{(i)}, z_2^{(i)}, ...
7:   Let x_k^{(i)} = (1/K) Σ_{j=(k−1)K+1}^{kK} z_j^{(i)}, for k = 1, ..., N.
8:   Compute (f(x_k^{(i)}), g(x_k^{(i)})) for each k.
9: end for
10: Return x̂.

Given ε > 0, for any x ∈ B = {x ∈ Ω | ||x − x*||_2 ≤ ε ||Ax_0||_p / κ_p^2}, we have

    ||Ax||_p − ||Ax*||_p ≤ ||A(x − x*)||_p ≤ κ_p ||x − x*||_2 ≤ ε ||Ax_0||_p / κ_p ≤ ε ||Ax*||_p.

So all points in B are (1 + ε)-approximate solutions. The number of iterations to reach a (1 + ε)-approximation is O(log(|S_0| / |B|)) = O(log((κ_p^2 / ε)^n)) = O(n log(n/ε)). This leads to an O((mn^2 + poly(n)) log(n/ε))-time algorithm, which is better than sampling when ε is very small. Note that we will actually apply the IPCPM in a coordinate system defined on Ω, where the mappings from and to the coordinate system of R^n are given by Householder transforms; we omit the details.

Our randomized IPCPM for use on MapReduce, which is given in Algorithm 3, differs from the standard approach just described in two aspects: sampling initialization, and multiple queries per iteration. In both cases, we take important advantage of the peculiar properties of the MapReduce framework.

For the initialization, note that constructing S_0 from x_0 may not be a good choice, since we can only guarantee κ_p = poly(n). Recall, however, that we actually have N subsampled solutions from Algorithm 2, and all of these solutions can be used to construct a better S_0. Thus, we first compute f̂_k = f(x̂_k) and ĝ_k = g(x̂_k) for k = 1, ..., N in a single pass. For each x̂_k, we define a polytope containing x* using (6) and

    ||x* − x̂_k||_2 ≤ ||A(x* − x̂_k)||_p ≤ f* + f̂_k ≤ f̂ + f̂_k.

We then merge all these polytopes to construct S_0, which is described by 2n + N constraints. Note also that it would be hard to use all the available approximate solutions if we chose to iterate with a subgradient or gradient method.

For the iteration, the question is which query point we send at each step. Here, instead of one query, we send multiple queries. Recall that, for a data-intensive job, the dominant cost is the cost of input/output, and hence we want to extract as much information as possible from each pass. Take an example from one of our runs on a 10-node Hadoop cluster: a pass with a single query took 282 seconds, while a pass with 100 queries took only 328 seconds, so the extra 99 queries come almost for free. To generate these multiple queries, we follow the random walk approach proposed by (Bertsimas & Vempala, 2004). The purpose of the random walk is to generate uniformly distributed points in S_k such that we can estimate the center of gravity. Instead of computing one estimate, we compute multiple estimates.

We conclude our discussion of our randomized IPCPM algorithm with a few comments. The online work of computing (f, g) pairs and the offline work of generating random walks can be done partially in parallel: because S_{i+1} ⊆ S_i, we can continue generating random walks in S_i while computing (f, g) pairs, and once we have S_{i+1}, we simply discard the points outside S_{i+1}. Even if we don't have enough points left, it is very likely that we have a warm-start distribution that allows fast mixing.
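Steps 6 and 7 of Algorithm 3 generate approximately uniform points in the current polytope S_i = {x : Sx ≤ t} and average them in groups of K to obtain N query points. The sketch below, a bare-bones hit-and-run walk (Lovász, 1999) followed by the group averaging, is meant only to illustrate that step; the step counts, the interior starting point, and the toy polytope are illustrative assumptions.

```python
import numpy as np

def hit_and_run(S, t, x, steps, rng):
    """One hit-and-run chain in the polytope {x : Sx <= t}, started at an
    interior point x; yields one point per step."""
    for _ in range(steps):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        # feasible segment along x + lam * d: for each row, lam * (Sd)_j <= (t - Sx)_j
        sd, slack = S @ d, t - S @ x
        lo = max(np.max(slack[sd < 0] / sd[sd < 0], initial=-np.inf), -1e12)
        hi = min(np.min(slack[sd > 0] / sd[sd > 0], initial=np.inf), 1e12)
        x = x + rng.uniform(lo, hi) * d
        yield x

def query_points(S, t, x_start, K, N, rng):
    """Steps 6-7 of Algorithm 3: average the walk's points in groups of K
    to get N approximate centroid estimates used as query points."""
    pts = np.array(list(hit_and_run(S, t, x_start, K * N, rng)))
    return pts.reshape(N, K, -1).mean(axis=1)

rng = np.random.default_rng(0)
n = 4
S = np.vstack([np.eye(n), -np.eye(n)])      # the box -1 <= x_i <= 1 as a polytope
t = np.ones(2 * n)
queries = query_points(S, t, np.zeros(n), K=50, N=5, rng=rng)
print(queries)                               # all cluster near the centroid (the origin)
```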
The way we choose query points works well in practice but doesn't guarantee faster convergence. How to choose query points with guaranteed faster convergence is worth further investigation. However, we are not expecting that by sending O(n) queries per step we can reduce the number of iterations to O(log(1/ε)), which may require exponentially many queries. Sending multiple queries makes the number of linear inequalities describing S_k increase rapidly, which is a problem if we have too many iterations. But here we are just looking for, say, fewer than 30 iterations. Otherwise, we can purge redundant or unimportant linear constraints on the fly.

4 Empirical Evaluation

The computations are performed on a Hadoop cluster with 40 CPU cores. We used the ℓ_1 regression test problem from (Clarkson et al., 2013). The problem is of size 5.24e9 × 15, generated in the following way. The true signal x* is a standard Gaussian vector. Each row of the design matrix A is a canonical vector, which means that we only estimate a single entry of x* in each measurement. The number of measurements on the i-th entry of x* is twice as large as that on the (i + 1)-th entry, for i = 1, ..., 14; we have 2.62 billion measurements on the first entry but only 0.16 million measurements on the last. Imbalanced measurements apparently create difficulties for sampling-based algorithms. The response vector b is given by

    b_i = 1000 ε_i   with probability 0.001,
    b_i = a_i^T x* + ε_i   otherwise,        i = 1, ..., m,

where the ε_i are i.i.d. samples drawn from the standard Laplace distribution; that is, 0.1% of the measurements are corrupted to simulate noisy real-world data. Since the problem is separable, we know that an optimal solution is simply given by the median of the responses corresponding to each entry. If we use ℓ_2 regression, the optimal solution is given by the mean values, which is inaccurate due to the corrupted measurements.
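The test problem is easy to reproduce at a reduced scale. The sketch below generates a small version of it (the 2:1 geometric split of measurement counts, Laplace noise, and 0.1% corruptions scaled by 1000) and checks the point just made: per-entry medians (the ℓ_1 solution) recover x* well, while per-entry means (the ℓ_2 solution) are far noisier. The reduced problem size is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15
x_true = rng.standard_normal(n)                    # the true signal x*

# Measurement counts halve from one entry to the next (2:1 ratio), scaled down
# here so the example runs in memory; the paper uses 5.24e9 rows in total.
counts = (2 ** np.arange(n - 1, -1, -1)) * 100     # entry 0 gets the most measurements
rows = np.repeat(np.arange(n), counts)             # each row of A is a canonical vector e_j

noise = rng.laplace(size=rows.size)                # standard Laplace noise
b = x_true[rows] + noise
corrupt = rng.random(rows.size) < 0.001            # 0.1% of the responses are corrupted
b[corrupt] = 1000.0 * noise[corrupt]

# The problem is separable, so l1 regression reduces to per-entry medians
# and l2 regression to per-entry means.
medians = np.array([np.median(b[rows == j]) for j in range(n)])
means = np.array([b[rows == j].mean() for j in range(n)])
print("l1 (median) max error:", np.abs(medians - x_true).max())
print("l2 (mean)   max error:", np.abs(means - x_true).max())
```

Scaled up by several orders of magnitude, this is essentially the data on which the experiments below are run.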

Table 3. The 1st and 3rd quartiles of the relative errors, in the 1-, 2-, and ∞-norms, of 100 independent subsampled solutions (sample size r as chosen in Algorithm 2).

        ||x − x*||_1/||x*||_1    ||x − x*||_2/||x*||_2    ||x − x*||_∞/||x*||_∞
ALG1    [0.0057, ]               [0.0059, ]               [0.0059, ]
CT      [0.008, ]                [0.0090, ]               [0.0113, ]
UNIF    [0.0572, ]               [0.089, 0.166]           [0.129, 0.254]
NOCD    [0.0823, 22.1]           [0.126, 70.8]            [0.193, 134]

We first check the accuracy of the subsampled solutions. We implement Algorithm 1 with block size n^3 (ALG1), which gives 2n^{5/2}-conditioning, and the Cauchy transform (CT) of (Sohler & Woodruff, 2011), which gives asymptotic O(n^{3/2} log^{3/2} n)-conditioning; and then we use Algorithm 2 to compute 100 subsampled solutions in a single pass. We compute ||AE||_1 explicitly prior to sampling for better control of the sample size, and we choose the sample size r in Algorithm 2 directly, as discussed in Section 3.2. We also run Algorithm 2 without conditioning (NOCD) and with uniform sampling (UNIF) for comparison. The 1st and 3rd quartiles of the relative errors in the 1-, 2-, and ∞-norms are shown in Table 3. ALG1 clearly performs the best, achieving roughly 0.01 relative error in all the metrics we use. CT has better asymptotic conditioning quality than ALG1 in theory, but it doesn't generate better solutions in this test; this confirms our concerns about the hidden constant in κ_1 and about the failure probability. UNIF works, but it is about an order of magnitude worse than ALG1. NOCD generates large errors. So neither UNIF nor NOCD is a reliable approach.

Next we try to iteratively improve the subsampled solutions using Algorithm 3. We implement and compare the proposed IPCPM with a standard IPCPM based on random walks, with single-point initialization and a single query per iteration. We set the number of iterations to 30. The running times of the two are approximately the same. Figure 1 shows the convergence behavior in terms of the relative error in objective value. IPCPMs are not monotonically decreasing algorithms. Hence, even though the standard IPCPM begins with a roughly 10-approximate solution, the error climbs to about 10^3 after a few iterations, and the initial guess is not improved within 30 iterations. The sampling initialization helps create a small initial search region; this makes the proposed IPCPM begin at a far more accurate solution, stay below that level, and reach a relative error of 10^{-6} in only 30 iterations. Moreover, it is easy to see that the multiple-query strategy improves the rate of convergence, though the convergence is still at a linear rate.

Figure 1. Relative error in function value, (f(x̂) − f*)/f*, versus the number of iterations, for a standard IPCPM (single-point initialization and a single query per iteration) and the proposed IPCPM (sampling initialization and multiple queries per iteration).
5 Conclusion

We have proposed an algorithm for solving strongly over-determined ℓ_p regression problems, for p ∈ [1, 2), with an emphasis on its theoretical and empirical properties for p = 1. Although some of the building blocks of our algorithm are not better than state-of-the-art algorithms in terms of FLOP counts, we have shown that our algorithm has superior communication properties that permit it to be implemented in MapReduce and applied to terabyte-scale data to obtain a moderate-precision solution in only a few passes. The proposed method can also be extended to solving more general convex problems on MapReduce.

Acknowledgments

Most of the work was done while the first author was at ICME, Stanford University, supported by an NSF DMS grant. The authors would like to thank Suresh Venkatasubramanian for helpful discussion and for bringing to our attention several helpful references.

References

Balcan, M.-F., Blum, A., Fine, S., and Mansour, Y. Distributed learning, communication complexity and privacy. arXiv preprint, 2012.

Bekkerman, R., Bilenko, M., and Langford, J. (eds.). Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2011.

Bertsekas, D. P. and Tsitsiklis, J. N. Some aspects of parallel and distributed iterative algorithms: a survey. Automatica, 27(1):3-21, 1991.

Bertsimas, D. and Vempala, S. Solving convex programs by random walks. Journal of the ACM, 51(4), 2004.

Clarkson, K. L. Subgradient and sampling algorithms for l1 regression. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 2005.

Clarkson, K. L. and Woodruff, D. P. Low rank approximation and regression in input sparsity time. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), 2013.

Clarkson, K. L., Drineas, P., Magdon-Ismail, M., Mahoney, M. W., Meng, X., and Woodruff, D. P. The Fast Cauchy Transform and faster robust linear regression. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013.

Dasgupta, A., Drineas, P., Harb, B., Kumar, R., and Mahoney, M. W. Sampling algorithms and coresets for lp regression. SIAM J. Comput., 38(5), 2009.

Daumé, III, H., Phillips, J. M., Saha, A., and Venkatasubramanian, S. Efficient protocols for distributed classification and optimization. In Proceedings of the 23rd International Conference on Algorithmic Learning Theory, 2012.

Dean, J. and Ghemawat, S. MapReduce: Simplified data processing on large clusters. In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI), 2004.

Feldman, J., Muthukrishnan, S., Sidiropoulos, A., Stein, C., and Svitkina, Z. On distributing symmetric streaming computations. ACM Transactions on Algorithms, 6(4):Article 66, 2010.

Goodrich, M. T. Simulating parallel algorithms in the MapReduce framework with applications to parallel computational geometry. arXiv preprint, 2010.

Grötschel, M., Lovász, L., and Schrijver, A. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2), 1981.

Karloff, H., Suri, S., and Vassilvitskii, S. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010.

Levin, A. Y. On an algorithm for the minimization of convex functions. In Soviet Mathematics Doklady, volume 160, 1965.

Lovász, L. Hit-and-run mixes fast. Math. Prog., 86(3), 1999.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2011.

Mahoney, M. W. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning. NOW Publishers, Boston, 2011. Also available on arXiv.

Meng, X. and Mahoney, M. W. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), 2013.

Mitchell, J. E. Polynomial interior point cutting plane methods. Optimization Methods and Software, 18(5), 2003.

Nesterov, Y. Unconstrained convex minimization in relative scale. Mathematics of Operations Research, 34(1), 2009.

Nesterov, Y. and Nemirovsky, A. Interior Point Polynomial Methods in Convex Programming. SIAM, 1994.

Portnoy, S. and Koenker, R. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science, 12(4), 1997.

Rousseeuw, P. J. and Leroy, A. M. Robust Regression and Outlier Detection. Wiley, 1987.

Sohler, C. and Woodruff, D. P. Subspace embeddings for the l1-norm with applications. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC), ACM, 2011.

Tarasov, S., Khachiyan, L. G., and Erlikh, I. The method of inscribed ellipsoids. In Soviet Mathematics Doklady, volume 37, 1988.
Vaidya, P. M. A new algorithm for minimizing convex functions over convex sets. Math. Prog., 73, 1996.

Zhang, Y., Duchi, J., and Wainwright, M. J. Communication-efficient algorithms for statistical optimization. In Advances in Neural Information Processing Systems: Proceedings of the 2012 Conference, 2012.


More information

United Arab Emirates University College of Sciences Department of Mathematical Sciences HOMEWORK 1 SOLUTION. Section 10.1 Vectors in the Plane

United Arab Emirates University College of Sciences Department of Mathematical Sciences HOMEWORK 1 SOLUTION. Section 10.1 Vectors in the Plane United Arab Emirates University College of Sciences Deartment of Mathematical Sciences HOMEWORK 1 SOLUTION Section 10.1 Vectors in the Plane Calculus II for Engineering MATH 110 SECTION 0 CRN 510 :00 :00

More information

Computational Finance The Martingale Measure and Pricing of Derivatives

Computational Finance The Martingale Measure and Pricing of Derivatives 1 The Martingale Measure 1 Comutational Finance The Martingale Measure and Pricing of Derivatives 1 The Martingale Measure The Martingale measure or the Risk Neutral robabilities are a fundamental concet

More information

High Quality Offset Printing An Evolutionary Approach

High Quality Offset Printing An Evolutionary Approach High Quality Offset Printing An Evolutionary Aroach Ralf Joost Institute of Alied Microelectronics and omuter Engineering University of Rostock Rostock, 18051, Germany +49 381 498 7272 ralf.joost@uni-rostock.de

More information

Alpha Channel Estimation in High Resolution Images and Image Sequences

Alpha Channel Estimation in High Resolution Images and Image Sequences In IEEE Comuter Society Conference on Comuter Vision and Pattern Recognition (CVPR 2001), Volume I, ages 1063 68, auai Hawaii, 11th 13th Dec 2001 Alha Channel Estimation in High Resolution Images and Image

More information

An Associative Memory Readout in ESN for Neural Action Potential Detection

An Associative Memory Readout in ESN for Neural Action Potential Detection g An Associative Memory Readout in ESN for Neural Action Potential Detection Nicolas J. Dedual, Mustafa C. Ozturk, Justin C. Sanchez and José C. Princie Abstract This aer describes how Echo State Networks

More information

NEWSVENDOR PROBLEM WITH PRICING: PROPERTIES, ALGORITHMS, AND SIMULATION

NEWSVENDOR PROBLEM WITH PRICING: PROPERTIES, ALGORITHMS, AND SIMULATION Proceedings of the 2005 Winter Simulation Conference M. E. Kuhl, N. M. Steiger, F. B. rmstrong, and J.. Joines, eds. NEWSVENDOR PROBLEM WITH PRICING: PROPERTIES, LGORITHMS, ND SIMULTION Roger L. Zhan ISE

More information

Evaluating a Web-Based Information System for Managing Master of Science Summer Projects

Evaluating a Web-Based Information System for Managing Master of Science Summer Projects Evaluating a Web-Based Information System for Managing Master of Science Summer Projects Till Rebenich University of Southamton tr08r@ecs.soton.ac.uk Andrew M. Gravell University of Southamton amg@ecs.soton.ac.uk

More information

Compensating Fund Managers for Risk-Adjusted Performance

Compensating Fund Managers for Risk-Adjusted Performance Comensating Fund Managers for Risk-Adjusted Performance Thomas S. Coleman Æquilibrium Investments, Ltd. Laurence B. Siegel The Ford Foundation Journal of Alternative Investments Winter 1999 In contrast

More information

Two-resource stochastic capacity planning employing a Bayesian methodology

Two-resource stochastic capacity planning employing a Bayesian methodology Journal of the Oerational Research Society (23) 54, 1198 128 r 23 Oerational Research Society Ltd. All rights reserved. 16-5682/3 $25. www.algrave-journals.com/jors Two-resource stochastic caacity lanning

More information

Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis

Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis Psychological Methods 004, Vol. 9, No., 164 18 Coyright 004 by the American Psychological Association 108-989X/04/$1.00 DOI: 10.1037/108-989X.9..164 Beyond the F Test: Effect Size Confidence Intervals

More information

Multistage Human Resource Allocation for Software Development by Multiobjective Genetic Algorithm

Multistage Human Resource Allocation for Software Development by Multiobjective Genetic Algorithm The Oen Alied Mathematics Journal, 2008, 2, 95-03 95 Oen Access Multistage Human Resource Allocation for Software Develoment by Multiobjective Genetic Algorithm Feng Wen a,b and Chi-Ming Lin*,a,c a Graduate

More information

COST CALCULATION IN COMPLEX TRANSPORT SYSTEMS

COST CALCULATION IN COMPLEX TRANSPORT SYSTEMS OST ALULATION IN OMLEX TRANSORT SYSTEMS Zoltán BOKOR 1 Introduction Determining the real oeration and service costs is essential if transort systems are to be lanned and controlled effectively. ost information

More information

C-Bus Voltage Calculation

C-Bus Voltage Calculation D E S I G N E R N O T E S C-Bus Voltage Calculation Designer note number: 3-12-1256 Designer: Darren Snodgrass Contact Person: Darren Snodgrass Aroved: Date: Synosis: The guidelines used by installers

More information

Scalable Simple Random Sampling and Stratified Sampling

Scalable Simple Random Sampling and Stratified Sampling Xiangrui Meng LinkedIn Corporation, 2029 Stierlin Court, Mountain View, CA 94043, USA ximeng@linkedin.com Abstract Analyzing data sets of billions of records has now become a regular task in many companies

More information

Design and Development of Decision Making System Using Fuzzy Analytic Hierarchy Process

Design and Development of Decision Making System Using Fuzzy Analytic Hierarchy Process American Journal of Alied Sciences 5 (7): 783787, 2008 ISSN 15469239 2008 Science Publications Design and Develoment of Decision Making System Using Fuzzy Analytic Hierarchy Process Chin Wen Cheong, Lee

More information

A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm

A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm International Journal of Comuter Theory and Engineering, Vol. 7, No. 4, August 2015 A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm Xin Lu and Zhuanzhuan Zhang Abstract This

More information

Software Cognitive Complexity Measure Based on Scope of Variables

Software Cognitive Complexity Measure Based on Scope of Variables Software Cognitive Comlexity Measure Based on Scoe of Variables Kwangmyong Rim and Yonghua Choe Faculty of Mathematics, Kim Il Sung University, D.P.R.K mathchoeyh@yahoo.com Abstract In this aer, we define

More information

A Study of Active Queue Management for Congestion Control

A Study of Active Queue Management for Congestion Control In IEEE INFOCOM 2 A Study of Active Queue Management for Congestion Control Victor Firoiu Marty Borden 1 vfiroiu@nortelnetworks.com mborden@tollbridgetech.com Nortel Networks TollBridge Technologies 6

More information

CSI:FLORIDA. Section 4.4: Logistic Regression

CSI:FLORIDA. Section 4.4: Logistic Regression SI:FLORIDA Section 4.4: Logistic Regression SI:FLORIDA Reisit Masked lass Problem.5.5 2 -.5 - -.5 -.5 - -.5.5.5 We can generalize this roblem to two class roblem as well! SI:FLORIDA Reisit Masked lass

More information

The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree

The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree The Priority R-Tree: A Practically Efficient and Worst-Case Otimal R-Tree Lars Arge Deartment of Comuter Science Duke University, ox 90129 Durham, NC 27708-0129 USA large@cs.duke.edu Mark de erg Deartment

More information

Effect Sizes Based on Means

Effect Sizes Based on Means CHAPTER 4 Effect Sizes Based on Means Introduction Raw (unstardized) mean difference D Stardized mean difference, d g Resonse ratios INTRODUCTION When the studies reort means stard deviations, the referred

More information

Managing specific risk in property portfolios

Managing specific risk in property portfolios Managing secific risk in roerty ortfolios Andrew Baum, PhD University of Reading, UK Peter Struemell OPC, London, UK Contact author: Andrew Baum Deartment of Real Estate and Planning University of Reading

More information

NAVAL POSTGRADUATE SCHOOL THESIS

NAVAL POSTGRADUATE SCHOOL THESIS NAVAL POSTGRADUATE SCHOOL MONTEREY CALIFORNIA THESIS SYMMETRICAL RESIDUE-TO-BINARY CONVERSION ALGORITHM PIPELINED FPGA IMPLEMENTATION AND TESTING LOGIC FOR USE IN HIGH-SPEED FOLDING DIGITIZERS by Ross

More information

The Economics of the Cloud: Price Competition and Congestion

The Economics of the Cloud: Price Competition and Congestion Submitted to Oerations Research manuscrit The Economics of the Cloud: Price Cometition and Congestion Jonatha Anselmi Basque Center for Alied Mathematics, jonatha.anselmi@gmail.com Danilo Ardagna Di. di

More information

Simulink Implementation of a CDMA Smart Antenna System

Simulink Implementation of a CDMA Smart Antenna System Simulink Imlementation of a CDMA Smart Antenna System MOSTAFA HEFNAWI Deartment of Electrical and Comuter Engineering Royal Military College of Canada Kingston, Ontario, K7K 7B4 CANADA Abstract: - The

More information

SQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWO-DIMENSIONAL GRID

SQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWO-DIMENSIONAL GRID International Journal of Comuter Science & Information Technology (IJCSIT) Vol 6, No 4, August 014 SQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWO-DIMENSIONAL GRID

More information

Factoring Variations in Natural Images with Deep Gaussian Mixture Models

Factoring Variations in Natural Images with Deep Gaussian Mixture Models Factoring Variations in Natural Images with Dee Gaussian Mixture Models Aäron van den Oord, Benjamin Schrauwen Electronics and Information Systems deartment (ELIS), Ghent University {aaron.vandenoord,

More information

Design of A Knowledge Based Trouble Call System with Colored Petri Net Models

Design of A Knowledge Based Trouble Call System with Colored Petri Net Models 2005 IEEE/PES Transmission and Distribution Conference & Exhibition: Asia and Pacific Dalian, China Design of A Knowledge Based Trouble Call System with Colored Petri Net Models Hui-Jen Chuang, Chia-Hung

More information

17609: Continuous Data Protection Transforms the Game

17609: Continuous Data Protection Transforms the Game 17609: Continuous Data Protection Transforms the Game Wednesday, August 12, 2015: 8:30 AM-9:30 AM Southern Hemishere 5 (Walt Disney World Dolhin) Tony Negro - EMC Rebecca Levesque 21 st Century Software

More information

Minimizing the Communication Cost for Continuous Skyline Maintenance

Minimizing the Communication Cost for Continuous Skyline Maintenance Minimizing the Communication Cost for Continuous Skyline Maintenance Zhenjie Zhang, Reynold Cheng, Dimitris Paadias, Anthony K.H. Tung School of Comuting National University of Singaore {zhenjie,atung}@com.nus.edu.sg

More information

The fast Fourier transform method for the valuation of European style options in-the-money (ITM), at-the-money (ATM) and out-of-the-money (OTM)

The fast Fourier transform method for the valuation of European style options in-the-money (ITM), at-the-money (ATM) and out-of-the-money (OTM) Comutational and Alied Mathematics Journal 15; 1(1: 1-6 Published online January, 15 (htt://www.aascit.org/ournal/cam he fast Fourier transform method for the valuation of Euroean style otions in-the-money

More information

Provable Ownership of File in De-duplication Cloud Storage

Provable Ownership of File in De-duplication Cloud Storage 1 Provable Ownershi of File in De-dulication Cloud Storage Chao Yang, Jian Ren and Jianfeng Ma School of CS, Xidian University Xi an, Shaanxi, 710071. Email: {chaoyang, jfma}@mail.xidian.edu.cn Deartment

More information

Fluent Software Training TRN-99-003. Solver Settings. Fluent Inc. 2/23/01

Fluent Software Training TRN-99-003. Solver Settings. Fluent Inc. 2/23/01 Solver Settings E1 Using the Solver Setting Solver Parameters Convergence Definition Monitoring Stability Accelerating Convergence Accuracy Grid Indeendence Adation Aendix: Background Finite Volume Method

More information

Probabilistic models for mechanical properties of prestressing strands

Probabilistic models for mechanical properties of prestressing strands Probabilistic models for mechanical roerties of restressing strands Luciano Jacinto a, Manuel Pia b, Luís Neves c, Luís Oliveira Santos b a Instituto Suerior de Engenharia de Lisboa, Rua Conselheiro Emídio

More information

Dynamic Load Balance for Approximate Parallel Simulations with Consistent Hashing

Dynamic Load Balance for Approximate Parallel Simulations with Consistent Hashing Dynamic Load Balance for Aroximate Parallel Simulations with Consistent Hashing Roberto Solar Yahoo! Labs Santiago, Chile rsolar@yahoo-inc.com Veronica Gil-Costa Universidad Nacional de San Luis, Argentina

More information

Asymmetric Information, Transaction Cost, and. Externalities in Competitive Insurance Markets *

Asymmetric Information, Transaction Cost, and. Externalities in Competitive Insurance Markets * Asymmetric Information, Transaction Cost, and Externalities in Cometitive Insurance Markets * Jerry W. iu Deartment of Finance, University of Notre Dame, Notre Dame, IN 46556-5646 wliu@nd.edu Mark J. Browne

More information

Introduction to NP-Completeness Written and copyright c by Jie Wang 1

Introduction to NP-Completeness Written and copyright c by Jie Wang 1 91.502 Foundations of Comuter Science 1 Introduction to Written and coyright c by Jie Wang 1 We use time-bounded (deterministic and nondeterministic) Turing machines to study comutational comlexity of

More information

Modeling and Simulation of an Incremental Encoder Used in Electrical Drives

Modeling and Simulation of an Incremental Encoder Used in Electrical Drives 10 th International Symosium of Hungarian Researchers on Comutational Intelligence and Informatics Modeling and Simulation of an Incremental Encoder Used in Electrical Drives János Jób Incze, Csaba Szabó,

More information

IMPROVING NAIVE BAYESIAN SPAM FILTERING

IMPROVING NAIVE BAYESIAN SPAM FILTERING Master Thesis IMPROVING NAIVE BAYESIAN SPAM FILTERING Jon Kågström Mid Sweden University Deartment for Information Technology and Media Sring 005 Abstract Sam or unsolicited e-mail has become a major roblem

More information

Penalty Interest Rates, Universal Default, and the Common Pool Problem of Credit Card Debt

Penalty Interest Rates, Universal Default, and the Common Pool Problem of Credit Card Debt Penalty Interest Rates, Universal Default, and the Common Pool Problem of Credit Card Debt Lawrence M. Ausubel and Amanda E. Dawsey * February 2009 Preliminary and Incomlete Introduction It is now reasonably

More information