A polynomial time algorithm for computing the area under a GDT curve
|
|
|
- Homer Leonard
- 10 years ago
- Views:
Transcription
1 DOI /s RESEARCH Open Access A polynomial time algorithm for computing the area under a GDT curve Aleksandar Poleksic * Abstract Background: Progress in the field of protein three-dimensional structure prediction depends on the development of new and improved algorithms for measuring the quality of protein models. Perhaps the best descriptor of the quality of a protein model is the GDT function that maps each distance cutoff θ to the number of atoms in the protein model that can be fit under the distance θ from the corresponding atoms in the experimentally determined structure. It has long been known that the area under the graph of this function (GDT_A) can serve as a reliable, single numerical measure of the model quality. Unfortunately, while the well-known GDT_TS metric provides a crude approximation of GDT_A, no algorithm currently exists that is capable of computing accurate estimates of GDT_A. Methods: We prove that GDT_A is well defined and that it can be approximated by the Riemann sums, using available methods for computing accurate (near-optimal) GDT function values. Results: In contrast to the GDT_TS metric, GDT_A is neither insensitive to large nor oversensitive to small changes in model s coordinates. Moreover, the problem of computing GDT_A is tractable. More specifically, GDT_A can be computed in cubic asymptotic time in the size of the protein model. Conclusions: This paper presents the first algorithm capable of computing the near-optimal estimates of the area under the GDT function for a protein model. We believe that the techniques implemented in our algorithm will pave ways for the development of more practical and reliable procedures for estimating 3D model quality. Keywords: Protein structure, Structure modeling, Structure prediction, Model quality Background Advances in the area of protein three-dimensional structure prediction depend on the ability to accurately measure the quality of a protein model. One of the most popular and most reliable measure of the protein model quality is GDT_TS. It is defined as the average value of GDT_P θ computed for four distance cutoffs θ = 2 i, i = 0, 3, where GDT_P θ is the percentage of model residues (represented by their C α atoms) that can be placed under θ ångströms from the corresponding residues in the experimental structure [1, 2]. In a high-accuracy version of GDT_TS, denoted by GDT_HA, the distance cutoffs are cut in half (θ = 2 i, i = 1, 2) [3]. In both approaches, the underlying assumption is that the experimental (crystallographic or NMR) structure is close to *Correspondence: [email protected] Department of Computer Science, University of Northern Iowa, 305 ITTC, Cedar Falls, Iowa 50613, USA the real (native) structure (which is sometimes not true due to experimental errors). Several methods exist for computing GDT_TS. The LGA algorithm [4] can estimate GDT_TS quickly, but those estimates deviate from the true GDT_TS values in about 10 % of the cases [5]. Rigorous algorithms for computing GDT_TS have also been developed [6 9], but they are computationally much more expensive. The GDT_TS is commonly interpreted as an approximation of the area under the GDT curve, denoted by GDT_A [10 12]. Unfortunately, since the measure is approximated using the GDT function values at only several distance cutoffs, the errors in the area approximation are large. As we demonstrate later, GDT_TS is not only overly sensitive to small but also insensitive to large changes in the protein model s coordinates. In this paper, we present a polynomial time algorithm for computing GDT_A. Our method runs on the order 2015 Poleksic. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( zero/1.0/) applies to the data made available in this article, unless otherwise stated.
2 Page 2 of 8 Õ(n 3 ), where n represents the length of the protein model (and Õ hides the log factor). The algorithm returns nearoptimal GDT_A scores, meaning that the errors in our estimates can be made arbitrary small i.e., smaller than any upfront specified vale. Although our method is theoretical, we believe that its parallel implementations, coupled with carefully designed speed up techniques, can result in a practical and widely used software tool. The rest of this paper is structured as follows. First, we present three examples that illustrate drawbacks of GDT_TS and advantages of GDT_A. Then, we place our theory on a firm mathematical ground, which enables us to formally define the GDT_A computation problem. Finally, we describe the actual algorithm for GDT_A and provide its running time analysis. Methods Definition of the GDT function The GDT function is a mapping that relates each distance cutoff θ to the percentage of model residues that can be placed at distance θ from the corresponding residues in the experimentally determined structure. The graph of a GDT function provides a valuable insight into the quality of a protein model (Fig. 1). More specifically, the closer the graph runs to the horizontal axis (in other words, the smaller the area under the graph), the better the model. As a single numerical measure of the model quality, GDT_TS is extensively used at CASP to rank different models for the same target [13, 14]. Since it represents the average of GDT_P θ at several distance cutoffs, GDT_TS is often viewed as an approximation of the area under the GDT curve (GDT_A) [10 12]: GDT_TS = 3 GDT_P 2 i. i=0 However, as we demonstrate below, such a sparse sampling of the values of GDT function compromises the reliability of GDT_TS. In our first example, we analyze the protein model for the target T0482, submitted by the group TS208 at CASP8 (Fig. 1). The GDT_TS score of this particular model was not even among the best dozen at CASP8, despite the fact that it fits the largest number of residues at distance 4 from the corresponding residues in the experimental structure. In fact, the blue model (Fig. 1a) can be superimposed onto the experimental structure so that all of its residues are at distance 8 from the residues in the experimental structure (Fig. 1b), while no such superposition exists for any other model, even for the distance cutoff of 10Å. Interestingly, according to the MAMMOTH algorithm [15], the blue model is the best model for this particular target, while the DALI [16] algorithm ranks it as the second best. Although it is impossible to tell whether #13 GDT_TS rank is more or less fair than #1 and #2 rank assigned by MAMMOTH and DALI, respectively, it is also not difficult to see that the ranking by the area under the GDT plot (GDT_A) would serve as a good compromise between these extremes. The next two examples illustrate further disadvantages of GDT_TS. As seen in Fig. 2, better GDT_TS scores can be assigned to obviously worse models. Moreover, as demonstrated in Fig. 3, very similar models can have significantly different GDT_TS scores. (1) Fig. 1 CASP 8 example. a Red lines represent GDT plots of different 3D models of the target protein T0482 submitted during the CASP8 experiment. The model submitted by the group TS208 is represented in blue. b Structural alignment of the model submitted by the group TS208 (blue) and the experimental structure (red) over 67 residues evaluated by the CASP assessors
3 Page 3 of 8 Fig. 2 Insensitivity of GDT_TS. This theoretical example shows no sensitivity of GDT_TS to large variations in model quality. Surprisingly, the red model has a better GDT_TS score than the better blue model, even though it is worse by all standards. Notice that, unlike GDT_TS, the GDT_A measure is not skewed by the values at the cutoff points 1, 2, 4 and 8 Å. In fact, the GDT_A score of the blue model is twice as good as that of the red model Mathematical formalism Strictly speaking, the GDT function is not well-defined. Zooming into the plot of the model highlighted in Fig. 1a, we see a set of many small vertical segments, meaning that each point on the horizontal axis is mapped to zero or more points on the vertical axis (Fig. 4). On the other hand, the inverse function (mapping each distance cutoffs θ to the percentage of residues in the model structure that can be fit under the distance θ from the corresponding residues in the experimental structure) is obviously well defined. This allows us to define the area under the GDT plot as the complement of the area under the inverse function: GDT_A = Total_Area GDT_A where Total_Area represents the area of the rectangular region under consideration (100 10). We start our mathematical formalism by first defining a protein structure. (2) Definition 1 A protein structure a is a sequence of points in the three dimensional Euclidean space R 3 a = (a 1,..., a n ). The sequence elements a i can represent individual atoms, but it is more typical (in particular in protein structure prediction experiments) to assume that each point a i corresponds to the alpha-carbon atom of the protein s ith amino acid. In what follows, we formally define the GDT function [17]. For simplicity of presentation, we will modify the codomain of GDT to represent the fraction of residues (ranging from 0 to 1) instead of percentages of residues (ranging from 0 to 100). We note that this simple rescaling of the ordinate values will have no effects on the results obtained in our study. Definition 2 Let a = (a 1,..., a n ) be a protein structure consisting of n amino acids, let b = (b 1,..., b n ) be a 3D model of a, and let θ be a positive constant. The Hubbard function (or GDT function) is the function H b :[0, θ] (0, 1], defined by H b (θ) = max τ {i a i τ(b i ) θ} /n, where denotes the Euclidean norm on R 3 and τ is a rigid transformation (a composition of a rotation and a translation). Theorem 1 H b is a stepwise function with finitely many steps θ 1,..., θ k, 1 k n 1. Proof Since H b is monotony non-decreasing and since the range of H b is a finite subset of (0,1], it follows that H b must be a stepwise function. To complete the proof, we note that the number of steps in H b matches the size of its range, which does not exceed n 1, where n is the length of b. For simplicity of presentation, from now on (and whenever the model b is implied), we will omit the subscript in H b and denote the Hubbard function only by H. (3) Fig. 3 Oversensitivity of GDT_TS. a A four helix bundle-like (toy) protein (dashed grey line) along with two of its, almost identical, models (red and blue). A realistic example of such a target protein (PDB ID: 1JM0A) is shown on the right (b). In this example, we assume that the protein and its models are extended to the right to include 100 or more residues. Note that, if d {1 Å, 2 Å, 4 Å, 8 Å} then the GDT score of the blue model is significantly higher than that of the red model. For instance, if d = 2 Å, then the blue model has the GDT_TS score of about 87.5 since ~50 % all of its residues can be fit at distance 1 Å and 100 % under each distance 2, 4 and 8 Å from the corresponding residues in the experimental structure (dashed grey). On the other hand, the GDT_TS score of the red model is only about 75, since only ~50 % of the red model s residues can be placed under 1 and 2 Å and 100 % under 4 and 8 Å. In fact, no matter how close the red model gets to the blue model, its GDT_TS score will never improve. Note also that the blue and red models have almost identical GDT_A scores, since GDT_A is not sensitive to small coordinate changes
4 Page 4 of 8 Fig. 4 A closer look at the GDT_TS function. Zooming into the GDT plot of the model highlighted in Fig. 1. What appears to be the graph of a continuous function is, in fact, a set consisting of many separated vertical line segments Algorithm for GDT_A The area under H is the sum of the areas of the rectangular regions (θ i )(θ i θ i 1 ): Area = H(θ i )(θ i θ i 1 ), where θ 0 = 0 and θ k+1 = θ (Fig. 5). It would be trivial to compute Area had we known all θ i and all function values H(θ i ). Unfortunately, even if we knew the step points θ i, it would be computationally very difficult to compute the function values at them, since the best to date algorithm for computing H(θ i ) runs on the order of O(n 7 ) [7]. Hence, we resort to using the Riemann sums to approximate (instead of to compute exactly) the area under the graph of H. The following definition and an accompanying theorem can be found in virtually any mathematical analysis textbook. Definition 3 If f : [a, b] R is a function then R = n v i (x i x i 1 ), where a = x 0 < x 1 < < x n = b k+1 is the partition of the interval [a, b] and v i denotes the supremum of f over [x i 1, x i ], is called the upper Riemann sum of f on [a, b]. Theorem 2 Let f be a real, non-decreasing, Riemann integrable function on an interval [a, b]. Then b a f (x)dx R < x( f (b) f (a) ), (4) (5) Fig. 5 The general shape of the Hubbard function. Notice that the values θ i along with the function values H b (θ i ), i = 1, k, uniquely determine the area under the graph of H b. At the biannual CASP experiment, θ = 10 where R = n v i (x i x i 1 ) is the upper Riemann sum of f the and x = max i (x i x i 1 ). Observe that, since H b is piecewise continuous, it must be integrable on [0, θ]. Thus, the area under the graph of H is Area = θ 0 H(θ)dθ. To approximate Area with a Riemann sum, one can define the partition points ǫ,2ǫ,..., mǫ, where m = θ/ǫ (Fig. 6) and then compute an estimate Area(ǫ ) of Area as (6) (7)
5 Page 5 of 8 Fig. 6 Approximation of the Hubbard function by the Riemann sum. Area(ǫ) is the sum of the areas of all rectangular regions Area(ǫ) = m ǫh(iǫ) The error Area Area (ǫ) in the estimate (8) is below 2ǫ. Up to a half of this error is due to the error in the Riemann sum with the remaining error being due to the possible placement of the last partition point mǫ outside the interval [0, θ]. Unfortunately, computing the area estimates according to (8) is still a challenging problem, because (as we mentioned above), there is no computationally effective procedure for finding the function values H(iǫ). To circumvent the problem, we utilize an efficient algorithm capable of computing the lower bound estimates H i of H(iǫ), satisfying H((i 1)ǫ) H i H(iǫ), i = 1, m. We then compute an estimate Ãrea(ǫ) of Area as m Ãrea(ǫ) = ǫh i. (9) Since Ãrea(ǫ) Area(ǫ) < 2ǫ, it follows that Ãrea(ǫ) is a 4ǫ-approximation of Area. Below we show how to compute all H i s, and, in turn, Ãrea(ǫ) in time O ( n 3 logn/ǫ 6), where n is the length of b. Our algorithm takes advantage of an efficient procedure for computing near optimal GDT_TS values [5]. Let T(b) denotes the image of the model structure b under the transformation T. Denote by MAX(T, θ) the largest fraction of residues from T(b) that are at distance θ from the corresponding residues in the experimental structure a. To find each H i, it is enough to compute a rigid body transformation T i satisfying H((i 1)ǫ) MAX(T i, iǫ) H(iǫ). Denote by T θ a transformation that places a largest subset b θ of residues from b at distance θ from the corresponding residues in the experimental structure. Given T θ, one can easily compute b θ by calculating all n distances between the residues a i and T θ (b i ). Note that (8) P(T θ, θ) = H(θ). We approximate the transformation T θ by a so-called near-optimal transformation i.e., a transformation that places at least as many residues from the model structure under distance θ + ǫ as the optimal transformation T θ places under the distance θ. From now on, we will use Tθ ǫ to denote a near-optimal transformation and the corresponding set of residues will be denoted by bθ ǫ. Observe that P( Tθ ǫ, θ + ǫ) P(T θ, θ) = H(θ). Building upon any procedure for computing Tθ ǫ, one can ( develop ) an algorithm for Ãrea(ǫ) by substituting P Tθ ǫ i, θ i +ǫ for H i in (10), where θ i = (i 1)ǫ. Several existing methods can be modified and made suitable for finding Tθ ǫ. The most efficient such method relies on the concept of radial pair [5]. Definition 4 Let S = {s 1,..., s n } be a set of points in the three-dimensional Euclidean space. An ordered pair of points (s i, s j ) is called a radial pair of S if s j is the furthest point from s i among all points in S. Theorem 3 Let T 1 and T 2 be two transformations and let (s k, s l ) be a radial pair of S. If T 1 (s k ) T 2 (s k ) < ǫ/3 and T 1 (s l ) T 2 (s l ) < ǫ/3 then there exists a rotation R around the line through T 1 (s k ) and T 1 (s l ) such that R ( ( )) T 1 sp T2 (s p ) <ǫ, for any s p in S. The rotation R can be found in time O (nlogn), where n is the size of S. A proof of the above theorem can be found in [5]. The algorithm for finding R is fairly straightforward and it relies on the so-called plane-sweep approach [18]. The Theorem 3 implies that one choice for the nearoptimal transformation Tθ ǫ is the transformation R T, where T is any transformation that maps the points b k and b l from the radial pair (b k, b l ) of b θ to the ǫ/3 neighborhoods of T θ (b k ) and T θ (b l ), respectively, and R is the rotation around the radial axis T(b k )T(b l ) that maps the remaining points from T(b θ ) to the ǫ-neighborhoods of the corresponding points from T θ (b θ ). In search for a radial pair of b θ, the algorithm in [5] explores all n 2 possible pairs of residues in b. For each candidate radial pair (b k, b l ), the algorithm generates a finite, representative set of transformations that map b k and b l into θ + ǫ/3 neighborhoods of a k and a l, respectively (see the paragraph below for more details). For every such transformation T, a plane-sweep algorithm [18] is used to find a rotation R around the axis T(b k )T(b l ) that maximizes the number of residues from R(T(b)) that can be placed at distance < θ+ ǫ from the corresponding residues in a.
6 Page 6 of 8 Fig. 7 Discretizing the space of rigid body transformations. 2D illustration of A k (the set of the vertices of the squares shown on the left) and the set A l (a k ) generated for one a k A k A finite set of transformations that map the residues b k and b l into the θ + ǫ/3 neighborhoods of a k and a l, respectively, is constructed in such a way to ensure that for at least one of those transformation T, T(b k ) T θ (b k ) < ǫ/3 and T(b l ) T θ (b l ) < ǫ/3. This can be achieved by partitioning R 3 into small cubes of side length slightly smaller than 3ǫ/9 and then collecting the vertices of the cubes that are inside the open ball of radius θ + ǫ/6 around a k (Fig. 7). The elements of this set, denoted by A k, are the candidate points T(b k ). The number of points in A k is O(1/ǫ 3 ) and at least one of them must be at distance < ǫ/6 from T θ (b k ) (Fig. 7). For each point a k A k, the set A l (a k ) of possible images of b l under T is computed by discretizing the spherical cap S(a k, b k b l ) B(a l, θ + ǫ/3), where S(a, r) and B(a, r) denote the sphere and the open ball in R 3 with center a and radius r, respectively, in such a way that at least one point from A l (a k ) is found at distance < ǫ/3 from T θ (b l ) (Fig. 7). We note that size of A l (a k ) is O(1/ǫ 2 ). Hence, the total number of candidate pairs of points (T(b k ), T(b l )) is O(1/ǫ 5 ). An obvious to compute Tθ ǫ 1,..., Tθ ǫ m is to run the just described algorithm m times in succession, for θ = θ 1,..., θ = θ m. However, such an approach results in many unnecessary repeated calculations as the area around a k and the corresponding spherical cap in the neighborhoods of a l are discretized over and over again. Moreover, all transformations T and R, generated and inspected during the procedure for finding Tθ ǫ i, are inspected again during the procedure for finding Tθ ǫ j, for each j > i.
7 Page 7 of 8 We show that all transformations T ǫ θ 1,..., T ǫ θ m and the corresponding values H 1,, H m can be computed, at once, during the procedure of finding the last transformation, namely T ǫ θ m. As demonstrated in the pseudocode above, the transformation T is generated only once for each pair of points ( a k, a l) A k A l ( a k ) and a sweepplane algorithm for finding R is called only once for each i satisfying a k a k <θ i + ǫ/6 and a l a l <θ i + ǫ/3. The values of H i are updated on the fly. Running time analysis To analyze the algorithm s running time, we note that the number of iterations of the first for loop is equal to the number of candidate radial pairs (b k, b l ), which is O ( n 2). The number of iterations of the second for loop matches the number of pairs of grid points around a k and a l, which is O ( 1/ǫ 3) O ( 1/ǫ 2) = O ( 1/ǫ 5). Each one of O(m) = O( θ/ǫ ) = O(1/ǫ) iterations of the third for loop calls a O(nlogn) plane-sweep procedure to compute an optimal rotation and (if needed) to update the value H i. Hence, the asymptotic time complexity of the three nested for loops is O ( n 3 logn/ǫ 6). Conclusions Estimating the quality of a protein 3D model is a challenging task. Automatically generated GDT_TS score is helpful as the first raw approximation but this measure is neither sensitive nor selective enough to be exclusively relied upon in ranking different models for the same target. In this paper, we show that using a more accurate approximation of the area under the GDT curve as the criterion of model quality addresses many of the drawbacks of GDT_TS. We also present a rigorous Õ ( n 3) algorithm for computing the area under the GDT curve for a given model, where n is the model s length. The area estimate returned by our method is near-optimal, meaning that the error in the estimate can be made smaller than any upfront specified value.
8 Page 8 of 8 Despite the cubic asymptotic running time with a relatively large hidden constant, we believe that the techniques presented in this paper can guide a future development of a computationally efficient computer program, in particular since our methodology is amenable to parallel implementations. A heuristic version of the algorithm for estimating the area under the GDT plot can be found at Acknowledgements This project was supported by the University of Northern Iowa Professional Development Award. The structure alignment figures were prepared in Jmol ( Competing interests The author declares that there is no competing interests regarding the publication of this article. Received: 7 March 2015 Accepted: 9 October 2015 References 1. Zemla A, Venclovas C, Moult J, Fidelis K. Processing and analysis of CASP3 protein structure predictions. Proteins. 1999;37(S3): Zemla A, Venclovas C, Moult J, Fidelis K. Processing and evaluation of predictions in CASP4. Proteins. 2001;45(S5): Read RJ, Chavali G. Assessment of CASP7 predictions in the high accuracy template-based modeling category. Proteins. 2007;69(S8): Zemla A. LGA a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31: Li SC, Bu D, Xu J, Li M. Finding nearly optimal GDT scores. J Comput Biol. 2011;18(5): Li SC, Ng YK. On protein structure alignment under distance constraint. Theor Comput Sci. 2011;412: Choi V, Goyal N. A combinatorial shape matching algorithm for rigid protein docking. CPM Lecture Notes Comput Sci. 2004;3109: Akutsu T. Protein structure alignment using dynamic programming and iterative improvement. IEICE Trans Ins Syst. 1995; E79-D(12): Poleksic A. Improved algorithms for matching r-separated sets with applications to protein structure alignment. IEEE/ACM Trans Comput Biol Bioinform. 2013;10(1): Kryshtafovych A, Fidelis K, Moult J. CASP8 results in context of previous experiments. Proteins. 2009;77(9): Tramontano A, Cozzetto D, Giorgetti A, Raimondo D. The assessment of methods for protein structure prediction. Methods Mol Biol. 2008;413: Kryshtafovych A, Milostan M, Szajkowski L, Daniluk P, Fidelis K. CASP6 data processing and automatic evaluation at the protein structure prediction center. Proteins. 2005;61(S7): Huang YJ, Mao B, Aramini JM, Montelione GT. Assessment of template-based protein structure predictions in CASP10. Proteins. 2014;82(S2): Tai CH, Bai H, Taylor TJ, Lee B. Assessment of template-free modeling in CASP10 and ROLL. Proteins. 2014;82(S2): Ortiz AR, Strauss CE, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. 2002;11: Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993;233: Pevsner J. Bioinformatics and functional genomics. 2nd edn. Wiley-Blackwell; Alt H, Mehlhorn K, Wagener H, Welzl E. Congruence, similarity, and symmetries of geometric objects. Dicrete Comput Geom. 1988;3: Submit your next manuscript to BioMed Central and take full advantage of: Convenient online submission Thorough peer review No space constraints or color figure charges Immediate publication on acceptance Inclusion in PubMed, CAS, Scopus and Google Scholar Research which is freely available for redistribution Submit your manuscript at
Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.
Mathematics Review for MS Finance Students
Mathematics Review for MS Finance Students Anthony M. Marino Department of Finance and Business Economics Marshall School of Business Lecture 1: Introductory Material Sets The Real Number System Functions,
Basic Concepts of Point Set Topology Notes for OU course Math 4853 Spring 2011
Basic Concepts of Point Set Topology Notes for OU course Math 4853 Spring 2011 A. Miller 1. Introduction. The definitions of metric space and topological space were developed in the early 1900 s, largely
Metric Spaces. Chapter 7. 7.1. Metrics
Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some
Metric Spaces. Chapter 1
Chapter 1 Metric Spaces Many of the arguments you have seen in several variable calculus are almost identical to the corresponding arguments in one variable calculus, especially arguments concerning convergence
NEW MEXICO Grade 6 MATHEMATICS STANDARDS
PROCESS STANDARDS To help New Mexico students achieve the Content Standards enumerated below, teachers are encouraged to base instruction on the following Process Standards: Problem Solving Build new mathematical
Rotation Rate of a Trajectory of an Algebraic Vector Field Around an Algebraic Curve
QUALITATIVE THEORY OF DYAMICAL SYSTEMS 2, 61 66 (2001) ARTICLE O. 11 Rotation Rate of a Trajectory of an Algebraic Vector Field Around an Algebraic Curve Alexei Grigoriev Department of Mathematics, The
Consensus alignment server for reliable comparative modeling with distant templates
W50 W54 Nucleic Acids Research, 2004, Vol. 32, Web Server issue DOI: 10.1093/nar/gkh456 Consensus alignment server for reliable comparative modeling with distant templates Jahnavi C. Prasad 1, Sandor Vajda
Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks
Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks Welcome to Thinkwell s Homeschool Precalculus! We re thrilled that you ve decided to make us part of your homeschool curriculum. This lesson
The degree of a polynomial function is equal to the highest exponent found on the independent variables.
DETAILED SOLUTIONS AND CONCEPTS - POLYNOMIAL FUNCTIONS Prepared by Ingrid Stewart, Ph.D., College of Southern Nevada Please Send Questions and Comments to [email protected]. Thank you! PLEASE NOTE
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
1 if 1 x 0 1 if 0 x 1
Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or
Section 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
Numerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
Applied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
Lecture 19: Proteins, Primary Struture
CPS260/BGT204.1 Algorithms in Computational Biology November 04, 2003 Lecture 19: Proteins, Primary Struture Lecturer: Pankaj K. Agarwal Scribe: Qiuhua Liu 19.1 The Building Blocks of Protein [1] Proteins
Why? A central concept in Computer Science. Algorithms are ubiquitous.
Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online
Estimating the Average Value of a Function
Estimating the Average Value of a Function Problem: Determine the average value of the function f(x) over the interval [a, b]. Strategy: Choose sample points a = x 0 < x 1 < x 2 < < x n 1 < x n = b and
A REMARK ON ALMOST MOORE DIGRAPHS OF DEGREE THREE. 1. Introduction and Preliminaries
Acta Math. Univ. Comenianae Vol. LXVI, 2(1997), pp. 285 291 285 A REMARK ON ALMOST MOORE DIGRAPHS OF DEGREE THREE E. T. BASKORO, M. MILLER and J. ŠIRÁŇ Abstract. It is well known that Moore digraphs do
How To Understand And Solve Algebraic Equations
College Algebra Course Text Barnett, Raymond A., Michael R. Ziegler, and Karl E. Byleen. College Algebra, 8th edition, McGraw-Hill, 2008, ISBN: 978-0-07-286738-1 Course Description This course provides
An Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. [email protected],
The Open University s repository of research publications and other research outputs
Open Research Online The Open University s repository of research publications and other research outputs The degree-diameter problem for circulant graphs of degree 8 and 9 Journal Article How to cite:
Section 3.7. Rolle s Theorem and the Mean Value Theorem. Difference Equations to Differential Equations
Difference Equations to Differential Equations Section.7 Rolle s Theorem and the Mean Value Theorem The two theorems which are at the heart of this section draw connections between the instantaneous rate
Nonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
Binary Image Reconstruction
A network flow algorithm for reconstructing binary images from discrete X-rays Kees Joost Batenburg Leiden University and CWI, The Netherlands [email protected] Abstract We present a new algorithm
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Factoring & Primality
Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount
Offline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: [email protected] 2 IBM India Research Lab, New Delhi. email: [email protected]
Surface bundles over S 1, the Thurston norm, and the Whitehead link
Surface bundles over S 1, the Thurston norm, and the Whitehead link Michael Landry August 16, 2014 The Thurston norm is a powerful tool for studying the ways a 3-manifold can fiber over the circle. In
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Mathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
ON FIBER DIAMETERS OF CONTINUOUS MAPS
ON FIBER DIAMETERS OF CONTINUOUS MAPS PETER S. LANDWEBER, EMANUEL A. LAZAR, AND NEEL PATEL Abstract. We present a surprisingly short proof that for any continuous map f : R n R m, if n > m, then there
24. The Branch and Bound Method
24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no
MA 323 Geometric Modelling Course Notes: Day 02 Model Construction Problem
MA 323 Geometric Modelling Course Notes: Day 02 Model Construction Problem David L. Finn November 30th, 2004 In the next few days, we will introduce some of the basic problems in geometric modelling, and
MASCOT Search Results Interpretation
The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually
. 0 1 10 2 100 11 1000 3 20 1 2 3 4 5 6 7 8 9
Introduction The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive integer We say d is a
TOPIC 4: DERIVATIVES
TOPIC 4: DERIVATIVES 1. The derivative of a function. Differentiation rules 1.1. The slope of a curve. The slope of a curve at a point P is a measure of the steepness of the curve. If Q is a point on the
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
CS3220 Lecture Notes: QR factorization and orthogonal transformations
CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss
Math 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
A Note on Maximum Independent Sets in Rectangle Intersection Graphs
A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada [email protected] September 12,
CHAPTER 1 Splines and B-splines an Introduction
CHAPTER 1 Splines and B-splines an Introduction In this first chapter, we consider the following fundamental problem: Given a set of points in the plane, determine a smooth curve that approximates the
JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004
Scientiae Mathematicae Japonicae Online, Vol. 10, (2004), 431 437 431 JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS Ondřej Čepeka and Shao Chin Sung b Received December May 12, 2003; revised February
The Student-Project Allocation Problem
The Student-Project Allocation Problem David J. Abraham, Robert W. Irving, and David F. Manlove Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK Email: {dabraham,rwi,davidm}@dcs.gla.ac.uk.
136 CHAPTER 4. INDUCTION, GRAPHS AND TREES
136 TER 4. INDUCTION, GRHS ND TREES 4.3 Graphs In this chapter we introduce a fundamental structural idea of discrete mathematics, that of a graph. Many situations in the applications of discrete mathematics
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration
Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the
Administrative - Master Syllabus COVER SHEET
Administrative - Master Syllabus COVER SHEET Purpose: It is the intention of this to provide a general description of the course, outline the required elements of the course and to lay the foundation for
Integer Factorization using the Quadratic Sieve
Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 [email protected] March 16, 2011 Abstract We give
1 Sets and Set Notation.
LINEAR ALGEBRA MATH 27.6 SPRING 23 (COHEN) LECTURE NOTES Sets and Set Notation. Definition (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES CHRISTOPHER HEIL 1. Cosets and the Quotient Space Any vector space is an abelian group under the operation of vector addition. So, if you are have studied
New York State Student Learning Objective: Regents Geometry
New York State Student Learning Objective: Regents Geometry All SLOs MUST include the following basic components: Population These are the students assigned to the course section(s) in this SLO all students
http://www.aleks.com Access Code: RVAE4-EGKVN Financial Aid Code: 6A9DB-DEE3B-74F51-57304
MATH 1340.04 College Algebra Location: MAGC 2.202 Meeting day(s): TR 7:45a 9:00a, Instructor Information Name: Virgil Pierce Email: [email protected] Phone: 665.3535 Teaching Assistant Name: Indalecio
SECTION 2.5: FINDING ZEROS OF POLYNOMIAL FUNCTIONS
SECTION 2.5: FINDING ZEROS OF POLYNOMIAL FUNCTIONS Assume f ( x) is a nonconstant polynomial with real coefficients written in standard form. PART A: TECHNIQUES WE HAVE ALREADY SEEN Refer to: Notes 1.31
Higher Education Math Placement
Higher Education Math Placement Placement Assessment Problem Types 1. Whole Numbers, Fractions, and Decimals 1.1 Operations with Whole Numbers Addition with carry Subtraction with borrowing Multiplication
Research Article Stability Analysis for Higher-Order Adjacent Derivative in Parametrized Vector Optimization
Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2010, Article ID 510838, 15 pages doi:10.1155/2010/510838 Research Article Stability Analysis for Higher-Order Adjacent Derivative
Optimal shift scheduling with a global service level constraint
Optimal shift scheduling with a global service level constraint Ger Koole & Erik van der Sluis Vrije Universiteit Division of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The
Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
THREE DIMENSIONAL GEOMETRY
Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,
Lecture 4 DISCRETE SUBGROUPS OF THE ISOMETRY GROUP OF THE PLANE AND TILINGS
1 Lecture 4 DISCRETE SUBGROUPS OF THE ISOMETRY GROUP OF THE PLANE AND TILINGS This lecture, just as the previous one, deals with a classification of objects, the original interest in which was perhaps
Factoring Patterns in the Gaussian Plane
Factoring Patterns in the Gaussian Plane Steve Phelps Introduction This paper describes discoveries made at the Park City Mathematics Institute, 00, as well as some proofs. Before the summer I understood
Common Core Unit Summary Grades 6 to 8
Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
The Heat Equation. Lectures INF2320 p. 1/88
The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)
Strategic Online Advertising: Modeling Internet User Behavior with
2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew
GEOMETRY COMMON CORE STANDARDS
1st Nine Weeks Experiment with transformations in the plane G-CO.1 Know precise definitions of angle, circle, perpendicular line, parallel line, and line segment, based on the undefined notions of point,
Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar
Complexity Theory IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Outline Goals Computation of Problems Concepts and Definitions Complexity Classes and Problems Polynomial Time Reductions Examples
Geometry. Higher Mathematics Courses 69. Geometry
The fundamental purpose of the course is to formalize and extend students geometric experiences from the middle grades. This course includes standards from the conceptual categories of and Statistics and
1 Approximating Set Cover
CS 05: Algorithms (Grad) Feb 2-24, 2005 Approximating Set Cover. Definition An Instance (X, F ) of the set-covering problem consists of a finite set X and a family F of subset of X, such that every elemennt
Mathematics 31 Pre-calculus and Limits
Mathematics 31 Pre-calculus and Limits Overview After completing this section, students will be epected to have acquired reliability and fluency in the algebraic skills of factoring, operations with radicals
North Carolina Math 2
Standards for Mathematical Practice 1. Make sense of problems and persevere in solving them. 2. Reason abstractly and quantitatively 3. Construct viable arguments and critique the reasoning of others 4.
AN ALGORITHM FOR DETERMINING WHETHER A GIVEN BINARY MATROID IS GRAPHIC
AN ALGORITHM FOR DETERMINING WHETHER A GIVEN BINARY MATROID IS GRAPHIC W. T. TUTTE. Introduction. In a recent series of papers [l-4] on graphs and matroids I used definitions equivalent to the following.
The number of generalized balanced lines
The number of generalized balanced lines David Orden Pedro Ramos Gelasio Salazar Abstract Let S be a set of r red points and b = r + 2δ blue points in general position in the plane, with δ 0. A line l
Analysis of Algorithms, I
Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadth-first search (BFS) 4 Applications
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Polynomial Degree and Finite Differences
CONDENSED LESSON 7.1 Polynomial Degree and Finite Differences In this lesson you will learn the terminology associated with polynomials use the finite differences method to determine the degree of a polynomial
4.3 Lagrange Approximation
206 CHAP. 4 INTERPOLATION AND POLYNOMIAL APPROXIMATION Lagrange Polynomial Approximation 4.3 Lagrange Approximation Interpolation means to estimate a missing function value by taking a weighted average
COLLEGE ALGEBRA LEARNING COMMUNITY
COLLEGE ALGEBRA LEARNING COMMUNITY Tulsa Community College, West Campus Presenter Lori Mayberry, B.S., M.S. Associate Professor of Mathematics and Physics [email protected] NACEP National Conference
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
Section I Using Jmol as a Computer Visualization Tool
Section I Using Jmol as a Computer Visualization Tool Jmol is a free open source molecular visualization program used by students, teachers, professors, and scientists to explore protein structures. Section
Equations. #1-10 Solve for the variable. Inequalities. 1. Solve the inequality: 2 5 7. 2. Solve the inequality: 4 0
College Algebra Review Problems for Final Exam Equations #1-10 Solve for the variable 1. 2 1 4 = 0 6. 2 8 7 2. 2 5 3 7. = 3. 3 9 4 21 8. 3 6 9 18 4. 6 27 0 9. 1 + log 3 4 5. 10. 19 0 Inequalities 1. Solve
VRSPATIAL: DESIGNING SPATIAL MECHANISMS USING VIRTUAL REALITY
Proceedings of DETC 02 ASME 2002 Design Technical Conferences and Computers and Information in Conference Montreal, Canada, September 29-October 2, 2002 DETC2002/ MECH-34377 VRSPATIAL: DESIGNING SPATIAL
Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one
Determination of the normalization level of database schemas through equivalence classes of attributes
Computer Science Journal of Moldova, vol.17, no.2(50), 2009 Determination of the normalization level of database schemas through equivalence classes of attributes Cotelea Vitalie Abstract In this paper,
SCORE SETS IN ORIENTED GRAPHS
Applicable Analysis and Discrete Mathematics, 2 (2008), 107 113. Available electronically at http://pefmath.etf.bg.ac.yu SCORE SETS IN ORIENTED GRAPHS S. Pirzada, T. A. Naikoo The score of a vertex v in
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory
Estimated Pre Calculus Pacing Timeline
Estimated Pre Calculus Pacing Timeline 2010-2011 School Year The timeframes listed on this calendar are estimates based on a fifty-minute class period. You may need to adjust some of them from time to
LINEAR ALGEBRA W W L CHEN
LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Physics 9e/Cutnell. correlated to the. College Board AP Physics 1 Course Objectives
Physics 9e/Cutnell correlated to the College Board AP Physics 1 Course Objectives Big Idea 1: Objects and systems have properties such as mass and charge. Systems may have internal structure. Enduring
ON THE COMPLEXITY OF THE GAME OF SET. {kamalika,pbg,dratajcz,hoeteck}@cs.berkeley.edu
ON THE COMPLEXITY OF THE GAME OF SET KAMALIKA CHAUDHURI, BRIGHTEN GODFREY, DAVID RATAJCZAK, AND HOETECK WEE {kamalika,pbg,dratajcz,hoeteck}@cs.berkeley.edu ABSTRACT. Set R is a card game played with a
Near Optimal Solutions
Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.
BookTOC.txt. 1. Functions, Graphs, and Models. Algebra Toolbox. Sets. The Real Numbers. Inequalities and Intervals on the Real Number Line
College Algebra in Context with Applications for the Managerial, Life, and Social Sciences, 3rd Edition Ronald J. Harshbarger, University of South Carolina - Beaufort Lisa S. Yocco, Georgia Southern University
