Gaussian Processes for Regression: A Quick Introduction


M. Ebden, August 2008

1 MOTIVATION

Figure 1 illustrates a typical example of a prediction problem: given some noisy observations of a dependent variable at certain values of the independent variable x, what is our best estimate of the dependent variable at a new value, x_*?

If we expect the underlying function f(x) to be linear, and can make some assumptions about the input data, we might use a least-squares method to fit a straight line (linear regression). Moreover, if we suspect f(x) may also be quadratic, cubic, or even nonpolynomial, we can use the principles of model selection to choose among the various possibilities. Gaussian process regression (GPR) is an even finer approach than this. Rather than claiming f(x) relates to some specific models (e.g. f(x) = mx + c), a Gaussian process can represent f(x) obliquely, but rigorously, by letting the data "speak" more clearly for themselves. GPR is still a form of supervised learning, but the training data are harnessed in a subtler way.

As such, GPR is a less "parametric" tool. However, it's not completely free-form, and if we're unwilling to make even basic assumptions about f(x), then more general techniques should be considered, including those underpinned by the principle of maximum entropy; Chapter 6 of Sivia and Skilling (2006) offers an introduction.

Figure 1: Given six noisy data points (error bars are indicated with vertical lines), we are interested in estimating a seventh at x_*.
2 DEFINITION OF A GAUSSIAN PROCESS

Gaussian processes (GPs) extend multivariate Gaussian distributions to infinite dimensionality. Formally, a Gaussian process generates data located throughout some domain such that any finite subset of the range follows a multivariate Gaussian distribution. Now, the n observations in an arbitrary data set, y = \{y_1, \ldots, y_n\}, can always be imagined as a single point sampled from some multivariate (n-variate) Gaussian distribution, after enough thought. Hence, working backwards, this data set can be partnered with a GP. Thus GPs are as universal as they are simple.

Very often, it's assumed that the mean of this partner GP is zero everywhere. What relates one observation to another in such cases is just the covariance function, k(x, x'). A popular choice is the "squared exponential",

    k(x, x') = \sigma_f^2 \exp\left[ \frac{-(x - x')^2}{2 l^2} \right],    (1)

where the maximum allowable covariance is defined as \sigma_f^2; this should be high for functions which cover a broad range on the y axis. If x \approx x', then k(x, x') approaches this maximum, meaning f(x) is nearly perfectly correlated with f(x'). This is good: for our function to look smooth, neighbours must be alike. Now if x is distant from x', we have instead k(x, x') \approx 0, i.e. the two points cannot "see" each other. So, for example, during interpolation at new x values, distant observations will have negligible effect. How much effect this separation has will depend on the length parameter, l, so there is much flexibility built into (1).

Not quite enough flexibility though: the data are often noisy as well, from measurement errors and so on. Each observation y can be thought of as related to an underlying function f(x) through a Gaussian noise model:

    y = f(x) + N(0, \sigma_n^2),    (2)

something which should look familiar to those who've done regression before. Regression is the search for f(x). Purely for simplicity of exposition in the next page, we take the novel approach of folding the noise into k(x, x'), by writing

    k(x, x') = \sigma_f^2 \exp\left[ \frac{-(x - x')^2}{2 l^2} \right] + \sigma_n^2 \, \delta(x, x'),    (3)

where \delta(x, x') is the Kronecker delta function. (When most people use Gaussian processes, they keep \sigma_n^2 separate from k(x, x'). However, our redefinition of k(x, x') is equally suitable for working with problems of the sort posed in Figure 1.)
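As a concrete sketch (ours, not the tutorial's own code, which is in MATLAB), the covariance function (3) translates into a few lines of Python with NumPy; the parameter names sigma_f, l and sigma_n mirror the symbols above:

```python
import numpy as np

def k(x, x_prime, sigma_f=1.0, l=1.0, sigma_n=0.0):
    """Squared-exponential covariance (1) with the noise term folded in as in (3)."""
    sq_exp = sigma_f**2 * np.exp(-(x - x_prime)**2 / (2 * l**2))
    # Kronecker delta: the noise contributes only when x and x' coincide
    return sq_exp + sigma_n**2 * (x == x_prime)

# Nearby points are strongly correlated; distant points cannot "see" each other
print(k(0.0, 0.1))   # close to the maximum, sigma_f^2 = 1
print(k(0.0, 5.0))   # close to 0
```

Note how the diagonal case k(x, x) returns \sigma_f^2 + \sigma_n^2, as the text below asks you to confirm for the matrix K.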
So, given n observations y, our objective is to predict y_*, not the "actual" f_*; their expected values are identical according to (2), but their variances differ owing to the observational noise process (e.g. in Figure 1, the expected value of y_*, and of f_*, is the dot at x_*).

To prepare for GPR, we calculate the covariance function, (3), among all possible combinations of these points, summarizing our findings in three matrices:

    K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{bmatrix}    (4)

    K_* = [\, k(x_*, x_1) \;\; k(x_*, x_2) \;\; \cdots \;\; k(x_*, x_n) \,], \qquad K_{**} = k(x_*, x_*).    (5)

Confirm for yourself that the diagonal elements of K are \sigma_f^2 + \sigma_n^2, and that its extreme off-diagonal elements tend to zero when x spans a large enough domain.
3 HOW TO REGRESS USING GAUSSIAN PROCESSES

Since the key assumption in GP modelling is that our data can be represented as a sample from a multivariate Gaussian distribution, we have that

    \begin{bmatrix} y \\ y_* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K & K_*^T \\ K_* & K_{**} \end{bmatrix} \right),    (6)

where T indicates matrix transposition. We are of course interested in the conditional probability p(y_* | y): given the data, how likely is a certain prediction for y_*? As explained more slowly in the Appendix, the probability follows a Gaussian distribution:

    y_* | y \sim N\left( K_* K^{-1} y, \; K_{**} - K_* K^{-1} K_*^T \right).    (7)

Our best estimate for y_* is the mean of this distribution,

    \bar{y}_* = K_* K^{-1} y,    (8)

and the uncertainty in our estimate is captured in its variance,

    \mathrm{var}(y_*) = K_{**} - K_* K^{-1} K_*^T.    (9)

We're now ready to tackle the data in Figure 1. There are n = 6 observations, at the x and y values shown in the figure. We know \sigma_n from the error bars. With judicious choices of \sigma_f and l (more on this later), we have enough to calculate a covariance matrix using (4). From (5) we also have K_{**} and K_*. From (8) and (9) we obtain \bar{y}_* and \mathrm{var}(y_*); Figure 1 shows this estimate as the data point with a question mark underneath.

We can repeat the above procedure for various other points spread over some portion of the x axis, as shown in Figure 2. (In fact, equivalently, we could avoid the repetition by performing the above procedure once with suitably larger K_* and K_{**} matrices. In this case, since there are 1,000 test points spread over the x axis, K_{**} would be of size 1,000 × 1,000.) Rather than plotting simple error bars, we've decided to plot \bar{y}_* \pm 1.96 \sqrt{\mathrm{var}(y_*)}, giving a 95% confidence interval.
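Equations (8) and (9) translate almost directly into code. The sketch below is our own, with made-up observations in the spirit of Figure 1 (the tutorial's exact data values and hyperparameters are assumptions here):

```python
import numpy as np

def sq_exp(a, b, sigma_f, l):
    """Squared-exponential covariance (1) between two sets of points."""
    return sigma_f**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2 * l**2))

# Hypothetical noisy observations, in the spirit of Figure 1
x = np.array([-1.5, -1.0, -0.75, -0.4, -0.25, 0.0])
y = np.array([-1.7, -1.2, -0.4, 0.1, 0.5, 0.9])
sigma_f, l, sigma_n = 1.3, 1.0, 0.3

x_star = np.array([0.2])
K = sq_exp(x, x, sigma_f, l) + sigma_n**2 * np.eye(len(x))  # eq. (4), noise on diagonal
K_star = sq_exp(x_star, x, sigma_f, l)                      # eq. (5)
K_2star = sq_exp(x_star, x_star, sigma_f, l) + sigma_n**2   # k(x*, x*)

y_star = K_star @ np.linalg.solve(K, y)                     # mean, eq. (8)
var_star = K_2star - K_star @ np.linalg.solve(K, K_star.T)  # variance, eq. (9)
print(y_star, var_star)
```

Replacing x_star with a dense grid of 1,000 test points reproduces the curve and shaded band of Figure 2 at no extra conceptual cost.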
Figure 2: The solid line indicates an estimation of y_* for 1,000 values of x_*. Pointwise 95% confidence intervals are shaded.

4 GPR IN THE REAL WORLD

The reliability of our regression is dependent on how well we select the covariance function. Clearly if its parameters (call them \theta = \{ l, \sigma_f, \sigma_n \}) are not chosen sensibly, the result is nonsense. Our maximum a posteriori estimate of \theta occurs when p(\theta | x, y) is at its greatest. Bayes' theorem tells us that, assuming we have little prior knowledge about what \theta should be, this corresponds to maximizing \log p(y | x, \theta), given by

    \log p(y | x, \theta) = -\tfrac{1}{2} y^T K^{-1} y - \tfrac{1}{2} \log |K| - \tfrac{n}{2} \log 2\pi.    (10)

Simply run your favourite multivariate optimization algorithm (e.g. conjugate gradients, Nelder-Mead simplex, etc.) on this equation and you've found a pretty good choice for \theta. It's only "pretty good" because, of course, Thomas Bayes is rolling in his grave. Why commend just one answer for \theta, when you can integrate everything over the many different possible choices for \theta? Chapter 5 of Rasmussen and Williams (2006) presents the equations necessary in this case.

Finally, if you feel you've grasped the toy problem in Figure 2, the next two examples handle more complicated cases. Figure 3(a), in addition to a long-term downward trend, has some fluctuations, so we might use a more sophisticated covariance function:

    k(x, x') = \sigma_{f_1}^2 \exp\left[ \frac{-(x - x')^2}{2 l_1^2} \right] + \sigma_{f_2}^2 \exp\left[ \frac{-(x - x')^2}{2 l_2^2} \right] + \sigma_n^2 \, \delta(x, x').    (11)

The first term takes into account the small vicissitudes of the dependent variable, and the second term has a longer length parameter (l_2 > l_1) to represent its long-term
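To make (10) concrete, here is a small sketch of ours (not the tutorial's code): it evaluates the log marginal likelihood on a grid of (sigma_f, l) values, with sigma_n assumed known, and keeps the best. A gradient-based optimizer such as conjugate gradients would normally replace the grid; the data values are hypothetical.

```python
import numpy as np

def log_marginal_likelihood(x, y, sigma_f, l, sigma_n):
    """Eq. (10): log p(y | x, theta) for the squared-exponential kernel."""
    K = sigma_f**2 * np.exp(-(x[:, None] - x[None, :])**2 / (2 * l**2)) \
        + sigma_n**2 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)  # stable log-determinant
    return (-0.5 * y @ np.linalg.solve(K, y)
            - 0.5 * logdet
            - 0.5 * len(x) * np.log(2 * np.pi))

# Hypothetical data; sigma_n assumed known from the error bars
x = np.array([-1.5, -1.0, -0.75, -0.4, -0.25, 0.0])
y = np.array([-1.7, -1.2, -0.4, 0.1, 0.5, 0.9])
sigma_n = 0.3

best = max(
    (log_marginal_likelihood(x, y, sf, ll, sigma_n), sf, ll)
    for sf in np.linspace(0.5, 3.0, 20)
    for ll in np.linspace(0.2, 3.0, 20)
)
print("best log-likelihood %.2f at sigma_f=%.2f, l=%.2f" % best)
```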
Figure 3: Estimation of f(x) (solid line) for a function with (a) short-term and long-term dynamics, and (b) long-term dynamics and a periodic element. Observations are shown as crosses.

trend. Covariance functions can be grown in this way ad infinitum, to suit the complexity of your particular data.

The function looks as if it might contain a periodic element, but it's difficult to be sure. Let's consider another function, which we're told has a periodic element. The solid line in Figure 3(b) was regressed with the following covariance function:

    k(x, x') = \sigma_f^2 \exp\left[ \frac{-(x - x')^2}{2 l^2} \right] + \exp\left( -2 \sin^2 [\nu \pi (x - x')] \right) + \sigma_n^2 \, \delta(x, x').    (12)

The first term represents the hill-like trend over the long term, and the second term gives periodicity with frequency \nu. This is the first time we've encountered a case where x and x' can be distant and yet still "see" each other (that is, k(x, x') does not approach zero even for large |x - x'|).

What if the dependent variable has other dynamics which, a priori, you expect to appear? There's no limit to how complicated k(x, x') can be, provided K is positive definite. Chapter 4 of Rasmussen and Williams (2006) offers a good outline of the range of covariance functions you should keep in your toolkit.

"Hang on a minute," you ask, "isn't choosing a covariance function from a toolkit a lot like choosing a model type, such as linear versus cubic, which we discussed at the outset?" Well, there are indeed similarities. In fact, there is no way to perform regression without imposing at least a modicum of structure on the data set; such is the nature of generative modelling. However, it's worth repeating that Gaussian processes do allow the data to speak very clearly. For example, there exists excellent theoretical justification for the use of (1) in many settings (Rasmussen and Williams (2006), Section 4.3). You will still want to investigate carefully which covariance functions are appropriate for your data set. Essentially, choosing among alternative functions is a way of reflecting various forms of prior knowledge about the physical process under investigation.
5 DISCUSSION

We've presented a brief outline of the mathematics of GPR, but practical implementation of the above ideas requires the solution of a few algorithmic hurdles, as opposed to those of data analysis. If you aren't a good computer programmer, then the code for Figures 1 and 2 is at ftp://ftp.robots.ox.ac.uk/pub/outgoing/ebden/misc/gptut.zip, and more general code can be found at http://www.gaussianprocess.org/gpml.

We've merely scratched the surface of a powerful technique (MacKay, 1998). First, although the focus has been on one-dimensional inputs, it's simple to accept those of higher dimension. Whereas x would then change from a scalar to a vector, k(x, x') would remain a scalar and so the maths overall would be virtually unchanged. Second, the zero vector representing the mean of the multivariate Gaussian distribution in (6) can be replaced with functions of x. Third, in addition to their use in regression, GPs are applicable to integration, global optimization, mixture-of-experts models, unsupervised learning models, and more; see Chapter 9 of Rasmussen and Williams (2006). The next tutorial will focus on their use in classification.

REFERENCES

MacKay, D. (1998). In C.M. Bishop (Ed.), Neural networks and machine learning (NATO ASI Series, Series F, Computer and Systems Sciences, Vol. 168, pp. 133-166). Dordrecht: Kluwer Academic Press.

Rasmussen, C. and C. Williams (2006). Gaussian Processes for Machine Learning. MIT Press.

Sivia, D. and J. Skilling (2006). Data Analysis: A Bayesian Tutorial (second ed.). Oxford Science Publications.

APPENDIX

Imagine a data sample taken from some multivariate Gaussian distribution with zero mean and a covariance given by matrix \Sigma. Now decompose it arbitrarily into two consecutive subvectors x_a and x_b; in other words, writing x \sim N(0, \Sigma) would be the same as writing

    \begin{bmatrix} x_a \\ x_b \end{bmatrix} \sim N\left( 0, \begin{bmatrix} A & C^T \\ C & B \end{bmatrix} \right),    (13)

where A, B, and C are the corresponding bits and pieces that make up \Sigma.

Interestingly, the conditional distribution of x_b given x_a is itself Gaussian-distributed. If the covariance matrix \Sigma were diagonal or even block diagonal, then knowing x_a wouldn't tell us anything about x_b;
specifically, x_b | x_a \sim N(0, B). On the other hand, if C were nonzero, then some matrix algebra leads us to

    x_b | x_a \sim N\left( C A^{-1} x_a, \; B - C A^{-1} C^T \right).    (14)

The mean, C A^{-1} x_a, is known as the "matrix of regression coefficients", and the variance, B - C A^{-1} C^T, is the "Schur complement of A in \Sigma". In summary, if we know some of x, we can use that to inform our estimate of what the rest of x might be, thanks to the revealing off-diagonal elements of \Sigma.
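A quick numerical check of (14), as a sketch of our own: draw many samples from a two-dimensional Gaussian, keep those whose first component lands near a fixed value, and compare the empirical conditional mean and variance of the second component with C A^{-1} x_a and the Schur complement.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = 1.0, 1.0, 0.8                    # scalar blocks of the covariance in (13)
Sigma = np.array([[A, C], [C, B]])
z = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)

x_a = 1.0                                  # condition on the first component
mask = np.abs(z[:, 0] - x_a) < 0.05        # samples consistent with that observation
empirical_mean = z[mask, 1].mean()
empirical_var = z[mask, 1].var()

predicted_mean = C / A * x_a               # C A^{-1} x_a
predicted_var = B - C / A * C              # Schur complement B - C A^{-1} C^T
print(empirical_mean, predicted_mean)      # both near 0.8
print(empirical_var, predicted_var)        # both near 0.36
```

Substituting A = K, C = K_*, B = K_{**} recovers exactly the regression equations (7)-(9).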
Gaussian Processes for Classification: A Quick Introduction
M. Ebden, August 2008
Prerequisite reading: Gaussian Processes for Regression

1 OVERVIEW

As mentioned in the previous document, GPs can be applied to problems other than regression. For example, if the output of a GP is squashed onto the range [0, 1], it can represent the probability of a data point belonging to one of, say, two types, and voilà, we can ascertain classifications. This is the subject of the current document.

The big difference between GPR and GPC is how the output data, y, are linked to the underlying function outputs, f. They are no longer connected simply via a noise process as in (2) in the previous document, but are instead now discrete: say y = +1 precisely for one class and y = -1 for the other. In principle, we could try fitting a GP that produces an output of approximately +1 for some values of x and approximately -1 for others, simulating this discretization. Instead, we interpose the GP between the data and a squashing function; then, classification of a new data point x_* involves two steps instead of one:

1. Evaluate a "latent" function f which models qualitatively how the likelihood of one class versus the other changes over the x axis. This is the GP.
2. Squash the output of this latent function onto [0, 1] using any sigmoidal function: prob(y = +1 | x) = \sigma(f(x)).

Writing these two steps schematically,

    data (x, y)  --GP-->  latent function f(x)  --sigmoid-->  class probability \sigma(f).

The next section will walk you through more slowly how such a classifier operates. Section 3 explains how to train the classifier, so perhaps we're presenting things in reverse order!
Section 4 handles classification when there are more than two classes.

Before we get started, a quick note on \sigma(f). Although other forms will do, here we will prescribe it to be the cumulative Gaussian distribution, \Phi(f). This "s"-shaped function satisfies our needs, mapping high f into \Phi(f) \approx 1 and low f into \Phi(f) \approx 0.

A second quick note, revisiting (6) and (7) in the first document: confirm for yourself that, if there were no noise (\sigma_n^2 = 0), the two equations could be rewritten as

    \begin{bmatrix} f \\ f_* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K & K_*^T \\ K_* & K_{**} \end{bmatrix} \right)    (1)

and

    f_* | f \sim N\left( K_* K^{-1} f, \; K_{**} - K_* K^{-1} K_*^T \right).    (2)
2 USING THE CLASSIFIER

Suppose we've trained a classifier from n input data, x, and their corresponding expert-labelled output data, y. And suppose that in the process we formed some GP outputs f corresponding to these data, which have some uncertainty but mean values given by \hat{f}. We're now ready to input a new data point, x_*, in the left side of our schematic, in order to determine at the other end the probability \pi_* of its class membership.

In the first step, finding the probability p(f_* | y) is similar to GPR, i.e. we adapt (2):

    f_* | y \sim N\left( K_* K^{-1} \hat{f}, \; K_{**} - K_* K'^{-1} K_*^T \right).    (3)

(K' will be explained soon, but for now consider it to be very similar to K.) In the second step, we squash f_* to find the probability of class membership, \pi_* = \sigma(f_*). The expected value is

    \bar{\pi}_* = \int \sigma(f_*) \, p(f_* | y) \, df_*.    (4)

This is the integral of a cumulative Gaussian times a Gaussian, which can be solved analytically. By Section 3.9 of Rasmussen and Williams (2006), the solution is

    \bar{\pi}_* = \Phi\left( \frac{\bar{f}_*}{\sqrt{1 + \mathrm{var}(f_*)}} \right).    (5)

An example is depicted in Figure 1.

3 TRAINING THE GP IN THE CLASSIFIER

Our objective now is to find \hat{f} and K', so that we know everything about the GP producing (3), the first step of the classifier. The second step of the classifier does not require training, as it's a fixed sigmoidal function.

Among the many GPs which could be partnered with our data set, naturally we'd like to compare their usefulness quantitatively. Considering the outputs f of a certain GP, how likely they are to be appropriate for the training data can be decomposed using Bayes' theorem:

    p(f | x, y) = \frac{ p(y | f) \, p(f | x) }{ p(y | x) }.    (6)

Let's focus on the two factors in the numerator. Assuming the data set is i.i.d.,

    p(y | f) = \prod_{i=1}^{n} p(y_i | f_i).    (7)

Dropping the subscripts in the product, p(y | f) is informed by our sigmoid function. Specifically, p(y = +1 | f) is \sigma(f) by definition, and to complete the picture, p(y = -1 | f) = 1 - \sigma(f). A terse way of combining these two cases is to write p(y | f) = \sigma(y f).

The second factor in the numerator is p(f | x). This is related to the output of the first step of our schematic drawing, but first we're interested in the value of f which maximizes the posterior probability p(f | x, y). This occurs when the derivative
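The second, squashing step of the classifier is tiny. Given the mean and variance of the latent f_* from (3), equation (5) is a one-liner; here is a sketch of ours, writing \Phi via the error function:

```python
import math

def Phi(z):
    """Cumulative Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def class_probability(f_star_mean, f_star_var):
    """Eq. (5): expected probability that y = +1 at the test point."""
    return Phi(f_star_mean / math.sqrt(1.0 + f_star_var))

print(class_probability(0.0, 1.0))   # exactly 0.5: the classifier is undecided
print(class_probability(2.0, 0.5))   # well above 0.5: confident in class +1
```

Notice that a larger latent variance pulls the probability back toward 0.5, which is exactly the role of the denominator in (5).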
Figure 1: (a) Toy classification dataset, where circles and crosses indicate class membership of the input (training) data, at locations x. y is +1 for one class and -1 for the other, but for illustrative purposes we pretend y = 0 instead of -1 in this figure. The solid line is the (mean) probability \bar{\pi}_* = prob(y = +1 | x_*), i.e. the answer to our problem after successfully performing GPC. (b) The corresponding distribution of the latent function f(x), not constrained to lie between 0 and 1.

of (6) with respect to f is zero, or equivalently and more simply, when the derivative of its logarithm is zero. Doing this, and using the same logic that produced (10) in the previous document, we find that the best f for our problem is

    \hat{f} = K \, \nabla \log p(y | \hat{f}).    (8)

Unfortunately, \hat{f} appears on both sides of the equation, so we make an initial guess (zero is fine) and go through a few iterations. The answer to (8) can be used directly in (3), so we've found one of the two quantities we seek therein.

The variance of f is given by the negative second derivative of the logarithm of (6), which turns out to be (K^{-1} + W)^{-1}, with W = -\nabla\nabla \log p(y | \hat{f}). Making a "Laplace approximation", we pretend p(f | x, y) is Gaussian distributed, i.e.

    p(f | x, y) \approx N\left( \hat{f}, \; (K^{-1} + W)^{-1} \right).    (9)

(This assumption is occasionally inaccurate, so if it yields poor classifications, better ways of characterizing the uncertainty in f should be considered, for example via expectation propagation.)
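Solving (8) by iteration can be sketched as follows. This is our own minimal version, using the cumulative-Gaussian likelihood p(y | f) = \Phi(y f) and a Newton update; the example data are hypothetical, and a production implementation would use the numerically stabler scheme in Rasmussen and Williams (2006), Section 3.4.

```python
import numpy as np
from math import erf, sqrt, pi

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))  # cumulative Gaussian

def phi(z):
    """Standard Gaussian density."""
    return np.exp(-0.5 * z**2) / sqrt(2.0 * pi)

def find_f_hat(K, y, n_iter=20):
    """Iterate toward the self-consistent solution of eq. (8)."""
    f = np.zeros(len(y))                  # an initial guess of zero is fine
    for _ in range(n_iter):
        t = y * f
        r = phi(t) / Phi(t)
        grad = y * r                      # d/df log Phi(y f)
        W = np.diag(t * r + r**2)         # negative second derivative
        # Newton step: f <- (K^-1 + W)^-1 (W f + grad), rearranged to avoid K^-1
        f = np.linalg.solve(np.eye(len(y)) + K @ W, K @ (W @ f + grad))
    return f

x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])    # hypothetical inputs
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])    # expert labels
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)    # squared exponential, sigma_f = l = 1
f_hat = find_f_hat(K, y)
print(np.round(f_hat, 2))   # negative where y = -1, positive where y = +1
```

At convergence f_hat satisfies (8) exactly: multiplying the fixed point by K^{-1} + W and simplifying recovers f = K grad.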
Now for a subtle point. The fact that f can vary means that using (2) directly is inappropriate: in particular, its mean is correct but its variance no longer tells the whole story. This is why we use the adapted version, (3), with K' instead of K. Since the varying quantity in (2), f, is being multiplied by K_* K^{-1}, we add K_* K^{-1} \mathrm{cov}(f) K^{-1} K_*^T to the variance in (2). Simplification leads to (3), in which K' = K + W^{-1}.

With the GP now completely specified, we're ready to use the classifier as described in the previous section.

GPC in the Real World

As with GPR, the reliability of our classification is dependent on how well we select the covariance function in the GP in our first step. The parameters are \theta = \{ l, \sigma_f \}, one fewer now because \sigma_n^2 = 0. However, as usual, \theta is optimized by maximizing p(y | x, \theta), or (omitting \theta on the right-hand side of the equation)

    p(y | x) = \int p(y | f) \, p(f | x) \, df.    (10)

This can be simplified, using a Laplace approximation, to yield

    \log p(y | x) \approx \log p(y | \hat{f}) - \tfrac{1}{2} \hat{f}^T K^{-1} \hat{f} - \tfrac{1}{2} \log\left( |K| \cdot |K^{-1} + W| \right).    (11)

This is the equation to run your favourite optimizer on, as performed in GPR.

4 MULTI-CLASS GPC

We've described binary classification, where the number of possible classes, C, is just two. In the case of C > 2 classes, one approach is to fit a GP for each class. In the first of the two steps of classification, our GP values are concatenated as

    f = \left( f_1^1, \ldots, f_n^1, \; f_1^2, \ldots, f_n^2, \; \ldots, \; f_1^C, \ldots, f_n^C \right)^T.    (12)

Let y be a vector of the same length as f which, for each i = 1, \ldots, n, is 1 for the class which is the label and 0 for the other C - 1 entries. Let K grow to being block diagonal in the matrices K^1, \ldots, K^C.
So the first change we see for C > 2 classes is a lengthening of the GP. Section 3.5 of Rasmussen and Williams (2006) offers hints on keeping the computations manageable. The second change is that the (merely one-dimensional) cumulative Gaussian distribution is no longer sufficient to describe the squashing function in our classifier; instead we use the softmax function. For the i-th data point,

    \pi_i^c = p(y_i^c | f_i) = \frac{ \exp(f_i^c) }{ \sum_{c'} \exp(f_i^{c'}) },    (13)

where f_i is a non-consecutive subset of f, viz. f_i = \left( f_i^1, f_i^2, \ldots, f_i^C \right)^T. We can summarize our results with \pi = \left( \pi_1^1, \ldots, \pi_n^1, \; \pi_1^2, \ldots, \pi_n^2, \; \ldots, \; \pi_1^C, \ldots, \pi_n^C \right)^T.

Now that we've presented the two big changes needed to go from binary to multi-class GPC, we continue as before. Setting to zero the derivative of the logarithm of the components in (6), we replace (8) with

    \hat{f} = K \left( y - \hat{\pi} \right).    (14)

The corresponding variance is (K^{-1} + W)^{-1} as before, but now W = \mathrm{diag}(\hat{\pi}) - \Pi \Pi^T, where \Pi is a Cn × n matrix obtained by stacking vertically the diagonal matrices \mathrm{diag}(\hat{\pi}^c), where \hat{\pi}^c is the subvector of \hat{\pi} pertaining to class c.
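The softmax squashing function (13) in code, as a sketch of ours, for a single data point with three hypothetical latent values:

```python
import numpy as np

def softmax(f_i):
    """Eq. (13): class probabilities from the C latent values of one data point."""
    e = np.exp(f_i - f_i.max())   # subtracting the max avoids overflow
    return e / e.sum()

pi_i = softmax(np.array([2.0, 1.0, -1.0]))
print(pi_i)   # probabilities sum to one; the largest latent value dominates
```

With C = 2 and the labels recoded, this reduces to the familiar logistic sigmoid, so the binary case is a special case of (13).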
With these quantities estimated, we have enough to generalize (3) to

    f_*^c | y \sim N\left( K_*^c (K^c)^{-1} \hat{f}^c, \; K_{**}^c - K_*^c \left( K^c + (W^c)^{-1} \right)^{-1} (K_*^c)^T \right),    (15)

where K_*^c, K^c, and W^c represent the class-relevant information only. Finally, (11) is replaced with

    \log p(y | x) \approx -\tfrac{1}{2} \hat{f}^T K^{-1} \hat{f} + y^T \hat{f} - \sum_{i=1}^{n} \log\left( \sum_{c=1}^{C} \exp(\hat{f}_i^c) \right) - \tfrac{1}{2} \log\left( |K| \cdot |K^{-1} + W| \right).    (16)

We won't present an example of multi-class GPC, but hopefully you get the idea.

5 DISCUSSION

As with GPR, classification can be extended to accept x values with multiple dimensions, while keeping most of the mathematics unchanged. Other possible extensions include using the expectation propagation method in lieu of the Laplace approximation as mentioned previously, putting confidence intervals on the classification probabilities, calculating the derivatives of (11) to aid the optimizer, or using the variational Gaussian process classifiers described in MacKay (1998), to name but four extensions.

Second, we repeat the Bayesian call from the previous document to integrate over a range of possible covariance function parameters. This should be done regardless of how much prior knowledge is available; see for example Chapter 5 of Sivia and Skilling (2006) on how to choose priors in the most opaque situations.

Third, we've again spared you a few practical algorithmic details; computer code is available at http://www.gaussianprocess.org/gpml, with examples.

ACKNOWLEDGMENTS

Thanks are due to Prof. Stephen Roberts and members of his Pattern Analysis Research Group, as well as the ALADDIN project (www.aladdinproject.org).

REFERENCES

MacKay, D. (1998). In C.M. Bishop (Ed.), Neural networks and machine learning (NATO ASI Series, Series F, Computer and Systems Sciences, Vol. 168, pp. 133-166). Dordrecht: Kluwer Academic Press.

Rasmussen, C. and C. Williams (2006). Gaussian Processes for Machine Learning. MIT Press.

Sivia, D. and J. Skilling (2006). Data Analysis: A Bayesian Tutorial (second ed.). Oxford Science Publications.