Regression is the art of inferring E(Y|x) (the expected value of Y, given x) from a number of realisations of the pair (x, y). One can never generalize beyond one’s data without making subjective assumptions (Hume 1739–40; Mitchell 1980; Schaffer 1994; Wolpert 1996). Although linearity is rather special (outside quantum mechanics no real system is truly linear), detecting linear relations has been the focus of much research in statistics and machine learning for decades and the resulting algorithms are well understood, well developed and efficient. Also, a linear model may still be useful for modelling a non-linear process. For example, the simplest non-trivial model obtainable from the Taylor expansion of any infinitely-differentiable function is a linear model (the first-order expansion of the Taylor series). For these reasons it is often reasonable to assume a linear relationship exists between X and Y in the first instance. A system of linear equations is considered overdetermined if there are more equations than unknowns. In general regression analysis is performed when there are enough (x, y) so that the system is overdetermined. The problem, then, is finding an approximate solution to an overdetermined system of linear equations. In order to find the best approximate solution, one needs to define an error function and measure it. The Gauss-Markov theorem states that in a linear model in which the errors have expectation zero and are uncorrelated and have equal variances, a best linear unbiased estimator (BLUE) of the coefficients is given by the least-squares estimator. Note that, being Bayesians at heart, we are careful to make all of our assumptions explicit. Linear (least squares) regression admits a unique closed-form solution, but the trick is to use a numerically stable method, such as QR decomposition.