Differentiability and some of its consequences

Definition: A function f : (a, b) → R is differentiable at a point x_0 ∈ (a, b) if

    \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}

exists. If the limit exists for all x_0 ∈ (a, b), then f is said to be differentiable in the interval. It is differentiable at an endpoint if the appropriate one-sided limit exists. If we just say that f is differentiable, we mean that f is differentiable at all points of its domain.

If f is differentiable at x_0, we denote its derivative by any of the following: f'(x_0) or ḟ(x_0) (Newton), \frac{df}{dx}(x_0) (Leibniz), Df(x_0) (other).

Examples:

1. The function f(x) = |x| is differentiable on (−∞, 0) ∪ (0, ∞), but not at x = 0.
2. The function f(x) = [x] (the greatest integer function) is differentiable at all points except the integers.
3. The function f(x) = c (a constant) is differentiable.
4. The function f(x) = x is differentiable.

The proofs of the last two items are immediate from the definition. The first two are exercises.

Proposition: If f and g are differentiable at x_0, then

- f ± g is differentiable at x_0, and (f ± g)'(x_0) = f'(x_0) ± g'(x_0).
- fg is differentiable at x_0, and (fg)'(x_0) = f'(x_0)g(x_0) + f(x_0)g'(x_0).
- If g(x_0) ≠ 0, then f/g is differentiable at x_0, and

    \left(\frac{f}{g}\right)'(x_0) = \frac{f'(x_0)g(x_0) - f(x_0)g'(x_0)}{(g(x_0))^2}.

Proof: This is standard. See your text.

Consequences of the proposition: polynomials are differentiable, and rational functions are differentiable at the points where the denominator is not zero. In your text, there are also proofs (non-trivial) of the following:

    \frac{d}{dx} x^{1/n} = \frac{1}{n} x^{(1/n) - 1}, for all n ∈ N and x > 0,
    \frac{d}{dx} \sin x = \cos x.

From the previous proposition, the chain rule (below) and the first of these, it follows that all algebraic functions are differentiable on their domains; from the second, one can deduce the differentiability of the trigonometric functions (again, on their domains).

Definition (alternative): The function f : (a, b) → R is differentiable at x_0 ∈ (a, b) provided there exists a constant A and a function ε(x, x_0) such that

    f(x) = f(x_0) + A(x - x_0) + ε(x, x_0), where \lim_{x \to x_0} \frac{ε(x, x_0)}{x - x_0} = 0.

The number A is called the derivative of f at x = x_0.

What this says: If we approximate f(x) by the linear function f(x_0) + A(x − x_0), then the error we make (which is denoted by ε) is quadratic in (x − x_0); that is, it behaves like the quantity (x − x_0)^2 as x → x_0.

Example: Let f(x) = 4x^2 − 2x + 3, and let x_0 = 1. Suppose we take A = 6 (the smart thing to do, since f'(1) = 6). Then an easy computation gives

    f(x) = f(1) + 6(x - 1) + 4(x - 1)^2.

Here the function ε = 4(x − 1)^2, and evidently \lim_{x \to 1} ε/(x − 1) = 0.

On the other hand, if we take x_0 = 1 and A = 5, we find that

    f(x) = f(1) + 5(x - 1) + ε, where ε = (x - 1) + 4(x - 1)^2.

And for this value of A, we have

    \lim_{x \to 1} \frac{ε}{x - 1} = 1 (≠ 0),

and so A = 5 doesn't work. The only value of A which gives the correct, quadratic error is A = 6.

Proposition: The two definitions are equivalent.

Proof: (a) Suppose the usual definition of differentiability holds. Define a function ε by

    ε(x, x_0) = f(x) - f(x_0) - f'(x_0)(x - x_0).

Then

    \frac{ε(x, x_0)}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0),
which goes to 0 as x → x_0, since f is differentiable at x_0. Thus the second definition holds with A = f'(x_0).

(b) Suppose the new definition holds; then we need to show that f'(x_0) = A in the standard definition. But we're given

    f(x) = f(x_0) + A(x - x_0) + ε(x, x_0),

so

    \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0} \left( A + \frac{ε(x, x_0)}{x - x_0} \right) = A,

by the second definition, and so f'(x_0) = A by the first definition.

Proposition: f differentiable at x_0 ⇒ f is continuous at x_0.

Proof: Differentiability implies the existence of ε, quadratic in (x − x_0), such that

    f(x) - f(x_0) = A(x - x_0) + ε(x, x_0),

so as x → x_0, the right-hand side of this expression vanishes.

Proposition (Chain rule): Let f : D → R and g : E → R be differentiable, with g(E) ⊆ D. Then the composition f ∘ g is differentiable, and (f ∘ g)'(x) = f'(g(x)) g'(x).

Proof: Fix any y_0 ∈ D. Since f is differentiable at y_0, we have

    f(y) = f(y_0) + f'(y_0)(y - y_0) + ε(y, y_0).

This holds in particular for y_0 = g(x_0) and y = g(x), for x, x_0 ∈ E:

    f(g(x)) = f(g(x_0)) + f'(g(x_0))(g(x) - g(x_0)) + ε(g(x), g(x_0)).

Rearranging slightly, and dividing by x − x_0, we get

    \frac{f(g(x)) - f(g(x_0))}{x - x_0} = f'(g(x_0)) \frac{g(x) - g(x_0)}{x - x_0} + \frac{ε(g(x), g(x_0))}{x - x_0}.

We'll be done if we can show that the last term goes to 0 as x → x_0. We rewrite this term as

    \frac{ε(g(x), g(x_0))}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{x - x_0}.

We claim that

    \lim_{x \to x_0} \frac{ε(g(x), g(x_0))}{g(x) - g(x_0)}

exists and equals 0. First of all, the function

    h(y) = \frac{ε(y, y_0)}{y - y_0} = \frac{f(y) - f(y_0)}{y - y_0} - f'(y_0)
is continuous at y ≠ y_0, and we can define h(y_0) = \lim_{y \to y_0} h(y) = 0, since this limit exists. So h(y) is continuous at y_0 as well. Since g is continuous at x_0 and g(x_0) = y_0, the composition h(g(x)) is continuous at x_0, and h(g(x_0)) = 0. So the last term on the right-hand side of the expression above goes to 0 · g'(x_0) = 0, and

    \lim_{x \to x_0} \frac{f(g(x)) - f(g(x_0))}{x - x_0} = \lim_{x \to x_0} \left[ f'(g(x_0)) \frac{g(x) - g(x_0)}{x - x_0} + \frac{ε(g(x), g(x_0))}{x - x_0} \right] = f'(g(x_0)) g'(x_0).

Rolle's Theorem: Suppose f is continuous on [a, b] and differentiable on (a, b), and suppose also that f(a) = f(b) = 0. Then there exists c ∈ (a, b) with f'(c) = 0.

Proof: f is continuous on the closed interval, so let x_m and x_M be the points in [a, b] where f achieves its minimum and maximum values, respectively. One possibility is that f(x_m) = f(x_M), in which case f is identically 0 on the interval (why?) and its derivative is 0, so such a c exists.

Alternatively, at least one of the max and min is non-zero. We'll do one case; the other is similar and left as an exercise. Suppose f(x_m) < 0. Then a < x_m < b, since f vanishes at a and b. For small h < 0 we have f(x_m + h) − f(x_m) ≥ 0 (why?), and then the quotient

    \frac{f(x_m + h) - f(x_m)}{h} ≤ 0.

But for small h > 0, this same quotient is ≥ 0. Since the limit of this quotient exists as h → 0, the limit must be 0; i.e., we can take c = x_m. For the other case, it is easily shown that c = x_M works.

The mean value theorem: Suppose f is continuous on [a, b] and differentiable on (a, b). Then there exists c ∈ (a, b) such that

    \frac{f(b) - f(a)}{b - a} = f'(c).

Proof: We construct a function that allows us to use Rolle's theorem to get the answer. Define

    h(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a).

Then h(a) = h(b) = 0, and by Rolle's theorem, there's a number c ∈ (a, b) such that h'(c) = 0. But computing h'(c) and setting it equal to 0 gives the mean value theorem.

Cauchy's generalized mean value theorem: Let f and g both be continuous on [a, b] and differentiable on (a, b). Suppose that g(b) − g(a) ≠ 0 and that f' and g' do not vanish simultaneously.
Then there exists a c ∈ (a, b) such that

    \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.
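As a quick numerical illustration (not part of the notes), the sketch below locates the point c promised by Cauchy's theorem for a concrete pair of functions; the functions f(t) = t^2 and g(t) = t^3 on [1, 2] are chosen here purely for the example.

```python
# A minimal numerical check of Cauchy's mean value theorem (illustrative
# choice of functions, not from the notes): f(t) = t^2, g(t) = t^3 on [1, 2].
# The chord slope is (f(b) - f(a)) / (g(b) - g(a)) = 3/7, and
# f'(c)/g'(c) = 2c / (3c^2) = 2/(3c), so c solves 2/(3c) = 3/7, i.e. c = 14/9.

a, b = 1.0, 2.0
f = lambda t: t**2
g = lambda t: t**3
df = lambda t: 2 * t       # f'
dg = lambda t: 3 * t**2    # g'

k = (f(b) - f(a)) / (g(b) - g(a))   # chord slope, = 3/7
c = 14 / 9                           # the point predicted by the theorem

print(abs(df(c) / dg(c) - k))        # prints a number at roundoff level
assert a < c < b                     # c lies strictly inside (a, b)
```

Note that g'(c) = 3c^2 ≠ 0 here, consistent with the argument in the proof below that g'(c) cannot vanish.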
Remark: Geometrically, if we regard t ↦ (x, y) = (f(t), g(t)) as a parametric curve in the plane, the theorem says that if we connect the endpoints of the curve with a straight line, there is some point on the curve where the tangent vector is parallel to this line.

Proof: Again, we construct a function that allows us to use Rolle's theorem. In this case we put

    k = \frac{f(b) - f(a)}{g(b) - g(a)},

and define

    φ(x) = f(x) - f(a) - k[g(x) - g(a)].

Then φ(a) = φ(b) = 0, and by Rolle's theorem, there exists a c such that

    φ'(c) = 0 = f'(c) - k g'(c).

Now g'(c) ≠ 0, since otherwise we'd have f'(c) = 0 as well, violating the hypothesis. So we must have k = f'(c)/g'(c), as claimed.

The primary utility of these mean value theorems lies in the proof of other interesting results. We give two examples:

Theorem (l'Hôpital's Rule): Suppose f and g are differentiable in an open interval containing x = a, and suppose that f(a) = g(a) = 0, where a is an isolated zero of g. Suppose also that \lim_{x \to a} f'(x)/g'(x) exists. Then

    \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}.

Proof: Choose x near a. By Cauchy's theorem, there exists c between a and x such that

    \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f(x)}{g(x)} = \frac{f'(c)}{g'(c)}.

Now as x → a, c → a, and since the limit on the right exists, the theorem follows. (The hypothesis can be weakened a bit, and there are other versions involving other indeterminate forms. See, e.g., Advanced Calculus by A. E. Taylor.)

Taylor's theorem: Let f and its first n derivatives be continuous on [a, b], and suppose f^{(n+1)} exists in (a, b). Then there exists c ∈ (a, b) such that

    f(b) = f(a) + f'(a)(b - a) + \frac{f''(a)}{2!}(b - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(b - a)^n + \frac{f^{(n+1)}(c)}{(n + 1)!}(b - a)^{n+1}.

(Note that if n = 0, this is the MVT; it's a generalization for n ≥ 1.)

Proof: We define two functions:

    F(x) = f(b) - f(x) - f'(x)(b - x) - \cdots - \frac{f^{(n)}(x)}{n!}(b - x)^n,

    G(x) = \frac{(b - x)^{n+1}}{(n + 1)!}.
Note that F(b) = G(b) = 0, and that both F and G satisfy the hypotheses of Cauchy's MVT. So there exists a c ∈ (a, b) such that

    \frac{F(b) - F(a)}{G(b) - G(a)} = \frac{F(a)}{G(a)} = \frac{F'(c)}{G'(c)}.

We just need to work this out:

    F'(x) = -f'(x) - [f''(x)(b - x) - f'(x)] - \cdots - \left[ \frac{f^{(n+1)}(x)}{n!}(b - x)^n - \frac{f^{(n)}(x)}{(n - 1)!}(b - x)^{n-1} \right] = -\frac{f^{(n+1)}(x)}{n!}(b - x)^n

(all the other terms cancel in pairs), and

    G'(x) = -\frac{(b - x)^n}{n!}.

Now, from the above,

    F(a) = \frac{F'(c)}{G'(c)} G(a) = f^{(n+1)}(c) \frac{(b - a)^{n+1}}{(n + 1)!},

which gives the result.

Remarks: This proof looks unmotivated, in the sense that it consists of a one-time trick used to obtain the result. Not very aesthetic, perhaps, but it gets us where we need to go....

We usually write this in the form

    f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1},

where the last term is called Lagrange's form of the remainder.

One final note: If the second derivative exists and is continuous, we have

    f(x) = f(a) + f'(a)(x - a) + \frac{f''(c)}{2!}(x - a)^2.

So in this case, the error function ε is seen to be explicitly quadratic in (x − a), as claimed before.
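To make the Lagrange remainder concrete, here is a small numerical sketch (not part of the notes): it expands e^x about a = 0, where every derivative is again e^x, and checks that the actual error of the degree-n Taylor polynomial stays below the bound the remainder term provides. The choice of f, x and n is illustrative only.

```python
import math

# Illustrative check of Taylor's theorem with Lagrange remainder for
# f(x) = e^x about a = 0. Since f^{(n+1)}(c) = e^c <= e^x for c in (0, x)
# with x > 0, the remainder is bounded by e^x * x^{n+1} / (n+1)!.

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x about a = 0."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x, n = 0.5, 4
actual_error = math.exp(x) - taylor_exp(x, n)
lagrange_bound = math.exp(x) * x**(n + 1) / math.factorial(n + 1)

# The actual error is positive (all omitted terms are positive for x > 0)
# and lies below the Lagrange bound.
print(actual_error, lagrange_bound)
```

Running this with other x > 0 and larger n shows the error shrinking like x^{n+1}/(n+1)!, exactly the behavior the remainder term predicts; the n = 1 case is the "explicitly quadratic" error discussed in the final note above.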