Principles of special relativity

University of Tsukuba, Summer 2012

Introduction. Here we shall discuss very succinctly the basic principles of special relativity. There are two reasons for doing this. First, when introducing optical devices, we saw that the electron-photon perturbation Hamiltonian is given by (see Lecture Notes, page 125, Eq. (253); page 281, Eq. (726)):
$$ H_{\rm phot} = \frac{ie\hbar}{mc}\,\mathbf{A}\cdot\nabla . \tag{A1} $$
This interaction term can be obtained by starting from the free-electron Hamiltonian $p^2/(2m)$ and replacing the electron momentum $\mathbf{p}$ with $\mathbf{p} - e\mathbf{A}/c$. (The factor of $c$ in the denominators appears when using Gaussian units, which are more convenient in this context and which we shall use here.) The reason behind this substitution relies on some basic principles of special relativity, and it is worth understanding how this comes about.

The second reason stems from the recent popularity of graphene as a promising material for nanoelectronic applications. The band structure of graphene shows a unique feature: At the Fermi level the dispersions of the valence and conduction bands cross as straight lines. This $E(\mathbf{k})$ relation resembles the dispersion of relativistic massless particles, and it has triggered interest in exploiting what is known about relativistic massless spin-1/2 particles in order to understand the electronic properties of graphene. Such particles are described by a spin-1/2 relativistic wave equation (the Dirac equation for zero-mass spin-1/2 fermions), so it is interesting to see how the Schrödinger equation can be generalized to satisfy the principle of relativistic invariance.

Galilean invariance and Maxwell's equations. As soon as Maxwell's equations were formulated, it became clear that there was a major difference with respect to Newton's laws. Let's start with Newton's second law,
$$ \mathbf{F} = m\mathbf{a} \quad\text{or}\quad \mathbf{F} = m\,\frac{d^2\mathbf{x}}{dt^2}, \tag{A2} $$

and consider this same equation as it could be written by somebody (called an "observer") who is moving with respect to us with uniform velocity $\mathbf{u}$. Let's assume that we and the other observer use reference frames with parallel $x$, $y$, and $z$ axes, that the other observer moves along the $x$ axis, so that $\mathbf{u} = (u, 0, 0)$, and that the origins of the two reference frames coincide at a fixed instant in time, which we take as $t = 0$. Then, calling $(x, y, z)$ our frame and $(x', y', z')$ the other observer's frame, an object located at a point $(x, y, z)$ in our frame will be located at a point $(x', y', z')$ in the observer's frame, such that:
$$ x' = x - ut, \qquad y' = y, \qquad z' = z. \tag{A3} $$
Therefore, expressing Newton's second law in the primed frame,
$$ \mathbf{F}' = m\,\frac{d^2\mathbf{x}'}{dt^2} = m\,\frac{d^2(\mathbf{x} - \mathbf{u}t)}{dt^2} = m\,\frac{d^2\mathbf{x}}{dt^2} - m\,\frac{d\mathbf{u}}{dt} = m\,\frac{d^2\mathbf{x}}{dt^2}, \tag{A4} $$
where the last step uses the fact that $\mathbf{u}$ is constant. In other words, Newton's law retains the same algebraic form in all frames which are moving with uniform velocity with respect to our frame. This principle can be generalized by saying that the laws of mechanics are valid in all inertial frames. An observer, by performing experiments, cannot tell whether he/she is moving with respect to other inertial frames. This is the principle of Galilean relativity, since it was proposed (noted, discovered, invented?... the choice of a term is a matter of deep philosophical discussions) by Galileo.

Consider now Maxwell's equations. For simplicity, let's just consider a wave equation:
$$ \left( \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} \right)\psi = 0. \tag{A5} $$
Let's apply the transformation Eq. (A3). Since the field expressed in the primed coordinates is $\psi'(x', y', z', t) = \psi(x' + ut, y', z', t)$,

so that $\partial\psi/\partial x = \partial\psi'/\partial x'$ and $\partial\psi/\partial t = \partial\psi'/\partial t - u\,\partial\psi'/\partial x'$, we find:
$$ \left( \nabla'^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \frac{2u}{c^2}\frac{\partial^2}{\partial x'\,\partial t} - \frac{u^2}{c^2}\frac{\partial^2}{\partial x'^2} \right)\psi' = 0. \tag{A6} $$
What a mess! The form of the equation has been completely altered by the transformation! In hindsight, we already knew it had to be so: After all, magnetic fields caused by moving charges must disappear when we use a frame in which the charges are at rest. Therefore, the $\mathbf{E}$ and $\mathbf{B}$ fields do not transform correctly under the Galilean transformation Eq. (A3). Moreover, the Lorentz force depends explicitly on the velocity of the particle, so that the form of the equation will differ in a different inertial frame. Historically, this is also related to the difficulty of understanding electromagnetic waves: Sound waves are oscillations of the medium in which they propagate. But electromagnetic waves are oscillations of what?
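The loss of form invariance can be checked numerically. The sketch below is an illustration only, under stated assumptions: units with c = 1, a hypothetical frame velocity u = 0.3, and a Gaussian pulse g(x - ct) chosen as the test solution (none of these values come from the notes). It rewrites the pulse in the primed coordinate via x = x' + ut and compares the original wave operator with the chain-rule-transformed operator, whose extra terms are +(2u/c²) ∂²/∂x'∂t and -(u²/c²) ∂²/∂x'².

```python
import math

C = 1.0    # wave speed; units with c = 1 (assumption for illustration)
U = 0.3    # relative frame velocity (hypothetical value)
H = 1e-3   # finite-difference step


def psi_primed(xp, t):
    """Right-moving pulse g(x - ct) expressed through x = x' + ut."""
    s = xp + U * t - C * t
    return math.exp(-s * s)


def d2_dx2(f, x, t):
    """Central second difference in the spatial coordinate."""
    return (f(x + H, t) - 2.0 * f(x, t) + f(x - H, t)) / H**2


def d2_dt2(f, x, t):
    """Central second difference in time."""
    return (f(x, t + H) - 2.0 * f(x, t) + f(x, t - H)) / H**2


def d2_dxdt(f, x, t):
    """Central mixed second difference."""
    return (f(x + H, t + H) - f(x + H, t - H)
            - f(x - H, t + H) + f(x - H, t - H)) / (4.0 * H**2)


def naive_residual(xp, t):
    """Original wave operator applied, unchanged, in the primed frame."""
    return d2_dx2(psi_primed, xp, t) - d2_dt2(psi_primed, xp, t) / C**2


def transformed_residual(xp, t):
    """Wave operator with the two extra Galilean terms included."""
    return (d2_dx2(psi_primed, xp, t)
            - d2_dt2(psi_primed, xp, t) / C**2
            + 2.0 * U / C**2 * d2_dxdt(psi_primed, xp, t)
            - U**2 / C**2 * d2_dx2(psi_primed, xp, t))


if __name__ == "__main__":
    print("naive operator residual      :", naive_residual(0.5, 0.2))
    print("transformed operator residual:", transformed_residual(0.5, 0.2))
```

The first residual is of order one while the second vanishes up to discretization error, which is the statement that the wave equation does not keep its form under Eq. (A3).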

In order to fix the situation we have three alternatives we can choose from:
1. Maxwell's equations are wrong. The correct equations, yet to be discovered, are invariant under Galilean transformations.
2. Galilean invariance is valid for mechanics, not for electromagnetism. This is the historical solution before Einstein: The ether determines the existence of the absolute frame in which the ether is at rest and Maxwell's equations hold.
3. Galilean invariance is wrong. There is a more general invariance, yet to be discovered, which preserves the form of Maxwell's equations. Classical mechanics is incorrect and must be reformulated so that it is invariant under this new transformation.
Having to choose between trashing Maxwell (option 1) or Newton (option 3), physicists chose the easier option 2. Einstein, instead, decided to follow the third option, guided by two postulates:
1. Postulate of relativity: All physical laws must look the same in all frames moving with uniform velocity with respect to each other.
2. Postulate of the constancy of the speed of light: The speed of light is the same (numerically the same!) independent of the velocity of the observer or of its source. This stems logically from the Michelson-Morley experiment of 1887, but the result could have been explained by saving the ether and using the Lorentz-FitzGerald contraction.
Armed with these postulates, Einstein set out to build a new set of transformations between inertial frames. Maxwell's equations are now invariant under this new set of transformations, but Newtonian mechanics has to be modified: If two frames move at a relative speed much smaller than the speed of light, the new transformations approach the usual Galilean transformation and Newton is approximately correct. But for relative velocities approaching the speed of light, the laws of mechanics deviate enormously from Newton's laws. It is impossible to do justice to special relativity in such a short time.
We shall only discuss those few concepts which are required to answer our original question of why we perform the substitution $\mathbf{p} \rightarrow \mathbf{p} - (e/c)\mathbf{A}$.

Lorentz transformations. Consider the same two frames (the "primed" $K'$ and "unprimed" $K$ frames) considered above. Assume now that at $t = 0$ (the time at which the origins of the two frames coincided... at least when looking at our clock...!) a ray of light was emitted from the origin. Since light travels at the same speed in both frames, we must have for the

wavefronts of the emitted light:
$$ c^2t^2 - x^2 - y^2 - z^2 = 0 \quad\text{and}\quad c^2t'^2 - x'^2 - y'^2 - z'^2 = 0, \tag{A7} $$
so that, assuming that space-time is isotropic and homogeneous,
$$ c^2t'^2 - x'^2 - y'^2 - z'^2 = \lambda(u)\,[\,c^2t^2 - x^2 - y^2 - z^2\,]. \tag{A8} $$
The function $\lambda(u)$ is a possible velocity-dependent change of scale between the two frames. However, since going from $K'$ to $K$ must involve the same transformation as going from $K$ to $K'$ with a sign-flip for $u$, we must have $\lambda = 1$. Using the notation $x^0 = ct$, $x^1 = x$, $x^2 = y$, and $x^3 = z$, Eq. (A8) is satisfied if:
$$ x'^0 = \gamma\,(x^0 - \beta x^1), \qquad x'^1 = \gamma\,(x^1 - \beta x^0), \qquad x'^2 = x^2, \qquad x'^3 = x^3, \tag{A9} $$
where
$$ \beta = \frac{u}{c} \quad\text{and}\quad \gamma = (1 - \beta^2)^{-1/2}. \tag{A10} $$
The transformations Eq. (A9) are called Lorentz transformations. Obviously the inverse transformations read:
$$ x^0 = \gamma\,(x'^0 + \beta x'^1), \qquad x^1 = \gamma\,(x'^1 + \beta x'^0), \qquad x^2 = x'^2, \qquad x^3 = x'^3. \tag{A11} $$
Note that, unlike the Galilean transformations, now time and space transform together: Simultaneous events in one frame will not be simultaneous in another frame. Indeed, consider two events $(ct_1, \mathbf{x}_1)$ and $(ct_2, \mathbf{x}_2)$.
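The boost of Eq. (A9) and its inverse, Eq. (A11), can be sketched in a few lines. This is a minimal illustration restricted to the (x⁰, x¹) components; the function names and the sample value β = 0.6 are hypothetical choices, not part of the notes. It shows that the combination (x⁰)² - (x¹)² is preserved and that two events simultaneous in K are not simultaneous in K'.

```python
import math


def gamma(beta):
    """Lorentz factor of Eq. (A10)."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def boost(x0, x1, beta):
    """Eq. (A9): components in K' of an event (x0, x1) given in K."""
    g = gamma(beta)
    return g * (x0 - beta * x1), g * (x1 - beta * x0)


def inverse_boost(x0p, x1p, beta):
    """Eq. (A11): the same transformation with the sign of beta flipped."""
    return boost(x0p, x1p, -beta)


def interval(x0, x1):
    """Invariant combination (x0)^2 - (x1)^2 appearing in Eq. (A8)."""
    return x0 * x0 - x1 * x1


if __name__ == "__main__":
    beta = 0.6                       # hypothetical relative velocity u/c
    x0p, x1p = boost(5.0, 3.0, beta)
    print("boosted event:", (x0p, x1p))
    print("interval in K :", interval(5.0, 3.0))
    print("interval in K':", interval(x0p, x1p))
    # Events (0, 0) and (0, 1) are simultaneous in K but not in K'.
    print("x'0 of second event:", boost(0.0, 1.0, beta)[0])
```

The last line prints a nonzero time coordinate for an event that had x⁰ = 0 in K, which is the relativity of simultaneity mentioned above.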

Thanks to Eq. (A8), their distance
$$ s_{12}^2 = c^2(t_1 - t_2)^2 - |\mathbf{x}_1 - \mathbf{x}_2|^2 \tag{A12} $$
is an invariant (that is, it is the same in all inertial frames). If $s_{12}^2 > 0$ the separation between the events is said to be "time-like": It is always possible to find a transformation (actually with $\beta = |\mathbf{x}_1 - \mathbf{x}_2|/(c\,|t_1 - t_2|)$) such that in the transformed frame the two events are at the same spatial location, separated only by time. If $s_{12}^2 < 0$ the separation between the events is said to be "space-like": It is always possible to find a Lorentz transformation such that in the transformed frame the two events are simultaneous, separated only spatially. If, finally, $s_{12}^2 = 0$, the separation is said to be "light-like": One event lies on the light-cone of the other.

Proper time and time dilatation. Since time has become an observer-dependent quantity, when dealing with moving particles it is convenient to consider the time in the frame in which the particle is at rest. If $\mathbf{v}(t)$ is the velocity of the moving particle in our frame, let's consider the invariant:
$$ ds^2 = dx_\mu dx^\mu = c^2dt^2 - dx^2 - dy^2 - dz^2 = c^2dt^2 - v^2dt^2 = c^2(1 - v^2/c^2)\,dt^2 = c^2dt^2/\gamma^2. \tag{A13} $$
Since this quantity is invariant (that is, it is numerically the same in all inertial frames), in the frame in which the particle is at rest it will be $ds^2 = c^2d\tau^2$, where $\tau$ is the time in that frame. This time is called "proper time". If the particle travels over a time interval $\tau_2 - \tau_1$ in its proper time, as seen by us the time interval will stretch to the interval $t_2 - t_1$ obtained by integrating Eq. (A13) along the particle trajectory:
$$ t_2 - t_1 = \int_{\tau_1}^{\tau_2} \gamma(\tau)\,d\tau = \int_{\tau_1}^{\tau_2} \frac{d\tau}{\sqrt{1 - v(\tau)^2/c^2}}. \tag{A14} $$
Since $\gamma > 1$ for a moving particle, the time interval we observe, $t_2 - t_1$, is longer than the proper time interval $\tau_2 - \tau_1$. This has been experimentally verified: The µ-mesons (actually, leptons) produced by cosmic-ray hits in the upper atmosphere often reach the ground.
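Eq. (A14) is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the speeds used in the usage lines are arbitrary sample values, and the integral is approximated with a simple midpoint rule rather than any particular method from the notes.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma_factor(v):
    """Lorentz factor for a speed v given in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def lab_time(proper_time, v):
    """Eq. (A14) for a constant speed v: t2 - t1 = gamma * (tau2 - tau1)."""
    return gamma_factor(v) * proper_time


def lab_time_variable(v_of_tau, tau1, tau2, steps=10_000):
    """Eq. (A14) for a speed depending on proper time (midpoint rule)."""
    dtau = (tau2 - tau1) / steps
    total = 0.0
    for i in range(steps):
        tau_mid = tau1 + (i + 0.5) * dtau
        total += gamma_factor(v_of_tau(tau_mid)) * dtau
    return total


if __name__ == "__main__":
    # One second of proper time at 60% of the speed of light.
    print(lab_time(1.0, 0.6 * C))
    # A particle speeding up linearly from rest to 0.9c (hypothetical profile).
    print(lab_time_variable(lambda tau: 0.9 * C * tau, 0.0, 1.0))
```

For a constant speed both functions agree, and the observed interval is never shorter than the proper-time interval.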
Since the lifetime of the µ-meson is about 2.2 µs, even at the speed of

light the particle could not travel more than about 660 m before decaying. Yet, these particles can easily be detected after having traveled distances more than two orders of magnitude longer (the thickness of the atmosphere, of the order of $10^5$ m). This is because their lifetime, as observed by us, is stretched enormously, as these particles travel at speeds approaching the speed of light.

Lorentz contraction. Consider a rod of length $L'$ at rest in the $K'$ frame. Let the ends of the rod be at $x' = x'_1$ and $x' = x'_2$, so that $L' = x'_2 - x'_1$. What is the length $L$ of the rod in our $K$ frame? By the Lorentz transformations, Eq. (A9) (which we must use since, when we measure the length of the rod, we measure its ends at the same time in our frame), we have:
$$ L = x_2 - x_1 = \frac{1}{\gamma}\,(x'_2 - x'_1) = \frac{L'}{\gamma} < L'. \tag{A15} $$
The rod in our frame appears shorter than in its rest frame. This is the Lorentz-FitzGerald contraction, which was postulated (without proof or supporting arguments) in order to explain the Michelson-Morley experiment.

Addition of velocities. Let's consider the Lorentz transformation
$$ dx^0 = \gamma\,(dx'^0 + \beta\,dx'^1), \qquad dx^1 = \gamma\,(dx'^1 + \beta\,dx'^0), \qquad dx^2 = dx'^2, \qquad dx^3 = dx'^3. \tag{A16} $$
In our $K$ frame, the velocity $v = c\,dx^1/dx^0$ of a particle moving with velocity $v' = c\,dx'^1/dx'^0$ in the $K'$ frame will be:
$$ v = c\,\frac{dx^1}{dx^0} = c\,\frac{dx'^1 + \beta\,dx'^0}{dx'^0 + \beta\,dx'^1} = c\,\frac{dx'^0\,(dx'^1/dx'^0 + \beta)}{dx'^0\,(1 + \beta\,dx'^1/dx'^0)} = \frac{v' + u}{1 + v'u/c^2}. \tag{A17} $$
Note how the particle velocity $v'$ and the frame velocity $u$ add, as seen by us: In the limit of small velocities, $v \approx v' + u$, as usual. But as either $v'$ or $u$ approaches $c$, the denominator grows and the sum-velocity $v$ as seen

by us cannot exceed $c$. This is consistent with the second postulate. Note that the 4-vector
$$ U^\mu = (\gamma c, \gamma\mathbf{u}) \tag{A18} $$
transforms like the coordinate vector $x^\mu$. Note that we shall use both the contravariant (e.g., $x^\mu$) and covariant (e.g., $x_\mu = g_{\mu\nu}x^\nu$) notation. Using the metric $(+,-,-,-)$ for $\mu = 0, \ldots, 3$ we have $x^\mu = (x^0, \mathbf{x})$ but $x_\mu = (x^0, -\mathbf{x})$. Thus $x^\mu x_\mu = (x^0)^2 - \mathbf{x}\cdot\mathbf{x}$.

4-momentum. In classical mechanics the momentum and energy of a particle are:
$$ E = E(0) + \frac{1}{2}mu^2, \qquad \mathbf{p} = m\mathbf{u}. \tag{A19} $$
The term $E(0)$ is a constant which refers to the rest-energy of the particle. It is usually ignored in non-relativistic discussions. In order to generalize these concepts, we can start by looking for arbitrary functions of the velocity, $\mathcal{E}(u)$ and $\mathcal{M}(u)$, such that:
$$ E = \mathcal{E}(u), \qquad \mathbf{p} = \mathcal{M}(u)\,\mathbf{u}, \tag{A20} $$
with the constraints (dictated by the fact that we want to recover Eq. (A19) in the limit $u \rightarrow 0$):
$$ \frac{\partial\mathcal{E}}{\partial u^2}(0) = \frac{m}{2}, \qquad \mathcal{M}(0) = m. \tag{A21} $$
The general expressions for the functions $\mathcal{E}$ and $\mathcal{M}$ can be obtained by analyzing the elastic collision of two identical particles and requiring momentum and energy conservation in two inertial frames $K$ and $K'$. We'll skip the derivation (see Jackson, Classical Electrodynamics, Sec. 11.5) and simply state the result: The general form

for these two functions consistent with the two postulates, and with energy and momentum conservation, is:
$$ \mathcal{E}(u) = \gamma mc^2, \qquad \mathcal{M}(u) = \gamma m. \tag{A22} $$
Note that in the limit $u \rightarrow 0$, $E = \gamma mc^2 \approx mc^2 + mu^2/2$, which is the classical result with a rest energy $mc^2$. From this we can define an energy-momentum 4-vector, $p^\mu$:
$$ p^\mu = (\gamma mc, \gamma m\mathbf{u}) = (E/c, \gamma m\mathbf{u}) = mU^\mu, \tag{A23} $$
having used Eq. (A18) in the last step. The invariant "length" of the energy-momentum 4-vector is
$$ p^\mu p_\mu = (p^0)^2 - \mathbf{p}^2 = E^2/c^2 - \gamma^2m^2u^2 = \gamma^2(m^2c^2 - m^2u^2) = m^2c^2. \tag{A24} $$
Finally, from this expression we can write the energy $E$ of a particle as:
$$ E = \sqrt{c^2p^2 + m^2c^4}. \tag{A25} $$

Lorentz force. Our goal now is to re-express the Lorentz force (recall that we are using Gaussian units here):
$$ \frac{d\mathbf{p}}{dt} = e\left( \mathbf{E} + \frac{\mathbf{u}}{c}\times\mathbf{B} \right) \tag{A26} $$
in a manifestly covariant form and to find a Hamiltonian from which Eq. (A26) may be derived. First, let's re-write Eq. (A26) in terms of the proper time $\tau$ and add to it the equation expressing the

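power balance, $dp^0/d\tau = (e/c)\,\mathbf{U}\cdot\mathbf{E}$, so that we can employ the 4-vector $p^\mu$:
$$ \frac{d\mathbf{p}}{d\tau} = \gamma\,\frac{d\mathbf{p}}{dt} = e\gamma\mathbf{E} + e\gamma\,\frac{\mathbf{u}}{c}\times\mathbf{B} = \frac{e}{c}\,(U^0\mathbf{E} + \mathbf{U}\times\mathbf{B}), \qquad \frac{dp^0}{d\tau} = \gamma\,\frac{dp^0}{dt} = \gamma\,\frac{e}{c}\,\mathbf{u}\cdot\mathbf{E} = \frac{e}{c}\,\mathbf{U}\cdot\mathbf{E}. \tag{A27} $$
Defining the electromagnetic field tensor:
$$ F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}, \tag{A28} $$
we can write Eq. (A27) in the manifestly covariant form:
$$ \frac{dp^\mu}{d\tau} = \frac{e}{c}\,F^{\mu\nu}U_\nu. \tag{A29} $$
We must now find a Hamiltonian function whose dynamic equations (the Hamilton equations of motion) yield the Lorentz-force equation. To do this correctly it would be necessary to develop a bit of Lagrangian theory. So, here we follow a "pragmatic approach": Let's define the Hamiltonian:
$$ H = \sqrt{c^2\,[\mathbf{p} - (e/c)\mathbf{A}]^2 + m^2c^4} + e\Phi, \tag{A30} $$
where $\Phi$ is the scalar potential. Considering that the particle velocity in terms of $\mathbf{p}$ and $\mathbf{A}$ is (as it follows from Lagrangian theory):
$$ \mathbf{u} = \frac{c\,(c\mathbf{p} - e\mathbf{A})}{\sqrt{(c\mathbf{p} - e\mathbf{A})^2 + m^2c^4}}, \tag{A31} $$
with some algebra one can verify that the Hamilton equations of motion ($i = 1, 2, 3$):
$$ -\frac{\partial H}{\partial x_i} = \frac{dp_i}{dt} \tag{A32} $$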

are equivalent to the Lorentz-force equation (the first of the two Eqns. (A27)). Comparing the Hamiltonian, Eq. (A30), with Eq. (A25) we see that the interaction between a charged particle and the electromagnetic field has been accounted for by replacing the particle momentum $\mathbf{p}$ with $\mathbf{p} - (e/c)\mathbf{A}$, which is what we wanted to show.

Relativistic wave equations. One possible way to "derive" (wrong term, but let's ignore deep philosophical discussions!) Schrödinger's equation is to write the energy-momentum dispersion for a free particle of mass $m$,
$$ E = \frac{p^2}{2m}, \tag{A33} $$
set $E \rightarrow i\hbar\,\partial/\partial t$, $\mathbf{p} \rightarrow -i\hbar\nabla$, and view Eq. (A33) as a relation between operators acting on a vector $\psi$ in some Hilbert space $\mathcal{H}$. Thus:
$$ E = \frac{p^2}{2m} \;\rightarrow\; -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r},t) = i\hbar\,\frac{\partial}{\partial t}\psi(\mathbf{r},t), \tag{A34} $$
which is the wave equation for non-relativistic particles. Searching for a relativistic wave equation, we should start by replacing the non-relativistic energy-momentum relation given by Eq. (A33) with its relativistic counterpart given by Eq. (A25). We immediately encounter a big problem: Eq. (A25) contains a square root. As innocent as this may seem, viewing this equation in terms of operators would yield a mathematically very nasty outcome: Taking the square root of a linear differential operator yields a nonlocal, unbounded (so, not continuous in the operator sense) operator.

As a first, alternative approach, let's avoid this square-root issue by considering the square of Eq. (A25):
$$ E^2 = c^2p^2 + m^2c^4 \;\rightarrow\; -\hbar^2\,\frac{\partial^2}{\partial t^2}\phi(\mathbf{r},t) = -c^2\hbar^2\nabla^2\phi(\mathbf{r},t) + m^2c^4\phi(\mathbf{r},t), \tag{A35} $$
or
$$ (c^2\hbar^2\,\Box + m^2c^4)\,\phi(\mathbf{r},t) = 0, \tag{A36} $$
where the d'Alembert operator is defined as $\Box = (1/c^2)\,\partial^2/\partial t^2 - \nabla^2$ and $\phi$ represents the relativistic state vector. This is called the Klein-Gordon equation. In relativistic quantum mechanics it is customary to use units

in which $c = \hbar = 1$, so the Klein-Gordon equation, Eq. (A36), is written simply as:
$$ (\Box + m^2)\,\phi(x) = [\partial_\mu\partial^\mu + m^2]\,\phi(x) = 0, \tag{A37} $$
where $x$ is the 4-vector $(\mathbf{r}, t)$, inner products have the metric $(-,-,-,+)$, and $\Box$ takes the form $\partial^2/\partial t^2 - \nabla^2 = \partial_\mu\partial^\mu$, the latter form having been written using the notation $\partial_\mu = \partial/\partial x^\mu$ and also the usual relativistic convention according to which the index $\mu$ runs over the 4 dimensions and the sum is implied over repeated indices.

The Klein-Gordon equation is relativistically covariant, as desired, but its interpretation as a wave equation for a single particle (so, not as a second-quantization field equation) is problematic for two major reasons. First, plane-wave solutions of Eq. (A37) are of the form $\phi(\mathbf{r},t) = \exp[i(\mathbf{k}\cdot\mathbf{r} - Et)]$ with both positive and negative energies, $E = \pm\sqrt{k^2 + m^2}$. One may attempt to interpret the negative-energy waves as anti-particles of the positive-energy solutions, but this interpretation encounters problems as soon as interactions are introduced. For example, adding the interaction with the electromagnetic field, $p^\mu \rightarrow p^\mu - eA^\mu$ (that is, $\partial_\mu \rightarrow \partial_\mu + ieA_\mu$), we get an equation which admits transitions between negative- and positive-energy solutions. This problem affects all relativistic first-quantization formulations and can be bypassed only in second quantization.

More serious is a second problem specific to the Klein-Gordon equation: The charge-current 4-vector
$$ j^\mu(x) = i\,[\,\phi^*(x)\,\partial^\mu\phi(x) - (\partial^\mu\phi^*(x))\,\phi(x)\,], \tag{A38} $$
obeys the conservation law
$$ \partial_\mu j^\mu(x) = 0, \tag{A39} $$
as it follows directly from Eq. (A37), but attempting to identify $\rho(x) = j^0(x) = i\,[\,\phi^*(x)\,\partial_t\phi(x) - (\partial_t\phi^*(x))\,\phi(x)\,]$ as a probability density yields negative values. For example, for an eigenstate of energy $E$, $\rho(x) = 2E\,\phi^*(x)\phi(x)$, which is negative for negative $E$. One may simply ignore negative-energy solutions, but, yet again, interactions can induce transitions to those and the interpretation becomes really problematic.

A second approach is quite different. We saw that squaring Eq.
(A25) results in negative-energy solutions and negative probability densities. On the other hand, retaining the square root is not an option, since it results in

nonlocal, unbounded operators. But looking at the expression
$$ E^2 = p^2 + m^2 \tag{A40} $$
one is reminded of the factorization of the form $|A|^2 = \alpha^2 + \beta^2$ as $AA^*$, where $A = \alpha + i\beta$. This looks like a "clean" way to take some sort of square root. The quantity $i$ (the imaginary unit) does the trick when we consider the two-dimensional form $AA^* = \alpha^2 + \beta^2$, but we need other mathematical objects in the four-dimensional case we are considering here. We require $E^2 = m^2 + p^2$ to be expressed in a form of the type $(\beta m + \boldsymbol{\alpha}\cdot\mathbf{p})(\beta m + \boldsymbol{\alpha}\cdot\mathbf{p})$, where $\boldsymbol{\alpha}$ and $\beta$ are mathematical entities yet to be determined. The relation we just wrote can be satisfied only if $\beta\alpha_i + \alpha_i\beta = 0$ and $\alpha_i\alpha_j + \alpha_j\alpha_i = 0$ for $i \neq j$ (so that all the unwanted cross-terms vanish) and $\alpha_i^2 = \beta^2 = 1$. This implies that the quantities $\beta$ and $\boldsymbol{\alpha}$ must be matrices. It turns out that these quantities (with algebraic properties related to the so-called "quaternions") have to be $4\times 4$ matrices. In terms of $\gamma^0 = \beta$ and $\gamma^i = \beta\alpha_i$ they can be represented through the Pauli matrices as follows:
$$ \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ -\sigma_i & 0 \end{pmatrix}, \qquad \gamma^0 = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}, \tag{A41} $$
where $i = 1, \ldots, 3$, $I$ is the $2\times 2$ identity matrix and $\sigma_i$ are the Pauli matrices:
$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{A42} $$
In the literature one may find several different conventions regarding the definitions of the Dirac matrices, the difference originating from the different conventions used for the metric: A 4-vector $x$ can be defined as $(x^0, \mathbf{x})$ with $(+,-,-,-)$ metric, as done here. Alternatively, $x$ can be defined as $(\mathbf{x}, x_4) = (\mathbf{x}, ix_0)$ with purely imaginary $x_4$ and Euclidean metric $(+,+,+,+)$. Note that these matrices obey the anti-commutation laws:
$$ [\gamma^\mu, \gamma^\nu]_+ = 2\,g^{\mu\nu}, \tag{A43} $$
where $g^{\mu\nu} = g_{\mu\nu}$ is the diagonal metric tensor $(+,-,-,-)$. To see that these matrices accomplish what we

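set out to do, let's write the wave equation
$$ (-i\gamma^\mu\partial_\mu + m)\,\psi(x) = 0 \quad \left(\text{or } \left(-i\gamma^\mu\partial_\mu + \frac{mc}{\hbar}\right)\psi(x) = 0 \text{ with units restored, } x^0 = ct\right), \tag{A44} $$
where $\psi$ is now a 4-component spinor $\psi_\alpha$, with $\alpha = 0, \ldots, 3$. This index can be viewed as combining a "sign-index", labeling the sign of the energy (positive for electrons, negative for positrons, in Dirac's original interpretation), and a spin-index for the spin-1/2 case. If we multiply the Dirac equation, Eq. (A44), by $(-i\gamma^\lambda\partial_\lambda - m)$, we have
$$ (-i\gamma^\lambda\partial_\lambda - m)(-i\gamma^\mu\partial_\mu + m)\,\psi(x) = (-\gamma^\lambda\gamma^\mu\partial_\lambda\partial_\mu - m^2)\,\psi(x) = 0. \tag{A45} $$
Now, expressing $\gamma^\lambda\gamma^\mu\partial_\lambda\partial_\mu$ as 1/2 the sum of this quantity and itself, swapping the names of the dummy indices $\lambda$ and $\mu$ in the second term, noticing that $\partial_\lambda\partial_\mu = \partial_\mu\partial_\lambda$ and using the anticommutation properties, Eq. (A43), we can write this (after an overall change of sign) as:
$$ \left[ \frac{1}{2}\,(\gamma^\lambda\gamma^\mu\partial_\lambda\partial_\mu + \gamma^\mu\gamma^\lambda\partial_\lambda\partial_\mu) + m^2 \right]\psi(x) = \left( \frac{1}{2}\,[\gamma^\lambda, \gamma^\mu]_+\,\partial_\lambda\partial_\mu + m^2 \right)\psi(x) = (\partial_\mu\partial^\mu + m^2)\,\psi(x) = 0, \tag{A46} $$
which is the Klein-Gordon equation. So, we have effectively "squared" the dispersion, and Dirac's equation can be viewed as a sort of "square root" of the Klein-Gordon equation.

The plane-wave solutions of Dirac's equation can be obtained as follows. Let's express the desired solution in the form
$$ \psi(x) = w(\mathbf{k},E)\,e^{i(\mathbf{k}\cdot\mathbf{r} - Et)}, \tag{A47} $$
where $w(\mathbf{k},E)$ is a four-component spinor which satisfies the equation
$$ (-\gamma^0E + \boldsymbol{\gamma}\cdot\mathbf{k} + m)\,w(\mathbf{k},E) = 0. \tag{A48} $$
Let us now write $w(\mathbf{k},E)$ in terms of two two-component objects $w_+$ and $w_-$:
$$ w(\mathbf{k},E) = \begin{pmatrix} w_+(\mathbf{k},E) \\ w_-(\mathbf{k},E) \end{pmatrix}. \tag{A49} $$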
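The anticommutation relations of Eq. (A43) can be verified by brute force. The sketch below is an illustration that assumes the standard Dirac representation, γ⁰ = diag(I, -I) and γⁱ = [[0, σᵢ], [-σᵢ, 0]], and the metric (+,-,-,-); the helper names are hypothetical, and plain nested lists are used so that no external library is needed.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(s, m):
    """Multiply every entry of matrix m by the scalar s."""
    return [[s * x for x in row] for row in m]


def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [
    [[0, 1], [1, 0]],       # sigma_1
    [[0, -1j], [1j, 0]],    # sigma_2
    [[1, 0], [0, -1]],      # sigma_3
]

# GAMMA[0] is gamma^0; GAMMA[1..3] are gamma^1..gamma^3 (Dirac representation).
GAMMA = [block(I2, Z2, Z2, scale(-1, I2))] + [
    block(Z2, s, scale(-1, s), Z2) for s in SIGMA
]
METRIC = [1, -1, -1, -1]    # diagonal of g^{mu nu}


def anticommutator(a, b):
    """Return ab + ba for two 4x4 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]


def satisfies_clifford(tol=1e-12):
    """Check [gamma^mu, gamma^nu]_+ = 2 g^{mu nu} times the identity."""
    for mu in range(4):
        for nu in range(4):
            expected = 2 * METRIC[mu] if mu == nu else 0
            ac = anticommutator(GAMMA[mu], GAMMA[nu])
            for i in range(4):
                for j in range(4):
                    target = expected if i == j else 0
                    if abs(ac[i][j] - target) > tol:
                        return False
    return True


if __name__ == "__main__":
    print("Eq. (A43) satisfied:", satisfies_clifford())
```

This is exactly the property that makes the cross terms cancel when the Dirac operator is squared into the Klein-Gordon operator.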

The quantities $w_\pm$ must satisfy the two coupled two-component equations:
$$ \boldsymbol{\sigma}\cdot\mathbf{k}\,w_- = (E - m)\,w_+, \qquad \boldsymbol{\sigma}\cdot\mathbf{k}\,w_+ = (E + m)\,w_-. \tag{A50} $$
For $m \neq 0$ it is convenient to employ the rest frame of the particle. We have either
$$ E = m, \qquad w_+ \neq 0, \quad w_- = 0, \tag{A51} $$
or
$$ E = -m, \qquad w_+ = 0, \quad w_- \neq 0. \tag{A52} $$
The nonzero component $w_\pm$ can be chosen arbitrarily, so it is convenient to choose eigenstates of $\sigma_z$. Thus we have the four solutions:
$$ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \tag{A53} $$
The first two solutions correspond to particles of energy $E = m$ with spin up and down, respectively, while the latter two solutions correspond to particles of energy $E = -m$. The existence of negative-energy solutions constitutes the problem which, as mentioned above, affects all first-quantization relativistic formulations. Dirac suggested that these solutions should be interpreted as anti-particles (i.e., positrons in the case of electrons). Note how spin, introduced as an ad hoc new degree of freedom in conventional Quantum Mechanics, now emerges naturally from the formulation of a relativistic wave equation. This constitutes a major success of Dirac's theory.

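For arbitrary $\mathbf{k}$ the solutions are:
$$ u_{\mathbf{k},\sigma} = w_\sigma(\mathbf{k}, E_k) = \left( \frac{E_k + m}{2m} \right)^{1/2} \begin{pmatrix} \xi_\sigma \\ \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{E_k + m}\,\xi_\sigma \end{pmatrix}, \tag{A54} $$
(for $\sigma = 1, 2$) and
$$ v_{\mathbf{k},\sigma} = w_\sigma(-\mathbf{k}, -E_k) = \left( \frac{E_k + m}{2m} \right)^{1/2} \begin{pmatrix} \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{E_k + m}\,\xi_\sigma \\ \xi_\sigma \end{pmatrix}. \tag{A55} $$
In these expressions $\xi_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\xi_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Considering now the case $m = 0$ (as for the early theories of neutrinos, or looking at the analogy with the dispersion in graphene), we can choose the 4 independent 4-component spinors (in place of Eqns. (A54) and (A55)):
$$ u_{\mathbf{k},\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} \zeta_\sigma \\ \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{k}\,\zeta_\sigma \end{pmatrix}, \tag{A56} $$
(for $\sigma = 1, 2$) and
$$ v_{\mathbf{k},\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{k}\,\zeta_\sigma \\ \zeta_\sigma \end{pmatrix}, \tag{A57} $$
corresponding to the positive- and negative-energy dispersions $E = \pm k$ on the light cone. The 2-component spinors $\zeta_\sigma$ are now chosen not as eigenstates of $\sigma_z$, but as eigenstates of the helicity $\boldsymbol{\sigma}\cdot\mathbf{k}/k$:
$$ \boldsymbol{\sigma}\cdot\mathbf{n}\,\zeta_1 = \zeta_1, \qquad \boldsymbol{\sigma}\cdot\mathbf{n}\,\zeta_2 = -\zeta_2, \tag{A58} $$
where $\mathbf{n} = \mathbf{k}/k$ is the unit vector along the direction of motion. This is rendered necessary by a profound difference between massless and massive particles: For massive particles one is free to select arbitrarily the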

rest-frame form of $w_\pm$ in Eqns. (A51)-(A55), since one can choose to define the polarization in the rest frame along any arbitrary direction. On the contrary, for massless particles traveling at the speed of light there is no such thing as a rest frame, and the polarization must be measured along the direction of motion, $\mathbf{k}/k$.
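As a consistency check, the massive positive-energy spinors of Eq. (A54) can be inserted into the coupled equations (A50). The sketch below is an illustration only: it works in units c = ħ = 1, the function names are hypothetical, and the sample momentum and mass in the usage lines are arbitrary values not taken from the notes.

```python
import math


def sigma_dot(k):
    """2x2 matrix sigma . k built from the Pauli matrices of Eq. (A42)."""
    kx, ky, kz = k
    return [[kz, kx - 1j * ky], [kx + 1j * ky, -kz]]


def apply(m, v):
    """Apply a 2x2 matrix to a 2-component spinor."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def positive_energy_spinor(k, m, xi):
    """Return (E_k, w_plus, w_minus) for the spinor u_{k,sigma} of Eq. (A54)."""
    e = math.sqrt(sum(c * c for c in k) + m * m)
    norm = math.sqrt((e + m) / (2.0 * m))
    sk_xi = apply(sigma_dot(k), xi)
    w_plus = [norm * c for c in xi]
    w_minus = [norm * c / (e + m) for c in sk_xi]
    return e, w_plus, w_minus


def residual_a50(k, m, e, w_plus, w_minus):
    """Largest violation of the two equations (A50)."""
    first = [a - (e - m) * b
             for a, b in zip(apply(sigma_dot(k), w_minus), w_plus)]
    second = [a - (e + m) * b
              for a, b in zip(apply(sigma_dot(k), w_plus), w_minus)]
    return max(abs(c) for c in first + second)


if __name__ == "__main__":
    k = (0.3, -0.4, 1.2)   # arbitrary sample momentum
    m = 1.0                # arbitrary sample mass
    for xi in ([1, 0], [0, 1]):
        e, w_plus, w_minus = positive_energy_spinor(k, m, xi)
        print("xi =", xi, " residual =", residual_a50(k, m, e, w_plus, w_minus))
```

The residuals vanish to rounding error because $(\boldsymbol{\sigma}\cdot\mathbf{k})^2 = k^2 = E_k^2 - m^2$, which is the algebraic identity behind Eq. (A54).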