Find an iPhone, then flip it face upwards (hopefully over a soft cushion or mattress). What do you see?
An iPhone is a rectangular parallelepiped with three unequal dimensions and hence three unequal principle moments of inertia I1 < I2 < I3. These axes are: vertical to the face, horizontal through the small dimension, and horizontal through the long dimension. So now spin the iPhone around its long axis, it keeps a nice and steady spin. And then spin it around an axis point out of the face, again it’s a nice steady spin. But flip it face upwards, and it almost always does a half twist. Why?
The answer is variously known as the Tennis Racket Theorem or the Intermediate Axis Theorem or even the Dzhanibekov Effect. If you don’t have an iPhone or Samsung handy, then watch this NASA video of the effect.
The flipping iPhone is a rigid body experiencing force-free motion. The Euler equations are an easy way to approach the somewhat complicated physics. These equations are
They all equal zero because there is no torque. First let’s assume the object is rotating mainly around the x1 axis so that ω2 and ω3 are small (rotating mainly around ω1). Then solving for the angular accelerations yields
This is a two-dimensional flow equation in the variables ω2, ω3. Hence we can apply classic stability analysis for rotation mainly about the x1 axis. The Jacobian matrix is
This matrix has a trace τ = 0 and a determinant Δ given by
Because of the ordering I1 < I2 < I3 we know that this is quantity is positive.
Armed with the trace and the determinant of a two-dimensional flow, we simply need to look at the 2D “stability space” as shown in Fig. 1. The horizontal axis is the determinant of the Jacobian matrix evaluated at the fixed point of the motion, and the vertical axis is the trace. In the case of the flipping iPhone, the Jacobian matrix is independent of both ω2 and ω3 (if they are remain small), so it has a global stability. When the determinant is positive, the stability depends on the trace. If the trace is positive, all motions are unstable (deviations grow exponentially). If the trace is negative, all motions are stable. The sideways parabola in the figure is known as the discriminant. If solutions are within the discriminant, they are spirals. As the trace approaches the origin, the spirals get slower and slow, until they become simple harmonic motions when the trace goes to zero. This kind of marginal stability is also known as centers. Centers have a stead-state stability without dissipation.
For the flipping iPhone (or tennis racket or book), the trace is zero and the determinant is positive for rotation mainly about the x1 axis, and the stability is therefore a “center”. This is why the iPhone spins nicely about its axis with the smallest moment.
Let’s permute the indices to get the motion about the x3 axis with the largest moment. Then
The trace and determinant are
where the determinant is again positive and the stability is again a center.
But now let’s permute again so that the motion is mainly about the x2 axis with the intermediate moment. In this case
And the trace and determinant are
The determinant is now negative, and from Fig. 1, this means that the stability is a saddle point.
Saddle points in 2D have one stable manifold and one unstable manifold. If the initial condition is just a little off the stability point, then the deviation will grow as the dynamical trajectory moves away from the equilibrium point along the unstable manifold.
The components of the angular frequencies of each of these cases is shown in Fig. 2 for rotation mainly around x1, then x2 and then x3. A small amount of rotation is given as an initial condition about the other two axes for each case. For these calculations, no approximations were made, using the full Euler equations, and the motion is fully three-dimensional.
Fate of the Spinning Earth
When two of the axes have very similar moments of inertia, that is, when the object becomes more symmetric, then the unstable dynamics can get very slow. An example is shown in Fig. 3 for I2 just a bit smaller than I3. The high frequency spin remains the same for long times and then quickly reverses. During the time when the spin is nearly stable, the other angular frequencies are close to zero, and the object would have only a slight wobble to it. Yet, in time, the wobble goes from bad to worse, until the whole thing flips over. It’s inevitable for almost any real-world solid…like maybe the Earth.
The Earth is an oblate spheroid, wider at the equator because of the centrifugal force of the rotation. If it were a perfect spheroid, then the two moments orthogonal to the spin axis would be identically equal. However, the Earth has landmasses, continents, that make the moments of inertia slightly unequal. This would have catastrophic consequences, because if the Earth were perfectly rigid, then every few million years it should flip over, scrambling the seasons!
But that doesn’t happen. The reason is that the Earth has a liquid mantel and outer core that very slowly dissipate any wobble. The Earth, and virtually every celestial object that has any type of internal friction, always spins about its axis with the highest moment of inertia, which also means the system relaxes to its lowest kinetic energy for conserved L through the simple equation
So we are safe!
Here is a simple Python code to explore the intermediate axis theorem. Change the moments of inertia and change the initial conditions. Note that this program does not solve for the actual motions–the configuration-space trajectories. The solution of the Euler equations gives the time evolution of the three components of the angular velocity. Incremental rotations could be applied through rotation matrices operating on the configuration space to yield the configuration-space trajectory of the flipping iPhone (link to the technical details here).
# -*- coding: utf-8 -*-
Created on Thurs Oct 7 19:38:57 2021
@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)
import numpy as np
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from scipy import integrate
from matplotlib import pyplot as plt
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1], projection='3d')
I1 = 0.45 # Moments of inertia can be changed here
I2 = 0.5
I3 = 0.55
# Flip Phone
def flow_deriv(x_y_z, t0):
x, y, z = x_y_z
yp1 = ((I2-I3)/I1)*y*z;
yp2 = ((I3-I1)/I2)*z*x;
yp3 = ((I1-I2)/I3)*x*y;
return [yp1, yp2, yp3]
model_title = 'Flip Phone'
t = np.linspace(0, max_time/4, int(250*max_time/4))
# Solve for trajectories
x0 = [[0.01,1,0.01]] # Initial Conditions: Change the major rotation axis here ....
t = np.linspace(0, max_time, int(250*max_time))
x_t = np.asarray([integrate.odeint(flow_deriv, x0i, t)
for x0i in x0])
x, y, z = x_t[0,:,:].T
lines = ax.plot(x, y, z, '-')
return t, x_t
t, x_t = solve_lorenz()
lines = plt.plot(t,x_t[0,:,0],t,x_t[0,:,1],t,x_t[0,:,2])
Symmetry is the canvas upon which the laws of physics are written. Symmetry defines the invariants of dynamical systems. But when symmetry breaks, the laws of physics break with it, sometimes in dramatic fashion. Take the Big Bang, for example, when a highly-symmetric form of the vacuum, known as the “false vacuum”, suddenly relaxed to a lower symmetry, creating an inflationary cascade of energy that burst forth as our Universe.
The early universe was extremely hot and energetic, so much so that all the forces of nature acted as one–described by a unified Lagrangian (as yet resisting discovery by theoretical physicists) of the highest symmetry. Yet as the universe expanded and cooled, the symmetry of the Lagrangian broke, and the unified forces split into two (gravity and electro-nuclear). As the universe cooled further, the Lagrangian (of the Standard Model) lost more symmetry as the electro-nuclear split into the strong nuclear force and the electro-weak force. Finally, at a tiny fraction of a second after the Big Bang, the universe cooled enough that the unified electro-week force broke into the electromagnetic force and the weak nuclear force. At each stage, spontaneous symmetry breaking occurred, and invariants of physics were broken, splitting into new behavior. In 2008, Yoichiro Nambu received the Nobel Prize in physics for his model of spontaneous symmetry breaking in subatomic physics.
Physics is filled with examples of spontaneous symmetry breaking. Crystallization and phase transitions are common examples. When the temperature is lowered on a fluid of molecules with high average local symmetry, the molecular interactions can suddenly impose lower-symmetry constraints on relative positions, and the liquid crystallizes into an ordered crystal. Even solid crystals can undergo a phase transition as one symmetry becomes energetically advantageous over another, and the crystal can change to a new symmetry.
In mechanics, any time a potential function evolves slowly with some parameter, it can start with one symmetry and evolve to another lower symmetry. The mechanical system governed by such a potential may undergo a discontinuous change in behavior.
In complex systems and chaos theory, sudden changes in behavior can be quite common as some parameter is changed continuously. These discontinuous changes in behavior, in response to a continuous change in a control parameter, is known as a bifurcation. There are many types of bifurcation, carrying descriptive names like the pitchfork bifurcation, period-doubling bifurcation, Hopf bifurcation, and fold bifurcation, among others. The pitchfork bifurcation is a typical example, shown in Fig. 2. As a parameter is changed continuously (horizontal axis), a stable fixed point suddenly becomes unstable and two new stable fixed points emerge at the same time. This type of bifurcation is called pitchfork because the diagram looks like a three-tined pitchfork. (This is technically called a supercritical pitchfork bifurcation. In a subcritical pitchfork bifurcation the solid and dashed lines are swapped.) This is exactly the bifurcation displayed by a simple mechanical model that illustrates spontaneous symmetry breaking.
Sliding Mass on a Rotating Hoop
One of the simplest mechanical models that displays spontaneous symmetry breaking and the pitchfork bifurcation is a bead sliding without friction on a circular hoop that is spinning on the vertical axis, as in Fig. 3. When it spins very slowly, this is just a simple pendulum with a stable equilibrium at the bottom, and it oscillates with a natural oscillation frequency ω0 = sqrt(g/b), where b is the radius of the hoop and g is the acceleration due to gravity. On the other hand, when it spins very fast, then the bead is flung to to one side or the other by centrifugal force. The bead then oscillates around one of the two new stable fixed points, but the fixed point at the bottom of the hoop is very unstable, because any deviation to one side or the other will cause the centrifugal force to kick in. (Note that in the body frame, centrifugal force is a non-inertial force that arises in the non-inertial coordinate frame. )
The solution uses the Euler equations for the body frame along principal axes. In order to use the standard definitions of ω1, ω2, and ω3, the angle θ MUST be rotated around the x-axis. This means the x-axis points out of the page in the diagram. The y-axis is tilted up from horizontal by θ, and the z-axis is tilted from vertical by θ. This establishes the body frame.
The components of the angular velocity are
And the moments of inertia are (assuming the bead is small)
There is only one Euler equation that is non-trivial. This is for the x-axis and the angle θ. The x-axis Euler equation is
and solving for the angular acceleration gives.
This is a harmonic oscillator with a “phase transition” that occurs as ω increases from zero. At first the stable equilibrium is at the bottom. But when ω passes a critical threshold, the equilibrium angle begins to increase to a finite angle set by the rotation speed.
This can only be real if the magnitude of the argument is equal to or less than unity, which sets the critical threshold spin rate to make the system move to the new stable points to one side or the other for
which interestingly is the natural frequency of the non-rotating pendulum. Note that there are two equivalent angles (positive and negative), so this problem has a degeneracy.
This is an example of a dynamical phase transition that leads to spontaneous symmetry breaking and a pitchfork bifurcation. By integrating the angular acceleration we can get the effective potential for the problem. One contribution to the potential is due to gravity. The other is centrifugal force. When combined and plotted in Fig. 4 for a family of values of the spin rate ω, a pitchfork emerges naturally by tracing the minima in the effective potential. The values of the new equilibrium angles are given in Fig. 2.
Below the transition threshold for ω, the bottom of the hoop is the equilibrium position. To find the natural frequency of oscillation, expand the acceleration expression
For small oscillations the natural frequency is given by
As the effective potential gets flatter, the natural oscillation frequency decreases until it vanishes at the transition spin frequency. As the hoop spins even faster, the new equilibrium positions emerge. To find the natural frequency of the new equilibria, expand θ around the new equilibrium θ’ = θ – θ0
Which is a harmonic oscillator with oscillation angular frequency
Note that this is zero frequency at the transition threshold, then rises to match the spin rate of the hoop at high frequency. The natural oscillation frequency as a function of the spin looks like Fig. 5.
This mechanical analog is highly relevant for the spontaneous symmetry breaking that occurs in ferroelectric crystals when they go through a ferroelectric transition. At high temperature, these crystals have no internal polarization. But as the crystal cools towards the ferroelectric transition temperature, the optical-mode phonon modes “soften” as the phonon frequency decreases and vanishes at the transition temperature when the crystal spontaneously polarizes in one of several equivalent directions. The observation of mode softening in a polar crystal is one signature of an impending ferroelectric phase transition. Our mass on the hoop captures this qualitative physics nicely.
For fun, let’s find at what spin frequency the harmonic oscillation frequency at the dynamic equilibria equal the original natural frequency of the pendulum. Then
which is the golden ratio. It’s spooky how often the golden ratio appears in random physics problems!
The most energetic physical processes in the universe (shy of the Big Bang itself) are astrophysical jets. These are relativistic beams of ions and radiation that shoot out across intergalactic space, emitting nearly the full spectrum of electromagnetic radiation, seen as quasars (quasi-stellar objects) that are thought to originate from supermassive black holes at the center of distant galaxies. The most powerful jets emit more energy than the light from a thousand Milky Way galaxies.
Where can such astronomical amounts of energy come from?
Black Hole Accretion Disks
The potential wells of black holes are so deep and steep, that they attract matter from their entire neighborhood. If a star comes too close, the black hole can rip the hydrogen and helium atoms off the star’s surface and suck them into a death spiral that can only end in oblivion beyond the Schwarzschild radius.
However, just before they disappear, these atoms and ions make one last desperate stand to resist the inevitable pull, and they park themselves near an orbit that is just stable enough that they can survive many orbits before they lose too much energy, through collisions with the other atoms and ions, and resume their in-spiral. This last orbit, called the inner-most stable circular orbit (ISCO), is where matter accumulates into an accretion disk.
The Innermost Stable Circular Orbit (ISCO)
At what radius is the inner-most stable circular orbit? To find out, write the energy equation of a particle orbiting a black hole with an effective potential function as
where the effective potential is
The first two terms of the effective potential are the usual Newtonian terms that include the gravitational potential and the repulsive contribution from the angular momentum that normally prevents the mass from approaching the origin. The third term is the GR term that is attractive and overcomes the centrifugal barrier at small values of r, allowing the orbit to collapse to the center. This is the essential danger of orbiting a black hole—not all orbits around a black hole are stable, and even circular orbits will decay and be swallowed up if too close to the black hole.
To find the conditions for circular orbits, take the derivative of the effective potential and set it to zero
This is a quadratic equation that can be solved for r. There is an innermost stable circular orbit (ISCO) that is obtained when the term in the square root of the quadratic formula vanishes when the angular momentum satisfies the condition
which gives the simple result for the inner-most circular orbit as
Therefore, no particle can sustain a circular orbit with a radius closer than three times the Schwarzschild radius. Inside that, it will spiral into the black hole.
A single trajectory solution to the GR flow  is shown in Fig. 4. The particle begins in an elliptical orbit outside the innermost circular orbit and is captured into a nearly circular orbit inside the ISCO. This orbit eventually decays and spirals with increasing speed into the black hole. Accretion discs around black holes occupy these orbits before collisions cause them to lose angular momentum and spiral into the black hole.
The gravity of black holes is so great, that even photons can orbit black holes in circular orbits. The radius or the circular photon orbit defines what is known as the photon sphere. The radius of the photon sphere is RPS = 1.5RS, which is just a factor of 2 smaller than the ISCO.
Binding Energy of a Particle at the ISCO
So where does all the energy come from to power astrophysical jets? The explanation comes from the binding energy of a particle at the ISCO. The energy conservation equation including angular momentum for a massive particle of mass m orbiting a black hole of mass M is
where the term on the right is the kinetic energy of the particle at infinity, and the second and third terms on the left are the effective potential
Solving for the binding energy at the ISCO gives
Therefore, 6% of the rest energy of the object is given up when it spirals into the ISCO. Remember that the fusion of two hydrogen atoms into helium gives up only about 0.7% of its rest mass energy. Therefore, the energy emission per nucleon for an atom falling towards the ISCO is TEN times more efficient than nuclear fusion!
This incredible energy resource is where the energy for galactic jets and quasars comes from.
 These equations apply for particles that are nonrelativistic. Special relativity effects become important when the orbital radius of the particle approaches the Schwarzschild radius, which introduces relativistic corrections to these equations.
If you have ever seen euphemistically-named “snow”—the black and white dancing pixels on television screens in the old days of cathode-ray tubes—you may think it is nothing but noise. But the surprising thing about noise is that it is a good place to hide information.
Shine a laser pointer on any rough surface and look at the scattered light on a distant wall, then you will see the same patterns of light and dark known as laser speckle. If you move your head or move the pointer, then the speckle shimmers—just like the snow on the old TVs. This laser speckle—this snow—is providing fundamental new ways to extract information hidden inside three-dimensional translucent objects—objects like biological tissue or priceless paintings or silicon chips.
The science fiction novel Snow Crash, published in 1992 by Neal Stephenson, is famous for popularizing virtual reality and the role of avatars. The central mystery of the novel is the mind-destroying mental crash that is induced by Snow—white noise in the metaverse. The protagonist hero of the story—a hacker with an avatar improbably named Hiro Protagonist—must find the source of snow and thwart the nefarious plot behind it.
If Hiro’s snow in his VR headset is caused by laser speckle, then the seemingly random pattern is composed of amplitudes and phases that vary spatially and temporally. There are many ways to make computer-generated versions of speckle. One of the simplest is to just add together a lot of sinusoidal functions with varying orientations and periodicities. This is a “Fourier” approach to speckle which views it as a random superposition of two-dimensional spatial frequencies. An example is shown in Fig. 2 for one sinusoid which has been added to 20 others to generate the speckle pattern on the right. There is still residual periodicity in the speckle for N = 20, but as N increases, the speckle pattern becomes strictly random, like noise.
But if the sinusoids that are being added together link the periodicity with their amplitude through some functional relationship, then the final speckle can be analyze using a 2D Fourier transform to find its spatial frequency spectrum. The functional form of this spectrum can tell a lot about the underlying processes of the speckle formation. This is part of the information hidden inside snow.
An alternative viewpoint to generating a laser speckle pattern thinks in terms of spatially-localized patches that add randomly together with random amplitudes and phases. This is a space-domain view of speckle formation in contrast to the Fourier-space view of the previous construction. Sinusoids are “global” extending spatially without bound. The underlying spatially localized functions can be almost any local function. Gaussians spring to mind, but so do Airy functions, because they are common point-spread functions that participate in the formation of images through lenses. The example in Fig 3a shows one such Airy function, and in 3b for speckle generated from N = 20 Airy functions of varying amplitudes and phases and locations.
These two examples are complementary ways of generating speckle, where the 2D Fourier-domain approach is conjugate to the 2D space-domain approach.
However, laser speckle is actually a 3D phenomenon, and the two-dimensional speckle patterns are just 2D cross sections intersecting a complex 3D pattern of light filaments. To get a sense of how laser speckle is formed in a physical system, one can solve the propagation of a laser beam through a random optical medium. In this way you can visualize the physical formation of the regions of brightness and darkness when the fragmented laser beam exits the random material.
For a quantitative understanding of laser speckle, when 2D laser speckle is formed by an optical system, the central question is how big are the regions of brightness and darkness? This is a question of spatial coherence, and one way to define spatial coherence is through the coherence area at the observation plane
where A is the source emitting area, z is the distance to the observation plane, and Ωs is the solid angle subtended by the source emitting area as seen from the observation point. This expression assumes that the angular spread of the light scattered from the illumination area is very broad. Larger distances and smaller emitting areas (pinholes in an optical diffuser or focused laser spots on a rough surface) produce larger coherence areas in the speckle pattern. For a Gaussian intensity distribution at the emission plane, the coherence area is
for beam waist w0 at the emission plane. To put some numbers to these parameters to give an intuitive sense of the size of speckle spots, assume a wavelength of 1 micron, a focused beam waist of 0.1 mm and a viewing distance of 1 meter. This gives patches with a radius of about 2 millimeters. Examples of laser speckle are shown in Fig. 5 for a variety of beam waist values w0.
Associated with any intensity modulation must be a phase modulation through the Kramers-Kronig relations . Phase cannot be detected directly in the speckle intensity pattern, but it can be measured by using interferometry. One of the easiest interferometric techniques is holography in which a coherent plane wave is caused to intersect, at a small angle, a speckle pattern generated from the same laser source. An example of a speckle hologram and its associated phase is shown in Fig. 6. The fringes of the hologram are formed when a plane reference wave interferes with the speckle field. The fringes are not parallel because of the varying phase of the speckle field, but the average spatial frequency is still recognizable in Fig. 5a. The associated phase map is shown in Fig. 5b.
Optical Vortex Physics
In the speckle intensity field, there are locations where the intensity vanishes, and the phase becomes undefined. In the neighborhood of a singular point the phase wraps around it with a 2pi phase range. Because of the wrapping phase such a singular point is called and optical vortex . Vortices always come in pairs with opposite helicity (defined by the direction of the wrapping phase) with a line of neutral phase between them as shown in Fig. 7. The helicity defines the topological charge of the vortex, and they can have topological charges larger than ±1 if the phase wraps multiple times. In dynamic speckle these vortices are also dynamic and move with speeds related to the underlying dynamics of the scattering medium . Vortices can annihilate if they have opposite helicity, and they can be created in pairs. Studies of singular optics have merged with structured illumination  to create an active field of topological optics with applications in biological microscopy as well as material science.
 D. D. Nolte, Optical Interferometry for Biology and Medicine. (Springer, 2012)
 A. Mecozzi, C. Antonelli, and M. Shtaif, “Kramers-Kronig coherent receiver,” Optica, vol. 3, no. 11, pp. 1220-1227, Nov (2016)
 M. R. Dennis, R. P. King, B. Jack, K. O’Holleran, and M. J. Padgett, “Isolated optical vortex knots,” Nature Physics, vol. 6, no. 2, pp. 118-121, Feb (2010)
 S. J. Kirkpatrick, K. Khaksari, D. Thomas, and D. D. Duncan, “Optical vortex behavior in dynamic speckle fields,” Journal of Biomedical Optics, vol. 17, no. 5, May (2012), Art no. 050504
 H. Rubinsztein-Dunlop, A. Forbes, M. V. Berry, M. R. Dennis, D. L. Andrews, M. Mansuripur, C. Denz, C. Alpmann, P. Banzer, T. Bauer, E. Karimi, L. Marrucci, M. Padgett, M. Ritsch-Marte, N. M. Litchinitser, N. P. Bigelow, C. Rosales-Guzman, A. Belmonte, J. P. Torres, T. W. Neely, M. Baker, R. Gordon, A. B. Stilgoe, J. Romero, A. G. White, R. Fickler, A. E. Willner, G. D. Xie, B. McMorran, and A. M. Weiner, “Roadmap on structured light,” Journal of Optics, vol. 19, no. 1, Jan (2017), Art no. 013001
One of the most important conclusions from chaos theory is that not all random-looking processes are actually random. In deterministic chaos, structures such as strange attractors are not random at all but are fractal structures determined uniquely by the dynamics. But sometimes, in nature, processes really are random, or at least have to be treated as such because of their complexity. Brownian motion is a perfect example of this. At the microscopic level, the jostling of the Brownian particle can be understood in terms of deterministic momentum transfers from liquid atoms to the particle. But there are so many liquid particles that their individual influences cannot be directly predicted. In this situation, it is more fruitful to view the atomic collisions as a stochastic process with well-defined physical parameters and then study the problem statistically. This is what Einstein did in his famous 1905 paper that explained the statistical physics of Brownian motion.
Then there is the middle ground between deterministic mechanics and stochastic mechanics, where complex dynamics gains a stochastic component. This is what Paul Langevin did in 1908 when he generalized Einstein.
Paul Langevin (1872 – 1946) had the fortune to stand at the cross-roads of modern physics, making key contributions, while serving as a commentator expanding on the works of the giants like Einstein and Lorentz and Bohr. He was educated at the École Normale Supérieure and at the Sorbonne with a year in Cambridge studying with J. J. Thompson. At the Sorbonne he worked in the laboratory of Jean Perrin (1870 – 1942) who received the Nobel Prize in 1926 for the experimental work on Brownian motion that had set the stage for Einstein’s crucial analysis of the problem confirming the atomic nature of matter.
Langevin received his PhD in 1902 on the topic of x-ray ionization of gases and was appointed as a lecturer at the College de France to substitute in for Éleuthère Mascart (who was an influential French physicist in optics). In 1905 Langevin published several papers that delved into the problems of Lorentz contraction, coming very close to expressing the principles of relativity. This work later led Einstein to say that, had he delayed publishing his own 1905 paper on the principles of relativity, then Langevin might have gotten there first .
Also in 1905, Langevin published his most influential work, providing the theoretical foundations for the physics of paramagnetism and diamagnetism. He was working closely with Pierre Curie whose experimental work on magnetism had established the central temperature dependence of the phenomena. Langevin used the new molecular model of matter to derive the temperature dependence as well as the functional dependence on magnetic field. One surprising result was that only the valence electrons, moving relativistically, were needed to contribute to the molecular magnetic moment. This later became one of the motivations for Bohr’s model of multi-electron atoms.
Langevin suffered personal tragedy during World War II when the Vichy government arrested him because of his outspoken opposition to fascism. He was imprisoned and eventually released to house arrest. In 1942, his son-in-law was executed by the Nazis, and in 1943 his daughter was sent to Auschwitz. Fearing for his own life, Langevin escaped to Switzerland. He returned shortly after the liberation of Paris and was joined after the end of the war by his daughter who had survived Auschwitz and later served in the Assemblée Consultative as a communist member. Langevin passed away in 1946 and received a national funeral. His remains lie today in the Pantheon.
The Langevin Equation
In 1908, Langevin realized that Einstein’s 1905 theory on Brownian motion could be simplified while at the same time generalized. Langevin introduced a new quantity into theoretical physics—the stochastic force . With this new theoretical tool, he was able to work with diffusing particles in momentum space as dynamical objects with inertia buffeted by random forces, providing a Newtonian formulation for short-time effects that were averaged out and lost in Einstein’s approach.
Stochastic processes are understood by considering a dynamical flow that includes a random function. The resulting set of equations are called the Langevin equation, namely
where fa is a set of N regular functions, and σa is the standard deviation of the a-th process out of N. The stochastic functions ξa are in general non-differentiable but are integrable. They have zero mean, and no temporal correlations. The solution is an N-dimensional trajectory that has properties of a random walk superposed on the dynamics of the underlying mathematical flow.
As an example, take the case of a particle moving in a one-dimensional potential, subject to drag and to an additional stochastic force
where γ is the drag coefficient, U is a potential function and B is the velocity diffusion coefficient. The second term in the bottom equation is the classical force from a potential function, while the third term is the stochastic force. The crucial point is that the stochastic force causes jumps in velocity that integrate into displacements, creating a random walk superposed on the deterministic mechanics.
Random Walk in a Harmonic Potential
Diffusion of a particle in a weak harmonic potential is equivalent to a mass on a weak spring in a thermal bath. For short times, the particle motion looks like a random walk, but for long times, the mean-squared displacement must satisfy the equipartition relation
The Langevin equation is the starting point of motion under a stochastic force F’
where the second equation has been multiplied through by x. For a spherical particle of radius a, the viscous drag factor is
and η is the viscosity. The term on the left of the dynamical equation can be rewritten to give
It is then necessary to take averages. The last term on the right vanishes because of the random signs of xF’. However, the buffeting from the random force can be viewed as arising from an effective temperature. Then from equipartition on the velocity
Making the substitution y = <x2> gives
which is the dynamical equation for a particle in a harmonic potential subject to a constant effective force kBT. For small objects in viscous fluids, the inertial terms are negligible relative to the other terms (see Life at small Reynolds Number ), so the dynamic equation is
This solution at short times describes a diffusing particle (Fickian behavior) with a diffusion coefficient D. However, for long times the solution asymptotes to an equipartition value of <x2> = kBT/k. In the intermediate time regime, the particle is walking randomly, but the mean-squared displacement is no longer growing linearly with time.
Constrained motion shows clear saturation to the size set by the physical constraints (equipartition for an oscillator or compartment size for a freely diffusing particle ). However, if the experimental data do not clearly extend into the saturation time regime, then the fit to anomalous diffusion can lead to exponents that do not equal unity. This is illustrated in Fig. 3 with asymptotic MSD compared with the anomalous diffusion equation fit for the exponent β. Care must be exercised in the interpretation of the exponents obtained from anomalous diffusion experiments. In particular, all constrained motion leads to subdiffusive interpretations if measured at intermediate times.
Random Walk in a Double Potential
The harmonic potential has well-known asymptotic dynamics which makes the analytic treatment straightforward. However, the Langevin equation is general and can be applied to any potential function. Take a double-well potential as another example
The resulting Langevin equation can be solved numerically in the presence of random velocity jumps. A specific stochastic trajectory is shown in Fig. 4 that applies discrete velocity jumps using a normal distribution of jumps of variance 2B. The notable character of this trajectory, besides the random-walk character, is the ability of the particle to jump the barrier between the wells. In the deterministic system, the initial condition dictates which stable fixed point would be approached. In the stochastic system, there are random fluctuations that take the particle from one basin of attraction to the other.
The stochastic long-time probability distribution p(x,v) in Fig. 5 introduces an interesting new view of trajectories in state space that have a different character than typical state-space flows. If we think about starting a large number of systems with the same initial conditions, and then letting the stochastic dynamics take over, we can define a time-dependent probability distribution p(x,v,t) that describes the likely end-positions of an ensemble of trajectories on the state plane as a function of time. This introduces the idea of the trajectory of a probability cloud in state space, which has a strong analogy to time-dependent quantum mechanics. The Schrödinger equation can be viewed as a diffusion equation in complex time, which is the basis of a technique known as quantum Monte Carlo that solves for ground state wave functions using concepts of random walks. This goes beyond the topics of classical mechanics, and it shows how such diverse fields as econophysics, diffusion, and quantum mechanics can share common tools and language.
“Stochastic Chaos” sounds like an oxymoron. “Chaos” is usually synonymous with “deterministic chaos”, meaning that every next point on a trajectory is determined uniquely by its previous location–there is nothing random about the evolution of the dynamical system. It is only when one looks at long times, or at two nearby trajectories, that non-repeatable and non-predictable behavior emerges, so there is nothing stochastic about it.
On the other hand, there is nothing wrong with adding a stochastic function to the right-hand side of a deterministic flow–just as in the Langevin equation. One question immediately arises: if chaos has sensitivity to initial conditions (SIC), wouldn’t it be highly susceptible to constant buffeting by a stochastic force? Let’s take a look!
To the well-known Rössler model, add a stochastic function to one of the three equations,
in this case to the y-dot equation. This is just like the stochastic term in the random walks in the harmonic and double-well potentials. The solution is shown in Fig. 6. In addition to the familiar time-series of the Rössler model, there are stochastic jumps in the y-variable. An x-y projection similarly shows the familiar signature of the model, and the density of trajectory points is shown in the density plot on the right. The rms jump size for this simulation is approximately 10%.
Now for the supposition that because chaos has sensitivity to initial conditions that it should be highly susceptible to stochastic contributions–the answer can be seen in Fig. 7 in the state-space densities. Other than a slightly more fuzzy density for the stochastic case, the general behavior of the Rössler strange attractor is retained. The attractor is highly stable against the stochastic fluctuations. This demonstrates just how robust deterministic chaos is.
On the other hand, there is a saddle point in the Rössler dynamics a bit below the lowest part of the strange attractor in the figure, and if the stochastic jumps are too large, then the dynamics become unstable and diverge. A hint at this is already seen in the time series in Fig. 6 that shows the nearly closed orbit that occurs transiently at large negative y values. This is near the saddle point, and this trajectory is dangerously close to going unstable. Therefore, while the attractor itself is stable, anything that drives a dynamical system to a saddle point will destabilize it, so too much stochasticity can cause a sudden destruction of the attractor.
 E. M. Purcell, “Life at Low Reynolds-Number,” American Journal of Physics, vol. 45, no. 1, pp. 3-11, (1977)
 Ritchie, K., Shan, X.Y., Kondo, J., Iwasawa, K., Fujiwara, T., Kusumi, A.: Detection of non- Brownian diffusion in the cell membrane in single molecule tracking. Biophys. J. 88(3), 2266–2277 (2005)
Physics in high dimensions is becoming the norm in modern dynamics. It is not only that string theory operates in ten dimensions (plus one for time), but virtually every complex dynamical system is described and analyzed within state spaces of high dimensionality. Population dynamics, for instance, may describe hundreds or thousands of different species, each of whose time-varying populations define a separate axis in a high-dimensional space. Coupled mechanical systems likewise may have hundreds or thousands (or more) of degrees of freedom that are described in high-dimensional phase space.
In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.
For these reasons, as physics students today are being increasingly exposed to the challenges and problems of high-dimensional dynamics, it is important to build tools they can use to give them an intuitive feeling for the highly unintuitive behavior of systems in high-D.
Within the rapidly-developing field of machine learning, which often deals with landscapes (loss functions or objective functions) in high dimensions that need to be minimized, high dimensions are usually referred to in the negative as “The Curse of Dimensionality”.
Dimensionality might be viewed as a curse for several reasons. First, it is almost impossible to visualize data in dimensions higher than d = 4 (the fourth dimension can sometimes be visualized using colors or time series). Second, too many degrees of freedom create too many variables to fit or model, leading to the classic problem of overfitting. Put simply, there is an absurdly large amount of room in high dimensions. Third, our intuition about relationships among areas and volumes are highly biased by our low-dimensional 3D experiences, causing us to have serious misconceptions about geometric objects in high-dimensional spaces. Physical processes occurring in 3D can be over-generalized to give preconceived notions that just don’t hold true in higher dimensions.
Take, for example, the random walk. It is usually taught starting from a 1-dimensional random walk (flipping a coin) that is then extended to 2D and then to 3D…most textbooks stopping there. But random walks in high dimensions are the rule rather than the exception in complex systems. One example that is especially important in this context is the problem of molecular evolution. Each site on a genome represents an independent degree of freedom, and molecular evolution can be described as a random walk through that space, but the space of all possible genetic mutations is enormous. Faced with such an astronomically large set of permutations, it is difficult to conceive of how random mutations could possibly create something as complex as, say, ATP synthase which is the basis of all higher bioenergetics. Fortunately, the answer to this puzzle lies in the physics of random walks in high dimensions.
Why Ten Dimensions?
This blog presents the physics of random walks in 10 dimensions. Actually, there is nothing special about 10 dimensions versus 9 or 11 or 20, but it gives a convenient demonstration of high-dimensional physics for several reasons. First, it is high enough above our 3 dimensions that there is no hope to visualize it effectively, even by using projections, so it forces us to contend with the intrinsic “unvisualizability” of high dimensions. Second, ten dimensions is just big enough that it behaves roughly like any higher dimension, at least when it comes to random walks. Third, it is about as big as can be handled with typical memory sizes of computers. For instance, a ten-dimensional hypercubic lattice with 10 discrete sites along each dimension has 10^10 lattice points (10 Billion or 10 Gigs) which is about the limit of what a typical computer can handle with internal memory.
As a starting point for visualization, let’s begin with the well-known 4D hypercube but extend it to a 4D hyperlattice with three values along each dimension instead of two. The resulting 4D lattice can be displayed in 2D as a network with 3^4 = 81 nodes and 216 links or edges. The result is shown in Fig. 1, represented in two dimensions as a network graph with nodes and edges. Each node has four links with neighbors. Despite the apparent 3D look that this graph has about it, if you look closely you will see the frustration that occurs when trying to link to 4 neighbors, causing many long-distance links.
We can also look at a 10D hypercube that has 2^10 = 1024 nodes and 5120 edges, shown in Fig. 2. It is a bit difficult to see the hypercubic symmetry when presented in 2D, but each node has exactly 10 links.
Extending this 10D lattice to 10 positions instead of 2 and trying to visualize it is prohibitive, since the resulting graph in 2D just looks like a mass of overlapping circles. However, our interest extends not just to ten locations per dimension, but to an unlimited number of locations. This is the 10D infinite lattice on which we want to explore the physics of the random walk.
Diffusion in Ten Dimensions
An unconstrained random walk in 10D is just a minimal extension beyond a simple random walk in 1D. Because each dimension is independent, a single random walker takes a random step along any of the 10 dimensions at each iteration so that motion in any one of the 10 dimensions is just a 1D random walk. Therefore, a simple way to visualize this random walk in 10D is simply to plot the walk against each dimension, as in Fig. 3. There is one chance in ten that the walker will take a positive or negative step along any given dimension at each time point.
An alternate visualization of the 10D random walker is shown in Fig. 4 for the same data as Fig. 3. In this case the displacement is color coded, and each column is a different dimension. Time is on the vertical axis (starting at the top and increasing downward). This type of color map can easily be extended to hundreds of dimensions. Each row is a position vector of the single walker in the 10D space
In the 10D hyperlattice in this section, all lattice sites are accessible at each time point, so there is no constraint preventing the walk from visiting a previously-visited node. There is a possible adjustment that can be made to the walk that prevents it from ever crossing its own path. This is known as a self-avoiding-walk (SAW). In two dimensions, there is a major difference in the geometric and dynamical properties of an ordinary walk and an SAW. However, in dimensions larger than 4, it turns out that there are so many possibilities of where to go (high-dimensional spaces have so much free room) that it is highly unlikely that a random walk will ever cross itself. Therefore, in our 10D hyperlattice we do not need to make the distinction between an ordinary walk and a self-avoiding-walk. However, there are other constraints that can be imposed that mimic how complex systems evolve in time, and these constraints can have important consequences, as we see next.
Random Walk in a Maximally Rough Landscape
In the infinite hyperlattice of the previous section, all lattice sites are the same and are all equally accessible. However, in the study of complex systems, it is common to assign a value to each node in a high-dimensional lattice. This value can be assigned by a potential function, producing a high-dimensional potential landscape over the lattice geometry. Or the value might be the survival fitness of a species, producing a high-dimensional fitness landscape that governs how species compete and evolve. Or the value might be a loss function (an objective function) in a minimization problem from multivariate analysis or machine learning. In all of these cases, the scalar value on the nodes defines a landscape over which a state point executes a walk. The question then becomes, what are the properties of a landscape in high dimensions, and how does it affect a random walker?
As an example, let’s consider a landscape that is completely random point-to-point. There are no correlations in this landscape, making it maximally rough. Then we require that a random walker takes a walk along iso-potentials in this landscape, never increasing and never decreasing its potential. Beginning with our spatial intuition living in 3D space, we might be concerned that such a walker would quickly get confined in some area of the lanscape. Think of a 2D topo map with countour lines drawn on it — If we start at a certain elevation on a mountain side, then if we must walk along directions that maintain our elevation, we stay on a given contour and eventually come back to our starting point after circling the mountain peak — we are trapped! But this intuition informed by our 3D lives is misleading. What happens in our 10D hyperlattice?
To make the example easy to analyze, let’s assume that our potential function is restricted to N discrete values. This means that of the 10 neighbors to a given walker site, on average only 10/N are likely to have the same potential value as the given walker site. This constrains the available sites for the walker, and it converts the uniform hyperlattice into a hyperlattice site percolation problem.
Percolation theory is a fascinating topic in statistical physics. There are many deep concepts that come from asking simple questions about how nodes are connected across a network. The most important aspect of percolation theory is the concept of a percolation threshold. Starting with a complete network that is connected end-to-end, start removing nodes at random. For some critical fraction of nodes removed (on average) there will no longer be a single connected cluster that spans the network. This critical fraction is known as the percolation threshold. Above the percolation threshold, a random walker can get from one part of the network to another. Below the percolation threshold, the random walker is confined to a local cluster.
If a hyperlattice has N discrete values for the landscape potential (or height, or contour) and if a random walker can only move to site that has the same value as the walker’s current value (remains on the level set), then only a fraction of the hyperlattice sites are available to the walker, and the question of whether the walker can find a path the spans the hyperlattice becomes simply a question of how the fraction of available sites relates to the percolation threshold.
The percolation threshold for hyperlattices is well known. For reasonably high dimensions, it is given to good accuracy by
where d is the dimension of the hyperlattice. For a 10D hyperlattice the percolation threshold is pc(10) = 0.0568, or about 6%. Therefore, if more than 6% of the sites of the hyperlattice have the same value as the walker’s current site, then the walker is free to roam about the hyperlattice.
If there are N = 5 discrete values for the potential, then 20% of the sites are available, which is above the percolation threshold, and walkers can go as far as they want. This statement holds true no matter what the starting value is. It might be 5, which means the walker is as high on the landscape as they can get. Or it might be 1, which means the walker is as low on the landscape as they can get. Yet even if they are at the top, if the available site fraction is above the percolation threshold, then the walker can stay on the high mountain ridge, spanning the landscape. The same is true if they start at the bottom of a valley. Therefore, mountain ridges are very common, as are deep valleys, yet they allow full mobility about the geography. On the other hand, a so-called mountain peak would be a 5 surrounded by 4’s or lower. The odds for having this happen in 10D are 0.2*(1-0.8^10) = 0.18. Then the total density of mountain peaks, in a 10D hyperlattice with 5 potential values, is only 18%. Therefore, mountain peaks are rare in 10D, while mountain ridges are common. In even higher dimensions, the percolation threshold decreases roughly inversely with the dimensionality, and mountain peaks become extremely rare and play virtually no part in walks about the landscape.
To illustrate this point, Fig. 5 is the same 10D network that is in Fig. 2, but only the nodes sharing the same value are shown for N = 5, which means that only 20% of the nodes are accessible to a walker who stays only on nodes with the same values. There is a “giant cluster” that remains connected, spanning the original network. If the original network is infinite, then the giant cluster is also infinite but contains a finite fraction of the nodes.
The quantitative details of the random walk can change depending on the proximity of the sub-networks (the clusters, the ridges or the level sets) to the percolation threshold. For instance, a random walker in D =10 with N = 5 is shown in Fig. 6. The diffusion is a bit slower than in the unconstrained walk of Figs. 3 and 4. But the ability to wander about the 10D space is retained.
This is then the general important result: In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.
Consequences for Evolution and Machine Learning
When the high-dimensional space is the space of possible mutations on a genome, and when the landscape is a fitness landscape that assigns a survival advantage for one mutation relative to others, then the random walk describes the evolution of a species across generations. The prevalence of ridges, or more generally level sets, in high dimensions has a major consequence for the evolutionary process, because a species can walk along a level set acquiring many possible mutations that have only neutral effects on the survivability of the species. At the same time, the genetic make-up is constantly drifting around in this “neutral network”, allowing the species’ genome to access distant parts of the space. Then, at some point, natural selection may tip the species up a nearby (but rare) peak, and a new equilibrium is attained for the species.
One of the early criticisms of fitness landscapes was the (erroneous) criticism that for a species to move from one fitness peak to another, it would have to go down and cross wide valleys of low fitness to get to another peak. But this was a left-over from thinking in 3D. In high-D, neutral networks are ubiquitous, and a mutation can take a step away from one fitness peak onto one of the neutral networks, which can be sampled by a random walk until the state is near some distant peak. It is no longer necessary to think in terms of high peaks and low valleys of fitness — just random walks. The evolution of extremely complex structures, like ATP synthase, can then be understood as a random walk along networks of nearly-neutral fitness — once our 3D biases are eliminated.
The same arguments hold for many situations in machine learning and especially deep learning. When training a deep neural network, there can be thousands of neural weights that need to be trained through the minimization of a loss function, also known as an objective function. The loss function is the equivalent to a potential, and minimizing the loss function over the thousands of dimensions is the same problem as maximizing the fitness of an evolving species.
At first look, one might think that deep learning is doomed to failure. We have all learned, from the earliest days in calculus, that enough adjustable parameter can fit anything, but the fit is meaningless because it predicts nothing. Deep learning seems to be the worst example of this. How can fitting thousands of adjustable parameters be useful when the dimensionality of the optimization space is orders of magnitude larger than the degrees of freedom of the system being modeled?
The answer comes from the geometry of high dimensions. The prevalence of neutral networks in high dimensions gives lots of chances to escape local minima. In fact, local minima are actually rare in high dimensions, and when they do occur, there is a neutral network nearby onto which they can escape (if the effective temperature of the learning process is set sufficiently high). Therefore, despite the insanely large number of adjustable parameters, general solutions, that are meaningful and predictive, can be found by adding random walks around the objective landscape as a partial strategy in combination with gradient descent.
Given the superficial analogy of deep learning to the human mind, the geometry of random walks in ultra-high dimensions may partially explain our own intelligence and consciousness.
S. Gravilet, Fitness Landscapes and the Origins of Species. Princeton University Press, 2004.
M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1968.
One of the hardest aspects to grasp about relativity theory is the question of whether an event “look as if” it is doing something, or whether it “actually is” doing something.
Take, for instance, the classic twin paradox of relativity theory in which there are twins who wear identical high-precision wrist watches. One of them rockets off to Alpha Centauri at relativistic speeds and returns while the other twin stays on Earth. Each twin sees the other twin’s clock running slowly because of relativistic time dilation. Yet when they get back together and, standing side-by-side, they compare their watches—the twin who went to Alpha Centauri is actually younger than the other, despite the paradox. The relativistic effect of time dilation is “real”, not just apparent, regardless of whether they come back together to do the comparison.
Yet this understanding of relativistic effects took many years, even decades, to gain acceptance after Einstein proposed them. He was aware himself that key experiments were required to prove that relativistic effects are real and not just apparent.
Einstein and the Transverse Doppler Effect
In 1905 Einstein used his new theory of special relativity to predict observable consequences that included a general treatment of the relativistic Doppler effect . This included the effects of time dilation in addition to the longitudinal effect of the source chasing the wave. Time dilation produced a correction to Doppler’s original expression for the longitudinal effect that became significant at speeds approaching the speed of light. More significantly, it predicted a transverse Doppler effect for a source moving along a line perpendicular to the line of sight to an observer. This effect had not been predicted either by Christian Doppler (1803 – 1853) or by Woldemar Voigt (1850 – 1919).
Despite the generally positive reception of Einstein’s theory of special relativity, some of its consequences were anathema to many physicists at the time. A key stumbling block was the question whether relativistic effects, like moving clocks running slowly, were only apparent, or were actually real, and Einstein had to fight to convince others of its reality. When Johannes Stark (1874 – 1957) observed Doppler line shifts in ion beams called “canal rays” in 1906 (Stark received the 1919 Nobel prize in part for this discovery) , Einstein promptly published a paper suggesting how the canal rays could be used in a transverse geometry to directly detect time dilation through the transverse Doppler effect . Thirty years passed before the experiment was performed with sufficient accuracy by Herbert Ives and G. R. Stilwell in 1938 to measure the transverse Doppler effect . Ironically, even at this late date, Ives and Stilwell were convinced that their experiment had disproved Einstein’s time dilation by supporting Lorentz’ contraction theory of the electron. The Ives-Stilwell experiment was the first direct test of time dilation, followed in 1940 by muon lifetime measurements .
A) Transverse Doppler Shift Relative to EmissionAngle
The Doppler effect varies between blue shifts in the forward direction to red shifts in the backward direction, with a smooth variation in Doppler shift as a function of the emission angle. Consider the configuration shown in Fig. 1 for light emitted from a source moving at speed v and emitting at an angle θ0 in the receiver frame. The source moves a distance vT in the time of a single emission cycle (assume a harmonic wave). In that time T (which is the period of oscillation of the light source — or the period of a clock if we think of it putting out light pulses) the light travels a distance cT before another cycle begins (or another pulse is emitted).
The observed wavelength in the receiver frame is thus given by
where T is the emission period of the moving source. Importantly, the emission period is time dilated relative to the proper emission time of the source
This expression can be evaluated for several special cases:
a) θ0 = 0 for forward emission
which is the relativistic blue shift for longitudinal motion in the direction of the receiver.
b) θ0 = π for backward emission
which is the relativistic red shift for longitudinal motion away from the receiver
c) θ0 = π/2 for transverse emission
This transverse Doppler effect for emission at right angles is a red shift, caused only by the time dilation of the moving light source. This is the effect proposed by Einstein and observed by Stark that proved moving clocks tick slowly. But it is not the only way to view the transverse Doppler effect.
B) Transverse Doppler Shift Relative to Angle at Reception
A different option for viewing the transverse Doppler effect is the angle to the moving source at the moment that the light is detected. The geometry of this configuration relative to the previous is illustrated in Fig. 2.
The transverse distance to the detection point is
The length of the line connecting the detection point P with the location of the light source at the moment of detection is (using the law of cosines)
Combining with the first equation gives
An equivalent expression is obtained as
Note that this result, relating θ1 to θ0, is independent of the distance to the observation point.
When θ1 = π/2, then
for which the Doppler effect is
which is a blue shift. This creates the unexpected result that sin θ0 = π/2 produces a red shift, while sin θ1 = π/2 produces a blue shift. The question could be asked: which one represents time dilation? In fact, it is sin θ0 = π/2 that produces time dilation exclusively, because in that configuration there is no foreshortening effect on the wavelength–only the emission time.
C) Compromise: The Null Transverse Doppler Shift
The previous two configurations each could be used as a definition for the transverse Doppler effect. But one gives a red shift and one gives a blue shift, which seems contradictory. Therefore, one might try to strike a compromise between these two cases so that sin θ1 = sin θ0, and the configuration is shown in Fig. 3.
This is the case when θ1 + θ2 = π. The sines of the two angles are equal, yielding
which is solved for
Inserting this into the Doppler equation gives
where the Taylor’s expansion of the denominator (at low speed) cancels the numerator to give zero net Doppler shift. This compromise configuration represents the condition of null Doppler frequency shift. However, for speeds approaching the speed of light, the net effect is a lengthening of the wavelength, dominated by time dilation, causing a red shift.
D) Source in Circular Motion Around Receiver
An interesting twist can be added to the problem of the transverse Doppler effect: put the source or receiver into circular motion, one about the other. In the case of a source in circular motion around the receiver, it is easy to see that this looks just like case A) above for θ0 = π/2, which is the red shift caused by the time dilation of the moving source
However, there is the possible complication that the source is no longer in an inertial frame (it experiences angular acceleration) and therefore it is in the realm of general relativity instead of special relativity. In fact, it was Einstein’s solution to this problem that led him to propose the Equivalence Principle and make his first calculations on the deflection of light by gravity. His solution was to think of an infinite number of inertial frames, each of which was instantaneously co-moving with the same linear velocity as the source. These co-moving frames are inertial and can be analyzed using the principles of special relativity. The general relativistic effects come from slipping from one inertial co-moving frame to the next. But in the case of the circular transverse Doppler effect, each instantaneously co-moving frame has the exact configuration as case A) above, and so the wavelength is red shifted exactly by the time dilation.
E) Receiver in Circular Motion Around Source
With the notion of co-moving inertial frames now in hand, this configuration is exactly the same as case B) above, and the wavelength is blue shifted
 A. Einstein, “On the electrodynamics of moving bodies,” Annalen Der Physik, vol. 17, no. 10, pp. 891-921, Sep (1905)
 D. D. Nolte, “The Fall and Rise of the Doppler Effect,” Physics Today, vol. 73, no. 3, pp. 31-35, Mar (2020)
 J. Stark, W. Hermann, and S. Kinoshita, “The Doppler effect in the spectrum of mercury,” Annalen Der Physik, vol. 21, pp. 462-469, Nov 1906.
 A. Einstein, “Possibility of a new examination of the relativity principle,” Annalen Der Physik, vol. 23, no. 6, pp. 197-198, May (1907)
 H. E. Ives and G. R. Stilwell, “An experimental study of the rate of a moving atomic clock,” Journal of the Optical Society of America, vol. 28, p. 215, 1938.
 B. Rossi and D. B. Hall, “Variation of the Rate of Decay of Mesotrons with Momentum,” Physical Review, vol. 59, pp. 223–228, 1941.
… GR combined with nonlinear synchronization yields the novel phenomenon of a “synchronization cascade”.
Imagine a space ship containing a collection of highly-accurate atomic clocks factory-set to arbitrary precision at the space-ship factory before launch. The clocks are lined up with precisely-equal spacing along the axis of the space ship, which should allow the astronauts to study events in spacetime to high accuracy as they orbit neutron stars or black holes. Despite all the precision, spacetime itself will conspire to detune the clocks. Yet all is not lost. Using the physics of nonlinear synchronization, the astronauts can bring all the clocks together to a compromise frequency—locking all the clocks to a common rate. This blog post shows how this can happen.
Synchronization of Oscillators
The simplest synchronization problem is two “phase oscillators” coupled with a symmetric nonlinearity. The dynamical flow is
where ωk are the individual angular frequencies and g is the coupling constant. When g is greater than the difference Δω, then the two oscillators, despite having different initial frequencies, will find a stable fixed point and lock to a compromise frequency.
Taking this model to N phase oscillators creates the well-known Kuramoto model that is characterized by a relatively sharp mean-field phase transition leading to global synchronization. The model averages N phase oscillators to a mean field where g is the coupling coefficient, K is the mean amplitude, Θ is the mean phase, and ω-bar is the mean frequency. The dynamics are given by
The last equation is the final mean-field equation that synchronizes each individual oscillator to the mean field. For a large number of oscillators that are globally coupled to each other, increasing the coupling has little effect on the oscillators until a critical threshold is crossed, after which all the oscillators synchronize with each other. This is known as the Kuramoto synchronization transition, shown in Fig. 2 for 20 oscillators with uniformly distributed initial frequencies. Note that the critical coupling constant gc is roughly half of the spread of initial frequencies.
The question that this blog seeks to answer is how this synchronization mechanism may be used in a space craft exploring the strong gravity around neutron stars or black holes. The key to answering this question is the metric tensor for this system
where the first term is the time-like term g00 that affects ticking clocks, and the second term is the space-like term that affects the length of the space craft.
Kuramoto versus the Neutron Star
Consider the space craft holding a steady radius above a neutron star, as in Fig. 3. For simplicity, hold the craft stationary rather than in an orbit to remove the details of rotating frames. Because each clock is at a different gravitational potential, it runs at a different rate because of gravitational time dilation–clocks nearer to the neutron star run slower than clocks farther away. There is also a gravitational length contraction of the space craft, which modifies the clock rates as well.
The analysis starts by incorporating the first-order approximation of time dilation through the component g00. The component is brought in through the period of oscillations. All frequencies are referenced to the base oscillator that has the angular rate ω0, and the other frequencies are primed. As we consider oscillators higher in the space craft at positions R + h, the 1/(R+h) term in g00 decreases as does the offset between each successive oscillator.
The dynamical equations for a system for only two clocks, coupled through the constant k, are
These are combined to a single equation by considering the phase difference
The two clocks will synchronize to a compromise frequency for the critical coupling coefficient
Now, if there is a string of N clocks, as in Fig. 3, the question is how the frequencies will spread out by gravitational time dilation, and what the entrainment of the frequencies to a common compromise frequency looks like. If the ship is located at some distance from the neutron star, then the gravitational potential at one clock to the next is approximately linear, and coupling them would produce the classic Kuramoto transition.
However, if the ship is much closer to the neutron star, so that the gravitational potential is no longer linear, then there is a “fan-out” of frequencies, with the bottom-most clock ticking much more slowly than the top-most clock. Coupling these clocks produces a modified, or “stretched”, Kuramoto transition as in Fig. 4.
In the two examples in Fig. 4, the bottom-most clock is just above the radius of the neutron star (at R0 = 4RS for a solar-mass neutron star, where RS is the Schwarzschild radius) and at twice that radius (at R0 = 8RS). The length of the ship, along which the clocks are distributed, is RS in this example. This may seem unrealistically large, but we could imagine a regular-sized ship supporting a long stiff cable dangling below it composed of carbon nanotubes that has the clocks distributed evenly on it, with the bottom-most clock at the radius R0. In fact, this might be a reasonable design for exploring spacetime events near a neutron star (although even carbon nanotubes would not be able to withstand the strain).
Kuramoto versus the Black Hole
Against expectation, exploring spacetime around a black hole is actually easier than around a neutron star, because there is no physical surface at the Schwarzschild radius RS, and gravitational tidal forces can be small for large black holes. In fact, one of the most unintuitive aspects of black holes pertains to a space ship falling into one. A distant observer sees the space ship contracting to zero length and the clocks slowing down and stopping as the space ship approaches the Schwarzschild radius asymptotically, but never crossing it. However, on board the ship, all appears normal as it crosses the Schwarzschild radius. To the astronaut inside, there is is a gravitational potential inside the space ship that causes the clocks at the base to run more slowly than the upper clocks, and length contraction affects the spacing a little, but otherwise there is no singularity as the event horizon is passed. This appears as a classic “paradox” of physics, with two different observers seeing paradoxically different behaviors.
The resolution of this paradox lies in the differential geometry of the two observers. Each approximates spacetime with a Euclidean coordinate system that matches the local coordinates. The distant observer references the warped geometry to this “chart”, which produces the apparent divergence of the Schwarzschild metric at RS. However, the astronaut inside the space ship has her own flat chart to which she references the locally warped space time around the ship. Therefore, it is the differential changes, referenced to the ships coordinate origin, that capture gravitational time dilation and length contraction. Because the synchronization takes place in the local coordinate system of the ship, this is the coordinate system that goes into the dynamical equations for synchronization. Taking this approach, the shifts in the clock rates are given by the derivative of the metric as
where hn is the height of the n-th clock above R0.
Fig. 5 shows the entrainment plot for the black hole. The plot noticeably has a much smoother transition. In this higher mass case, the system does not have as many hard coupling transitions and instead exhibits smooth behavior for global coupling. This is the Kuramoto “cascade”. Contrast the behavior of Fig. 5 (left) to the classic Kuramoto transition of Fig. 2. The increasing frequency separations near the black hole produces a succession of frequency locks as the coupling coefficient increases. For comparison, the case of linear coupling along the cable is shown in Fig. 5 on the right. The cascade is now accompanied with interesting oscillations as one clock entrains with a neighbor, only to be pulled back by interaction with locked subclusters.
Now let us consider what role the spatial component of the metric tensor plays in the synchronization. The spatial component causes the space between the oscillators to decrease closer to the supermassive object. This would cause the oscillators to entrain faster because the bottom oscillators that entrain the slowest would be closer together, but the top oscillators would entrain slower since they are a farther distance apart, as in Fig. 6.
In terms of the local coordinates of the space ship, the locations of each clock are
These values for hn can be put into the equation for ωn above. But it is clear that this produces a second order effect. Even at the event horizon, this effect is only a fraction of the shifts caused by g00 directly on the clocks. This is in contrast to what a distant observer sees–the clock separations decreasing to zero, which would seem to decrease the frequency shifts. But the synchronization coupling is performed in the ship frame, not the distant frame, so the astronaut can safely ignore this contribution.
As a final exploration of the black hole, before we leave it behind, look at the behavior for different values of R0 in Fig. 7. At 4RS, the Kuramoto transition is stretched. At 2RS there is a partial Kuramoto transition for the upper clocks, that then stretch into a cascade of locking events for the lower clocks. At 1RS we see the full cascade as before.
Note from the Editor:
This blog post by Moira Andrews is based on her final project for Phys 411, upper division undergraduate mechanics, at Purdue University. Students are asked to combine two seemingly-unrelated aspects of modern dynamics and explore the results. Moira thought of synchronizing clocks that are experiencing gravitational time dilation near a massive body. This is a nice example of how GR combined with nonlinear synchronization yields the novel phenomenon of a “synchronization cascade”.
Cheng, T.-P. (2010). Relativity, Gravitation and Cosmology. Oxford University Press.
“Society is founded on hero worship”, wrote Thomas Carlyle (1795 – 1881) in his 1840 lecture on “Hero as Divinity”—and the society of physicists is no different. Among physicists, the hero is the genius—the monomyth who journeys into the supernatural realm of high mathematics, engages in single combat against chaos and confusion, gains enlightenment in the mysteries of the universe, and returns home to share the new understanding. If the hero is endowed with unusual talent and achieves greatness, then mythologies are woven, creating shadows that can grow and eclipse the truth and the work of others, bestowing upon the hero recognitions that are not entirely deserved.
“Gentlemen! The views of space and time which I wish to lay before you … They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”
Herman Minkowski (1908)
The greatest hero of physics of the twentieth century, without question, is Albert Einstein. He is the person most responsible for the development of “Modern Physics” that encompasses:
Relativity theory (both special and general),
Quantum theory (he invented the quantum in 1905—see my blog),
Astrophysics (his field equations of general relativity were solved by Schwarzschild in 1916 to predict event horizons of black holes, and he solved his own equations to predict gravitational waves that were discovered in 2015),
Cosmology (his cosmological constant is now recognized as the mysterious dark energy that was discovered in 2000), and
Solid state physics (his explanation of the specific heat of crystals inaugurated the field of quantum matter).
Einstein made so many seminal contributions to so many sub-fields of physics that it defies comprehension—hence he is mythologized as genius, able to see into the depths of reality with unique insight. He deserves his reputation as the greatest physicist of the twentieth century—he has my vote, and he was chosen by Time magazine in 2000 as the Man of the Century. But as his shadow has grown, it has eclipsed and even assimilated the work of others—work that he initially criticized and dismissed, yet later embraced so whole-heartedly that he is mistakenly given credit for its discovery.
For instance, when we think of Einstein, the first thing that pops into our minds is probably “spacetime”. He himself wrote several popular accounts of relativity that incorporated the view that spacetime is the natural geometry within which so many of the non-intuitive properties of relativity can be understood. When we think of time being mixed with space, making it seem that position coordinates and time coordinates share an equal place in the description of relativistic physics, it is common to attribute this understanding to Einstein. Yet Einstein initially resisted this viewpoint and even disparaged it when he first heard it!
Spacetime was the brain-child of Hermann Minkowski.
Minkowski in Königsberg
Hermann Minkowski was born in 1864 in Russia to German parents who moved to the city of Königsberg (King’s Mountain) in East Prussia when he was eight years old. He entered the university in Königsberg in 1880 when he was sixteen. Within a year, when he was only seventeen years old, and while he was still a student at the University, Minkowski responded to an announcement of the Mathematics Prize of the French Academy of Sciences in 1881. When he submitted is prize-winning memoire, he could have had no idea that it was starting him down a path that would lead him years later to revolutionary views.
The specific Prize challenge of 1881 was to find the number of representations of an integer as a sum of five squares of integers. For instance, every integer n > 33 can be expressed as the sum of five nonzero squares. As an example, 42 = 22 + 22 + 32 + 32 + 42, which is the only representation for that number. However, there are five representation for n = 53
The task of enumerating these representations draws from the theory of quadratic forms. A quadratic form is a function of products of numbers with integer coefficients, such as ax2 + bxy + cy2 and ax2 + by2 + cz2 + dxy + exz + fyz. In number theory, one seeks to find integer solutions for which the quadratic form equals an integer. For instance, the Pythagorean theorem x2 + y2 = n2 for integers is a quadratic form for which there are many integer solutions (x,y,n), known as Pythagorean triplets, such as
The topic of quadratic forms gained special significance after the work of Bernhard Riemann who established the properties of metric spaces based on the metric expression
for infinitesimal distance in a D-dimensional metric space. This is a generalization of Euclidean distance to more general non-Euclidean spaces that may have curvature. Minkowski would later use this expression to great advantage, developing a “Geometry of Numbers”  as he delved ever deeper into quadratic forms and their uses in number theory.
Minkowski in Göttingen
After graduating with a doctoral degree in 1885 from Königsberg, Minkowski did his habilitation at the university of Bonn and began teaching, moving back to Königsberg in 1892 and then to Zurich in 1894 (where one of his students was a somewhat lazy and unimpressive Albert Einstein). A few years later he was given an offer that he could not refuse.
At the turn of the 20th century, the place to be in mathematics was at the University of Göttingen. It had a long tradition of mathematical giants that included Carl Friedrich Gauss, Bernhard Riemann, Peter Dirichlet, and Felix Klein. Under the guidance of Felix Klein, Göttingen mathematics had undergone a renaissance. For instance, Klein had attracted Hilbert from the University of Königsberg in 1895. David Hilbert had known Minkowski when they were both students in Königsberg, and Hilbert extended an invitation to Minkowski to join him in Göttingen, which Minkowski accepted in 1902.
A few years after Minkowski arrived at Göttingen, the relativity revolution broke, and both Minkowski and Hilbert began working on mathematical aspects of the new physics. They organized a colloquium dedicated to relativity and related topics, and on Nov. 5, 1907 Minkowski gave his first tentative address on the geometry of relativity.
Because Minkowski’s specialty was quadratic forms, and given his understanding of Riemann’s work, he was perfectly situated to apply his theory of quadratic forms and invariants to the Lorentz transformations derived by Poincaré and Einstein. Although Poincaré had published a paper in 1906 that showed that the Lorentz transformation was a generalized rotation in four-dimensional space , Poincaré continued to discuss space and time as separate phenomena, as did Einstein. For them, simultaneity was no longer an invariant, but events in time were still events in time and not somehow mixed with space-like properties. Minkowski recognized that Poincaré had missed an opportunity to define a four-dimensional vector space filled by four-vectors that captured all possible events in a single coordinate description without the need to separate out time and space.
Minkowski’s first attempt, presented in his 1907 colloquium, at constructing velocity four-vectors was flawed because (like so many of my mechanics students when they first take a time derivative of the four-position) he had not yet understood the correct use of proper time. But the research program he outlined paved the way for the great work that was to follow.
On Feb. 21, 1908, only 3 months after his first halting steps, Minkowski delivered a thick manuscript to the printers for an article to appear in the Göttinger Nachrichten. The title “Die Grundgleichungen für die elektromagnetischen Vorgänge in bewegten Körpern” (The Basic Equations for Electromagnetic Processes of Moving Bodies) belies the impact and importance of this very dense article . In its 60 pages (with no figures), Minkowski presents the correct form for four-velocity by taking derivatives relative to proper time, and he formalizes his four-dimensional approach to relativity that became the standard afterwards. He introduces the terms spacelikevector, timelike vector, light cone and world line. He also presents the complete four-tensor form for the electromagnetic fields. The foundational work of Levi Cevita and Ricci-Curbastro on tensors was not yet well known, so Minkowski invents his own terminology of Traktor to describe it. Most importantly, he invents the terms spacetime (Raum-Zeit) and events (Erignisse) .
Minkowski’s four-dimensional formalism of relativistic electromagnetics was more than a mathematical trick—it uncovered the presence of a multitude of invariants that were obscured by the conventional mathematics of Einstein and Lorentz and Poincaré. In Minkowski’s approach, whenever a proper four-vector is contracted with itself (its inner product), an invariant emerges. Because there are many fundamental four-vectors, there are many invariants. These invariants provide the anchors from which to understand the complex relative properties amongst relatively moving frames.
Minkowski’s master work appeared in the Nachrichten on April 5, 1908. If he had thought that physicists would embrace his visionary perspective, he was about to be woefully disabused of that notion.
Despite his impressive ability to see into the foundational depths of the physical world, Einstein did not view mathematics as the root of reality. Mathematics for him was a tool to reduce physical intuition into quantitative form. In 1908 his fame was rising as the acknowledged leader in relativistic physics, and he was not impressed or pleased with the abstract mathematical form that Minkowski was trying to stuff the physics into. Einstein called it “superfluous erudition” , and complained “since the mathematics pounced on the relativity theory, I no longer understand it myself! ”
With his collaborator Jakob Laub (also a former student of Minkowski’s), Einstein objected to more than the hard-to-follow mathematics—they believed that Minkowski’s form of the pondermotive force was incorrect. They then proceeded to re-translate Minkowski’s elegant four-vector derivations back into ordinary vector analysis, publishing two papers in Annalen der Physik in the summer of 1908 that were politely critical of Minkowski’s approach [7-8]. Yet another of Minkowski’s students from Zurich, Gunnar Nordström, showed how to derive Minkowski’s field equations without any of the four-vector formalism.
One can only wonder why so many of his former students so easily dismissed Minkowski’s revolutionary work. Einstein had actually avoided Minkowski’s mathematics classes as a student at ETH , which may say something about Minkowski’s reputation among the students, although Einstein did appreciate the class on mechanics that he took from Minkowski. Nonetheless, Einstein missed the point! Rather than realizing the power and universality of the four-dimensional spacetime formulation, he dismissed it as obscure and irrelevant—perhaps prejudiced by his earlier dim view of his former teacher.
Raum und Zeit
It is clear that Minkowski was stung by the poor reception of his spacetime theory. It is also clear that he truly believed that he had uncovered an essential new approach to physical reality. While mathematicians were generally receptive of his work, he knew that if physicists were to adopt his new viewpoint, he needed to win them over with the elegant results.
In 1908, Minkowski presented a now-famous paper Raum und Zeit at the 80thAssembly of German Natural Scientists and Physicians (21 September 1908). In his opening address, he stated :
To illustrate his arguments Minkowski constructed the most recognizable visual icon of relativity theory—the space-time diagram in which the trajectories of particles appear as “world lines”, as in Fig. 1. On this diagram, one spatial dimension is plotted along the horizontal-axis, and the value ct (speed of light times time) is plotted along the vertical-axis. In these units, a photon travels along a line oriented at 45 degrees, and the world-line (the name Minkowski gave to trajectories) of all massive particles must have slopes steeper than this. For instance, a stationary particle, that appears to have no trajectory at all, executes a vertical trajectory on the space-time diagram as it travels forward through time. Within this new formulation by Minkowski, space and time were mixed together in a single manifold—spacetime—and were no longer separate entities.
In addition to the spacetime construct, Minkowski’s great discovery was the plethora of invariants that followed from his geometry. For instance, the spacetime hyperbola
is invariant to Lorentz transformation in coordinates. This is just a simple statement that a vector is an entity of reality that is independent of how it is described. The length of a vector in our normal three-space does not change if we flip the coordinates around or rotate them, and the same is true for four-vectors in Minkowski space subject to Lorentz transformations.
In relativity theory, this property of invariance becomes especially useful because part of the mental challenge of relativity is that everything looks different when viewed from different frames. How do you get a good grip on a phenomenon if it is always changing, always relative to one frame or another? The invariants become the anchors that we can hold on to as reference frames shift and morph about us.
As an example of a fundamental invariant, the mass of a particle in its rest frame becomes an invariant mass, always with the same value. In earlier relativity theory, even in Einstein’s papers, the mass of an object was a function of its speed. How is the mass of an electron a fundamental property of physics if it is a function of how fast it is traveling? The construction of invariant mass removes this problem, and the mass of the electron becomes an immutable property of physics, independent of the frame. Invariant mass is just one of many invariants that emerge from Minkowski’s space-time description. The study of relativity, where all things seem relative, became a study of invariants, where many things never change. In this sense, the theory of relativity is a misnomer. Ironically, relativity theory became the motivation of post-modern relativism that denies the existence of absolutes, even as relativity theory, as practiced by physicists, is all about absolutes.
Despite his audacious gambit to win over the physicists, Minkowski would not live to see the fruits of his effort. He died suddenly of a burst gall bladder on Jan. 12, 1909 at the age of 44.
Arnold Sommerfeld (who went on to play a central role in the development of quantum theory) took up Minkowski’s four vectors, and he systematized it in a way that was palatable to physicists. Then Max von Laue extended it while he was working with Sommerfeld in Munich, publishing the first physics textbook on relativity theory in 1911, establishing the space-time formalism for future generations of German physicists. Further support for Minkowski’s work came from his distinguished colleagues at Göttingen (Hilbert, Klein, Wiechert, Schwarzschild) as well as his former students (Born, Laue, Kaluza, Frank, Noether). With such champions, Minkowski’s work was immortalized in the methodology (and mythology) of physics, representing one of the crowning achievements of the Göttingen mathematical community.
Already in 1907 Einstein was beginning to grapple with the role of gravity in the context of relativity theory, and he knew that the special theory was just a beginning. Yet between 1908 and 1910 Einstein’s focus was on the quantum of light as he defended and extended his unique view of the photon and prepared for the first Solvay Congress of 1911. As he returned his attention to the problem of gravitation after 1910, he began to realize that Minkowski’s formalism provided a framework from which to understand the role of accelerating frames. In 1912 Einstein wrote to Sommerfeld to say 
I occupy myself now exclusively with the problem of gravitation . One thing is certain that I have never before had to toil anywhere near as much, and that I have been infused with great respect for mathematics, which I had up until now in my naivety looked upon as a pure luxury in its more subtle parts. Compared to this problem. the original theory of relativity is child’s play.
By the time Einstein had finished his general theory of relativity and gravitation in 1915, he fully acknowledge his indebtedness to Minkowski’s spacetime formalism without which his general theory may never have appeared.
Einstein’s theory of gravity came from a simple happy thought that occurred to him as he imagined an unfortunate worker falling from a roof, losing hold of his hammer, only to find both the hammer and himself floating motionless relative to each other as if gravity had ceased to exist. With this one thought, Einstein realized that the falling (i.e. accelerating) reference frame was in fact an inertial frame, and hence all the tricks that he had learned and invented to deal with inertial relativistic frames could apply just as well to accelerating frames in gravitational fields.
Gravitational lensing (and microlensing) have become a major tool of discovery in astrophysics applied to the study of quasars, dark matter and even the search for exoplanets.
Armed with this new perspective, one of the earliest discoveries that Einstein made was that gravity must bend light paths. This phenomenon is fundamentally post-Newtonian, because there can be no possible force of gravity on a massless photon—yet Einstein’s argument for why gravity should bend light is so obvious that it is manifestly true, as demonstrated by Arthur Eddington during the solar eclipse of 1919, launching Einstein to world-wide fame. It is also demonstrated by the beautiful gravitational lensing phenomenon of Einstein arcs. Einstein arcs are the distorted images of bright distant light sources in the universe caused by an intervening massive object, like a galaxy or galaxy cluster, that bends the light rays. A number of these arcs are seen in images of the Abel cluster of galaxies in Fig. 1.
Gravitational lensing (and microlensing) have become a major tool of discovery in astrophysics applied to the study of quasars, dark matter and even the search for exoplanets. However, as soon as Einstein conceived of gravitational lensing, in 1912, he abandoned the idea as too small and too unlikely to ever be useful, much like he abandoned the idea of gravitational waves in 1915 as similarly being too small ever to detect. It was only at the persistence of an amateur Czech scientist twenty years later that Einstein reluctantly agreed to publish his calculations on gravitational lensing.
The History of Gravitational Lensing
In 1912, only a few years after his “happy thought”, and fully three years before he published his definitive work on General Relativity, Einstein derived how light would be affected by a massive object, causing light from a distant source to be deflected like a lens. The historian of physics, Jürgen Renn discovered these derivations in Einstein’s notebooks while at the Max Planck Institute for the History of Science in Berlin in 1996 . However, Einstein also calculated the magnitude of the effect and dismissed it as too small, and so he never published it.
Years later, in 1936, Einstein received a visit from a Czech electrical engineer Rudi Mandl, an amateur scientist who had actually obtained a small stipend from the Czech government to visit Einstein at the Institute for Advanced Study at Princeton. Mandl had conceived of the possibility of gravitational lensing and wished to bring it to Einstein’s attention, thinking that the master would certainly know what to do with the idea. Einstein was obliging, redoing his calculations of 1912 and obtaining once again the results that made him believe that the effect would be too small to be seen. However, Mandl was persistent and pressed Einstein to publish the results, which he did . In his submission letter to the editor of Science, Einstein stated “Let me also thank you for your cooperation with the little publication, which Mister Mandl squeezed out of me. It is of little value, but it makes the poor guy happy”. Einstein’s pessimism was based on his thinking that isolated stars would be the only source of the gravitational lens (he did not “believe” in black holes), but in 1937 Fritz Zwicky at Cal Tech (a gadfly genius) suggested that the newly discovered phenomenon of “galaxy clusters” might provide the massive gravity that would be required to produce the effect. Although, to be visible, a distant source would need to be extremely bright.
Potential sources were discovered in the 1960’s using radio telescopes that discovered quasi-stellar objects (known as quasars) that are extremely bright and extremely far away. Quasars also appear in the visible range, and in 1979 a twin quasar was discovered by astronomers using the telescope at the Kitt Peak Obversvatory in Arizona–two quasars very close together that shared identical spectral fingerprints. The astronomers realized that it could be a twin image of a single quasar caused by gravitational lensing, which they published as a likely explanation. Although the finding was originally controversial, the twin-image was later confirmed, and many additional examples of gravitational lensing have since been discovered.
The Optics of Gravity and Light
Gravitational lenses are terrible optical instruments. A good imaging lens has two chief properties: 1) It produces increasing delay on a wavefront as the radial distance from the optic axis decreases; and 2) it deflects rays with increasing deflection angle as the radial distance of a ray increases away from the optic axis (the center of the lens). Both properties are part of the same effect: the conversion, by a lens, of an incident plane wave into a converging spherical wave. A third property of a good lens ensures minimal aberrations of the converging wave: a quadratic dependence of wavefront delay on radial distance from the optic axis. For instance, a parabolic lens produces a diffraction-limited focal spot.
Now consider the optical effects of gravity around a black hole. One of Einstein’s chief discoveries during his early investigations into the effects of gravity on light is the analogy of warped space-time as having an effective refractive index. Light propagates through space affected by gravity as if there were a refractive index associated with the gravitational potential. In a previous blog on the optics of gravity, I showed the simple derivation of the refractive effects of gravity on light based on the Schwarschild metric applied to a null geodesic of a light ray. The effective refractive index near a black hole is
This effective refractive index diverges at the Schwarzschild radius of the black hole. It produces the maximum delay, not on the optic axis as for a good lens, but at the finite distance RS. Furthermore, the maximum deflection also occurs at RS, and the deflection decreases with increasing radial distance. Both of these properties of gravitational lensing are opposite to the properties of a good lens. For this reason, the phrase “gravitational lensing” is a bit of a misnomer. Gravitating bodies certainly deflect light rays, but the resulting optical behavior is far from that of an imaging lens.
The path of a ray from a distant quasar, through the thin gravitational lens of a galaxy, and intersecting the location of the Earth, is shown in Fig. 2. The location of the quasar is a distance R from the “optic axis”. The un-deflected angular position is θ0, and with the intervening galaxy the image appears at the angular position θ. The angular magnification is therefore M = θ/θ0.
The deflection angles are related through
where b is the “impact parameter”
These two equations are solved to give to an expression that relates the unmagnified angle θ0 to the magnified angle θ as
is the angular size of the Einstein ring when the source is on the optic axis. The quadratic equation has two solutions that gives two images of the distant quasar. This is the origin of the “double image” that led to the first discovery of gravitational lensing in 1979.
When the distant quasar is on the optic axis, then θ0 = 0 and the deflection of the rays produces, not a double image, but an Einstein ring with an angular size of θE. For typical lensing objects, the angular size of Einstein rings are typically in the range of tens of microradians. The angular magnification for decreasing distance R diverges as
But this divergence is more a statement of the bad lens behavior than of actual image size. Because the gravitational lens is inverted (with greater deflection closer to the optic axis) compared to an ideal thin lens, it produces a virtual image ring that is closer than the original object, as in Fig. 3.
The location of the virtual image behind the gravitational lens (when the quasar is on the optic axis) is obtained from
If the quasar is much further from the lens than the Earth, then the image location is zi = -L1/2, or behind the lens by half the distance from the Earth to the lens. The longitudinal magnification is then
Note that while the transverse (angular) magnification diverges as the object approaches the optic axis, the longitudinal magnification remains finite but always greater than unity.
The Caustic Curves of Einstein Rings
Because gravitational lenses have such severe aberration relative to an ideal lens, and because the angles are so small, an alternate approach to understanding the optics of gravity is through the theory of light caustics. In a previous blog on the optics of caustics I described how sets of deflected rays of light become enclosed in envelopes that produce regions of high and low intensity. These envelopes are called caustics. Gravitational light deflection also causes caustics.
In addition to envelopes, it is also possible to trace the time delays caused by gravity on wavefronts. In the regions of the caustic envelopes, these wavefronts can fold back onto themselves so that different parts of the image arrive at different times coming from different directions.
An example of gravitational caustics is shown in Fig. 4. Rays are incident vertically on a gravitational thin lens which deflects the rays so that they overlap in the region below the lens. The red curves are selected wavefronts at three successively later times. The gravitational potential causes a time delay on the propgating front, with greater delays in regions of stronger gravitational potential. The envelope function that is tangent to the rays is called the caustic, here shown as the dense blue mesh. In this case there is a cusp in the caustic near z = -1 below the lens. The wavefronts become multiple-valued past the cusp
The intensity of the distant object past the lens is concentrated near the caustic envelope. The intensity of the caustic at z = -6 is shown in Fig. 5. The ring structure is the cross-sectional spatial intensity at the fixed observation plane, but a transform to the an angular image is one-to-one, so the caustic intensity distribution is also similar to the view of the Einstein ring from a position at z = -6 on the optic axis.
The gravitational potential is a function of the mass distribution in the gravitational lens. A different distribution with a flatter distribution of mass near the optic axis is shown in Fig. 6. There are multiple caustics in this case with multi-valued wavefronts. Because caustics are sensitive to mass distribution in the gravitational lens, astronomical observations of gravitational caustics can be used to back out the mass distribution, including dark matter or even distant exoplanets.
# -*- coding: utf-8 -*-
Created on Tue Mar 30 19:47:31 2021
@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)
import numpy as np
from matplotlib import pyplot as plt
n = n0/(1 + abs(x)**expon)**(1/expon);
delt = 0.001
Ly = 10
Lx = 5
n0 = 1
expon = 2 # adjust this from 1 to 10
delx = 0.01
rng = np.int(Lx/delx)
x = delx*np.linspace(-rng,rng)
n = refindex(x)
dndx = np.diff(n)/np.diff(x)
lines = plt.plot(x,n)
lines2 = plt.plot(dndx)
Nloop = 160;
xd = np.zeros((Nloop,3))
yd = np.zeros((Nloop,3))
for loop in range(0,Nloop):
xp = -Lx + 2*Lx*(loop/Nloop)
plt.plot([xp, xp],[2, 0],'b',linewidth = 0.25)
thet = (refindex(xp+delt) - refindex(xp-delt))/(2*delt)
xb = xp + np.tan(thet)*Ly
plt.plot([xp, xb],[0, -Ly],'b',linewidth = 0.25)
for sloop in range(0,3):
delay = n0/(1 + abs(xp)**expon)**(1/expon) - n0
dis = 0.75*(sloop+1)**2 - delay
xfront = xp + np.sin(thet)*dis
yfront = -dis*np.cos(thet)
xd[loop,sloop] = xfront
yd[loop,sloop] = yfront
for sloop in range(0,3):
plt.plot(xd[:,sloop],yd[:,sloop],'r',linewidth = 0.5)
 J. Renn, T. Sauer and J. Stachel, “The Origin of Gravitational Lensing: A Postscript to Einstein’s 1936 Science Paper, Science 275. 184 (1997)
 A. Einstein, “Lens-Like Action of a Star by the Deviation of Light in the Gravitational Field”, Science 84, 506 (1936)
 (Here is an excellent review article on the topic.) J. Wambsganss, “Gravitational lensing as a powerful astrophysical tool: Multiple quasars, giant arcs and extrasolar planets,” Annalen Der Physik, vol. 15, no. 1-2, pp. 43-59, Jan-Feb (2006) SpringerLink