One of the most important conclusions from chaos theory is that not all random-looking processes are actually random. In deterministic chaos, structures such as strange attractors are not random at all but are fractal structures determined uniquely by the dynamics. But sometimes, in nature, processes really are random, or at least have to be treated as such because of their complexity. Brownian motion is a perfect example of this. At the microscopic level, the jostling of the Brownian particle can be understood in terms of deterministic momentum transfers from liquid atoms to the particle. But there are so many liquid particles that their individual influences cannot be directly predicted. In this situation, it is more fruitful to view the atomic collisions as a stochastic process with well-defined physical parameters and then study the problem statistically. This is what Einstein did in his famous 1905 paper that explained the statistical physics of Brownian motion.
Then there is the middle ground between deterministic mechanics and stochastic mechanics, where complex dynamics gains a stochastic component. This is what Paul Langevin did in 1908 when he generalized Einstein's treatment of Brownian motion.
Paul Langevin (1872 – 1946) had the fortune to stand at the crossroads of modern physics, making key contributions while serving as a commentator expanding on the works of giants like Einstein, Lorentz and Bohr. He was educated at the École Normale Supérieure and at the Sorbonne, with a year in Cambridge studying with J. J. Thomson. At the Sorbonne he worked in the laboratory of Jean Perrin (1870 – 1942), who received the Nobel Prize in 1926 for the experimental work on Brownian motion that confirmed Einstein's 1905 analysis of the problem and, with it, the atomic nature of matter.
Langevin received his PhD in 1902 on the topic of x-ray ionization of gases and was appointed as a lecturer at the Collège de France to substitute for Éleuthère Mascart (an influential French physicist in optics). In 1905 Langevin published several papers that delved into the problems of Lorentz contraction, coming very close to expressing the principles of relativity. This work later led Einstein to say that, had he delayed publishing his own 1905 paper on the principles of relativity, then Langevin might have gotten there first.
Also in 1905, Langevin published his most influential work, providing the theoretical foundations for the physics of paramagnetism and diamagnetism. He was working closely with Pierre Curie whose experimental work on magnetism had established the central temperature dependence of the phenomena. Langevin used the new molecular model of matter to derive the temperature dependence as well as the functional dependence on magnetic field. One surprising result was that only the valence electrons, moving relativistically, were needed to contribute to the molecular magnetic moment. This later became one of the motivations for Bohr’s model of multi-electron atoms.
Langevin suffered personal tragedy during World War II when the Vichy government arrested him because of his outspoken opposition to fascism. He was imprisoned and eventually released to house arrest. In 1942, his son-in-law was executed by the Nazis, and in 1943 his daughter was sent to Auschwitz. Fearing for his own life, Langevin escaped to Switzerland. He returned shortly after the liberation of Paris and was joined after the end of the war by his daughter who had survived Auschwitz and later served in the Assemblée Consultative as a communist member. Langevin passed away in 1946 and received a national funeral. His remains lie today in the Pantheon.
The Langevin Equation
In 1908, Langevin realized that Einstein's 1905 theory of Brownian motion could be simplified and at the same time generalized. Langevin introduced a new quantity into theoretical physics—the stochastic force. With this new theoretical tool, he was able to treat diffusing particles in momentum space as dynamical objects with inertia buffeted by random forces, providing a Newtonian formulation for the short-time effects that were averaged out and lost in Einstein's approach.
Stochastic processes are understood by considering a dynamical flow that includes a random function. The resulting set of equations is called the Langevin equation, namely
where f_a is a set of N regular functions, and σ_a is the standard deviation of the a-th process out of N. The stochastic functions ξ_a are in general non-differentiable but are integrable. They have zero mean and no temporal correlations. The solution is an N-dimensional trajectory that has the properties of a random walk superposed on the dynamics of the underlying mathematical flow.
As an example, take the case of a particle moving in a one-dimensional potential, subject to drag and to an additional stochastic force
where γ is the drag coefficient, U is a potential function and B is the velocity diffusion coefficient. The second term in the bottom equation is the classical force from a potential function, while the third term is the stochastic force. The crucial point is that the stochastic force causes jumps in velocity that integrate into displacements, creating a random walk superposed on the deterministic mechanics.
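Written out explicitly, a plausible reconstruction of the pair of equations just described, assuming unit mass and a delta-correlated, unit-variance noise ξ(t), is

$$\frac{dx}{dt} = v$$

$$\frac{dv}{dt} = -\gamma v - \frac{\partial U}{\partial x} + \sqrt{2B}\,\xi(t), \qquad \langle\xi(t)\rangle = 0, \quad \langle\xi(t)\xi(t')\rangle = \delta(t-t')$$

so that the velocity accumulates random increments of variance 2B per unit time on top of the deterministic drag and potential forces.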
Random Walk in a Harmonic Potential
Diffusion of a particle in a weak harmonic potential is equivalent to a mass on a weak spring in a thermal bath. For short times, the particle motion looks like a random walk, but for long times, the mean-squared displacement must satisfy the equipartition relation
The Langevin equation is the starting point of motion under a stochastic force F’
where the second equation has been multiplied through by x. For a spherical particle of radius a, the viscous drag factor is
and η is the viscosity. The term on the left of the dynamical equation can be rewritten to give
It is then necessary to take averages. The last term on the right vanishes because of the random signs of xF’. However, the buffeting from the random force can be viewed as arising from an effective temperature. Then from equipartition on the velocity
Making the substitution y = <x²> gives
which is the dynamical equation for a particle in a harmonic potential subject to a constant effective force kBT. For small objects in viscous fluids, the inertial terms are negligible relative to the other terms (see Life at Low Reynolds Number), so the dynamic equation is
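A minimal reconstruction of this overdamped equation and its solution, assuming the particle starts at the origin so that <x²>(0) = 0, is

$$\frac{\gamma}{2}\frac{d\langle x^2\rangle}{dt} + k\langle x^2\rangle = k_B T$$

$$\langle x^2\rangle(t) = \frac{k_B T}{k}\left(1 - e^{-2kt/\gamma}\right) \approx 2Dt \quad \text{for } t \ll \gamma/2k, \qquad D = \frac{k_B T}{\gamma} = \frac{k_B T}{6\pi\eta a}$$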
This solution at short times describes a diffusing particle (Fickian behavior) with a diffusion coefficient D. However, for long times the solution asymptotes to an equipartition value of <x²> = kBT/k. In the intermediate time regime, the particle is walking randomly, but the mean-squared displacement is no longer growing linearly with time.
Constrained motion shows clear saturation to the size set by the physical constraints (equipartition for an oscillator or compartment size for a freely diffusing particle). However, if the experimental data do not clearly extend into the saturation time regime, then a fit to anomalous diffusion can yield exponents that do not equal unity. This is illustrated in Fig. 3, where the saturating MSD is compared with a fit of the anomalous diffusion equation for the exponent β. Care must be exercised in the interpretation of the exponents obtained from anomalous diffusion experiments. In particular, all constrained motion leads to subdiffusive interpretations if measured at intermediate times.
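To see how this happens numerically, the short sketch below (not a reproduction of Fig. 3; all parameter values are arbitrary) fits the closed-form MSD of the trapped particle to a power law t^β over an intermediate time window. Even though the underlying motion is ordinary diffusion plus confinement, the fitted exponent comes out well below unity.

```python
import numpy as np

# Illustrative parameters (arbitrary units)
kBT = 1.0      # thermal energy
k = 1.0        # spring constant of the harmonic trap
gamma = 1.0    # drag coefficient
D = kBT / gamma          # Einstein relation, D = kBT/gamma
tau = gamma / (2 * k)    # relaxation time of the MSD

# Closed-form mean-squared displacement for the overdamped trap
t = np.logspace(-3, 3, 400)
msd = (kBT / k) * (1.0 - np.exp(-t / tau))

# Fit MSD ~ t^beta over an intermediate window spanning the crossover
window = (t > 0.3 * tau) & (t < 30 * tau)
beta, log_prefactor = np.polyfit(np.log(t[window]), np.log(msd[window]), 1)

print(f"fitted anomalous exponent beta = {beta:.2f}")   # well below 1
```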
Random Walk in a Double-Well Potential
The harmonic potential has well-known asymptotic dynamics which makes the analytic treatment straightforward. However, the Langevin equation is general and can be applied to any potential function. Take a double-well potential as another example
The resulting Langevin equation can be solved numerically in the presence of random velocity jumps. A specific stochastic trajectory is shown in Fig. 4 that applies discrete velocity jumps using a normal distribution of jumps of variance 2B. The notable character of this trajectory, besides the random-walk character, is the ability of the particle to jump the barrier between the wells. In the deterministic system, the initial condition dictates which stable fixed point would be approached. In the stochastic system, there are random fluctuations that take the particle from one basin of attraction to the other.
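The potential and parameter values behind Fig. 4 are not reproduced here, so the sketch below assumes a standard quartic double well U(x) = x⁴/4 − x²/2 and illustrative values of γ and B, applying Euler-Maruyama velocity jumps with variance 2B·dt per step as described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed quartic double well: U(x) = x^4/4 - x^2/2, so the force is -dU/dx = x - x^3
def force(x):
    return x - x**3

gamma = 0.15   # drag coefficient (illustrative)
B = 0.02       # velocity diffusion coefficient (illustrative)
dt = 0.01
steps = 200_000

x, v = -1.0, 0.0              # start at rest in the left well
traj = np.empty(steps)

for i in range(steps):
    # Euler-Maruyama step: deterministic drift plus a random velocity jump
    v += (-gamma * v + force(x)) * dt + rng.normal(0.0, np.sqrt(2 * B * dt))
    x += v * dt
    traj[i] = x

# Count crossings of the barrier top at x = 0 to verify inter-well hopping
crossings = np.count_nonzero(np.diff(np.sign(traj)) != 0)
print(f"barrier crossings: {crossings}")
```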
The stochastic long-time probability distribution p(x,v) in Fig. 5 introduces an interesting new view of trajectories in state space that have a different character than typical state-space flows. If we think about starting a large number of systems with the same initial conditions, and then letting the stochastic dynamics take over, we can define a time-dependent probability distribution p(x,v,t) that describes the likely end-positions of an ensemble of trajectories on the state plane as a function of time. This introduces the idea of the trajectory of a probability cloud in state space, which has a strong analogy to time-dependent quantum mechanics. The Schrödinger equation can be viewed as a diffusion equation in complex time, which is the basis of a technique known as quantum Monte Carlo that solves for ground state wave functions using concepts of random walks. This goes beyond the topics of classical mechanics, and it shows how such diverse fields as econophysics, diffusion, and quantum mechanics can share common tools and language.
Stochastic Chaos
“Stochastic Chaos” sounds like an oxymoron. “Chaos” is usually synonymous with “deterministic chaos”, meaning that every next point on a trajectory is determined uniquely by its previous location–there is nothing random about the evolution of the dynamical system. It is only at long times, or when comparing two nearby trajectories, that non-repeatable and non-predictable behavior emerges, yet even then there is nothing stochastic about it.
On the other hand, there is nothing wrong with adding a stochastic function to the right-hand side of a deterministic flow–just as in the Langevin equation. One question immediately arises: if chaos has sensitivity to initial conditions (SIC), wouldn’t it be highly susceptible to constant buffeting by a stochastic force? Let’s take a look!
To the well-known Rössler model, add a stochastic function to one of the three equations,
in this case to the y-dot equation. This is just like the stochastic term in the random walks in the harmonic and double-well potentials. The solution is shown in Fig. 6. In addition to the familiar time-series of the Rössler model, there are stochastic jumps in the y-variable. An x-y projection similarly shows the familiar signature of the model, and the density of trajectory points is shown in the density plot on the right. The rms jump size for this simulation is approximately 10%.
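A minimal sketch of this stochastic Rössler system is given below; the parameter values a = b = 0.2, c = 5.7 and the noise strength are assumptions for illustration, not the values used to generate Figs. 6 and 7.

```python
import numpy as np

rng = np.random.default_rng(2)

# Standard Roessler parameters (assumed; the post does not list them)
a, b, c = 0.2, 0.2, 5.7
dt = 0.01
steps = 100_000
sigma = 0.5      # strength of the stochastic term added to the y equation

state = np.array([1.0, 1.0, 0.0])
trace = np.empty((steps, 3))

for i in range(steps):
    x, y, z = state
    dx = -y - z
    dy = x + a * y
    dz = b + z * (x - c)
    # deterministic Euler step plus a stochastic jump in y only
    state = state + np.array([dx, dy, dz]) * dt
    state[1] += sigma * np.sqrt(dt) * rng.normal()
    trace[i] = state

print("range of the noisy y variable:", trace[:, 1].min(), trace[:, 1].max())
```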
Now for the supposition that, because chaos has sensitivity to initial conditions, it should be highly susceptible to stochastic contributions: the answer can be seen in Fig. 7 in the state-space densities. Other than a slightly fuzzier density for the stochastic case, the general behavior of the Rössler strange attractor is retained. The attractor is highly stable against the stochastic fluctuations. This demonstrates just how robust deterministic chaos is.
On the other hand, there is a saddle point in the Rössler dynamics a bit below the lowest part of the strange attractor in the figure, and if the stochastic jumps are too large, then the dynamics become unstable and diverge. A hint of this is already seen in the time series of Fig. 6, which shows a nearly closed orbit occurring transiently at large negative y values. This orbit passes near the saddle point, and the trajectory is dangerously close to going unstable. Therefore, while the attractor itself is stable, anything that drives a dynamical system toward a saddle point will destabilize it, so too much stochasticity can cause a sudden destruction of the attractor.
E. M. Purcell, “Life at low Reynolds number,” American Journal of Physics, vol. 45, no. 1, pp. 3-11 (1977).
K. Ritchie, X. Y. Shan, J. Kondo, K. Iwasawa, T. Fujiwara, and A. Kusumi, “Detection of non-Brownian diffusion in the cell membrane in single molecule tracking,” Biophysical Journal, vol. 88, no. 3, pp. 2266-2277 (2005).
Physics in high dimensions is becoming the norm in modern dynamics. It is not only that string theory operates in ten dimensions (plus one for time), but virtually every complex dynamical system is described and analyzed within state spaces of high dimensionality. Population dynamics, for instance, may describe hundreds or thousands of different species, each of whose time-varying populations define a separate axis in a high-dimensional space. Coupled mechanical systems likewise may have hundreds or thousands (or more) of degrees of freedom that are described in high-dimensional phase space.
In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.
For these reasons, as physics students today are being increasingly exposed to the challenges and problems of high-dimensional dynamics, it is important to build tools they can use to give them an intuitive feeling for the highly unintuitive behavior of systems in high-D.
Within the rapidly-developing field of machine learning, which often deals with landscapes (loss functions or objective functions) in high dimensions that need to be minimized, high dimensions are usually referred to in the negative as “The Curse of Dimensionality”.
Dimensionality might be viewed as a curse for several reasons. First, it is almost impossible to visualize data in dimensions higher than d = 4 (the fourth dimension can sometimes be visualized using colors or time series). Second, too many degrees of freedom create too many variables to fit or model, leading to the classic problem of overfitting; put simply, there is an absurdly large amount of room in high dimensions. Third, our intuition about relationships among areas and volumes is highly biased by our low-dimensional 3D experiences, causing us to have serious misconceptions about geometric objects in high-dimensional spaces. Physical processes occurring in 3D can be over-generalized to give preconceived notions that just don’t hold true in higher dimensions.
Take, for example, the random walk. It is usually taught starting from a 1-dimensional random walk (flipping a coin) that is then extended to 2D and then to 3D…most textbooks stopping there. But random walks in high dimensions are the rule rather than the exception in complex systems. One example that is especially important in this context is the problem of molecular evolution. Each site on a genome represents an independent degree of freedom, and molecular evolution can be described as a random walk through that space, but the space of all possible genetic mutations is enormous. Faced with such an astronomically large set of permutations, it is difficult to conceive of how random mutations could possibly create something as complex as, say, ATP synthase which is the basis of all higher bioenergetics. Fortunately, the answer to this puzzle lies in the physics of random walks in high dimensions.
Why Ten Dimensions?
This blog presents the physics of random walks in 10 dimensions. Actually, there is nothing special about 10 dimensions versus 9 or 11 or 20, but it gives a convenient demonstration of high-dimensional physics for several reasons. First, it is high enough above our 3 dimensions that there is no hope to visualize it effectively, even by using projections, so it forces us to contend with the intrinsic “unvisualizability” of high dimensions. Second, ten dimensions is just big enough that it behaves roughly like any higher dimension, at least when it comes to random walks. Third, it is about as big as can be handled with typical memory sizes of computers. For instance, a ten-dimensional hypercubic lattice with 10 discrete sites along each dimension has 10^10 lattice points (10 Billion or 10 Gigs) which is about the limit of what a typical computer can handle with internal memory.
As a starting point for visualization, let’s begin with the well-known 4D hypercube but extend it to a 4D hyperlattice with three values along each dimension instead of two. The resulting 4D lattice can be displayed in 2D as a network with 3^4 = 81 nodes and 216 links or edges. The result is shown in Fig. 1, represented in two dimensions as a network graph with nodes and edges. Each node has four links with neighbors. Despite the apparent 3D look that this graph has about it, if you look closely you will see the frustration that occurs when trying to link to 4 neighbors, causing many long-distance links.
We can also look at a 10D hypercube that has 2^10 = 1024 nodes and 5120 edges, shown in Fig. 2. It is a bit difficult to see the hypercubic symmetry when presented in 2D, but each node has exactly 10 links.
Extending this 10D lattice to 10 positions instead of 2 and trying to visualize it is prohibitive, since the resulting graph in 2D just looks like a mass of overlapping circles. However, our interest extends not just to ten locations per dimension, but to an unlimited number of locations. This is the 10D infinite lattice on which we want to explore the physics of the random walk.
Diffusion in Ten Dimensions
An unconstrained random walk in 10D is just a minimal extension beyond a simple random walk in 1D. Because each dimension is independent, a single random walker takes a random step along any of the 10 dimensions at each iteration so that motion in any one of the 10 dimensions is just a 1D random walk. Therefore, a simple way to visualize this random walk in 10D is simply to plot the walk against each dimension, as in Fig. 3. There is one chance in ten that the walker will take a positive or negative step along any given dimension at each time point.
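A minimal sketch of this unconstrained 10D walk is shown below (the number of steps and the random seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)

dims, steps = 10, 10_000
position = np.zeros((steps + 1, dims), dtype=int)

for t in range(steps):
    axis = rng.integers(dims)                     # pick one of the 10 dimensions
    position[t + 1] = position[t]
    position[t + 1, axis] += rng.choice([-1, 1])  # step up or down along that axis

# Each column of `position` is an ordinary 1D random walk (as in Fig. 3),
# and each row is the walker's 10D position vector (as in Fig. 4).
squared_distance = np.sum(position**2, axis=1)
print("final squared displacement:", squared_distance[-1], "after", steps, "steps")
```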
An alternate visualization of the 10D random walker is shown in Fig. 4 for the same data as Fig. 3. In this case the displacement is color coded, and each column is a different dimension. Time is on the vertical axis (starting at the top and increasing downward). This type of color map can easily be extended to hundreds of dimensions. Each row is a position vector of the single walker in the 10D space
In the 10D hyperlattice in this section, all lattice sites are accessible at each time point, so there is no constraint preventing the walk from visiting a previously-visited node. There is a possible adjustment that can be made to the walk that prevents it from ever crossing its own path. This is known as a self-avoiding-walk (SAW). In two dimensions, there is a major difference in the geometric and dynamical properties of an ordinary walk and an SAW. However, in dimensions larger than 4, it turns out that there are so many possibilities of where to go (high-dimensional spaces have so much free room) that it is highly unlikely that a random walk will ever cross itself. Therefore, in our 10D hyperlattice we do not need to make the distinction between an ordinary walk and a self-avoiding-walk. However, there are other constraints that can be imposed that mimic how complex systems evolve in time, and these constraints can have important consequences, as we see next.
Random Walk in a Maximally Rough Landscape
In the infinite hyperlattice of the previous section, all lattice sites are the same and are all equally accessible. However, in the study of complex systems, it is common to assign a value to each node in a high-dimensional lattice. This value can be assigned by a potential function, producing a high-dimensional potential landscape over the lattice geometry. Or the value might be the survival fitness of a species, producing a high-dimensional fitness landscape that governs how species compete and evolve. Or the value might be a loss function (an objective function) in a minimization problem from multivariate analysis or machine learning. In all of these cases, the scalar value on the nodes defines a landscape over which a state point executes a walk. The question then becomes, what are the properties of a landscape in high dimensions, and how does it affect a random walker?
As an example, let’s consider a landscape that is completely random point-to-point. There are no correlations in this landscape, making it maximally rough. Then we require that a random walker takes a walk along iso-potentials in this landscape, never increasing and never decreasing its potential. Beginning with our spatial intuition living in 3D space, we might be concerned that such a walker would quickly get confined to some area of the landscape. Think of a 2D topo map with contour lines drawn on it: if we start at a certain elevation on a mountain side, and we must walk along directions that maintain our elevation, then we stay on a given contour and eventually come back to our starting point after circling the mountain peak — we are trapped! But this intuition informed by our 3D lives is misleading. What happens in our 10D hyperlattice?
To make the example easy to analyze, let’s assume that our potential function is restricted to N discrete values. This means that of the 10 neighbors to a given walker site, on average only 10/N are likely to have the same potential value as the given walker site. This constrains the available sites for the walker, and it converts the uniform hyperlattice into a hyperlattice site percolation problem.
Percolation theory is a fascinating topic in statistical physics. There are many deep concepts that come from asking simple questions about how nodes are connected across a network. The most important aspect of percolation theory is the concept of a percolation threshold. Starting with a complete network that is connected end-to-end, start removing nodes at random. For some critical fraction of nodes removed (on average) there will no longer be a single connected cluster that spans the network. This critical fraction is known as the percolation threshold. Above the percolation threshold, a random walker can get from one part of the network to another. Below the percolation threshold, the random walker is confined to a local cluster.
If a hyperlattice has N discrete values for the landscape potential (or height, or contour) and if a random walker can only move to a site that has the same value as the walker’s current value (remains on the level set), then only a fraction of the hyperlattice sites are available to the walker, and the question of whether the walker can find a path that spans the hyperlattice becomes simply a question of how the fraction of available sites relates to the percolation threshold.
The percolation threshold for hyperlattices is well known. For reasonably high dimensions, it is given to good accuracy by
where d is the dimension of the hyperlattice. For a 10D hyperlattice the percolation threshold is pc(10) = 0.0568, or about 6%. Therefore, if more than 6% of the sites of the hyperlattice have the same value as the walker’s current site, then the walker is free to roam about the hyperlattice.
If there are N = 5 discrete values for the potential, then 20% of the sites are available, which is above the percolation threshold, and walkers can go as far as they want. This statement holds true no matter what the starting value is. It might be 5, which means the walker is as high on the landscape as they can get. Or it might be 1, which means the walker is as low on the landscape as they can get. Yet even if they are at the top, if the available site fraction is above the percolation threshold, then the walker can stay on the high mountain ridge, spanning the landscape. The same is true if they start at the bottom of a valley. Therefore, mountain ridges are very common, as are deep valleys, yet they allow full mobility about the geography. On the other hand, a so-called mountain peak would be a 5 surrounded entirely by 4’s or lower. The odds of this happening at a given site in 10D are 0.2 × 0.8^10 ≈ 0.02, so the total density of mountain peaks, in a 10D hyperlattice with 5 potential values, is only about 2%. Therefore, mountain peaks are rare in 10D, while mountain ridges are common. In even higher dimensions, the percolation threshold decreases roughly inversely with the dimensionality, and mountain peaks become extremely rare and play virtually no part in walks about the landscape.
To illustrate this point, Fig. 5 is the same 10D network that is in Fig. 2, but only the nodes sharing the same value are shown for N = 5, which means that only 20% of the nodes are accessible to a walker who stays only on nodes with the same values. There is a “giant cluster” that remains connected, spanning the original network. If the original network is infinite, then the giant cluster is also infinite but contains a finite fraction of the nodes.
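A short sketch of this construction is given below (this is not the code behind Fig. 5): it assigns N = 5 random levels to the 1024 nodes of the 10D hypercube, keeps a single level set, and measures the size of its largest connected cluster. The exact node counts vary with the random seed.

```python
import numpy as np

rng = np.random.default_rng(3)

d, N = 10, 5
n_nodes = 2**d
values = rng.integers(1, N + 1, size=n_nodes)   # random level 1..5 on each node

# Keep only the nodes on one level set (here, level 3) and find the largest
# connected cluster, where hypercube neighbors differ by exactly one bit.
level = 3
keep = values == level
visited = np.zeros(n_nodes, dtype=bool)
largest = 0

for start in range(n_nodes):
    if not keep[start] or visited[start]:
        continue
    stack, size = [start], 0
    visited[start] = True
    while stack:
        node = stack.pop()
        size += 1
        for bit in range(d):
            nbr = node ^ (1 << bit)
            if keep[nbr] and not visited[nbr]:
                visited[nbr] = True
                stack.append(nbr)
    largest = max(largest, size)

print(f"{keep.sum()} nodes kept out of {n_nodes}; largest connected cluster: {largest}")
```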
The quantitative details of the random walk can change depending on the proximity of the sub-networks (the clusters, the ridges or the level sets) to the percolation threshold. For instance, a random walker in D = 10 with N = 5 is shown in Fig. 6. The diffusion is a bit slower than in the unconstrained walk of Figs. 3 and 4. But the ability to wander about the 10D space is retained.
This is then the general important result: In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.
Consequences for Evolution and Machine Learning
When the high-dimensional space is the space of possible mutations on a genome, and when the landscape is a fitness landscape that assigns a survival advantage for one mutation relative to others, then the random walk describes the evolution of a species across generations. The prevalence of ridges, or more generally level sets, in high dimensions has a major consequence for the evolutionary process, because a species can walk along a level set acquiring many possible mutations that have only neutral effects on the survivability of the species. At the same time, the genetic make-up is constantly drifting around in this “neutral network”, allowing the species’ genome to access distant parts of the space. Then, at some point, natural selection may tip the species up a nearby (but rare) peak, and a new equilibrium is attained for the species.
One of the early criticisms of fitness landscapes was the (erroneous) objection that for a species to move from one fitness peak to another, it would have to go down and cross wide valleys of low fitness to get to another peak. But this was a leftover from thinking in 3D. In high-D, neutral networks are ubiquitous, and a mutation can take a step away from one fitness peak onto one of the neutral networks, which can be sampled by a random walk until the state is near some distant peak. It is no longer necessary to think in terms of high peaks and low valleys of fitness — just random walks. The evolution of extremely complex structures, like ATP synthase, can then be understood as a random walk along networks of nearly-neutral fitness — once our 3D biases are eliminated.
The same arguments hold for many situations in machine learning and especially deep learning. When training a deep neural network, there can be thousands of neural weights that need to be trained through the minimization of a loss function, also known as an objective function. The loss function is the equivalent to a potential, and minimizing the loss function over the thousands of dimensions is the same problem as maximizing the fitness of an evolving species.
At first look, one might think that deep learning is doomed to failure. We have all learned, from the earliest days in calculus, that enough adjustable parameters can fit anything, but the fit is meaningless because it predicts nothing. Deep learning seems to be the worst example of this. How can fitting thousands of adjustable parameters be useful when the dimensionality of the optimization space is orders of magnitude larger than the degrees of freedom of the system being modeled?
The answer comes from the geometry of high dimensions. The prevalence of neutral networks in high dimensions gives lots of chances to escape local minima. In fact, local minima are actually rare in high dimensions, and when they do occur, there is a neutral network nearby onto which they can escape (if the effective temperature of the learning process is set sufficiently high). Therefore, despite the insanely large number of adjustable parameters, general solutions, that are meaningful and predictive, can be found by adding random walks around the objective landscape as a partial strategy in combination with gradient descent.
Given the superficial analogy of deep learning to the human mind, the geometry of random walks in ultra-high dimensions may partially explain our own intelligence and consciousness.
S. Gavrilets, Fitness Landscapes and the Origin of Species. Princeton University Press, 2004.
M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1983.
One of the hardest aspects to grasp about relativity theory is the question of whether an event “looks as if” it is doing something, or whether it “actually is” doing something.
Take, for instance, the classic twin paradox of relativity theory, in which there are twins who wear identical high-precision wrist watches. One of them rockets off to Alpha Centauri at relativistic speeds and returns, while the other twin stays on Earth. Each twin sees the other twin’s clock running slowly because of relativistic time dilation. Yet when they get back together and stand side-by-side to compare their watches, the twin who went to Alpha Centauri is actually younger than the other, despite the apparent symmetry of the situation. The relativistic effect of time dilation is “real”, not just apparent, regardless of whether they come back together to do the comparison.
Yet this understanding of relativistic effects took many years, even decades, to gain acceptance after Einstein proposed them. He was aware himself that key experiments were required to prove that relativistic effects are real and not just apparent.
Einstein and the Transverse Doppler Effect
In 1905 Einstein used his new theory of special relativity to predict observable consequences that included a general treatment of the relativistic Doppler effect. This included the effects of time dilation in addition to the longitudinal effect of the source chasing the wave. Time dilation produced a correction to Doppler’s original expression for the longitudinal effect that became significant at speeds approaching the speed of light. More significantly, it predicted a transverse Doppler effect for a source moving along a line perpendicular to the line of sight to an observer. This effect had not been predicted either by Christian Doppler (1803 – 1853) or by Woldemar Voigt (1850 – 1919).
Despite the generally positive reception of Einstein’s theory of special relativity, some of its consequences were anathema to many physicists at the time. A key stumbling block was the question whether relativistic effects, like moving clocks running slowly, were only apparent or actually real, and Einstein had to fight to convince others of their reality. When Johannes Stark (1874 – 1957) observed Doppler line shifts in ion beams called “canal rays” in 1906 (Stark received the 1919 Nobel Prize in part for this discovery), Einstein promptly published a paper suggesting how the canal rays could be used in a transverse geometry to directly detect time dilation through the transverse Doppler effect. Thirty years passed before the experiment was performed with sufficient accuracy by Herbert Ives and G. R. Stilwell in 1938 to measure the transverse Doppler effect. Ironically, even at this late date, Ives and Stilwell were convinced that their experiment had disproved Einstein’s time dilation by supporting Lorentz’ contraction theory of the electron. The Ives-Stilwell experiment was the first direct test of time dilation, followed in 1940 by muon lifetime measurements.
A) Transverse Doppler Shift Relative to Emission Angle
The Doppler effect varies between blue shifts in the forward direction to red shifts in the backward direction, with a smooth variation in Doppler shift as a function of the emission angle. Consider the configuration shown in Fig. 1 for light emitted from a source moving at speed v and emitting at an angle θ0 in the receiver frame. The source moves a distance vT in the time of a single emission cycle (assume a harmonic wave). In that time T (which is the period of oscillation of the light source — or the period of a clock if we think of it putting out light pulses) the light travels a distance cT before another cycle begins (or another pulse is emitted).
The observed wavelength in the receiver frame is thus given by
where T is the emission period of the moving source. Importantly, the emission period is time dilated relative to the proper emission time of the source
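Putting these two statements together, with λ0 = cT0 the proper wavelength and β = v/c (a reconstruction from the geometry of Fig. 1):

$$\lambda = (c - v\cos\theta_0)\,T = \gamma\lambda_0\,(1 - \beta\cos\theta_0), \qquad T = \gamma T_0 = \frac{T_0}{\sqrt{1-\beta^2}}$$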
This expression can be evaluated for several special cases:
a) θ0 = 0 for forward emission
which is the relativistic blue shift for longitudinal motion in the direction of the receiver.
b) θ0 = π for backward emission
which is the relativistic red shift for longitudinal motion away from the receiver
c) θ0 = π/2 for transverse emission
This transverse Doppler effect for emission at right angles is a red shift, caused only by the time dilation of the moving light source. This is the effect proposed by Einstein and observed by Stark that proved moving clocks tick slowly. But it is not the only way to view the transverse Doppler effect.
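Collecting the three special cases from the expression above (a summary of the standard results, consistent with the descriptions just given):

$$\theta_0 = 0:\quad \lambda = \lambda_0\sqrt{\frac{1-\beta}{1+\beta}} \quad (\text{blue shift})$$

$$\theta_0 = \pi:\quad \lambda = \lambda_0\sqrt{\frac{1+\beta}{1-\beta}} \quad (\text{red shift})$$

$$\theta_0 = \pi/2:\quad \lambda = \gamma\lambda_0 \quad (\text{red shift from time dilation alone})$$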
B) Transverse Doppler Shift Relative to Angle at Reception
A different option for viewing the transverse Doppler effect is the angle to the moving source at the moment that the light is detected. The geometry of this configuration relative to the previous is illustrated in Fig. 2.
The transverse distance to the detection point is
The length of the line connecting the detection point P with the location of the light source at the moment of detection is (using the law of cosines)
Combining with the first equation gives
An equivalent expression is obtained as
Note that this result, relating θ1 to θ0, is independent of the distance to the observation point.
When θ1 = π/2, then
for which the Doppler effect is
which is a blue shift. This creates the unexpected result that θ0 = π/2 produces a red shift, while θ1 = π/2 produces a blue shift. The question could be asked: which one represents time dilation? In fact, it is θ0 = π/2 that produces time dilation exclusively, because in that configuration there is no foreshortening effect on the wavelength–only the emission time.
C) Compromise: The Null Transverse Doppler Shift
The previous two configurations each could be used as a definition for the transverse Doppler effect. But one gives a red shift and one gives a blue shift, which seems contradictory. Therefore, one might try to strike a compromise between these two cases so that sin θ1 = sin θ0, and the configuration is shown in Fig. 3.
This is the case when θ1 + θ2 = π. The sines of the two angles are equal, yielding
which is solved for
Inserting this into the Doppler equation gives
where the Taylor expansion of the denominator (at low speed) cancels the numerator to give zero net Doppler shift. This compromise configuration represents the condition of null Doppler frequency shift. However, for speeds approaching the speed of light, the net effect is a lengthening of the wavelength, dominated by time dilation, causing a red shift.
D) Source in Circular Motion Around Receiver
An interesting twist can be added to the problem of the transverse Doppler effect: put the source or receiver into circular motion, one about the other. In the case of a source in circular motion around the receiver, it is easy to see that this looks just like case A) above for θ0 = π/2, which is the red shift caused by the time dilation of the moving source
However, there is the possible complication that the source is no longer in an inertial frame (it experiences angular acceleration) and therefore it is in the realm of general relativity instead of special relativity. In fact, it was Einstein’s solution to this problem that led him to propose the Equivalence Principle and make his first calculations on the deflection of light by gravity. His solution was to think of an infinite number of inertial frames, each of which was instantaneously co-moving with the same linear velocity as the source. These co-moving frames are inertial and can be analyzed using the principles of special relativity. The general relativistic effects come from slipping from one inertial co-moving frame to the next. But in the case of the circular transverse Doppler effect, each instantaneously co-moving frame has the exact configuration as case A) above, and so the wavelength is red shifted exactly by the time dilation.
E) Receiver in Circular Motion Around Source
With the notion of co-moving inertial frames now in hand, this configuration is exactly the same as case B) above, and the wavelength is blue shifted
A. Einstein, “On the electrodynamics of moving bodies,” Annalen der Physik, vol. 17, no. 10, pp. 891-921 (1905).
D. D. Nolte, “The Fall and Rise of the Doppler Effect,” Physics Today, vol. 73, no. 3, pp. 31-35 (2020).
J. Stark, W. Hermann, and S. Kinoshita, “The Doppler effect in the spectrum of mercury,” Annalen der Physik, vol. 21, pp. 462-469 (1906).
A. Einstein, “Possibility of a new examination of the relativity principle,” Annalen der Physik, vol. 23, no. 6, pp. 197-198 (1907).
H. E. Ives and G. R. Stilwell, “An experimental study of the rate of a moving atomic clock,” Journal of the Optical Society of America, vol. 28, p. 215 (1938).
B. Rossi and D. B. Hall, “Variation of the rate of decay of mesotrons with momentum,” Physical Review, vol. 59, pp. 223-228 (1941).
… GR combined with nonlinear synchronization yields the novel phenomenon of a “synchronization cascade”.
Imagine a space ship containing a collection of highly accurate atomic clocks, set to arbitrary precision at the factory before launch. The clocks are lined up with precisely equal spacing along the axis of the space ship, which should allow the astronauts to study events in spacetime to high accuracy as they orbit neutron stars or black holes. Despite all the precision, spacetime itself will conspire to detune the clocks. Yet all is not lost. Using the physics of nonlinear synchronization, the astronauts can bring all the clocks together to a compromise frequency—locking all the clocks to a common rate. This blog post shows how this can happen.
Synchronization of Oscillators
The simplest synchronization problem is two “phase oscillators” coupled with a symmetric nonlinearity. The dynamical flow is
where ωk are the individual angular frequencies and g is the coupling constant. When g is greater than the difference Δω, then the two oscillators, despite having different initial frequencies, will find a stable fixed point and lock to a compromise frequency.
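In explicit form, one common convention for this flow is the following (the factor of 1/2 is an assumption chosen here so that the locking condition reads exactly as stated above):

$$\dot\theta_1 = \omega_1 + \frac{g}{2}\sin(\theta_2 - \theta_1), \qquad \dot\theta_2 = \omega_2 + \frac{g}{2}\sin(\theta_1 - \theta_2)$$

The phase difference Δθ = θ2 − θ1 then obeys

$$\Delta\dot\theta = \Delta\omega - g\sin\Delta\theta$$

which has a stable fixed point, and hence frequency locking, whenever g > |Δω|.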
Taking this model to N phase oscillators creates the well-known Kuramoto model that is characterized by a relatively sharp mean-field phase transition leading to global synchronization. The model averages N phase oscillators to a mean field where g is the coupling coefficient, K is the mean amplitude, Θ is the mean phase, and ω-bar is the mean frequency. The dynamics are given by
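A standard way to write this mean-field reduction, consistent with the definitions just given (a reconstruction rather than the exact notation of the original figures), is

$$\bar\omega = \frac{1}{N}\sum_{j=1}^{N}\omega_j, \qquad K e^{i\Theta} = \frac{1}{N}\sum_{j=1}^{N} e^{i\theta_j}$$

$$\dot\theta_k = \omega_k + gK\sin(\Theta - \theta_k)$$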
The last equation is the final mean-field equation that synchronizes each individual oscillator to the mean field. For a large number of oscillators that are globally coupled to each other, increasing the coupling has little effect on the oscillators until a critical threshold is crossed, after which all the oscillators synchronize with each other. This is known as the Kuramoto synchronization transition, shown in Fig. 2 for 20 oscillators with uniformly distributed initial frequencies. Note that the critical coupling constant gc is roughly half of the spread of initial frequencies.
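A compact numerical sketch of this transition for 20 globally coupled oscillators is given below; the frequency spread, time step, and list of coupling values are illustrative assumptions, not the settings used for Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(4)

N = 20
omega = np.linspace(-0.5, 0.5, N)        # uniformly spread natural frequencies
dt, steps = 0.05, 4000

for g in [0.1, 0.3, 0.5, 0.7, 1.0]:
    theta = rng.uniform(0, 2 * np.pi, N)
    for _ in range(steps):
        # mean-field form: every oscillator couples to the order parameter (K, Theta)
        z = np.mean(np.exp(1j * theta))
        K, Theta = abs(z), np.angle(z)
        theta += (omega + g * K * np.sin(Theta - theta)) * dt
    K_final = abs(np.mean(np.exp(1j * theta)))
    print(f"g = {g:.1f}  ->  order parameter K = {K_final:.2f}")
```

Below the threshold the final order parameter stays small, while above it the oscillators pull together and K approaches unity.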
The question that this blog seeks to answer is how this synchronization mechanism may be used in a space craft exploring the strong gravity around neutron stars or black holes. The key to answering this question is the metric tensor for this system
where the first term is the time-like term g00 that affects ticking clocks, and the second term is the space-like term that affects the length of the space craft.
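For a non-rotating central mass this is the Schwarzschild metric. Keeping only the time and radial terms relevant to clocks strung along the radial direction (angular terms dropped), it reads

$$ds^2 = -\left(1 - \frac{R_S}{r}\right)c^2\,dt^2 + \left(1 - \frac{R_S}{r}\right)^{-1}dr^2, \qquad R_S = \frac{2GM}{c^2}$$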
Kuramoto versus the Neutron Star
Consider the space craft holding a steady radius above a neutron star, as in Fig. 3. For simplicity, hold the craft stationary rather than in an orbit to remove the details of rotating frames. Because each clock is at a different gravitational potential, it runs at a different rate because of gravitational time dilation–clocks nearer to the neutron star run slower than clocks farther away. There is also a gravitational length contraction of the space craft, which modifies the clock rates as well.
The analysis starts by incorporating the first-order approximation of time dilation through the component g00. The component is brought in through the period of oscillations. All frequencies are referenced to the base oscillator that has the angular rate ω0, and the other frequencies are primed. As we consider oscillators higher in the space craft at positions R + h, the 1/(R+h) term in g00 decreases as does the offset between each successive oscillator.
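A reconstruction of the resulting rate shifts, assuming each clock is held static at radius R0 + hn and all rates are referenced to the base clock at R0, is

$$\omega_n = \omega_0\,\sqrt{\frac{1 - R_S/(R_0 + h_n)}{1 - R_S/R_0}} \;\approx\; \omega_0\left[1 + \frac{R_S}{2}\left(\frac{1}{R_0} - \frac{1}{R_0 + h_n}\right)\right]$$

so clocks higher in the ship run faster than the base clock, and the offset between successive clocks shrinks with height.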
The dynamical equations for a system for only two clocks, coupled through the constant k, are
These are combined to a single equation by considering the phase difference
The two clocks will synchronize to a compromise frequency for the critical coupling coefficient
Now, if there is a string of N clocks, as in Fig. 3, the question is how the frequencies spread out under gravitational time dilation, and what the entrainment of the frequencies to a common compromise frequency looks like. If the ship is located at some distance from the neutron star, then the gravitational potential varies approximately linearly from one clock to the next, and coupling them would produce the classic Kuramoto transition.
However, if the ship is much closer to the neutron star, so that the gravitational potential is no longer linear, then there is a “fan-out” of frequencies, with the bottom-most clock ticking much more slowly than the top-most clock. Coupling these clocks produces a modified, or “stretched”, Kuramoto transition as in Fig. 4.
In the two examples in Fig. 4, the bottom-most clock is just above the radius of the neutron star (at R0 = 4RS for a solar-mass neutron star, where RS is the Schwarzschild radius) and at twice that radius (at R0 = 8RS). The length of the ship, along which the clocks are distributed, is RS in this example. This may seem unrealistically large, but we could imagine a regular-sized ship supporting a long stiff cable dangling below it composed of carbon nanotubes that has the clocks distributed evenly on it, with the bottom-most clock at the radius R0. In fact, this might be a reasonable design for exploring spacetime events near a neutron star (although even carbon nanotubes would not be able to withstand the strain).
Kuramoto versus the Black Hole
Against expectation, exploring spacetime around a black hole is actually easier than around a neutron star, because there is no physical surface at the Schwarzschild radius RS, and gravitational tidal forces can be small for large black holes. In fact, one of the most unintuitive aspects of black holes pertains to a space ship falling into one. A distant observer sees the space ship contracting to zero length and the clocks slowing down and stopping as the space ship approaches the Schwarzschild radius asymptotically, but never crossing it. However, on board the ship, all appears normal as it crosses the Schwarzschild radius. To the astronaut inside, there is a gravitational potential inside the space ship that causes the clocks at the base to run more slowly than the upper clocks, and length contraction affects the spacing a little, but otherwise there is no singularity as the event horizon is passed. This appears as a classic “paradox” of physics, with two different observers seeing paradoxically different behaviors.
The resolution of this paradox lies in the differential geometry of the two observers. Each approximates spacetime with a Euclidean coordinate system that matches the local coordinates. The distant observer references the warped geometry to this “chart”, which produces the apparent divergence of the Schwarzschild metric at RS. However, the astronaut inside the space ship has her own flat chart to which she references the locally warped spacetime around the ship. Therefore, it is the differential changes, referenced to the ship’s coordinate origin, that capture gravitational time dilation and length contraction. Because the synchronization takes place in the local coordinate system of the ship, this is the coordinate system that goes into the dynamical equations for synchronization. Taking this approach, the shifts in the clock rates are given by the derivative of the metric as
where hn is the height of the n-th clock above R0.
Fig. 5 shows the entrainment plot for the black hole, which has a noticeably smoother transition. In this higher-mass case, the system does not have as many hard coupling transitions and instead exhibits smooth behavior for global coupling. This is the Kuramoto “cascade”. Contrast the behavior of Fig. 5 (left) to the classic Kuramoto transition of Fig. 2. The increasing frequency separations near the black hole produce a succession of frequency locks as the coupling coefficient increases. For comparison, the case of linear coupling along the cable is shown in Fig. 5 on the right. The cascade is now accompanied by interesting oscillations as one clock entrains with a neighbor, only to be pulled back by interaction with locked subclusters.
Now let us consider what role the spatial component of the metric tensor plays in the synchronization. The spatial component causes the space between the oscillators to decrease closer to the supermassive object. This would cause the bottom oscillators, which entrain the most slowly, to entrain faster because they are closer together, while the top oscillators would entrain more slowly because they are farther apart, as in Fig. 6.
In terms of the local coordinates of the space ship, the locations of each clock are
These values for hn can be put into the equation for ωn above. But it is clear that this produces a second order effect. Even at the event horizon, this effect is only a fraction of the shifts caused by g00 directly on the clocks. This is in contrast to what a distant observer sees–the clock separations decreasing to zero, which would seem to decrease the frequency shifts. But the synchronization coupling is performed in the ship frame, not the distant frame, so the astronaut can safely ignore this contribution.
As a final exploration of the black hole, before we leave it behind, look at the behavior for different values of R0 in Fig. 7. At 4RS, the Kuramoto transition is stretched. At 2RS there is a partial Kuramoto transition for the upper clocks, that then stretch into a cascade of locking events for the lower clocks. At 1RS we see the full cascade as before.
Note from the Editor:
This blog post by Moira Andrews is based on her final project for Phys 411, upper division undergraduate mechanics, at Purdue University. Students are asked to combine two seemingly-unrelated aspects of modern dynamics and explore the results. Moira thought of synchronizing clocks that are experiencing gravitational time dilation near a massive body. This is a nice example of how GR combined with nonlinear synchronization yields the novel phenomenon of a “synchronization cascade”.
Cheng, T.-P. (2010). Relativity, Gravitation and Cosmology. Oxford University Press.
“Society is founded on hero worship”, wrote Thomas Carlyle (1795 – 1881) in his 1840 lecture on “Hero as Divinity”—and the society of physicists is no different. Among physicists, the hero is the genius—the monomyth who journeys into the supernatural realm of high mathematics, engages in single combat against chaos and confusion, gains enlightenment in the mysteries of the universe, and returns home to share the new understanding. If the hero is endowed with unusual talent and achieves greatness, then mythologies are woven, creating shadows that can grow and eclipse the truth and the work of others, bestowing upon the hero recognitions that are not entirely deserved.
“Gentlemen! The views of space and time which I wish to lay before you … They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”
Hermann Minkowski (1908)
The greatest hero of physics of the twentieth century, without question, is Albert Einstein. He is the person most responsible for the development of “Modern Physics” that encompasses:
Relativity theory (both special and general),
Quantum theory (he proposed the light quantum in 1905—see my blog),
Astrophysics (his field equations of general relativity were solved by Schwarzschild in 1916 to predict event horizons of black holes, and he solved his own equations to predict gravitational waves that were discovered in 2015),
Cosmology (his cosmological constant is now recognized as the mysterious dark energy that was discovered in 1998), and
Solid state physics (his explanation of the specific heat of crystals inaugurated the field of quantum matter).
Einstein made so many seminal contributions to so many sub-fields of physics that it defies comprehension—hence he is mythologized as a genius, able to see into the depths of reality with unique insight. He deserves his reputation as the greatest physicist of the twentieth century—he has my vote, and he was chosen by Time magazine as its Person of the Century in 1999. But as his shadow has grown, it has eclipsed and even assimilated the work of others—work that he initially criticized and dismissed, yet later embraced so whole-heartedly that he is mistakenly given credit for its discovery.
For instance, when we think of Einstein, the first thing that pops into our minds is probably “spacetime”. He himself wrote several popular accounts of relativity that incorporated the view that spacetime is the natural geometry within which so many of the non-intuitive properties of relativity can be understood. When we think of time being mixed with space, making it seem that position coordinates and time coordinates share an equal place in the description of relativistic physics, it is common to attribute this understanding to Einstein. Yet Einstein initially resisted this viewpoint and even disparaged it when he first heard it!
Spacetime was the brain-child of Hermann Minkowski.
Minkowski in Königsberg
Hermann Minkowski was born in 1864 in Russia to German parents who moved to the city of Königsberg (King’s Mountain) in East Prussia when he was eight years old. He entered the university in Königsberg in 1880 when he was sixteen. Within a year, when he was only seventeen years old, and while he was still a student at the university, Minkowski responded to an announcement of the Mathematics Prize of the French Academy of Sciences in 1881. When he submitted his prize-winning mémoire, he could have had no idea that it was starting him down a path that would lead him years later to revolutionary views.
The specific Prize challenge of 1881 was to find the number of representations of an integer as a sum of five squares of integers. For instance, every integer n > 33 can be expressed as the sum of five nonzero squares. As an example, 42 = 2² + 2² + 3² + 3² + 4², which is the only such representation for that number. However, there are five representations for n = 53.
The task of enumerating these representations draws from the theory of quadratic forms. A quadratic form is a function of products of variables with integer coefficients, such as ax² + bxy + cy² or ax² + by² + cz² + dxy + exz + fyz. In number theory, one seeks to find integer solutions for which the quadratic form equals an integer. For instance, the Pythagorean theorem x² + y² = n² for integers is a quadratic form for which there are many integer solutions (x, y, n), known as Pythagorean triplets, such as (3, 4, 5) and (5, 12, 13).
The topic of quadratic forms gained special significance after the work of Bernhard Riemann who established the properties of metric spaces based on the metric expression
for infinitesimal distance in a D-dimensional metric space. This is a generalization of Euclidean distance to more general non-Euclidean spaces that may have curvature. Minkowski would later use this expression to great advantage, developing a “Geometry of Numbers”  as he delved ever deeper into quadratic forms and their uses in number theory.
Minkowski in Göttingen
After graduating with a doctoral degree in 1885 from Königsberg, Minkowski did his habilitation at the University of Bonn and began teaching, moving back to Königsberg in 1892 and then to Zurich in 1894 (where one of his students was a somewhat lazy and unimpressive Albert Einstein). A few years later he was given an offer that he could not refuse.
At the turn of the 20th century, the place to be in mathematics was at the University of Göttingen. It had a long tradition of mathematical giants that included Carl Friedrich Gauss, Bernhard Riemann, Peter Dirichlet, and Felix Klein. Under the guidance of Felix Klein, Göttingen mathematics had undergone a renaissance. For instance, Klein had attracted Hilbert from the University of Königsberg in 1895. David Hilbert had known Minkowski when they were both students in Königsberg, and Hilbert extended an invitation to Minkowski to join him in Göttingen, which Minkowski accepted in 1902.
A few years after Minkowski arrived at Göttingen, the relativity revolution broke, and both Minkowski and Hilbert began working on mathematical aspects of the new physics. They organized a colloquium dedicated to relativity and related topics, and on Nov. 5, 1907 Minkowski gave his first tentative address on the geometry of relativity.
Because Minkowski’s specialty was quadratic forms, and given his understanding of Riemann’s work, he was perfectly situated to apply his theory of quadratic forms and invariants to the Lorentz transformations derived by Poincaré and Einstein. Although Poincaré had published a paper in 1906 that showed that the Lorentz transformation was a generalized rotation in four-dimensional space, Poincaré continued to discuss space and time as separate phenomena, as did Einstein. For them, simultaneity was no longer an invariant, but events in time were still events in time and not somehow mixed with space-like properties. Minkowski recognized that Poincaré had missed an opportunity to define a four-dimensional vector space filled by four-vectors that captured all possible events in a single coordinate description without the need to separate out time and space.
Minkowski’s first attempt, presented in his 1907 colloquium, at constructing velocity four-vectors was flawed because (like so many of my mechanics students when they first take a time derivative of the four-position) he had not yet understood the correct use of proper time. But the research program he outlined paved the way for the great work that was to follow.
On Feb. 21, 1908, only 3 months after his first halting steps, Minkowski delivered a thick manuscript to the printers for an article to appear in the Göttinger Nachrichten. The title “Die Grundgleichungen für die elektromagnetischen Vorgänge in bewegten Körpern” (The Basic Equations for Electromagnetic Processes of Moving Bodies) belies the impact and importance of this very dense article. In its 60 pages (with no figures), Minkowski presents the correct form for four-velocity by taking derivatives relative to proper time, and he formalizes his four-dimensional approach to relativity that became the standard afterwards. He introduces the terms spacelike vector, timelike vector, light cone and world line. He also presents the complete four-tensor form for the electromagnetic fields. The foundational work of Levi-Civita and Ricci-Curbastro on tensors was not yet well known, so Minkowski invents his own terminology of Traktor to describe it. Most importantly, he invents the terms spacetime (Raum-Zeit) and events (Ereignisse).
Minkowski’s four-dimensional formalism of relativistic electromagnetics was more than a mathematical trick—it uncovered the presence of a multitude of invariants that were obscured by the conventional mathematics of Einstein and Lorentz and Poincaré. In Minkowski’s approach, whenever a proper four-vector is contracted with itself (its inner product), an invariant emerges. Because there are many fundamental four-vectors, there are many invariants. These invariants provide the anchors from which to understand the complex relative properties amongst relatively moving frames.
Minkowski’s master work appeared in the Nachrichten on April 5, 1908. If he had thought that physicists would embrace his visionary perspective, he was about to be woefully disabused of that notion.
Despite his impressive ability to see into the foundational depths of the physical world, Einstein did not view mathematics as the root of reality. Mathematics for him was a tool to reduce physical intuition into quantitative form. In 1908 his fame was rising as the acknowledged leader in relativistic physics, and he was not impressed or pleased with the abstract mathematical form that Minkowski was trying to stuff the physics into. Einstein called it “superfluous erudition”, and complained “since the mathematics pounced on the relativity theory, I no longer understand it myself!”
With his collaborator Jakob Laub (also a former student of Minkowski’s), Einstein objected to more than the hard-to-follow mathematics—they believed that Minkowski’s form of the ponderomotive force was incorrect. They then proceeded to re-translate Minkowski’s elegant four-vector derivations back into ordinary vector analysis, publishing two papers in Annalen der Physik in the summer of 1908 that were politely critical of Minkowski’s approach [7-8]. Yet another of Minkowski’s students from Zurich, Gunnar Nordström, showed how to derive Minkowski’s field equations without any of the four-vector formalism.
One can only wonder why so many of his former students so easily dismissed Minkowski’s revolutionary work. Einstein had actually avoided Minkowski’s mathematics classes as a student at ETH, which may say something about Minkowski’s reputation among the students, although Einstein did appreciate the class on mechanics that he took from Minkowski. Nonetheless, Einstein missed the point! Rather than realizing the power and universality of the four-dimensional spacetime formulation, he dismissed it as obscure and irrelevant—perhaps prejudiced by his earlier dim view of his former teacher.
Raum und Zeit
It is clear that Minkowski was stung by the poor reception of his spacetime theory. It is also clear that he truly believed that he had uncovered an essential new approach to physical reality. While mathematicians were generally receptive to his work, he knew that if physicists were to adopt his new viewpoint, he needed to win them over with elegant results.
In 1908, Minkowski presented a now-famous paper, Raum und Zeit, at the 80th Assembly of German Natural Scientists and Physicians (21 September 1908). In his opening address, he stated:
“The views of space and time which I wish to lay before you have sprung from the soil of experimental physics, and therein lies their strength. They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”
To illustrate his arguments Minkowski constructed the most recognizable visual icon of relativity theory—the space-time diagram in which the trajectories of particles appear as “world lines”, as in Fig. 1. On this diagram, one spatial dimension is plotted along the horizontal axis, and the value ct (speed of light times time) is plotted along the vertical axis. In these units, a photon travels along a line oriented at 45 degrees, and the world line (the name Minkowski gave to trajectories) of any massive particle must have a slope steeper than this. For instance, a stationary particle, which appears to have no trajectory at all, traces a vertical world line on the space-time diagram as it travels forward through time. Within this new formulation by Minkowski, space and time were mixed together in a single manifold—spacetime—and were no longer separate entities.
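It takes only a few lines of Python to sketch such a diagram. The snippet below is purely illustrative (the velocities and axis ranges are arbitrary choices of mine, not taken from the original figure):

import numpy as np
from matplotlib import pyplot as plt

ct = np.linspace(0, 5, 100)

plt.plot(ct, ct, 'k--', label='photon at 45 degrees')               # light ray x = ct
plt.plot(-ct, ct, 'k--')                                            # photon moving the other way
plt.plot(np.zeros_like(ct), ct, 'b', label='stationary particle')   # vertical world line at x = 0
plt.plot(0.5*ct, ct, 'r', label='particle at v = c/2')              # slope steeper than 45 degrees
plt.xlabel('x')
plt.ylabel('ct')
plt.legend()
plt.axis('equal')
plt.show()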
In addition to the spacetime construct, Minkowski’s great discovery was the plethora of invariants that followed from his geometry. For instance, the spacetime interval
$$s^2 = (ct)^2 - x^2 - y^2 - z^2$$
traces out a hyperbola of constant s that is invariant to Lorentz transformations of the coordinates. This is just a simple statement that a vector is an entity of reality that is independent of how it is described. The length of a vector in our normal three-space does not change if we flip the coordinates around or rotate them, and the same is true for four-vectors in Minkowski space subject to Lorentz transformations.
In relativity theory, this property of invariance becomes especially useful because part of the mental challenge of relativity is that everything looks different when viewed from different frames. How do you get a good grip on a phenomenon if it is always changing, always relative to one frame or another? The invariants become the anchors that we can hold on to as reference frames shift and morph about us.
As an example of a fundamental invariant, the mass of a particle in its rest frame becomes an invariant mass, always with the same value. In earlier relativity theory, even in Einstein’s papers, the mass of an object was a function of its speed. How is the mass of an electron a fundamental property of physics if it is a function of how fast it is traveling? The construction of invariant mass removes this problem, and the mass of the electron becomes an immutable property of physics, independent of the frame. Invariant mass is just one of many invariants that emerge from Minkowski’s space-time description. The study of relativity, where all things seem relative, became a study of invariants, where many things never change. In this sense, the theory of relativity is a misnomer. Ironically, relativity theory became the motivation of post-modern relativism that denies the existence of absolutes, even as relativity theory, as practiced by physicists, is all about absolutes.
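In equation form, the invariant mass emerges by contracting the energy-momentum four-vector with itself, giving the relation
$$E^2 - (pc)^2 = (mc^2)^2$$
which evaluates to the same number in every inertial frame, no matter how fast the electron is moving.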
Despite his audacious gambit to win over the physicists, Minkowski would not live to see the fruits of his effort. He died suddenly of a burst gall bladder on Jan. 12, 1909 at the age of 44.
Arnold Sommerfeld (who went on to play a central role in the development of quantum theory) took up Minkowski’s four-vectors and systematized them in a way that was palatable to physicists. Then Max von Laue extended the formalism while he was working with Sommerfeld in Munich, publishing the first physics textbook on relativity theory in 1911 and establishing the space-time formalism for future generations of German physicists. Further support for Minkowski’s work came from his distinguished colleagues at Göttingen (Hilbert, Klein, Wiechert, Schwarzschild) as well as his former students (Born, Laue, Kaluza, Frank, Noether). With such champions, Minkowski’s work was immortalized in the methodology (and mythology) of physics, representing one of the crowning achievements of the Göttingen mathematical community.
Already in 1907 Einstein was beginning to grapple with the role of gravity in the context of relativity theory, and he knew that the special theory was just a beginning. Yet between 1908 and 1910 Einstein’s focus was on the quantum of light as he defended and extended his unique view of the photon and prepared for the first Solvay Congress of 1911. As he returned his attention to the problem of gravitation after 1910, he began to realize that Minkowski’s formalism provided a framework from which to understand the role of accelerating frames. In 1912 Einstein wrote to Sommerfeld to say 
I occupy myself now exclusively with the problem of gravitation. One thing is certain: that I have never before had to toil anywhere near as much, and that I have been infused with great respect for mathematics, which I had up until now in my naivety looked upon as a pure luxury in its more subtle parts. Compared to this problem, the original theory of relativity is child’s play.
By the time Einstein had finished his general theory of relativity and gravitation in 1915, he fully acknowledged his indebtedness to Minkowski’s spacetime formalism, without which his general theory might never have appeared.
Einstein’s theory of gravity came from a simple happy thought that occurred to him as he imagined an unfortunate worker falling from a roof, losing hold of his hammer, only to find both the hammer and himself floating motionless relative to each other as if gravity had ceased to exist. With this one thought, Einstein realized that the falling (i.e. accelerating) reference frame was in fact an inertial frame, and hence all the tricks that he had learned and invented to deal with inertial relativistic frames could apply just as well to accelerating frames in gravitational fields.
Armed with this new perspective, one of the earliest discoveries that Einstein made was that gravity must bend light paths. This phenomenon is fundamentally post-Newtonian, because there can be no Newtonian force of gravity on a massless photon—yet Einstein’s argument for why gravity should bend light is so simple that it is manifestly true, as demonstrated by Arthur Eddington during the solar eclipse of 1919, launching Einstein to world-wide fame. It is also demonstrated by the beautiful gravitational lensing phenomenon of Einstein arcs. Einstein arcs are the distorted images of bright distant light sources in the universe caused by an intervening massive object, like a galaxy or galaxy cluster, that bends the light rays. A number of these arcs are seen in images of the Abell cluster of galaxies in Fig. 1.
Gravitational lensing (and microlensing) has become a major tool of discovery in astrophysics, applied to the study of quasars, dark matter and even the search for exoplanets. However, as soon as Einstein conceived of gravitational lensing, in 1912, he abandoned the idea as too small and too unlikely to ever be useful, much as he abandoned the idea of gravitational waves in 1915 as being too small ever to detect. It was only through the persistence of an amateur Czech scientist twenty years later that Einstein reluctantly agreed to publish his calculations on gravitational lensing.
The History of Gravitational Lensing
In 1912, only a few years after his “happy thought”, and fully three years before he published his definitive work on General Relativity, Einstein derived how light would be affected by a massive object, causing light from a distant source to be deflected like a lens. The historian of physics Jürgen Renn discovered these derivations in Einstein’s notebooks while at the Max Planck Institute for the History of Science in Berlin in 1996. However, Einstein also calculated the magnitude of the effect and dismissed it as too small, and so he never published it.
Years later, in 1936, Einstein received a visit from Rudi Mandl, a Czech electrical engineer and amateur scientist who had obtained a small stipend from the Czech government to visit Einstein at the Institute for Advanced Study at Princeton. Mandl had conceived of the possibility of gravitational lensing and wished to bring it to Einstein’s attention, thinking that the master would certainly know what to do with the idea. Einstein was obliging, redoing his calculations of 1912 and obtaining once again the results that made him believe that the effect would be too small to be seen. However, Mandl was persistent and pressed Einstein to publish the results, which he did. In his submission letter to the editor of Science, Einstein stated “Let me also thank you for your cooperation with the little publication, which Mister Mandl squeezed out of me. It is of little value, but it makes the poor guy happy”. Einstein’s pessimism was based on his thinking that isolated stars would be the only source of the gravitational lens (he did not “believe” in black holes), but in 1937 Fritz Zwicky at Caltech (a gadfly genius) suggested that the newly discovered phenomenon of “galaxy clusters” might provide the massive gravity required to produce the effect. To be visible, though, a distant source would need to be extremely bright.
Potential sources turned up in the 1960s, when radio telescopes discovered quasi-stellar objects (known as quasars) that are extremely bright and extremely far away. Quasars also appear in the visible range, and in 1979 a twin quasar was discovered by astronomers using the telescope at the Kitt Peak Observatory in Arizona–two quasars very close together that shared identical spectral fingerprints. The astronomers realized that it could be a twin image of a single quasar caused by gravitational lensing, which they published as the likely explanation. Although the finding was originally controversial, the twin image was later confirmed, and many additional examples of gravitational lensing have since been discovered.
The Optics of Gravity and Light
Gravitational lenses are terrible optical instruments. A good imaging lens has two chief properties: 1) It produces increasing delay on a wavefront as the radial distance from the optic axis decreases; and 2) it deflects rays with increasing deflection angle as the radial distance of a ray increases away from the optic axis (the center of the lens). Both properties are part of the same effect: the conversion, by a lens, of an incident plane wave into a converging spherical wave. A third property of a good lens ensures minimal aberrations of the converging wave: a quadratic dependence of wavefront delay on radial distance from the optic axis. For instance, a parabolic lens produces a diffraction-limited focal spot.
Now consider the optical effects of gravity around a black hole. One of Einstein’s chief discoveries during his early investigations into the effects of gravity on light is the analogy of warped space-time as having an effective refractive index. Light propagates through space affected by gravity as if there were a refractive index associated with the gravitational potential. In a previous blog on the optics of gravity, I showed the simple derivation of the refractive effects of gravity on light based on the Schwarzschild metric applied to a null geodesic of a light ray. The effective refractive index near a black hole can be written (with RS the Schwarzschild radius) in a form like n(r) = 1/(1 − RS/r).
This effective refractive index diverges at the Schwarzschild radius of the black hole. It produces the maximum delay, not on the optic axis as for a good lens, but at the finite distance RS. Furthermore, the maximum deflection also occurs at RS, and the deflection decreases with increasing radial distance. Both of these properties of gravitational lensing are opposite to the properties of a good lens. For this reason, the phrase “gravitational lensing” is a bit of a misnomer. Gravitating bodies certainly deflect light rays, but the resulting optical behavior is far from that of an imaging lens.
The path of a ray from a distant quasar, through the thin gravitational lens of a galaxy, and intersecting the location of the Earth, is shown in Fig. 2. The location of the quasar is a distance R from the “optic axis”. The un-deflected angular position is θ0, and with the intervening galaxy the image appears at the angular position θ. The angular magnification is therefore M = θ/θ0.
The deflection angle of the ray and the angular positions are related through the bending produced by the lensing mass; in the weak-field limit the bending angle is α = 4GM/(c²b), where b is the “impact parameter”, the transverse distance at which the ray passes the lens.
These two relations are solved to give a quadratic lens equation, θ² − θθ0 − θE² = 0, that relates the unmagnified angle θ0 to the magnified angle θ, where θE is the angular size of the Einstein ring when the source is on the optic axis. The quadratic equation has two solutions, giving two images of the distant quasar. This is the origin of the “double image” that led to the first discovery of gravitational lensing in 1979.
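To get a sense of scale (these numbers are purely illustrative, not taken from the figure), the standard weak-lensing combination of the lens mass M and the distances to the lens, to the source, and between them (DL, DS and DLS) gives
$$\theta_E = \sqrt{\frac{4GM}{c^2}\,\frac{D_{LS}}{D_L D_S}} \approx 10\ \mu\mathrm{rad}$$
for a lensing galaxy of roughly 10¹² solar masses halfway to a source a couple of gigaparsecs away.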
When the distant quasar is on the optic axis, then θ0 = 0 and the deflection of the rays produces, not a double image, but an Einstein ring with an angular size of θE. For typical lensing objects, the angular size of the Einstein ring is in the range of tens of microradians. The angular magnification for decreasing distance R of the source from the axis diverges roughly as M = θ/θ0 ≈ θE/θ0 ∝ 1/R.
But this divergence is more a statement of the bad lens behavior than of actual image size. Because the gravitational lens is inverted (with greater deflection closer to the optic axis) compared to an ideal thin lens, it produces a virtual image ring that is closer than the original object, as in Fig. 3.
The location of the virtual image behind the gravitational lens (when the quasar is on the optic axis) follows from the ray geometry of Fig. 3.
If the quasar is much further from the lens than the Earth, then the image location is zi = -L1/2, or behind the lens by half the distance from the Earth to the lens. This image location fixes the longitudinal magnification.
Note that while the transverse (angular) magnification diverges as the object approaches the optic axis, the longitudinal magnification remains finite but always greater than unity.
The Caustic Curves of Einstein Rings
Because gravitational lenses have such severe aberration relative to an ideal lens, and because the angles are so small, an alternate approach to understanding the optics of gravity is through the theory of light caustics. In a previous blog on the optics of caustics I described how sets of deflected rays of light become enclosed in envelopes that produce regions of high and low intensity. These envelopes are called caustics. Gravitational light deflection also causes caustics.
In addition to envelopes, it is also possible to trace the time delays caused by gravity on wavefronts. In the regions of the caustic envelopes, these wavefronts can fold back onto themselves so that different parts of the image arrive at different times coming from different directions.
An example of gravitational caustics is shown in Fig. 4. Rays are incident vertically on a gravitational thin lens which deflects the rays so that they overlap in the region below the lens. The red curves are selected wavefronts at three successively later times. The gravitational potential causes a time delay on the propagating front, with greater delays in regions of stronger gravitational potential. The envelope function that is tangent to the rays is called the caustic, here shown as the dense blue mesh. In this case there is a cusp in the caustic near z = -1 below the lens. The wavefronts become multiple-valued past the cusp.
The intensity of the distant object past the lens is concentrated near the caustic envelope. The intensity of the caustic at z = -6 is shown in Fig. 5. The ring structure is the cross-sectional spatial intensity at the fixed observation plane, but the transformation to an angular image is one-to-one, so the caustic intensity distribution is also similar to the view of the Einstein ring from a position at z = -6 on the optic axis.
The gravitational potential is a function of the mass distribution in the gravitational lens. A different case, with a flatter distribution of mass near the optic axis, is shown in Fig. 6. There are multiple caustics in this case with multi-valued wavefronts. Because caustics are sensitive to the mass distribution in the gravitational lens, astronomical observations of gravitational caustics can be used to back out the mass distribution, including dark matter or even distant exoplanets.
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 30 19:47:31 2021
@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)

Gravitational caustics: rays and wavefronts deflected by a thin gravitational lens
with an effective index profile n(x)
"""
import numpy as np
from matplotlib import pyplot as plt

delt = 0.001      # step for the numerical derivative of the index profile
Ly = 10           # propagation distance below the lens plane
Lx = 5            # half-width of the lens plane
n0 = 1            # peak effective index
expon = 2         # adjust this from 1 to 10 to change the lens profile
delx = 0.01

def refindex(x):
    # effective refractive-index profile of the gravitational lens
    n = n0/(1 + abs(x)**expon)**(1/expon)
    return n

rng = int(Lx/delx)
x = delx*np.linspace(-rng, rng, 2*rng+1)
n = refindex(x)
dndx = np.diff(n)/np.diff(x)

plt.figure(1)
lines = plt.plot(x, n)
lines2 = plt.plot(x[:-1], dndx)

Nloop = 160
xd = np.zeros((Nloop, 3))
yd = np.zeros((Nloop, 3))

plt.figure(2)
for loop in range(0, Nloop):
    xp = -Lx + 2*Lx*(loop/Nloop)

    # incident vertical ray above the lens plane
    plt.plot([xp, xp], [2, 0], 'b', linewidth=0.25)

    # deflection angle from the local gradient of the index profile
    thet = (refindex(xp + delt) - refindex(xp - delt))/(2*delt)

    # deflected ray below the lens plane
    xb = xp + np.tan(thet)*Ly
    plt.plot([xp, xb], [0, -Ly], 'b', linewidth=0.25)

    # three successive wavefronts, delayed by the local effective index
    for sloop in range(0, 3):
        delay = n0/(1 + abs(xp)**expon)**(1/expon) - n0
        dis = 0.75*(sloop + 1)**2 - delay
        xfront = xp + np.sin(thet)*dis
        yfront = -dis*np.cos(thet)
        xd[loop, sloop] = xfront
        yd[loop, sloop] = yfront

for sloop in range(0, 3):
    plt.plot(xd[:, sloop], yd[:, sloop], 'r', linewidth=0.5)

plt.show()
J. Renn, T. Sauer and J. Stachel, “The Origin of Gravitational Lensing: A Postscript to Einstein’s 1936 Science Paper,” Science 275, 184 (1997)
A. Einstein, “Lens-Like Action of a Star by the Deviation of Light in the Gravitational Field,” Science 84, 506 (1936)
(Here is an excellent review article on the topic.) J. Wambsganss, “Gravitational lensing as a powerful astrophysical tool: Multiple quasars, giant arcs and extrasolar planets,” Annalen der Physik, vol. 15, no. 1-2, pp. 43-59, Jan-Feb (2006)
Imagine if you just discovered how to text through time, i.e. time-texting, when a close friend meets a shocking death. Wouldn’t you text yourself in the past to try to prevent it? But what if, every time you change the time-line and alter the future in untold ways, the friend continues to die, and you seemingly can never stop it? This is the premise of Stein’s Gate, a Japanese sci-fi anime that brings in the paradoxes of time travel, casts CERN as an evil clandestine spy agency, and introduces do-it-yourself inventors, hackers, and wacky characters, while it centers on the terrible death of a lovable character that can never be avoided.
It is also a good computational physics project that explores the dynamics of bifurcations, bistability and chaos. I teach a course in modern dynamics in the Physics Department at Purdue University. The topics of the course range broadly from classical mechanics to chaos theory, social networks, synchronization, nonlinear dynamics, economic dynamics, population dynamics, evolutionary dynamics, neural networks, and special and general relativity, among others, using a textbook that takes a modern view of dynamics.
For the final project of the second semester the students (Junior physics majors) are asked to combine two or three of the topics into a single project. Students have come up with a lot of creative combinations: population dynamics of zombies, nonlinear dynamics of negative gravitational mass, percolation of misinformation in presidential elections, evolutionary dynamics of neural architecture, and many more. In that spirit, and for a little fun, in this blog I explore the so-called physics of Stein’s Gate.
Stein’s Gate and the Divergence Meter
Stein’s Gate is a Japanese TV anime series that had a world-wide distribution in 2011. The central premise of the plot is that certain events always occur even if you are on different timelines—like trying to avoid someone’s death in an accident.
This is the problem confronting Rintaro Okabe who tries to stop an accident that kills his friend Mayuri Shiina. But every time he tries to change time, she dies in some other way. It turns out that all the nearby timelines involve her death. According to a device known as The Divergence Meter, Rintaro must get farther than 4% away from the original timeline to have a chance to avoid the otherwise unavoidable event.
This is new. Usually, time-travel sci-fi is based on the Butterfly Effect. Chaos theory is characterized by something called sensitivity to initial conditions (SIC), meaning that slightly different starting points produce trajectories that diverge exponentially from nearby trajectories. It is called the Butterfly Effect because of the whimsical notion that a butterfly flapping its wings in China can cause a hurricane in Florida. In the context of the butterfly effect, if you go back in time and change anything at all, the effect cascades through time until the present is unrecognizable. As an example, in one episode of the TV cartoon The Simpsons, Homer goes back in time to the age of the dinosaurs and kills a single mosquito. When he gets back to our time, everything has changed in bizarre and funny ways.
Stein’s Gate introduces a creative counterexample to the Butterfly Effect. Instead of scrambling the future when you fiddle with the past, you find that you always get the same event, even when you change a lot of the conditions—Mayuri still dies. This sounds eerily familiar to a physicist who knows something about chaos theory. It means that the unavoidable event is acting like a stable fixed point in the time dynamics—an attractor! Even if you change the initial conditions, the dynamics draw you back to the fixed point—in this case Mayuri’s accident. What would this look like in a dynamical system?
The Local Basin of Attraction
Dynamical systems can be described as trajectories in a high-dimensional state space. Within state space there are special points where the dynamics are static—known as fixed points. For a stable fixed point, a slight perturbation away from it relaxes back to the fixed point. For an unstable fixed point, on the other hand, a slight perturbation grows and the system dynamics evolve away. However, there can be regions in state space where every initial condition leads to trajectories that stay within that region. This is known as a basin of attraction, and the boundaries of these basins are called separatrices.
A high-dimensional state space can have many basins of attraction. All the physics that starts within a basin stays within that basin—almost like its own self-consistent universe, bordered by countless other universes. There are well-known physical systems that have many basins of attraction. String theory is suspected to generate many adjacent universes where the physical laws are a little different in each basin of attraction. Spin glasses, which are amorphous solid-state magnets, have this property, as do recurrent neural networks like the Hopfield network. Basins of attraction occur naturally within the physics of these systems.
It is possible to embed basins of attraction within an existing dynamical system. As an example, let’s start with one of the simplest types of dynamics, the hyperbolic flow dx/dt = −x, dy/dt = +y, which has a single saddle fixed point at the origin. We want to add a basin of attraction at the origin with a domain range given by a radius r0. At the same time, we want to create a separatrix that keeps the outer hyperbolic dynamics separate from the internal basin dynamics. To keep all outer trajectories in the outer domain, we can build a dynamical barrier to prevent the trajectories from crossing the separatrix. This can be accomplished by adding a radial repulsive term that diverges just inside r0.
In x-y coordinates the repulsive term multiplies each coordinate by the same radial factor, so the barrier pushes trajectories outward along the radius.
We also want to keep the internal dynamics of our basin separate from the external dynamics. To do this, we can multiply by a sigmoid function, like a Heaviside function H(r-r0), to zero out the external dynamics inside our basin. The final external dynamics is then the hyperbolic flow plus the barrier, multiplied by this sigmoid.
Now we have to add the internal dynamics for the basin of attraction. To make it a little more interesting, let’s make the internal dynamics an autonomous oscillator that spirals out to a limit cycle.
Putting this all together gives the combined flow.
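One form consistent with the flow_deriv function in the program below (writing σ(r) for the sigmoid that is one inside the basin and zero outside, ω for the internal oscillator frequency, and dropping the small regularizing constant that appears in the code) is
$$\dot{x} = \big(1-\sigma(r)\big)\left[\frac{A\,x}{(r-r_0)^2} - x\right] + \sigma(r)\,\big[\omega y + \omega x\,(1-r)\big]$$
$$\dot{y} = \big(1-\sigma(r)\big)\left[\frac{A\,y}{(r-r_0)^2} + y\right] + \sigma(r)\,\big[-\omega x + \omega y\,(1-r)\big]$$
with r = √(x² + y²), barrier amplitude A, and the basin boundary at r0 = 2.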
This looks a little complex for such a simple model, but it illustrates the principle. The sigmoid is best if it is differentiable, so instead of a Heaviside function it can be a Fermi function that switches smoothly across the basin boundary (the code below uses a switching width of 0.1).
The resulting phase-space portrait of the final dynamics shows the following features.
Adding the internal dynamics does not change the far-field external dynamics, which are still hyperbolic. The repulsive term does split the central saddle point into two saddle points, one on the left and one on the right of the basin. But the internal dynamics are self-contained and separate from the external dynamics. The origin is an unstable spiral that evolves to a limit cycle. The basin boundary has marginal stability and is known as a “wall”.
To verify the stability of the external fixed points, find the fixed-point coordinates of the external flow (for A = 1 and x0 = 2 they lie on the x-axis) and evaluate the Jacobian matrix there. The determinant of the Jacobian is negative, so these are clearly saddle points.
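As a quick check (my own arithmetic, using the external flow as written in the code and dropping its small regularizing constant): on the x-axis the external flow is ẋ = x[A/(x − x0)² − 1], which vanishes at x = x0 ± √A, i.e. at x = 3 and x = 1 for A = 1 and x0 = 2; only x = 3 lies outside the basin. Evaluating the Jacobian of the external flow at (3, 0) gives
$$J = \begin{pmatrix} -6 & 0 \\ 0 & 2 \end{pmatrix}, \qquad \det J = -12 < 0,$$
confirming the saddle.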
In the context of Stein’s Gate, the basin boundary is equivalent to the 4% divergence which is necessary to escape the internal basin of attraction where Mayuri meets her fate.
Python Program: SteinsGate2D.py
# -*- coding: utf-8 -*-
"""
Created on Sat March 6, 2021
@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)

2D simulation of Stein's Gate Divergence Meter
"""
import numpy as np
from scipy import integrate
from matplotlib import pyplot as plt

def solve_flow(param, lim=(-6, 6, -6, 6), max_time=20.0):

    def flow_deriv(x_y, t0, alpha, beta, gamma):
        """Compute the time-derivative of the flow."""
        x, y = x_y
        w = 1
        R2 = x**2 + y**2
        R = np.sqrt(R2)
        arg = (R - 2)/0.1
        env1 = 1/(1 + np.exp(arg))   # sigmoid: ~1 inside the basin, ~0 outside
        env2 = 1 - env1              # complement: selects the external dynamics
        # external hyperbolic flow plus repulsive barrier, and internal limit-cycle oscillator
        f = env2*(x*(1/(R - 1.99)**2 + 1e-2) - x) + env1*(w*y + w*x*(1 - R))
        g = env2*(y*(1/(R - 1.99)**2 + 1e-2) + y) + env1*(-w*x + w*y*(1 - R))
        return [f, g]

    model_title = 'Steins Gate'

    plt.figure()
    plt.title(model_title)
    xmin = lim[0]
    xmax = lim[1]
    ymin = lim[2]
    ymax = lim[3]
    plt.axis([xmin, xmax, ymin, ymax])

    # initial conditions: 96 points on the frame plus 47 interior points
    N = 24*4 + 47
    x0 = np.zeros(shape=(N, 2))
    ind = -1
    for i in range(0, 24):
        ind = ind + 1
        x0[ind, 0] = xmin + (xmax - xmin)*i/23
        x0[ind, 1] = ymin
        ind = ind + 1
        x0[ind, 0] = xmin + (xmax - xmin)*i/23
        x0[ind, 1] = ymax
        ind = ind + 1
        x0[ind, 0] = xmin
        x0[ind, 1] = ymin + (ymax - ymin)*i/23
        ind = ind + 1
        x0[ind, 0] = xmax
        x0[ind, 1] = ymin + (ymax - ymin)*i/23

    ind = ind + 1
    x0[ind, 0] = 0.05
    x0[ind, 1] = 0.05

    for thetloop in range(0, 10):
        ind = ind + 1
        theta = 2*np.pi*(thetloop)/10
        ys = 0.125*np.sin(theta)
        xs = 0.125*np.cos(theta)
        x0[ind, 0] = xs
        x0[ind, 1] = ys

    for thetloop in range(0, 10):
        ind = ind + 1
        theta = 2*np.pi*(thetloop)/10
        ys = 1.7*np.sin(theta)
        xs = 1.7*np.cos(theta)
        x0[ind, 0] = xs
        x0[ind, 1] = ys

    for thetloop in range(0, 20):
        ind = ind + 1
        theta = 2*np.pi*(thetloop)/20
        ys = 2*np.sin(theta)
        xs = 2*np.cos(theta)
        x0[ind, 0] = xs
        x0[ind, 1] = ys

    ind = ind + 1
    x0[ind, 0] = -3
    x0[ind, 1] = 0.05
    ind = ind + 1
    x0[ind, 0] = -3
    x0[ind, 1] = -0.05
    ind = ind + 1
    x0[ind, 0] = 3
    x0[ind, 1] = 0.05
    ind = ind + 1
    x0[ind, 0] = 3
    x0[ind, 1] = -0.05
    ind = ind + 1
    x0[ind, 0] = -6
    x0[ind, 1] = 0.00
    ind = ind + 1
    x0[ind, 0] = 6
    x0[ind, 1] = 0.00

    colors = plt.cm.prism(np.linspace(0, 1, N))

    # Solve for the trajectories
    t = np.linspace(0, max_time, int(250*max_time))
    x_t = np.asarray([integrate.odeint(flow_deriv, x0i, t, param)
                      for x0i in x0])

    for i in range(N):
        x, y = x_t[i, :, :].T
        lines = plt.plot(x, y, '-', c=colors[i])

    return t, x_t

param = (0.02, 0.5, 0.2)  # Steins Gate
lim = (-6, 6, -6, 6)
t, x_t = solve_flow(param, lim)
plt.show()
The Lorenz Butterfly
Two-dimensional phase space cannot support chaos, and we would like to reconnect the central theme of Stein’s Gate, the Divergence Meter, with the Butterfly Effect. Therefore, let’s actually incorporate our basin of attraction inside the classic Lorenz Butterfly. The goal is to put an attracting domain into the midst of the three-dimensional state space of the Lorenz butterfly in a way that repels the butterfly, without destroying it, but attracts local trajectories. The question is whether the butterfly can survive if part of its state space is made unavailable to it.
The classic Lorenz dynamical system, in the notation of the standard Butterfly parameters p, r and b, is
$$\dot{x} = p(y - x), \qquad \dot{y} = rx - y - xz, \qquad \dot{z} = xy - bz$$
As in the 2D case, we will put in a repelling barrier that prevents external trajectories from moving into the local basin, and we will isolate the external dynamics by using the sigmoid function. The final flow equations have the same structure as before: the Lorenz flow plus the barrier, multiplied by the external sigmoid, together with an internal attracting flow multiplied by the internal sigmoid,
where the radius r = √((x − x0)² + (y − y0)² + (z − z0)²) is measured relative to the center of the attracting basin
and r0 is the radius of the basin. The center of the basin is at [x0, y0, z0] and we are assuming that x0 = 0 and y0 = 0 and z0 = 25 for the standard Butterfly parameters p = 10, r = 25 and b = 8/3. This puts our basin of attraction a little on the high side of the center of the Butterfly. If we embed it too far inside the Butterfly it does actually destroy the Butterfly dynamics.
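As a sketch of how this can be coded (patterned after the 2D flow_deriv above; the function name lorenz_basin_deriv, the internal frequency w, and the regularized barrier term are my own choices, not the program used to generate the figures):

import numpy as np
from scipy import integrate
from matplotlib import pyplot as plt

def lorenz_basin_deriv(xyz, t0, p=10, r=25, b=8/3, r0=1.5, x0=0, y0=0, z0=25, w=1):
    # Lorenz flow with an embedded basin of attraction centered at (x0, y0, z0)
    x, y, z = xyz
    R = np.sqrt((x - x0)**2 + (y - y0)**2 + (z - z0)**2)
    env1 = 1/(1 + np.exp((R - r0)/0.1))      # ~1 inside the basin, ~0 outside
    env2 = 1 - env1                          # selects the external (Lorenz) dynamics
    barrier = 1/((R - 0.99*r0)**2 + 1e-2)    # repulsive wall, regularized to avoid a singularity
    fx = env2*(p*(y - x) + (x - x0)*barrier) + env1*( w*(y - y0) + w*(x - x0)*(1 - R))
    fy = env2*(r*x - y - x*z + (y - y0)*barrier) + env1*(-w*(x - x0) + w*(y - y0)*(1 - R))
    fz = env2*(x*y - b*z + (z - z0)*barrier) + env1*( w*(z - z0)*(1 - R))
    return [fx, fy, fz]

t = np.linspace(0, 50, 12500)
traj = integrate.odeint(lorenz_basin_deriv, [1.0, 1.0, 1.0], t)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot(traj[:, 0], traj[:, 1], traj[:, 2], linewidth=0.5)
plt.show()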
When r0 = 0, the dynamics of the Lorenz Butterfly are essentially unchanged. However, when r0 = 1.5, there is a repulsive effect on trajectories that pass close to the basin. This can be seen in Figure 2, where part of the trajectory skips around the outside of the basin.
Trajectories can begin very close to the basin, but still on the outside of the separatrix, as in the top row of Figure 3, where the basin of attraction with r0 = 1.5 lies a bit above the center of the Butterfly. The Butterfly still exists for the external dynamics. However, any trajectory that starts within the basin of attraction remains there and executes a stable limit cycle. This is the world where Mayuri dies inside the 4% divergence. But if the initial condition can exceed 4%, then the Butterfly Effect takes over. The bottom row of Figure 3 shows that the Butterfly itself is fragile. When the external dynamics are perturbed more strongly by centering the local basin more closely on the attractor, the hyperbolic dynamics of the Butterfly are impeded and the external dynamics are converted to a stable limit cycle. It is interesting that the Butterfly, so often used as an illustration of sensitivity to initial conditions (SIC), is itself sensitive to perturbations that can convert it away from chaos and back to regular motion.
Discussion and Extensions
In the examples shown here, the local basin of attraction was put in “by hand” as an isolated region inside the dynamics. It would be interesting to consider more natural systems, like a spin glass or a Hopfield network, where the basins of attraction occur naturally from the physical principles of the system. Then we could use the “Divergence Meter” to explore these physical systems to see how far the dynamics can diverge before crossing a separatrix. These systems are impossible to visualize because they are intrinsically very high-dimensional, but Monte Carlo approaches could be used to probe the “sizes” of the basins.
Another interesting extension would be to embed these complex dynamics into spacetime. Since this all started with the idea of texting through time, it would be interesting (and challenging) to see how we could describe this process in a high-dimensional Minkowski space that had many space dimensions (but still only one time dimension). Certainly it would violate the speed-of-light criterion, but we could then take the approach of David Deutsch and view the time axis as if it had multiple branches, like the branches of the arctangent function, creating time-consistent sheets within a sheaf of flat Minkowski spaces.
Snorkeling above a shallow reef on a clear sunny day transports you to an otherworldly galaxy of spectacular deep colors and light reverberating off of the rippled surface. Playing across the underwater floor of the reef is a fabulous light show of bright filaments entwining and fluttering, creating random mesh networks of light and dark. These same patterns appear on the bottom of swimming pools in summer and in deep fountains in parks.
Something similar happens when a bare overhead light reflects from the sides of a circular glass of water. The pattern no longer moves, but a dazzling filament splays across the bottom of the glass with a sharp bright cusp at the center. These bright filaments of light have an age-old name — Caustics — meaning burning, as in burning with light. The study of caustics goes back to Archimedes of Syracuse and his apocryphal burning mirrors that are supposed to have torched the invading triremes of the Roman navy in 212 BC.
Caustics in optics are concentrations of light rays that form bright filaments, often with cusp singularities. Mathematically, they are envelope curves that are tangent to a set of lines. Cata-caustics are caustics caused by light reflecting from curved surfaces. Dia-caustics are caustics caused by light refracting from transparent curved materials.
From Leonardo to Huygens
Even after Archimedes, burning mirrors remained an interest for a broad range of scientists, artists and engineers. Leonardo Da Vinci took an interest around 1503 – 1506 when he drew reflected caustics from a circular mirror in his many notebooks.
In the decades after Newton and Leibniz invented the calculus, a small cadre of mathematicians strove to apply the new method to understand aspects of the physical world. At a time when Newton had left the calculus behind to follow more arcane pursuits, Leibniz, Jakob and Johann Bernoulli, Guillaume de l’Hôpital, Émilie du Châtelet and Walter von Tschirnhaus were pushing notation reform (mainly following Leibniz) to make the calculus easier to learn and use, as well as finding new applications, of which there were many.
Ehrenfried Walter von Tschirnhaus (1651 – 1708) was a German mathematician and physician and a lifelong friend of Leibniz, whom he met in Paris in 1675. He was one of only five mathematicians to provide a solution to Johann Bernoulli’s brachistochrone problem. One of the recurring interests of von Tschirnhaus, which he revisited throughout his career, was burning glasses and mirrors. A burning glass is a high-quality magnifying lens that brings the focus of the sun to a fine point to burn or anneal various items. Burning glasses were used to heat small items for manufacture or for experimentation. For instance, Priestley and Lavoisier routinely used burning glasses in their chemistry experiments. Low optical aberrations were required for the lenses to bring the light to the finest possible focus, so the study of optical focusing was an important topic both academically and practically. Tschirnhaus had his own laboratory to build and test burning mirrors, and he became aware of the cata-caustic patterns of light reflected from a circular mirror or glass surface. Given his parallel interest in the developing calculus methods, he published a paper in Acta Eruditorum in 1682 that constructed the envelope function created by the cata-caustics of a circle. However, Tschirnhaus did not produce the analytic function–that was provided by Johann Bernoulli ten years later in 1692.
Johann Bernoulli had a stormy career and a problematic personality–but he was brilliant even among the bountiful Bernoulli clan. Using methods of tangents, he found the analytic solution of the caustic of the circle. He did this by stating the general equation for all reflected rays and then finding when their y values are independent of changing angle … in other words, using the principle of stationarity, which would later become a potent tool in the hands of Lagrange as he developed Lagrangian physics.
The equation for the reflected ray, expressing y as a function of x for a given angle α in Fig. 5, is
The condition of the caustic envelope requires the change in y with respect to the angle α to vanish while treating x as a constant. This is a partial derivative, and Johann Bernoulli is giving an early use of this method in 1692 to ensure the stationarity of y with respect to the changing angle. The partial derivative is
This is solved to give
Plugging this back into the first equation above yields
These last two expressions for x and y in terms of the angle α are a parametric representation of the caustic. Combining them gives the solution to the caustic of the circle
The square root provides the characteristic cusp at the center of the caustic.
Python Code: raycaustic.py
There are lots of options here. Try them all … then add your own!
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 16 16:44:42 2021
@author: David Nolte
D. D. Nolte, Optical Interferometry for Biology and Medicine (Springer, 2011)

raycaustic.py: cata-caustics of vertical rays reflected from a family of curved mirrors
"""
import numpy as np
from matplotlib import pyplot as plt

# model_case 1 = cosine
# model_case 2 = circle
# model_case 3 = square root
# model_case 4 = inverse power law
# model_case 5 = ellipse
# model_case 6 = secant
# model_case 7 = parabola
# model_case 8 = Cauchy
model_case = int(input('Input Model Case (1-8)'))

if model_case == 1:
    model_title = 'cosine'
    xleft = -np.pi
    xright = np.pi
    ybottom = -1
    ytop = 1.2
elif model_case == 2:
    model_title = 'circle'
    xleft = -1
    xright = 1
    ybottom = -1
    ytop = .2
elif model_case == 3:
    model_title = 'square-root'
    xleft = 0
    xright = 4
    ybottom = -2
    ytop = 2
elif model_case == 4:
    model_title = 'Inverse Power Law'
    xleft = 1e-6
    xright = 4
    ybottom = 0
    ytop = 4
elif model_case == 5:
    model_title = 'ellipse'
    a = 0.5
    b = 2
    xleft = -b
    xright = b
    ybottom = -a
    ytop = 0.5*b**2/a
elif model_case == 6:
    model_title = 'secant'
    xleft = -np.pi/2
    xright = np.pi/2
    ybottom = 0.5
    ytop = 4
elif model_case == 7:
    model_title = 'Parabola'
    xleft = -2
    xright = 2
    ybottom = 0
    ytop = 4
elif model_case == 8:
    model_title = 'Cauchy'
    xleft = 0
    xright = 4
    ybottom = 0
    ytop = 4

def feval(x):
    # mirror profile y = f(x) for the selected model case
    if model_case == 1:
        y = -np.cos(x)
    elif model_case == 2:
        y = -np.sqrt(1 - x**2)
    elif model_case == 3:
        y = -np.sqrt(x)
    elif model_case == 4:
        y = x**(-0.75)
    elif model_case == 5:
        y = -a*np.sqrt(1 - x**2/b**2)
    elif model_case == 6:
        y = 1.0/np.cos(x)
    elif model_case == 7:
        y = 0.5*x**2
    elif model_case == 8:
        y = 1/(1 + x**2)
    return y

plt.figure()
plt.title(model_title)

# plot the mirror profile
xx = np.arange(xleft, xright, 0.01)
yy = feval(xx)
lines = plt.plot(xx, yy)

delx = 0.001
N = 75
for i in range(N+1):
    x = xleft + (xright - xleft)*i/N
    val = feval(x)

    # numerical derivative of the profile sets the local surface angle
    valp = feval(x + delx/2)
    valm = feval(x - delx/2)
    deriv = (valp - valm)/delx
    phi = np.arctan(deriv)
    slope = np.tan(np.pi/2 + 2*phi)

    # intersect the reflected ray with the top or bottom of the plot window
    if np.abs(deriv) < 1:
        xf = (ytop - val + slope*x)/slope
        yf = ytop
    else:
        xf = (ybottom - val + slope*x)/slope
        yf = ybottom

    # incident vertical ray and reflected ray
    plt.plot([x, x], [ytop, val], linewidth=0.5)
    plt.plot([x, xf], [val, yf], linewidth=0.5)

plt.show()
The Dia-caustics of Swimming Pools
A caustic is understood mathematically as the envelope function of multiple rays that converge in the Fourier domain (angular deflection measured at far distances). These are points of mathematical stationarity, in which the ray density is invariant to first order in deviations in the refracting surface. The rays themselves are the trajectories of the Eikonal Equation as rays of light thread their way through complicated optical systems.
The basic geometry is shown in Fig. 7 for a ray incident on a nonplanar surface between two media of different refractive index. From Snell’s law, in the small-angle approximation, the angles on the two sides of the surface are related by θ1 ≈ n θ2, where n is the relative index (the ratio of the two indices). The incident angle θ1 is simply related to the slope of the interface, θ1 ≈ dh/dx, where the small-angle approximation is used again. The angular deflection relative to the optic axis is then Δθ = θ1 − θ2 ≈ (1 − 1/n) dh/dx, which is proportional to the transverse gradient of the optical path difference through the sample.
In two dimensions, the optical path difference can be replaced with a general potential φ(x, y), and the two orthogonal angular deflections (measured in the far field on a Fourier plane) are the gradients θx = ∂φ/∂x and θy = ∂φ/∂y.
These angles describe the deflection of the rays across the sample surface. They are also the right-hand side of the Eikonal Equation, the equation governing ray trajectories through optical systems.
Caustics are lines of stationarity, meaning that the density of rays is independent of first-order changes in the refracting sample. The condition of stationarity is defined by the Jacobian of the transformation from (x, y) to (θx, θy),
$$J = \det\frac{\partial(\theta_x,\theta_y)}{\partial(x,y)} = \varphi_{xx}\varphi_{yy} - \varphi_{xy}^2,$$
where the second expression is the Hessian determinant of the refractive power of the uneven surface; the caustics lie where this determinant vanishes. When this condition is satisfied, the envelope function bounding groups of collected rays is stationary to perturbations in the inhomogeneous sample.
An example of dia-caustic formation from a random surface is shown in Fig. 8, generated by the Python program caustic.py. The Jacobian density (center) outlines regions in which the ray density is independent of small changes in the surface. They are the positions of the zeros of the Hessian determinant, the regions of zero curvature of the surface or potential function. These high-intensity regions spread out and are intercepted at some distance by a surface, like the bottom of a swimming pool, where the concentrated rays create bright filaments. As the wavelets on the surface of the swimming pool move, the caustic filaments on the bottom of the pool dance about.
Optical caustics also occur in the gravitational lensing of distant quasars by galaxy clusters in the formation of Einstein rings and arcs seen by deep field telescopes, as described in my following blog post.
Python Code: caustic.py
This Python code was used to generate the caustic patterns in Fig. 8. You can change the surface roughness by changing the divisors on the last two arguments in the call to speckle2. The distance to the bottom of the swimming pool can be changed through the parameter d inside the ray-accumulation loop.
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 16 19:50:54 2021
@author: David Nolte
D. D. Nolte, Optical Interferometry for Biology and Medicine (Springer, 2011)

caustic.py: dia-caustics cast by a randomly rippled surface (a "swimming pool")
"""
import numpy as np
from matplotlib import pyplot as plt
from numpy import random as rnd
from scipy import signal as signal

def gauss2(sy, sx, wy, wx):
    # 2D Gaussian kernel used to smooth the random phase screen
    x = np.arange(-sx/2, sx/2, 1)
    y = np.arange(-sy/2, sy/2, 1)
    y = y[..., None]
    ex = np.ones(shape=(sy, 1))
    x2 = np.kron(ex, x**2/(2*wx**2))
    ey = np.ones(shape=(1, sx))
    y2 = np.kron(y**2/(2*wy**2), ey)
    rad2 = (x2 + y2)
    A = np.exp(-rad2)
    return A

def speckle2(sy, sx, wy, wx):
    # smoothed random phase screen: a model of the rippled water surface
    Btemp = 2*np.pi*rnd.rand(sy, sx)
    B = np.exp(complex(0, 1)*Btemp)
    C = gauss2(sy, sx, wy, wx)
    Atemp = signal.convolve2d(B, C, 'same')
    Intens = np.mean(np.mean(np.abs(Atemp)**2))
    D = np.real(Atemp/np.sqrt(Intens))
    Dphs = np.arctan2(np.imag(D), np.real(D))
    return D, Dphs

N = 256

# random surface height: change the divisors on the last two arguments to change the roughness
Sp, Sphs = speckle2(N, N, N/16, N/16)
plt.matshow(Sp, 2, cmap=plt.cm.get_cmap('seismic'))   # hsv, seismic, bwr

# gradients and Hessian determinant of the surface
fx, fy = np.gradient(Sp)
fxx, fxy = np.gradient(fx)
fyx, fyy = np.gradient(fy)
eps = 1e-7
J = fxx*fyy - fxy*fyx
D = np.abs(1/(J + eps))      # Jacobian density (eps avoids division by zero)
plt.matshow(D, 3, cmap=plt.cm.get_cmap('gray'))

# accumulate deflected rays on the bottom of the pool
E = np.zeros(shape=(N, N))
for yloop in range(0, N-1):
    for xloop in range(0, N-1):
        d = N/2    # distance to the bottom of the swimming pool
        indx = int(N/2 + (d*(fx[yloop, xloop]) + (xloop - N/2)/2))
        indy = int(N/2 + (d*(fy[yloop, xloop]) + (yloop - N/2)/2))
        if ((indx > 0) and (indx < N)) and ((indy > 0) and (indy < N)):
            E[indy, indx] = E[indy, indx] + 1

plt.matshow(E, 4, cmap=plt.cm.get_cmap('gray'))       # caustic pattern on the pool floor
plt.show()
The idea of parallel dimensions in physics has a long history dating back to Bernhard Riemann’s famous 1854 lecture on the foundations of geometry that he gave as a requirement to attain a teaching position at the University of Göttingen. Riemann laid out a program of study that included physics problems solved in multiple dimensions, but it was Rudolf Lipschitz twenty years later who first composed a rigorous view of physics as trajectories in many dimensions. Nonetheless, the three spatial dimensions we enjoy in our daily lives remained the only true physical space until Hermann Minkowski re-expressed Einstein’s theory of relativity in 4-dimensional spacetime. Even so, Minkowski’s time dimension was not on an equal footing with the three spatial dimensions—the four dimensions were entwined, but time had a different character, described by what is known as a pseudo-Riemannian metric. It is this pseudo-metric that allows space-time distances to be negative as easily as positive.
In 1919 Theodor Kaluza of the University of Königsberg in Prussia extended Einstein’s theory of gravitation to a fifth spatial dimension, and physics had its first true parallel dimension. It was more than just an exercise in mathematics—adding a fifth dimension to relativistic dynamics adds new degrees of freedom that allow the dynamical 5-dimensional theory to include more than merely relativistic massive particles and the electromagnetic field they generate. In addition to electromagnetism, something akin to Einstein’s field equation of gravitation emerges. Here was a five-dimensional theory that seemed to unify E&M with gravity—a first unified theory of physics. Einstein, to whom Kaluza communicated his theory, was intrigued but hesitant to forward Kaluza’s paper for publication. It seemed too good to be true. But Einstein finally sent it to be published in the proceedings of the Prussian Academy of Sciences [Kaluza, 1921]. He later launched his own effort to explore such unified field theories more deeply.
Yet Kaluza’s theory was fully classical—if a fifth dimension can be called that—because it made no connection to the rapidly developing field of quantum mechanics. The person who took the step to make five-dimensional space-time into a quantum field theory was Oskar Klein.
Oskar Klein (1894 – 1977)
Oskar Klein was a Swedish physicist who was in the “second wave” of quantum physicists, just a few years behind the titans Heisenberg and Schrödinger and Pauli. He began as a student in physical chemistry working in Stockholm under the famous Arrhenius. It was arranged for him to work in France and Germany in 1914, but he was caught in Paris at the onset of World War I. Returning to Sweden, he enlisted in military service from 1915 to 1916 and then joined Arrhenius’ group at the Nobel Institute, where he met Hendrik Kramers—Bohr’s direct assistant at Copenhagen at that time. At Kramers’ invitation, Klein traveled to Copenhagen and worked for a year with Kramers and Bohr before returning to defend his doctoral thesis in 1921 in the field of physical chemistry. Klein’s work with Bohr had opened his eyes to the possibilities of quantum theory, and he shifted his research interest away from physical chemistry. Unfortunately, there were no positions at that time in such a new field, so Klein accepted a position as assistant professor at the University of Michigan in Ann Arbor, where he stayed from 1923 to 1925.
The Fifth Dimension
In an odd twist of fate, this isolation of Klein from the mainstream quantum theory being pursued in Europe freed him of the bandwagon effect and allowed him to range freely over topics of his own devising, in directions all his own. Unaware of Kaluza’s previous work, Klein expanded Minkowski’s space-time from four to five dimensions, adding a fourth spatial dimension just as Kaluza had done, but now with a quantum interpretation. This was not just an incremental step but had far-ranging consequences in the history of physics.
Klein found a way to keep the fifth dimension Euclidean in its metric properties while rolling it up compactly into a cylinder with a radius of the Planck length—something inconceivably small. This compact fifth dimension made the manifold into something akin to an infinitesimal string. He published a short note in Nature magazine in 1926 on the possibility of identifying the electric charge within the 5-dimensional theory [Klein, 1926a]. He then returned to Sweden to take up a position at the University of Lund. This odd string-like feature of 5-dimensional space-time was picked up by Einstein and others in their search for unified field theories of physics, but the topic soon drifted out of the limelight, where it lay dormant for nearly fifty years until the first forays were made into string theory. String theory resurrected the Kaluza-Klein theory, which has burgeoned into the vast topic of String Theory today, including superstrings that occur in 10+1 dimensions at the frontiers of physics.
Dirac Electrons without the Spin: Klein-Gordon Equation
Once back in Europe, Klein reengaged with the mainstream trends in the rapidly developing quantum theory and in 1926 developed a relativistic quantum theory of the electron [Klein, 1926b]. Around the same time Walter Gordon also proposed this equation, which is now called the “Klein-Gordon Equation”. The equation was a classic wave equation that was second order in both space and time. This was the most natural form for a wave equation for quantum particles and Schrödinger himself had started with this form. But Schrödinger had quickly realized that the second-order time term in the equation did not capture the correct structure of the hydrogen atom, which led him to express the time-dependent term in first order and non-relativistically—which is today’s “Schrödinger Equation”. The problem was in the spin of the electron. The electron is a spin-1/2 particle, a Fermion, which has special transformation properties. It was Dirac a few years later who discovered how to express the relativistic wave equation for the electron—not by promoting the time-dependent term to second order, but by demoting the space-dependent term to first order. The first-order expression for both the space and time derivatives goes hand in hand with the Pauli spin matrices for the electron, and the Dirac Equation is the appropriate relativistically-correct wave equation for the electron.
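In modern notation, the Klein-Gordon equation for a free particle of mass m is the second-order wave equation
$$\frac{1}{c^2}\frac{\partial^2 \psi}{\partial t^2} - \nabla^2 \psi + \left(\frac{mc}{\hbar}\right)^2 \psi = 0,$$
second order in both the space and the time derivatives, as described above.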
Klein’s relativistic quantum wave equation does turn out to be the relevant form for a spin-less particle like the pion, but the pion is a short-lived, strongly interacting composite particle, and the Klein-Gordon equation is not a practical description of it. However, the Higgs boson is also a spin-zero particle, and the Klein-Gordon expression does have relevance for this fundamental particle.
In those early days of the late 1920’s, the nature of the nucleus was still a mystery, especially the problem of nuclear radioactivity where a neutron could convert to a proton with the emission of an electron. Some suggested that the neutron was somehow a proton that had captured an electron in a potential barrier. Klein showed that this was impossible, that the electrons would be highly relativistic—something known as a Dirac electron—and they would tunnel with perfect probability through any potential barrier [Klein, 1929]. Therefore, Klein concluded, no nucleon or nucleus could bind an electron.
This phenomenon of unity transmission through a barrier became known as Klein tunneling. The relativistic electron transmits perfectly through an arbitrary potential barrier—independent of its width or height. This is unlike light that transmits through a dielectric slab in resonances that depend on the thickness of the slab—also known as a Fabry-Perot interferometer. The Dirac electron can have any energy, and the potential barrier can have any width, yet the electron will tunnel with 100% probability. How can this happen?
The answer has to do with the dispersion (velocity versus momentum) of the Dirac electron. As the momentum changes in a potential, the speed of the Dirac electron stays constant. In the potential barrier, the momentum flips sign, but the speed remains unchanged. This is equivalent to the effects of negative refractive index in optics. If a photon travels through a material with negative refractive index, its momentum is flipped, but its speed remains unchanged. From Fermat’s principle, it is speed that determines how a particle like a photon refracts, so if there is no speed change, then there is no reflection.
For the case of Dirac electrons in a potential with field F, speed v and transverse momentum py, the transmission coefficient is given approximately by T ≃ exp(−π v py²/ħF).
If the transverse momentum is zero, then the transmission is perfect. A visual schematic of the role of dispersion and potentials for Dirac electrons undergoing Klein tunneling is shown in the next figure.
In this case, even if the transverse momentum is not strictly zero, there can still be perfect transmission. It is simply a matter of matching speeds.
Graphene became famous over the past decade because its electron dispersion relation is just like a relativistic Dirac electron with a Dirac point between conduction and valence bands. Evidence for Klein tunneling in graphene systems has been growing, but clean demonstrations have remained difficult to observe.
Now, published in the Dec. 2020 issue of Science magazine—almost a century after Klein first proposed it—an experimental group at the University of California at Berkeley reports a beautiful experimental demonstration of Klein tunneling—not from a nucleus, but in an acoustic honeycomb sounding board the size of a small table—making an experimental analogy between acoustics and Dirac electrons that bears out Klein’s theory.
In this special sounding board, it is not electrons but phonons—acoustic vibrations—that have a Dirac point. Furthermore, by changing the honeycomb pattern, the bands can be shifted, just like in a p-n-p junction, to produce a potential barrier. The Berkeley group, led by Xiang Zhang (now president of the University of Hong Kong), fabricated a sounding board about half a meter in length and demonstrated dramatic Klein tunneling.
It is amazing how long it can take between the time a theory is first proposed and the time a clean experimental demonstration is first performed. Nearly 90 years has elapsed since Klein first derived the phenomenon. Performing the experiment with actual relativistic electrons was prohibitive, but bringing the Dirac electron analog into the solid state has allowed the effect to be demonstrated easily.
[Kaluza, 1921] Kaluza, Theodor (1921). “Zum Unitätsproblem in der Physik”. Sitzungsber. Preuss. Akad. Wiss. Berlin. (Math. Phys.): 966–972
[Klein, 1926a] Klein, O. (1926). “The Atomicity of Electricity as a Quantum Theory Law”. Nature 118: 516
[Klein, 1926b] Klein, O. (1926). “Quantentheorie und fünfdimensionale Relativitätstheorie”. Zeitschrift für Physik. 37 (12): 895
[Klein, 1929] Klein, O. (1929). “Die Reflexion von Elektronen an einem Potentialsprung nach der relativistischen Dynamik von Dirac”. Zeitschrift für Physik. 53 (3–4): 157
The quantum of light—the photon—is a little over 100 years old. It was born in 1905 when Einstein merged Planck’s blackbody quantum hypothesis with statistical mechanics and concluded that light itself must be quantized. No one believed him! Fast forward to today, and the photon is a workhorse of modern quantum technology. Quantum encryption and communication are performed almost exclusively with photons, and many prototype quantum computers are optics-based. Quantum optics also underpins atomic and molecular optics (AMO), which is one of the hottest and most rapidly advancing frontiers of physics today.
Only after the availability of “quantum” light sources … could photon numbers be manipulated at will, launching the modern era of quantum optics.
This blog tells the story of the early days of the photon and of quantum optics. It begins with Einstein in 1905 and ends seventy years later with the 1977 demonstration of photon anti-bunching, the first fundamentally quantum optical phenomenon ever observed. Across that stretch of time, the photon went from a nascent idea in Einstein’s fertile brain to the most thoroughly investigated quantum particle in the realm of physics.
The Photon: Albert Einstein (1905)
When Planck presented his quantum hypothesis to the German Physical Society in 1900, his model of blackbody radiation retained all its classical properties but one—the quantized interaction of light with matter. He did not yet think in terms of quanta, only in terms of steps in a continuous interaction.
The quantum break came from Einstein when he published his 1905 paper proposing the existence of the photon—an actual quantum of light that carried with it energy and momentum. His reasoning was iron-clad, resting on Planck’s own blackbody relation, which Einstein combined with simple arguments from statistical mechanics. He was led inexorably to the existence of the photon. Unfortunately, almost no one believed him (see my blog on Einstein and Planck).
This was before wave-particle duality had entered quantum thinking, so the notion that light—so clearly a wave phenomenon—could be a particle was unthinkable. It had taken half of the 19th century to rid physics of Newton’s corpuscles and of emissionist theories of light, so to bring the particle picture back at the beginning of the 20th century seemed like a great blunder. However, Einstein persisted.
In 1909 he published a paper on the fluctuation properties of light in which he proposed that the fluctuations observed in light intensity had two contributions: one from the discreteness of the photons (what we call “shot noise” today) and one from the fluctuations in the wave properties. Einstein was proposing that light exhibits particle-like and wave-like behavior simultaneously, with both contributing to the intensity fluctuations. This was one of the first expressions of wave-particle duality in modern physics.
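In modern notation, Einstein’s 1909 result can be stated compactly. For blackbody radiation with mean energy E in a small frequency interval dν and volume V, the mean-square energy fluctuation splits into two terms:

\[
\langle \Delta E^{2} \rangle \;=\; h\nu\,E \;+\; \frac{c^{3}}{8\pi \nu^{2} V\, d\nu}\,E^{2} .
\]

The first term is what a gas of independent light quanta would give (shot noise); the second is what classical interfering waves would give. Both are needed to reproduce the Planck spectrum.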
In 1916 and 1917 Einstein took another bold step and proposed the existence of stimulated emission. Once again, his arguments were based on simple physics—this time the principle of detailed balance—and he was led to the audacious conclusion that one photon can stimulate the emission of another. This would become the basis of the laser more than forty years later.
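The detailed-balance argument can be compressed into a single line. With N_1 atoms in the lower level and N_2 in the upper level, in equilibrium with radiation of spectral density ρ(ν), absorption must balance spontaneous plus stimulated emission:

\[
N_{1}\,B_{12}\,\rho(\nu) \;=\; N_{2}\left[\,A_{21} + B_{21}\,\rho(\nu)\,\right].
\]

Combined with the Boltzmann ratio N_2/N_1 = exp(−hν/kT), this reproduces Planck’s blackbody spectrum only if the stimulated-emission term B_{21}ρ is included, with B_{12} = B_{21} and A_{21}/B_{21} = 8πhν³/c³.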
While Einstein was confident in the reality of the photon, others sincerely doubted its existence. Robert Millikan (1868 – 1953) decided to put Einstein’s theory of photoelectron emission to the most stringent test ever performed. In 1915 he painstakingly acquired the definitive dataset with the goal of refuting Einstein’s hypothesis, only to confirm it in spectacular fashion. Partly based on Millikan’s confirmation of Einstein’s theory of the photon, Einstein was awarded the Nobel Prize in Physics in 1921.
From that point onward, the physical existence of the photon was accepted and was incorporated routinely into other physical theories. Compton used the energy and momentum of the photon in 1922 to predict and measure the Compton scattering of x-rays off electrons. The photon was given its modern name by Gilbert Lewis in 1926.
Single-Photon Interference: Geoffrey Taylor (1909)
If a light beam is made up of a group of individual light quanta, then in the limit of very dim light there should be just one photon passing through an optical system at a time. Therefore, to do optical experiments on single photons, one just needs to reach the ultimate dim limit. As simple and clear as this argument sounds, it has problems that were only sorted out after the Hanbury Brown and Twiss experiments in the 1950’s and the controversy they launched (see below). However, in 1909, this thinking seemed like a clear approach for looking for deviations in optical processes in the single-photon limit.
In 1909, Geoffrey Ingram Taylor (1886 – 1975) was an undergraduate student at Cambridge University and performed a low-intensity Young’s double-slit experiment (encouraged by J. J. Thomson). At that time the idea of Einstein’s photon was only four years old, and Bohr’s theory of the hydrogen atom was still a year away. But Thomson believed that if photons were real, then their existence could possibly show up as deviations in experiments involving single photons. Young’s double-slit experiment is the classic demonstration of the classical wave nature of light, so performing it under conditions when (on average) only a single photon was in transit between a light source and a photographic plate seemed like the best place to look.
The experiment was performed by finding an optimum exposure of photographic plates in a double-slit experiment, then reducing the flux while increasing the exposure time, until the single-photon limit was achieved while retaining the same net exposure of the photographic plate. At the lowest intensity, when only a single photon was in transit at a time (on average), Taylor ran the exposure for three months. To his disappointment, when he developed the film, there was no significant difference between the high-intensity and low-intensity interference fringes. If photons existed, then their quantized nature was not showing up in the low-intensity interference experiment.
The reason that there is no single-photon-limit deviation in the behavior of the Young double-slit experiment is that Young’s experiment measures only first-order coherence properties. The average over many single-photon detection events is described equally well either by classical waves or by quantum mechanics. Quantized effects in the Young experiment could only appear in fluctuations in the arrivals of photons, but in Taylor’s day there was no way to detect the arrival of single photons.
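A short simulation makes this concrete: if each photon is detected at a position drawn from the classical double-slit intensity pattern, the accumulated histogram of many one-at-a-time detections reproduces the classical fringes, and nothing in the average distinguishes quanta from waves. The slit parameters below are illustrative assumptions.

import numpy as np

# Minimal sketch: build up Young's fringes one photon at a time.
# Each detection position is sampled from the classical intensity pattern,
# so the accumulated histogram matches the classical fringes.
rng = np.random.default_rng(1)

wavelength = 500e-9       # m (assumed)
slit_sep = 50e-6          # m (assumed)
screen_dist = 1.0         # m (assumed)
x = np.linspace(-0.02, 0.02, 2000)   # screen positions, m

# Classical two-slit intensity (single-slit envelope ignored)
intensity = np.cos(np.pi * slit_sep * x / (wavelength * screen_dist))**2
prob = intensity / intensity.sum()

# Detect photons one at a time and histogram their positions
n_photons = 100_000
hits = rng.choice(x, size=n_photons, p=prob)
counts, edges = np.histogram(hits, bins=200)

# 'counts' now shows the same fringe pattern as 'intensity'; only the
# photon-arrival fluctuations, not the average, could reveal quantization.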
Quantum Theory of Radiation: Paul Dirac (1927)
After Paul Dirac (1902 – 1984) was awarded his doctorate from Cambridge in 1926, he received a stipend that sent him to work with Niels Bohr (1885 – 1962) in Copenhagen. His attention focused on the electromagnetic field and how it interacted with the quantized states of atoms. Although the electromagnetic field was the classical field of light, it was also the quantum field of Einstein’s photon, and he wondered how the quantized harmonic oscillators of the electromagnetic field could be generated by quantum wavefunctions acting as operators. He decided that, to generate a photon, the wavefunction must operate on a state that had no photons—the ground state of the electromagnetic field known as the vacuum state.
Dirac put these thoughts into their appropriate mathematical form and began work on two manuscripts. The first manuscript contained the theoretical details of the non-commuting electromagnetic field operators. He called the process of generating photons out of the vacuum “second quantization”. In second quantization, the classical field of electromagnetism is converted to an operator that generates quanta of the associated quantum field out of the vacuum (and also annihilates photons back into the vacuum). The creation operator can be applied again and again to build up an N-photon state containing N photons, which obey Bose-Einstein statistics as required by their integer spin, in agreement with Planck’s blackbody radiation.
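Dirac’s construction is easy to make concrete numerically. In a Fock basis truncated at some maximum photon number, the annihilation operator is a matrix with square-root-of-n entries just above the diagonal, its adjoint creates photons, and repeated application of the creation operator to the vacuum builds the N-photon states. The sketch below is a minimal illustration; the commutator is exact only away from the truncation edge.

import numpy as np

# Minimal sketch of second quantization in a truncated Fock basis.
N = 20                                       # truncation (maximum photon number)
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # annihilation: a|n> = sqrt(n)|n-1>
adag = a.conj().T                            # creation operator

# Canonical commutator [a, a_dagger] = 1 (exact except at the truncation edge)
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))   # True

# Build an N-photon state by repeated creation from the vacuum
vac = np.zeros(N)
vac[0] = 1.0
three = adag @ adag @ adag @ vac
three /= np.linalg.norm(three)            # normalized three-photon state
print(np.argmax(np.abs(three)))           # 3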
Dirac then showed how an interaction of the quantized electromagnetic field with quantized energy levels involved the annihilation and creation of photons as they promoted electrons to higher atomic energy levels, or demoted them through stimulated emission. Very significantly, Dirac’s new theory explained the spontaneous emission of light from an excited electron level as a direct physical process that creates a photon carrying away the energy as the electron falls to a lower energy level. Spontaneous emission had been explained first by Einstein more than ten years earlier when he derived the famous A and B coefficients, but the physical mechanism for these processes was inferred rather than derived. Dirac, in late 1926, had produced the first direct theory of photon exchange with matter.
Einstein-Podolsky-Rosen (EPR) and Bohr (1935)
The long-running dialog between Einstein and Bohr at the Solvay Conferences culminated in the famous “EPR” paradox of 1935, when Einstein published (together with B. Podolsky and N. Rosen) a paper that contained a particularly simple and cunning thought experiment. In this paper, not only was quantum mechanics under attack, but so was the concept of reality itself, as reflected in the paper’s title “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?”.
Einstein considered an experiment on two quantum particles that had become “entangled” (meaning they interacted) at some time in the past, and then had flown off in opposite directions. By the time their properties are measured, the two particles are widely separated. Two observers each make measurements of certain properties of the particles. For instance, the first observer could choose to measure either the position or the momentum of one particle. The other observer likewise can choose to make either measurement on the second particle. Each measurement is made with perfect accuracy. The two observers then travel back to meet and compare their measurements. When the two experimentalists compare their data, they find perfect agreement in their values every time that they had chosen (unbeknownst to each other) to make the same measurement. This agreement occurred either when they both chose to measure position or both chose to measure momentum.
It would seem that the state of the second particle prior to its measurement was completely defined by the result of the first measurement. In other words, the state of the second particle is set into a definite state (using quantum-mechanical jargon, the state is said to “collapse”) the instant that the first measurement is made. This implies that there is instantaneous action at a distance, violating everything that Einstein believed about reality (and violating the law that nothing can travel faster than the speed of light). He therefore had no choice but to consider this conclusion of instantaneous action to be false. Therefore quantum mechanics could not be a complete theory of physical reality; some deeper theory, yet undiscovered, was needed to resolve the paradox.
Bohr, on the other hand, did not hold “reality” so sacred. In his rebuttal to the EPR paper, which he published six months later under the identical title, he rejected Einstein’s criterion for reality. He had no problem with the two observers making the same measurements and finding identical answers. Although one measurement may affect the conditions of the second despite their great distance, no information could be transmitted by this dual measurement process, and hence there was no violation of causality. Bohr’s mind-boggling viewpoint was that reality was nonlocal, meaning that in the quantum world the measurement at one location does influence what is measured somewhere else, even at great distance. Einstein, for his part, could not accept a nonlocal reality.
The Intensity Interferometer: Hanbury Brown and Twiss (1956)
Optical physics was surprisingly dormant from the 1930’s through the 1940’s. Most of the research during this time was either on physical optics, like lenses and imaging systems, or on spectroscopy, which was more interested in the physical properties of the materials than in light itself. This hiatus from the photon was about to end dramatically, driven not by physicists but by astronomers.
The development of radar technology during World War II enabled the new field of radio astronomy, both with high-tech receivers and with a large cohort of scientists and engineers trained in radio technology. In the late 1940’s and early 1950’s radio astronomy was starting to work with long baselines to better resolve radio sources in the sky using interferometry. The first attempts used coherent references between two separated receivers to provide a common mixing signal to perform field-based detection. However, the stability of the reference was limiting, especially for longer baselines.
In 1950, a doctoral student in the radio astronomy department of the University of Manchester, R. Hanbury Brown, was given the task of designing baselines that could work at longer distances to resolve smaller radio sources. After struggling with the technical difficulties of providing a coherent “local” oscillator for distant receivers, Hanbury Brown had a sudden epiphany one evening. Instead of trying to reference the field of one receiver to the field of another, what if, instead, one were to reference the intensity of one receiver to the intensity of the other, specifically correlating the noise on the intensity? Measuring intensity requires no local oscillator or reference field. The size of an astronomical source would then show up in how well the intensity fluctuations correlated with each other as the distance between the receivers was changed. He did a back-of-the-envelope calculation that gave him hope that his idea might work, but he needed more rigorous proof if he was to ask for money to try it out. He tracked down Richard Twiss at a defense research lab, and the two worked out the theory of intensity correlations for long-baseline radio interferometry. Using facilities at the famous Jodrell Bank Radio Observatory at Manchester, they demonstrated the principle of their intensity interferometer and measured the angular size of Cygnus A and Cassiopeia A, two of the strongest radio sources in the Northern sky.
One of the surprising side benefits of the intensity interferometer over field-based interferometry was its insensitivity to environmental phase fluctuations. For radio astronomy the biggest source of phase fluctuations was the ionosphere, and the new intensity interferometer was immune to its fluctuations. Phase fluctuations had also restricted the Michelson stellar interferometer to only about half a dozen stars, so Hanbury Brown and Twiss decided to revisit visible stellar interferometry using their new concept of intensity interferometry.
To illustrate the principle at visible wavelengths, Hanbury Brown and Twiss performed a laboratory experiment to correlate intensity fluctuations in two receivers illuminated by a common source through a beam splitter. The intensity correlations were detected and measured as a function of path-length change, showing an excess correlation in the noise for short path lengths that decayed as the path length increased. They published their results in Nature in 1956, immediately igniting a firestorm of protest from physicists.
In the 1950’s, many physicists had embraced the discrete properties of the photon and had developed a misleading mental picture of photons as individual and indivisible particles that could only go one way or the other from a beam splitter, but not both. Therefore, the argument went, if the photon in an attenuated beam was detected at one detector at the output of a beam splitter, then it cannot be detected at the other. This would produce an anticorrelation in coincidence counts at the two detectors. However, the Hanbury Brown Twiss (HBT) data showed a correlation between the two detectors. This launched an intense controversy in which some of those who accepted the results called for a radical new theory of the photon, while most others dismissed the HBT results as due to systematics in the light source. The heart of this controversy was quickly understood by the Nobel laureate E. M. Purcell. He correctly pointed out that photons are bosons, indistinguishable discrete particles that are likely to “bunch” together according to quantum statistics, even under low-light conditions. Therefore, attenuated “chaotic” light would indeed show photodetector correlations: even if the average photon number was less than one photon at a time, the photons would still bunch.
The bunching of photons in light is a second-order effect that moves beyond the first-order interference effects of Young’s double slit, but even here the quantum nature of light is not required. A semiclassical theory of light emission from a spectral line with a natural bandwidth also predicts intensity correlations, and the correlations are precisely what would be observed for photon bunching. Therefore, even the second-order HBT results, when performed with natural light sources, do not distinguish between classical and quantum effects in the experimental results. But this reliance on natural light sources was about to change fundamentally with the invention of the laser.
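The semiclassical picture of bunching is easy to reproduce numerically: model chaotic light as a complex Gaussian field filtered to a finite bandwidth, record its intensity as the two HBT detectors would, and correlate. The normalized correlation g2 is close to 2 at zero delay and decays to 1 over the coherence time, which is exactly the HBT excess correlation. The sketch below uses arbitrary units and an assumed coherence time.

import numpy as np

# Minimal sketch of Hanbury Brown-Twiss intensity correlations for chaotic light,
# modeled as a complex Gaussian field with a finite bandwidth.
rng = np.random.default_rng(2)
n_samples = 200_000
coherence_time = 50       # in units of the sample step (assumed)

field = rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)
kernel = np.exp(-np.arange(5 * coherence_time) / coherence_time)
field = np.convolve(field, kernel, mode="same")    # band-limit the field

intensity = np.abs(field)**2              # what each HBT detector measures

def g2(I, tau):
    # Normalized intensity correlation <I(t) I(t+tau)> / (<I(t)><I(t+tau)>)
    if tau == 0:
        return np.mean(I * I) / np.mean(I)**2
    return np.mean(I[:-tau] * I[tau:]) / (np.mean(I[:-tau]) * np.mean(I[tau:]))

for tau in (0, 10, 50, 200, 1000):
    print(tau, round(g2(intensity, tau), 2))
# g2(0) comes out close to 2 (bunching) and decays toward 1 for delays
# long compared to the coherence time.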
Invention of the Laser: Ted Maiman (1960)
One of the great scientific breakthroughs of the 20th century was the nearly simultaneous yet independent realization by several researchers around 1951 (by Charles H. Townes of Columbia University, by Joseph Weber of the University of Maryland, and by Alexander M. Prokhorov and Nikolai G. Basov at the Lebedev Institute in Moscow) that clever techniques and novel apparatus could be used to produce collections of atoms that had more electrons in excited states than in ground states. Such a situation is called a population inversion. If this situation could be attained, then according to Einstein’s 1917 theory of photon emission, a single photon would stimulate a second photon, and these two would in turn stimulate two additional excited electrons to emit two more identical photons, giving a total of four photons, and so on. Clearly this process turns a single photon into a host of photons, all with identical energy and phase.
Charles Townes and his research group were the first to succeed, producing in 1953 a device based on ammonia molecules that could work as an intense source of coherent photons. The initial device did not amplify visible light, but amplified microwave photons with wavelengths of a little over a centimeter. They called the process microwave amplification by stimulated emission of radiation, hence the acronym “MASER”. Despite the significant breakthrough that this invention represented, the devices were very expensive and difficult to operate. The maser did not revolutionize technology, and some even quipped that the acronym stood for “Means of Acquiring Support for Expensive Research”. The maser did, however, launch a new field of study, called quantum electronics, that was the direct descendant of Einstein’s 1917 paper. Most importantly, the existence and development of the maser became the starting point for a device that could do the same thing for light.
The race to develop an optical maser (later to be called the laser, for light amplification by stimulated emission of radiation) was intense. Many groups actively pursued this holy grail of quantum electronics. Most believed that it was possible, which made its invention merely a matter of time and effort. This race was won by Theodore H. Maiman at Hughes Research Laboratory in Malibu, California, in 1960. He used a ruby crystal that was excited into a population inversion by an intense flash tube (like a flash bulb) that had originally been invented for flash photography. His approach was amazingly simple: blast the ruby with a high-intensity pulse of light and see what comes out. This explains why he was the first. Most other groups had been pursuing much more difficult routes because they believed that laser action would be difficult to achieve.
Perhaps the most important aspect of Maiman’s discovery was that it demonstrated that laser action was actually much simpler than people had anticipated, and that laser action is a fairly common phenomenon. His discovery was quickly repeated by other groups, and additional laser media were soon discovered, such as helium-neon gas mixtures, argon gas, carbon dioxide gas, doped garnet crystals and others. Within several years, over a dozen different material and gas systems were made to lase, opening up wide new areas of research and development that continue unabated to this day. It also called for new theories of optical coherence to explain how coherent laser light interacted with matter.
Coherent States: Glauber (1963)
The HBT experiment had been performed with attenuated chaotic light that had residual coherence caused by the finite linewidth of the filtered light source. The theory of intensity correlations for this type of light was developed in the 1950’s by Emil Wolf and Leonard Mandel using a semiclassical theory in which the statistical properties of the light were based on electromagnetics without any direct need for quantized photons. The HBT results were fully consistent with this semiclassical theory. However, after the invention of the laser, new “coherent” light sources became available that required a fundamentally quantum description.
Roy Glauber was a theoretical physicist who received his PhD working with Julian Schwinger at Harvard. He spent several years as a post-doc at Princeton’s Institute for Advanced Study starting in 1949, at the time when quantum field theory was being developed by Schwinger, Feynman and Dyson. While Feynman was off in Brazil for a year learning to play the bongo drums, Glauber filled in for his lectures at Caltech. He returned to Harvard in 1952 as an assistant professor. He was already thinking about the quantum aspects of photons in 1956 when news of the photon correlations in the HBT experiment was published, and when the laser was invented a few years later, he began developing a theory of photon correlations in laser light that he suspected would be fundamentally different than in natural chaotic light.
Because of his background in quantum field theory, and especially quantum electrodynamics, it was a fairly easy task to couch the quantum optical properties of coherent light in terms of Dirac’s creation and annihilation operators of the electromagnetic field. Glauber developed the “coherent state”, a minimum-uncertainty state of the quantized electromagnetic field related to the minimum-uncertainty wave functions first derived by Schrödinger in the 1920’s. The coherent state describes a laser operating well above the lasing threshold and predicts that the HBT excess correlations vanish. Glauber was awarded the Nobel Prize in Physics in 2005 for his work on such “Glauber” states in quantum optics.
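Glauber’s distinction shows up directly in photon-counting statistics: a coherent state has a Poissonian photon-number distribution and g2(0) = 1, so the HBT excess correlation vanishes, while single-mode chaotic light has a Bose-Einstein distribution and g2(0) = 2. A minimal numerical check of these textbook values:

import numpy as np

# Photon-number statistics of coherent vs. chaotic (thermal) light.
# For a single mode, g2(0) = 1 + (Var(n) - <n>) / <n>^2.
rng = np.random.default_rng(3)
mean_n = 2.0

n_coherent = rng.poisson(mean_n, size=1_000_000)                       # laser far above threshold
n_thermal = rng.geometric(1.0 / (1.0 + mean_n), size=1_000_000) - 1    # Bose-Einstein distribution

def g2_zero(n):
    return 1.0 + (n.var() - n.mean()) / n.mean()**2

print(g2_zero(n_coherent))   # close to 1: no excess correlation, the HBT effect vanishes
print(g2_zero(n_thermal))    # close to 2: photon bunching of chaotic light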
Single-Photon Optics: Kimble and Mandel (1977)
Beyond introducing coherent states, Glauber’s new theoretical approach, and parallel work by George Sudarshan around the same time, provided a new formalism for exploring fundamentally quantum optical properties that could not be predicted using only semiclassical theory. For instance, one could envision producing photon states in which the photon arrivals at a detector display the kind of anti-bunching that had originally been assumed (in error) by the critics of the HBT experiment. A truly one-photon state, also known as a Fock state or a number state, would be the extreme limit in which the quantum field possesses a single quantum that can be directed at a beam splitter and will emerge from one side or the other with complete anti-correlation. However, generating such a state in the laboratory remained a challenge.
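The anticipated anti-correlation can be illustrated with a toy coincidence count at a 50/50 beam splitter: pulses that never contain more than one photon can never fire both detectors at once, whereas a faint coherent beam of the same average flux still produces accidental coincidences. This is a Monte Carlo cartoon of the logic, not a model of the actual resonance-fluorescence experiment described below.

import numpy as np

# Toy coincidence test at a 50/50 beam splitter.
rng = np.random.default_rng(4)
n_pulses = 1_000_000
mean_photons = 0.1                    # same average flux in both cases (assumed)

def coincidence_fraction(photons_per_pulse):
    # Each photon in a pulse independently exits one of the two output ports.
    d1 = rng.binomial(photons_per_pulse, 0.5)
    d2 = photons_per_pulse - d1
    return np.mean((d1 > 0) & (d2 > 0))

single_photon = rng.binomial(1, mean_photons, size=n_pulses)   # at most one photon per pulse
coherent = rng.poisson(mean_photons, size=n_pulses)            # Poissonian photon number

print(coincidence_fraction(single_photon))   # 0.0 : complete anti-correlation
print(coincidence_fraction(coherent))        # > 0 : accidental coincidences remain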
In 1975 Carmichael and Walls predicted that resonance fluorescence could produce quantized fields with lower correlations than coherent states. In 1977 H. J. Kimble, M. Dagenais and L. Mandel demonstrated, for the first time, photon antibunching between two photodetectors at the two ports of a beam splitter. They used a beam of sodium atoms pumped by a dye laser.
This first demonstration of photon antibunching represents a major milestone in the history of quantum optics. Taylor’s first-order experiments in 1909 showed no difference between classical electromagnetic waves and a flux of photons. Similarly, the second-order HBT experiment of 1956 used chaotic light, and the observed photon correlations could be explained equally well by classical or quantum approaches. Even laser light (when the laser is operated far above threshold) produces essentially classical wave effects, with only the shot noise betraying the discreteness of photon arrivals. Only after the availability of “quantum” light sources, beginning with the work of Kimble and Mandel, could photon numbers be manipulated at will, launching the modern era of quantum optics. Later experiments by them and others have continually improved the control of photon states.
1900 – Planck (1901). “Law of energy distribution in normal spectra.” Annalen Der Physik 4(3): 553-563.
1905 – A. Einstein (1905). “Generation and conversion of light with regard to a heuristic point of view.” Annalen Der Physik 17(6): 132-148.
1909 – A. Einstein (1909). “On the current state of radiation problems.” Physikalische Zeitschrift 10: 185-193.
1909 – Taylor, G. I. (1909). Proc. Cam. Phil. Soc. Math. Phys. Sci. 15: 114. Single-photon double-slit experiment.
1915 – Millikan, R. A. (1916). “A direct photoelectric determination of Planck’s ‘h’.” Physical Review 7(3): 355-388. Photoelectric effect.
1916 – Einstein, A. (1916). “Strahlungs-Emission und -Absorption nach der Quantentheorie.” Verh. Deutsch. Phys. Ges. 18: 318. Einstein predicts stimulated emission.
1923 –Compton, Arthur H. (May 1923). “A Quantum Theory of the Scattering of X-Rays by Light Elements”. Physical Review. 21 (5): 483–502.
1926 – Lewis, G. N. (1926). “The conservation of photons.” Nature 118: 874-875. Gilbert Lewis names the “photon”.
1927 – Dirac, P. A. M. (1927). “The quantum theory of the emission and absorption of radiation.” Proceedings of the Royal Society of London, Series A 114(767): 243-265.
1932 – E. P. Wigner: Phys. Rev. 40, 749 (1932)
1935 – A. Einstein, B. Podolsky, N. Rosen: Phys. Rev. 47 , 777 (1935). EPR paradox.
1935 – N. Bohr: Phys. Rev. 48 , 696 (1935). Bohr’s response to the EPR paradox.
Einstein, A. (1916). “Strahlungs-Emission und -Absorption nach der Quantentheorie.” Verh. Deutsch. Phys. Ges. 18: 318; Einstein, A. (1917). “Quantum theory of radiation.” Physikalische Zeitschrift 18: 121-128.
Brown, R. H. and R. Q. Twiss (1956). “Correlation between photons in 2 coherent beams of light.” Nature 177(4497): 27-29; Brown, R. H. and R. Q. Twiss (1956). “Test of a new type of stellar interferometer on Sirius.” Nature 178(4541): 1046-1048.
 Glauber, R. J. (1963). “Photon Correlations.” Physical Review Letters 10(3): 84.
Sudarshan, E. C. G. (1963). “Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams.” Physical Review Letters 10(7): 277-279; Mehta, C. L. and E. C. G. Sudarshan (1965). “Relation between quantum and semiclassical description of optical coherence.” Physical Review 138(1B): B274.