An excerpt from the upcoming book “Interference: The History of Optical Interferometry and the Scientists who Tamed Light” describes how a handful of 19th-century scientists laid the groundwork for one of the key tools of modern optics. Published in Optics and Photonics News, March 2023.

François Arago rose to the highest levels of French science and politics. Along the way, he met Augustin Fresnel and, together, they changed the course of optical science.

Hyperspace by any other name would sound as sweet, conjuring to the mind’s eye images of hypercubes and tesseracts, manifolds and wormholes, Klein bottles and Calabi-Yau quintics. Forget the dimension of time (that may be the most mysterious of all) and consider the extra spatial dimensions that challenge the mind and open the door to dreams of going beyond the bounds of today’s physics.

The geometry of n dimensions studies reality; no one doubts that. Bodies in hyperspace are subject to precise definition, just like bodies in ordinary space; and while we cannot draw pictures of them, we can imagine and study them.

(Poincaré 1895)

Here is a short history of hyperspace. It begins with advances by Möbius and Liouville and Jacobi who never truly realized what they had invented, until Cayley and Grassmann and Riemann made it explicit. They opened Pandora’s box, and multiple dimensions burst upon the world never to be put back again, giving us today the manifolds of string theory and infinite-dimensional Hilbert spaces.

August Möbius (1827)

Although he is most famous for the single-surface strip that bears his name, one of the early contributions of August Möbius was the idea of barycentric coordinates [1], which use, for instance, three coordinates to express the locations of points in a two-dimensional simplex, the triangle. Barycentric coordinates are used routinely today in metallurgy to describe the composition of ternary alloys.

Möbius’ work was one of the first to hint that tuples of numbers could stand in for higher-dimensional space, and his barycentric coordinates were an early example of homogeneous coordinates that could be used for higher-dimensional representations. However, he was too early to use any language of multidimensional geometry.
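As a small illustration of how three numbers locate a point in a two-dimensional triangle, here is a minimal Python sketch. The function names and the example triangle are hypothetical and chosen only for illustration; the weights are the usual area ratios, normalized so that they sum to one.

```python
def barycentric(p, a, b, c):
    """Return weights (wa, wb, wc), summing to 1, such that
    p = wa*a + wb*b + wc*c for a triangle with vertices a, b, c."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    # Twice the signed area of the full triangle.
    det = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Each weight is the ratio of a sub-triangle area to the full area.
    wb = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / det
    wc = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / det
    wa = 1.0 - wb - wc
    return wa, wb, wc


def from_barycentric(weights, a, b, c):
    """Inverse map: turn three weights back into a point in the plane."""
    wa, wb, wc = weights
    return (wa * a[0] + wb * b[0] + wc * c[0],
            wa * a[1] + wb * b[1] + wc * c[1])


# Hypothetical ternary composition: 50% A, 30% B, 20% C, plotted on a
# triangle whose corners stand for the three pure components.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
point = from_barycentric((0.5, 0.3, 0.2), A, B, C)
print(point, barycentric(point, A, B, C))
```

Three numbers are carried around even though the triangle is only two-dimensional; the constraint that the weights sum to one removes the extra degree of freedom.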

Carl Jacobi (1834)

Carl Jacobi was a master at manipulating multiple variables, leading to his development of the theory of matrices. In this context, he came to study (n-1)-fold integrals over multiple continuous-valued variables. From our modern viewpoint, he was evaluating surface integrals of hyperspheres.

In 1834, Jacobi found explicit solutions to these integrals and published them in a paper with the imposing title “De binis quibuslibet functionibus homogeneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium” [2]. The resulting (n-1)-fold integrals are

when the space dimension is even or odd, respectively. These are the surface areas of the manifolds called (n-1)-spheres in n-dimensional space. For instance, the 2-sphere is the ordinary surface of a sphere in our 3D space, with area 4πr^{2}.
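For reference, the standard closed forms for the surface area of an (n-1)-sphere of radius r can be written as follows. This is modern notation rather than Jacobi’s own, and the integer k ≥ 1 is introduced here only to separate the even and odd cases.

```latex
S_{n-1}(r) = \frac{2\pi^{k}}{(k-1)!}\, r^{2k-1} \quad (n = 2k),
\qquad
S_{n-1}(r) = \frac{2^{k+1}\pi^{k}}{(2k-1)!!}\, r^{2k} \quad (n = 2k+1)
```

Both cases are instances of the single expression 2π^{n/2} r^{n-1}/Γ(n/2). Setting k = 1 gives the circumference 2πr of a circle (n = 2) and the area 4πr^{2} of an ordinary sphere (n = 3).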

Despite the fact that we recognize these as surface areas of hyperspheres, Jacobi used no geometric language in his paper. He was still too early, and mathematicians had not yet woken up to the analogy of extending spatial dimensions beyond 3D.

Joseph Liouville (1838)

Joseph Liouville’s name is attached to a theorem that lies at the core of mechanical systems: Liouville’s theorem, which states that volumes in high-dimensional phase space are incompressible. Surprisingly, Liouville had no conception of high-dimensional space, to say nothing of abstract phase space. The story of the convoluted path that led Liouville’s name to be attached to his theorem is told in Chapter 6, “The Tangled Tale of Phase Space”, in Galileo Unbound (Oxford University Press, 2018).

Nonetheless, Liouville did publish a pure-mathematics paper in 1838 in his own journal, the Journal de Mathématiques Pures et Appliquées [3], that identified a quantity that stayed constant during the differential change of multiple variables when certain criteria were satisfied. It was only later that Jacobi, as he was developing a new mechanical theory based on William R. Hamilton’s work, realized that the criteria needed for Liouville’s invariant quantity to hold were satisfied by conservative mechanical systems. Even then, neither Liouville nor Jacobi used the language of multidimensional geometry. That was about to change in a quick succession of papers and books by three mathematicians who, unknown to each other, were all thinking along the same lines.

Arthur Cayley (1843)

Arthur Cayley was the first to take the bold step of calling the emerging geometry of multiple variables an actual space. His seminal paper “Chapters in the Analytical Geometry of n Dimensions” was published in 1843 in the Cambridge Mathematical Journal [4]. Here, for the first time, Cayley recognized that the domain of multiple variables behaved identically to multidimensional space. He used little of the language of geometry in the paper, which was mostly analysis rather than geometry, but his bold declaration for spaces of n dimensions opened the door to a changing mindset that would soon sweep through geometric reasoning.

Hermann Grassmann (1844)

Grassmann’s life story, although not overly tragic, was beset by lifelong setbacks and frustrations. He was a mathematician literally 30 years ahead of his time, but because he was merely a high-school teacher, no one took his ideas seriously.

Somehow, in a nearly complete vacuum, disconnected from the professional mathematicians of his day, he devised an entirely new type of algebra that allowed geometric objects to have orientation. These could be combined in numerous different ways obeying numerous different laws. The simplest elements were just numbers, but these could be extended to arbitrary complexity with an arbitrary number of elements. He called his theory a theory of “Extension”, and he self-published a thick and difficult tome that contained all of his ideas [5]. He tried to enlist Möbius to help disseminate his ideas, but even Möbius could not recognize what Grassmann had achieved.

In fact, what Grassmann did achieve was vector algebra of arbitrarily high dimension. Perhaps more impressive for the time is that he actually recognized what he was dealing with. He did not know of Cayley’s work, but independently of Cayley he used geometric language for the first time describing geometric objects in high dimensional spaces. He said, “since this method of formation is theoretically applicable without restriction, I can define systems of arbitrarily high level by this method… geometry goes no further, but abstract science knows no limits.” [6]

Grassmann was convinced that he had discovered something astonishing and new, which he had, but no one understood him. After years trying to get mathematicians to listen, he finally gave up, left mathematics behind, and actually achieved some fame within his lifetime in the field of linguistics. There is even a law of diachronic linguistics named after him. For the story of Grassmann’s struggles, see the blog on Grassmann and his Wedge Product.

Julius Plücker (1846)

Projective geometry sounds like it ought to be a simple topic, like the projective property of perspective art as parallel lines draw together and touch at the vanishing point on the horizon of a painting. But it is far more complex than that, and it provided a separate gateway into the geometry of high dimensions.

A hint of its power comes from homogeneous coordinates of the plane. These are used to find where the line of sight to a point in three dimensions intersects a plane (like the plane of an artist’s canvas). Although the point on the plane is in two dimensions, it takes three homogeneous coordinates to locate it. By extension, if a point is located in three dimensions, then it has four homogeneous coordinates, as if the three-dimensional point were a projection onto 3D from a 4D space.
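The bookkeeping behind homogeneous coordinates can be sketched in a few lines of Python. The function names, the choice of the origin as the eye point, and the canvas plane z = 1 are all assumptions made for this illustration.

```python
def to_homogeneous(point, w=1.0):
    """Lift an n-dimensional point to n+1 homogeneous coordinates.
    Any nonzero w gives an equally valid representation of the same point."""
    if w == 0:
        raise ValueError("w = 0 is reserved for points at infinity")
    return tuple(w * c for c in point) + (w,)


def from_homogeneous(h):
    """Divide out the last coordinate to recover the ordinary point."""
    *coords, w = h
    if w == 0:
        raise ValueError("a point at infinity has no ordinary coordinates")
    return tuple(c / w for c in coords)


def project_to_canvas(point3d):
    """Pinhole projection from the origin onto the canvas plane z = 1.
    The 3D point itself serves as the homogeneous triple of its 2D image."""
    return from_homogeneous(point3d)


# Two different 3D points on the same line of sight land on the same canvas point.
print(project_to_canvas((2.0, 4.0, 2.0)))
print(project_to_canvas((4.0, 8.0, 4.0)))

# A 3D point carries four homogeneous coordinates.
print(to_homogeneous((1.0, 2.0, 3.0), w=2.0))
```

Every point along a line of sight shares one image on the canvas, which is why a two-dimensional point is naturally labeled by a triple of numbers defined only up to an overall scale.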

These ideas were pursued by Julius Plücker as he extended projective geometry from the work of earlier mathematicians such as Desargues and Möbius. For instance, the barycentric coordinates of Möbius are a form of homogeneous coordinates. What Plücker discovered is that space does not need to be defined by a dense set of points; a dense set of lines can be used just as well. The set of lines is represented as a four-dimensional manifold. Plücker reported his findings in a book in 1846 [7] and expanded on the concepts of multidimensional spaces in a work published in 1868 [8].

Ludwig Schläfli (1851)

After Plücker, ideas of multidimensional analysis became more common, and Ludwig Schläfli (1814 – 1895), a professor at the University of Berne in Switzerland, was one of the first to fully explore analytic geometry in higher dimensions. He described multidimensional points that were located on hyperplanes, and he calculated the angles between intersecting hyperplanes [9]. He also investigated high-dimensional polytopes, from which is derived our modern “Schläfli notation”. However, Schläfli used his own terminology for these objects, emphasizing analytic properties without using the ordinary language of high-dimensional geometry.

Bernhard Riemann (1854)

The person most responsible for the shift in the mindset that finally accepted the geometry of high-dimensional spaces was Bernhard Riemann. In 1854 at the university in Göttingen he presented his habilitation talk “Über die Hypothesen, welche der Geometrie zu Grunde liegen” (On the hypotheses on which geometry is founded). A habilitation in Germany was an examination that qualified an academic to be able to advise their own students (somewhat like attaining tenure in US universities).

The habilitation candidate would suggest three topics, and it was usual for the first or second to be picked. Riemann’s three topics were: trigonometric properties of functions (he was the first to rigorously prove the convergence properties of Fourier series), aspects of electromagnetic theory, and a throw-away topic that he added at the last minute on the foundations of geometry (on which he had not actually done any serious work). Gauss was his faculty advisor and picked the third topic. Riemann had to develop the topic in a very short time period, starting from scratch. The effort exhausted him mentally and emotionally, and he had to withdraw temporarily from the university to regain his strength. After returning around Easter, he worked furiously for seven weeks to develop a first draft and then asked Gauss to set the examination date. Gauss initially thought to postpone to the Fall semester, but then at the last minute scheduled the talk for the next day. (For the story of Riemann and Gauss, see Chapter 4 “Geometry on my Mind” in the book Galileo Unbound (Oxford, 2018)).

Riemann gave his lecture on 10 June 1854, and it was a masterpiece. He stripped away all the old notions of space and dimensions and imbued geometry with a metric structure that was fundamentally attached to coordinate transformations. He also showed how any set of coordinates could describe space of any dimension, and he generalized ideas of space to include virtually any ordered set of measurables, whether it was of temperature or color or sound or anything else. Most importantly, his new system made explicit what those before him had alluded to: Jacobi, Grassmann, Plücker and Schläfli. Ideas of Riemannian geometry began to percolate through the mathematics world, expanding into common use after Richard Dedekind edited and published Riemann’s habilitation lecture in 1868 [10].

Georg Cantor and Dimension Theory (1878)

In discussions of multidimensional spaces, it is important to step back and ask: what is dimension? This question is not as easy to answer as it may seem. In fact, in 1878, Georg Cantor proved that there is a one-to-one mapping of the plane to the line, making it seem that lines and planes are somehow the same. He was so astonished at his own results that he wrote in a letter to his friend Richard Dedekind “I see it, but I don’t believe it!”. A little over a decade later, Peano and Hilbert showed how to create area-filling curves so that a single continuous curve can approach any point in the plane arbitrarily closely, again casting shadows of doubt on the robustness of dimension. These questions of dimensionality would not be put to rest until the work by Karl Menger around 1926 when he provided a rigorous definition of topological dimension (see the Blog on the History of Fractals).

Hermann Minkowski and Spacetime (1908)

Most of the earlier work on multidimensional spaces was mathematical and geometric rather than physical. One of the first examples of physical hyperspace is the spacetime of Hermann Minkowski. Although Einstein and Poincaré had noted how space and time were coupled by the Lorentz equations, they did not take the bold step of recognizing space and time as parts of a single manifold. This step was taken in 1908 [11] by Hermann Minkowski, who declared

“Gentlemen! The views of space and time which I wish to lay before you … They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”

Hermann Minkowski (1908)

No story of multiple “integer” dimensions can be complete without mentioning the existence of “fractional” dimensions, the defining feature of fractals. The individual who is most responsible for the concepts and mathematics of fractional dimensions was Felix Hausdorff. Before he was driven to suicide by the persecution of Jews in Nazi Germany, he was a leading light in the intellectual life of Leipzig, Germany. By day he was a brilliant mathematician; by night he was the author Paul Mongré, writing poetry and plays.

In 1918, as the war was ending, he wrote a small book “Dimension and Outer Measure” that established ways to construct sets whose measured dimensions were fractions rather than integers [12]. Benoit Mandelbrot would later popularize these sets as “fractals” in the 1980s. For the background on a history of fractals, see the Blog A Short History of Fractals.

The Fifth Dimension of Theodor Kaluza (1921) and Oskar Klein (1926)

The first theoretical steps to develop a theory of a physical hyperspace (in contrast to merely a geometric hyperspace) were taken by Theodor Kaluza at the University of Königsberg in Prussia. He added an additional spatial dimension to Minkowski spacetime as an attempt to unify the forces of gravity with the forces of electromagnetism. Kaluza’s paper was communicated to the journal of the Prussian Academy of Science in 1921 through Einstein, who saw the unification principles as a parallel of some of his own attempts [13]. However, Kaluza’s theory was fully classical and did not include the new quantum theory that was developing at that time in the hands of Heisenberg, Bohr and Born.

Oskar Klein was a Swedish physicist who was in the “second wave” of quantum physicists, having studied under Bohr. Unaware of Kaluza’s work, Klein developed a quantum theory of a five-dimensional spacetime [14]. For the theory to be self-consistent, it was necessary to roll up the extra dimension into a tight cylinder. This is like a strand of spaghetti: looking at it from far away it looks like a one-dimensional string, but an ant crawling on the spaghetti can move in two dimensions, either along the long direction or looping around it in the short direction, called a compact dimension. Klein’s theory was an early forerunner of the compact extra dimensions that would later reappear in string theory. For the historical background on Kaluza and Klein, see the Blog on Oskar Klein.

John Campbell (1931): Hyperspace in Science Fiction

Art has a long history of shadowing the sciences, and the math and science of hyperspace were no exception. One of the first mentions of hyperspace in science fiction was in the story “Islands of Space” by John Campbell [15], published in Amazing Stories Quarterly in 1931, where it was used as an extraordinary means of space travel.

In 1951, Isaac Asimov made travel through hyperspace the transportation network that connected the galaxy in his Foundation Trilogy [16].

John von Neumann and Hilbert Space (1932)

Quantum mechanics had developed rapidly through the 1920’s, but by the early 1930’s it was in need of an overhaul, having outstripped rigorous mathematical underpinnings. These underpinnings were provided by John von Neumann in his 1932 book on quantum theory [17]. This is the book that cemented the Copenhagen interpretation of quantum mechanics, with projection measurements and wave function collapse, while also establishing the formalism of Hilbert space.

Hilbert space is an infinite dimensional vector space of orthogonal eigenfunctions into which any quantum wave function can be decomposed. The physicists of today work and sleep in Hilbert space as their natural environment, often losing sight of its infinite dimensions that don’t seem to bother anyone. Hilbert space is more than a mere geometrical space, but less than a full physical space (like five-dimensional spacetime). Few realize that what is so often ascribed to Hilbert was actually formalized by von Neumann, among his many other accomplishments like stored-program computers and game theory.

Einstein-Rosen Bridge (1935)

One of the strangest entities inhabiting the theory of spacetime is the Einstein-Rosen Bridge. It is space folded back on itself in a way that punches a short-cut through spacetime. Einstein, working with his collaborator Nathan Rosen at the Institute for Advanced Study in Princeton, published a paper in 1935 that attempted to solve two problems [18]. The first problem was the Schwarzschild singularity at a radius r = 2GM/c^{2} known as the Schwarzschild radius or the Event Horizon. Einstein had a distaste for such singularities in physical theory and viewed them as a problem. The second problem was how to apply the theory of general relativity (GR) to point masses like an electron. Again, the GR solution to an electron blows up at the location of the particle at r = 0.

To eliminate both problems, Einstein and Rosen (ER) began with the Schwarzschild metric in its usual form

where it is easy to see that it “blows up” when r = 2GM/c^{2} as well as at r = 0. ER realized that they could write a new form that bypasses the singularities using the simple coordinate substitution

to yield the “wormhole” metric

It is easy to see that this expression never blows up as the new variable u goes from −∞ to +∞. The reason is simple: it removes the 1/r singularity by replacing it with 1/(r + ε). Such tricks are used routinely today in computational physics to keep computer calculations from getting too large, avoiding the divide-by-zero problem. It is also known as a form of regularization in machine learning applications. But in the hands of Einstein, this simple “bypass” is not just math, it can provide a physical solution.
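Written out in geometric units (G = c = 1) and following the notation of the Einstein-Rosen paper [18], the three expressions described above take the following form. The unit choice and the symbol m for the mass are assumptions made here, so the Schwarzschild radius appears as r = 2m.

```latex
% Schwarzschild metric in its usual form
ds^{2} = -\frac{1}{1 - 2m/r}\,dr^{2}
         - r^{2}\left(d\theta^{2} + \sin^{2}\theta\,d\phi^{2}\right)
         + \left(1 - \frac{2m}{r}\right)dt^{2}

% Coordinate substitution
u^{2} = r - 2m

% Resulting "wormhole" metric
ds^{2} = -4\left(u^{2} + 2m\right)du^{2}
         - \left(u^{2} + 2m\right)^{2}\left(d\theta^{2} + \sin^{2}\theta\,d\phi^{2}\right)
         + \frac{u^{2}}{u^{2} + 2m}\,dt^{2}
```

The intermediate step is short: with r = u^{2} + 2m one has dr = 2u du and 1 - 2m/r = u^{2}/(u^{2} + 2m), so the factor u^{2} cancels in the radial term and no coefficient diverges for any real u.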

It is hard to imagine that an article published in the Physical Review, especially one written about a simple variable substitution, would appear on the front page of the New York Times, even appearing “above the fold”, but such was Einstein’s fame that this was exactly the response when he and Rosen published their paper. The interest stemmed from the interpretation of the new equation: when visualized geometrically, it was like a funnel between two separated Minkowski spaces, in other words, what was named a “wormhole” by John Wheeler in 1957. Even back in 1935, there was some sense that this new property of space might allow untold possibilities, perhaps even a form of travel through such a short cut.

As it turns out, the ER wormhole is not stable: it collapses on itself in an incredibly short time so that not even photons can get through it in time. More recent work on wormholes has shown that they can be stabilized by negative energy density, but ordinary matter cannot have negative energy density. On the other hand, the Casimir effect might have a type of negative energy density, which raises some interesting questions about quantum mechanics and the ER bridge.

Edward Witten’s 10+1 Dimensions (1995)

A history of hyperspace would not be complete without a mention of string theory and Edward Witten’s unification of the various 10-dimensional string theories into 11-dimensional M-theory. At a string theory conference at USC in 1995 he pointed out that the five different string theories of the day were all related through dualities. This observation launched the second superstring revolution that continues today. In this theory, six extra spatial dimensions are wrapped up into complex manifolds such as the Calabi-Yau manifold.

Prospects

There is definitely something wrong with our three-plus-one dimensions of spacetime. We claim that we have achieved the pinnacle of fundamental physics with what is called the Standard Model and the Higgs boson, but dark energy and dark matter loom as elephants in the room: giant, embarrassing and currently unexplained. By some estimates, the fraction of the energy density of the universe made up of ordinary matter is only 5%. The other 95% is in some form unknown to physics. How can physicists claim to know anything if 95% of everything is in some unknown form?

The answer, perhaps to be uncovered sometime in this century, may be the role of extra dimensions in physical phenomena—probably not in every-day phenomena, and maybe not even in high-energy particles—but in the grand expanse of the cosmos.

By David D. Nolte, Feb. 8, 2023

Bibliography:

M. Kaku, R. O’Keefe, Hyperspace: A scientific odyssey through parallel universes, time warps, and the tenth dimension. (Oxford University Press, New York, 1994).

A. N. Kolmogorov, A. P. Yushkevich, Mathematics of the 19th century: Geometry, analytic function theory. (Birkhäuser Verlag, Basel ; 1996).

References:

[1] F. Möbius, in Gesammelte Werke (Dr. M. Saendig oHG, Wiesbaden, Germany, 1967), vol. 1, pp. 36-49.

[2] Carl Jacobi, “De binis quibuslibet functionibus homogeneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium” (1834)

[3] J. Liouville, Note sur la théorie de la variation des constantes arbitraires. Liouville Journal 3, 342-349 (1838).

[4] A. Cayley, Chapters in the analytical geometry of n dimensions. Collected Mathematical Papers 1, 317-326, 119-127 (1843).

[5] H. Grassmann, Die lineale Ausdehnungslehre. (Wiegand, Leipzig, 1844).

[6] H. Grassmann quoted in D. D. Nolte, Galileo Unbound (Oxford University Press, 2018) pg. 105

[7] J. Plücker, System der Geometrie des Raumes in Neuer Analytischer Behandlungsweise, Insbesondere die Flächen Zweiter Ordnung und Klasse Enthaltend. (Düsseldorf, 1846).

[8] J. Plücker, On a New Geometry of Space (1868).

[9] L. Schläfli, J. H. Graf, Theorie der vielfachen Kontinuität. Neue Denkschriften der Allgemeinen Schweizerischen Gesellschaft für die Gesammten Naturwissenschaften 38. ([s.n.], Zürich, 1901).

The Black Swan was a mythical beast invented by the Roman poet Juvenal as a metaphor for things that are so rare they can only be imagined. His quote goes “rara avis in terris nigroque simillima cygno” (a rare bird in the lands and very much like a black swan).

Imagine the shock, then, when the Dutch explorer Willem de Vlamingh first saw black swans in Australia in 1697. The metaphor morphed into a new use: it now describes a broadly held belief (the impossibility of black swans) that is refuted by a single new observation.

For instance, in 1870 the biologist Thomas Henry Huxley, known as “Darwin’s Bulldog” for his avid defense of Darwin’s theories, delivered a speech in Liverpool, England, where he was quoted in Nature magazine as saying,

… the great tragedy of Science—the slaying of a beautiful hypothesis by an ugly fact

This quote has been picked up and repeated over the years in many different contexts.

One of those contexts applies to the fate of a beautiful economic theory, proposed by Fischer Black and Myron Scholes in 1973, as a way to make the perfect hedge on Wall Street, purportedly risk free, yet guaranteeing a positive return in spite of the ups-and-downs of stock prices. Scholes helped launch an investment company in 1994 to cash in on this beautiful theory, returning an unbelievable 40% on investment. Black died in 1995, but Scholes was awarded the Nobel Prize in Economics in 1997. The next year, the fund collapsed. The ugly fact that flew in the face of Black-Scholes was the Black Swan.

The Black Swan

A Black Swan is an outlier measurement that occurs in a sequence of data points. Up until the Black Swan event, the data points behave normally, following the usual statistics we have all come to expect, maybe a Gaussian distribution or some other form of exponential that dominates most variable phenomena.

But then a Black Swan occurs. It has a value so unexpected, and so unlike all the other measurements, that it is often assumed to be wrong and possibly even thrown out because it screws up the otherwise nice statistics. That single data point skews averages and standard deviations in non-negligible ways. The response to such a disturbing event is to take even more data to let the averages settle down again … until another Black Swan hits and again skews the mean value. However, such outliers are often not spurious measurements but are actually a natural part of the process. They should not, and cannot, be thrown out without compromising the statistical integrity of the study.

This outlier phenomenon came to mainstream attention when the author Nassim Nicholas Taleb, in his influential 2007 book, The Black Swan: The Impact of the Highly Improbable, pointed out that it was a central part of virtually every aspect of modern life, whether in business, or the development of new technologies, or the running of elections, or the behavior of financial markets. Things that seemed to be well behaved … a set of products, or a collective society, or a series of governmental policies … are suddenly disrupted by a new invention, or a new law, or a bad Supreme Court decision, or a war, or a stock-market crash.

As an illustration, let’s see where Black-Scholes went wrong.

The Perfect Hedge on Wall Street?

Fischer Black (1938 – 1995) was a PhD advisor’s nightmare. He had graduated as an undergraduate physics major from Harvard in 1959, but then switched to mathematics for graduate school, then switched to computers, then switched again to artificial intelligence, after which he was thrown out of the graduate program at Harvard for having a serious lack of focus. So he joined the RAND Corporation, where he had time to play with his ideas, eventually approaching Marvin Minsky at MIT, who helped guide him to an acceptable thesis that he was allowed to submit to the Harvard program for his PhD in applied mathematics. After that, he went to work in financial markets.

His famous contribution to financial theory was the Black-Scholes paper of 1973 on “The Pricing of Options and Corporate Liabilities”, co-authored with Myron Scholes. Hedging is a venerable tradition on Wall Street. To hedge means that a broker sells an option (to purchase a stock at a given price at a later time) assuming that the stock will fall in value (selling short), and then buys, as insurance against the price rising, a number of shares of the same asset (buying long). If the broker balances enough long shares with enough short options, then the portfolio’s value is insulated from the day-to-day fluctuations of the value of the underlying asset.

This type of portfolio is one example of a financial instrument called a derivative. The name comes from the fact that the value of the portfolio is derived from the values of the underlying assets. The challenge with derivatives is finding their “true” value at any time before they mature. If a broker knew the “true” value of a derivative, then there would be no risk in buying and selling derivatives.

To be risk free, the value of the derivative needs to be independent of the fluctuations. This appears at first to be a difficult problem, because fluctuations are random and cannot be predicted. But the solution actually relies on just this condition of randomness. If the random fluctuations in stock prices are equivalent to a random walk superposed on the average rate of return, then perfect hedges can be constructed with impunity.

To make a hedge on an underlying asset, create a portfolio by selling one call option (selling short) and buying a number N shares of the asset (buying long) as insurance against the possibility that the asset value will rise. The value of this portfolio is

If the number N is chosen correctly, then the short and long positions will balance, and the portfolio will be protected from fluctuations in the underlying asset price. To find N, consider the change in the value of the portfolio as the variables fluctuate

and use an elegant result known as Ito’s Formula (a stochastic differential equation that includes the effects of a stochastic variable) to yield

Note that the last term contains the fluctuations, expressed using the stochastic term dW (a random walk). The fluctuations can be zeroed-out by choosing

which yields

The important observation about this last equation is that the stochastic function W has disappeared. This is because the fluctuations of the N share prices balance the fluctuations of the short option.

When a broker buys an option, there is a guaranteed rate of return r at the time of maturity of the option which is set by the value of a risk-free bond. Therefore, the price of a perfect hedge must increase with the risk-free rate of return. This is

or

Equating the two equations gives

Simplifying, this leads to a partial differential equation for V(S,t)

The Black-Scholes equation is a partial differential equation whose solution, given the boundary conditions and time, defines the “true” value of the derivative and determines how many shares to buy at t = 0 at a specified guaranteed return rate r (or, alternatively, stating a specified stock price S(T) at the time of maturity T of the option). It is a diffusion equation that incorporates the diffusion of the stock price with time. If the derivative is sold at any time t prior to maturity, when the stock has some value S, then the value of the derivative is given by V(S,t) as the solution to the Black-Scholes equation [1].
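The chain of steps described above can be written out compactly as follows. The symbols Π for the portfolio value and σ for the volatility, and the assumption that the stock follows geometric Brownian motion, are introduced here and are not fixed by the text; V, S, N, W, μ and r are used as in the surrounding paragraphs.

```latex
% Portfolio: one call sold short, N shares held long
\Pi = -V(S,t) + N S

% Change in portfolio value, with the stock modeled as dS = \mu S\,dt + \sigma S\,dW
d\Pi = -dV + N\,dS

% Ito's formula applied to V(S,t)
d\Pi = -\left(\frac{\partial V}{\partial t} + \mu S\frac{\partial V}{\partial S}
        + \frac{1}{2}\sigma^{2}S^{2}\frac{\partial^{2} V}{\partial S^{2}}\right)dt
        + N\mu S\,dt
        + \left(N - \frac{\partial V}{\partial S}\right)\sigma S\,dW

% Hedge ratio that zeroes out the fluctuations
N = \frac{\partial V}{\partial S}
\quad\Longrightarrow\quad
d\Pi = -\left(\frac{\partial V}{\partial t}
        + \frac{1}{2}\sigma^{2}S^{2}\frac{\partial^{2} V}{\partial S^{2}}\right)dt

% Risk-free growth of the hedged portfolio
d\Pi = r\,\Pi\,dt = r\left(-V + S\frac{\partial V}{\partial S}\right)dt

% Equating the two expressions and simplifying: the Black-Scholes equation
\frac{\partial V}{\partial t}
 + \frac{1}{2}\sigma^{2}S^{2}\frac{\partial^{2} V}{\partial S^{2}}
 + rS\frac{\partial V}{\partial S} - rV = 0
```

With the choice N = ∂V/∂S, the terms in μ cancel along with the dW term, so neither the random walk nor the mean rate of return survives in the final equation.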

One of the interesting features of this equation is the absence of the mean rate of return μ of the underlying asset. This means that any stock of any value can be considered, even if the rate of return of the stock is negative! This type of derivative looks like a truly risk-free investment. You would be guaranteed to make money even if the value of the stock falls, which may sound too good to be true…which of course it is.
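To make the absence of μ concrete, here is a minimal Python sketch of the standard closed-form Black-Scholes value of a European call, together with the hedge ratio N = ∂V/∂S. The function names, the strike price K, and the example numbers are assumptions for illustration; the European-call payoff is one particular choice of boundary condition.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Value V and hedge ratio N = dV/dS of a European call.

    S: current stock price, K: strike price (assumed parameter),
    r: risk-free rate, sigma: volatility, tau: time to maturity in years.
    Note that the stock's mean rate of return does not appear anywhere.
    """
    if tau <= 0:
        # At maturity the option is worth its payoff.
        return max(S - K, 0.0), (1.0 if S > K else 0.0)
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    value = S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)
    hedge_ratio = norm_cdf(d1)
    return value, hedge_ratio


# Hypothetical example: at-the-money call, one year to maturity.
value, n_shares = bs_call(S=100.0, K=100.0, r=0.05, sigma=0.2, tau=1.0)
print(f"call value = {value:.4f}, shares held per option sold = {n_shares:.4f}")
```

The only market inputs are the risk-free rate and the volatility, and the volatility enters under the same Gaussian random-walk assumption made in the derivation.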

The success (or failure) of derivative markets depends on fundamental assumptions about the stock market. These include that it would not be subject to radical adjustments or to panic or irrational exuberance, i.e. Black-Swan events, which is clearly not the case. Just think of booms and busts. The efficient and rational market model, and ultimately the Black-Scholes equation, assumes that fluctuations in the market are governed by Gaussian random statistics. However, there are other types of statistics that are just as well behaved as the Gaussian, but which admit Black Swans.

Stable Distributions: Black Swans are the Norm

When Paul Lévy (1886 – 1971) was asked in 1919 to give three lectures on random variables at the École Polytechnique, the mathematical theory of probability was just a loose collection of principles and proofs. What emerged from those lectures was a lifetime of study in a field that now has grown to become one of the main branches of mathematics. He had a distinguished and productive career, although he struggled to navigate the anti-semitism of Vichy France during WWII. His thesis advisor was the famous Jacques Hadamard and one of his students was the famous Benoit Mandelbrot.

Lévy wrote several influential textbooks that established the foundations of probability theory, and his name has become nearly synonymous with the field. One of his books was on the theory of the addition of random variables [2] in which he extended the idea of a stable distribution.

In probability theory, a class of distributions is called stable if the sum of two independent random variables drawn from a distribution in the class has the same distribution, up to a shift and a rescaling. The normal (Gaussian) distribution clearly has this property because the sum of two normally distributed independent variables is also normally distributed. The variance and possibly the mean may be different, but the functional form is still Gaussian.

The general form of a probability distribution can be obtained by taking a Fourier transform as

$$P(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(k)\, e^{-ikx}\, dk$$

where φ is known as the characteristic function of the probability distribution. A special case of a stable distribution is the Lévy symmetric stable distribution obtained as

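$$P_{\alpha,\gamma}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left(-ikx - |\gamma k|^{\alpha}\right) dk$$

which is parameterized by α and γ. The characteristic function in this case is called a stretched exponential with the length scale set by the parameter γ.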
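The Fourier integral can be evaluated numerically to see how α interpolates between the Cauchy and Gaussian cases. The sketch below assumes the stretched-exponential characteristic function has the form exp(-|γk|^α) and uses a plain trapezoid rule. The function name, the cutoff k_max and the step count n are illustrative choices.

```python
import math

def levy_stable_pdf(x, alpha, gamma=1.0, k_max=40.0, n=20000):
    """Symmetric Levy stable density from its characteristic function.

    Evaluates (1/pi) * integral_0^inf cos(k x) exp(-(gamma k)^alpha) dk,
    truncated at k = k_max / gamma, with the trapezoid rule.
    """
    dk = k_max / (gamma * n)
    total = 0.5  # integrand equals 1 at k = 0; end points carry weight 1/2
    for i in range(1, n):
        k = i * dk
        total += math.cos(k * x) * math.exp(-((gamma * k) ** alpha))
    k_end = n * dk
    total += 0.5 * math.cos(k_end * x) * math.exp(-((gamma * k_end) ** alpha))
    return total * dk / math.pi

# alpha = 1 should match the Cauchy form, alpha = 2 a Gaussian of variance 2.
print(levy_stable_pdf(1.0, 1.0), 1.0 / (2.0 * math.pi))
print(levy_stable_pdf(0.0, 2.0), 1.0 / (2.0 * math.sqrt(math.pi)))
# Far from the origin, alpha = 1.5 still carries weight where the Gaussian has none.
print(levy_stable_pdf(10.0, 1.5), levy_stable_pdf(10.0, 2.0))
```

The last line shows the heavy tail directly: at ten length scales from the center the α = 1.5 density is still of order 10^{-3}, while the Gaussian value is numerically indistinguishable from zero.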

The most important feature of the Lévy distribution is that it has a power-law tail at large values. For instance, the special case of the Lévy distribution for α = 1 is the Cauchy distribution for positive values x given by

$$P_{1,\gamma}(x) = \frac{1}{\pi} \frac{\gamma}{\gamma^{2} + x^{2}}$$

which falls off at large values as x^{-(α+1)}, that is, as x^{-2}. The Cauchy distribution is normalizable (probabilities integrate to unity) and has a characteristic scale set by γ, but its mean value is undefined, so the central limit theorem does not apply [3]. For distributions that satisfy the central limit theorem, increasing the number of samples from the distribution allows the sample mean to converge on a finite value. For the Cauchy distribution, however, increasing the number of samples increases the chances of obtaining a black swan that skews the mean, so the sample mean never settles down, even in the limit of an infinite number of samples. This is why the Cauchy distribution is said to have a “heavy tail” that contains rare, but large-amplitude, outlier events that keep shifting the mean.
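A small numerical experiment makes the contrast visible. The sketch below repeatedly draws N samples and records how far the sample mean typically lands from zero, once for a unit Gaussian and once for a Cauchy distribution with γ = 1. The helper names, the seed and the trial counts are arbitrary choices for illustration.

```python
import math
import random
import statistics

def cauchy_sample(gamma, rng):
    """Draw one Cauchy variate by inverting its cumulative distribution."""
    return gamma * math.tan(math.pi * (rng.random() - 0.5))

def typical_mean_error(draw, n_samples, n_trials, rng):
    """Median absolute value of the sample mean over repeated trials."""
    errors = []
    for _ in range(n_trials):
        mean = sum(draw(rng) for _ in range(n_samples)) / n_samples
        errors.append(abs(mean))
    return statistics.median(errors)

rng = random.Random(1)
for n in (10, 100, 1000):
    gauss_err = typical_mean_error(lambda r: r.gauss(0.0, 1.0), n, 200, rng)
    cauchy_err = typical_mean_error(lambda r: cauchy_sample(1.0, r), n, 200, rng)
    print(n, round(gauss_err, 3), round(cauchy_err, 3))
```

The Gaussian column shrinks roughly as 1/√N, while the Cauchy column stays of order γ no matter how many samples are averaged: more data does not buy a better mean.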

Examples of Lévy stable probability distribution functions are shown below for a range between α = 1 (Cauchy) and α = 2 (Gaussian). The heavy tail is seen even for the case α = 1.99 very close to the Gaussian distribution. Examples of two-dimensional Lévy walks are shown in the figure for α = 1, α = 1.4 and α = 2. In the case of the Gaussian distribution, the mean-squared displacement is well behaved and finite. However, for all the other cases, the mean-squared displacement is divergent, caused by the large path lengths that become more probable as α approaches unity.

The surprising point of the Lévy probability distribution functions is how common they are in natural phenomena. Heavy Lévy tails arise commonly in almost any process that has scale invariance. Yet as students, we are virtually shielded from them, as if Poisson and Gaussian statistics are all we need to know, but ignorance is not bliss. The assumption of Gaussian statistics is what sank Black-Scholes.

Scale-invariant processes are often consequences of natural cascades of mass or energy and hence arise as neutral phenomena. Yet there are biased phenomena in which a Lévy process can lead to a form of optimization. This is the case for Lévy random walks in biological contexts.

Lévy Walks

The random walk is one of the cornerstones of statistical physics and forms the foundation for Brownian motion which has a long and rich history in physics. Einstein used Brownian motion to derive his famous statistical mechanics equation for diffusion, proving the existence of molecular matter. Jean Perrin won the Nobel prize for his experimental demonstrations of Einstein’s theory. Paul Langevin used Brownian motion to introduce stochastic differential equations into statistical physics. And Lévy used Brownian motion to illustrate applications of mathematical probability theory, writing his last influential book on the topic.

Most treatments of the random walk assume Gaussian or Poisson statistics for the step length or rate, but a special form of random walk emerges when the step length is drawn from a Lévy distribution. This is a Lévy random walk, also named a “Lévy Flight” by Benoit Mandelbrot (Lévy’s student) who studied its fractal character.
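One way to generate such a walk is sketched below. It draws symmetric stable variates with the Chambers-Mallows-Stuck method (a standard sampling recipe, not something taken from the text above), uses the absolute value as the step length, and picks the step direction uniformly in angle. The function names and the choice of an isotropic two-dimensional walk are assumptions for illustration.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """One symmetric alpha-stable variate of unit scale (Chambers-Mallows-Stuck)."""
    v = math.pi * (rng.random() - 0.5)   # uniform angle on [-pi/2, pi/2)
    w = rng.expovariate(1.0)             # exponential variate with unit mean
    if alpha == 1.0:
        return math.tan(v)               # Cauchy special case
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

def levy_flight(n_steps, alpha, rng):
    """2-D walk with stable-distributed step lengths and random directions."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        length = abs(symmetric_stable(alpha, rng))
        angle = 2.0 * math.pi * rng.random()
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        path.append((x, y))
    return path

rng = random.Random(0)
for alpha in (1.0, 1.4, 2.0):
    path = levy_flight(1000, alpha, rng)
    longest = max(math.hypot(x1 - x0, y1 - y0)
                  for (x0, y0), (x1, y1) in zip(path, path[1:]))
    print(alpha, round(longest, 1))
```

For α = 2 the longest of a thousand steps is only a few units, whereas for α = 1 a single step can dwarf all the others combined, which is what gives the trajectories their clustered, fractal look.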

Originally, Lévy walks were studied as ideal mathematical models, but there have been a number of discoveries in recent years in which Lévy walks have been observed in the foraging behavior of animals, even in the run-and-tumble behavior of bacteria, in which rare long-distance runs are followed by many local tumbling excursions. It has been surmised that this foraging strategy allows an animal to optimally sample randomly-distributed food sources. There is evidence of Lévy walks of molecules in intracellular transport, which may arise from random motions within the crowded intracellular neighborhood. A middle ground has also been observed [4] in which intracellular organelles and vesicles may take on a Lévy walk character as they attach, migrate, and detach from molecular motors that drive them along the cytoskeleton.

By David D. Nolte, Feb. 8, 2023

Selected Bibliography

Paul Lévy, Calcul des probabilités (Gauthier-Villars, Paris, 1925).

Paul Lévy, Théorie de l’addition des variables aléatoires (Gauthier-Villars, Paris, 1937).

Paul Lévy, Processus stochastique et mouvement brownien (Gauthier-Villars, Paris, 1948).

R. Metzler, J. Klafter, The random walk’s guide to anomalous diffusion: a fractional dynamics approach. Physics Reports 339, 1-77 (2000).

J. Klafter, I. M. Sokolov, First Steps in Random Walks: From Tools to Applications. (Oxford University Press, 2011).

F. Hoefling, T. Franosch, Anomalous transport in the crowded world of biological cells. Reports on Progress in Physics 76 (2013).

V. Zaburdaev, S. Denisov, J. Klafter, Lévy walks. Reviews of Modern Physics 87, 483-530 (2015).

References

[1] Black, Fischer; Scholes, Myron (1973). “The Pricing of Options and Corporate Liabilities”. Journal of Political Economy. 81 (3): 637–654.

[2] P. Lévy, Théorie de l’addition des variables aléatoires (1937)

[3] The central limit theorem holds if the mean value of a number of N samples converges to a stable value as the number of samples increases to infinity.

Physics forged ahead in 2022, making a wide range of advances. From a telescope far out in space to a telescope that spans the size of the Earth, from solid state physics and quantum computing at ultra-low temperatures to particle and nuclear physics at ultra-high energies, the year saw a number of firsts. Here’s a list of eight discoveries of 2022 that define the frontiers of physics.

James Webb Space Telescope

“First Light” has two meanings: the “First Light” that originated at the beginning of the universe, and the “First Light” that is collected by a new telescope. In the beginning of this year, the James Webb Space Telescope (JWST) saw both types of first light, and with it came first surprises.

The JWST has found that galaxies are too well formed too early in the universe relative to current models of galaxy formation. Almost as soon as the JWST began forming images, it acquired evidence of massive galaxies from when the universe was only a few hundred million years old. Existing theories of galaxy formation did not predict such large galaxies so soon after the Big Bang.

Another surprise came from images of the Southern Ring Nebula. While the Hubble did not find anything unusual about this planetary nebula, the JWST found cold dust surrounding the white dwarf that remained after the dying star expelled its outer layers. This dust was not supposed to be there, but it may be coming from a third member of the intra-nebular environment. In addition, the ring-shaped nebula contained masses of swirling streams and ripples that are challenging astrophysicists who study stellar death and nebula formation to refine their current models.

Quantum Machine Learning

Machine learning—the training of computers to identify and manipulate complicated patterns within massive data—has been on a roll in recent years, ever since efficient training algorithms were developed in the early 2000’s for large multilayer neural networks. Classical machine learning can take billions of bits of data and condense it down to understandable information in a matter of minutes. However, there are types of problems that even conventional machine learning might take the age of the universe to calculate, for instance calculating the properties of quantum systems based on a set of quantum measurements of the system.

In June of 2022, researchers at Caltech and Google announced that a quantum computer—Google’s Sycamore quantum computer—could calculate properties of quantum systems using exponentially fewer measurements than would be required to perform the same task using conventional computers. Quantum machine learning uses the resource of quantum entanglement that is not available to conventional machine learning, enabling new types of algorithms that can exponentially speed up calculations of quantum systems. It may come as no surprise that quantum computers are ideally suited to making calculations of quantum systems.

A Possible Heavy W Boson

High-energy particle physics has been in a crisis ever since 2012, when it reached the pinnacle of a dogged half-century search for the fundamental constituents of the universe. The Higgs boson was the crowning achievement, and was supposed to be the vanguard of a new frontier of physics uncovered by CERN. But little new physics has emerged, even though fundamental physics is in dire need of new results. For instance, dark matter and dark energy remain unsolved mysteries despite making up the vast majority of all there is. Therefore, when physicists at Fermilab announced that the W boson, a particle that carries the nuclear weak interaction, was heavier than predicted by the Standard Model, some physicists heaved sighs of relief. The excess mass could signal higher-energy contributions that might lead to new particles or interactions … if the excess weight holds up under continued scrutiny.

Imaging the Black Hole at the Center of the Milky Way

Imagine building a telescope the size of the Earth. What could it see?

If it detected in the optical regime, it could see a baseball on the surface of the Moon. If it detected at microwave frequencies, then it could see the material swirling around distant black holes. This is what the Event Horizon Telescope (EHT) can do. In 2019, it revealed the first image of a black hole: the super-massive black hole at the core of the M87 galaxy 53 million light years away. The EHT team accomplished this Herculean feat by combining the signals of microwave telescopes from across the globe interferometrically to create an effective telescope aperture that was the size of the Earth.

The next obvious candidate was the black hole at the center of our own galaxy, the Milky Way. Even though our own black hole is much smaller than the one in M87, ours is much closer, and both subtend about the same solid angle. The challenge was observing it through the swirling stars and dust at the core of our galaxy. In May of this year, the EHT unveiled the first image of our own black hole, showing the radiation emitted by the in-falling material.
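The numbers behind these statements can be checked with a back-of-the-envelope calculation. The sketch below uses the Rayleigh criterion for an Earth-sized aperture and the shadow diameter of a non-rotating black hole, 2√27 GM/c². The observing wavelength (1.3 mm), the two black hole masses and the distance to the Galactic center are commonly quoted round values assumed here for illustration. Only the 53 million light years to M87 comes from the text.

```python
import math

RAD_TO_MICROARCSEC = math.degrees(1.0) * 3600.0 * 1.0e6
EARTH_DIAMETER_M = 1.2742e7
LIGHT_YEAR_M = 9.4607e15
SUN_GRAV_RADIUS_M = 1476.6   # G * M_sun / c^2 in meters

def diffraction_limit(wavelength_m, aperture_m):
    """Rayleigh angular resolution in radians."""
    return 1.22 * wavelength_m / aperture_m

def shadow_angle(mass_suns, distance_ly):
    """Angular diameter (radians) of the shadow of a non-rotating black hole."""
    diameter_m = 2.0 * math.sqrt(27.0) * SUN_GRAV_RADIUS_M * mass_suns
    return diameter_m / (distance_ly * LIGHT_YEAR_M)

resolution = diffraction_limit(1.3e-3, EARTH_DIAMETER_M) * RAD_TO_MICROARCSEC
m87 = shadow_angle(6.5e9, 53.0e6) * RAD_TO_MICROARCSEC       # assumed mass
milky_way = shadow_angle(4.3e6, 27.0e3) * RAD_TO_MICROARCSEC  # assumed mass, distance

print(round(resolution), round(m87), round(milky_way))  # all in microarcseconds
```

With these inputs the two shadows come out within a few tens of percent of each other, and both are only about twice the diffraction limit of the Earth-sized aperture, which is why the images show little more than a ring.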

Tetraneutrons

Nuclear physics is a venerable part of modern physics that harkens back to the days of Bohr and Rutherford and the beginning of quantum physics, but in recent years it has yielded few new surprises (except at the RHIC collider which smashes heavy nuclei against each other to create quark-gluon plasma). That changed in June of 2022, when researchers in Germany announced the successful measurement of a tetraneutron–a cluster of four neutrons bound transiently together by the strong nuclear force.

Neutrons are the super-glue that holds together the nucleons in standard nuclei. The force is immense, strong enough to counteract the Coulomb repulsion of protons in a nucleus. For instance, Uranium 238 has 92 protons crammed within a volume of about 10 femtometer radius. It takes 146 neutrons to bind these together without flying apart. But neutrons don’t tend to bind to themselves, except in “resonance” states that decay rapidly. In 2012, a dineutron (two neutrons bound in a transient resonance state) was observed, but four neutrons were expected to produce an even more transient resonance (a three-neutron state is not allowed). When the German group created the tetraneutron, it had a lifetime of only about 1×10^{-21} seconds, so it is extremely ephemeral. Nonetheless, studying the properties of the tetraneutron may give insights into both the strong and weak nuclear forces.

Hi-Tc superconductivity

When Bednorz and Müller discovered Hi-Tc superconductivity in 1986, it set off both a boom and a crisis. The boom was the opportunity to raise the critical temperature of superconductivity from 23 K that had been the world record held by Nb_{3}Ge for 13 years since it was set in 1973. The crisis was that the new Hi-Tc materials violated the established theory of superconductivity explained by Bardeen-Cooper-Schrieffer (BCS). There was almost nothing in the theory of solid state physics that could explain how such high critical temperatures could be attained. At the March Meeting of the APS the following year in 1987, the session on the new Hi-Tc materials and possible new theories became known as the Woodstock of Physics, where physicists camped out in the hallway straining their ears to hear the latest ideas on the subject.

One of the ideas put forward at the session was the idea of superexchange by Phil Anderson. The superexchange of two electrons is related to their ability to hop from one lattice site to another. If the hops are coordinated, then there can be an overall reduction in their energy, creating a ground state of long-range coordinated electron hopping that could support superconductivity. Anderson was perhaps the physicist best situated to suggest this theory because of his close familiarity with what was, even then, known as the Anderson Hamiltonian that explicitly describes the role of hopping in solid-state many-body phenomena.

Ever since, the idea of superexchange has been floating around the field of Hi-Tc superconductivity, but no one had been able to pin it down conclusively, until now. In a paper published in PNAS in September of 2022, an experimental group at Oxford presented direct observations of the spatial density of Cooper pairs in relation to the spatial hopping rates: where hopping was easiest, the Cooper pair density was highest, and vice versa. This experiment provides almost indisputable evidence in favor of Anderson’s superexchange mechanism for Cooper pair formation in the Hi-Tc materials, laying to rest the crisis launched 36 years ago.

Holographic Wormhole

The holographic principle of cosmology proposes that our three-dimensional physical reality (stars, galaxies, expanding universe) is like the projection of information encoded on a two-dimensional boundary, just as a two-dimensional optical hologram can be illuminated to recreate a three-dimensional visual representation. This 2D to 3D projection was first proposed by Gerard ’t Hooft, inspired by the black hole information paradox in which the entropy of a black hole scales as the surface area of the black hole instead of its volume. The holographic principle was expanded by Leonard Susskind in 1995 based on string theory and is one path to reconciling quantum physics with the physics of gravitation in a theory of quantum gravity, one of the Holy Grails of physics.

While it is an elegant cosmic idea, the holographic principle could not be viewed as anything down to Earth, until now. In November 2022 a research group at Caltech published a paper in Nature describing how they used Google’s Sycamore quantum computer (housed at UC Santa Barbara) to manipulate a set of qubits into creating a laboratory-based analog of an Einstein-Rosen bridge, also known as a “wormhole”, through spacetime. The ability to use quantum information states to simulate a highly-warped spacetime analog provides the first experimental evidence for the validity of the cosmological holographic principle. Although the simulation did not produce a physical wormhole in our spacetime, it showed how quantum information and differential geometry (the mathematics of general relativity) can be connected.

One of the most important consequences of this work is the proposal that ER = EPR (Einstein-Rosen = Einstein-Podolsky-Rosen). The EPR paradox of quantum entanglement has long been viewed as a fundamental paradox of physics that requires instantaneous non-local correlations among quantum particles that can be arbitrarily far apart. Although EPR violates local realism, it is a valuable real-world resource for quantum teleportation. By demonstrating the holographic wormhole, the recent Caltech results show how quantum teleportation and gravitational wormholes may arise from the same physics.

Net-Positive-Energy from Nuclear Fusion

Ever since nuclear fission was harnessed to generate energy, the idea of tapping the even greater potential of nuclear fusion to power the world has been a dream of nuclear physicists. Nuclear fusion energy would be clean and green and could help us avoid the long-run disaster of global warming. However, achieving that dream has been surprisingly frustrating. While nuclear fission was harnessed for energy (and weapons) within only a few years of discovery, and a fusion “boost” was added to nuclear destructive power in the so-called hydrogen bomb, sustained energy production from fusion has remained elusive.

In December of 2022, the National Ignition Facility (NIF) focussed the power of 192 pulsed lasers onto a deuterium-tritium pellet, causing it to implode, and the nuclei to fuse, releasing about 50% more energy than it absorbed. This was the first time that controlled fusion released net positive energy (about 3 million Joules out from 2 million Joules in), enough energy to boil about 3 liters of water. This accomplishment represents a major milestone in the history of physics and could one day provide useful energy. The annual budget of the NIF is about 300 million dollars, so there is a long road ahead (probably several more decades) before this energy source can be scaled down to an economical level.
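The scale of the result is easy to check with a little arithmetic. The sketch below reads “boil” as heating water from room temperature (20 °C, an assumed starting point) to 100 °C without vaporizing it, and uses the round energies quoted above.

```python
SPECIFIC_HEAT_WATER = 4186.0   # joules per kilogram per kelvin

def liters_heated_to_boiling(energy_j, start_celsius=20.0):
    """Liters of water (1 kg each) raised from start_celsius to 100 C."""
    return energy_j / (SPECIFIC_HEAT_WATER * (100.0 - start_celsius))

energy_in = 2.0e6    # joules delivered by the lasers
energy_out = 3.0e6   # joules released by fusion
gain = energy_out / energy_in
net = energy_out - energy_in

print(gain)                                      # 1.5
print(round(liters_heated_to_boiling(net), 1))   # 3.0
```

Under these assumptions it is the net gain of one million Joules that brings about 3 liters of water to the boiling point. The full 3 million Joules would heat roughly three times as much.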

Physical reality is nothing but a bunch of spikes and pulses, or glitches. Take any smooth phenomenon, no matter how benign it might seem, and decompose it into an infinitely dense array of infinitesimally transient, infinitely high glitches. Then the sum of all glitches, weighted appropriately, becomes the phenomenon. This might be called the “glitch” function, but it is better known as Green’s function in honor of the ex-millwright George Green who taught himself mathematics at night to become one of England’s leading mathematicians of the age.

The δ function is thus merely a convenient notation … we perform operations on the abstract symbols, such as differentiation and integration …

PAM Dirac (1930)

The mathematics behind the “glitch” has a long history that began in the golden era of French analysis with the mathematicians Cauchy and Fourier, was employed by the electrical engineer Heaviside, and ultimately fell into the fertile hands of the quantum physicist, Paul Dirac, after whom it is named.

Augustin-Louis Cauchy (1815)

The French mathematician and physicist Augustin-Louis Cauchy (1789 – 1857) has lent his name to a wide array of theorems, proofs and laws that are still in use today. In mathematics, he was one of the first to establish “modern” functional analysis and especially complex analysis. In physics he established a rigorous foundation for elasticity theory (including the elastic properties of the so-called luminiferous ether).

In the early days of the 1800’s Cauchy was exploring how integrals could be used to define properties of functions. In modern terminology we would say that he was defining kernel integrals, where a function is integrated over a kernel to yield some property of the function.

In 1815 Cauchy read before the Academy of Paris a paper with the long title “Theory of wave propagation on a surface of a fluid of indefinite weight”. The paper was not published until more than ten years later in 1827 by which time it had expanded to 300 pages and contained numerous footnotes. The thirteenth such footnote was titled “On definite integrals and the principal values of indefinite integrals” and it contained one of the first examples of what would later become known as a generalized distribution. The integral is a function F(μ) integrated over a kernel

Cauchy lets the scale parameter α be “an infinitely small number”. The kernel is thus essentially zero for any values of μ “not too close to α”. Today, we would call the kernel given by

in the limit that α vanishes, “the delta function”.

Cauchy’s approach to the delta function is today one of the most commonly used descriptions of what a delta function is. It is not enough to simply say that a delta function is an infinitely narrow, infinitely high function whose integral is equal to unity. It helps to illustrate the behavior of the Cauchy function as α gets progressively smaller, as shown in Fig. 1.

In the limit as α approaches zero, the function grows progressively higher and progressively narrower, but the integral over the function remains unity.
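This limiting behavior can be verified numerically. The sketch below assumes the Cauchy kernel has the unit-area Lorentzian form α/[π(α² + x²)] and integrates it against a smooth test function. The Gaussian test function, the evaluation point a = 0.5 and the function names are illustrative choices.

```python
import math

def cauchy_kernel(x, alpha):
    """Lorentzian of width alpha normalized to unit area."""
    return alpha / (math.pi * (alpha**2 + x**2))

def sift(f, a, alpha, n=100000):
    """Integral of f(x) * cauchy_kernel(x - a, alpha) over the real line.

    The substitution x = a + alpha * tan(theta) turns the kernel into the
    constant 1/pi on (-pi/2, pi/2), so a midpoint rule is sufficient.
    """
    d_theta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -0.5 * math.pi + (i + 0.5) * d_theta
        total += f(a + alpha * math.tan(theta))
    return total * d_theta / math.pi

def bump(x):
    """Smooth test function."""
    return math.exp(-x * x)

for alpha in (1.0, 0.1, 0.01):
    height = cauchy_kernel(0.0, alpha)            # peak grows as 1 / (pi alpha)
    area = sift(lambda x: 1.0, 0.0, alpha)        # stays at unity
    picked = sift(bump, 0.5, alpha)               # approaches bump(0.5)
    print(alpha, round(height, 2), round(area, 6), round(picked, 4))
print(round(bump(0.5), 4))
```

As α shrinks, the peak height diverges, the area stays fixed, and the integral against the test function converges on the value of that function at the location of the spike, which is the defining property of the delta function.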

Joseph Fourier (1822)

The delayed publication of Cauchy’s memoir kept it out of common knowledge, so it can be excused if Joseph Fourier (1768 – 1830) may not have known of it by the time he published his monumental work on heat in 1822. Perhaps this is why Fourier’s approach to the delta function was also different from Cauchy’s.

Fourier noted that an integral over a sinusoidal function, as the argument of the sinusoidal function went to infinity, became independent of the limits of integration. He showed

when ε << 1/p as p went to infinity. In modern notation, this would be the delta function defined through the “sinc” function

$$\delta(x-\alpha) = \lim_{p\to\infty} \frac{\sin[p(x-\alpha)]}{\pi (x-\alpha)}$$

and Fourier noted that integrating this form over another function f(x) yielded the value of the function f(α) evaluated at α, rediscovering the results of Cauchy, but using a sinc(x) function in Fig. 2 instead of the Cauchy function of Fig. 1.
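The same sifting test can be applied to the sinc form. The sketch below assumes the kernel sin[p(x-α)]/[π(x-α)] and integrates it against a Gaussian test function with the trapezoid rule. The test function, the evaluation point 0.5 and the function names are illustrative choices.

```python
import math

def sinc_kernel(u, p):
    """sin(p u) / (pi u), with the limiting value p / pi at u = 0."""
    if abs(u) < 1.0e-12:
        return p / math.pi
    return math.sin(p * u) / (math.pi * u)

def sift_sinc(f, a, p, half_width=10.0, n=40000):
    """Trapezoid-rule integral of f(x) * sinc_kernel(x - a, p) on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * sinc_kernel(x - a, p)
    return total * h

def bump(x):
    """Smooth test function that is negligible at the integration limits."""
    return math.exp(-x * x)

for p in (1.0, 5.0, 20.0):
    print(p, round(sift_sinc(bump, 0.5, p), 6))
print(round(bump(0.5), 6))
```

Unlike the Cauchy kernel, the sinc kernel oscillates and takes negative values, yet the integral still converges on the value of the function at the chosen point, and for a smooth function it does so very quickly as p grows.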

George Green’s Function (1829)

A history of the delta function cannot be complete without mention of George Green, one of the most remarkable British mathematicians of the 1800’s. He was a miller’s son who had only one year of education and spent most of his early life tending to his father’s mill. In his spare time, and to cut the tedium of his work, he read the most up-to-date work of the French mathematicians, reading the papers of Cauchy and Poisson and Fourier, whose work far surpassed the British work at that time. Unbelievably, he mastered the material and developed new material of his own, which he eventually self-published. This is the mathematical work that introduced the potential function and introduced fundamental solutions to unit sources (what today would be called point charges or delta functions). These fundamental solutions are equivalent to the modern Green’s function, although they were developed rigorously much later by Courant and Hilbert and by Kirchhoff.

The modern idea of a Green’s function is simply the system response to a unit impulse—like throwing a pebble into a pond to launch expanding ripples or striking a bell. To obtain the solutions for a general impulse, one integrates over the fundamental solutions weighted by the strength of the impulse. If the system response to a delta function impulse at x = a, that is, a delta function δ(x-a), is G(x-a), then the response of the system to a distributed force f(x) is given by

$$\int f(a)\, G(x-a)\, da$$

where G(x-a) is called the Green’s function.
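A concrete case shows the superposition at work. The sketch below picks, purely for illustration, the equation -y'' + y = f(x) on the whole real line, whose impulse response is exp(-|x-a|)/2, and builds the response to a cosine driving force by integrating over weighted impulse responses. The choice of equation and the function names are assumptions, not taken from the text.

```python
import math

def green(u):
    """Impulse response of -y'' + y = f on the real line: exp(-|u|) / 2."""
    return 0.5 * math.exp(-abs(u))

def response(f, x, half_width=40.0, n=80000):
    """Trapezoid-rule estimate of the integral of f(a) * green(x - a) over a."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        u = -half_width + i * h          # u = x - a
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x - u) * green(u)
    return total * h

# For f(x) = cos(x) the exact solution is cos(x) / 2.
for x in (0.0, 0.3, 2.0):
    print(x, round(response(math.cos, x), 6), round(0.5 * math.cos(x), 6))
```

The two columns agree, so the response to the distributed force really is the sum of the responses to the individual glitches that make it up.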

Oliver Heaviside (1893)

Oliver Heaviside (1850 – 1925) tended to follow his own path, independently of whatever the mathematicians were doing. Heaviside took particularly pragmatic approaches based on physical phenomena and how they might behave in an experiment. This is the context in which he introduced once again the delta function, unaware of the work of Cauchy or Fourier.

Heaviside was an engineer at heart who practiced his art by doing. He was not concerned with rigor, only with what works. This part of his personality may have been forged by his apprenticeship in telegraph technology helped by his uncle Charles Wheatstone (of the Wheatstone bridge). While still a young man, Heaviside tried to tackle Maxwell on his new treatise on electricity and magnetism, but he realized his mathematics were lacking, so he began a project of self education that took several years. The product of those years was his development of an idiosyncratic approach to electronics that may be best described as operator algebra. His algebra contained mis-behaved functions, such as the step function that was later named after him. It could also handle the derivative of the step function, which is yet another way of defining the delta function, though certainly not to the satisfaction of any rigorous mathematician—but it worked. The operator theory could even handle the derivative of the delta function.

Perhaps the most important influence by Heaviside was his connection of the delta function to Fourier integrals. He was one of the first to show that

$$\delta(x-a) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{ik(x-a)}\, dk$$

which states that the Fourier transform of a delta function is a complex sinusoid, and the Fourier transform of a sinusoid is a delta function. Heaviside wrote several influential textbooks on his methods, and by the 1920’s these methods, including the Heaviside function and its derivative, had become standard parts of the engineer’s mathematical toolbox.
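The discrete analog of this statement is easy to check. The sketch below uses a direct discrete Fourier transform (a stand-in for the continuous integral, with an arbitrary length of 16 samples) to transform a unit impulse and a complex sinusoid.

```python
import cmath

def dft(samples):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

n, m = 16, 3

# A unit impulse at position m transforms into a complex sinusoid of unit magnitude.
impulse = [1.0 if t == m else 0.0 for t in range(n)]
spectrum = dft(impulse)
print([round(abs(c), 6) for c in spectrum])

# A complex sinusoid of frequency index m transforms into a single spike.
tone = [cmath.exp(2j * cmath.pi * m * t / n) for t in range(n)]
peak = dft(tone)
print([round(abs(c), 6) for c in peak])
```

The first list is all ones (every frequency is present with equal weight), and the second is zero everywhere except for a single entry of size 16 at index 3.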

Given the work by Cauchy, Fourier, Green and Heaviside, what was left for Paul Dirac to do?

Paul Dirac (1930)

Paul Dirac (1902 – 1984) was given the moniker “The Strangest Man” by Niels Bohr during his visit to Copenhagen shortly after he had received his PhD. In part, this was because of Dirac’s internal intensity that could make him seem disconnected from those around him. When he was working on a problem in his head, it was not unusual for him to start walking, and by the time he became aware of his surroundings again, he would have walked the length of the city of Copenhagen. And his solutions to problems were ingenious, breaking bold new ground where others, some of whom were geniuses themselves, were fumbling in the dark.

Among his many influential works—works that changed how physicists thought of and wrote about quantum systems—was his 1930 textbook on quantum mechanics. This was more than just a textbook, because it invented new methods by unifying the wave mechanics of Schrödinger with the matrix mechanics of Born and Heisenberg.

In particular, there had been a disconnect between bound electron states in a potential and free electron states scattering off of the potential. In the one case the states have a discrete spectrum, i.e. quantized, while in the other case the states have a continuous spectrum. There were standard quantum tools for decomposing discrete states by a projection onto eigenstates in Hilbert space, but an entirely different set of tools for handling the scattering states.

Yet Dirac saw a commonality between the two approaches. Specifically, eigenstate decomposition on the one hand used discrete sums of states, while scattering solutions on the other hand used integration over a continuum of states. In the first format, orthogonality was denoted by a Kronecker delta notation, but there was no equivalent in the continuum case—until Dirac introduced the delta function as a kernel in the integrand. In this way, the form of the equations with sums over states multiplied by Kronecker deltas took on the same form as integrals over states multiplied by the delta function.
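The parallel can be written out side by side. The notation below (bra-ket states, expansion coefficients c_n and c(k), and a continuous label k) is assumed for illustration and is not meant to reproduce Dirac's original typography.

```latex
\begin{aligned}
\text{discrete:}\quad
  & \langle m | n \rangle = \delta_{mn},
  & |\psi\rangle &= \sum_{n} c_{n}\, |n\rangle,
  & c_{m} &= \sum_{n} c_{n}\, \delta_{mn} \\
\text{continuous:}\quad
  & \langle k' | k \rangle = \delta(k - k'),
  & |\psi\rangle &= \int c(k)\, |k\rangle\, dk,
  & c(k') &= \int c(k)\, \delta(k - k')\, dk
\end{aligned}
```

In both rows the delta symbol does the same job: it picks out a single coefficient from the expansion, by a sum in one case and by an integral in the other.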

In addition to introducing the delta function into the quantum formulas, Dirac also explored many of the properties and rules of the delta function. He was aware that the delta function was not a “proper” function, but by beginning with a simple integral property as a starting axiom, he could derive virtually all of the extended properties of the delta function, including properties of its derivatives.

Mathematicians, of course, were appalled and were quick to point out the insufficiency of the mathematical foundation for Dirac’s delta function, until the French mathematician Laurent Schwartz (1915 – 2002) developed the general theory of distributions in the 1940’s, which finally put the delta function in good standing.

Dirac’s introduction, development and use of the delta function was the first systematic definition of its properties. The earlier work by Cauchy, Fourier, Green and Heaviside had all touched upon the behavior of such “spiked” functions, but they had used it in passing. After Dirac, physicists embraced it as a powerful new tool in their toolbox, despite the lag in its formal acceptance by mathematicians, until the work of Schwartz redeemed it.

By David D. Nolte Feb. 17, 2022

Bibliography

V. Balakrishnan, “All about the Dirac Delta function(?)”, Resonance, Aug., pg. 48 (2003)

M. G. Katz. “Who Invented Dirac’s Delta Function?”, Semantic Scholar (2010).

J. Lützen, The prehistory of the theory of distributions. Studies in the history of mathematics and physical sciences; 7 (Springer-Verlag, New York, 1982).

Despite the many apparent paradoxes posed in physics (the twin and ladder paradoxes of relativity theory, Olbers’ paradox of the bright night sky, Loschmidt’s paradox of irreversible statistical fluctuations), these are resolved by a deeper look at the underlying assumptions: the twin paradox is resolved by considering shifts in reference frames, the ladder paradox is resolved by the loss of simultaneity, Olbers’ paradox is resolved by a finite age to the universe, and Loschmidt’s paradox is resolved by fluctuation theorems. In each case, no physical principle is violated, and each paradox is fully explained.

However, there is at least one “true” paradox in physics that defies consistent explanation—quantum entanglement. Quantum entanglement was first described by Einstein with colleagues Podolsky and Rosen in the famous EPR paper of 1935 as an argument against the completeness of quantum mechanics, and it was given its name by Schrödinger the same year in the paper where he introduced his “cat” as a burlesque consequence of entanglement.

Here is a short history of quantum entanglement [1], from its beginnings in 1935 to the recent 2022 Nobel prize in Physics awarded to John Clauser, Alain Aspect and Anton Zeilinger.

The EPR Papers of 1935

Einstein can be considered as the father of quantum mechanics, even over Planck, because of his 1905 derivation of the existence of the photon as a discrete carrier of a quantum of energy (see Einstein versus Planck). Even so, as Heisenberg and Bohr advanced quantum mechanics in the mid 1920’s, emphasizing the underlying non-deterministic outcomes of measurements, and in particular the notion of instantaneous wavefunction collapse, they pushed the theory in directions that Einstein found increasingly disturbing and unacceptable.

This feature is an excerpt from an upcoming book, Interference: The History of Optical Interferometry and the Scientists Who Tamed Light (Oxford University Press, July 2023), by David D. Nolte.

At the invitation-only Solvay Congresses of 1927 and 1930, where all the top physicists met to debate the latest advances, Einstein and Bohr began a running debate that was epic in the history of physics as the two top minds went head-to-head as the onlookers looked on in awe. Ultimately, Einstein was on the losing end. Although he was convinced that something was missing in quantum theory, he could not counter all of Bohr’s rejoinders, even as Einstein’s assaults became ever more sophisticated, and he left the field of battle beaten but not convinced. Several years later he launched his last and ultimate salvo.

At the Institute for Advanced Study in Princeton, New Jersey, in the 1930’s Einstein was working with Nathan Rosen and Boris Podolsky when he envisioned a fundamental paradox in quantum theory that occurred when two widely-separated quantum particles were required to share specific physical properties because of simple conservation laws such as those of energy and momentum. Even Bohr and Heisenberg could not deny the principle of conservation of energy and momentum, and Einstein devised a two-particle system for which these conservation principles led to an apparent violation of Heisenberg’s own uncertainty principle. He left the details to his colleagues, with Podolsky writing up the main arguments. They submitted the paper to the Physical Review in March of 1935 with the title “Can Quantum-Mechanical Description of Physical Reality be Considered Complete?” [2]. Because of the three names on the paper (Einstein, Podolsky, Rosen), it became known as the EPR paper, and the paradox they presented became known as the EPR paradox.

When Bohr read the paper, he was initially stumped and aghast. He felt that EPR had shaken the very foundations of the quantum theory that he and his institute had fought so hard to establish. He also suspected that EPR had made a mistake in their arguments, and he halted all work at his institute in Copenhagen until they could construct a definitive answer. A few months later, in July of 1935, Bohr submitted a paper to the Physical Review, using the identical title that EPR had used, in which he refuted the EPR paradox [3]. There is not a single equation or figure in the paper, but he used his “awful incantation terminology” to maximum effect, showing that one of the EPR assumptions on the assessment of uncertainties to position and momentum was in error, and he was right.

Einstein was disgusted. He had hoped that this ultimate argument against the completeness of quantum mechanics would stand the test of time, but Bohr had shot it down within mere months. Einstein was particularly disappointed with Podolsky, because Podolsky had tried too hard to make the argument specific to position and momentum, leaving a loophole for Bohr to wiggle through, where Einstein had wanted the argument to rest on deeper and more general principles.

Despite Bohr’s victory, Einstein had been correct in his initial formulation of the EPR paradox that showed quantum mechanics did not jibe with common notions of reality. He and Schrödinger exchanged letters commiserating with each other and encouraging each other in their counter beliefs against Bohr and Heisenberg. In November of 1935, Schrödinger published a broad, mostly philosophical, paper in Naturwissenschaften [4] in which he amplified the EPR paradox with the use of an absurd—what he called burlesque—consequence of wavefunction collapse that became known as Schrödinger’s Cat. He also gave the central property of the EPR paradox its name: entanglement.

Ironically, both Einstein’s entanglement paradox and Schrödinger’s Cat, which were formulated originally to be arguments against the validity of quantum theory, have become established quantum tools. Today, entangled particles are the core workhorses of quantum information systems, and physicists are building larger and larger versions of Schrödinger’s Cat that may eventually merge with the physics of the macroscopic world.

Bohm and Aharonov Tackle EPR

The physicist David Bohm was a rare political exile from the United States. He was born in the heart of Pennsylvania in the town of Wilkes-Barre, attended Penn State and then the University of California at Berkeley, where he joined Robert Oppenheimer’s research group. While there, he became deeply involved in the fight for unions and socialism, activities for which he was called before the House Un-American Activities Committee during the McCarthy era. He invoked his Fifth Amendment rights, for which he was arrested. Although he was later acquitted, Princeton University fired him from his faculty position, and fearing another arrest, he fled to Brazil, where his US passport was confiscated by American authorities. He had become a physicist without a country.

Despite his personal trials, Bohm remained scientifically productive. He published his influential textbook on quantum mechanics in the midst of his congressional hearings, and after a particularly stimulating discussion with Einstein shortly before he fled the US, he developed and published an alternative version of quantum theory in 1952 that was fully deterministic (removing Einstein’s “God playing dice”) by creating a hidden-variable theory [5].

Hidden-variable theories of quantum mechanics seek to remove the randomness of quantum measurement by assuming that some deeper element of quantum phenomena—a hidden variable—explains each outcome. But it is also assumed that these hidden variables are not directly accessible to experiment. In this sense, the quantum theory of Bohr and Heisenberg was “correct” but not “complete”, because there were things that the theory could not predict or explain.

Bohm’s hidden variable theory, based on a quantum potential, was able to reproduce all the known results of standard quantum theory without invoking the random experimental outcomes that Einstein abhorred. However, it still contained one crucial element that could not sweep away the EPR paradox—it was nonlocal.

Nonlocality lies at the heart of quantum theory. In its simplest form, the nonlocal nature of quantum phenomena says that quantum states span spacetime with space-like separations, meaning that parts of the wavefunction are non-causally connected to other parts of the wavefunction. Because Einstein was fundamentally committed to causality, the nonlocality of quantum theory was what he found most objectionable, and Bohm’s elegant hidden-variable theory, which removed Einstein’s dreaded randomness, could not remove that last objection of non-causality.

After working in Brazil for several years, Bohm moved to the Technion in Israel, where he began a fruitful collaboration with Yakir Aharonov. In addition to proposing the Aharonov-Bohm effect, in 1957 they reformulated Podolsky’s version of the EPR paradox, which relied on continuous values of position and momentum, replacing it with a much simpler model based on the Stern-Gerlach effect on spins, and then extended it to the case of positronium decay into two photons with correlated polarizations. Bohm and Aharonov reassessed experimental results on positronium decay that had been obtained by Madame Wu in 1950 at Columbia University and found them in full agreement with standard quantum theory.

John Bell’s Inequalities

John Stuart Bell had an unusual start for a physicist. His family was too poor to give him an education appropriate to his skills, so he enrolled in vocational school where he took practical classes that included brick laying. Working later as a technician in a university lab, he caught the attention of his professors who sponsored him to attend the university. With a degree in physics, he began working at CERN as an accelerator designer when he again caught the attention of his supervisors who sponsored him to attend graduate school. He graduated with a PhD and returned to CERN as a card-carrying physicist with all the rights and privileges that entailed.

During his university days, he had been fascinated by the EPR paradox, and he continued thinking about the fundamentals of quantum theory. On a sabbatical to the Stanford accelerator in 1963 he began putting mathematics to the EPR paradox to see whether any local hidden-variable theory could be compatible with quantum mechanics. His analysis was fully general, so that it could rule out as-yet-unthought-of hidden-variable theories. The result of this work was a set of inequalities that must be obeyed by any local hidden-variable theory. Then he made a simple check using the known results of quantum measurement and showed that his inequalities are violated by quantum systems. This ruled out the possibility of any local hidden-variable theory (but not Bohm’s nonlocal hidden-variable theory). Bell published his analysis in 1964 [6] in an obscure journal that almost no one read…except for a curious graduate student at Columbia University who began digging into the fundamental underpinnings of quantum theory against his supervisor’s advice.

John Clauser’s Tenacious Pursuit

As a graduate student in astrophysics at Columbia University, John Clauser was supposed to be doing astrophysics. Instead, he spent his time musing over the fundamentals of quantum theory. In 1967 Clauser stumbled across Bell’s paper while he was in the library. The paper caught his imagination, but he also recognized that the inequalities were not experimentally testable, because they required measurements that depended directly on hidden variables, which are not accessible. He began thinking of ways to construct similar inequalities that could be put to an experimental test, and he wrote about his ideas to Bell, who responded with encouragement. Clauser wrote up his ideas in an abstract for an upcoming meeting of the American Physical Society, where one of the abstract reviewers was Abner Shimony of Boston University. Clauser was surprised weeks later when he received a telephone call from Shimony. Shimony and his graduate student Michael Horne had been thinking along similar lines, and Shimony proposed to Clauser that they join forces. They met in Boston, where they also met Richard Holt, a graduate student at Harvard who was working on experimental tests of quantum mechanics. Collectively, they devised a new type of Bell inequality that could be put to experimental test [7]. The result has become known as the CHSH Bell inequality (after Clauser, Horne, Shimony and Holt).
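As a rough numerical illustration of what the CHSH inequality tests, the sketch below compares the CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b') for two cases. The quantum correlation E = cos 2(a - b) is the standard textbook prediction for polarization-entangled photon pairs. The local model is a toy assumption chosen only for illustration (a shared hidden polarization angle with deterministic +1/-1 outcomes); it is not taken from the CHSH paper. The function names and the choice of analyzer angles are likewise illustrative.

```python
import math

def e_quantum(a, b):
    """Textbook quantum correlation for an entangled photon pair measured
    with polarization analyzers at angles a and b (radians)."""
    return math.cos(2.0 * (a - b))

def e_local(a, b, n=100_000):
    """Toy local hidden-variable model (illustrative assumption): each pair
    shares a hidden polarization angle lam, and each detector returns +1 or -1
    using only its own analyzer angle and lam."""
    total = 0.0
    for k in range(n):
        lam = (k + 0.5) * math.pi / n  # uniform grid over the hidden angle
        out_a = 1.0 if math.cos(2.0 * (a - lam)) >= 0 else -1.0
        out_b = 1.0 if math.cos(2.0 * (b - lam)) >= 0 else -1.0
        total += out_a * out_b
    return total / n

def chsh(e, a, a2, b, b2):
    """CHSH combination; any local hidden-variable model obeys |S| <= 2."""
    return e(a, b) - e(a, b2) + e(a2, b) + e(a2, b2)

# Analyzer angles a, a', b, b' = 0, 45, 22.5, 67.5 degrees
angles = (0.0, math.pi / 4, math.pi / 8, 3 * math.pi / 8)
s_quantum = chsh(e_quantum, *angles)
s_local = chsh(e_local, *angles)
print(f"quantum S = {s_quantum:.3f}, toy local model S = {s_local:.3f}")
```

With these angles the quantum value comes out at 2√2, above the bound of 2, while the toy local model sits at the bound (up to the small error of the numerical average).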

When Clauser took a post-doc position in Berkeley, he began searching for a way to do the experiments to test the CHSH inequality, even though Holt had a head start at Harvard. Clauser enlisted the help of Charles Townes, who convinced one of the Berkeley faculty to lend Clauser his graduate student, Stuart Freedman, to help. Clauser and Freedman performed the experiments, using a two-photon optical decay of calcium atoms, and found a violation of the CHSH inequality by 5 standard deviations, publishing their result in 1972 [8].

Alain Aspect’s Non-locality

Just as Clauser’s life was changed when he stumbled on Bell’s obscure paper in 1967, the paper had the same effect on the life of French physicist Alain Aspect who stumbled on it in 1975. Like Clauser, he also sought out Bell for his opinion, meeting with him in Geneva, and Aspect similarly received Bell’s encouragement, this time with the hope to build upon Clauser’s work.

In some respects, the conceptual breakthrough achieved by Clauser had been the CHSH inequality that could be tested experimentally. The subsequent Clauser-Freedman experiments were not a conclusion, but were just the beginning, opening the door to deeper tests. For instance, in the Clauser-Freedman experiments, the polarizers were static, and the detectors were not widely separated, which allowed the measurements to be time-like separated in spacetime. Therefore, the fundamental non-local nature of quantum physics had not been tested.

Aspect began a thorough and systematic program, which would take him nearly a decade to complete, to test the CHSH inequality under conditions of non-locality. He began with a much brighter source of photons produced using laser excitation of the calcium atoms. This allowed him to perform the experiment in hundreds of seconds instead of the hundreds of hours required by Clauser. With such a high data rate, Aspect was able to verify violation of the Bell inequality to 10 standard deviations, published in 1981 [9].

However, the real goal was to change the orientations of the polarizers while the photons were in flight to widely separated detectors [10]. This experiment would allow the detection to be space-like separated in spacetime. The experiments were performed using fast-switching acousto-optic modulators, and the Bell inequality was violated to 5 standard deviations [11]. This was the most stringent test yet performed and the first to fully demonstrate the non-local nature of quantum physics.

Anton Zeilinger: Master of Entanglement

If there is one physicist today whose work encompasses the broadest range of entangled phenomena, it would be the Austrian physicist Anton Zeilinger. He began his career in neutron interferometry, but when he was bitten by the entanglement bug in 1976, he switched to quantum photonics because of the superior control that can be exercised using optics over sources and receivers and all the optical manipulations in between.

Working with Daniel Greenberger and Michael Horne, he took the essential next step past the Bohm two-particle entanglement to consider a 3-particle entangled state that had surprising properties. While the violation of locality by the two-particle entanglement was observed through the statistical properties of many measurements, the new 3-particle entanglement could show violations on single measurements, further strengthening the arguments for quantum non-locality. This new state is called the GHZ state (after Greenberger, Horne and Zeilinger) [12].

As the Zeilinger group in Vienna was working towards experimental demonstrations of the GHZ state, Charles Bennett of IBM proposed the possibility for quantum teleportation, using entanglement as a core quantum information resource [13]. Zeilinger realized that his experimental set-up could perform an experimental demonstration of the effect, and in a rapid re-tooling of the experimental apparatus [14], the Zeilinger group was the first to demonstrate quantum teleportation that satisfied the conditions of the Bennett teleportation proposal [15]. An Italian-UK collaboration also made an early demonstration of a related form of teleportation in a paper that was submitted first, but published after Zeilinger’s, due to delays in review [16]. But teleportation was just one of a widening array of quantum applications for entanglement that was pursued by the Zeilinger group over the succeeding 30 years [17], including entanglement swapping, quantum repeaters, and entanglement-based quantum cryptography. Perhaps most striking, he has worked on projects at astronomical observatories that entangle photons coming from cosmic sources.

[2] A. Einstein, B. Podolsky, N. Rosen, Can quantum-mechanical description of physical reality be considered complete? Physical Review 47, 777-780 (1935).

[3] N. Bohr, Can quantum-mechanical description of physical reality be considered complete? Physical Review 48, 696-702 (1935).

[4] E. Schrödinger, Die gegenwärtige Situation in der Quantenmechanik. Die Naturwissenschaften 23, 807-12; 823-28; 844-49 (1935).

[5] D. Bohm, A suggested interpretation of the quantum theory in terms of hidden variables. I. Physical Review 85, 166-179 (1952); D. Bohm, A suggested interpretation of the quantum theory in terms of hidden variables. II. Physical Review 85, 180-193 (1952).

[6] J. Bell, On the Einstein-Podolsky-Rosen paradox. Physics 1, 195 (1964).

[7] J. F. Clauser, M. A. Horne, A. Shimony, R. A. Holt, Proposed experiment to test local hidden-variable theories. Physical Review Letters 23, 880 (1969).

[8] S. J. Freedman, J. F. Clauser, Experimental test of local hidden-variable theories. Physical Review Letters 28, 938 (1972).

[9] A. Aspect, P. Grangier, G. Roger, Experimental tests of realistic local theories via Bell’s theorem. Physical Review Letters 47, 460-463 (1981).

[10] A. Aspect, Bell’s Theorem: The Naive View of an Experimentalist. (2004), hal-00001079.

[11] A. Aspect, J. Dalibard, G. Roger, Experimental test of Bell’s inequalities using time-varying analyzers. Physical Review Letters 49, 1804-1807 (1982).

[12] D. M. Greenberger, M. A. Horne, A. Zeilinger, in 1988 Fall Workshop on Bell’s Theorem, Quantum Theory and Conceptions of the Universe. (George Mason Univ, Fairfax, Va, 1988), vol. 37, pp. 69-72.

[13] C. H. Bennett, G. Brassard, C. Crepeau, R. Jozsa, A. Peres, W. K. Wootters, Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 70, 1895-1899 (1993).

[14] J. Gea-Banacloche, Optical realizations of quantum teleportation, in Progress in Optics, Vol 46, E. Wolf, Ed. (2004), vol. 46, pp. 311-353.

[15] D. Bouwmeester, J.-W. Pan, K. Mattle, M. Eibl, H. Weinfurter, A. Zeilinger, Experimental quantum teleportation. Nature 390, 575-579 (1997).

[16] D. Boschi, S. Branca, F. De Martini, L. Hardy, S. Popescu, Experimental realization of teleporting an unknown pure quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 80, 1121-1125 (1998).

[17] A. Zeilinger, Light for the quantum. Entangled photons and their applications: a very personal perspective. Physica Scripta 92, 1-33 (2017).


Quantum physics is often called “weird” because it does things that are not allowed in classical physics and hence is viewed as non-intuitive or strange. Perhaps the two “weirdest” aspects of quantum physics are quantum entanglement and quantum tunneling. Entanglement allows a particle state to extend across wide expanses of space, while tunneling allows a particle to have negative kinetic energy. Neither of these effects has a classical analog.

Quantum entanglement arose out of the Bohr-Einstein debates at the Solvay Conferences in the 1920’s and 30’s, and it was the subject of a recent Nobel Prize in Physics (2022). The quantum tunneling story is just as old, but it was recognized much earlier by the Nobel Prize in 1973 when it was awarded to Brian Josephson, Ivar Giaever and Leo Esaki, each of whom was a graduate student when he discovered his effect and two of whom got their big idea while attending a lecture class.

Always go to class, you never know what you might miss, and the payoff is sometimes BIG

Ivar Giaever

Of the two effects, tunneling is the more common and the more useful in modern electronic devices (although entanglement is coming up fast with the advent of quantum information science). Here is a short history of quantum tunneling, told through a series of publications that advanced theory and experiments.

Double-Well Potential: Friedrich Hund (1927)

The first analysis of quantum tunneling was performed by Friedrich Hund (1896 – 1997), a German physicist who studied early in his career with Born in Göttingen and Bohr in Copenhagen. He published a series of papers in 1927 in Zeitschrift für Physik [1] that solved the newly-proposed Schrödinger equation for the case of the double well potential. He was particularly interested in the formation of symmetric and anti-symmetric states of the double well that contributed to the binding energy of atoms in molecules. He derived the first tunneling-frequency expression for a quantum superposition of the symmetric and anti-symmetric states

where f is the coherent oscillation frequency, V is the height of the potential and hν is the quantum energy of the isolated states when the atoms are far apart. The exponential dependence on the potential height V made the tunnel effect extremely sensitive to the details of the tunnel barrier.

Electron Emission: Lothar Nordheim and Ralph Fowler (1927 – 1928)

The first to consider quantum tunneling from a bound state to a continuum state was Lothar Nordheim (1899 – 1985), a German physicist who studied under David Hilbert and Max Born at Göttingen and worked with John von Neumann and Eugene Wigner and later with Hans Bethe. In 1927 he solved the problem of a particle in a well that is separated from continuum states by a thin finite barrier [2]. Using the new Schrödinger theory, he found transmission coefficients that were finite valued, caused by quantum tunneling of the particle through the barrier. Nordheim’s square potential wells and barriers are now, literally, textbook examples that every student of quantum mechanics solves. (For a quantum simulation of wavefunction tunneling through a square barrier see the companion Quantum Tunneling YouTube video.) Nordheim later escaped the growing nationalism and anti-Semitism in Germany in the mid 1930’s to become a visiting professor of physics at Purdue University in the United States, later moving to a permanent position at Duke University.
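The textbook square-barrier result can be sketched numerically. The block below evaluates the standard transmission coefficient for a particle of energy E below a rectangular barrier of height V0 and width a, T = 1 / (1 + V0² sinh²(κa) / (4E(V0 − E))) with κ = sqrt(2m(V0 − E))/ħ. This is the formula found in modern textbooks and is not claimed to be Nordheim’s own notation; the function name and the example numbers (a 1 eV electron, a 2 eV barrier) are assumptions chosen for illustration.

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J s
M_E = 9.1093837015e-31    # electron mass, kg
M_P = 1.67262192369e-27   # proton mass, kg
EV = 1.602176634e-19      # joules per electron-volt

def transmission(energy_ev, barrier_ev, width_m, mass=M_E):
    """Transmission coefficient through a rectangular barrier for E < V0."""
    if not 0.0 < energy_ev < barrier_ev:
        raise ValueError("this sketch only treats 0 < E < V0")
    # Decay constant of the evanescent wave inside the barrier
    kappa = math.sqrt(2.0 * mass * (barrier_ev - energy_ev) * EV) / HBAR
    s = math.sinh(kappa * width_m)
    return 1.0 / (1.0 + (barrier_ev ** 2) * s ** 2
                  / (4.0 * energy_ev * (barrier_ev - energy_ev)))

t_thin = transmission(1.0, 2.0, 0.5e-9)                 # electron, 0.5 nm
t_thick = transmission(1.0, 2.0, 1.0e-9)                # electron, 1.0 nm
t_proton = transmission(1.0, 2.0, 0.5e-9, mass=M_P)     # proton, 0.5 nm
print(f"electron 0.5 nm: {t_thin:.3e}")
print(f"electron 1.0 nm: {t_thick:.3e}")
print(f"proton   0.5 nm: {t_proton:.3e}")
```

The transmission is finite but drops roughly exponentially as the barrier thickens, and it collapses by many orders of magnitude when the electron is replaced by a proton of the same energy.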

One of the giants of mathematical physics in the UK from the 1920’s through the 1930’s was Ralph Fowler (1889 – 1944). Three of his doctoral students went on to win Nobel Prizes (Chandrasekhar, Dirac and Mott) and others came close (Bhabha, Hartree, Lennard-Jones). In 1928 Fowler worked with Nordheim on a more realistic version of Nordheim’s surface electron tunneling that could explain field emission of electrons from metals under strong electric fields. The electric field modified Nordheim’s square potential barrier into a triangular barrier (which they treated using WKB theory) to obtain the tunneling rate [3]. This type of tunnel effect is now known as Fowler-Nordheim tunneling.

Nuclear Alpha Decay: George Gamow (1928)

George Gamow (1904 – 1968) is one of the icons of mid-twentieth-century physics. He was a substantial physicist who also had a solid sense of humor that allowed him to achieve a level of cultural popularity shared by a few of the larger-than-life physicists of his time, like Richard Feynman and Stephen Hawking. His popular books included One Two Three … Infinity as well as a favorite series of books under the rubric of Mr. Tompkins (Mr. Tompkins in Wonderland and Mr. Tompkins Explores the Atom, among others). He also wrote a history of the early years of quantum theory (Thirty Years that Shook Physics).

In 1928 Gamow was in Göttingen (the Mecca of early quantum theory) with Max Born when he realized that the radioactive decay of Uranium by alpha decay might be explained by quantum tunneling. It was known that nucleons were bound together by some unknown force in what would be an effective binding potential, but that charged alpha particles would also feel a strong electrostatic repulsive potential from a nucleus. Gamow combined these two potentials to create a potential landscape that was qualitatively similar to Nordheim’s original system of 1927, but with a potential barrier that was neither square nor triangular (like the Fowler-Nordheim situation).

Gamow was able to make an accurate approximation that allowed him to express the decay rate in terms of an exponential term

where Z_{α} is the atomic charge of the alpha particle, Z is the nuclear charge of the Uranium decay product and v is the speed of the alpha particle detected in external measurements [4].
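To give a feel for the size of such an exponential factor, the sketch below evaluates the commonly quoted textbook Gamow factor exp(−2π Z_{α} Z e²/ħv), written in terms of the fine-structure constant as exp(−2π Z_{α} Z α c/v). Treat this form as an assumption: it is the standard leading-order expression and is not claimed to match the exact expression in Gamow’s paper, and it ignores the finite nuclear radius, so it illustrates sensitivity to the alpha-particle speed and does not predict half-lives. The function name and the sample energies are illustrative.

```python
import math

ALPHA_FS = 7.2973525693e-3     # fine-structure constant (dimensionless)
ALPHA_REST_MEV = 3727.379      # alpha-particle rest energy, MeV

def gamow_exponent(z_alpha, z_daughter, kinetic_mev):
    """Exponent G in the leading-order Gamow factor exp(-G).

    Uses the non-relativistic speed v/c = sqrt(2 E / (m c^2)), which is
    adequate for alpha particles of a few MeV.
    """
    beta = math.sqrt(2.0 * kinetic_mev / ALPHA_REST_MEV)
    return 2.0 * math.pi * z_alpha * z_daughter * ALPHA_FS / beta

# Alpha decay of uranium to thorium (Z = 90); energies are example values
g_slow = gamow_exponent(2, 90, 4.27)
g_fast = gamow_exponent(2, 90, 5.50)
print(f"G at 4.27 MeV: {g_slow:.1f}")
print(f"G at 5.50 MeV: {g_fast:.1f}")
print(f"ratio of exp(-G) factors: {math.exp(g_slow - g_fast):.3e}")
```

A modest change in the alpha-particle energy shifts the exponent by about twenty, which changes the factor by many orders of magnitude.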

The very next day after Gamow submitted his paper, Ronald Gurney and Edward Condon of Princeton University submitted a paper [5] that solved the same problem using virtually the same approach … except missing Gamow’s surprisingly concise analytic expression for the decay rate.

Molecular Tunneling: George Uhlenbeck (1932)

Because tunneling rates fall off steeply with the mass of the particle tunneling through the barrier, electrons are far more likely to tunnel through potential barriers than atoms. However, hydrogen is the lightest atom and is therefore the most amenable to tunneling.

The first example of atom tunneling is associated with hydrogen in the ammonia molecule NH_{3}. The molecule has a pyramidal structure with the nitrogen hovering above the plane defined by the three hydrogens. However, an equivalent configuration has the nitrogen hanging below the hydrogen plane. The energies of these two configurations are the same, but the nitrogen must tunnel from one side of the hydrogen plane to the other through a barrier. The presence of light-weight hydrogen that can “move out of the way” for the nitrogen makes this barrier very small (infrared energies). When the ammonia is excited into its first vibrational excited state, the molecular wavefunction tunnels through the barrier, splitting the excited level by an energy associated with a wavelength of 1.2 cm, which is in the microwave range. This tunnel splitting was the first microwave transition observed in spectroscopy and is used in ammonia masers.

One of the earliest papers [6] written on the tunneling of nitrogen in ammonia was published by George Uhlenbeck in 1932. George Uhlenbeck (1900 – 1988) was a Dutch-American theoretical physicist. He played a critical role, with Samuel Goudsmit, in establishing the spin of the electron in 1925. Both Uhlenbeck and Goudsmit were close associates of Paul Ehrenfest at Leiden in the Netherlands. Uhlenbeck is also famous for the Ornstein-Uhlenbeck process which is a generalization of Einstein’s theory of Brownian motion that can treat active transport such as intracellular transport in living cells.

Solid-State Electron Tunneling: Leo Esaki (1957)

Although the tunneling of electrons in molecular bonds and in the field emission from metals had been established early in the century, direct use of electron tunneling in solid state devices had remained elusive until Leo Esaki (1925 – ) observed electron tunneling in heavily doped Germanium and Silicon semiconductors. Esaki joined an early precursor of Sony electronics in 1956 and was supported to obtain a PhD from the University of Tokyo. In 1957 he was working with heavily-doped p-n junction diodes and discovered a phenomenon known as negative differential resistance where the current through an electronic device actually decreases as the voltage increases.

Because the junction thickness was only about 100 atoms, or about 10 nanometers, he suspected and then proved that the electronic current was tunneling quantum mechanically through the junction. The negative differential resistance was caused by a decrease in available states to the tunneling current as the voltage increased.

Esaki tunnel diodes were the fastest semiconductor devices of the time, and the negative differential resistance of the diode in an external circuit produced high-frequency oscillations. They were used in high-frequency communication systems. They were also radiation hard and hence ideal for the early communications satellites. Esaki was awarded the 1973 Nobel Prize in Physics jointly with Ivar Giaever and Brian Josephson.

Superconducting Tunneling: Ivar Giaever (1960)

Ivar Giaever (1929 – ) is a Norwegian-American physicist who had just joined the GE research lab in Schenectady, New York, in 1958 when he read about Esaki’s tunneling experiments. He was enrolled at that time as a graduate student in physics at Rensselaer Polytechnic Institute (RPI), where he was taking a course in solid state physics and learning about superconductivity. Superconductivity is carried by pairs of electrons known as Cooper pairs that spontaneously bind together with a binding energy that produces an “energy gap” in the electron energies of the metal, but no one had ever found a way to directly measure it. The Esaki experiment made him immediately think of the equivalent experiment in which Cooper pairs might tunnel between two superconductors (through a thin oxide layer) and yield a measurement of the energy gap. The idea actually came to him during the class lecture.

The experiments used a junction between aluminum and lead (Al—Al_{2}O_{3}—Pb). At first, the temperature of the system was adjusted so that Al remained a normal metal and Pb was superconducting, and Giaever observed a tunnel current with a threshold related to the gap in Pb. Then the temperature was lowered so that both Al and Pb were superconducting, and a peak in the tunnel current appeared at the voltage associated with the difference in the energy gaps (predicted by Harrison and Bardeen).

The Josephson Effect: Brian Josephson (1962)

In Giaever’s experiments, the external circuits had been designed to pick up “ordinary” tunnel currents in which individual electrons tunneled through the oxide rather than the Cooper pairs themselves. However, in 1962, Brian Josephson (1940 – ), a physics graduate student at Cambridge, was sitting in a lecture (just like Giaever) on solid state physics given by Phil Anderson (who was on sabbatical there from Bell Labs). During lecture he had the idea to calculate whether it was possible for the Cooper pairs themselves to tunnel through the oxide barrier. Building on theoretical work by Leo Falicov who was at the University of Chicago and later at Berkeley (years later I was lucky to have Leo as my PhD thesis advisor at Berkeley), Josephson found a surprising result that even when the voltage was zero, there would be a supercurrent that tunneled through the junction (now known as the DC Josephson Effect). Furthermore, once a voltage was applied, the supercurrent would oscillate (now known as the AC Josephson Effect). These were strange and non-intuitive results, so he showed Anderson his calculations to see what he thought. By this time Anderson had already been extremely impressed by Josephson (who would often come to the board after one of Anderson’s lectures to show where he had made a mistake). Anderson checked over the theory and agreed with Josephson’s conclusions. Bolstered by this reception, Josephson submitted the theoretical prediction for publication [9].
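The AC Josephson effect has a simple quantitative form: a constant voltage V across the junction produces a supercurrent oscillating at the frequency f = 2eV/h. The short sketch below evaluates this relation using the exact SI values of e and h; the function name and the sample voltages are illustrative choices.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C (exact in the SI)
PLANCK = 6.62607015e-34      # Planck constant, J s (exact in the SI)

def josephson_frequency(voltage):
    """Oscillation frequency (Hz) of the supercurrent at a DC voltage (V)."""
    return 2.0 * E_CHARGE * voltage / PLANCK

f_microvolt = josephson_frequency(1e-6)
f_millivolt = josephson_frequency(1e-3)
print(f"1 microvolt -> {f_microvolt / 1e6:.2f} MHz")
print(f"1 millivolt -> {f_millivolt / 1e9:.2f} GHz")
```

Because the conversion factor depends only on fundamental constants, a frequency measurement translates directly into a voltage, which is why the effect is useful as a voltage standard.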

As soon as Anderson returned to Bell Labs after his sabbatical, he connected with John Rowell, who was performing tunnel-junction experiments. They revised the external circuit configuration to be most sensitive to the tunneling supercurrent, observed it in short order, and submitted a paper for publication. Since then, the Josephson Effect has become a standard element of ultra-sensitive magnetometers, measurement standards for charge and voltage, and far-infrared detectors, and it has been used to construct rudimentary qubits and quantum computers.

[5] R. W. Gurney, E. U. Condon, Nature 122, 439 (1928). R. W. Gurney, E. U. Condon, Phys. Rev. 33, 127 (1929).

[6] Dennison, D. M. and G. E. Uhlenbeck. “The two-minima problem and the ammonia molecule.” Physical Review 41(3): 313-321. (1932)

[7] L. Esaki, New Phenomenon in Narrow Germanium p-n Junctions, Phys. Rev., 109, 603-604 (1958); L. Esaki, Long journey into tunneling, Proc. of the IEEE, 62, 825 (1974).

[8] I. Giaever, Energy Gap in Superconductors Measured by Electron Tunneling, Phys. Rev. Letters, 5, 147-148 (1960); I. Giaever, Electron tunneling and superconductivity, Science, 183, 1253 (1974)

[9] B. D. Josephson, Phys. Lett. 1, 251 (1962); B.D. Josephson, The discovery of tunneling supercurrent, Science, 184, 527 (1974).

[10] P. W. Anderson, J. M. Rowell, Phys. Rev. Lett. 10, 230 (1963); Philip W. Anderson, How Josephson discovered his effect, Physics Today 23, 11, 23 (1970)

[11] Eugen Merzbacher, The Early History of Quantum Tunneling, Physics Today 55, 8, 44 (2002)

[12] Razavy, Mohsen. Quantum Theory Of Tunneling, World Scientific Publishing Company, 2003.

When our son was ten years old, he came home from a town fair in Battleground, Indiana, with an unwanted pet—a goldfish in a plastic bag. The family rushed out to buy a fish bowl and food and plopped the golden-red animal into it. In three days, it was dead!

It turns out that you can’t just put a goldfish in a fish bowl. When it metabolizes its food and expels its waste, it builds up toxic levels of ammonia unless you add filters or plants or treat the water with chemicals. In the end, the goldfish died because it was asphyxiated by its own pee.

It’s a basic rule—don’t pee in your own fish bowl.

The same can be said for humans living on the surface of our planet. Polluting the atmosphere with our wastes cannot be a good idea. In the end it will kill us. The atmosphere may look vast—the fish bowl was a big one—but it is shocking how thin it is.

Turn on your Apple TV, click on the screen saver, and you are skimming over our planet on the dark side of the Earth. Then you see a thin blue line extending over the limb of the dark disc. Hold! That thin blue line! That is our atmosphere! Is it really so thin?

When you look upwards on a clear sunny day, the atmosphere seems like it goes on forever. It doesn’t. It is a thin veneer on the surface of the Earth barely one percent of the Earth’s radius. The Earth’s atmosphere is frighteningly thin.

Consider Mars. It’s half the size of Earth, yet it cannot hold on to an atmosphere even 1/100^{th} the thickness of ours. When Mars first formed, it had an atmosphere not unlike our own, but through the eons its atmosphere has wafted away irretrievably into space.

An atmosphere is a precious fragile thing for a planet. It gives life and it gives protection. It separates us from the deathly cold of space, holding heat like a blanket. That heat has served us well over the eons, allowing water to stay liquid and allowing life to arise on Earth. But too much of a good thing is not a good thing.

Common Sense

If the fluid you are bathed in gives you life, then don’t mess with it. Don’t run your car in the garage while you are working in it. Don’t use a charcoal stove in an enclosed space. Don’t dump carbon dioxide into the atmosphere because it also is an enclosed space.

At the end of winter, as the warm spring days get warmer, you take the winter blanket off your bed because blankets hold in heat. The thicker the blanket, the more heat it holds in. Common sense tells you to reduce the thickness of the blanket if you don’t want to get too warm. Carbon dioxide in the atmosphere acts like a blanket. If we don’t want the Earth to get too warm, then we need to limit the thickness of the blanket.

Without getting into the details of any climate change model, common sense already tells us what we should do. Keep the atmosphere clean and stable (Don’t pee in our fishbowl) and limit the amount of carbon dioxide we put into it (Don’t let the blanket get too thick).

Some Atmospheric Facts

Here are some facts about the atmosphere, about the effect humans have on it, and about the climate:

Fact 1. Humans have increased the amount of carbon dioxide in the atmosphere by 45% since 1850 (the beginning of the industrial age) and by 30% since just 1960.

Fact 2. Carbon dioxide in the atmosphere prevents some of the heat absorbed from the Sun from re-radiating out to space. More carbon dioxide stores more heat.

Fact 3. Heat added to the Earth’s atmosphere increases its temperature. This is a law of physics.

Fact 4. The Earth’s average temperature has risen by 1.2 degrees Celsius since 1850 and 0.8 degrees of that has been just since 1960, so the effect is accelerating.

These facts are indisputable. They hold true regardless of whether there is a Republican or a Democrat in the White House or in control of Congress.

There is another interesting observation which is not so direct, but may hold a harbinger for the distant future: The last time the Earth was 3 degrees Celsius warmer than it is today was during the Pliocene when the sea level was tens of meters higher. If that sea level were to occur today, all of Delaware, most of Florida, half of Louisiana and the entire east coast of the US would be under water, including Houston, Miami, New Orleans, Philadelphia and New York City. There are many reasons why this may not be an immediate worry. The distribution of water and ice now is different than in the Pliocene, and the effect of warming on the ice sheets and water levels could take centuries. Within this century, the amount of sea level rise is likely to be only about 1 meter, but accelerating after that.

Balance and Feedback

It is relatively easy to create a “rule-of-thumb” model for the Earth’s climate (see Ref. [2]). This model is not accurate, but it qualitatively captures the basic effects of climate change and is a good way to get an intuitive feeling for how the Earth responds to changes, such as changes in CO_{2} or in the amount of ice cover. It can also provide semi-quantitative results, so that the relative importance of various processes or perturbations can be understood.

The model is a simple energy balance statement: In equilibrium, as much energy flows into the Earth system as out.

This statement is both simple and immediately understandable. But then the work starts as we need to pin down how much energy is flowing in and how much is flowing out. The energy flowing in comes from the sun, and the energy flowing out comes from thermal radiation into space.
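Before splitting the system into surface and atmosphere, the bare energy-balance statement can be turned into a number. The sketch below is a minimal illustration, not the model of this post: it treats the Earth as a single radiating sphere and assumes a planetary albedo of 0.30, which is an illustrative value chosen here. The solar constant and the Stefan-Boltzmann constant are the values used elsewhere in this post.

```python
# Zero-dimensional energy balance: absorbed sunlight = emitted thermal radiation.
SOLAR = 1.36e3      # solar constant S [W/m^2]
SIGMA = 5.67e-8     # Stefan-Boltzmann constant [W/m^2/K^4]
ALBEDO = 0.30       # ASSUMPTION: illustrative planetary albedo


def effective_temperature(solar=SOLAR, albedo=ALBEDO, sigma=SIGMA):
    """Temperature at which a bare sphere radiates away what it absorbs."""
    # Factor of 4: sunlight is intercepted by a disc but radiated by a sphere.
    absorbed = (1.0 - albedo) * solar / 4.0
    return (absorbed / sigma) ** 0.25


T_eff = effective_temperature()
print(f"Effective temperature: {T_eff:.1f} K")
```

Under these assumptions the result comes out near 255 K, well below freezing. The gap between that number and the much milder surface we actually live on is the blanket effect that the two-component model is built to capture.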

We also need to separate the Earth system into two components: the surface and the atmosphere. These are two very different things that have two different average temperatures. In addition, the atmosphere transmits sunlight to the surface, unless clouds reflect it back into space. And the Earth radiates thermally into space, unless clouds or carbon dioxide layers reflect it back to the surface.

The energy fluxes are shown in the diagram in Fig. 3 for the 4-component system: Sun, Surface, Atmosphere, and Space. The light from the sun, mostly in the visible range of the spectrum, is partially absorbed by the atmosphere and partially transmitted and reflected. The transmitted portion is partially absorbed and partially reflected by the surface. The heat of the Earth is radiated at long wavelengths to the atmosphere, where it is partially transmitted out into space, but also partially reflected by the fraction a’_{a} which is the blanket effect. In addition, the atmosphere itself radiates in equal parts to the surface and into outer space. On top of all of these radiative processes, there is also non-radiative convective interaction between the atmosphere and the surface.

These processes are captured by two energy flux equations, one for the atmosphere and one for the surface, in Fig. 4. The individual contributions from Fig. 3 are annotated in each case. In equilibrium, each flux equals zero, which can then be used to solve for the two unknowns: T_{s0} and T_{a0}: the surface and atmosphere temperatures.

After the equilibrium temperatures T_{s0} and T_{a0} are found, they go into a set of dynamic response equations that govern how deviations in the temperatures relax back to the equilibrium values. These relaxation equations are

dT_{s}/dt = -k_{s} (T_{s} - T_{s0})

dT_{a}/dt = -k_{a} (T_{a} - T_{a0})

where k_{s} and k_{a} are the relaxation rates for the surface and atmosphere. These can be quite slow, with time constants in the range of a century. For illustration, we can take k_{s} = 1/(75 years) and k_{a} = 1/(25 years). The equilibrium temperatures for the surface and atmosphere differ by about 40 degrees Celsius, with T_{s} = 289 K and T_{a} = 248 K. These are rough averages over the entire planet. The solar constant is S = 1.36×10^{3} W/m^{2}, the Stefan-Boltzmann constant is σ = 5.67×10^{-8} W/m^{2}/K^{4}, and the convective interaction constant is c = 2.5 W m^{-2} K^{-1}. Other parameters are given in Table I.

Table I. Model parameters

Short Wavelength     |  Long Wavelength
a_{s} = 0.11         |
t_{a} = 0.53         |  t’_{a} = 0.06
a_{a} = 0.30         |  a’_{a} = 0.31
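To get a feel for the relaxation rates, here is a small sketch that assumes each temperature relaxes linearly toward its equilibrium value, dT/dt = -k (T - T_eq), with the illustrative rates and equilibrium temperatures quoted above. The 1 K starting offset is a hypothetical perturbation chosen only for the demonstration, and the function names are mine.

```python
import math

K_S = 1.0 / 75.0   # surface relaxation rate [1/yr]
K_A = 1.0 / 25.0   # atmosphere relaxation rate [1/yr]
T_S0 = 289.0       # equilibrium surface temperature [K]
T_A0 = 248.0       # equilibrium atmosphere temperature [K]


def relax(T_init, T_eq, k, t):
    """Exact solution of dT/dt = -k (T - T_eq) after a time t."""
    return T_eq + (T_init - T_eq) * math.exp(-k * t)


def euler(T_init, T_eq, k, t, dt=0.01):
    """Forward-Euler integration of the same equation, for comparison."""
    T = T_init
    for _ in range(round(t / dt)):
        T += -k * (T - T_eq) * dt
    return T


# HYPOTHETICAL perturbation: both temperatures start 1 K above equilibrium.
for years in (0, 25, 75, 150):
    Ts = relax(T_S0 + 1.0, T_S0, K_S, years)
    Ta = relax(T_A0 + 1.0, T_A0, K_A, years)
    print(f"t = {years:3d} yr   Ts = {Ts:.3f} K   Ta = {Ta:.3f} K")
```

After one time constant the offset has dropped to 1/e of its starting value, so under these assumptions the atmosphere has mostly recovered after 75 years while the surface is still about a third of a degree away from equilibrium.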

The relaxation equations are in the standard form of a mathematical “flow” (see Ref. [1]), and the solutions are plotted as a phase-space portrait in Fig. 5, shown as a video of the flow as the parameters in Table I shift because of the addition of greenhouse gases to the atmosphere. The video runs from the year 1850 (the dawn of the industrial age) through to the year 2060, about 40 years from now.

The scariest part of the video is how fast it accelerates. From 1850 to 1950 there is almost no change, but then it accelerates, faster and faster, reflecting the time-lag in temperature rise in response to increased greenhouse gases.

What if the Models are Wrong? Russian Roulette

Now come the caveats.

This model is just for teaching purposes, not for any realistic modeling of climate change. It captures the basic physics, and it provides a semi-quantitative set of parameters that leads to roughly accurate current temperatures. But of course, the biggest elephant in the room is that it averages over the entire planet, which is a very crude approximation.

It does get the basic facts correct, though, showing an alarming trend in the rise in average temperatures with the temperature rising by 3 degrees by 2060.

The professionals in this business have computer models that are orders of magnitude more accurate than this one. To understand the details of the real climate models, one needs to go to appropriate resources, such as those maintained by NOAA, NASA, the National Climate Assessment, and government climate portals, among many others.

One of the frequent questions that is asked is: What if these models are wrong? What if global warming isn’t as bad as these models say? The answer is simple: If they are wrong, then the worst case is that life goes on. If they are right, then in the worst case life on this planet may end.

It’s like playing Russian Roulette. If just one of the chambers of the revolver has a live bullet, do you want to pull the trigger?

Matlab Code

function flowatmos % save this file as flowatmos.m
mov_flag = 1;
if mov_flag == 1
moviename = 'atmostmp';
aviobj = VideoWriter(moviename,'MPEG-4');
aviobj.FrameRate = 12;
open(aviobj);
end
Solar = 1.36e3; % Solar constant outside atmosphere [J/sec/m2]
sig = 5.67e-8; % Stefan-Boltzmann constant [W/m2/K4]
% 1st-order model of Earth + Atmosphere
ta = 0.53; % (0.53)transmissivity of air
tpa0 = 0.06; % (0.06)primes are for thermal radiation
as0 = 0.11; % (0.11)
aa0 = 0.30; % (0.30)
apa0 = 0.31; % (0.31)
c = 2.5; % W/m2/K
xrange = [287 293];
yrange = [247 251];
rngx = xrange(2) - xrange(1);
rngy = yrange(2) - yrange(1);
[X,Y] = meshgrid(xrange(1):0.05:xrange(2), yrange(1):0.05:yrange(2));
smallarrow = 1;
Delta0 = 0.0000009;
for tloop =1:80
Delta = Delta0*(exp((tloop-1)/8)-1); % This Delta is exponential, but should become more linear over time
date = floor(1850 + (tloop-1)*(2060-1850)/79);
[x,y] = f5(X,Y);
clf
hold off
eps = 0.002;
for xloop = 1:11
xs = xrange(1) +(xloop-1)*rngx/10 + eps;
for yloop = 1:11
ys = yrange(1) +(yloop-1)*rngy/10 + eps;
streamline(X,Y,x,y,xs,ys)
end
end
hold on
[XQ,YQ] = meshgrid(xrange(1):1:xrange(2),yrange(1):1:yrange(2));
smallarrow = 1;
[xq,yq] = f5(XQ,YQ);
quiver(XQ,YQ,xq,yq,.2,'r','filled')
hold off
axis([xrange(1) xrange(2) yrange(1) yrange(2)])
set(gcf,'Color','White')
fun = @root2d;
x0 = [0 -40];
x = fsolve(fun,x0);
Ts = x(1) + 288
Ta = x(2) + 288
hold on
rectangle('Position',[Ts-0.05 Ta-0.05 0.1 0.1],'Curvature',[1 1],'FaceColor',[1 0 0],'EdgeColor','k','LineWidth',2)
posTs(tloop) = Ts;
posTa(tloop) = Ta;
plot(posTs,posTa,'k','LineWidth',2);
hold off
text(287.5,250.5,strcat('Date = ',num2str(date)),'FontSize',24)
box on
xlabel('Surface Temperature (K)','FontSize',24) % axes are in Kelvin (287-293 K)
ylabel('Atmosphere Temperature (K)','FontSize',24)
hh = figure(1);
pause(0.01)
if mov_flag == 1
frame = getframe(hh);
writeVideo(aviobj,frame);
end
end % end tloop
if mov_flag == 1
close(aviobj);
end
function F = root2d(xp) % Energy fluxes
x = xp + 288;
feedfac = 0.001; % feedback parameter
apa = apa0 + feedfac*(x(2)-248) + Delta; % Changes in the atmospheric blanket
tpa = tpa0 - feedfac*(x(2)-248) - Delta;
as = as0 - feedfac*(x(1)-289);
F(1) = c*(x(1)-x(2)) + sig*(1-apa)*x(1).^4 - sig*x(2).^4 - ta*(1-as)*Solar/4;
F(2) = c*(x(1)-x(2)) + sig*(1-tpa - apa)*x(1).^4 - 2*sig*x(2).^4 + (1-aa0-ta+as*ta)*Solar/4;
end
function [x,y] = f5(X,Y) % Dynamical flow equations
k1 = 1/75; % 75 year time constant for the Earth
k2 = 1/25; % 25 year time constant for the Atmosphere
fun = @root2d;
x0 = [0 0];
x = fsolve(fun,x0); % Solve for the temperatures that set the energy fluxes to zero
Ts0 = x(1) + 288; % Surface temperature in Kelvin
Ta0 = x(2) + 288; % Atmosphere temperature in Kelvin
xtmp = -k1*(X - Ts0); % Dynamical equations
ytmp = -k2*(Y - Ta0);
nrm = sqrt(xtmp.^2 + ytmp.^2);
if smallarrow == 1
x = xtmp./nrm;
y = ytmp./nrm;
else
x = xtmp;
y = ytmp;
end
end % end f5
end % end flowatmos

This model has a lot of parameters that can be tweaked. In addition to the parameters in the Table, the time dependence of the blanket properties of the atmosphere is governed by Delta0, and the feedback of temperature on the atmosphere, such as increasing cloud cover and decreasing ice cover, is governed by feedfac. As an exercise, and using only small changes in the given parameters, find the following cases: 1) An increasing surface temperature is moderated by a falling atmosphere temperature; 2) The Earth goes into thermal run-away and ends like Venus; 3) The Earth initially warms then plummets into an ice age.

At the dawn of quantum theory, Heisenberg, Schrödinger, Bohr and Pauli were embroiled in a dispute over whether trajectories of particles, defined by their positions over time, could exist. The argument against trajectories was based on an apparent paradox: To draw a “line” depicting a trajectory of a particle along a path implies that there is a momentum vector that carries the particle along that path. But a line is a one-dimensional curve through space, and since at any point in time the particle’s position is perfectly localized, then by Heisenberg’s uncertainty principle, it can have no definable momentum to carry it along.

My previous blog shows the way out of this paradox, by assembling wavepackets that are spread in both space and momentum, explicitly obeying the uncertainty principle. This is nothing new to anyone who has taken a quantum course. But the surprising thing is that in some potentials, like a harmonic potential, the wavepacket travels without broadening, just like classical particles on a trajectory. A dramatic demonstration of this can be seen in this YouTube video. But other potentials “break up” the wavepacket, especially potentials that display classical chaos. Because phase space is one of the best tools for studying classical chaos, especially Hamiltonian chaos, it can be enlisted to dig deeper into the question of the quantum trajectory—not just about the existence of a quantum trajectory, but why quantum systems retain a shadow of their classical counterparts.

Phase Space

Phase space is the state space of Hamiltonian systems. Concepts of phase space were first developed by Boltzmann as he worked on the problem of statistical mechanics. Phase space was later codified by Gibbs for statistical mechanics and by Poincare for orbital mechanics, and it was finally given its name by Paul and Tatiana Ehrenfest (a husband-wife team) in correspondence with the German physicist Paul Hertz (See Chapter 6, “The Tangled Tale of Phase Space”, in Galileo Unbound by D. D. Nolte (Oxford, 2018)).

The stretched-out phase-space functions … are very similar to the stochastic layer that forms in separatrix chaos in classical systems.

The idea of phase space is very simple for classical systems: it is just a plot of the momentum of a particle as a function of its position. For a given initial condition, the trajectory of a particle through its natural configuration space (for instance our 3D world) is traced out as a path through phase space. Because there is one momentum variable per degree of freedom, then the dimensionality of phase space for a particle in 3D is 6D, which is difficult to visualize. But for a one-dimensional dynamical system, like a simple harmonic oscillator (SHO) oscillating in a line, the phase space is just two-dimensional, which is easy to see. The phase-space trajectories of an SHO are simply ellipses, and if the momentum axis is scaled appropriately, the trajectories are circles. The particle trajectory in phase space can be animated just like a trajectory through configuration space as the position and momentum change in time p(x(t)). For the SHO, the point follows the path of a circle going clockwise.
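The circular, clockwise orbit of the SHO is easy to check numerically. The sketch below assumes unit mass and unit angular frequency (so the momentum axis is already "scaled appropriately") and uses the textbook solution for an oscillator released from rest; the function name is mine.

```python
import math


# ASSUMPTION: unit mass and unit angular frequency, so the orbits are circles.
def sho_state(t, amplitude=1.0):
    """Position and momentum of an SHO released from rest at x = amplitude."""
    return amplitude * math.cos(t), -amplitude * math.sin(t)


times = [2.0 * math.pi * n / 100 for n in range(101)]   # one full period
orbit = [sho_state(t) for t in times]

# The radius in phase space is conserved (it equals sqrt(2 * energy)).
radii = [math.hypot(x, p) for x, p in orbit]

# Sense of rotation: the z-component of r x dr is negative for clockwise motion.
turns = [x0 * (p1 - p0) - p0 * (x1 - x0)
         for (x0, p0), (x1, p1) in zip(orbit, orbit[1:])]

print("radius range:", min(radii), max(radii))
print("clockwise:", all(c < 0 for c in turns))
```

The clockwise sense is not a convention: when the position is positive the restoring force makes the momentum decrease, so the point in the (x, p) plane has to move downward on the right-hand side of the plot.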

A more interesting phase space is for the simple pendulum, shown in Fig. 2. There are two types of orbits: open and closed. The closed orbits near the origin are like those of a SHO. The open orbits are when the pendulum is spinning around. The dividing line between the open and closed orbits is called a separatrix. Where the separatrix intersects itself is a saddle point. This saddle point is the most important part of the phase space portrait: it is where chaos emerges when perturbations are added.
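The division into open and closed orbits follows directly from the energy. As a rough sketch, assume units in which the pendulum mass, length and gravitational acceleration are all 1, so that H = p^2/2 - cos(θ). The separatrix is then the energy contour H = 1, the energy of the pendulum balanced upside down at rest. The function names below are illustrative.

```python
import math

# ASSUMPTION: units with mass = length = gravity = 1, so H = p^2/2 - cos(theta).
E_SEPARATRIX = 1.0   # energy of the inverted pendulum at rest (theta = pi, p = 0)


def energy(theta, p):
    return 0.5 * p * p - math.cos(theta)


def orbit_type(theta, p, tol=1e-12):
    """Classify the orbit that passes through the phase-space point (theta, p)."""
    E = energy(theta, p)
    if abs(E - E_SEPARATRIX) < tol:
        return "separatrix"
    return "closed (libration)" if E < E_SEPARATRIX else "open (rotation)"


def separatrix_momentum(theta):
    """Upper branch of the separatrix, p = 2 cos(theta / 2)."""
    return 2.0 * math.cos(0.5 * theta)


print(orbit_type(0.5, 0.0))                        # small swing
print(orbit_type(0.0, 2.5))                        # spinning over the top
print(orbit_type(1.0, separatrix_momentum(1.0)))   # on the dividing line
```

The two branches p = ±2 cos(θ/2) meet at θ = ±π with zero momentum, which is the saddle point where the separatrix crosses itself.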

One route to classical chaos is through what is known as “separatrix chaos”. It is easy to see why saddle points (also known as hyperbolic points) are the source of chaos: as the system trajectory approaches the saddle, it has two options of which directions to go. Any additional degree of freedom in the system (like a harmonic drive) can make the system go one way on one approach, and the other way on another approach, mixing up the trajectories. An example of the stochastic layer of separatrix chaos is shown in Fig. 3 for a damped driven pendulum. The chaotic behavior that originates at the saddle point extends out along the entire separatrix.

The main question about whether or not there is a quantum trajectory depends on how quantum packets behave as they approach a saddle point in phase space. Since packets are spread out, it would be reasonable to assume that parts of the packet will go one way, and parts of the packet will go another. But first, one has to ask: Is a phase-space description of quantum systems even possible?

Quantum Phase Space: The Wigner Distribution Function

Phase-space portraits are arguably the most powerful tool in the toolbox of classical dynamics, and one would like to retain their use for quantum systems. However, there is that pesky paradox about quantum trajectories that cannot admit the existence of one-dimensional curves through such a phase space. Furthermore, there is no direct way of taking a wavefunction and simply “finding” its position or momentum to plot points on such a quantum phase space.

The answer was found in 1932 by Eugene Wigner (1902 – 1995), a Hungarian physicist working at Princeton. He realized that it was impossible to construct a quantum probability distribution in phase space that had positive values everywhere. This is a problem, because negative probabilities have no direct interpretation. But Wigner showed that if one relaxed the requirements a bit, so that expectation values computed over some distribution function (that had positive and negative values) gave correct answers that matched experiments, then this distribution function would “stand in” for an actual probability distribution.

The distribution function that Wigner found is called the Wigner distribution function. Given a wavefunction ψ(x), the Wigner distribution is defined (in one common convention) as

W(x,p) = (1/πħ) ∫ ψ*(x+y) ψ(x−y) e^{2ipy/ħ} dy

The Wigner distribution function is the Fourier transform of a correlation product of the wavefunction with itself. The pure position dependence of the wavefunction is converted into a spread-out position-momentum function in phase space. For a Gaussian wavefunction ψ(x) with a finite width in space, the W-function in phase space is a two-dimensional Gaussian with finite widths in both space and momentum. In fact, the Δx-Δp product of the W-function is precisely the uncertainty product of the Heisenberg uncertainty relation.
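The Gaussian case can be checked with a few lines of numerics. The sketch below assumes one common convention for the Wigner transform, W(x,p) = (1/πħ) ∫ ψ*(x+y) ψ(x−y) exp(2ipy/ħ) dy, natural units with ħ = 1, and a hypothetical packet width; the integral is done with a plain trapezoid rule. Because the W-function of a Gaussian factorizes, the widths are read off from the slices W(x,0) and W(0,p).

```python
import cmath
import math

HBAR = 1.0    # ASSUMPTION: natural units
SIGMA = 0.8   # HYPOTHETICAL packet width


def psi(x):
    """Normalized real Gaussian wavefunction centered at the origin."""
    return (math.pi * SIGMA**2) ** -0.25 * math.exp(-x * x / (2.0 * SIGMA**2))


def wigner(x, p, y_max=6.0, n=400):
    """W(x,p) = 1/(pi hbar) * integral psi*(x+y) psi(x-y) exp(2ipy/hbar) dy."""
    dy = 2.0 * y_max / n
    total = 0.0
    for j in range(n + 1):
        y = -y_max + j * dy
        weight = 0.5 if j in (0, n) else 1.0            # trapezoid rule
        term = psi(x + y).conjugate() * psi(x - y) * cmath.exp(2j * p * y / HBAR)
        total += weight * term.real                      # imaginary parts cancel
    return total * dy / (math.pi * HBAR)


def rms_width(values, grid):
    """RMS width of a 1D profile treated as an unnormalized distribution."""
    norm = sum(values)
    mean = sum(g * v for g, v in zip(grid, values)) / norm
    return math.sqrt(sum((g - mean) ** 2 * v for g, v in zip(grid, values)) / norm)


grid = [-5.0 + 0.1 * i for i in range(101)]
dx = rms_width([wigner(x, 0.0) for x in grid], grid)
dp = rms_width([wigner(0.0, p) for p in grid], grid)
print("dx * dp =", dx * dp, "   hbar / 2 =", HBAR / 2.0)
```

Changing SIGMA trades width in position for width in momentum, but the product of the two stays pinned at the minimum-uncertainty value.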

The question of the quantum trajectory from the phase-space perspective becomes whether a Wigner function behaves like a localized “packet” that evolves in phase space in a way analogous to a classical particle, and whether classical chaos is reflected in the behavior of quantum systems.

The Harmonic Oscillator

The quantum harmonic oscillator is a rare and special case among quantum potentials, because the energy spacings between all successive states are all the same. This makes it possible for a Gaussian wavefunction, which is a superposition of the eigenstates of the harmonic oscillator, to propagate through the potential without broadening. To see an example of this, watch the first example in this YouTube video for a Schrödinger cat state in a two-dimensional harmonic potential. For this very special potential, the Wigner distribution behaves just like a (broadened) particle on an orbit in phase space, executing nice circular orbits.

A comparison of the classical phase-space portrait versus the quantum phase-space portrait is shown in Fig. 5. Where the classical particle is a point on an orbit, the quantum particle is spread out, obeying the Δx-Δp Heisenberg product, but following the same orbit as the classical particle.

However, a significant new feature appears in the Wigner representation in phase space when there is a coherent superposition of two states, known as a “cat” state, after Schrödinger’s cat. This new feature has no classical analog. It is the coherent interference pattern that appears at the zero-point of the harmonic oscillator for the Schrödinger cat state. There is no such thing as “classical” coherence, so this feature is absent in classical phase space portraits.

Two examples of Wigner distributions are shown in Fig. 6 for a statistical (incoherent) mixture of packets and a coherent superposition of packets. The quantum coherence signature is present in the coherent case but not the statistical mixture case. The coherence in the Wigner distribution represents “off-diagonal” terms in the density matrix that leads to interference effects in quantum systems. Quantum computing algorithms depend critically on such coherences that tend to decay rapidly in real-world physical systems, known as decoherence, and it is possible to make statements about decoherence by watching the zero-point interference.
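The difference between the coherent superposition and the statistical mixture shows up in a direct calculation at the midpoint between the two packets. This is only a sketch under stated assumptions: the same Wigner convention as in the standard definition with a 1/(πħ) prefactor, ħ = 1, two stationary Gaussian packets, and hypothetical values for their width and separation.

```python
import cmath
import math

HBAR = 1.0    # ASSUMPTION: natural units
SIGMA = 1.0   # HYPOTHETICAL packet width
A = 4.0       # HYPOTHETICAL displacement (packets sit at x = +A and x = -A)


def packet(x, center):
    """Normalized Gaussian packet at rest, centered at `center`."""
    return (math.pi * SIGMA**2) ** -0.25 * math.exp(-(x - center) ** 2 / (2.0 * SIGMA**2))


def cat(x):
    """Coherent superposition of the two packets (not yet normalized)."""
    return packet(x, +A) + packet(x, -A)


def wigner(wavefunction, x, p, y_max=12.0, n=1200):
    """Trapezoid-rule Wigner transform of a real wavefunction."""
    dy = 2.0 * y_max / n
    total = 0.0
    for j in range(n + 1):
        y = -y_max + j * dy
        weight = 0.5 if j in (0, n) else 1.0
        phase = cmath.exp(2j * p * y / HBAR)
        total += weight * (wavefunction(x + y) * wavefunction(x - y) * phase).real
    return total * dy / (math.pi * HBAR)


CAT_NORM = 2.0 * (1.0 + math.exp(-(A / SIGMA) ** 2))   # integral of cat(x)^2


def w_cat(x, p):
    return wigner(cat, x, p) / CAT_NORM


def w_mixture(x, p):
    """Incoherent 50/50 mixture: average the two single-packet W-functions."""
    return 0.5 * (wigner(lambda u: packet(u, +A), x, p)
                  + wigner(lambda u: packet(u, -A), x, p))


p_fringe = math.pi * HBAR / (2.0 * A)   # first minimum of the interference fringes
print("cat at origin:     ", w_cat(0.0, 0.0))
print("cat at fringe min: ", w_cat(0.0, p_fringe))
print("mixture at origin: ", w_mixture(0.0, 0.0))
```

At the origin the cat state has a large positive value, and half a fringe away in momentum it swings negative, which no true probability distribution could do. The mixture is essentially zero there, because neither packet has any weight at the midpoint.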

Whereas Gaussian wave packets in the quantum harmonic potential behave nearly like classical systems, and their phase-space portraits are almost identical to the classical phase-space view (except for the quantum coherence), most quantum potentials cause wave packets to disperse. And when saddle points are present in the classical case, then we are back to the question about how quantum packets behave as they approach a saddle point in phase space.

Quantum Pendulum and Separatrix Chaos

One of the simplest anharmonic oscillators is the simple pendulum. In the classical case, the period diverges if the pendulum gets very close to going vertical. A similar thing happens in the quantum case, but because the motion has strong anharmonicity, an initial wave packet tends to spread dramatically as the parts of the wavefunction that are less vertical stretch away from the part that is more nearly vertical. Fig. 7 is a snap-shot about an eighth of a period after the wave packet was launched. The packet has already stretched out along the separatrix. A double-cat-state was used, so there is a second packet that has coherent interference with the first. To see a movie of the time evolution of the wave packet and the orbit in quantum phase space, see the YouTube video.

The simple pendulum does have a saddle point, but it is degenerate because the angle is defined modulo 2π. A simple potential that has a non-degenerate saddle point is a double-well potential.

Quantum Double-Well and Separatrix Chaos

The symmetric double-well potential has a saddle point at the mid-point between the two well minima. A wave packet approaching the saddle will split into two packets that will follow the individual separatrixes that emerge from the saddle point (the unstable manifolds). This effect is seen most dramatically in the middle pane of Fig. 8. For the full video of the quantum phase-space evolution, see this YouTube video. The stretched-out distribution in phase space is highly analogous to the separatrix chaos seen for the classical system.

Conclusion

A common statement often made about quantum chaos is that quantum systems tend to suppress chaos, only exhibiting chaos for special types of orbits that produce quantum scars. However, from the phase-space perspective, the opposite may be true. The stretched-out Wigner distribution functions, for critical wave packets that interact with a saddle point, are very similar to the stochastic layer that forms in separatrix chaos in classical systems. In this sense, the phase-space description brings out the similarity between classical chaos and quantum chaos.

Heisenberg’s uncertainty principle is a law of physics – it cannot be violated under any circumstances, no matter how much we may want it to yield or how hard we try to bend it. Heisenberg, as he developed his ideas after his lone epiphany like a monk on the isolated island of Helgoland off the north coast of Germany in 1925, became a bit of a zealot, like a religious convert, convinced that all we can say about reality is a measurement outcome. In his view, there was no independent existence of an electron other than what emerged from a measuring apparatus. Reality, to Heisenberg, was just a list of numbers in a spread sheet—matrix elements. He took this line of reasoning so far that he stated without exception that there could be no such thing as a trajectory in a quantum system. When the great battle commenced between Heisenberg’s matrix mechanics against Schrödinger’s wave mechanics, Heisenberg was relentless, denying any reality to Schrödinger’s wavefunction other than as a calculation tool. He was so strident that even Bohr, who was on Heisenberg’s side in the argument, advised Heisenberg to relent [1]. Eventually a compromise was struck, as Heisenberg’s uncertainty principle allowed Schrödinger’s wave functions to exist within limits—his uncertainty limits.

Disaster in the Poconos

Yet the idea of an actual trajectory of a quantum particle remained a type of heresy within the close quantum circles. Years later in 1948, when a young Richard Feynman took the stage at a conference in the Poconos, he almost sabotaged his career in front of Bohr and Dirac—two of the giants who had invented quantum mechanics—by having the audacity to talk about particle trajectories in spacetime diagrams.

Feynman was making his first presentation of a new approach to quantum mechanics that he had developed based on path integrals. The challenge was that his method relied on space-time graphs in which “unphysical” things were allowed to occur. In fact, unphysical things were required to occur, as part of the sum over many histories of his path integrals. For instance, a key element in the approach was allowing electrons to travel backwards in time as positrons, or a process in which the electron and positron annihilate into a single photon, and then the photon decays back into an electron-positron pair—a process that is not allowed by mass and energy conservation. But this is a possible history that must be added to Feynman’s sum.

It all looked like nonsense to the audience, and the talk quickly derailed. Dirac pestered him with questions that he tried to deflect, but Dirac persisted like a raven. A question was raised about the Pauli exclusion principle, about whether an orbital could have three electrons instead of the required two, and Feynman said that it could—all histories were possible and had to be summed over—an answer that dismayed the audience. Finally, as Feynman was drawing another of his space-time graphs showing electrons as lines, Bohr rose to his feet and asked derisively whether Feynman had forgotten Heisenberg’s uncertainty principle that made it impossible to even talk about an electron trajectory.

It was hopeless. The audience gave up and so did Feynman as the talk just fizzled out. It was a disaster. What had been meant to be Feynman’s crowning achievement and his entry to the highest levels of theoretical physics, had been a terrible embarrassment. He slunk home to Cornell where he sank into one of his depressions. At the close of the Pocono conference, Oppenheimer, the reigning king of physics, former head of the successful Manhattan Project and newly selected to head the prestigious Institute for Advanced Study at Princeton, had been thoroughly disappointed by Feynman.

But what Bohr and Dirac and Oppenheimer had failed to understand was that as long as the duration of an unphysical process was shorter than Planck’s constant divided by the energy difference involved, the process was literally obeying Heisenberg’s uncertainty principle. Furthermore, Feynman’s trajectories, which became his famous “Feynman Diagrams”, were meant to be merely cartoons: a shorthand way to keep track of lots of different contributions to a scattering process. The quantum processes certainly took place in space and time, conceptually like a trajectory, but only so far as time durations, energy differences, locations and momentum changes were all within the bounds of the uncertainty principle. Feynman had invented a bold new tool for quantum field theory, able to supply deep results quickly. But no one at the Poconos could see it.
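To put a number on how brief such an unphysical process has to be, here is a back-of-the-envelope sketch using the energy-time uncertainty estimate Δt ~ ħ/ΔE. The choice of process is mine, made only for illustration: the energy "borrowed" is taken to be the rest energy of an electron-positron pair, and the estimate is good only to an order of magnitude.

```python
HBAR_EV_S = 6.582119569e-16      # reduced Planck constant [eV s]
ELECTRON_REST_MEV = 0.51099895   # electron rest energy [MeV]


def allowed_duration(delta_e_ev):
    """Order-of-magnitude lifetime hbar / delta_E of a virtual process [s]."""
    return HBAR_EV_S / delta_e_ev


# ILLUSTRATIVE choice: energy needed to create an electron-positron pair at rest.
pair_energy_ev = 2.0 * ELECTRON_REST_MEV * 1.0e6
dt = allowed_duration(pair_energy_ev)
print(f"Energy borrowed:  {pair_energy_ev:.3e} eV")
print(f"Allowed duration: {dt:.2e} s")
```

The answer is less than a zeptosecond, which gives a sense of how fleeting the intermediate steps in a Feynman diagram are allowed to be.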

Coherent States

When Feynman had failed so miserably at the Pocono conference, he had taken the stage after Julian Schwinger, who had dazzled everyone with his perfectly scripted presentation of quantum field theory, the competing theory to Feynman’s. Schwinger emerged the clear winner of the contest. At that time, Roy Glauber (1925 – 2018) was a young physicist just taking his PhD from Schwinger at Harvard, and he later received a post-doc position at Princeton’s Institute for Advanced Study where he became part of a miniature revolution in quantum field theory that revolved around, not Schwinger’s difficult mathematics, but Feynman’s diagrammatic method. So Feynman won in the end. Glauber then went on to Caltech, where he filled in for Feynman’s lectures when Feynman was off in Brazil playing the bongos. Glauber eventually returned to Harvard, where he was already thinking about the quantum aspects of photons in 1956 when news of the photon correlations in the Hanbury Brown and Twiss (HBT) experiment was published. Three years later, when the laser was invented, he began developing a theory of photon correlations in laser light that he suspected would be fundamentally different from those in natural chaotic light.

Because of his background in quantum field theory, and especially quantum electrodynamics, it was fairly easy to couch the quantum optical properties of coherent light in terms of Dirac’s creation and annihilation operators of the electromagnetic field. Glauber developed a “coherent state” operator that was a minimum uncertainty state of the quantized electromagnetic field, related to the minimum-uncertainty wave functions derived initially by Schrödinger in the late 1920’s. The coherent state represents a laser operating well above the lasing threshold and behaves as “the most classical” wavepacket that can be constructed. Glauber was awarded the Nobel Prize in Physics in 2005 for his work on such “Glauber states” in quantum optics.

Quantum Trajectories

Glauber’s coherent states are built up from the natural modes of a harmonic oscillator. Therefore, it should come as no surprise that these coherent-state wavefunctions in a harmonic potential behave just like classical particles with well-defined trajectories. The quadratic potential matches the quadratic argument of the Gaussian wavepacket, and the pulses propagate within the potential without broadening, as in Fig. 3, showing a snapshot of two wavepackets propagating in a two-dimensional harmonic potential. This is a somewhat radical situation, because most wavepackets in most potentials (or even in free space) broaden as they propagate. The quadratic potential is a special case that is generally not representative of how quantum systems behave.

To illustrate this special status for the quadratic potential, the wavepackets can be launched in a potential with a quartic perturbation. The quartic potential is anharmonic: the frequency of oscillation depends on the amplitude of oscillation, unlike for the harmonic oscillator, where amplitude and frequency are independent. The quartic potential is integrable, like the harmonic oscillator, and there is no avenue for chaos in the classical analog. Nonetheless, wavepackets broaden as they propagate in the quartic potential, eventually spreading out into a ring in the configuration space, as in Fig. 4.
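The amplitude dependence of the frequency can be seen in a classical one-dimensional stand-in for the quartic problem. This sketch assumes a potential V(x) = x^2/2 + εx^4/4 with unit mass and a hypothetical value of ε; it is not the two-dimensional potential used for the figures. The period integral has a singularity at the turning points, which the substitution x = A sin(φ) removes.

```python
import math

EPSILON = 0.1   # HYPOTHETICAL strength of the quartic term


def period(amplitude, epsilon=EPSILON, n=2000):
    """Period of V(x) = x^2/2 + epsilon*x^4/4 (unit mass) by midpoint quadrature.

    With x = A sin(phi) the period integral becomes
    T = 4 * integral_0^{pi/2} dphi / sqrt(1 + epsilon*A^2*(1 + sin^2(phi))/2).
    """
    h = 0.5 * math.pi / n
    total = 0.0
    for j in range(n):
        s = math.sin((j + 0.5) * h)
        total += 1.0 / math.sqrt(1.0 + 0.5 * epsilon * amplitude**2 * (1.0 + s * s))
    return 4.0 * h * total


for A in (0.5, 1.0, 2.0):
    print(f"A = {A}:  harmonic T = {period(A, 0.0):.4f}   quartic T = {period(A):.4f}")
```

With ε = 0 the period is 2π at every amplitude, while with the quartic term switched on the period shrinks as the amplitude grows. Parts of a wavepacket at slightly different energies therefore drift out of step with each other, which is one way to picture the broadening.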

An integrable potential has as many conserved quantities of the motion as there are degrees of freedom. Because the quartic potential is integrable, the quantum wavefunction may spread, but it remains highly regular, as in the “ring” that eventually forms over time. However, integrable potentials are the exception rather than the rule. Most potentials lead to nonintegrable motion that opens the door to chaos.

A classic (and classical) potential that exhibits chaos in a two-dimensional configuration space is the famous Hénon-Heiles potential. This has a four-dimensional phase space, which admits classical chaos. The potential has a three-fold symmetry, which is one reason it is nonintegrable, since a particle must “decide” which way to go when it approaches a saddle point. In the quantum regime, wavepackets face the same decision, leading to a breakup of the wavepacket on top of a general broadening. This allows the wavefunction eventually to distribute across the entire configuration space, as in Fig. 5.
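The three-fold symmetry and the saddle points can be made concrete with a short sketch. It assumes the standard textbook form of the Hénon-Heiles potential, V(x, y) = (x² + y²)/2 + x²y − y³/3, with unit coupling; the formula is not stated in the text above, and the scaling used in the movies may differ.

```python
import math

def V(x, y):
    """Henon-Heiles potential in its standard dimensionless form (assumed here)."""
    return 0.5 * (x * x + y * y) + x * x * y - y ** 3 / 3

def grad(x, y):
    """Gradient (dV/dx, dV/dy)."""
    return (x + 2 * x * y, y + x * x - y * y)

def hessian_det(x, y):
    """Determinant of the Hessian; a negative value marks a saddle point."""
    return (1 + 2 * y) * (1 - 2 * y) - (2 * x) ** 2

def rotate(x, y, angle):
    """Rotate the point (x, y) about the origin by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

# The three saddle points sit on a unit circle, 120 degrees apart.
SADDLES = [(0.0, 1.0), (math.sqrt(3) / 2, -0.5), (-math.sqrt(3) / 2, -0.5)]

if __name__ == "__main__":
    for sx, sy in SADDLES:
        print(f"saddle ({sx:+.3f}, {sy:+.3f}): V={V(sx, sy):.4f}, "
              f"grad={grad(sx, sy)}, det(Hessian)={hessian_det(sx, sy):.1f}")
    px, py = 0.3, -0.2                            # arbitrary test point
    rx, ry = rotate(px, py, 2 * math.pi / 3)
    print(f"V(p)={V(px, py):.6f}  V(rotated p)={V(rx, ry):.6f}")
```

In this form each saddle has energy 1/6, which is the escape energy of the classical problem, and rotating any point by 120 degrees leaves the potential unchanged. A particle or wavepacket heading toward one of the three saddles is the situation in which the "decision" described above takes place.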

YouTube Video

Movies of quantum trajectories can be viewed at my YouTube channel, Physics Unbound. The answer to the question “Is there a quantum trajectory?” can be seen visually as the movies run: quantum trajectories do exist in a very clear sense under special conditions, especially for coherent states in a harmonic oscillator. The concept of a quantum trajectory also carries over from a classical trajectory when the classical motion is integrable, even when the wavefunction spreads over time. However, for classical systems that display chaotic motion, wavefunctions that begin as coherent states break up into chaotic wavefunctions that fill the accessible configuration space for a given energy. The character of the quantum evolution of coherent states (the most classical of quantum wavefunctions) in these cases reflects the underlying character of chaotic motion in the classical analogs. This process can be seen directly by watching the movies as a wavepacket approaches a saddle point in the potential and is split. Successive splits of the multiple wavepackets as they interact with the saddle points are what eventually distribute the full wavefunction into its chaotic form.

Therefore, the idea of a “quantum trajectory”, so thoroughly dismissed by Heisenberg, remains a phenomenological guide that can help give insight into the behavior of quantum systems—both integrable and chaotic.

As a side note, the laws of quantum physics obey time-reversal symmetry just as the classical equations do. In the third movie of “A Quantum Ballet”, wavefunctions in a double-well potential are tracked in time as they start from coherent states that break up into chaotic wavefunctions. It is like watching entropy in action as an ordered state devolves into a disordered state. At the halfway point of the movie, however, the imaginary part of the wavefunction has its sign flipped, and the dynamics continue. Now the wavefunctions move from disorder into an ordered state, seemingly going against the second law of thermodynamics. Flipping the sign of the imaginary part of the wavefunction at just one instant in time plays the role of a time-reversal operation, and there is no violation of the second law.
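The sign-flip trick can be reproduced in a minimal one-dimensional sketch. This is not the code behind the movie: it assumes ħ = m = 1, a hypothetical double well V(x) = 0.1(x² − 4)², a Gaussian starting packet, and a Crank-Nicolson finite-difference propagator, with all grid parameters chosen purely for illustration.

```python
import cmath
import math

def crank_nicolson_step(psi, V, dx, dt):
    """One step of (1 + i dt H / 2) psi_new = (1 - i dt H / 2) psi_old,
    with H = -(1/2) d^2/dx^2 + V and psi = 0 just outside the grid."""
    n = len(psi)
    off = -1j * dt / (4 * dx * dx)                     # off-diagonal of 1 + i dt H / 2
    diag = [1 + 0.5j * dt * (1 / dx ** 2 + v) for v in V]
    rhs = []
    for j in range(n):
        left = psi[j - 1] if j > 0 else 0j
        right = psi[j + 1] if j < n - 1 else 0j
        h_psi = (1 / dx ** 2 + V[j]) * psi[j] - (left + right) / (2 * dx ** 2)
        rhs.append(psi[j] - 0.5j * dt * h_psi)
    # Thomas algorithm for the tridiagonal system
    cp, dp = [0j] * n, [0j] * n
    cp[0], dp[0] = off / diag[0], rhs[0] / diag[0]
    for j in range(1, n):
        denom = diag[j] - off * cp[j - 1]
        cp[j] = off / denom
        dp[j] = (rhs[j] - off * dp[j - 1]) / denom
    out = [0j] * n
    out[-1] = dp[-1]
    for j in range(n - 2, -1, -1):
        out[j] = dp[j] - cp[j] * out[j + 1]
    return out

def propagate(psi, V, dx, dt, steps):
    for _ in range(steps):
        psi = crank_nicolson_step(psi, V, dx, dt)
    return psi

def run_reversal(n=321, half_width=8.0, dt=0.01, steps=150):
    """Evolve forward, flip the sign of Im(psi), then evolve forward again."""
    dx = 2 * half_width / (n - 1)
    xs = [-half_width + j * dx for j in range(n)]
    V = [0.1 * (x * x - 4.0) ** 2 for x in xs]         # hypothetical double well
    psi0 = [cmath.exp(-(x + 2.0) ** 2 / 2 + 2.0j * x) for x in xs]
    norm = math.sqrt(sum(abs(p) ** 2 for p in psi0) * dx)
    psi0 = [p / norm for p in psi0]
    mid = propagate(psi0, V, dx, dt, steps)
    flipped = [p.conjugate() for p in mid]             # the time-reversal operation
    final = propagate(flipped, V, dx, dt, steps)
    return psi0, mid, final

if __name__ == "__main__":
    psi0, mid, final = run_reversal()
    err = max(abs(f - p.conjugate()) for f, p in zip(final, psi0))
    print(f"max |final - conj(initial)| = {err:.2e}")
```

After the flip, the same forward propagation carries the packet back to the complex conjugate of its starting state: the same probability density with the momentum reversed. Nothing in the dynamics runs backward. Only the state was changed at one instant, which is why the second law is not violated.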