Galileo’s Moons in the History of Science

When Galileo trained his crude telescope on the planet Jupiter, hanging above the horizon in 1610, and observed moons orbiting a planet other than Earth, he set off a quake whose waves have rippled down through the centuries to today.  Never before had such hard evidence supported the Copernican idea of non-Earth-centric orbits, freeing astronomy and cosmology from more than a thousand years of error that shaded how people thought.

The Earth, after all, was not the center of the Universe.

Galileo’s moons—the Galilean Moons: Io, Europa, Ganymede, and Callisto—have drawn our eyes skyward now for over 400 years.  They have been the crucible for numerous scientific discoveries, serving as a test bed for new ideas and new techniques, from the problem of longitude to the speed of light, from the birth of astronomical interferometry to the beginnings of exobiology.  Here is a short history of Galileo’s Moons in the history of physics.

Galileo (1610): Celestial Orbits

In late 1609, Galileo (1564 – 1642) received an unwelcome guest at his home in Padua—his mother.  She was not happy with his mistress, and she was not happy with his chosen profession, but she was happy to tell him so.  By the time she left in early January 1610, he was yearning for something to take his mind off his aggravations, and he happened to point his new 20x telescope in the direction of the planet Jupiter hanging above the horizon [1].  Jupiter appeared as a bright circular spot, but nearby were three little stars all in line with the planet.  The alignment caught his attention, and when he looked again the next night, the positions of the stars had shifted.  On successive nights he saw them shift again, sometimes disappearing into Jupiter’s bright disk.  Several days later he realized that there was a fourth little star that was also behaving the same way.  At first confused, he had a flash of insight—the little stars were orbiting the planet.  He quickly understood that just as the Moon orbited the Earth, these new “Medicean Planets” were orbiting Jupiter.  In March 1610, Galileo published his findings in Sidereus Nuncius (The Starry Messenger).

Page from Galileo’s Starry Messenger showing the positions of the moons of Jupiter

It is rare in the history of science for there not to be a dispute over priority of discovery, and by an odd chance of fate, on the same nights that Galileo was observing the moons of Jupiter with his telescope from Padua, the German astronomer Simon Marius (1573 – 1625) was also observing them through a telescope of his own from Bavaria.  It took Marius four years to publish his observations, long after Galileo’s Sidereus had become a “best seller”, but Marius took the opportunity to claim priority.  When Galileo first learned of this, he called Marius “a poisonous reptile” and “an enemy of all mankind.”  But harsh words don’t settle disputes, and the conflicting claims of both astronomers stood until the early 1900s, when a scientific enquiry looked at the hard evidence.  By that same odd chance of fate that had compelled both men to look in the same direction around the same time, the first notes by Marius in his notebooks were dated to a single day after the first notes by Galileo!  Galileo’s priority survived, but Marius may have had the last laugh.  The eternal names of the “Galilean” moons—Io, Europa, Ganymede and Callisto—were given to them by Marius.

Picard and Cassini (1671):  Longitude

The 1600s were the Age of Commerce for the European nations, which relied almost exclusively on ships and navigation.  While latitude (North-South) was easily determined by measuring the highest angle of the sun above the southern horizon, longitude (East-West) relied on clocks, which were notoriously inaccurate, especially at sea.

The problem of determining longitude at sea is the subject of Dava Sobel’s thrilling book Longitude (Walker, 1995) [2], where she reintroduced the world to what was once the greatest scientific problem of the day.  Because almost all commerce was by ship, the determination of longitude at sea was sometimes the difference between arriving safely in port with a cargo or being shipwrecked.  Galileo knew this, and later in his life he made a proposal to the King of Spain to fund a scheme to use the timings of the eclipses of his moons around Jupiter as a “celestial clock” for ships at sea.  Galileo’s grant proposal went unfunded, but the possibility of using the timings of Jupiter’s moons for geodesy remained open, one which the King of France took advantage of fifty years later.

In 1671 the newly founded Académie des Sciences in Paris funded an expedition to the site of Tycho Brahe’s Uraniborg Observatory on the island of Hven, Denmark, to measure the times of the eclipses of the Galilean moons observed there, to be compared with the times of the same eclipses observed in Paris by Giovanni Cassini (1625 – 1712).  When the leader of the expedition, Jean Picard (1620 – 1682), arrived in Denmark, he engaged the services of a local astronomer, Ole Rømer (1644 – 1710), to help with the observations of over 100 eclipses of the Galilean moon Io by the planet Jupiter.  After the expedition returned to France, Cassini and Rømer calculated the time differences between the observations in Paris and Hven and concluded that Galileo had been correct.  Unfortunately, observing eclipses of the tiny moon from the deck of a ship turned out not to be practical, so this was not the long-sought solution to the problem of longitude, but it contributed to the early science of astrometry (the metrical cousin of astronomy).  It also had an unexpected side effect that forever changed the science of light.

Ole Rømer (1676): The Speed of Light

Although the differences calculated by Cassini and Rømer between the times of the eclipses of the moon Io at Paris and Hven were small, superposed on these differences was a surprisingly large effect that was shared by both observations.  This was a systematic shift in the time of eclipse that grew to a maximum value of 22 minutes half a year after the closest approach of the Earth to Jupiter and then decreased back to the original time after a full year had passed and the Earth and Jupiter were again at their closest approach.  At first Cassini thought the effect might be caused by a finite speed of light, but he backed away from this conclusion because Galileo had concluded that the speed of light was unmeasurably fast, and Cassini did not want to gainsay the old master.

Ole Rømer

Rømer, on the other hand, was less in awe of Galileo’s shadow, and he persisted in his calculations and concluded that the 22-minute shift was caused by the longer distance light had to travel when the Earth was farthest from Jupiter relative to when it was closest.  He presented his results before the Académie in December 1676, where he announced that the speed of light, though very large, was in fact finite.  Unfortunately, Rømer did not have the dimensions of the solar system at his disposal to calculate an actual value for the speed of light, but the Dutch mathematician Christiaan Huygens did.

When Huygens read the proceedings of the Academie in which Rømer had presented his findings, he took what he knew of the radius of Earth’s orbit and the distance to Jupiter and made the first calculation of the speed of light.  He found a value of 220,000 km/second (kilometers did not exist yet, but this is the equivalent of what he calculated).  This value is 26 percent smaller than the true value, but it was the first time a number was given to the finite speed of light—based fundamentally on the Galilean moons. For a popular account of the story of Picard and Rømer and Huygens and the speed of light, see Ref. [3].
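Huygens’s arithmetic is easy to replay with modern numbers (a sketch, not his actual inputs: he had poorer values for the orbital geometry, which is partly why his result came out low). Taking Rømer’s 22-minute delay to be the light travel time across the diameter of Earth’s orbit:

```python
# Rømer's observation: eclipses of Io arrive up to ~22 minutes late when
# the Earth is on the far side of its orbit from Jupiter.  The extra path
# is the diameter of Earth's orbit, so c ≈ diameter / delay.
AU = 1.496e11          # astronomical unit in meters (modern value)
delay = 22 * 60        # Rømer's delay in seconds

c_estimate = 2 * AU / delay
print(f"c ≈ {c_estimate / 1000:.0f} km/s")   # about 227,000 km/s

# The modern value is 299,792 km/s.  The discrepancy is mostly in the
# 22-minute figure itself: the true delay is closer to 16.7 minutes.
```

The exercise shows that the 22-minute number, not the method, was the weak link; with the correct delay the same division gives the speed of light almost exactly.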

Michelson (1891): Astronomical Interferometry

Albert Michelson (1852 – 1931) was the first American to win the Nobel Prize in Physics.  He received the award in 1907 for his work to replace the standard meter, based on a bar of metal housed in Paris, with the much more fundamental wavelength of red light emitted by cadmium atoms.  His work in Paris came on the heels of a new and surprising demonstration of the use of interferometry to measure the size of astronomical objects.

Albert Michelson

The wavelength of light (a millionth of a meter) seems ill-matched to measuring the size of astronomical objects (millions of meters across) that are so far from Earth (hundreds of billions of meters away).  But this is where optical interferometry becomes so important.  Michelson realized that light from a distant object, like a Galilean moon of Jupiter, would retain some partial coherence that could be measured using optical interferometry.  Furthermore, by measuring how the interference depended on the separation of slits placed on the front of a telescope, it would be possible to determine the size of the astronomical object.
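The scales involved can be made concrete with a back-of-the-envelope sketch. The numbers below are nominal modern values that I am assuming for illustration (not Michelson’s own); for a uniform disk of angular diameter θ, the fringe visibility first drops to zero when the slit separation reaches about 1.22 λ/θ:

```python
from math import pi

# Nominal values assumed for illustration: Ganymede's diameter and its
# distance from Earth when Jupiter is near opposition.
diameter = 5.27e6      # meters
distance = 6.3e11      # meters (about 4.2 AU)
wavelength = 550e-9    # meters, middle of the visible band

theta = diameter / distance            # angular diameter in radians
arcsec = theta * (180 / pi) * 3600     # ... and in arcseconds

# For a uniform disk, the fringe visibility first vanishes when the
# slit separation reaches d = 1.22 * lambda / theta.
d_null = 1.22 * wavelength / theta

print(f"angular diameter ≈ {arcsec:.1f} arcsec")
print(f"visibility null at slit separation ≈ {100 * d_null:.0f} cm")
```

A separation of several centimeters fits comfortably across a telescope aperture, which is why the Galilean moons were within reach of this first experiment.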

From left to right: Walter Adams, Albert Michelson, Walther Mayer, Albert Einstein, Max Farrand, and Robert Millikan. Photo taken at Caltech.

In 1891, Michelson traveled to California, where the Lick Observatory was poised high above the fog and dust of agricultural San Jose (a hundred years before San Jose became the capital of high-tech Silicon Valley).  Working with the observatory staff, he was able to make several key observations of the Galilean moons of Jupiter.  These were close enough that their sizes could be estimated (just barely) from conventional telescopes.  Michelson found from his calculations of the interference effects that the sizes of the moons matched the conventional sizes to within reasonable error.  This was the first demonstration of astronomical interferometry, which has since burgeoned into a huge sub-discipline of astronomy—based originally on the Galilean moons [4].

Pioneer (1973 – 1974): The First Tour

Pioneer 10 was launched on March 3, 1972 and made its closest approach to Jupiter on Dec. 3, 1973. Pioneer 11 was launched on April 5, 1973, made its closest approach to Jupiter on Dec. 3, 1974, and later was the first spacecraft to fly by Saturn. The Pioneer spacecraft were the first to leave the solar system (there have now been 5 that have left, or will leave, the solar system). The cameras on the Pioneers were single-pixel instruments that made line-scans as the spacecraft rotated. The point light detector was a Bendix Channeltron photomultiplier detector, which was a vacuum tube device (yes, a vacuum tube!) operating at a single-photon detection efficiency of around 10%. At the time of the system design, this was a state-of-the-art photon detector. The line scanning was sufficient to produce dramatic photographs (after extensive processing) of the giant planets. The much smaller moons were seen only at low resolution, but these were still the first close-ups ever made of Galileo’s moons.

Voyager (1979): The Grand Tour

Voyager 1 was launched on Sept. 5, 1977 and Voyager 2 was launched on August 20, 1977. Although Voyager 1 was launched second, it was the first to reach Jupiter with closest approach on March 5, 1979. Voyager 2 made its closest approach to Jupiter on July 9, 1979.

In the fall of 1979, I had the good fortune to be an undergraduate at Cornell University when Carl Sagan gave an evening public lecture on the Voyager fly-bys, revealing for the first time the amazing photographs of not only Jupiter but also the Galilean Moons. Sitting in the audience listening to Sagan, a grand master of scientific storytelling, made you feel like you were a part of history. I have never been so convinced of the beauty and power of science and technology as I was sitting in the audience that evening.

The camera technology on the Voyagers was a giant leap forward compared to the Pioneer spacecraft. The Voyagers used cathode-ray vidicon cameras, like those used in television cameras of the day, with high-resolution imaging capabilities. The images were spectacular, displaying alien worlds in high-def for the first time in human history: volcanoes and lava flows on the moon of Io; planet-long cracks in the ice-covered surface of Europa; Callisto’s pock-marked surface; Ganymede’s eerie colors.

The Voyagers’ discoveries concerning the Galilean Moons were literally out of this world. Io was discovered to be a molten world, its interior liquified by tidal heating from its nearness to Jupiter, spewing out sulfur lava onto a yellowed terrain pockmarked by hundreds of volcanoes and sporting mountains higher than Mt. Everest. Europa, by contrast, was discovered to have a vast flat surface of frozen ice, with no craters or mountains, yet fractured by planet-scale ruptures stained tan (for unknown reasons) against the white ice. Ganymede, the largest moon in the solar system, is a small planet in its own right, larger than Mercury. The Voyagers revealed that it had a blotchy surface with dark cratered patches interspersed with lighter, smoother patches. Callisto, again by contrast, was found to be the most heavily cratered moon in the solar system, its surface pocked by countless craters.

Galileo (1995): First in Orbit

The first mission to orbit Jupiter was the Galileo spacecraft, which was launched, not from the Earth, but from Earth orbit after being delivered there by the Space Shuttle Atlantis on Oct. 18, 1989. Galileo arrived at Jupiter on Dec. 7, 1995 and was inserted into a highly elliptical orbit that became successively less eccentric on each pass. It orbited Jupiter for 8 years before it was purposely crashed into the planet (to prevent it from accidentally contaminating Europa, which may support some form of life).

Galileo made many close passes to the Galilean Moons, providing exquisite images of the moon surfaces while its other instruments made scientific measurements of mass and composition. This was the first true extended study of Galileo’s Moons, establishing the likely internal structures, including the liquid water ocean lying below the frozen surface of Europa. As the largest body of liquid water outside the Earth, it has been suggested that some form of life could have evolved there (or possibly been seeded by meteor ejecta from Earth).

Juno (2016): Still Flying

The Juno spacecraft was launched from Cape Canaveral on Aug. 5, 2011 and entered a Jupiter polar orbit on July 5, 2016. The mission has been producing high-resolution studies of the planet. The mission was extended in 2021 to last to 2025 and to include several close fly-bys of the Galilean Moons, especially Europa, which will be the object of several upcoming missions because of the possibility that the moon could support life. These future missions include NASA’s Europa Clipper Mission, the ESA’s Jupiter Icy Moons Explorer, and the Io Volcano Observer.

Epilog (2060): Colonization of Callisto

In 2003, NASA identified the moon Callisto as the proposed site of a manned base for the exploration of the outer solar system. It would be the next most distant human base to be established after Mars, with a possible start date by the mid-point of this century. Callisto was chosen because it has a low radiation level (being the farthest from Jupiter of the large moons) and is geologically stable. It also has a composition that could be mined to manufacture rocket fuel. The base would be a short-term way-station (crews would stay for no longer than a month) for refueling before launching and using a gravity assist from Jupiter to sling-shot spaceships to the outer planets.

[1] See Chapter 2, A New Scientist: Introducing Galileo, in David D. Nolte, Galileo Unbound (Oxford University Press, 2018).

[2] Dava Sobel, Longitude: The True Story of a Lone Genius who Solved the Greatest Scientific Problem of his Time (Walker, 1995)

[3] See Chap. 1, Thomas Young Polymath: The Law of Interference, in David D. Nolte, Interference: The History of Optical Interferometry and the Scientists who Tamed Light (Oxford University Press, 2023)

[4] See Chapter 5, Stellar Interference: Measuring the Stars, in David D. Nolte, Interference: The History of Optical Interferometry and the Scientists who Tamed Light (Oxford University Press, 2023).

The Mighty Simplex

There is no greater geometric solid than the simplex.  It is the paragon of efficiency, the pinnacle of symmetry, and the prototype of simplicity.  If the universe were not constructed of continuous coordinates, then surely it would be tiled by tessellations of simplices.

Indeed, simplices, or simplexes, arise in a wide range of geometrical problems and real-world applications.  For instance, metallic alloys are described on a simplex to identify the constituent elements [1].  Zero-sum games in game theory and ecosystems in population dynamics are described on simplexes [2], and the Dantzig simplex algorithm is a central algorithm for optimization in linear programming [3].  Simplexes are also used in nonlinear minimization (the amoeba algorithm) and in classification problems in machine learning, and they even raise their heads in quantum gravity.  These applications reflect the special status of the simplex in the geometry of high dimensions.

… It’s Simplexes all the way down!

The reason for their usefulness is the simplicity of their construction, which guarantees a primitive set that is always convex.  For instance, in any space of d dimensions, the simplest geometric figure that can be constructed of flat faces to enclose a d-volume consists of d+1 points: this is the d-simplex.

Or …

In any space of d-dimensions, the simplex is the geometric figure whose faces are simplexes, whose faces are simplexes, whose faces are again simplexes, and those faces are once more simplexes … And so on. 

In other words, it’s simplexes all the way down.

Simplex Geometry

In this blog, I will restrict the geometry to the regular simplex.  The regular simplex is the queen of simplexes: it is the equilateral simplex for which all vertices are equivalent, and all faces are congruent, and all sub-faces are congruent, and so on.  The regular simplexes have the highest symmetry properties of any polytope. A polytope is the d-dimensional generalization of a polyhedron.  For instance, the regular 2-simplex is the equilateral triangle, and the regular 3-simplex is the equilateral tetrahedron.

The N-simplex is the high-dimensional generalization of the tetrahedron.  It is a regular N-dimensional polytope with N+1 vertexes.  Starting at the bottom and going up, the simplexes are the point (0-simplex), the unit line (1-simplex), the equilateral triangle (2-simplex), the tetrahedron (3-simplex), the pentachoron (4-simplex), the hexateron (5-simplex) and onward.  When drawn on the two-dimensional plane, the simplexes are complete graphs with links connecting every node to every other node.  This dual character of equidistance and completeness gives simplexes their utility. Each node is equivalent, and each is linked to all the others.  Among the N+1 vertices of an N-simplex there are (N+1)•N/2 links and (N+1)•N•(N-1)/6 triangular faces.

Fig. 1  The N-simplex structures from 1-D through 10-D.  Drawn on the 2D plane, the simplexes are complete graphs with links between every node.  The number of vertices is equal to the number of dimensions plus one. (Wikipedia)
Fig. 2 Coulomb-spring visualization of the energy minimization of a 12-simplex (a 12-dimensional tetrahedron). Each node is a charge. Each link is a spring. Beginning as a complete graph on the planar circle, it finds a minimum configuration with 3 internal nodes.

Construction of a d-simplex is recursive:  Begin with a (d-1)-dimensional simplex and add a point along an orthogonal dimension to construct a d-simplex.  For instance, to create a 2-simplex (an equilateral triangle), find the mid-point of the 1-simplex (a line segment)

            Centered 1-simplex:                (-1), (1)    

add a point on the perpendicular that is the same distance from each original vertex as the original vertices were distant from each other     

            Off-centered 2-simplex:         (-1,0), (1,0), (0, sqrt(3))

Then shift the origin to the center of mass of the triangle

            Centered 2-simplex:               (-1, -sqrt(3)/3), (1, -sqrt(3)/3), (0, 2•sqrt(3)/3)

The 2-simplex, i.e., the equilateral triangle, has a 1-simplex as each of its faces.  And each of those 1-simplexes has a 0-simplex as each of its ends.  This recursive construction of ever higher-dimensional simplexes out of lower-dimensional ones provides an interesting pattern:

Fig. 3 The entries are the same numbers that appear in Pascal’s Triangle. (Wikipedia)
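The connection to Pascal’s triangle is direct: an N-simplex has N+1 vertices, and every subset of k+1 vertices spans a k-dimensional face, so the face counts are binomial coefficients. A quick check in Python:

```python
from math import comb

def face_counts(N):
    """Number of k-dimensional faces of an N-simplex, for k = 0 .. N.
    Each k-face is spanned by a choice of k+1 of the N+1 vertices."""
    return [comb(N + 1, k + 1) for k in range(N + 1)]

print(face_counts(2))   # triangle: [3, 3, 1] -> 3 vertices, 3 edges, 1 face
print(face_counts(3))   # tetrahedron: [4, 6, 4, 1]
print(face_counts(9))   # 9-simplex: a row of Pascal's triangle, minus the leading 1
```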

The coordinates of an N-simplex are not unique, and there are several convenient conventions.  One convention defines standard coordinates for an N-simplex using N+1 coordinates.  These coordinates embed the simplex into a space of one higher dimension.  For instance, the standard 2-simplex is defined by the coordinates (001), (010), (100), forming a two-dimensional triangle in three dimensions, with the simplex as a submanifold in the embedding space.  A more efficient coordinate choice matches the coordinate-space dimensionality to the dimensionality of the simplex.  Hence the 10 vertices of a 9-simplex can be defined by 9 coordinates (also not unique).  One choice is given in Fig. 4 for the 1-simplex up to the 9-simplex.

Fig. 4 One possible set of coordinates for the 1-simplex up to the 9-simplex.  The center of mass of the simplex is at the origin, and the edge lengths are equal to 2.

The equations for the simplex coordinates are

            x_i = sqrt(2) • ( e_i – ((1 + 1/sqrt(N+1))/N) • D )        for i = 1 … N

            x_(N+1) = sqrt(2/(N+1)) • D

where the e_i are the Cartesian unit vectors and D = (1, 1, … , 1) is the “diagonal” vector.  These coordinates are centered on the center of mass of the simplex, and the links all have length equal to 2, which can be rescaled by a multiplying factor.  The cosine of the angle between any two of the position vectors of an N-simplex is

            cos θ = –1/N

For moderate to high dimensionality, the position vectors of the simplex vertices are therefore pseudo-orthogonal.  For instance, for N = 9 the cosine is -1/9 = -0.111.  For higher dimensions, the simplex position vectors become asymptotically orthogonal.  Such orthogonality is an important feature for orthonormal decomposition of class superpositions, for instance of overlapping images.
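One standard construction (a sketch; coordinate choices are not unique) builds the N-simplex vertices from the Cartesian unit vectors and the diagonal vector, centered on the center of mass with edge lengths equal to 2, and the claimed properties can be verified numerically:

```python
from itertools import combinations
from math import sqrt

def simplex_vertices(N):
    """Vertices of a regular N-simplex in N dimensions, centered on the
    center of mass, with all edge lengths equal to 2."""
    s = sqrt(2.0)
    beta = (1.0 + 1.0 / sqrt(N + 1)) / N
    verts = []
    for i in range(N):
        v = [-s * beta] * N        # sqrt(2) * (e_i - beta * D)
        v[i] += s
        verts.append(v)
    verts.append([s / sqrt(N + 1)] * N)   # sqrt(2/(N+1)) * D
    return verts

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

N = 9
verts = simplex_vertices(N)
diffs = [[a - b for a, b in zip(u, v)] for u, v in combinations(verts, 2)]
edges = [sqrt(dot(d, d)) for d in diffs]
cosine = dot(verts[0], verts[1]) / dot(verts[0], verts[0])

print(max(abs(e - 2.0) for e in edges))   # ~0: every link has length 2
print(cosine)                             # -1/9 for the 9-simplex
```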

Alloy Mixtures and Barycentric Coordinates

For linear systems, the orthonormality of basis representations is one of the most powerful features for system analysis in terms of superposition of normal modes.  Neural networks, on the other hand, are intrinsically nonlinear decision systems for which linear superposition does not hold inside the network, even if the symbols presented to the network are orthonormal superpositions.  This loss of orthonormality in deep networks can be partially retrieved by selecting the Simplex code.  It has pseudo-orthogonal probability distribution functions located on the vertices of the simplex.  There is an additional advantage to using the Simplex code: by using so-called barycentric coordinates, the simplex vertices can be expressed as independent bases.  An example for the 2-simplex is shown in Fig. 5.  The x-y Cartesian coordinates of the vertices (using tensor index notation) are given by (S11, S12), (S21, S22), and (S31, S32).  Any point (x1, x2) on the plane can be expressed as a linear combination of the three vertices with barycentric coordinates (v1, v2, v3) by solving for these three coefficients from the equation

            xj = vi Sij        (summed over i, together with v1 + v2 + v3 = 1)

using Cramer’s rule.  For instance, the three vertices of the simplex are expressed using the 3-component barycentric coordinates (1,0,0), (0,1,0) and (0,0,1).  The mid-points on the edges have barycentric coordinates (1/2,1/2,0), (0,1/2,1/2), and (1/2,0,1/2).  The centroid of the simplex has barycentric coordinates (1/3,1/3,1/3).  Barycentric coordinates on a simplex are commonly used in phase diagrams of alloy systems in materials science. The simplex can also be used to identify crystallographic directions in three dimensions, as in Fig. 6.
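As a concrete sketch, here is the Cartesian-to-barycentric conversion for a 2-simplex, with Cramer’s rule reduced to closed form after eliminating v3 = 1 − v1 − v2 (the triangle below is an arbitrary example):

```python
def barycentric(p, verts):
    """Barycentric coordinates (v1, v2, v3) of point p = (x, y) with
    respect to triangle vertices verts = [(x1,y1), (x2,y2), (x3,y3)],
    solving x_j = v_i S_ij with v1 + v2 + v3 = 1 by Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    x, y = p
    # determinant of the 2x2 system left after eliminating v3
    det = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    v1 = ((x - x3) * (y2 - y3) - (x2 - x3) * (y - y3)) / det
    v2 = ((x1 - x3) * (y - y3) - (x - x3) * (y1 - y3)) / det
    return (v1, v2, 1.0 - v1 - v2)

tri = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
print(barycentric((0.0, 0.0), tri))      # vertex 1 -> (1, 0, 0)
print(barycentric((0.75, 0.5), tri))     # midpoint of edge 2-3 -> (0, 1/2, 1/2)
print(barycentric((0.5, 1.0 / 3.0), tri))  # centroid -> (1/3, 1/3, 1/3)
```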

Fig. 5  Barycentric coordinates on the 2-Simplex.  The vertices represent “orthogonal” pure symbols.  Superpositions of 2 symbols lie on the edges.  Any point on the simplex can be represented using barycentric coordinates with three indices corresponding to the mixture of the three symbols.
Fig. 6 Crystallographic orientations expressed on a simplex. From A Treatise on Crystallography, William Miller, Cambridge (1839)

Replicator Dynamics on the Simplex

Ecosystems are among the most complex systems on Earth.  The complex interactions among hundreds or thousands of species may lead to steady homeostasis in some cases, to growth and collapse in other cases, and to oscillations or chaos in yet others.  But the definition of species can be broad and abstract, referring to businesses and markets in economic ecosystems, or to cliques and acquaintances in social ecosystems, among many other examples.  These systems are governed by the laws of evolutionary dynamics, which include fitness and survival as well as adaptation. The dimensionality of the dynamical spaces for these systems extends to hundreds or thousands of dimensions—far too complex to visualize when thinking even in four dimensions is already challenging.

A classic model of interacting species is the replicator equation. It allows for a fitness-based proliferation and for trade-offs among the individual species. The replicator dynamics equations are shown in Fig. 7.

Fig. 7 Replicator dynamics has a surprisingly simple form, but with surprisingly complicated behavior. The key elements are the fitness and the payoff matrix. The fitness relates to how likely the species will survive. The payoff matrix describes how one species gains at the loss of another (although symbiotic relationships also occur).
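A minimal numerical sketch of replicator dynamics uses the standard form dx_i/dt = x_i (f_i − φ), with fitness f = A·x and mean fitness φ = x·f. The example below is my own toy case, not the program used to generate Fig. 8; the antisymmetric pay-off matrix is chosen because it produces a center:

```python
def replicator_step(x, A, dt):
    """One Euler step of dx_i/dt = x_i * (f_i - phi), f = A x, phi = x . f."""
    n = len(x)
    f = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    phi = sum(xi * fi for xi, fi in zip(x, f))
    x_new = [xi + dt * xi * (fi - phi) for xi, fi in zip(x, f)]
    s = sum(x_new)                 # renormalize to stay on the simplex
    return [xi / s for xi in x_new]

# Antisymmetric (rock-paper-scissors) pay-off matrix -> orbits around
# the centroid of the simplex.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

x = [0.5, 0.3, 0.2]
for _ in range(5000):
    x = replicator_step(x, A, 0.01)

print(x, sum(x))   # the populations remain on the simplex: they sum to 1
```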

The population dynamics on the 2D simplex are shown in Fig. 8 for several different pay-off matrices (the square matrix to the upper left of each simplex). The matrix values are shown in color and help interpret the trajectories. For instance, the simplex on the upper right shows a center fixed point. This reflects the antisymmetric character of the pay-off matrix around the diagonal. The stable spiral on the lower left has a nearly antisymmetric pay-off matrix, but with unequal off-diagonal magnitudes. The other two cases show central saddle points with stable fixed points on the boundary. A large variety of behaviors are possible for this very simple system. The Python program can be found in

Fig. 8 Payoff matrix and population simplex for four random cases: Upper left is an unstable saddle. Upper right is a center. Lower left is a stable spiral. Lower right is a marginal case.

Linear Programming with the Dantzig Simplex

There is a large set of optimization problems in which a linear objective function is to be minimized subject to a set of inequalities. This is known as “Linear Programming”. These LP systems can be expressed as

            minimize  f = cj xj        (summed over j)

            subject to  Aij xj ≤ bi  and  xj ≥ 0

The vector index j goes from 1 to d, the dimension of the space. Each inequality constraint defines a hyperplane; in three dimensions, for instance, two such hyperplanes intersect along a line terminated at each end by a vertex point. The set of vertexes defines a convex polytope in d-dimensions, and each face of the polytope, when combined with a point at the origin, defines a simplex.

It is easy to visualize in lower dimensions why the linear objective function must have an absolute minimum at one of the vertexes of the polytope. And finding that minimum is a trivial exercise: Start at any vertex. Poll each neighboring vertex and move to the one that has the lowest value of the objective function. Repeat until the current vertex has a lower objective value than any neighbors. Because of the linearity of the objective function, this is a unique minimum (except for rare cases of accidental degeneracy). This iterative algorithm defines a walk on the vertexes of the polytope.

The question arises, why not just evaluate the objective function at each vertex and then pick the vertex with the lowest value? The answer in high dimensions is that there are too many vertexes, and finding all of them is inefficient. If there are N vertexes, the walk to the solution typically visits only a few of them, on the order of log(N). The algorithm therefore scales roughly as log(N) in practice, much like a search tree.
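The contrast can be sketched in two dimensions, where brute force is still feasible: intersect every pair of constraint lines, keep the feasible intersections (the polytope’s vertices), and take the minimum. The toy LP below is an assumed example; the point is that the number of candidate intersections grows combinatorially with the number of constraints, which is exactly the enumeration the vertex walk avoids:

```python
from itertools import combinations

# Toy LP (assumed for illustration): minimize c . x subject to A x <= b.
# Constraints: x + y <= 4, x <= 3, y <= 3, x >= 0, y >= 0.
c = (-1.0, -2.0)      # i.e., maximize x + 2y, written as a minimization
A = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
b = [4.0, 3.0, 3.0, 0.0, 0.0]

def intersect(a1, b1, a2, b2):
    """Intersection of lines a1 . x = b1 and a2 . x = b2 (None if parallel)."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        return None
    return ((b1 * a2[1] - b2 * a1[1]) / det, (a1[0] * b2 - a2[0] * b1) / det)

def feasible(x, eps=1e-9):
    return all(ai[0] * x[0] + ai[1] * x[1] <= bi + eps for ai, bi in zip(A, b))

# Brute force: every pair of constraint boundaries is a candidate vertex.
vertices = [p for (a1, b1), (a2, b2) in combinations(zip(A, b), 2)
            if (p := intersect(a1, b1, a2, b2)) is not None and feasible(p)]
best = min(vertices, key=lambda x: c[0] * x[0] + c[1] * x[1])
print(best)   # the optimum sits at a vertex of the feasible polygon: (1, 3)
```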

Fig. 9 Dantzig simplex approach on a convex 3D space of basic solutions in a linear programming problem.

The simplex algorithm was devised by George Dantzig (1914 – 2005) in the 1940s. Dantzig is also the subject of a famous legend from 1939, when he was a graduate student at UC Berkeley. He arrived late to class and saw two problems written on the chalk board. He assumed that these were homework assignments, so he wrote them down and worked on them over the following week. He recalled that they seemed a bit harder than usual, but he eventually solved them and turned them in. A few weeks later, his very excited professor approached him and told him that the problems weren’t homework–they were two famous unsolved problems in statistics, and Dantzig had just solved them! The 1997 movie Good Will Hunting, with Matt Damon, Ben Affleck, and Robin Williams, borrowed this story for the opening scene.

The Amoeba Simplex Crawling through Hyperspace

Unlike linear programming problems with linear objective functions, multidimensional minimization of nonlinear objective functions is an art unto itself, with many approaches. One of these is a visually compelling algorithm that does the trick more often than not. This is the so-called amoeba algorithm, which shares much in common with the Dantzig simplex approach to linear programming, but instead of a set of fixed simplex coordinates, it uses a constantly shifting d-dimensional simplex that “crawls” over the objective function, seeking its minimum.

One of the best descriptions of the amoeba simplex algorithm is in Numerical Recipes [4], which describes the crawling simplex as follows:

When it reaches a “valley floor”, the method contracts itself in the transverse direction and tries to ooze down the valley. If there is a situation where the simplex is trying to “pass through the eye of a needle”, it contracts itself in all directions, pulling itself in around its lowest (best) point.

(From Press, Numerical Recipes, Cambridge)

The basic operations for the crawling simplex are reflection and scaling. For a given evaluation of all the vertexes of the simplex, one will have the highest value and another the lowest. In a reflection, the highest point is reflected through the (d-1)-dimensional face defined by the other d vertexes. After reflection, if the new evaluation is lower than the former lowest value, then the reflected point is expanded further. If, on the other hand, it is only a little better than it was before reflection, then the point is contracted. The expansion and contraction are what allow the algorithm to slide through valleys or shrink to pass through the eye of a needle.
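A bare-bones version of these moves can be sketched as follows (a simplified Nelder–Mead, not the full published method or the Numerical Recipes implementation):

```python
def amoeba(f, x0, step=0.5, tol=1e-9, max_iter=1000):
    """Minimal Nelder-Mead sketch: reflect the worst vertex through the
    opposite face, expand if that helped a lot, contract or shrink if not."""
    n = len(x0)
    # initial simplex: x0 plus one point displaced along each axis
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # centroid of the face opposite the worst vertex
        cen = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        refl = [2 * cen[i] - worst[i] for i in range(n)]
        if f(refl) < f(best):
            # reflection was a big win: try going twice as far (expansion)
            exp = [3 * cen[i] - 2 * worst[i] for i in range(n)]
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            # contraction: pull the worst vertex halfway toward the centroid
            contr = [(cen[i] + worst[i]) / 2 for i in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                # last resort: shrink the whole simplex toward the best vertex
                simplex = [best] + [[(p[i] + best[i]) / 2 for i in range(n)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)

# Usage: minimize a simple bowl; the simplex crawls to (1, 2)
xmin = amoeba(lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2, [0.0, 0.0])
print(xmin)
```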

The amoeba algorithm was developed by John Nelder and Roger Mead in 1965, at a time when computing power was very limited. The algorithm works great as a first pass at a minimization problem, and it almost always works for moderately small dimensions, but for very high dimensions there are more powerful optimization algorithms today, built into deep-learning software environments like TensorFlow and into MATLAB toolboxes.

[1] M. Hillert, Phase Equilibria, Phase Diagrams and Phase Transformations: Their Thermodynamic Basis, 2nd ed. (Cambridge University Press, Cambridge, UK, 2008).

[2] P. Schuster, K. Sigmund, Replicator Dynamics. Journal of Theoretical Biology 100, 533-538 (1983); P. Godfrey-Smith, The replicator in retrospect. Biology & Philosophy 15, 403-423 (2000).

[3] R. E. Stone, C. A. Tovey, The Simplex and Projective Scaling Algorithms as Iteratively Reweighted Least-squares Methods. Siam Review 33, 220-237 (1991).

[4] W. H. Press, Numerical Recipes in C++: The Art of Scientific Computing, 2nd ed. (Cambridge University Press, Cambridge, UK, 2002).

From Coal and Steam to ChatGPT: Chapters in the History of Technology

Mark Twain once famously wrote in a letter from London to a New York newspaper editor:

“I have … heard on good authority that I was dead [but] the report of my death was an exaggeration.”

The same may be true of recent reports on the grave illness and possible impending death of human culture at the hands of ChatGPT and other so-called Large Language Models (LLMs).  It is argued that these algorithms have such sophisticated access to the bulk of human knowledge, and can write with apparent authority on virtually any topic, that no one needs to learn or create anything new. It can all be recycled—the end of human culture!

While there may be a kernel of truth to these reports, they are premature.  ChatGPT is just the latest in a continuing string of advances that have disrupted human life and human culture ever since the invention of the steam engine.  We—humans, that is—weathered the steam engine in the short term and are just as likely to weather the LLMs.

ChatGPT: What is it?

For all the hype, ChatGPT is mainly just a very sophisticated statistical language model (SLM). 

To start with a very simple example of an SLM, imagine you are playing a word-scramble game and have the letter “Q”. You can be pretty certain that the “Q” will be followed by a “U” to make “QU”.  Or if you have the initial pair “TH”, there is a very high probability that it will be followed by a vowel, as in “THA…”, “THE…”, “THI…”, “THO…” or “THU…”, and possibly by an “R”, as in “THR…”.  This almost exhausts the possibilities.  It is all determined by the statistical properties of English.

Statistical language models build probability distributions for the likelihood that some sequence of letters will be followed by another sequence of letters, or that a sequence of words (and punctuation) will be followed by another sequence of words.  As the chains of letters and words get longer, the number of possible permutations grows exponentially.  This is why SLMs usually stop at some moderate order of statistics.  If you build sentences from such a model, they sound OK for a sentence or two, but then the text drifts around as if it were dreaming or hallucinating in a stream of consciousness, without any coherence.
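A character-level model of this kind takes only a few lines of Python. This is my own toy sketch, with a deliberately tiny corpus, just to show the mechanics of sampling from observed follower counts.

```python
import random
from collections import Counter, defaultdict

def train_ngram(text, order=3):
    """Count which character follows each (order)-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length=60):
    """Sample each next character in proportion to its observed frequency."""
    order = len(seed)
    out = seed
    for _ in range(length):
        counts = model.get(out[-order:])
        if not counts:          # context never seen: stop
            break
        chars, weights = zip(*counts.items())
        out += random.choices(chars, weights=weights)[0]
    return out

corpus = "the quick brown fox jumps over the lazy dog. " * 20
model = train_ngram(corpus, order=3)
print(generate(model, "the"))
```

With a realistic corpus and a low order, the output is locally plausible but drifts, exactly the behavior described above.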

ChatGPT works in much the same way.  It just extends the length of the sequences where it sounds coherent up to a paragraph or two.  In this sense, it is no more “intelligent” than the SLM that follows “Q” with “U”.  ChatGPT simply sustains the charade longer.

Now the details of how ChatGPT accomplishes this charade are nothing less than revolutionary.  The acronym GPT stands for Generative Pre-trained Transformer.  Transformers were a new type of neural-net architecture invented in 2017 by the Google Brain team.  Transformers removed the need to feed sentences word-by-word into a neural net, instead allowing whole sentences and even whole paragraphs to be input in parallel.  Then, by feeding the transformers more than a terabyte of textual data from the web, they absorbed the vast output of virtually all the crowd-sourced information from the past 20 years.  (This is what transformed the model from an SLM to an LLM.)  Finally, using humans to score what good answers looked like versus bad answers, ChatGPT was trained to provide human-like responses.  The result is a chatbot that in any practical sense passes the Turing Test—if you query it for an extended period of time, you would be hard pressed to decide whether it was a computer program or a human giving you the answers.  But Turing Tests are boring and not very useful.
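The heart of the transformer is the attention operation, which is what lets every token in a sequence look at every other token in parallel. The NumPy sketch below shows scaled dot-product attention on random vectors; it is an illustration of the operation, not code from any actual GPT model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each output row is a weighted mix of V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of every query to every key
    return softmax(scores) @ V        # attention weights applied to the values

# 4 tokens with 8-dimensional embeddings (toy numbers, not a trained model)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = attention(X, X, X)              # self-attention: Q = K = V = X
print(out.shape)                      # (4, 8)
```

In a real transformer this operation is repeated across many heads and layers, with learned projections producing Q, K and V, but the all-pairs parallelism is the same.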

Figure. The Transformer architecture broken into the training step and the generation step. In training, pairs of inputs and targets are used to train encoders and decoders to build up word probabilities at the output. In generation, a partial input, or a query, is presented to the decoders that find the most likely missing, or next, word in the sequence. The sentence is built up sequentially in each iteration. It is an important distinction that this is not a look-up table … it is trained on huge amounts of data and learns statistical likelihoods, not exact sequences.

The true value of ChatGPT is the access it has to that vast wealth of information (note it is information and not knowledge).  Give it almost any moderately technical query, and it will provide a coherent summary for you—on amazingly esoteric topics—because almost every esoteric topic has found its way onto the net by now, and ChatGPT can find it. 

As a form of search engine, this is tremendous!  Think how frustrating it has always been searching the web for something specific.  Furthermore, the lengthened coherence made possible by the transformer neural net means that a first query that leads to an unsatisfactory answer from the chatbot can be refined, and ChatGPT will find a “better” response, conditioned by the statistics of its first response that was not optimal.  In a feedback cycle, with the user in the loop, very specific information can be isolated.

Or imagine that you are not a strong writer, or don’t know the English language as well as you would like.  By entering your own text, you can ask ChatGPT to copy-edit it, even rephrasing your writing where necessary, because ChatGPT above all else has an unequaled command of the structure of English.

Or, for customer service, instead of the frustratingly discrete menu of 5 or 10 potted topics, ChatGPT with a voice synthesizer could respond to continuously finely graded nuances of the customer’s problem—not with any understanding or intelligence, but with probabilistic likelihoods of what the solutions are for a broad range of possible customer problems.

In the midst of all the hype surrounding ChatGPT, it is important to keep in mind two things:  First, we are witnessing the beginning of a revolution and a disruptive technology that will change how we live.  Second, it is still very early days, just like the early days of the first steam engines running on coal.

Disruptive Technology

Disruptive technologies are the coin of the high-tech realm of Silicon Valley.  But this is nothing new.  There have always been disruptive technologies—all the way back to Thomas Newcomen and James Watt and the steam engines they developed between 1712 and 1776 in England.  At first, steam engines were so crude they were used only to drain water from mines, increasing the number of jobs in and around the copper and tin mines of Cornwall (viz. the popular BBC series Poldark) and the coal mines of northern England.  But over the next 50 years, steam engines improved, and they became the power source for textile factories that displaced the cottage industry of spinning and weaving that had sustained marginal farms for centuries before.

There is a pattern to a disruptive technology.  It not only disrupts an existing economic model, but it displaces human workers.  Once-plentiful jobs in an economic sector can vanish quickly after the introduction of the new technology.  The change can happen so fast that there is not enough time for the workforce to adapt, and human misery follows in some sectors.  Yet other, newer sectors always flourish, with new jobs, new opportunities, and new wealth.  The displaced workers often never see these benefits because they lack the skills for the new jobs.

The same is likely true for the LLMs and the new market models they will launch. There will be a wealth of new jobs curating and editing LLM outputs. There will also be new jobs in the generation of annotated data and in the technical fields surrounding the support of LLMs. LLMs are incredibly hungry for high-quality annotated data in a form best provided by humans. Jobs unlikely to be at risk, despite prophecies of doom, include teachers who can use ChatGPT as an aide by providing appropriate context to its answers. Conversely, jobs that require a human to assemble information will likely disappear, such as news aggregators. The same will be true of jobs in which effort is repeated, or which follow a set of patterns, such as some computer coding jobs or data analysts. Customer service positions will continue to erode, as will library services. Media jobs are at risk, as well as technical writing. The writing of legal briefs may be taken over by LLMs, along with market and financial analysis. By some estimates, there are 300 million jobs around the world that will be impacted one way or another by the coming spectrum of LLMs.

This pattern of disruption is so set and so clear and so consistent that forward-looking politicians or city and state planners could plan ahead, because we have been on a path of continuing waves of disruption for over two hundred years.

Waves of Disruption

In the history of technology, it is common to describe a series of revolutions as if they were distinct.  The list looks something like this:

First:          Power (The Industrial Revolution: 1760 – 1840)

Second:     Electricity and Connectivity (Technological Revolution: 1860 – 1920)

Third:        Automation, Information, Cybernetics (Digital Revolution: 1950 – )

Fourth:      Intelligence, cyber-physical (Imagination Revolution: 2010 – )

The first revolution revolved around steam power fueled by coal, radically increasing the output of goods.  The second shifted to electrical technologies, including communication networks through the telegraph and the telephone.  The third focused on automation and digital information.

Yet this discrete list belies an underlying fact:  There is, and has been, only one continuous Industrial Revolution punctuated by waves.

The Age of Industrial Revolutions began around 1760 with the invention of the spinning jenny by James Hargreaves—and that Age has continued, almost without pause, up to today and will go beyond.  Each disruptive technology has displaced the last.  Each newly trained workforce has been displaced by the last.  The waves keep coming. 

Note that the fourth wave is happening now, as artificial intelligence matures. There is some irony here, because this latest wave of the Industrial Revolution is referred to as the “Imagination Revolution” by optimists who believe we are moving into a period where human creativity is unleashed by the unlimited resources of human connectivity across the web. Yet this moment of human ascension to the heights of creativity is arriving at just the moment when LLMs threaten to remove the need to create anything new.

So is it the end of human culture? Will all knowledge now just be recycled with nothing new added?

A Post-Human Future?

The limitations of the generative side of ChatGPT might best be visualized using an image-based generative algorithm that has also gotten a lot of attention lately. This is the ability to input a photograph along with a Van Gogh painting, and create a new painting of the photograph in the style of Van Gogh.

In this example, the output on the right looks like a Van Gogh painting. It is even recognizable as a Van Gogh. But in fact it is a parody. Van Gogh consciously created something never before seen by humans.

Even if an algorithm can create “new” art, it is a type of “found” art, like a picturesque stone formation or a sunset. The beauty becomes real only in the response it elicits in the human viewer. Art and beauty do not exist by themselves; they only exist in relationship to the internal state of the conscious observer, like a text or symbol signifying to an interpreter. The interpreter is human, even if the artist is not.

ChatGPT, or any LLM like Google’s Bard, can generate original text, but its value only resides in the human response to it. The human interpreter can actually add value to the LLM text by “finding” sections that are interesting or new, or that inspire new thoughts in the interpreter. The interpreter can also “edit” the text, to bring it in line with their aesthetic values. This way, the LLM becomes a tool for discovery. It cannot “discover” anything on its own, but it can present information to a human interpreter who can mold it into something that they recognize as new. From a semiotic perspective, the LLM can create the signifier, but the signified is only made real by the Human interpreter—emphasize Human.

Therefore, ChatGPT and the LLMs become part of the Fourth Wave of the human Industrial Revolution rather than replacing it.

We are moving into an exciting time in the history of technology, giving us a rare opportunity to watch as the newest wave of revolution takes shape before our very eyes. That said, just as the long-term consequences of the steam engine are only now coming home to roost two hundred years later in the form of threats to our global climate, the long-run effects of ChatGPT may be hard to divine until far in the future, maybe only after it’s too late, so a little caution now would be prudent.



Francois Arago and the Birth of Optical Interferometry

An excerpt from the upcoming book “Interference: The History of Optical Interferometry and the Scientists who Tamed Light” describes how a handful of 19th-century scientists laid the groundwork for one of the key tools of modern optics. Published in Optics and Photonics News, March 2023.

François Arago rose to the highest levels of French science and politics. Along the way, he met Augustin Fresnel and, together, they changed the course of optical science.

Link to OPN Article

New from Oxford Press: The History of Optical Interferometry (Late Summer 2023)

A Short History of Hyperspace

Hyperspace by any other name would sound as sweet, conjuring to the mind’s eye images of hypercubes and tesseracts, manifolds and wormholes, Klein bottles and Calabi-Yau quintics.  Forget the dimension of time—that may be the most mysterious of all—and consider instead the extra spatial dimensions that challenge the mind and open the door to dreams of going beyond the bounds of today’s physics.

The geometry of n dimensions studies reality; no one doubts that. Bodies in hyperspace are subject to precise definition, just like bodies in ordinary space; and while we cannot draw pictures of them, we can imagine and study them.

(Henri Poincaré, 1895)

Here is a short history of hyperspace.  It begins with advances by Möbius and Liouville and Jacobi who never truly realized what they had invented, until Cayley and Grassmann and Riemann made it explicit.  They opened Pandora’s box, and multiple dimensions burst upon the world never to be put back again, giving us today the manifolds of string theory and infinite-dimensional Hilbert spaces.

August Möbius (1827)

Although he is most famous for the single-surface strip that bears his name, one of the early contributions of August Möbius was the idea of barycentric coordinates [1], for instance using three coordinates to express the locations of points in a two-dimensional simplex—the triangle. Barycentric coordinates are used routinely today in metallurgy to describe the composition of ternary alloys.

August Möbius (1790 – 1868).

Möbius’ work was one of the first to hint that tuples of numbers could stand in for higher-dimensional space, and his coordinates were an early example of the homogeneous coordinates later used for higher-dimensional representations. However, he came too early to use any language of multidimensional geometry.
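As a concrete illustration in modern notation (not Möbius’ own), the three barycentric coordinates of a point in a triangle can be found by solving a small linear system; the vertex values and function name below are my own choices.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Solve p = la*a + lb*b + lc*c subject to la + lb + lc = 1."""
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]])
    la, lb = np.linalg.solve(T, np.array([p[0] - c[0], p[1] - c[1]]))
    return float(la), float(lb), float(1.0 - la - lb)

# The centroid of any triangle has barycentric coordinates (1/3, 1/3, 1/3)
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, 1.0)
centroid = ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
print(barycentric(centroid, a, b, c))   # ≈ (0.333, 0.333, 0.333)
```

In a ternary phase diagram the three coordinates play the role of the three alloy fractions, which is why they sum to one.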

Carl Jacobi (1834)

Carl Jacobi was a master at manipulating multiple variables, leading to his development of the theory of matrices. In this context, he came to study (n-1)-fold integrals over multiple continuous-valued variables. From our modern viewpoint, he was evaluating surface integrals of hyperspheres.

Carl Gustav Jacob Jacobi (1804 – 1851)

In 1834, Jacobi found explicit solutions to these integrals and published them in a paper with the imposing title “De binis quibuslibet functionibus homogeneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium” [2]. The resulting (n-1)-fold integrals are

S_{n-1} = \frac{2\pi^{n/2}}{(n/2 - 1)!}\, r^{n-1} \qquad \text{or} \qquad S_{n-1} = \frac{2^{(n+1)/2}\,\pi^{(n-1)/2}}{(n-2)!!}\, r^{n-1}

when the space dimension n is even or odd, respectively. These are the surface areas of the manifolds called (n-1)-spheres in n-dimensional space. For instance, the 2-sphere is the ordinary surface 4πr² of a sphere in our 3D space.
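The even and odd cases combine into a single modern formula using the Gamma function, S_{n-1} = 2π^{n/2} r^{n-1} / Γ(n/2), which is easy to check numerically:

```python
from math import pi, gamma

def sphere_surface_area(n, r=1.0):
    """Surface area of the (n-1)-sphere bounding a ball in n-dimensional space."""
    return 2 * pi**(n / 2) / gamma(n / 2) * r**(n - 1)

print(sphere_surface_area(2))   # 2*pi*r: circumference of a circle
print(sphere_surface_area(3))   # 4*pi*r^2: the ordinary sphere
print(sphere_surface_area(4))   # 2*pi^2*r^3: the 3-sphere
```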

Despite the fact that we recognize these as surface areas of hyperspheres, Jacobi used no geometric language in his paper. He was still too early, and mathematicians had not yet woken up to the analogy of extending spatial dimensions beyond 3D.

Joseph Liouville (1838)

Joseph Liouville’s name is attached to a theorem that lies at the core of mechanical systems—Liouville’s Theorem that proves that volumes in high-dimensional phase space are incompressible. Surprisingly, Liouville had no conception of high dimensional space, to say nothing of abstract phase space. The story of the convoluted path that led Liouville’s name to be attached to his theorem is told in Chapter 6, “The Tangled Tale of Phase Space”, in Galileo Unbound (Oxford University Press, 2018).

Joseph Liouville (1809 – 1882)

Nonetheless, Liouville did publish a pure-mathematics paper in 1838 in Crelle’s Journal [3] that identified an invariant quantity that stayed constant during the differential change of multiple variables when certain criteria were satisfied. It was only later that Jacobi, as he was developing a new mechanical theory based on William R. Hamilton’s work, realized that the criteria needed for Liouville’s invariant quantity to hold were satisfied by conservative mechanical systems. Even then, neither Liouville nor Jacobi used the language of multidimensional geometry, but that was about to change in a quick succession of papers and books by three mathematicians who, unknown to each other, were all thinking along the same lines.

Facsimile of Liouville’s 1838 paper on invariants

Arthur Cayley (1843)

Arthur Cayley was the first to take the bold step of calling the emerging geometry of multiple variables actual space. His seminal paper “Chapters in the Analytic Theory of n-Dimensions” was published in 1843 in the Philosophical Magazine [4]. Here, for the first time, Cayley recognized that the domain of multiple variables behaved identically to multidimensional space. The paper was mostly analysis rather than geometry and used little geometric language, but his bold declaration of spaces of n dimensions opened the door to a changing mindset that would soon sweep through geometric reasoning.

Arthur Cayley (1821 – 1895).

Hermann Grassmann (1844)

Grassmann’s life story, although not overly tragic, was beset by lifelong setbacks and frustrations. He was a mathematician 30 years ahead of his time, but because he was merely a high-school teacher, no one took his ideas seriously.

Somehow, in nearly a complete vacuum, disconnected from the professional mathematicians of his day, he devised an entirely new type of algebra that allowed geometric objects to have orientation. These could be combined in numerous different ways obeying numerous different laws. The simplest elements were just numbers, but these could be extended to arbitrary complexity with an arbitrary number of elements. He called his theory a theory of “Extension”, and he self-published a thick and difficult tome that contained all of his ideas [5]. He tried to enlist Möbius to help disseminate his ideas, but even Möbius could not recognize what Grassmann had achieved.

In fact, what Grassmann did achieve was vector algebra of arbitrarily high dimension. Perhaps more impressive for the time is that he actually recognized what he was dealing with. He did not know of Cayley’s work, but independently of Cayley he used geometric language for the first time describing geometric objects in high dimensional spaces. He said, “since this method of formation is theoretically applicable without restriction, I can define systems of arbitrarily high level by this method… geometry goes no further, but abstract science knows no limits.” [6]

Grassmann was convinced that he had discovered something astonishing and new, which he had, but no one understood him. After years of trying to get mathematicians to listen, he finally gave up, left mathematics behind, and actually achieved some fame within his lifetime in the field of linguistics. There is even a law of diachronic linguistics named after him. For the story of Grassmann’s struggles, see the blog on Grassmann and his Wedge Product.

Hermann Grassmann (1809 – 1877).

Julius Plücker (1846)

Projective geometry sounds like it ought to be a simple topic, like the projective property of perspective art as parallel lines draw together and touch at the vanishing point on the horizon of a painting. But it is far more complex than that, and it provided a separate gateway into the geometry of high dimensions.

A hint of its power comes from the homogeneous coordinates of the plane. These are used to find where a point in three dimensions intersects a plane (like the plane of an artist’s canvas). Although the point on the plane is in two dimensions, it takes three homogeneous coordinates to locate it. By extension, if a point is located in three dimensions, then it has four homogeneous coordinates, as if the three-dimensional point were a projection onto 3D from a 4D space.
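A short sketch shows how this works in practice: a 3D point gets a fourth homogeneous coordinate, and projection becomes a matrix product followed by a division. The helper names and the choice of the projection plane z = 1 are my own, purely for illustration.

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1: an n-D point becomes an (n+1)-component vector."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(h):
    """Divide through by the last coordinate to project back down."""
    return h[:-1] / h[-1]

# Drop the homogeneous 1, keeping (x, y, z) as homogeneous plane coordinates
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

point = to_homogeneous([2.0, 4.0, 2.0])   # a 3D point needs 4 homogeneous coordinates
h = P @ point                              # (x, y, z) read as a 2D homogeneous point
print(from_homogeneous(h))                 # [1. 2.] : where the ray meets the plane z = 1
```

Dividing by the last coordinate is exactly the perspective division of the artist’s canvas: points along one ray through the origin all land on the same image point.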

These ideas were pursued by Julius Plücker as he extended projective geometry from the work of earlier mathematicians such as Desargues and Möbius. For instance, the barycentric coordinates of Möbius are a form of homogeneous coordinates. What Plücker discovered is that space does not need to be defined by a dense set of points; a dense set of lines can be used just as well. The set of lines is represented as a four-dimensional manifold. Plücker reported his findings in a book in 1846 [7] and expanded on the concepts of multidimensional spaces in a further book published in 1868 [8].

Julius Plücker (1801 – 1868).

Ludwig Schläfli (1851)

After Plücker, ideas of multidimensional analysis became more common, and Ludwig Schläfli (1814 – 1895), a professor at the University of Bern in Switzerland, was one of the first to fully explore analytic geometry in higher dimensions. He described multidimensional points located on hyperplanes, and he calculated the angles between intersecting hyperplanes [9]. He also investigated high-dimensional polytopes, from which our modern “Schläfli notation” is derived. However, Schläfli used his own terminology for these objects, emphasizing analytic properties without using the ordinary language of high-dimensional geometry.

Some of the polytopes studied by Schläfli.

Bernhard Riemann (1854)

The person most responsible for the shift in mindset that finally accepted the geometry of high-dimensional spaces was Bernhard Riemann. In 1854, at the university in Göttingen, he presented his habilitation talk “Über die Hypothesen, welche der Geometrie zu Grunde liegen” (On the hypotheses on which geometry is founded). A habilitation in Germany was an examination that qualified an academic to advise their own students (somewhat like attaining tenure in US universities).

The habilitation candidate would suggest three topics, and it was usual for the first or second to be picked. Riemann’s three topics were: trigonometric properties of functions (he was the first to rigorously prove the convergence properties of Fourier series), aspects of electromagnetic theory, and a throw-away topic that he added at the last minute on the foundations of geometry (on which he had not actually done any serious work). Gauss was his faculty advisor and picked the third topic. Riemann had to develop the topic in a very short time period, starting from scratch. The effort exhausted him mentally and emotionally, and he had to withdraw temporarily from the university to regain his strength. After returning around Easter, he worked furiously for seven weeks to develop a first draft and then asked Gauss to set the examination date. Gauss initially thought to postpone to the Fall semester, but then at the last minute scheduled the talk for the next day. (For the story of Riemann and Gauss, see Chapter 4 “Geometry on my Mind” in the book Galileo Unbound (Oxford, 2018)).

Riemann gave his lecture on 10 June 1854, and it was a masterpiece. He stripped away all the old notions of space and dimensions and imbued geometry with a metric structure that was fundamentally attached to coordinate transformations. He also showed how any set of coordinates could describe space of any dimension, and he generalized ideas of space to include virtually any ordered set of measurables, whether it was of temperature or color or sound or anything else. Most importantly, his new system made explicit what those before him had alluded to: Jacobi, Grassmann, Plücker and Schläfli. Ideas of Riemannian geometry began to percolate through the mathematics world, expanding into common use after Richard Dedekind edited and published Riemann’s habilitation lecture in 1868 [10].

Bernhard Riemann (1826 – 1866).

Georg Cantor and Dimension Theory (1878)

In discussions of multidimensional spaces, it is important to step back and ask: what is dimension? This question is not as easy to answer as it may seem. In fact, in 1878, Georg Cantor proved that there is a one-to-one mapping of the plane to the line, making it seem that lines and planes are somehow the same. He was so astonished at his own result that he wrote in a letter to his friend Richard Dedekind, “I see it, but I don’t believe it!” A few decades later, Peano and Hilbert showed how to create space-filling curves, so that a single continuous curve can approach any point in the plane arbitrarily closely, again casting shadows of doubt on the robustness of dimension. These questions of dimensionality were not put to rest until the work of Karl Menger around 1926, when he provided a rigorous definition of topological dimension (see the Blog on the History of Fractals).
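The flavor of Cantor’s mapping can be captured by interleaving decimal digits, as in this loose Python sketch (which ignores the subtleties of non-unique decimal expansions that Cantor had to handle carefully):

```python
def interleave(x, y, digits=8):
    """Map a point of the unit square to a point of the unit line
    by interleaving the decimal digits of its two coordinates."""
    xs = f"{x:.{digits}f}"[2:]   # digits of x after the decimal point
    ys = f"{y:.{digits}f}"[2:]   # digits of y after the decimal point
    return float("0." + "".join(a + b for a, b in zip(xs, ys)))

print(interleave(0.12, 0.34))   # 0.1324
```

Two numbers become one, and the process can be reversed by de-interleaving, which is why the plane and the line have the same cardinality even though they differ in dimension.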

Space-filling curves by Peano and Hilbert.

Hermann Minkowski and Spacetime (1908)

Most of the earlier work on multidimensional spaces was mathematical and geometric rather than physical. One of the first examples of a physical hyperspace is the spacetime of Hermann Minkowski. Although Einstein and Poincaré had noted how space and time were coupled by the Lorentz equations, they did not take the bold step of recognizing space and time as parts of a single manifold. This step was taken in 1908 [11] by Hermann Minkowski, who claimed

“Gentlemen! The views of space and time which I wish to lay before you … They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”

(Hermann Minkowski, 1908)

For the story of Einstein and Minkowski, see the Blog on Minkowski’s Spacetime: The Theory that Einstein Overlooked.

Facsimile of Minkowski’s 1908 publication on spacetime.

Felix Hausdorff and Fractals (1918)

No story of multiple “integer” dimensions can be complete without mentioning the existence of “fractional” dimensions, also known as fractals. The individual most responsible for the concepts and mathematics of fractional dimensions was Felix Hausdorff. Before being driven to suicide as a Jew persecuted in Nazi Germany, he was a leading light in the intellectual life of Leipzig. By day he was a brilliant mathematician; by night he was the author Paul Mongré, writing poetry and plays.

In 1918, as the war was ending, he wrote a small book, “Dimension and Outer Measure”, that established ways to construct sets whose measured dimensions were fractions rather than integers [12]. Benoit Mandelbrot would later popularize these sets as “fractals” in the 1980s. For background on the history of fractals, see the Blog A Short History of Fractals.

Felix Hausdorff (1868 – 1942)
Example of a fractal set with embedding dimension DE = 2, topological dimension DT = 1, and fractal dimension DH = 1.585.

The Fifth Dimension of Theodor Kaluza (1921) and Oskar Klein (1926)

The first theoretical steps to develop a theory of a physical hyperspace (in contrast to a merely geometric hyperspace) were taken by Theodor Kaluza at the University of Königsberg in Prussia. He added an additional spatial dimension to Minkowski spacetime in an attempt to unify the force of gravity with the forces of electromagnetism. Kaluza’s paper was communicated to the journal of the Prussian Academy of Sciences in 1921 through Einstein, who saw in its unification principles a parallel to some of his own attempts [13]. However, Kaluza’s theory was fully classical and did not include the new quantum theory that was developing at that time in the hands of Heisenberg, Bohr and Born.

Oskar Klein was a Swedish physicist in the “second wave” of quantum physicists, having studied under Bohr. Unaware of Kaluza’s work, Klein developed a quantum theory of a five-dimensional spacetime [14]. For the theory to be self-consistent, it was necessary to roll up the extra dimension into a tight cylinder. This is like a strand of spaghetti—looking at it from far away, it looks like a one-dimensional string, but an ant crawling on the spaghetti can move in two dimensions: along the long direction, or looping around it in the short direction, called a compact dimension. Klein’s theory was an early attempt at what would later be called string theory. For the historical background on Kaluza and Klein, see the Blog on Oskar Klein.

The wave equations of Klein-Gordon, Schrödinger and Dirac.

John Campbell (1931): Hyperspace in Science Fiction

Art has a long history of shadowing the sciences, and the math and science of hyperspace were no exception. One of the first mentions of hyperspace in science fiction was in the story “Islands in Space” by John Campbell [15], published in Amazing Stories Quarterly in 1931, where it was used as an extraordinary means of space travel.

In 1951, Isaac Asimov made travel through hyperspace the transportation network that connected the galaxy in his Foundation Trilogy [16].

Isaac Asimov (1920 – 1992)

John von Neumann and Hilbert Space (1932)

Quantum mechanics had developed rapidly through the 1920’s, but by the early 1930’s it was in need of an overhaul, having outstripped rigorous mathematical underpinnings. These underpinnings were provided by John von Neumann in his 1932 book on quantum theory [17]. This is the book that cemented the Copenhagen interpretation of quantum mechanics, with projection measurements and wave function collapse, while also establishing the formalism of Hilbert space.

Hilbert space is an infinite-dimensional vector space of orthogonal eigenfunctions into which any quantum wave function can be decomposed. The physicists of today work and sleep in Hilbert space as their natural environment, often losing sight of its infinite dimensions, which don’t seem to bother anyone. Hilbert space is more than a mere geometrical space, but less than a full physical space (like five-dimensional spacetime). Few realize that what is so often ascribed to Hilbert was actually formalized by von Neumann, among his many other accomplishments such as stored-program computers and game theory.
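A finite caricature of this decomposition can be computed directly: expand f(x) = x on [0, π] in the orthonormal sine eigenfunctions φₙ(x) = √(2/π) sin(nx), where each coefficient is an inner product. The grid, the choice of function, and the truncation to 29 terms are my own illustration.

```python
import numpy as np

# Expand f(x) = x on [0, pi] in the orthonormal basis phi_n(x) = sqrt(2/pi) sin(n x)
x = np.linspace(0.0, np.pi, 4001)
dx = x[1] - x[0]
f = x

coeffs = []
for n in range(1, 30):
    phi = np.sqrt(2.0 / np.pi) * np.sin(n * x)
    coeffs.append(np.sum(f * phi) * dx)        # inner product <phi_n | f> by Riemann sum

# Rebuild f from its coordinates in this (truncated) Hilbert-space basis
f_approx = sum(c * np.sqrt(2.0 / np.pi) * np.sin(n * x)
               for n, c in enumerate(coeffs, start=1))
err = np.max(np.abs(f - f_approx)[x < 2.5])    # accurate away from the endpoint
print(err)
```

Each coefficient is one coordinate of the vector f along one basis direction; the full Hilbert space simply never stops adding directions.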

John von Neumann (1903 – 1957).

Einstein-Rosen Bridge (1935)

One of the strangest entities inhabiting the theory of spacetime is the Einstein-Rosen Bridge. It is space folded back on itself in a way that punches a short-cut through spacetime. Einstein, working with his collaborator Nathan Rosen at Princeton’s Institute for Advanced Study, published a paper in 1935 that attempted to solve two problems [18]. The first problem was the Schwarzschild singularity at the radius r = 2GM/c², known as the Schwarzschild radius or the event horizon. Einstein had a distaste for such singularities in physical theory and viewed them as a problem. The second problem was how to apply the theory of general relativity (GR) to point masses like the electron. Again, the GR solution for an electron blows up at the location of the particle, at r = 0.

Einstein-Rosen Bridge.

To eliminate both problems, Einstein and Rosen (ER) began with the Schwarzschild metric in its usual form (writing m = GM/c² for the mass in geometrical units)

$$ds^2 = -\left(1 - \frac{2m}{r}\right)c^2 dt^2 + \frac{dr^2}{1 - 2m/r} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)$$

where it is easy to see that it “blows up” when r = 2m = 2GM/c² as well as at r = 0. ER realized that they could write a new form that bypasses the singularities using the simple coordinate substitution

$$u^2 = r - 2m$$

to yield the “wormhole” metric

$$ds^2 = -\frac{u^2}{u^2 + 2m}\,c^2 dt^2 + 4\left(u^2 + 2m\right)du^2 + \left(u^2 + 2m\right)^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)$$
It is easy to see that as the new variable u goes from −∞ to +∞, this expression never blows up. The reason is simple—it removes the 1/r singularity, much as replacing 1/r with 1/(r + ε) does. Such tricks are used routinely today in computational physics to keep computer calculations from getting too large—avoiding the divide-by-zero problem. It is also known as a form of regularization in machine learning applications. But in the hands of Einstein, this simple “bypass” is not just math: it can provide a physical solution.
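The same numerical trick can be sketched in a few lines (an illustrative toy, not anything from the ER paper):

```python
import numpy as np

# Hypothetical 1/r quantity sampled on a grid that includes r = 0.
r = np.linspace(0.0, 1.0, 5)
eps = 1e-6  # small regularization constant

with np.errstate(divide="ignore"):
    naive = 1.0 / r            # blows up (divide-by-zero) at r = 0
regularized = 1.0 / (r + eps)  # the "bypass": finite everywhere

print(np.isinf(naive[0]))                    # True: bare 1/r diverges at the origin
print(bool(np.isfinite(regularized).all()))  # True: shifted denominator never blows up
```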

It is hard to imagine that an article published in the Physical Review, especially one built on a simple variable substitution, would appear on the front page of the New York Times, even “above the fold”, but such was Einstein’s fame that this is exactly the reception his paper with Rosen received. The reason for the interest was the interpretation of the new equation—when visualized geometrically, it was like a funnel between two separated Minkowski spaces—in other words, what John Wheeler would name a “wormhole” in 1957. Even back in 1935, there was some sense that this new property of space might allow untold possibilities, perhaps even a form of travel through such a short cut.

As it turns out, the ER wormhole is not stable—it collapses on itself so quickly that not even photons can get through before it pinches off. More recent work on wormholes has shown that they can be stabilized by negative energy density, but ordinary matter cannot have negative energy density. On the other hand, the Casimir effect might have a type of negative energy density, which raises some interesting questions about quantum mechanics and the ER bridge.

Edward Witten’s 10+1 Dimensions (1995)

A history of hyperspace would not be complete without a mention of string theory and Edward Witten’s unification of the different 10-dimensional string theories into 11-dimensional M-theory. At a string theory conference at USC in 1995, he pointed out that the five different string theories of the day were all related through dualities. This observation launched the second superstring revolution that continues today. In these theories, the 6 extra spatial dimensions are wrapped up into complex manifolds such as the Calabi-Yau manifold.

Two-dimensional slice of a six-dimensional Calabi-Yau quintic manifold.


There is definitely something wrong with our three-plus-one dimensions of spacetime. We claim to have achieved the pinnacle of fundamental physics with what is called the Standard Model and the Higgs boson, but dark energy and dark matter loom as elephants in the room: giant, gaping, embarrassing, and currently unsolved. By some estimates, ordinary matter makes up only 5% of the energy density of the universe. The other 95% is in some form unknown to physics. How can physicists claim to know anything if 95% of everything is in some unknown form?

The answer, perhaps to be uncovered sometime in this century, may be the role of extra dimensions in physical phenomena—probably not in every-day phenomena, and maybe not even in high-energy particles—but in the grand expanse of the cosmos.

By David D. Nolte, Feb. 8, 2023


M. Kaku, R. O’Keefe, Hyperspace: A scientific odyssey through parallel universes, time warps, and the tenth dimension.  (Oxford University Press, New York, 1994).

A. N. Kolmogorov, A. P. Yushkevich, Mathematics of the 19th century: Geometry, analytic function theory.  (Birkhäuser Verlag, Basel ; 1996).


[1] F. Möbius, in Möbius, F. Gesammelte Werke, D. M. Saendig, Ed. (oHG, Wiesbaden, Germany, 1967), vol. 1, pp. 36-49.

[2] Carl Jacobi, “De binis quibuslibet functionibus homogeneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium” (1834)

[3] J. Liouville, Note sur la théorie de la variation des constantes arbitraires. Liouville Journal 3, 342-349 (1838).

[4] A. Cayley, Chapters in the analytical geometry of n dimensions. Collected Mathematical Papers 1, 317-326, 119-127 (1843).

[5] H. Grassmann, Die lineale Ausdehnungslehre.  (Wiegand, Leipzig, 1844).

[6] H. Grassmann quoted in D. D. Nolte, Galileo Unbound (Oxford University Press, 2018) pg. 105

[7] J. Plücker, System der Geometrie des Raumes in Neuer Analytischer Behandlungsweise, Insbesondere die Flächen Zweiter Ordnung und Klasse Enthaltend.  (Düsseldorf, 1846).

[8] J. Plücker, On a New Geometry of Space (1868).

[9] L. Schläfli, J. H. Graf, Theorie der vielfachen Kontinuität. Neue Denkschriften der Allgemeinen Schweizerischen Gesellschaft für die Gesammten Naturwissenschaften 38. ([s.n.], Zürich, 1901).

[10] B. Riemann, Über die Hypothesen, welche der Geometrie zu Grunde liegen, Habilitationsvortrag. Göttinger Abhandlung 13,  (1854).

[11] Minkowski, H. (1909). “Raum und Zeit.” Jahresbericht der Deutschen Mathematiker-Vereinigung: 75-88.

[12] Hausdorff, F. (1919). “Dimension und äußeres Maß,” Mathematische Annalen, 79: 157–79.

[13] Kaluza, Theodor (1921). “Zum Unitätsproblem in der Physik”. Sitzungsber. Preuss. Akad. Wiss. Berlin. (Math. Phys.): 966–972

[14] Klein, O. (1926). “Quantentheorie und fünfdimensionale Relativitätstheorie“. Zeitschrift für Physik. 37 (12): 895

[15] John W. Campbell, Jr. “Islands of Space“, Amazing Stories Quarterly (1931)

[16] Isaac Asimov, Foundation (Gnome Press, 1951)

[17] J. von Neumann, Mathematical Foundations of Quantum Mechanics.  (Princeton University Press, ed. 1996, 1932).

[18] A. Einstein and N. Rosen, “The Particle Problem in the General Theory of Relativity,” Phys. Rev. 48, 73 (1935).


Paul Lévy’s Black Swan: The Physics of Outliers

The Black Swan was a mythical beast invented by the Roman poet Juvenal as a metaphor for things so rare they can only be imagined.  He wrote “rara avis in terris nigroque simillima cygno” (a rare bird in the lands and very much like a black swan).

Imagine the shock, then, when the Dutch explorer Willem de Vlamingh first saw black swans in Australia in 1697.  The metaphor morphed into a new use, meaning when a broadly held belief (the impossibility of black swans) is refuted by a single new observation. 

For instance, in 1870 the biologist Thomas Henry Huxley, known as “Darwin’s Bulldog” for his avid defense of Darwin’s theories, delivered a speech in Liverpool, England, where he was quoted in Nature magazine as saying,

… the great tragedy of Science—the slaying of a beautiful hypothesis by an ugly fact

This quote has been picked up and repeated over the years in many different contexts. 

One of those contexts applies to the fate of a beautiful economic theory, proposed by Fischer Black and Myron Scholes in 1973 as a way to make the perfect hedge on Wall Street—purportedly risk free, yet guaranteeing a positive return in spite of the ups-and-downs of stock prices.  Scholes joined a hedge fund, launched in 1994 to cash in on this beautiful theory, that returned an astonishing 40% on investment in its early years.  Black died in 1995, but Scholes was awarded the Nobel Prize in Economics in 1997.  The next year, the fund collapsed.  The ugly fact that flew in the face of Black-Scholes was the Black Swan.

The Black Swan

A Black Swan is an outlier measurement that occurs in a sequence of data points.  Up until the Black Swan event, the data points behave normally, following the usual statistics we have all come to expect—maybe a Gaussian distribution, or some other exponential form that dominates most variable phenomena.

Fig. An Australian Black Swan (Wikipedia).

But then a Black Swan occurs.  It has a value so unexpected, and so unlike all the other measurements, that it is often assumed to be wrong and may even be thrown out because it spoils the otherwise nice statistics.  That single data point skews averages and standard deviations in non-negligible ways.  The response to such a disturbing event is to take even more data to let the averages settle down again … until another Black Swan hits and again skews the mean value. However, such outliers are often not spurious measurements but are actually a natural part of the process. They should not, and cannot, be thrown out without compromising the statistical integrity of the study.
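A short numerical illustration (invented numbers, chosen only to make the point): a single extreme outlier among a thousand well-behaved measurements drags the sample mean three orders of magnitude away from the bulk.

```python
ordinary = [1.0] * 999       # well-behaved measurements clustered near 1.0
black_swan = 1.0e6           # one extreme outlier

mean_before = sum(ordinary) / len(ordinary)
data = ordinary + [black_swan]
mean_after = sum(data) / len(data)

print(mean_before)  # 1.0
print(mean_after)   # 1000.999: one point in a thousand shifts the mean a thousand-fold
```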

This outlier phenomenon came to mainstream attention when the author Nassim Nicholas Taleb, in his influential 2007 book, The Black Swan: The Impact of the Highly Improbable, pointed out that it was a central part of virtually every aspect of modern life, whether in business, or the development of new technologies, or the running of elections, or the behavior of financial markets.  Things that seemed to be well behaved … a set of products, or a collective society, or a series of governmental policies … are suddenly disrupted by a new invention, or a new law, or a bad Supreme Court decision, or a war, or a stock-market crash.

As an illustration, let’s see where Black-Scholes went wrong.

The Perfect Hedge on Wall Street?

Fischer Black (1938 – 1995) was a PhD advisor’s nightmare.  He had graduated as an undergraduate physics major from Harvard in 1959, but then switched to mathematics for graduate school, then switched to computers, then switched again to artificial intelligence, after which he was thrown out of the graduate program at Harvard for having a serious lack of focus.  So he joined the RAND corporation, where he had time to play with his ideas, eventually approaching Marvin Minsky at MIT, who helped guide him to an acceptable thesis that he was allowed to submit to the Harvard program for his PhD in applied mathematics.  After that, he went to work in financial markets.

His famous contribution to financial theory was the Black-Scholes paper of 1973 on “The Pricing of Options and Corporate Liabilities”, co-authored with Myron Scholes.   Hedging is a venerable tradition on Wall Street.  To hedge means that a broker sells an option (to purchase a stock at a given price at a later time), assuming that the stock will fall in value (selling short), and then buys, as insurance against the price rising, a number of shares of the same asset (buying long).  If the broker balances enough long shares with enough short options, then the portfolio’s value is insulated from the day-to-day fluctuations of the value of the underlying asset. 

This type of portfolio is one example of a financial instrument called a derivative.  The name comes from the fact that the value of the portfolio is derived from the values of the underlying assets.  The challenge with derivatives is finding their “true” value at any time before they mature.  If a broker knew the “true” value of a derivative, then there would be no risk in buying and selling derivatives.

To be risk free, the value of the derivative needs to be independent of the fluctuations.  This appears at first to be a difficult problem, because fluctuations are random and cannot be predicted.  But the solution actually relies on just this condition of randomness.  If the random fluctuations in stock prices are equivalent to a random walk superposed on the average rate of return, then perfect hedges can be constructed with impunity.
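That assumed price model—a random walk superposed on a steady rate of return—is geometric Brownian motion, dS = μS dt + σS dW. A minimal simulation sketch (illustrative parameter values, not calibrated to any market):

```python
import numpy as np

rng = np.random.default_rng(1)

S0, mu, sigma = 100.0, 0.05, 0.2   # start price, mean return, volatility (assumed)
dt, n_steps = 1.0 / 252, 252       # one year of daily steps

# dW increments of the random walk; the exact log-normal update includes
# the Ito correction -sigma**2/2 in the drift.
dW = rng.normal(0.0, np.sqrt(dt), n_steps)
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * dW
S = S0 * np.exp(np.cumsum(log_returns))

print(S.shape)              # (252,)
print(bool((S > 0).all()))  # True: prices fluctuate but never go negative
```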

To make a hedge on an underlying asset, create a portfolio by selling one call option (selling short) and buying a number N of shares of the asset (buying long) as insurance against the possibility that the asset value will rise.  The value of this portfolio is

$$\Pi = N S - V(S,t)$$

If the number N is chosen correctly, then the short and long positions will balance, and the portfolio will be protected from fluctuations in the underlying asset price.  To find N, consider the change in the value of the portfolio as the variables fluctuate

$$d\Pi = N\, dS - dV$$

and use an elegant result known as Ito’s Formula (the stochastic generalization of the chain rule, which keeps track of the effects of the stochastic variable) to yield

$$d\Pi = \left[ N \mu S - \frac{\partial V}{\partial t} - \mu S \frac{\partial V}{\partial S} - \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} \right] dt + \sigma S \left( N - \frac{\partial V}{\partial S} \right) dW$$

Note that the last term contains the fluctuations, expressed using the stochastic term dW (a random walk).  The fluctuations can be zeroed-out by choosing

$$N = \frac{\partial V}{\partial S}$$

which yields

$$d\Pi = -\left( \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} \right) dt$$

The important observation about this last equation is that the stochastic function W has disappeared.  This is because the fluctuations of the N share prices balance the fluctuations of the short option. 

Because the hedged portfolio is now riskless, it must earn the guaranteed rate of return r set by the value of a risk-free bond.  Therefore, the price of a perfect hedge must increase with the risk-free rate of return.  This is

$$d\Pi = r \Pi \, dt = r \left( S \frac{\partial V}{\partial S} - V \right) dt$$

Equating the two expressions for dΠ gives

$$-\left( \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} \right) = r \left( S \frac{\partial V}{\partial S} - V \right)$$

Simplifying, this leads to a partial differential equation for V(S,t)

$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0$$
The Black-Scholes equation is a partial differential equation whose solution, given the boundary conditions and time, defines the “true” value of the derivative and determines how many shares to buy at t = 0 at a specified guaranteed return rate r (or, alternatively, stating a specified stock price S(T) at the time of maturity T of the option).  It is a diffusion equation that incorporates the diffusion of the stock price with time.  If the derivative is sold at any time t prior to maturity, when the stock has some value S, then the value of the derivative is given by V(S,t) as the solution to the Black-Scholes equation [1].
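For a European call option with payoff V(S,T) = max(S − K, 0), the Black-Scholes equation has the well-known closed-form solution, which is easy to evaluate (a sketch with illustrative at-the-money parameters):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # standard normal cumulative distribution via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, r, sigma, T):
    """Closed-form Black-Scholes value of a European call option."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# At-the-money call: one year to maturity, 5% risk-free rate, 20% volatility.
price = black_scholes_call(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print(round(price, 4))  # ≈ 10.4506
```

Note that the mean return μ appears nowhere in the formula.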

One of the interesting features of this equation is the absence of the mean rate of return μ of the underlying asset.  This means that any stock of any value can be considered, even if the rate of return of the stock is negative!  This type of derivative looks like a truly risk-free investment.  You would be guaranteed to make money even if the value of the stock falls, which may sound too good to be true…which of course it is. 

Black, Scholes and Merton. Scholes and Merton were winners of the 1997 Nobel Prize in Economics.

The success (or failure) of derivative markets depends on fundamental assumptions about the stock market.  These include that it is not subject to radical adjustments, panics, or irrational exuberance—i.e., Black-Swan events—which is clearly not the case.  Just think of booms and busts.  The efficient and rational market model, and ultimately the Black-Scholes equation, assumes that fluctuations in the market are governed by Gaussian random statistics.  However, there are other types of statistics that are just as well behaved as the Gaussian, but which admit Black Swans.

Stable Distributions: Black Swans are the Norm

When Paul Lévy (1886 – 1971) was asked in 1919 to give three lectures on random variables at the École Polytechnique, the mathematical theory of probability was just a loose collection of principles and proofs. What emerged from those lectures was a lifetime of study in a field that now has grown to become one of the main branches of mathematics. He had a distinguished and productive career, although he struggled to navigate the anti-semitism of Vichy France during WWII. His thesis advisor was the famous Jacques Hadamard and one of his students was the famous Benoit Mandelbrot.

Lévy wrote several influential textbooks that established the foundations of probability theory, and his name has become nearly synonymous with the field. One of his books was on the theory of the addition of random variables [2] in which he extended the idea of a stable distribution.

Fig. Paul Lévy in his early years. Les Annales des Mines

In probability theory, a distribution is called stable if the sum of two independent random variables drawn from the distribution has the same distribution, up to location and scale.  The normal (Gaussian) distribution clearly has this property because the sum of two normally distributed independent variables is also normally distributed.  The variance and possibly the mean may be different, but the functional form is still Gaussian. 
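This stability property is easy to verify numerically: convolving two Gaussian densities (the density of the sum of two independent Gaussian variables) returns another Gaussian whose variance is the sum of the two variances. A quick check (grid parameters are arbitrary):

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]

def gaussian(x, sigma):
    # zero-mean normal density with standard deviation sigma
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

s1, s2 = 1.0, 2.0
# density of the sum = convolution of the individual densities
conv = np.convolve(gaussian(x, s1), gaussian(x, s2), mode="same") * dx
target = gaussian(x, np.sqrt(s1**2 + s2**2))  # Gaussian with summed variance

print(bool(np.max(np.abs(conv - target)) < 1e-6))  # True: the sum is still Gaussian
```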

Fig. A look at Paul Lévy’s theory of the addition of random variables.

The general form of a probability distribution can be obtained by taking a Fourier transform as

$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \varphi(k)\, e^{-ikx}\, dk$$

where φ  is known as the characteristic function of the probability distribution.  A special case of a stable distribution is the Lévy symmetric stable distribution obtained as

$$p(x;\alpha,\gamma) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\gamma |k|^{\alpha}}\, e^{-ikx}\, dk$$
which is parameterized by α and γ.  The characteristic function in this case is called a stretched exponential with the length scale set by the parameter γ. 
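The two closed-form members of this family—α = 2 (Gaussian) and α = 1 (Cauchy)—make a good numerical sanity check of the inverse Fourier transform (a direct quadrature sketch; grid sizes are arbitrary):

```python
import numpy as np

gamma = 1.0
k = np.linspace(-200.0, 200.0, 400001)   # frequency grid for the quadrature
dk = k[1] - k[0]
x = np.array([0.0, 0.5, 1.0, 2.0])       # sample points to test

def levy_pdf(xs, alpha):
    # numerically invert the characteristic function exp(-gamma*|k|**alpha);
    # the integrand is even in k, so only the cosine part survives
    phi = np.exp(-gamma * np.abs(k) ** alpha)
    return np.array([np.sum(phi * np.cos(k * xi)) for xi in xs]) * dk / (2.0 * np.pi)

cauchy = gamma / (np.pi * (gamma**2 + x**2))                          # alpha = 1
gauss = np.exp(-x**2 / (4.0 * gamma)) / np.sqrt(4.0 * np.pi * gamma)  # alpha = 2

print(bool(np.allclose(levy_pdf(x, 1.0), cauchy, atol=1e-4)))  # True
print(bool(np.allclose(levy_pdf(x, 2.0), gauss, atol=1e-6)))   # True
```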

The most important feature of the Lévy distribution is that it has a power-law tail at large values.  For instance, the special case of the Lévy distribution for α = 1 is the Cauchy distribution for positive values x given by

$$p(x) = \frac{2}{\pi}\,\frac{\gamma}{\gamma^2 + x^2}, \qquad x \ge 0$$
which falls off at large values as x^(−(α+1)). The Cauchy distribution is normalizable (probabilities integrate to unity) and has a characteristic scale set by γ, but it has a divergent mean value, violating the central limit theorem [3].  For distributions that satisfy the central limit theorem, increasing the number of samples from the distribution allows the mean value to converge on a finite value.  However, for the Cauchy distribution, increasing the number of samples increases the chances of obtaining a black swan, which skews the mean value, and the mean value diverges to infinity in the limit of an infinite number of samples. This is why the Cauchy distribution is said to have a “heavy tail” that contains rare, but large-amplitude, outlier events that keep shifting the mean.
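Both claims—unit normalization and the power-law tail—can be checked numerically. Taking the one-sided density p(x) = (2/π) γ/(γ² + x²) on x ≥ 0 (an assumed normalization of the Cauchy case):

```python
import numpy as np

gamma = 1.0

def cauchy_pdf(x):
    # one-sided Cauchy density, p(x) = (2/pi) * gamma / (gamma**2 + x**2), x >= 0
    return (2.0 / np.pi) * gamma / (gamma**2 + x**2)

# Normalization: despite the heavy tail, the density integrates to unity.
x = np.linspace(0.0, 1.0e4, 1_000_001)
dx = x[1] - x[0]
p = cauchy_pdf(x)
norm = (np.sum(p) - 0.5 * (p[0] + p[-1])) * dx   # trapezoid rule
print(bool(abs(norm - 1.0) < 1e-3))  # True: tail mass beyond x = 1e4 is only ~6e-5

# Power-law tail: for x >> gamma the density falls off as x**(-2), so the
# mean integrand x*p(x) ~ 1/x diverges logarithmically with the cutoff.
ratio = cauchy_pdf(1.0e3) / ((2.0 / np.pi) * gamma / 1.0e3**2)
print(bool(abs(ratio - 1.0) < 1e-3))  # True
```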

Examples of Lévy stable probability distribution functions are shown below for a range between α = 1 (Cauchy) and α = 2 (Gaussian).  The heavy tail is seen even for the case α = 1.99, very close to the Gaussian distribution.  Examples of two-dimensional Lévy walks are shown in the figure for α = 1, α = 1.4 and α = 2.  In the case of the Gaussian distribution, the mean-squared displacement is well behaved and finite.  However, for all the other cases, the mean-squared displacement is divergent, caused by the large path lengths that become more probable as α approaches unity.

Fig. Symmetric Lévy distribution functions for a range of parameters α from α = 1 (Cauchy) to α = 2 (Gaussian). Lévy flights for α < 2 have a run-and-tumble behavior that is often seen in bacterial motion.
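A two-dimensional Lévy flight like those in the figure can be generated in a few lines. Here the step lengths are drawn from a Pareto-type density ~ l^(−(α+1)) on l ≥ 1 by inverse-transform sampling (an illustrative choice of step distribution, not necessarily the one used for the figure):

```python
import numpy as np

rng = np.random.default_rng(7)

def levy_walk_2d(n_steps, alpha):
    # power-law step lengths: P(L > l) = l**(-alpha) for l >= 1 (Pareto)
    u = rng.random(n_steps)
    lengths = (1.0 - u) ** (-1.0 / alpha)   # heavy-tailed for alpha < 2
    # isotropic random directions
    angles = rng.uniform(0.0, 2.0 * np.pi, n_steps)
    steps = lengths[:, None] * np.column_stack((np.cos(angles), np.sin(angles)))
    positions = np.vstack((np.zeros((1, 2)), np.cumsum(steps, axis=0)))
    return positions, lengths

positions, lengths = levy_walk_2d(10_000, alpha=1.4)
print(positions.shape)             # (10001, 2)
print(bool(lengths.min() >= 1.0))  # True: Pareto support starts at l = 1
# the run-and-tumble signature: a few rare long runs dwarf the typical step
print(lengths.max() / np.median(lengths))
```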

The surprising point of the Lévy probability distribution functions is how common they are in natural phenomena. Heavy Lévy tails arise commonly in almost any process that has scale invariance. Yet as students, we are virtually shielded from them, as if Poisson and Gaussian statistics are all we need to know, but ignorance is not bliss. The assumption of Gaussian statistics is what sank Black-Scholes.

Scale-invariant processes are often consequences of natural cascades of mass or energy and hence arise as neutral phenomena. Yet there are biased phenomena in which a Lévy process can lead to a form of optimization. This is the case for Lévy random walks in biological contexts.

Lévy Walks

The random walk is one of the cornerstones of statistical physics and forms the foundation for Brownian motion which has a long and rich history in physics. Einstein used Brownian motion to derive his famous statistical mechanics equation for diffusion, proving the existence of molecular matter. Jean Perrin won the Nobel prize for his experimental demonstrations of Einstein’s theory. Paul Langevin used Brownian motion to introduce stochastic differential equations into statistical physics. And Lévy used Brownian motion to illustrate applications of mathematical probability theory, writing his last influential book on the topic.

Most treatments of the random walk assume Gaussian or Poisson statistics for the step length or rate, but a special form of random walk emerges when the step length is drawn from a Lévy distribution. This is a Lévy random walk, also named a “Lévy Flight” by Benoit Mandelbrot (Lévy’s student) who studied its fractal character.

Originally, Lévy walks were studied as ideal mathematical models, but there have been a number of discoveries in recent years in which Lévy walks have been observed in the foraging behavior of animals, even in the run-and-tumble behavior of bacteria, in which rare long-distance runs are followed by many local tumbling excursions. It has been surmised that this foraging strategy allows an animal to optimally sample randomly-distributed food sources. There is evidence of Lévy walks of molecules in intracellular transport, which may arise from random motions within the crowded intracellular neighborhood. A middle ground has also been observed [4] in which intracellular organelles and vesicles may take on a Lévy walk character as they attach, migrate, and detach from molecular motors that drive them along the cytoskeleton.

By David D. Nolte, Feb. 8, 2023

Selected Bibliography

Paul Lévy, Calcul des probabilités (Gauthier-Villars, Paris, 1925).

Paul Lévy, Théorie de l’addition des variables aléatoires (Gauthier-Villars, Paris, 1937).

Paul Lévy, Processus stochastique et mouvement brownien (Gauthier-Villars, Paris, 1948).

R. Metzler, J. Klafter, The random walk’s guide to anomalous diffusion: a fractional dynamics approach. Physics Reports-Review Section Of Physics Letters 339, 1-77 (2000).

J. Klafter, I. M. Sokolov, First Steps in Random Walks : From Tools to Applications.  (Oxford University Press, 2011).

F. Hoefling, T. Franosch, Anomalous transport in the crowded world of biological cells. Reports on Progress in Physics 76,  (2013).

V. Zaburdaev, S. Denisov, J. Klafter, Levy walks. Reviews of Modern Physics 87, 483-530 (2015).


[1]  Black, Fischer; Scholes, Myron (1973). “The Pricing of Options and Corporate Liabilities”. Journal of Political Economy. 81 (3): 637–654.

[2] P. Lévy, Théorie de l’addition des variables aléatoires (1937)

[3] The central limit theorem holds if the mean value of a number of N samples converges to a stable value as the number of samples increases to infinity.

[4] H. Choi, K. Jeong, J. Zuponcic, E. Ximenes, J. Turek, M. Ladisch, D. D. Nolte, Phase-Sensitive Intracellular Doppler Fluctuation Spectroscopy. Physical Review Applied 15, 024043 (2021).

Frontiers of Physics: The Year in Review (2022)

Physics forged ahead in 2022, making a wide range of advances. From a telescope far out in space to a telescope that spans the size of the Earth, from solid state physics and quantum computing at ultra-low temperatures to particle and nuclear physics at ultra-high energies, the year saw a number of firsts. Here’s a list of eight discoveries of 2022 that define the frontiers of physics.

James Webb Space Telescope

“First Light” has two meanings: the “First Light” that originated at the beginning of the universe, and the “First Light” that is collected by a new telescope. In the beginning of this year, the James Webb Space Telescope (JWST) saw both types of first light, and with it came first surprises.

NASA image of the Carina Nebula, a nursery for stars.

The JWST has found that galaxies are too well formed too early in the universe relative to current models of galaxy formation. Almost as soon as the JWST began forming images, it acquired evidence of massive galaxies from a time when the universe was only a few hundred million years old. Existing theories of galaxy formation did not predict such large galaxies so soon after the Big Bang.

Another surprise came from images of the Southern Ring Nebula. While the Hubble did not find anything unusual about this planetary nebula, the JWST found cold dust surrounding the white dwarf that remained after the dying star shed its outer layers. This dust was not supposed to be there, but it may be coming from a third member of the intra-nebular environment. In addition, the ring-shaped nebula contained masses of swirling streams and ripples that are challenging the astrophysicists who study nebula formation to refine their current models.

Quantum Machine Learning

Machine learning—the training of computers to identify and manipulate complicated patterns within massive data—has been on a roll in recent years, ever since efficient training algorithms were developed in the early 2000’s for large multilayer neural networks. Classical machine learning can take billions of bits of data and condense it down to understandable information in a matter of minutes. However, there are types of problems that even conventional machine learning might take the age of the universe to calculate, for instance calculating the properties of quantum systems based on a set of quantum measurements of the system.

In June of 2022, researchers at Caltech and Google announced that a quantum computer—Google’s Sycamore quantum computer—could calculate properties of quantum systems using exponentially fewer measurements than would be required to perform the same task using conventional computers. Quantum machine learning uses the resource of quantum entanglement that is not available to conventional machine learning, enabling new types of algorithms that can exponentially speed up calculations of quantum systems. It may come as no surprise that quantum computers are ideally suited to making calculations of quantum systems.

Science News. External view of Google’s Sycamore quantum computer.

A Possible Heavy W Boson

High-energy particle physics has been in a crisis ever since 2012, when physicists reached the pinnacle of a dogged half-century search for the fundamental constituents of the universe. The Higgs boson was the crowning achievement and was supposed to be the vanguard of a new frontier of physics uncovered at CERN. But little new physics has emerged, even though fundamental physics is in dire need of new results. For instance, dark matter and dark energy remain unsolved mysteries despite making up the vast majority of all there is. Therefore, when physicists at Fermilab announced that the W boson, a particle that carries the weak nuclear interaction, appeared heavier than predicted by the Standard Model, some physicists heaved sighs of relief. The excess mass could signal higher-energy contributions that might lead to new particles or interactions … if the excess weight holds up under continued scrutiny.

Science magazine. April 8, 2022

Imaging the Black Hole at the Center of the Milky Way

Imagine building a telescope the size of the Earth. What could it see?

If it detected in the optical regime, it could see a baseball on the surface of the Moon. If it detected at microwave frequencies, then it could see the material swirling around distant black holes. This is what the Event Horizon Telescope (EHT) can do. In 2019, it revealed the first image of a black hole: the super-massive black hole at the core of the M87 galaxy 53 million light years away. They did this Herculean feat by combining the signals of microwave telescopes from across the globe, combining their signals interferometrically to create an effective telescope aperture that was the size of the Earth.

The next obvious candidate was the black hole at the center of our own galaxy, the Milky Way. Even though our own black hole is much smaller than the one in M87, ours is much closer, and both subtend about the same solid angle. The challenge was observing it through the swirling stars and dust at the core of our galaxy. In May of this year, the EHT unveiled the first image of our own black hole, showing the radiation emitted by the in-falling material.

BBC image of the black hole at the core of our Milky Way galaxy.


The Tetraneutron

Nuclear physics is a venerable part of modern physics that harkens back to the days of Bohr and Rutherford and the beginning of quantum physics, but in recent years it has yielded few new surprises (except at the RHIC collider, which smashes heavy nuclei against each other to create a quark-gluon plasma). That changed in June of 2022, when researchers in Germany announced the successful measurement of a tetraneutron—a cluster of four neutrons bound transiently together by the strong nuclear force.

Neutrons are the super-glue that holds together the nucleons in standard nuclei. The force is immense, strong enough to counteract the Coulomb repulsion of protons in a nucleus. For instance, Uranium-238 has 92 protons crammed within a volume of about 10 femtometers radius. It takes 146 neutrons to bind these together without flying apart. But neutrons don’t tend to bind to themselves, except in “resonance” states that decay rapidly. In 2012, a dineutron (two neutrons bound in a transient resonance state) was observed, but four neutrons were expected to produce an even more transient resonance (a three-neutron state is not allowed). When the German group created the tetraneutron, it had a lifetime of only about 1×10⁻²¹ seconds, so it is extremely ephemeral. Nonetheless, studying the properties of the tetraneutron may give insights into both the strong and weak nuclear forces.

Hi-Tc superconductivity

When Bednorz and Müller discovered Hi-Tc superconductivity in 1986, it set off both a boom and a crisis. The boom was the opportunity to raise the critical temperature of superconductivity from 23 K that had been the world record held by Nb3Ge for 13 years since it was set in 1973. The crisis was that the new Hi-Tc materials violated the established theory of superconductivity explained by Bardeen-Cooper-Schrieffer (BCS). There was almost nothing in the theory of solid state physics that could explain how such high critical temperatures could be attained. At the March Meeting of the APS the following year in 1987, the session on the new Hi-Tc materials and possible new theories became known as the Woodstock of Physics, where physicists camped out in the hallway straining their ears to hear the latest ideas on the subject.

One of the ideas put forward at the session was the idea of superexchange by Phil Anderson. The superexchange of two electrons is related to their ability to hop from one lattice site to another. If the hops are coordinated, then there can be an overall reduction in their energy, creating a ground state of long-range coordinated electron hopping that could support superconductivity. Anderson was perhaps the physicist best situated to suggest this theory because of his close familiarity with what was, even then, known as the Anderson Hamiltonian that explicitly describes the role of hopping in solid-state many-body phenomena.

Ever since, the idea of superexchange has been floating around the field of Hi-Tc superconductivity, but no one had been able to pin it down conclusively, until now. In a paper published in the PNAS in September of 2022, an experimental group at Oxford presented direct observations of the spatial density of Cooper pairs in relation to the spatial hopping rates—where hopping was easiest then the Cooper pair density was highest, and vice versa. This experiment provides almost indisputable evidence in favor of Anderson’s superexchange mechanism for Cooper pair formation in the Hi-Tc materials, laying to rest the crisis launched 36 years ago.

Holographic Wormhole

The holographic principle of cosmology proposes that our three-dimensional physical reality—stars, galaxies, expanding universe—is like the projection of information encoded on a two-dimensional boundary—just as a two-dimensional optical hologram can be illuminated to recreate a three-dimensional visual representation. This 2D-to-3D projection was first proposed by Gerard ’t Hooft, inspired by the black hole information paradox, in which the entropy of a black hole scales as the surface area of the black hole instead of its volume. The holographic principle was expanded by Leonard Susskind in 1995 based on string theory and is one path to reconciling quantum physics with the physics of gravitation in a theory of quantum gravity—one of the Holy Grails of physics.

While it is an elegant cosmic idea, the holographic principle could not be viewed as anything down to Earth, until now. In November 2022 a research group at Caltech published a paper in Nature describing how they used Google’s Sycamore quantum computer (housed at UC Santa Barbara) to manipulate a set of qubits into creating a laboratory-based analog of an Einstein-Rosen bridge, also known as a “wormhole”, through spacetime. The ability to use quantum information states to simulate a highly-warped spacetime analog provides the first experimental evidence for the validity of the cosmological holographic principle. Although the simulation did not produce a physical wormhole in our spacetime, it showed how quantum information and differential geometry (the mathematics of general relativity) can be connected.

One of the most important consequences of this work is the proposal that ER = EPR (Einstein-Rosen = Einstein-Podolsky-Rosen). The EPR paradox of quantum entanglement has long been viewed as a fundamental paradox of physics that requires instantaneous non-local correlations among quantum particles that can be arbitrarily far apart. Although EPR violates local realism, it is a valuable real-world resource for quantum teleportation. By demonstrating the holographic wormhole, the recent Caltech results show how quantum teleportation and gravitational wormholes may arise from the same physics.

Net-Positive-Energy from Nuclear Fusion

Ever since nuclear fission was harnessed to generate energy, the idea of tapping the even greater potential of nuclear fusion to power the world has been a dream of nuclear physicists. Nuclear fusion energy would be clean and green and could help us avoid the long-run disaster of global warming. However, achieving that dream has been surprisingly frustrating. While nuclear fission was harnessed for energy (and weapons) within only a few years of discovery, and a fusion “boost” was added to nuclear destructive power in the so-called hydrogen bomb, sustained energy production from fusion has remained elusive.

In December of 2022, the National Ignition Facility (NIF) focused the power of 192 pulsed lasers onto a deuterium-tritium pellet, causing it to implode and the nuclei to fuse, releasing about 50% more energy than it absorbed. This was the first time that controlled fusion released net positive energy—about 3 million Joules out from 2 million Joules in—enough energy to boil about 3 liters of water. This accomplishment represents a major milestone in the history of physics and could one day provide useful energy. The annual budget of the NIF is about 300 million dollars, so there is a long road ahead (probably several more decades) before this energy source can be scaled to an economical level.

NIF image: the laser fusion experiment that yielded record energy at Lawrence Livermore National Laboratory’s National Ignition Facility.

By David D. Nolte Jan. 16, 2023

Paul Dirac’s Delta Function

Physical reality is nothing but a bunch of spikes and pulses—or glitches.  Take any smooth phenomenon, no matter how benign it might seem, and decompose it into an infinitely dense array of infinitesimally transient, infinitely high glitches.  Then the sum of all glitches, weighted appropriately, becomes the phenomenon.  This might be called the “glitch” function—but it is better known as Green’s function in honor of the ex-millwright George Green, who taught himself mathematics at night to become one of England’s leading mathematicians of the age. 

The δ function is thus merely a convenient notation … we perform operations on the abstract symbols, such as differentiation and integration …

PAM Dirac (1930)

The mathematics behind the “glitch” has a long history that began in the golden era of French analysis with the mathematicians Cauchy and Fourier, was employed by the electrical engineer Heaviside, and ultimately fell into the fertile hands of the quantum physicist, Paul Dirac, after whom it is named.

Augustin-Louis Cauchy (1815)

The French mathematician and physicist Augustin-Louis Cauchy (1789 – 1857) has lent his name to a wide array of theorems, proofs and laws that are still in use today. In mathematics, he was one of the first to establish “modern” functional analysis and especially complex analysis. In physics he established a rigorous foundation for elasticity theory (including the elastic properties of the so-called luminiferous ether).

Augustin-Louis Cauchy

In the early days of the 1800’s Cauchy was exploring how integrals could be used to define properties of functions.  In modern terminology we would say that he was defining kernel integrals, where a function is integrated over a kernel to yield some property of the function.

In 1815 Cauchy read before the Academy of Paris a paper with the long title “Theory of wave propagation on a surface of a fluid of indefinite weight”.  The paper was not published until more than ten years later in 1827 by which time it had expanded to 300 pages and contained numerous footnotes.  The thirteenth such footnote was titled “On definite integrals and the principal values of indefinite integrals” and it contained one of the first examples of what would later become known as a generalized distribution.  The integral is a function F(μ) integrated over a kernel

Cauchy lets the scale parameter α be “an infinitely small number”.  The kernel is thus essentially zero for any values of μ “not too close to α”.  Today, we would call the kernel given by

in the limit that α vanishes, “the delta function”.

Cauchy’s approach to the delta function is today one of the most commonly used descriptions of what a delta function is.  It is not enough to simply say that a delta function is an infinitely narrow, infinitely high function whose integral is equal to unity.  It helps to illustrate the behavior of the Cauchy function as α gets progressively smaller, as shown in Fig. 1. 

Fig. 1 Cauchy function for decreasing scale factor α approaches a delta function in the limit.

In the limit as α approaches zero, the function grows progressively higher and progressively narrower, but the integral over the function remains unity.
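Cauchy’s kernel is usually written today as the Lorentzian α/[π(α² + x²)]—that form is assumed in the sketch below, since the kernel itself appears above only as an equation image. A few lines of Python confirm numerically that as α shrinks the peak grows without bound while the area, and the sifting integral against a smooth function, stay pinned near unity:

```python
import numpy as np

def cauchy_kernel(x, alpha):
    # Lorentzian form of Cauchy's kernel (the standard modern writing,
    # assumed here since the original footnote's equation is an image)
    return alpha / (np.pi * (alpha**2 + x**2))

x = np.linspace(-100.0, 100.0, 400_001)   # fine grid over a wide window
dx = x[1] - x[0]

for alpha in (1.0, 0.1, 0.01):
    area = np.sum(cauchy_kernel(x, alpha)) * dx   # Riemann-sum integral
    peak = cauchy_kernel(0.0, alpha)
    print(f"alpha={alpha:5}: area ~ {area:.4f}, peak = {peak:.2f}")

# Sifting behavior: integrating against a smooth f(x) picks out f(0).
f = np.cos(x)
print(np.sum(cauchy_kernel(x, 0.01) * f) * dx)    # close to cos(0) = 1
```

The printed areas stay near 1 even as the peak grows by orders of magnitude, which is the operational content of “infinitely narrow, infinitely high, unit integral.”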

Joseph Fourier (1822)

The delayed publication of Cauchy’s memoire kept it out of common knowledge, so Joseph Fourier (1768 – 1830) can be excused if he did not know of it by the time he published his monumental work on heat in 1822.  Perhaps this is why Fourier’s approach to the delta function differed from Cauchy’s. 

Fourier noted that an integral over a sinusoidal function, as the argument of the sinusoidal function went to infinity, became independent of the limits of integration. He showed

when ε << 1/p as p went to infinity. In modern notation, this would be the delta function defined through the “sinc” function

and Fourier noted that integrating this form over another function f(x) yielded the value of the function f(α) evaluated at α, rediscovering the results of Cauchy, but using a sinc(x) function in Fig. 2 instead of the Cauchy function of Fig. 1.

Fig. 2 Sinc function for increasing scale factor p approaches a delta function in the limit.
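Fourier’s sinc form sin(px)/(πx)—again the standard modern writing, assumed here since the equations above are reproduced as images—can be checked the same way: as p grows, integrating the kernel against a smooth function picks out the function’s value at the origin.

```python
import numpy as np

def sinc_kernel(x, p):
    # sin(p x) / (pi x), written with np.sinc (= sin(pi t)/(pi t)) to handle x = 0
    return (p / np.pi) * np.sinc(p * x / np.pi)

x = np.linspace(-40.0, 40.0, 800_001)
dx = x[1] - x[0]
f = np.exp(-x**2)          # a smooth test function with f(0) = 1

for p in (2.0, 10.0, 50.0):
    val = np.sum(sinc_kernel(x, p) * f) * dx   # Riemann-sum sifting integral
    print(f"p = {p:4}: integral of kernel * f ~ {val:.6f}")
```

Unlike the Cauchy kernel, the sinc kernel oscillates and takes negative values, yet the sifting integral still converges to f(0)—a reminder that only the integral behavior matters, which is why such different kernels can define the same delta function.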

George Green’s Function (1829)

A history of the delta function cannot be complete without mention of George Green, one of the most remarkable British mathematicians of the 1800’s.  He was a miller’s son who had only one year of education and spent most of his early life tending to his father’s mill.  In his spare time, and to cut the tedium of his work, he read the most up-to-date work of the French mathematicians—the papers of Cauchy and Poisson and Fourier—whose work far surpassed the British work of that time.  Remarkably, he mastered the material and developed new results of his own, which he eventually self-published.  This is the mathematical work that introduced the potential function and the fundamental solutions to unit sources—what today would be called point charges or delta functions.  These fundamental solutions are equivalent to the modern Green’s function, although they were developed rigorously much later by Kirchhoff and by Courant and Hilbert.

George Green’s flour mill in Sneinton, England.

The modern idea of a Green’s function is simply the system response to a unit impulse—like throwing a pebble into a pond to launch expanding ripples, or striking a bell.  To obtain the solution for a general impulse, one integrates over the fundamental solutions weighted by the strength of the impulse.  If the response of the system to a delta-function impulse at x = a, that is, to δ(x-a), is G(x-a), then the response of the system to a distributed force f(x) is given by

where G(x-a) is called the Green’s function.

Fig. Principle of Green’s function. The Green’s function is the system response to a delta-function impulse. The net system response is the integral over all the individual system responses summed over each of the impulses.
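The superposition principle in the figure can be sketched numerically. The system below is a stand-in of my own choosing (a first-order relaxation equation, not anything from Green’s memoir): its Green’s function is a decaying exponential, and convolving it with a distributed force reproduces direct integration of the equation of motion.

```python
import numpy as np

# First-order relaxation system  dx/dt + x/tau = f(t)  (a stand-in "pond"):
# its impulse response (Green's function) is G(t) = exp(-t/tau) for t >= 0.
tau = 0.5
dt = 0.001
t = np.arange(0.0, 5.0, dt)
G = np.exp(-t / tau)                     # response to a unit impulse at t = 0
f = np.sin(2 * np.pi * t) * (t < 2.0)    # an arbitrary distributed force

# Superposition: sum the impulse responses weighted by the force.
x_green = np.convolve(f, G)[: t.size] * dt

# Check against direct time-stepping of the differential equation.
x_direct = np.zeros_like(t)
for n in range(1, t.size):
    x_direct[n] = x_direct[n-1] + dt * (f[n-1] - x_direct[n-1] / tau)

print("max discrepancy:", np.max(np.abs(x_green - x_direct)))
```

The convolution never needs to know the force in advance; it simply adds up shifted, weighted copies of the single impulse response.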

Oliver Heaviside (1893)

Oliver Heaviside (1850 – 1925) tended to follow his own path, independently of whatever the mathematicians were doing.  Heaviside took particularly pragmatic approaches based on physical phenomena and how they might behave in an experiment.  This is the context in which he introduced once again the delta function, unaware of the work of Cauchy or Fourier.

Oliver Heaviside

Heaviside was an engineer at heart who practiced his art by doing. He was not concerned with rigor, only with what works. This part of his personality may have been forged by his apprenticeship in telegraph technology, helped by his uncle Charles Wheatstone (of the Wheatstone bridge). While still a young man, Heaviside tried to tackle Maxwell’s new treatise on electricity and magnetism, but he realized his mathematics was lacking, so he began a project of self-education that took several years. The product of those years was an idiosyncratic approach to electronics that may best be described as operator algebra. His algebra contained misbehaved functions, such as the step function that was later named after him. It could also handle the derivative of the step function, which is yet another way of defining the delta function—though certainly not to the satisfaction of any rigorous mathematician—but it worked. The operator theory could even handle the derivative of the delta function.

The Heaviside function (step function) and its derivative the delta function.

Perhaps the most important influence by Heaviside was his connection of the delta function to Fourier integrals. He was one of the first to show that

which states that the Fourier transform of a delta function is a complex sinusoid, and the Fourier transform of a sinusoid is a delta function. Heaviside wrote several influential textbooks on his methods, and by the 1920’s these methods, including the Heaviside function and its derivative, had become standard parts of the engineer’s mathematical toolbox.
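Heaviside’s transform pair is easy to verify with a discrete Fourier transform (a numerical sketch in modern FFT notation, not Heaviside’s own operator calculus): a single spike transforms into a complex sinusoid of unit magnitude.

```python
import numpy as np

N = 64
n0 = 5                          # sample where the discrete "delta" sits
x = np.zeros(N)
x[n0] = 1.0

X = np.fft.fft(x)

# The transform of a shifted delta is a pure complex sinusoid:
# unit magnitude at every frequency, with linearly advancing phase.
print(np.allclose(np.abs(X), 1.0))
print(np.allclose(X, np.exp(-2j * np.pi * np.arange(N) * n0 / N)))
```

Both checks print True: the spectrum is flat in magnitude, and its phase winds linearly with frequency, exactly the complex-sinusoid pair Heaviside exploited.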

Given the work by Cauchy, Fourier, Green and Heaviside, what was left for Paul Dirac to do?

Paul Dirac (1930)

Paul Dirac (1902 – 1984) was given the moniker “The Strangest Man” by Niels Bohr during his visit to Copenhagen shortly after he had received his PhD.  In part, this was because of Dirac’s internal intensity that could make him seem disconnected from those around him. When he was working on a problem in his head, it was not unusual for him to start walking, and by the time he became aware of his surroundings again, he would have walked the length of the city of Copenhagen. And his solutions to problems were ingenious, breaking bold new ground where others, some of whom were geniuses themselves, were fumbling in the dark.

P. A. M. Dirac

Among his many influential works—works that changed how physicists thought of and wrote about quantum systems—was his 1930 textbook on quantum mechanics. This was more than just a textbook, because it invented new methods by unifying the wave mechanics of Schrödinger with the matrix mechanics of Born and Heisenberg.

In particular, there had been a disconnect between bound electron states in a potential and free electron states scattering off of the potential. In the one case the states have a discrete spectrum, i.e. quantized, while in the other case the states have a continuous spectrum. There were standard quantum tools for decomposing discrete states by a projection onto eigenstates in Hilbert space, but an entirely different set of tools for handling the scattering states.

Yet Dirac saw a commonality between the two approaches. Specifically, eigenstate decomposition on the one hand used discrete sums of states, while scattering solutions on the other hand used integration over a continuum of states. In the first format, orthogonality was denoted by a Kronecker delta notation, but there was no equivalent in the continuum case—until Dirac introduced the delta function as a kernel in the integrand. In this way, the form of the equations with sums over states multiplied by Kronecker deltas took on the same form as integrals over states multiplied by the delta function.
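Dirac’s parallel can be written out side by side in standard modern bra-ket notation (a later convention, not the typography of his 1930 edition):

```latex
% Discrete spectrum: the Kronecker delta expresses orthonormality
\langle n | m \rangle = \delta_{nm}, \qquad
|\psi\rangle = \sum_n c_n |n\rangle, \qquad c_n = \langle n | \psi \rangle

% Continuous spectrum: the Dirac delta plays the identical role inside an integral
\langle x | x' \rangle = \delta(x - x'), \qquad
|\psi\rangle = \int \psi(x)\, |x\rangle\, dx, \qquad \psi(x) = \langle x | \psi \rangle
```

Sums over states with Kronecker deltas and integrals over states with delta functions then take exactly the same form, which was Dirac’s point.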

Page 64 of Dirac’s 1930 edition of Quantum Mechanics.

In addition to introducing the delta function into the quantum formulas, Dirac also explored many of the properties and rules of the delta function. He was aware that the delta function was not a “proper” function, but by beginning with a simple integral property as a starting axiom, he could derive virtually all of the extended properties of the delta function, including properties of its derivatives.
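Dirac’s starting axiom and two of the derived rules can be stated compactly (standard results, written here in modern notation):

```latex
% Starting axiom: the sifting property
\int_{-\infty}^{\infty} f(x)\,\delta(x-a)\,dx = f(a)

% Derived rules, e.g. scaling, and the derivative via integration by parts
\delta(cx) = \frac{1}{|c|}\,\delta(x), \qquad
\int_{-\infty}^{\infty} f(x)\,\delta'(x-a)\,dx = -f'(a)
```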

Mathematicians, of course, were appalled and were quick to point out the insufficiency of the mathematical foundation for Dirac’s delta function, until the French mathematician Laurent Schwartz (1915 – 2002) developed the general theory of distributions in the 1940’s, which finally put the delta function in good standing.

Dirac’s introduction, development and use of the delta function was the first systematic treatment of its properties. The earlier work by Cauchy, Fourier, Green and Heaviside had all touched upon the behavior of such “spiked” functions, but only in passing. After Dirac, physicists embraced the delta function as a powerful new tool in their toolbox, despite the lag in its formal acceptance by mathematicians, until the work of Schwartz redeemed it.

By David D. Nolte Feb. 17, 2022



A Short History of Quantum Entanglement

Despite the many apparent paradoxes posed in physics—the twin and ladder paradoxes of relativity theory, Olbers’ paradox of the bright night sky, Loschmidt’s paradox of irreversible statistical fluctuations—these are resolved by a deeper look at the underlying assumptions: the twin paradox is resolved by considering shifts in reference frames, the ladder paradox by the loss of simultaneity, Olbers’ paradox by the finite age of the universe, and Loschmidt’s paradox by fluctuation theorems.  In each case, no physical principle is violated, and each paradox is fully explained.

However, there is at least one “true” paradox in physics that defies consistent explanation—quantum entanglement.  Quantum entanglement was first described by Einstein with colleagues Podolsky and Rosen in the famous EPR paper of 1935 as an argument against the completeness of quantum mechanics, and it was given its name by Schrödinger the same year in the paper where he introduced his “cat” as a burlesque consequence of entanglement. 

Here is a short history of quantum entanglement [1], from its beginnings in 1935 to the recent 2022 Nobel prize in Physics awarded to John Clauser, Alain Aspect and Anton Zeilinger.

The EPR Papers of 1935

Einstein can be considered the father of quantum mechanics, even over Planck, because of his 1905 derivation of the existence of the photon as a discrete carrier of a quantum of energy (see Einstein versus Planck).  Even so, as Heisenberg and Bohr advanced quantum mechanics in the mid-1920’s, emphasizing the underlying non-deterministic outcomes of measurements, and in particular the notion of instantaneous wavefunction collapse, they pushed the theory in directions that Einstein found increasingly disturbing and unacceptable. 

This feature is an excerpt from an upcoming book, Interference: The History of Optical Interferometry and the Scientists Who Tamed Light (Oxford University Press, July 2023), by David D. Nolte.

At the invitation-only Solvay Congresses of 1927 and 1930, where all the top physicists met to debate the latest advances, Einstein and Bohr began a running debate that was epic in the history of physics as the two top minds went head-to-head while the others looked on in awe.  Ultimately, Einstein was on the losing end.  Although he was convinced that something was missing in quantum theory, he could not counter all of Bohr’s rejoinders, even as his assaults became ever more sophisticated, and he left the field of battle beaten but not convinced.  Several years later he launched his last and ultimate salvo.

Fig. 1 Niels Bohr and Albert Einstein

At the Institute for Advanced Study in Princeton, New Jersey, in the 1930’s Einstein was working with Nathan Rosen and Boris Podolsky when he envisioned a fundamental paradox in quantum theory that occurred when two widely-separated quantum particles were required to share specific physical properties because of simple conservation laws, such as those for energy and momentum.  Even Bohr and Heisenberg could not deny the principles of conservation of energy and momentum, and Einstein devised a two-particle system for which these conservation principles led to an apparent violation of Heisenberg’s own uncertainty principle.  He left the details to his colleagues, with Podolsky writing up the main arguments.  They published the paper in the Physical Review in March of 1935 with the title “Can Quantum-Mechanical Description of Physical Reality be Considered Complete?” [2].  Because of the three names on the paper (Einstein, Podolsky, Rosen), it became known as the EPR paper, and the paradox they presented became known as the EPR paradox.

When Bohr read the paper, he was initially stumped and aghast.  He felt that EPR had shaken the very foundations of the quantum theory that he and his institute had fought so hard to establish.  He also suspected that EPR had made a mistake in their arguments, and he halted all work at his institute in Copenhagen until they could construct a definitive answer.  A few months later, in July of 1935, Bohr published a paper in the Physical Review, using the identical title that EPR had used, in which he refuted the EPR paradox [3].  There is not a single equation or figure in the paper, but he used his “awful incantation terminology” to maximum effect, showing that one of the EPR assumptions—about how uncertainties in position and momentum are to be assessed—was in error, and he was right.

Einstein was disgusted.  He had hoped that this ultimate argument against the completeness of quantum mechanics would stand the test of time, but Bohr had shot it down within mere months.  Einstein was particularly disappointed with Podolsky, because Podolsky had tried too hard to make the argument specific to position and momentum, leaving a loophole for Bohr to wiggle through, where Einstein had wanted the argument to rest on deeper and more general principles. 

Despite Bohr’s victory, Einstein had been correct in his initial formulation of the EPR paradox that showed quantum mechanics did not jibe with common notions of reality.  He and Schrödinger exchanged letters commiserating with each other and encouraging each other in their counter beliefs against Bohr and Heisenberg.  In November of 1935, Schrödinger published a broad, mostly philosophical, paper in Naturwissenschaften [4] in which he amplified the EPR paradox with the use of an absurd—what he called burlesque—consequence of wavefunction collapse that became known as Schrödinger’s Cat.  He also gave the central property of the EPR paradox its name: entanglement.

Ironically, both Einstein’s entanglement paradox and Schrödinger’s Cat, which were formulated originally to be arguments against the validity of quantum theory, have become established quantum tools.  Today, entangled particles are the core workhorses of quantum information systems, and physicists are building larger and larger versions of Schrödinger’s Cat that may eventually merge with the physics of the macroscopic world.

Bohm and Aharonov Tackle EPR

The physicist David Bohm was a rare political exile from the United States.  He was born in the heart of Pennsylvania in the town of Wilkes-Barre, attended Penn State and then the University of California at Berkeley, where he joined Robert Oppenheimer’s research group.  While there, he became deeply involved in the fight for unions and socialism, activities for which he was called before McCarthy’s Committee on Un-American Activities.  He invoked his Fifth Amendment rights, for which he was arrested.  Although he was later acquitted, Princeton University fired him from his faculty position, and, fearing another arrest, he fled to Brazil, where his US passport was confiscated by American authorities.  He had become a physicist without a country. 

Fig. 2 David Bohm

Despite his personal trials, Bohm remained scientifically productive.  He published his influential textbook on quantum mechanics in the midst of his Senate hearings, and after a particularly stimulating discussion with Einstein shortly before he fled the US, he developed and published an alternative version of quantum theory in 1952 that was fully deterministic—removing Einstein’s “God playing dice”—by creating a hidden-variable theory [5].

Hidden-variable theories of quantum mechanics seek to remove the randomness of quantum measurement by assuming that some deeper element of quantum phenomena—a hidden variable—explains each outcome.  But it is also assumed that these hidden variables are not directly accessible to experiment.  In this sense, the quantum theory of Bohr and Heisenberg was “correct” but not “complete”, because there were things that the theory could not predict or explain.

Bohm’s hidden variable theory, based on a quantum potential, was able to reproduce all the known results of standard quantum theory without invoking the random experimental outcomes that Einstein abhorred.  However, it still contained one crucial element that could not sweep away the EPR paradox—it was nonlocal.

Nonlocality lies at the heart of quantum theory.  In its simplest form, the nonlocal nature of quantum phenomena says that quantum states span spacetime with space-like separations, meaning that parts of the wavefunction are non-causally connected to other parts of the wavefunction.  Because Einstein was fundamentally committed to causality, the nonlocality of quantum theory was what he found most objectionable, and Bohm’s elegant hidden-variable theory, which removed Einstein’s dreaded randomness, could not remove that last objection of non-causality.

After working in Brazil for several years, Bohm moved to the Technion in Israel, where he began a fruitful collaboration with Yakir Aharonov.  In addition to proposing the Aharonov-Bohm effect, in 1957 they reformulated Podolsky’s version of the EPR paradox, which relied on continuous values of position and momentum, replacing it with a much simpler model based on Stern-Gerlach measurements of spins, and extended it to the case of positronium decay into two photons with correlated polarizations.  Bohm and Aharonov reassessed experimental results on positronium decay that had been obtained by Madame Wu in 1950 at Columbia University and found them in full agreement with standard quantum theory.

John Bell’s Inequalities

John Stuart Bell had an unusual start for a physicist.  His family was too poor to give him an education appropriate to his skills, so he enrolled in vocational school where he took practical classes that included brick laying.  Working later as a technician in a university lab, he caught the attention of his professors who sponsored him to attend the university.  With a degree in physics, he began working at CERN as an accelerator designer when he again caught the attention of his supervisors who sponsored him to attend graduate school.  He graduated with a PhD and returned to CERN as a card-carrying physicist with all the rights and privileges that entailed.

Fig. 3 John Bell

During his university days, he had been fascinated by the EPR paradox, and he continued thinking about the fundamentals of quantum theory.  On sabbatical at the Stanford accelerator in 1963 he began putting mathematics to the EPR paradox to see whether any local hidden-variable theory could be compatible with quantum mechanics.  His analysis was fully general, so that it could rule out as-yet-unthought-of hidden-variable theories.  The result of this work was a set of inequalities that must be obeyed by any local hidden-variable theory.  Then he made a simple check using the known results of quantum measurement and showed that his inequalities are violated by quantum systems.  This ruled out the possibility of any local hidden-variable theory (but not Bohm’s nonlocal hidden-variable theory).  Bell published his analysis in 1964 [6] in an obscure journal that almost no one read…except for a curious graduate student at Columbia University who began digging into the fundamental underpinnings of quantum theory against his supervisor’s advice.

Fig. 4 Polarization measurements on entangled photons violate Bell’s inequality.

John Clauser’s Tenacious Pursuit

As a graduate student in astrophysics at Columbia University, John Clauser was supposed to be doing astrophysics.  Instead, he spent his time musing over the fundamentals of quantum theory.  In 1967 Clauser stumbled across Bell’s paper while he was in the library.  The paper caught his imagination, but he also recognized that the inequalities were not experimentally testable, because they required measurements that depended directly on hidden variables, which are not accessible.  He began thinking of ways to construct similar inequalities that could be put to an experimental test, and he wrote about his ideas to Bell, who responded with encouragement.  Clauser wrote up his ideas in an abstract for an upcoming meeting of the American Physical Society, where one of the abstract reviewers was Abner Shimony of Boston University.  Clauser was surprised weeks later when he received a telephone call from Shimony.  Shimony and his graduate student Michael Horne had been thinking along similar lines, and Shimony proposed to Clauser that they join forces.  They met in Boston, where they also met Richard Holt, a graduate student at Harvard who was working on experimental tests of quantum mechanics.  Collectively, they devised a new type of Bell inequality that could be put to experimental test [7].  The result has become known as the CHSH Bell inequality (after Clauser, Horne, Shimony and Holt).
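The size of the violation that the CHSH inequality permits can be previewed with the standard quantum prediction for polarization-entangled photons, E(a, b) = cos 2(a − b), evaluated at the canonical angle settings (a sketch of the textbook calculation, not of any particular apparatus):

```python
import numpy as np

def E(a, b):
    # standard two-photon polarization correlation at analyzer angles a and b
    return np.cos(2.0 * (a - b))

# Canonical CHSH settings (radians): Alice at 0 and 45 deg, Bob at 22.5 and 67.5 deg
a1, a2 = 0.0, np.pi / 4
b1, b2 = np.pi / 8, 3 * np.pi / 8

S = E(a1, b1) - E(a1, b2) + E(a2, b1) + E(a2, b2)
print(S)   # 2*sqrt(2) ~ 2.83, above the local hidden-variable bound of 2
```

Any local hidden-variable theory keeps |S| ≤ 2; the quantum value 2√2 (the Tsirelson bound) is the gap that the experiments described below set out to measure.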

Fig. 5 John Clauser

When Clauser took a post-doc position in Berkeley, he began searching for a way to do the experiments to test the CHSH inequality, even though Holt had a head start at Harvard.  Clauser enlisted the help of Charles Townes, who convinced one of the Berkeley faculty to loan Clauser his graduate student, Stuart Freedman, to help.  Clauser and Freedman performed the experiments, using a two-photon optical cascade in calcium atoms, and found a violation of the CHSH inequality by 5 standard deviations, publishing their result in 1972 [8]. 

Fig. 6 CHSH inequality violated by entangled photons.

Alain Aspect’s Non-locality

Just as Clauser’s life was changed when he stumbled on Bell’s obscure paper in 1967, the paper had the same effect on the life of the French physicist Alain Aspect, who came across it in 1975.  Like Clauser, Aspect sought out Bell for his opinion, meeting with him in Geneva, and similarly received Bell’s encouragement, this time with the hope of building upon Clauser’s work. 

Fig. 7 Alain Aspect

In some respects, the conceptual breakthrough achieved by Clauser had been the CHSH inequality that could be tested experimentally.  The subsequent Clauser-Freedman experiments were not a conclusion but just a beginning, opening the door to deeper tests.  For instance, in the Clauser-Freedman experiments the polarizers were static, and the detectors were not widely separated, which allowed the measurements to be time-like separated in spacetime.  Therefore, the fundamentally non-local nature of quantum physics had not been tested.

Aspect began a thorough and systematic program, one that would take him nearly a decade to complete, to test the CHSH inequality under conditions of non-locality.  He began with a much brighter source of photons, produced using laser excitation of calcium atoms.  This allowed him to perform the experiment in hundreds of seconds instead of the hundreds of hours required by Clauser.  With such a high data rate, Aspect was able to verify violation of the Bell inequality to 10 standard deviations, published in 1981 [9].

However, the real goal was to change the orientations of the polarizers while the photons were in flight to widely separated detectors [10].  This experiment would allow the detection to be space-like separated in spacetime.  The experiments were performed using fast-switching acousto-optic modulators, and the Bell inequality was violated to 5 standard deviations [11].  This was the most stringent test yet performed and the first to fully demonstrate the non-local nature of quantum physics.

Anton Zeilinger: Master of Entanglement

If there is one physicist today whose work encompasses the broadest range of entangled phenomena, it would be the Austrian physicist Anton Zeilinger.  He began his career in neutron interferometry, but when he was bitten by the entanglement bug in 1976, he switched to quantum photonics because of the superior control that optics offers over sources and receivers and all the manipulations in between.

Fig. 8 Anton Zeilinger

Working with Daniel Greenberger and Michael Horne, Zeilinger took the essential next step past the Bohm two-particle entanglement to consider a 3-particle entangled state with surprising properties.  While the violation of locality by two-particle entanglement was observed through the statistical properties of many measurements, the new 3-particle entanglement could show violations in single measurements, further strengthening the arguments for quantum non-locality.  This new state is called the GHZ state (after Greenberger, Horne and Zeilinger) [12].

As the Zeilinger group in Vienna was working towards experimental demonstrations of the GHZ state, Charles Bennett of IBM proposed the possibility of quantum teleportation, using entanglement as a core quantum information resource [13].  Zeilinger realized that his experimental set-up could demonstrate the effect, and after a rapid re-tooling of the apparatus [14], the Zeilinger group was the first to demonstrate quantum teleportation satisfying the conditions of the Bennett proposal [15].  An Italian-UK collaboration also made an early demonstration of a related form of teleportation in a paper that was submitted first, but published after Zeilinger’s due to delays in review [16].  But teleportation was just one of a widening array of quantum applications of entanglement pursued by the Zeilinger group over the succeeding 30 years [17], including entanglement swapping, quantum repeaters, and entanglement-based quantum cryptography. Perhaps most striking, he has worked on projects at astronomical observatories that entangle photons coming from cosmic sources.

By David D. Nolte Nov. 26, 2022

Video Lectures

YouTube Lecture on the History of Quantum Entanglement

Physics Colloquium on the Backstory of the 2023 Nobel Prize in Physics


1935 – Einstein EPR

1935 – Bohr EPR

1935 – Schrödinger: Entanglement and Cat

1950 – Madame Wu positron annihilation experiment

1952 – David Bohm and Non-local hidden variables

1957 – Bohm and Aharonov version of EPR

1964 – Bell’s inequalities

1967 – Clauser reads Bell’s paper

1967 – Commins experiment with Calcium

1969 – CHSH inequality: measurable with detection inefficiencies

1972 – Clauser and Freedman experiment

1975 – Aspect reads Bell’s paper

1976 – Zeilinger reads Bell’s paper

1981 – Aspect two-photon generation source

1982 – Aspect time variable analyzers

1988 – Parametric down-conversion of EPR pairs (Shih and Alley, Ou and Mandel)

1989 – GHZ state proposed

1993 – Bennett quantum teleportation proposal

1995 – High-intensity down-conversion source of EPR pairs (Kwiat and Zeilinger)

1997 – Zeilinger quantum teleportation experiment

1999 – Observation of the GHZ state


[1] See the full details in: David D. Nolte, Interference: A History of Interferometry and the Scientists Who Tamed Light (Oxford University Press, July 2023)

[2] A. Einstein, B. Podolsky, N. Rosen, Can quantum-mechanical description of physical reality be considered complete? Physical Review 47, 777-780 (1935).

[3] N. Bohr, Can quantum-mechanical description of physical reality be considered complete? Physical Review 48, 696-702 (1935).

[4] E. Schrödinger, Die gegenwärtige Situation in der Quantenmechanik. Die Naturwissenschaften 23, 807-12; 823-28; 844-49 (1935).

[5] D. Bohm, A suggested interpretation of the quantum theory in terms of “hidden” variables. I. Physical Review 85, 166-179 (1952); D. Bohm, A suggested interpretation of the quantum theory in terms of “hidden” variables. II. Physical Review 85, 180-193 (1952).

[6] J. Bell, On the Einstein-Podolsky-Rosen paradox. Physics 1, 195 (1964).

[7] J. F. Clauser, M. A. Horne, A. Shimony, R. A. Holt, Proposed experiment to test local hidden-variable theories. Physical Review Letters 23, 880-884 (1969).

[8] S. J. Freedman, J. F. Clauser, Experimental test of local hidden-variable theories. Physical Review Letters 28, 938-941 (1972).

[9] A. Aspect, P. Grangier, G. Roger, Experimental tests of realistic local theories via Bell’s theorem. Physical Review Letters 47, 460-463 (1981).

[10] A. Aspect, Bell’s theorem: The naïve view of an experimentalist. (2004), hal-00001079.

[11] A. Aspect, J. Dalibard, G. Roger, Experimental test of Bell inequalities using time-varying analyzers. Physical Review Letters 49, 1804-1807 (1982).

[12] D. M. Greenberger, M. A. Horne, A. Zeilinger, in 1988 Fall Workshop on Bells Theorem, Quantum Theory and Conceptions of the Universe. (George Mason Univ, Fairfax, Va, 1988), vol. 37, pp. 69-72.

[13] C. H. Bennett, G. Brassard, C. Crepeau, R. Jozsa, A. Peres, W. K. Wootters, Teleporting an unknown quantum state via dual classical and einstein-podolsky-rosen channels. Physical Review Letters 70, 1895-1899 (1993).

[14]  J. Gea-Banacloche, Optical realizations of quantum teleportation, in Progress in Optics, Vol 46, E. Wolf, Ed. (2004), vol. 46, pp. 311-353.

[15] D. Bouwmeester, J.-W. Pan, K. Mattle, M. Eibl, H. Weinfurter, A. Zeilinger, Experimental quantum teleportation. Nature 390, 575-579 (1997).

[16] D. Boschi, S. Branca, F. De Martini, L. Hardy, S. Popescu, Experimental realization of teleporting an unknown pure quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters 80, 1121-1125 (1998).

[17]  A. Zeilinger, Light for the quantum. Entangled photons and their applications: a very personal perspective. Physica Scripta 92, 1-33 (2017).

New from Oxford Press: The History of Optical Interferometry (Late Summer 2023)

A Short History of Quantum Tunneling

Quantum physics is often called “weird” because it does things that are not allowed in classical physics and hence is viewed as non-intuitive or strange.  Perhaps the two “weirdest” aspects of quantum physics are quantum entanglement and quantum tunneling.  Entanglement allows a particle state to extend across wide expanses of space, while tunneling allows a particle to have negative kinetic energy.  Neither of these effects has a classical analog.

Quantum entanglement arose out of the Bohr-Einstein debates at the Solvay Conferences in the 1920s and 30s, and it was the subject of a recent Nobel Prize in Physics (2022).  The quantum tunneling story is just as old, but it was recognized much earlier, by the 1973 Nobel Prize awarded to Brian Josephson, Ivar Giaever and Leo Esaki—each of whom was a graduate student when he discovered his effect, and two of whom got their big idea while attending a lecture class.

Always go to class, you never know what you might miss, and the payoff is sometimes BIG

Ivar Giaever

Of the two effects, tunneling is the more common and the more useful in modern electronic devices (although entanglement is coming up fast with the advent of quantum information science). Here is a short history of quantum tunneling, told through a series of publications that advanced theory and experiments.

Double-Well Potential: Friedrich Hund (1927)

The first analysis of quantum tunneling was performed by Friedrich Hund (1896 – 1997), a German physicist who studied early in his career with Born in Göttingen and Bohr in Copenhagen.  He published a series of papers in 1927 in Zeitschrift für Physik [1] that solved the newly proposed Schrödinger equation for the double-well potential.  He was particularly interested in the formation of symmetric and anti-symmetric states of the double well that contributed to the binding energy of atoms in molecules.  He derived the first tunneling-frequency expression for a quantum superposition of the symmetric and anti-symmetric states
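The displayed equation was lost in transcription; a schematic reconstruction consistent with the variables defined just below (the exact prefactor in Hund's paper may differ) is

$$ f \sim \nu \, e^{-V/h\nu} $$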

where f is the coherent oscillation frequency, V is the height of the potential and hν is the quantum energy of the isolated states when the atoms are far apart.  The exponential dependence on the potential height V made the tunnel effect extremely sensitive to the details of the tunnel barrier.

Fig. 1 Friedrich Hund

Electron Emission: Lothar Nordheim and Ralph Fowler (1927 – 1928)

The first to consider quantum tunneling from a bound state to a continuum state was Lothar Nordheim (1899 – 1985), a German physicist who studied under David Hilbert and Max Born at Göttingen and worked with John von Neumann and Eugene Wigner and later with Hans Bethe. In 1927 he solved the problem of a particle in a well that is separated from continuum states by a thin finite barrier [2]. Using the new Schrödinger theory, he found finite-valued transmission coefficients, caused by quantum tunneling of the particle through the barrier. Nordheim’s use of square potential wells and barriers is now, literally, a textbook example that every student of quantum mechanics solves. (For a quantum simulation of wavefunction tunneling through a square barrier, see the companion Quantum Tunneling YouTube video.) Nordheim later escaped the growing nationalism and anti-Semitism in Germany in the mid-1930s to become a visiting professor of physics at Purdue University in the United States, later moving to a permanent position at Duke University.
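The square-barrier problem has the standard textbook transmission coefficient T = [1 + V0² sinh²(κa) / 4E(V0 − E)]⁻¹ for E < V0. A short numerical sketch (illustrative numbers, not Nordheim's own) shows the small but finite transmission:

```python
import numpy as np

hbar = 1.054571817e-34  # reduced Planck constant, J·s
m_e = 9.1093837015e-31  # electron mass, kg
eV = 1.602176634e-19    # J per eV

def square_barrier_T(E_eV, V0_eV, a_nm):
    """Transmission coefficient for a particle with energy E < V0
    incident on a rectangular barrier of height V0 and width a."""
    E, V0 = E_eV * eV, V0_eV * eV
    a = a_nm * 1e-9
    kappa = np.sqrt(2 * m_e * (V0 - E)) / hbar  # decay constant inside barrier
    return 1.0 / (1.0 + (V0**2 * np.sinh(kappa * a)**2) / (4 * E * (V0 - E)))

# A 1 eV electron meeting a 2 eV, 0.5 nm barrier: small but nonzero transmission
print(square_barrier_T(1.0, 2.0, 0.5))
```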

Fig. 2 Nordheim square tunnel barrier and Fowler-Nordheim triangular tunnel barrier for electron tunneling from bound states into the continuum.

One of the giants of mathematical physics in the UK from the 1920s through the 1930s was Ralph Fowler (1889 – 1944). Three of his doctoral students went on to win Nobel Prizes (Chandrasekhar, Dirac and Mott) and others came close (Bhabha, Hartree, Lennard-Jones). In 1928 Fowler worked with Nordheim on a more realistic version of Nordheim’s surface electron tunneling that could explain the field emission of electrons from metals under strong electric fields. The electric field modified Nordheim’s square potential barrier into a triangular barrier (which they treated using WKB theory), from which they obtained the tunneling rate [3]. This type of tunnel effect is now known as Fowler-Nordheim tunneling.
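The resulting current density follows the elementary Fowler-Nordheim form J ≈ (A F²/φ) exp(−B φ^{3/2}/F). The sketch below uses commonly quoted values of the constants A and B (assumed here, not taken from the article) to show the extreme sensitivity to the applied field:

```python
import numpy as np

# Elementary Fowler-Nordheim form with commonly quoted constants (assumed values)
A = 1.541e-6   # A · eV / V^2
B = 6.831e9    # eV^(-3/2) · V / m

def fn_current_density(F_Vm, phi_eV):
    """Field-emission current density (A/m^2) for applied field F (V/m)
    and metal work function phi (eV)."""
    return (A * F_Vm**2 / phi_eV) * np.exp(-B * phi_eV**1.5 / F_Vm)

# Doubling the field raises the emitted current by many orders of magnitude
print(fn_current_density(3e9, 4.5) / fn_current_density(1.5e9, 4.5))
```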

Nuclear Alpha Decay: George Gamow (1928)

George Gamow (1904 – 1968) is one of the icons of mid-twentieth-century physics. He was a serious physicist who also had a keen sense of humor, which allowed him to achieve a level of cultural popularity shared by only a few larger-than-life physicists of his time, like Richard Feynman and Stephen Hawking. His popular books included One Two Three … Infinity as well as a favorite series of books under the rubric of Mr. Tompkins (Mr. Tompkins in Wonderland and Mr. Tompkins Explores the Atom, among others). He also wrote a history of the early years of quantum theory (Thirty Years that Shook Physics).

In 1928 Gamow was in Göttingen (the Mecca of early quantum theory) with Max Born when he realized that the radioactive decay of Uranium by alpha decay might be explained by quantum tunneling. It was known that nucleons were bound together by some unknown force in what would be an effective binding potential, but that charged alpha particles would also feel a strong electrostatic repulsive potential from a nucleus. Gamow combined these two potentials to create a potential landscape that was qualitatively similar to Nordheim’s original system of 1927, but with a potential barrier that was neither square nor triangular (like the Fowler-Nordheim situation).

Fig. 3 George Gamow

Gamow was able to make an accurate approximation that allowed him to express the decay rate in terms of an exponential term
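The displayed expression was lost in transcription; the standard Gamow exponent that matches the variables defined just below (Gaussian units; numerical prefactors vary between sources) is

$$ \lambda \propto \exp\!\left( -\frac{2\pi Z Z_\alpha e^2}{\hbar v} \right) $$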

where Zα is the atomic charge of the alpha particle, Z is the nuclear charge of the Uranium decay product and v is the speed of the alpha particle detected in external measurements [4].

The very next day after Gamow submitted his paper, Ronald Gurney and Edward Condon of Princeton University submitted a paper [5] that solved the same problem using virtually the same approach … except missing Gamow’s surprisingly concise analytic expression for the decay rate.

Molecular Tunneling: George Uhlenbeck (1932)

Because tunneling rates fall off rapidly with the mass of the particle tunneling through the barrier, electrons are far more likely to tunnel through potential barriers than atoms. Hydrogen, however, is a particularly light atom and is therefore the most amenable to tunneling.
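The mass dependence enters through the WKB exponent 2κa with κ = √(2m(V − E))/ħ, so the suppression grows like √m. A quick comparison (illustrative barrier numbers, not from the article) shows why hydrogen is the only atom with a realistic chance to tunnel:

```python
import numpy as np

hbar = 1.054571817e-34  # J·s
eV = 1.602176634e-19    # J per eV
m_e = 9.1093837015e-31  # electron mass, kg
m_H = 1.6735575e-27     # hydrogen atom mass, kg (~1836x heavier)

def wkb_exponent(mass, barrier_eV, width_nm):
    """2*kappa*a: the exponent suppressing the WKB tunneling probability
    for a barrier of given height (above the particle energy) and width."""
    kappa = np.sqrt(2 * mass * barrier_eV * eV) / hbar
    return 2 * kappa * width_nm * 1e-9

# Same 0.1 eV, 0.05 nm barrier: the hydrogen exponent is sqrt(1836) ~ 43x larger,
# so its tunneling probability is exponentially smaller than the electron's
print(wkb_exponent(m_e, 0.1, 0.05), wkb_exponent(m_H, 0.1, 0.05))
```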

The first example of atom tunneling involves hydrogen in the ammonia molecule NH3. The molecule has a pyramidal structure, with the nitrogen hovering above the plane defined by the three hydrogens. An equivalent configuration has the nitrogen hanging below the hydrogen plane. The energies of these two configurations are the same, but the nitrogen must tunnel from one side of the hydrogen plane to the other through a barrier. The presence of lightweight hydrogens that can “move out of the way” for the nitrogen makes this barrier very small (infrared energies). Tunneling through the barrier splits the vibrational levels into doublets, with a ground-state splitting corresponding to a wavelength of about 1.25 cm, in the microwave region. This tunnel splitting was the first microwave transition observed in spectroscopy and is used in ammonia masers.

Fig. 4 Nitrogen inversion in the ammonia molecule is achieved by excitation to a vibrational excited state followed by tunneling through the barrier, proposed by George Uhlenbeck in 1932.

One of the earliest papers [6] on the tunneling of nitrogen in ammonia was published by David Dennison and George Uhlenbeck in 1932. George Uhlenbeck (1900 – 1988) was a Dutch-American theoretical physicist. He played a critical role, with Samuel Goudsmit, in establishing the spin of the electron in 1925. Both Uhlenbeck and Goudsmit were close associates of Paul Ehrenfest at Leiden in the Netherlands. Uhlenbeck is also famous for the Ornstein-Uhlenbeck process, a generalization of Einstein’s theory of Brownian motion that can treat active transport, such as intracellular transport in living cells.

Solid-State Electron Tunneling: Leo Esaki (1957)

Although the tunneling of electrons in molecular bonds and in field emission from metals had been established early in the century, direct use of electron tunneling in solid-state devices remained elusive until Leo Esaki (1925 – ) observed electron tunneling in heavily doped germanium and silicon semiconductors. Esaki joined an early precursor of Sony electronics in 1956 and was supported by the company while obtaining a PhD from the University of Tokyo. In 1957 he was working with heavily doped p-n junction diodes and discovered a phenomenon known as negative differential resistance, in which the current through an electronic device actually decreases as the voltage increases.

Because the junction thickness was only about 100 atoms, or about 10 nanometers, he suspected and then proved that the electronic current was tunneling quantum mechanically through the junction. The negative differential resistance was caused by a decrease in available states to the tunneling current as the voltage increased.

Fig. 5 Esaki tunnel diode with heavily doped p- and n-type semiconductors. At small voltages, electrons and holes tunnel through the semiconductor bandgap across a junction that is only about 10 nm wide. At higher voltage, the electrons and holes have no accessible states to tunnel into, producing negative differential resistance, where the current decreases with increasing voltage.

Esaki tunnel diodes were the fastest semiconductor devices of the time, and the negative differential resistance of the diode in an external circuit produced high-frequency oscillations. They were used in high-frequency communication systems. They were also radiation hard and hence ideal for the early communications satellites. Esaki was awarded the 1973 Nobel Prize in Physics jointly with Ivar Giaever and Brian Josephson.

Superconducting Tunneling: Ivar Giaever (1960)

Ivar Giaever (1929 – ) is a Norwegian-American physicist who had just joined the GE research lab in Schenectady, New York in 1958 when he read about Esaki’s tunneling experiments. He was enrolled at the time as a graduate student in physics at Rensselaer Polytechnic Institute (RPI), where he was taking a course in solid state physics and learning about superconductivity. Superconductivity is carried by pairs of electrons, known as Cooper pairs, that spontaneously bind together with a binding energy that produces an “energy gap” in the electron energies of the metal, but no one had ever found a way to measure it directly. The Esaki experiment made him immediately think of the equivalent experiment in which Cooper pairs might tunnel between two superconductors (through a thin oxide layer) and yield a measurement of the energy gap. The idea actually came to him during the class lecture.

The experiments used a junction between aluminum and lead (Al—Al2O3—Pb). At first, the temperature of the system was adjusted so that Al remained a normal metal and Pb was superconducting, and Giaever observed a tunnel current with a threshold related to the gap in Pb. Then the temperature was lowered so that both Al and Pb were superconducting, and a peak in the tunnel current appeared at the voltage associated with the difference in the energy gaps (predicted by Harrison and Bardeen).

Fig. 6 Diagram from Giaever, “The Discovery of Superconducting Tunneling”

The Josephson Effect: Brian Josephson (1962)

In Giaever’s experiments, the external circuits had been designed to pick up “ordinary” tunnel currents, in which individual electrons tunneled through the oxide rather than the Cooper pairs themselves. However, in 1962, Brian Josephson (1940 – ), a physics graduate student at Cambridge, was sitting in a lecture (just like Giaever) on solid state physics given by Phil Anderson (who was there on sabbatical from Bell Labs). During the lecture he had the idea to calculate whether it was possible for the Cooper pairs themselves to tunnel through the oxide barrier. Building on theoretical work by Leo Falicov, who was at the University of Chicago and later at Berkeley (years later I was lucky to have Leo as my PhD thesis advisor at Berkeley), Josephson found the surprising result that even when the voltage was zero, a supercurrent would tunnel through the junction (now known as the DC Josephson effect). Furthermore, once a voltage was applied, the supercurrent would oscillate (now known as the AC Josephson effect). These were strange and non-intuitive results, so he showed Anderson his calculations to see what he thought. By this time Anderson had already been extremely impressed by Josephson (who would often come to the board after one of Anderson’s lectures to show where he had made a mistake). Anderson checked over the theory and agreed with Josephson’s conclusions. Bolstered by this reception, Josephson submitted the theoretical prediction for publication [9].

As soon as Anderson returned to Bell Labs after his sabbatical, he connected with John Rowell, who was making tunnel junction experiments. They revised the external circuit configuration to be most sensitive to the tunneling supercurrent, observed it in short order, and submitted a paper for publication [10]. Since then, the Josephson effect has become a standard element of ultra-sensitive magnetometers, measurement standards for charge and voltage, and far-infrared detectors, and it has been used to construct rudimentary qubits and quantum computers.
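The AC Josephson relation, f = 2eV/h, is what turns a junction into a voltage-to-frequency converter and underlies its use as a voltage standard. A one-line check (using CODATA constants, not numbers from the article):

```python
# AC Josephson effect: a junction biased at DC voltage V oscillates at f = 2eV/h
e = 1.602176634e-19  # elementary charge, C
h = 6.62607015e-34   # Planck constant, J·s

def josephson_frequency(volts):
    """Oscillation frequency (Hz) of the tunneling supercurrent at bias V."""
    return 2 * e * volts / h

# roughly 483.6 MHz of oscillation per microvolt of bias
print(josephson_frequency(1e-6) / 1e6, "MHz")
```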

By David D. Nolte: Nov. 6, 2022

YouTube Video

YouTube Video of Quantum Tunneling Systems


[1] F. Hund, Z. Phys. 40, 742 (1927). F. Hund, Z. Phys. 43, 805 (1927).

[2] L. Nordheim, Z. Phys. 46, 833 (1928).

[3] R. H. Fowler, L. Nordheim, Proc. R. Soc. London, Ser. A 119, 173 (1928).

[4] G. Gamow, Z. Phys. 51, 204 (1928).

[5] R. W. Gurney, E. U. Condon, Nature 122, 439 (1928). R. W. Gurney, E. U. Condon, Phys. Rev. 33, 127 (1929).

[6] Dennison, D. M. and G. E. Uhlenbeck. “The two-minima problem and the ammonia molecule.” Physical Review 41(3): 313-321. (1932)

[7] L. Esaki, New phenomenon in narrow germanium p-n junctions, Phys. Rev. 109, 603-604 (1958); L. Esaki, Long journey into tunneling, Proc. IEEE 62, 825 (1974).

[8] I. Giaever, Energy Gap in Superconductors Measured by Electron Tunneling, Phys. Rev. Letters, 5, 147-148 (1960); I. Giaever, Electron tunneling and superconductivity, Science, 183, 1253 (1974)

[9] B. D. Josephson, Phys. Lett. 1, 251 (1962); B.D. Josephson, The discovery of tunneling supercurrent, Science, 184, 527 (1974).

[10] P. W. Anderson, J. M. Rowell, Phys. Rev. Lett. 10, 230 (1963); Philip W. Anderson, How Josephson discovered his effect, Physics Today 23, 11, 23 (1970)

[11] Eugen Merzbacher, The Early History of Quantum Tunneling, Physics Today 55, 8, 44 (2002)

[12] Razavy, Mohsen. Quantum Theory Of Tunneling, World Scientific Publishing Company, 2003.
