A Short History of Quantum Entanglement

Despite the many apparent paradoxes posed in physics—the twin and ladder paradoxes of relativity theory, Olber’s paradox of the bright night sky, Loschmitt’s paradox of irreversible statistical fluctuations—these are resolved by a deeper look at the underlying assumptions—the twin paradox is resolved by considering shifts in reference frames, the ladder paradox is resolved by the loss of simultaneity, Olber’s paradox is resolved by a finite age to the universe, and Loschmitt’s paradox is resolved by fluctuation theorems.  In each case, no physical principle is violated, and each paradox is fully explained.

However, there is at least one “true” paradox in physics that defies consistent explanation—quantum entanglement.  Quantum entanglement was first described by Einstein with colleagues Podolsky and Rosen in the famous EPR paper of 1935 as an argument against the completeness of quantum mechanics, and it was given its name by Schrödinger the same year in the paper where he introduced his “cat” as a burlesque consequence of entanglement. 

Here is a short history of quantum entanglement [1], from its beginnings in 1935 to the recent 2022 Nobel prize in Physics awarded to John Clauser, Alain Aspect and Anton Zeilinger.

The EPR Papers of 1935

Einstein can be considered as the father of quantum mechanics, even over Planck, because of his 1905 derivation of the existence of the photon as a discrete carrier of a quantum of energy (see Einstein versus Planck).  Even so, as Heisenberg and Bohr advanced quantum mechanics in the mid 1920’s, emphasizing the underlying non-deterministic outcomes of measurements, and in particular the notion of instantaneous wavefunction collapse, they pushed the theory in directions that Einstein found increasingly disturbing and unacceptable. 

At the invitation-only Solvay Congresses of 1927 and 1930, where all the top physicists met to debate the latest advances, Einstein and Bohr began a running debate that was epic in the history of physics as the two top minds went head-to-head as the onlookers looked on in awe.  Ultimately, Einstein was on the losing end.  Although he was convinced that something was missing in quantum theory, he could not counter all of Bohr’s rejoinders, even as Einstein’s assaults became ever more sophisticated, and he left the field of battle beaten but not convinced.  Several years later he launched his last and ultimate salvo.

Fig. 1 Niels Bohr and Albert Einstein

At the Institute for Advanced Study in Princeton, New Jersey, in the 1930’s Einstein was working with Nathan Rosen and Boris Podolsky when he envisioned a fundamental paradox in quantum theory that occurred when two widely-separated quantum particles were required to share specific physical properties because of simple conservation theorems like energy and momentum.  Even Bohr and Heisenberg could not deny the principle of conservation of energy and momentum, and Einstein devised a two-particle system for which these conservation principles led to an apparent violation of Heisenberg’s own uncertainty principle.  He left the details to his colleagues, with Podolsky writing up the main arguments.  They published the paper in the Physical Review in March of 1935 with the title “Can Quantum-Mechanical Description of Physical Reality be Considered Complete” [2].  Because of the three names on the paper (Einstein, Podolsky, Rosen), it became known as the EPR paper, and the paradox they presented became known as the EPR paradox.

When Bohr read the paper, he was initially stumped and aghast.  He felt that EPR had shaken the very foundations of the quantum theory that he and his institute had fought so hard to establish.  He also suspected that EPR had made a mistake in their arguments, and he halted all work at his institute in Copenhagen until they could construct a definitive answer.  A few months later, Bohr published a paper in the Physical Review in July of 1935, using the identical title that EPR had used, in which he refuted the EPR paradox [3].  There is not a single equation or figure in the paper, but he used his “awful incantation terminology” to maximum effect, showing that one of the EPR assumptions on the assessment of uncertainties to position and momentum was in error, and he was right.

Einstein was disgusted.  He had hoped that this ultimate argument against the completeness of quantum mechanics would stand the test of time, but Bohr had shot it down within mere months.  Einstein was particularly disappointed with Podolsky, because Podolsky had tried too hard to make the argument specific to position and momentum, leaving a loophole for Bohr to wiggle through, where Einstein had wanted the argument to rest on deeper and more general principles. 

Despite Bohr’s victory, Einstein had been correct in his initial formulation of the EPR paradox that showed quantum mechanics did not jibe with common notions of reality.  He and Schrödinger exchanged letters commiserating with each other and encouraging each other in their counter beliefs against Bohr and Heisenberg.  In November of 1935, Schrödinger published a broad, mostly philosophical, paper in Naturwissenschaften [4] in which he amplified the EPR paradox with the use of an absurd—what he called burlesque—consequence of wavefunction collapse that became known as Schrödinger’s Cat.  He also gave the central property of the EPR paradox its name: entanglement.

Ironically, both Einstein’s entanglement paradox and Schrödinger’s Cat, which were formulated originally to be arguments against the validity of quantum theory, have become established quantum tools.  Today, entangled particles are the core workhorses of quantum information systems, and physicists are building larger and larger versions of Schrödinger’s Cat that may eventually merge with the physics of the macroscopic world.

Bohm and Ahronov Tackle EPR

The physicist David Bohm was a rare political exile from the United States.  He was born in the heart of Pennsylvania in the town of Wilkes-Barre, attended Penn State and then the University of California at Berkeley, where he joined Robert Oppenheimer’s research group.  While there, he became deeply involved in the fight for unions and socialism, activities for which he was called before McCarthy’s Committee on Un-American Activities.  He invoked his right to the fifth amendment for which he was arrested.  Although he was later acquitted, Princeton University fired him from his faculty position, and fearing another arrest, he fled to Brazil where his US passport was confiscated by American authorities.  He had become a physicist without a country. 

Fig. 2 David Bohm

Despite his personal trials, Bohm remained scientifically productive.  He published his influential textbook on quantum mechanics in the midst of his Senate hearings, and after a particularly stimulating discussion with Einstein shortly before he fled the US, he developed and published an alternative version of quantum theory in 1952 that was fully deterministic—removing Einstein’s “God playing dice”—by creating a hidden-variable theory [5].

Hidden-variable theories of quantum mechanics seek to remove the randomness of quantum measurement by assuming that some deeper element of quantum phenomena—a hidden variable—explains each outcome.  But it is also assumed that these hidden variables are not directly accessible to experiment.  In this sense, the quantum theory of Bohr and Heisenberg was “correct” but not “complete”, because there were things that the theory could not predict or explain.

Bohm’s hidden variable theory, based on a quantum potential, was able to reproduce all the known results of standard quantum theory without invoking the random experimental outcomes that Einstein abhorred.  However, it still contained one crucial element that could not sweep away the EPR paradox—it was nonlocal.

Nonlocality lies at the heart of quantum theory.  In its simplest form, the nonlocal nature of quantum phenomenon says that quantum states span spacetime with space-like separations, meaning that parts of the wavefunction are non-causally connected to other parts of the wavefunction.  Because Einstein was fundamentally committed to causality, the nonlocality of quantum theory was what he found most objectionable, and Bohm’s elegant hidden-variable theory, that removed Einstein’s dreaded randomness, could not remove that last objection of non-causality.

After working in Brazil for several years, Bohm moved to the Technion University in Israel where he began a fruitful collaboration with Yakir Ahronov.  In addition to proposing the Ahronov-Bohm effect, in 1957 they reformulated Podolsky’s version of the EPR paradox that relied on continuous values of position and momentum and replaced it with a much simpler model based on the Stern-Gerlach effect on spins and further to the case of positronium decay into two photons with correlated polarizations.  Bohm and Ahronov reassessed experimental results of positronium decay that had been made by Madame Wu in 1950 at Columbia University and found it in full agreement with standard quantum theory.

John Bell’s Inequalities

John Stuart Bell had an unusual start for a physicist.  His family was too poor to give him an education appropriate to his skills, so he enrolled in vocational school where he took practical classes that included brick laying.  Working later as a technician in a university lab, he caught the attention of his professors who sponsored him to attend the university.  With a degree in physics, he began working at CERN as an accelerator designer when he again caught the attention of his supervisors who sponsored him to attend graduate school.  He graduated with a PhD and returned to CERN as a card-carrying physicist with all the rights and privileges that entailed.

Fig. 3 John Bell

During his university days, he had been fascinated by the EPR paradox, and he continued thinking about the fundamentals of quantum theory.  On a sabbatical to the Stanford accelerator in 1960 he began putting mathematics to the EPR paradox to see whether any local hidden variable theory could be compatible with quantum mechanics.  His analysis was fully general, so that it could rule out as-yet-unthought-of hidden-variable theories.  The result of this work was a set of inequalities that must be obeyed by any local hidden-variable theory.  Then he made a simple check using the known results of quantum measurement and showed that his inequalities are violated by quantum systems.  This ruled out the possibility of any local hidden variable theory (but not Bohm’s nonlocal hidden-variable theory).  Bell published his analysis in 1964 [6] in an obscure journal that almost no one read…except for a curious graduate student at Columbia University who began digging into the fundamental underpinnings of quantum theory against his supervisor’s advice.

Fig. 4 Polarization measurements on entangled photons violate Bell’s inequality.

John Clauser’s Tenacious Pursuit

As a graduate student in astrophysics at Columbia University, John Clauser was supposed to be doing astrophysics.  Instead, he spent his time musing over the fundamentals of quantum theory.  In 1967 Clauser stumbled across Bell’s paper while he was in the library.  The paper caught his imagination, but he also recognized that the inequalities were not experimentally testable, because they required measurements that depended directly on hidden variables, which are not accessible.  He began thinking of ways to construct similar inequalities that could be put to an experimental test, and he wrote about his ideas to Bell, who responded with encouragement.  Clauser wrote up his ideas in an abstract for an upcoming meeting of the American Physical Society, where one of the abstract reviewers was Abner Shimony of Boston University.  Clauser was surprised weeks later when he received a telephone call from Shimony.  Shimony and his graduate student Micheal Horne had been thinking along similar lines, and Shimony proposed to Clauser that they join forces.  They met in Boston where they were met Richard Holt, a graudate student at Harvard who was working on experimental tests of quantum mechanics.  Collectively, they devised a new type of Bell inequality that could be put to experimental test [7].  The result has become known as the CHSH Bell inequality (after Clauser, Horne, Shimony and Holt).

Fig. 5 John Clauser

When Clauser took a post-doc position in Berkeley, he began searching for a way to do the experiments to test the CHSH inequality, even though Holt had a head start at Harvard.  Clauser enlisted the help of Charles Townes, who convinced one of the Berkeley faculty to loan Clauser his graduate student, Stuart Freedman, to help.  Clauser and Freedman performed the experiments, using a two-photon optical decay of calcium ions and found a violation of the CHSH inequality by 5 standard deviations, publishing their result in 1972 [8]. 

Fig. 6 CHSH inequality violated by entangled photons.

Alain Aspect’s Non-locality

Just as Clauser’s life was changed when he stumbled on Bell’s obscure paper in 1967, the paper had the same effect on the life of French physicist Alain Aspect who stumbled on it in 1975.  Like Clauser, he also sought out Bell for his opinion, meeting with him in Geneva, and Aspect similarly received Bell’s encouragement, this time with the hope to build upon Clauser’s work. 

Fig. 7 Alain Aspect

In some respects, the conceptual breakthrough achieved by Clauser had been the CHSH inequality that could be tested experimentally.  The subsequent Clauser Freedman experiments were not a conclusion, but were just the beginning, opening the door to deeper tests.  For instance, in the Clauser-Freedman experiments, the polarizers were static, and the detectors were not widely separated, which allowed the measurements to be time-like separated in spacetime.  Therefore, the fundamental non-local nature of quantum physics had not been tested.

Aspect began a thorough and systematic program, that would take him nearly a decade to complete, to test the CHSH inequality under conditions of non-locality.  He began with a much brighter source of photons produced using laser excitation of the calcium ions.  This allowed him to perform the experiment in 100’s of seconds instead of the hundreds of hours by Clauser.  With such a high data rate, Aspect was able to verify violation of the Bell inequality to 10 standard deviations, published in 1981 [9].

However, the real goal was to change the orientations of the polarizers while the photons were in flight to widely separated detectors [10].  This experiment would allow the detection to be space-like separated in spacetime.  The experiments were performed using fast-switching acoustic-optic modulators, and the Bell inequality was violated to 5 standard deviations [11].  This was the most stringent test yet performed and the first to fully demonstrate the non-local nature of quantum physics.

Anton Zeilinger: Master of Entanglement

If there is one physicist today whose work encompasses the broadest range of entangled phenomena, it would be the Austrian physicist, Anton Zeilinger.  He began his career in neutron interferometery, but when he was bitten by the entanglement bug in 1976, he switched to quantum photonics because of the superior control that can be exercised using optics over sources and receivers and all the optical manipulations in between.

Fig. 8 Anton Zeilinger

Working with Daniel Greenberger and Micheal Horne, they took the essential next step past the Bohm two-particle entanglement to consider a 3-particle entangled state that had surprising properties.  While the violation of locality by the two-particle entanglement was observed through the statistical properties of many measurements, the new 3-particle entanglement could show violations on single measurements, further strengthening the arguments for quantum non-locality.  This new state is called the GHZ state (after Greenberger, Horne and Zeilinger) [12].

As the Zeilinger group in Vienna was working towards experimental demonstrations of the GHZ state, Charles Bennett of IBM proposed the possibility for quantum teleportation, using entanglement as a core quantum information resource [13].   Zeilinger realized that his experimental set-up could perform an experimental demonstration of the effect, and in a rapid re-tooling of the experimental apparatus [14], the Zeilinger group was the first to demonstrate quantum teleportation that satisfied the conditions of the Bennett teleportation proposal [15].  An Italian-UK collaboration also made an early demonstration of a related form of teleportation in a paper that was submitted first, but published after Zeilinger’s, due to delays in review [16].  But teleportation was just one of a widening array of quantum applications for entanglement that was pursued by the Zeilinger group over the succeeding 30 years [17], including entanglement swapping, quantum repeaters, and entanglement-based quantum cryptography. Perhaps most striking, he has worked on projects at astronomical observatories that entangle photons coming from cosmic sources.


Timeline

1935 – Einstein EPR

1935 – Bohr EPR

1935 – Schrödinger: Entanglement and Cat

1950 – Madam Wu positron decay

1952 – David Bohm and Non-local hidden variables

1957 – Bohm and Ahronov version of EPR

1963 – Bell’s inequalities

1967 – Clauser reads Bell’s paper

1967 – Commins experiment with Calcium

1969 – CHSH inequality: measurable with detection inefficiencies

1972 – Clauser and Freedman experiment

1975 – Aspect reads Bell’s paper

1976 – Zeilinger reads Bell’s paper

1981 – Aspect two-photon generation source

1982 – Aspect time variable analyzers

1988 – Parametric down-conversion of EPR pairs (Shih and Alley, Ou and Mandel)

1989 – GHZ state proposed

1993 – Bennett quantum teleportation proposal

1995 – High-intensity down-conversion source of EPR pairs (Kwiat and Zeilinger)

1997 – Zeilinger quantum teleportation experiment

1999 – Observation of the GHZ state


Bibliography

[1] See the full details in: David D. Nolte, Interference: A History of Interferometry and the Scientists Who Tamed Light (Oxford University Press, 2023)

[2] A. Einstein, B. Podolsky, N. Rosen, Can quantum-mechanical description of physical reality be considered complete? Physical Review 47, 0777-0780 (1935).

[3] N. Bohr, Can quantum-mechanical description of physical reality be considered complete? Physical Review 48, 696-702 (1935).

[4] E. Schrödinger, Die gegenwärtige Situation in der Quantenmechanik. Die Naturwissenschaften 23, 807-12; 823-28; 844-49 (1935).

[5] D. Bohm, A suggested interpretation of the quantum theory in terms of hidden variables .1. Physical Review 85, 166-179 (1952); D. Bohm, A suggested interpretation of the quantum theory in terms of hidden variables .2. Physical Review 85, 180-193 (1952).

[6] J. Bell, On the Einstein-Podolsky-Rosen paradox. Physics 1, 195 (1964).

[7] 1. J. F. Clauser, M. A. Horne, A. Shimony, R. A. Holt, Proposed experiment to test local hidden-variable theories. Physical Review Letters 23, 880-& (1969).

[8] S. J. Freedman, J. F. Clauser, Experimental test of local hidden-variable theories. Physical Review Letters 28, 938-& (1972).

[9] A. Aspect, P. Grangier, G. Roger, EXPERIMENTAL TESTS OF REALISTIC LOCAL THEORIES VIA BELLS THEOREM. Physical Review Letters 47, 460-463 (1981).

[10]  Alain Aspect, Bell’s Theorem: The Naïve Veiw of an Experimentalit. (2004), hal- 00001079

[11] A. Aspect, J. Dalibard, G. Roger, EXPERIMENTAL TEST OF BELL INEQUALITIES USING TIME-VARYING ANALYZERS. Physical Review Letters 49, 1804-1807 (1982).

[12] D. M. Greenberger, M. A. Horne, A. Zeilinger, in 1988 Fall Workshop on Bells Theorem, Quantum Theory and Conceptions of the Universe. (George Mason Univ, Fairfax, Va, 1988), vol. 37, pp. 69-72.

[13] C. H. Bennett, G. Brassard, C. Crepeau, R. Jozsa, A. Peres, W. K. Wootters, Teleporting an unknown quantum state via dual classical and einstein-podolsky-rosen channels. Physical Review Letters 70, 1895-1899 (1993).

[14]  J. Gea-Banacloche, Optical realizations of quantum teleportation, in Progress in Optics, Vol 46, E. Wolf, Ed. (2004), vol. 46, pp. 311-353.

[15] D. Bouwmeester, J.-W. Pan, K. Mattle, M. Eibl, H. Weinfurter, A. Zeilinger, Experimental quantum teleportation. Nature 390, 575-579 (1997).

[16] D. Boschi, S. Branca, F. De Martini, L. Hardy, S. Popescu, Experimental realization of teleporting an unknown pure quantum state via dual classical and Einstein-podolsky-Rosen Channels. Phys. Rev. Lett. 80, 1121-1125 (1998).

[17]  A. Zeilinger, Light for the quantum. Entangled photons and their applications: a very personal perspective. Physica Scripta 92, 1-33 (2017).

A Short History of Quantum Tunneling

Quantum physics is often called “weird” because it does things that are not allowed in classical physics and hence is viewed as non-intuitive or strange.  Perhaps the two “weirdest” aspects of quantum physics are quantum entanglement and quantum tunneling.  Entanglement allows a particle state to extend across wide expanses of space, while tunneling allows a particle to have negative kinetic energy.  Neither of these effects has a classical analog.

Quantum entanglement arose out of the Bohr-Einstein debates at the Solvay Conferences in the 1920’s and 30’s, and it was the subject of a recent Nobel Prize in Physics (2022).  The quantum tunneling story is just as old, but it was recognized much earlier by the Nobel Prize in 1972 when it was awarded to Brian Josephson, Ivar Giaever and Leo Esaki—each of whom was a graduate student when they discovered their respective effects and two of whom got their big idea while attending a lecture class. 

Always go to class, you never know what you might miss, and the payoff is sometimes BIG

Ivar Giaever

Of the two effects, tunneling is the more common and the more useful in modern electronic devices (although entanglement is coming up fast with the advent of quantum information science). Here is a short history of quantum tunneling, told through a series of publications that advanced theory and experiments.

Double-Well Potential: Friedrich Hund (1927)

The first analysis of quantum tunneling was performed by Friedrich Hund (1896 – 1997), a German physicist who studied early in his career with Born in Göttingen and Bohr in Copenhagen.  He published a series of papers in 1927 in Zeitschrift für Physik [1] that solved the newly-proposed Schrödinger equation for the case of the double well potential.  He was particularly interested in the formation of symmetric and anti-symmetric states of the double well that contributed to the binding energy of atoms in molecules.  He derived the first tunneling-frequency expression for a quantum superposition of the symmetric and anti-symmetric states

where f is the coherent oscillation frequency, V is the height of the potential and hν is the quantum energy of the isolated states when the atoms are far apart.  The exponential dependence on the potential height V made the tunnel effect extremely sensitive to the details of the tunnel barrier.

Fig. 1 Friedrich Hund

Electron Emission: Lothar Nordheim and Ralph Fowler (1927 – 1928)

The first to consider quantum tunneling from a bound state to a continuum state was Lothar Nordheim (1899 – 1985), a German physicist who studied under David Hilbert and Max Born at Göttingen and worked with John von Neumann and Eugene Wigner and later with Hans Bethe. In 1927 he solved the problem of a particle in a well that is separated from continuum states by a thin finite barrier [2]. Using the new Schrödinger theory, he found transmission coefficients that were finite valued, caused by quantum tunneling of the particle through the barrier. Nordheim’s use of square potential wells and barriers are now, literally, textbook examples that every student of quantum mechanics solves. (For a quantum simulation of wavefunction tunneling through a square barrier see the companion Quantum Tunneling YouTube video.) Nordheim later escaped the growing nationalism and anti-semitism in Germany in the mid 1930’s to become a visiting professor of physics at Purdue University in the United States, moving to a permanent position at Duke University.

Fig. 2 Nordheim square tunnel barrier and Fowler-Nordheim triangular tunnel barrier for electron tunneling from bound states into the continuum.

One of the giants of mathematical physics in the UK from the 1920s through the 1930’s was Ralph Fowler (1889 – 1944). Three of his doctoral students went on to win Nobel Prizes (Chandrasekhar, Dirac and Mott) and others came close (Bhabha, Hartree, Lennard-Jones). In 1928 Fowler worked with Nordheim on a more realistic version of Nordheim’s surface electron tunneling that could explain thermionic emission of electrons from metals under strong electric fields. The electric field modified Nordheim’s square potential barrier into a triangular barrier (which they treated using WKB theory) to obtain the tunneling rate [3]. This type of tunnel effect is now known as Fowler-Nordheim tunneling.

Nuclear Alpha Decay: George Gamow (1928)

George Gamov (1904 – 1968) is one of the icons of mid-twentieth-century physics. He was a substantial physicist who also had a solid sense of humor that allowed him to achieve a level of cultural popularity shared by a few of the larger-than-life physicists of his time, like Richard Feynman and Stephen Hawking. His popular books included One Two Three … Infinity as well as a favorite series of books under the rubric of Mr. Tompkins (Mr. Tompkins in Wonderland and Mr. Tompkins Explores the Atom, among others). He also wrote a history of the early years of quantum theory (Thirty Years that Shook Physics).

In 1928 Gamow was in Göttingen (the Mecca of early quantum theory) with Max Born when he realized that the radioactive decay of Uranium by alpha decay might be explained by quantum tunneling. It was known that nucleons were bound together by some unknown force in what would be an effective binding potential, but that charged alpha particles would also feel a strong electrostatic repulsive potential from a nucleus. Gamow combined these two potentials to create a potential landscape that was qualitatively similar to Nordheim’s original system of 1927, but with a potential barrier that was neither square nor triangular (like the Fowler-Nordheim situation).

Fig. 3 George Gamow

Gamow was able to make an accurate approximation that allowed him to express the decay rate in terms of an exponential term

where Zα is the atomic charge of the alpha particle, Z is the nuclear charge of the Uranium decay product and v is the speed of the alpha particle detected in external measurements [4].

The very next day after Gamow submitted his paper, Ronald Gurney and Edward Condon of Princeton University submitted a paper [5] that solved the same problem using virtually the same approach … except missing Gamow’s surprisingly concise analytic expression for the decay rate.

Molecular Tunneling: George Uhlenbeck (1932)

Because tunneling rates depend inversely on the mass of the particle tunneling through the barrier, electrons are more likely to tunnel through potential barriers than atoms. However, hydrogen is a particularly small atom and is therefore the most amenable to experiencing tunneling.

The first example of atom tunneling is associated with hydrogen in the ammonia molecule NH3. The molecule has a pyramidal structure with the Nitrogen hovering above the plane defined by the three hydrogens. However, an equivalent configuration has the Nitrogen hanging below the hydrogen plane. The energies of these two configurations are the same, but the Nitrogen must tunnel from one side of the hydrogen plane to the other through a barrier. The presence of light-weight hydrogen that can “move out of the way” for the nitrogen makes this barrier very small (infrared energies). When the ammonia is excited into its first vibrational excited state, the molecular wavefunction tunnels through the barrier, splitting the excited level by an energy associated with a wavelength of 1.2 cm which is in the microwave. This tunnel splitting was the first microwave transition observed in spectroscopy and is used in ammonia masers.

Fig. 4 Nitrogen inversion in the ammonia molecule is achieved by excitation to a vibrational excited state followed by tunneling through the barrier, proposed by George Uhlenbeck in 1932.

One of the earliest papers [6] written on the tunneling of nitrogen in ammonia was published by George Uhlenbeck in 1932. George Uhlenbeck (1900 – 1988) was a Dutch-American theoretical physicist. He played a critical role, with Samuel Goudsmit, in establishing the spin of the electron in 1925. Both Uhlenbeck and Goudsmit were close associates of Paul Ehrenfest at Leiden in the Netherlands. Uhlenbeck is also famous for the Ornstein-Uhlenbeck process which is a generalization of Einstein’s theory of Brownian motion that can treat active transport such as intracellular transport in living cells.

Solid-State Electron Tunneling: Leo Esaki (1957)

Although the tunneling of electrons in molecular bonds and in the field emission from metals had been established early in the century, direct use of electron tunneling in solid state devices had remained elusive until Leo Esaki (1925 – ) observed electron tunneling in heavily doped Germanium and Silicon semiconductors. Esaki joined an early precursor of Sony electronics in 1956 and was supported to obtain a PhD from the University of Tokyo. In 1957 he was working with heavily-doped p-n junction diodes and discovered a phenomenon known as negative differential resistance where the current through an electronic device actually decreases as the voltage increases.

Because the junction thickness was only about 100 atoms, or about 10 nanometers, he suspected and then proved that the electronic current was tunneling quantum mechanically through the junction. The negative differential resistance was caused by a decrease in available states to the tunneling current as the voltage increased.

Fig. 5 Esaki tunnel diode with heavily doped p- and n-type semiconductors. At small voltages, electrons and holes tunnel through the semiconductor bandgap across a junction that is only about 10 nm wide. Ht higher voltage, the electrons and hole have no accessible states to tunnel into, producing negative differential resistance where the current decreases with increasing voltage.

Esaki tunnel diodes were the fastest semiconductor devices of the time, and the negative differential resistance of the diode in an external circuit produced high-frequency oscillations. They were used in high-frequency communication systems. They were also radiation hard and hence ideal for the early communications satellites. Esaki was awarded the 1973 Nobel Prize in Physics jointly with Ivar Giaever and Brian Josephson.

Superconducting Tunneling: Ivar Giaever (1960)

Ivar Giaever (1929 – ) is a Norwegian-American physicist who had just joined the GE research lab in Schenectady New York in 1958 when he read about Esaki’s tunneling experiments. He was enrolled at that time as a graduate student in physics at Rensselaer Polytechnic Institute (RPI) where he was taking a course in solid state physics and learning about superconductivity. Superconductivity is carried by pairs of electrons known as Cooper pairs that spontaneously bind together with a binding energy that produced an “energy gap” in the electron energies of the metal, but no one had ever found a way to directly measure it. The Esaki experiment made him immediately think of the equivalent experiment in which Cooper pairs might tunnel between two superconductors (through a thin oxide layer) and yield a measurement of the energy gap. The idea actually came to him during the class lecture.

The experiments used a junction between aluminum and lead (Al—Al2O3—Pb). At first, the temperature of the system was adjusted so that Al remained a normal metal and Pb was superconducting, and Giaever observed a tunnel current with a threshold related to the gap in Pb. Then the temperature was lowered so that both Al and Pb were superconducting, and a peak in the tunnel current appeared at the voltage associated with the difference in the energy gaps (predicted by Harrison and Bardeen).

Fig. 6 Diagram from Giaever “The Discovery of Superconducting Tunneling” at https://conferences.illinois.edu/bcs50/pdf/giaever.pdf

The Josephson Effect: Brian Josephson (1962)

In Giaever’s experiments, the external circuits had been designed to pick up “ordinary” tunnel currents in which individual electrons tunneled through the oxide rather than the Cooper pairs themselves. However, in 1962, Brian Josephson (1940 – ), a physics graduate student at Cambridge, was sitting in a lecture (just like Giaever) on solid state physics given by Phil Anderson (who was on sabbatical there from Bell Labs). During lecture he had the idea to calculate whether it was possible for the Cooper pairs themselves to tunnel through the oxide barrier. Building on theoretical work by Leo Falicov who was at the University of Chicago and later at Berkeley (years later I was lucky to have Leo as my PhD thesis advisor at Berkeley), Josephson found a surprising result that even when the voltage was zero, there would be a supercurrent that tunneled through the junction (now known as the DC Josephson Effect). Furthermore, once a voltage was applied, the supercurrent would oscillate (now known as the AC Josephson Effect). These were strange and non-intuitive results, so he showed Anderson his calculations to see what he thought. By this time Anderson had already been extremely impressed by Josephson (who would often come to the board after one of Anderson’s lectures to show where he had made a mistake). Anderson checked over the theory and agreed with Josephson’s conclusions. Bolstered by this reception, Josephson submitted the theoretical prediction for publication [9].

As soon as Anderson returned to Bell Labs after his sabbatical, he connected with John Rowell who was making tunnel junction experiments, and they revised the external circuit configuration to be most sensitive to the tunneling supercurrent, which they observed in short time and submitted a paper for publication. Since then, the Josephson Effect has become a standard element of ultra-sensitive magnetometers, measurement standards for charge and voltage, far-infrared detectors, and have been used to construct rudimentary qubits and quantum computers.


References:

[1] F. Hund, Z. Phys. 40, 742 (1927). F. Hund, Z. Phys. 43, 805 (1927).

[2] L. Nordheim, Z. Phys. 46, 833 (1928).

[3] R. H. Fowler, L. Nordheim, Proc. R. Soc. London, Ser. A 119, 173 (1928).

[4] G. Gamow, Z. Phys. 51, 204 (1928).

[5] R. W. Gurney, E. U. Condon, Nature 122, 439 (1928). R. W. Gurney, E. U. Condon, Phys. Rev. 33, 127 (1929).

[6] Dennison, D. M. and G. E. Uhlenbeck. “The two-minima problem and the ammonia molecule.” Physical Review 41(3): 313-321. (1932)

[7] L. Esaki, New Phenomenon in Narrow Germanium Para-Normal-Junctions, Phys. Rev., 109, 603-604 (1958); L. Esaki, (1974). Long journey into tunneling, disintegration, Proc. of the Nature 123, IEEE, 62, 825.

[8] I. Giaever, Energy Gap in Superconductors Measured by Electron Tunneling, Phys. Rev. Letters, 5, 147-148 (1960); I. Giaever, Electron tunneling and superconductivity, Science, 183, 1253 (1974)

[9] B. D. Josephson, Phys. Lett. 1, 251 (1962); B.D. Josephson, The discovery of tunneling supercurrent, Science, 184, 527 (1974).

[10] P. W. Anderson, J. M. Rowell, Phys. Rev. Lett. 10, 230 (1963); Philip W. Anderson, How Josephson discovered his effect, Physics Today 23, 11, 23 (1970)

[11] Eugen Merzbacher, The Early History of Quantum Tunneling, Physics Today 55, 8, 44 (2002)

[12] Razavy, Mohsen. Quantum Theory Of Tunneling, World Scientific Publishing Company, 2003.

Climate Change Physics 101

When our son was ten years old, he came home from a town fair in Battleground, Indiana, with an unwanted pet—a goldfish in a plastic bag.  The family rushed out to buy a fish bowl and food and plopped the golden-red animal into it.  In three days, it was dead!

It turns out that you can’t just put a gold fish in a fish bowl.  When it metabolizes its food and expels its waste, it builds up toxic levels of ammonia unless you add filters or plants or treat the water with chemicals.  In the end, the goldfish died because it was asphyxiated by its own pee.

It’s a basic rule—don’t pee in your own fish bowl.

The same can be said for humans living on the surface of our planet.  Polluting the atmosphere with our wastes cannot be a good idea.  In the end it will kill us.  The atmosphere may look vast—the fish bowl was a big one—but it is shocking how thin it is.

Turn on your Apple TV, click on the screen saver, and you are skimming over our planet on the dark side of the Earth. Then you see a thin blue line extending over the limb of the dark disc.  Hold!  That thin blue line!  That is our atmosphere! Is it really so thin?

When you look upwards on a clear sunny day, the atmosphere seems like it goes on forever.  It doesn’t.  It is a thin veneer on the surface of the Earth barely one percent of the Earth’s radius.  The Earth’s atmosphere is frighteningly thin. 

Fig. 1  A thin veneer of atmosphere paints the surface of the Earth.  The radius of the Earth is 6360 km, and the thickness of the atmosphere is 100 km, which is a bit above 1 percent of the radius.

Consider Mars.  It’s half the size of Earth, yet it cannot hold on to an atmosphere even 1/100th the thickness of ours.  When Mars first formed, it had an atmosphere not unlike our own, but through the eons its atmosphere has wafted away irretrievably into space.

An atmosphere is a precious fragile thing for a planet.  It gives life and it gives protection.  It separates us from the deathly cold of space, holding heat like a blanket.  That heat has served us well over the eons, allowing water to stay liquid and allowing life to arise on Earth.  But too much of a good thing is not a good thing.

Common Sense

If the fluid you are bathed in gives you life, then don’t mess with it.  Don’t run your car in the garage while you are working in it.  Don’t use a charcoal stove in an enclosed space.  Don’t dump carbon dioxide into the atmosphere because it also is an enclosed space.

At the end of winter, as the warm spring days get warmer, you take the winter blanket off your bed because blankets hold in heat.  The thicker the blanket, the more heat it holds in.  Common sense tells you to reduce the thickness of the blanket if you don’t want to get too warm.  Carbon dioxide in the atmosphere acts like a blanket.  If we don’t want the Earth to get too warm, then we need to limit the thickness of the blanket.

Without getting into the details of any climate change model, common sense already tells us what we should do.  Keep the atmosphere clean and stable (Don’t’ pee in our fishbowl) and limit the amount of carbon dioxide we put into it (Don’t let the blanket get too thick).

Some Atmospheric Facts

Here are some facts about the atmosphere, about the effect humans have on it, and about the climate:

Fact 1.  Humans have increased the amount of carbon dioxide in the atmosphere by 45% since 1850 (the beginning of the industrial age) and by 30% since just 1960.

Fact 2.  Carbon dioxide in the atmosphere prevents some of the heat absorbed from the Sun to re-radiate out to space.  More carbon dioxide stores more heat.

Fact 3.  Heat added to the Earth’s atmosphere increases its temperature.  This is a law of physics.

Fact 4.  The Earth’s average temperature has risen by 1.2 degrees Celsius since 1850 and 0.8 degrees of that has been just since 1960, so the effect is accelerating.

These facts are indisputable.  They hold true regardless of whether there is a Republican or a Democrat in the White House or in control of Congress.

There is another interesting observation which is not so direct, but may hold a harbinger for the distant future: The last time the Earth was 3 degrees Celsius warmer than it is today was during the Pliocene when the sea level was tens of meters higher.  If that sea level were to occur today, all of Delaware, most of Florida, half of Louisiana and the entire east coast of the US would be under water, including Houston, Miami, New Orleans, Philadelphia and New York City.  There are many reasons why this may not be an immediate worry. The distribution of water and ice now is different than in the Pliocene, and the effect of warming on the ice sheets and water levels could take centuries. Within this century, the amount of sea level rise is likely to be only about 1 meter, but accelerating after that.

Fig. 2  The east coast of the USA for a sea level 30 meters higher than today.  All of Delaware, half of Louisiana, and most of Florida are under water. Reasonable projections show only a 1 meter sea level rise by 2100, but accelerating after that. From https://www.youtube.com/watch?v=G2x1bonLJFA

Balance and Feedback

It is relatively easy to create a “rule-of-thumb” model for the Earth’s climate (see Ref. [2]).  This model is not accurate, but it qualitatively captures the basic effects of climate change and is a good way to get an intuitive feeling for how the Earth responds to changes, like changes in CO2 or to the amount of ice cover.  It can also provide semi-quantitative results, so that relative importance of various processes or perturbations can be understood.

The model is a simple energy balance statement:  In equilibrium, as much energy flows into the Earth system as out.

This statement is both simple and immediately understandable.  But then the work starts as we need to pin down how much energy is flowing in and how much is flowing out.  The energy flowing in comes from the sun, and the energy flowing out comes from thermal radiation into space. 

We also need to separate the Earth system into two components: the surface and the atmosphere.  These are two very different things that have two different average temperatures.  In addition, the atmosphere transmits sunlight to the surface, unless clouds reflect it back into space.  And the Earth radiates thermally into space, unless clouds or carbon dioxide layers reflect it back to the surface.

The energy fluxes are shown in the diagram in Fig. 3 for the 4-component system: Sun, Surface, Atmosphere, and Space. The light from the sun, mostly in the visible range of the spectrum, is partially absorbed by the atmosphere and partially transmitted and reflected. The transmitted portion is partially absorbed and partially reflected by the surface. The heat of the Earth is radiated at long wavelengths to the atmosphere, where it is partially transmitted out into space, but also partially reflected by the fraction a’a which is the blanket effect. In addition, the atmosphere itself radiates in equal parts to the surface and into outer space. On top of all of these radiative processes, there is also non-radiative convective interaction between the atmosphere and the surface.

Fig. 3 Energy flux model for a simple climate model with four interacting systems: the Sun, the Atmosphere, the Earth and Outer Space.

These processes are captured by two energy flux equations, one for the atmosphere and one for the surface, in Fig. 4. The individual contributions from Fig. 3 are annotated in each case. In equilibrium, each flux equals zero, which can then be used to solve for the two unknowns: Ts0 and Ta0: the surface and atmosphere temperatures.

Fig. 4 Energy-balance model of the Earth’s atmosphere for a simple climate approximation.

After the equilibrium temperatures Ts0 and Ta0 are found, they go into a set of dynamic response equations that governs how deviations in the temperatures relax back to the equilibrium values. These relaxation equations are

where ks and ka are the relaxation rates for the surface and atmosphere. These can be quite slow, in the range of a century. For illustration, we can take ks = 1/75 years and ka = 1/25 years. The equilibrium temperatures for the surface and atmosphere differ by about 50 degrees Celsius, with Ts = 289 K and Ta = 248 K. These are rough averages over the entire planet. The solar constant is S = 1.36×103 W/m2, the Stefan-Boltzman constant is σ = 5.67×10-8 W/m2/K4, and the convective interaction constant is c = 2.5 W m-2 K-1. Other parameters are given in Table I.

Short WavelengthLong Wavelength
as = 0.11
ts = 0.53t’a = 0.06
aa = 0.30a’a = 0.31

The relaxation equations are in the standard form of a mathematical “flow” (see Ref. [1]) and the solutions are plotted as a phase-space portrait in Fig. 5 as a video of the flow as the parameters in Table I shift because of the addition of greenhouse gases to the atmosphere. The video runs from the year 1850 (the dawn of the industrial age) through to the year 2060 about 40 years from now.

Fig. 5 Video of the phase space flow of the Surface-Atmosphere system for increasing year. The flow vectors and flow lines are the relaxation to equilibrium for temperature deviations. The change in equilibrium over the years is from increasing blanket effects in the atmosphere caused by greenhouse gases.

The scariest part of the video is how fast it accelerates. From 1850 to 1950 there is almost no change, but then it accelerates, faster and faster, reflecting the time-lag in temperature rise in response to increased greenhouse gases.

What if the Models are Wrong?  Russian Roulette

Now come the caveats.

This model is just for teaching purposes, not for any realistic modeling of climate change. It captures the basic physics, and it provides a semi-quantitative set of parameters that leads to roughly accurate current temperatures. But of course, the biggest elephant in the room is that it averages over the entire planet, which is a very crude approximation.

It does get the basic facts correct, though, showing an alarming trend in the rise in average temperatures with the temperature rising by 3 degrees by 2060.

The professionals in this business have computer models that are orders of magnitude more more accurate than this one. To understand the details of the real climate models, one needs to go to appropriate resources, like this NOAA link, this NASA link, this national climate assessment link, and this government portal link, among many others.

One of the frequent questions that is asked is: What if these models are wrong? What if global warming isn’t as bad as these models say? The answer is simple: If they are wrong, then the worst case is that life goes on. If they are right, then in the worst case life on this planet may end.

It’s like playing Russian Roulette. If just one of the cylinders on the revolver has a live bullet, do you want to pull the trigger?

Matlab Code

function flowatmos.m

mov_flag = 1;
if mov_flag == 1
    moviename = 'atmostmp';
    aviobj = VideoWriter(moviename,'MPEG-4');
    aviobj.FrameRate = 12;
    open(aviobj);
end

Solar = 1.36e3;		% Solar constant outside atmosphere [J/sec/m2]
sig = 5.67e-8;		% Stefan-Boltzman constant [W/m2/K4]

% 1st-order model of Earth + Atmosphere

ta = 0.53;			% (0.53)transmissivity of air
tpa0 = 0.06;			% (0.06)primes are for thermal radiation
as0 = 0.11;			% (0.11)
aa0 = 0.30;			% (0.30)
apa0 = 0.31;        % (0.31)
c = 2.5;               % W/m2/K

xrange = [287 293];
yrange = [247 251];

rngx = xrange(2) - xrange(1);
rngy = yrange(2) - yrange(1);

[X,Y] = meshgrid(xrange(1):0.05:xrange(2), yrange(1):0.05:yrange(2));

smallarrow = 1;
Delta0 = 0.0000009;
for tloop =1:80
    
    Delta = Delta0*(exp((tloop-1)/8)-1);   % This Delta is exponential, but should become more linear over time
    date = floor(1850 + (tloop-1)*(2060-1850)/79);
    
    [x,y] = f5(X,Y);
    
    clf
    hold off
    eps = 0.002;
    for xloop = 1:11
        xs = xrange(1) +(xloop-1)*rngx/10 + eps;
        for yloop = 1:11
            ys = yrange(1) +(yloop-1)*rngy/10 + eps;
            
            streamline(X,Y,x,y,xs,ys)
            
        end
    end
    hold on
    [XQ,YQ] = meshgrid(xrange(1):1:xrange(2),yrange(1):1:yrange(2));
    smallarrow = 1;
    [xq,yq] = f5(XQ,YQ);
    quiver(XQ,YQ,xq,yq,.2,'r','filled')
    hold off
    
    axis([xrange(1) xrange(2) yrange(1) yrange(2)])
    set(gcf,'Color','White')
    
    fun = @root2d;
    x0 = [0 -40];
    x = fsolve(fun,x0);
    
    Ts = x(1) + 288
    Ta = x(2) + 288
    
    hold on
    rectangle('Position',[Ts-0.05 Ta-0.05 0.1 0.1],'Curvature',[1 1],'FaceColor',[1 0 0],'EdgeColor','k','LineWidth',2)
    
    posTs(tloop) = Ts;
    posTa(tloop) = Ta;
    
    plot(posTs,posTa,'k','LineWidth',2);
    hold off
    
    text(287.5,250.5,strcat('Date = ',num2str(date)),'FontSize',24)
    box on
    xlabel('Surface Temperature (oC)','FontSize',24)
    ylabel('Atmosphere Temperature (oC)','FontSize',24)
    
    hh = figure(1);
    pause(0.01)
    if mov_flag == 1
        frame = getframe(hh);
        writeVideo(aviobj,frame);
    end
    
end     % end tloop

if mov_flag == 1
    close(aviobj);
end

    function F = root2d(xp)   % Energy fluxes 
        
        x = xp + 288;
        feedfac = 0.001;      % feedback parameter 
        
        apa = apa0 + feedfac*(x(2)-248) + Delta;  % Changes in the atmospheric blanket
        tpa = tpa0 - feedfac*(x(2)-248) - Delta;
        as = as0 - feedfac*(x(1)-289);
        
        F(1) = c*(x(1)-x(2)) + sig*(1-apa)*x(1).^4 - sig*x(2).^4 - ta*(1-as)*Solar/4;
        F(2) = c*(x(1)-x(2)) + sig*(1-tpa - apa)*x(1).^4 - 2*sig*x(2).^4 + (1-aa0-ta+as*ta)*Solar/4;
        
    end

    function [x,y] = f5(X,Y)   % Dynamical flow equations
        
        k1 = 1/75;   % 75 year time constant for the Earth
        k2 = 1/25;   % 25 year time constant for the Atmosphere
        
        fun = @root2d;
        x0 = [0 0];
        x = fsolve(fun,x0);   % Solve for the temperatures that set the energy fluxes to zero
        
        Ts0 = x(1) + 288;   % Surface temperature in Kelvin
        Ta0 = x(2) + 288;   % Atmosphere temperature in Kelvin
        
        xtmp = -k1*(X - Ts0);   % Dynamical equations
        ytmp = -k2*(Y - Ta0);
        
        nrm = sqrt(xtmp.^2 + ytmp.^2);
        
        if smallarrow == 1
            x = xtmp./nrm;
            y = ytmp./nrm;
        else
            x = xtmp;
            y = ytmp;
        end
        
    end     % end f5

end       % end flowatmos


This model has a lot of parameters that can be tweaked. In addition to the parameters in the Table, the time dependence on the blanket properties of the atmosphere are governed by Delta0 and by feedfac for feedback of temperature on the atmosphere, such as increasing cloud cover and decrease ice cover. As an exercise, and using only small changes in the given parameters, find the following cases: 1) An increasing surface temperature is moderated by a falling atmosphere temperature; 2) The Earth goes into thermal run-away and ends like Venus; 3) The Earth initially warms then plummets into an ice age.

References

[1] D. D. Nolte, Introduction to Modern Dynamics: Chaos, Networks, Space and Time, 2nd Ed. (Oxford University Press, 2019)

[2] E. Boeker and R. van Grondelle, Environmental Physics (Wiley, 1995)

[3] Recent lecture at the National Academy of Engineering by John Holdren.

Is There a Quantum Trajectory? The Phase-Space Perspective

At the dawn of quantum theory, Heisenberg, Schrödinger, Bohr and Pauli were embroiled in a dispute over whether trajectories of particles, defined by their positions over time, could exist. The argument against trajectories was based on an apparent paradox: To draw a “line” depicting a trajectory of a particle along a path implies that there is a momentum vector that carries the particle along that path. But a line is a one-dimensional curve through space, and since at any point in time the particle’s position is perfectly localized, then by Heisenberg’s uncertainty principle, it can have no definable momentum to carry it along.

My previous blog shows the way out of this paradox, by assembling wavepackets that are spread in both space and momentum, explicitly obeying the uncertainty principle. This is nothing new to anyone who has taken a quantum course. But the surprising thing is that in some potentials, like a harmonic potential, the wavepacket travels without broadening, just like classical particles on a trajectory. A dramatic demonstration of this can be seen in this YouTube video. But other potentials “break up” the wavepacket, especially potentials that display classical chaos. Because phase space is one of the best tools for studying classical chaos, especially Hamiltonian chaos, it can be enlisted to dig deeper into the question of the quantum trajectory—not just about the existence of a quantum trajectory, but why how quantum systems retain a shadow of their classical counterparts.

Phase Space

Phase space is the state space of Hamiltonian systems. Concepts of phase space were first developed by Boltzmann as he worked on the problem of statistical mechanics. Phase space was later codified by Gibbs for statistical mechanics and by Poincare for orbital mechanics, and it was finally given its name by Paul and Tatiana Ehrenfest (a husband-wife team) in correspondence with the German physicist Paul Hertz (See Chapter 6, “The Tangled Tale of Phase Space”, in Galileo Unbound by D. D. Nolte (Oxford, 2018)).

The stretched-out phase-space functions … are very similar to the stochastic layer that forms in separatrix chaos in classical systems.

The idea of phase space is very simple for classical systems: it is just a plot of the momentum of a particle as a function of its position. For a given initial condition, the trajectory of a particle through its natural configuration space (for instance our 3D world) is traced out as a path through phase space. Because there is one momentum variable per degree of freedom, then the dimensionality of phase space for a particle in 3D is 6D, which is difficult to visualize. But for a one-dimensional dynamical system, like a simple harmonic oscillator (SHO) oscillating in a line, the phase space is just two-dimensional, which is easy to see. The phase-space trajectories of an SHO are simply ellipses, and if the momentum axis is scaled appropriately, the trajectories are circles. The particle trajectory in phase space can be animated just like a trajectory through configuration space as the position and momentum change in time p(x(t)). For the SHO, the point follows the path of a circle going clockwise.

Fig. 1 Phase space of the simple harmonic oscillator. The “orbits” have constant energy.

A more interesting phase space is for the simple pendulum, shown in Fig. 2. There are two types of orbits: open and closed. The closed orbits near the origin are like those of a SHO. The open orbits are when the pendulum is spinning around. The dividing line between the open and closed orbits is called a separatrix. Where the separatrix intersects itself is a saddle point. This saddle point is the most important part of the phase space portrait: it is where chaos emerges when perturbations are added.

Fig. 2 Phase space for a simple pendulum. For small amplitudes the orbits are closed like those of a SHO. For large amplitudes the orbits become open as the pendulum spins about its axis. (Reproduced from Introduction to Modern Dynamics, 2nd Ed., pg. )

One route to classical chaos is through what is known as “separatrix chaos”. It is easy to see why saddle points (also known as hyperbolic points) are the source of chaos: as the system trajectory approaches the saddle, it has two options of which directions to go. Any additional degree of freedom in the system (like a harmonic drive) can make the system go one way on one approach, and the other way on another approach, mixing up the trajectories. An example of the stochastic layer of separatrix chaos is shown in Fig. 3 for a damped driven pendulum. The chaotic behavior that originates at the saddle point extends out along the entire separatrix.

Fig. 3 The stochastic layer of separatrix chaos for a damped driven pendulum. (Reproduced from Introduction to Modern Dynamics, 2nd Ed., pg. )

The main question about whether or not there is a quantum trajectory depends on how quantum packets behave as they approach a saddle point in phase space. Since packets are spread out, it would be reasonable to assume that parts of the packet will go one way, and parts of the packet will go another. But first, one has to ask: Is a phase-space description of quantum systems even possible?

Quantum Phase Space: The Wigner Distribution Function

Phase-space portraits are arguably the most powerful tool in the toolbox of classical dynamics, and one would like to retain its uses for quantum systems. However, there is that pesky paradox about quantum trajectories that cannot admit the existence of one-dimensional curves through such a phase space. Furthermore, there is no direct way of taking a wavefunction and simply “finding” its position or momentum to plot points on such a quantum phase space.

The answer was found in 1932 by Eugene Wigner (1902 – 1905), an Hungarian physicist working at Princeton. He realized that it was impossible to construct a quantum probability distribution in phase space that had positive values everywhere. This is a problem, because negative probabilities have no direct interpretation. But Wigner showed that if one relaxed the requirements a bit, so that expectation values computed over some distribution function (that had positive and negative values) gave correct answers that matched experiments, then this distribution function would “stand in” for an actual probability distribution.

The distribution function that Wigner found is called the Wigner distribution function. Given a wavefunction ψ(x), the Wigner distribution is defined as

Fig. 4 Wigner distribution function in (x, p) phase space.

The Wigner distribution function is the Fourier transform of the convolution of the wavefunction. The pure position dependence of the wavefunction is converted into a spread-out position-momentum function in phase space. For a Gaussian wavefunction ψ(x) with a finite width in space, the W-function in phase space is a two-dimensional Gaussian with finite widths in both space and momentum. In fact, the Δx-Δp product of the W-function is precisely the uncertainty production of the Heisenberg uncertainty relation.

The question of the quantum trajectory from the phase-space perspective becomes whether a Wigner function behaves like a localized “packet” that evolves in phase space in a way analogous to a classical particle, and whether classical chaos is reflected in the behavior of quantum systems.

The Harmonic Oscillator

The quantum harmonic oscillator is a rare and special case among quantum potentials, because the energy spacings between all successive states are all the same. This makes it possible for a Gaussian wavefunction, which is a superposition of the eigenstates of the harmonic oscillator, to propagate through the potential without broadening. To see an example of this, watch the first example in this YouTube video for a Schrödinger cat state in a two-dimensional harmonic potential. For this very special potential, the Wigner distribution behaves just like a (broadened) particle on an orbit in phase space, executing nice circular orbits.

A comparison of the classical phase-space portrait versus the quantum phase-space portrait is shown in Fig. 5. Where the classical particle is a point on an orbit, the quantum particle is spread out, obeying the Δx-Δp Heisenberg product, but following the same orbit as the classical particle.

Fig. 5 Classical versus quantum phase-space portraits for a harmonic oscillator. For a classical particle, the trajectory is a point executing an orbit. For a quantum particle, the trajectory is a Wigner distribution that follows the same orbit as the classical particle.

However, a significant new feature appears in the Wigner representation in phase space when there is a coherent superposition of two states, known as a “cat” state, after Schrödinger’s cat. This new feature has no classical analog. It is the coherent interference pattern that appears at the zero-point of the harmonic oscillator for the Schrödinger cat state. There is no such thing as “classical” coherence, so this feature is absent in classical phase space portraits.

Two examples of Wigner distributions are shown in Fig. 6 for a statistical (incoherent) mixture of packets and a coherent superposition of packets. The quantum coherence signature is present in the coherent case but not the statistical mixture case. The coherence in the Wigner distribution represents “off-diagonal” terms in the density matrix that leads to interference effects in quantum systems. Quantum computing algorithms depend critically on such coherences that tend to decay rapidly in real-world physical systems, known as decoherence, and it is possible to make statements about decoherence by watching the zero-point interference.

Fig. 6 Quantum phase-space portraits of double wave packets. On the left, the wave packets have no coherence, being a statistical mixture. On the right is the case for a coherent superposition, or “cat state” for two wave packets in a one-dimensional harmonic oscillator.

Whereas Gaussian wave packets in the quantum harmonic potential behave nearly like classical systems, and their phase-space portraits are almost identical to the classical phase-space view (except for the quantum coherence), most quantum potentials cause wave packets to disperse. And when saddle points are present in the classical case, then we are back to the question about how quantum packets behave as they approach a saddle point in phase space.

Quantum Pendulum and Separatrix Chaos

One of the simplest anharmonic oscillators is the simple pendulum. In the classical case, the period diverges if the pendulum gets very close to going vertical. A similar thing happens in the quantum case, but because the motion has strong anharmonicity, an initial wave packet tends to spread dramatically as parts of the wavefunction less vertical stretch away from the part of the wave function that is more nearly vertical. Fig. 7 is a snap-shot about a eighth of a period after the wave packet was launched. The packet has already stretched out along the separatrix. A double-cat-state was used, so there is a second packet that has coherent interference with the first. To see a movie of the time evolution of the wave packet and the orbit in quantum phase space, see the YouTube video.

Fig. 7 Wavefunction of a quantum pendulum released near vertical. The phase-space portrait is very similar to the classical case, except that the phase-space distribution is stretched out along the separatrix. The initial state for the phase-space portrait was a cat state.

The simple pendulum does have a saddle point, but it is degenerate because the angle is modulo -2-pi. A simple potential that has a non-degenerate saddle point is a double-well potential.

Quantum Double-Well and Separatrix Chaos

The symmetric double-well potential has a saddle point at the mid-point between the two well minima. A wave packet approaching the saddle will split into to packets that will follow the individual separatrixes that emerge from the saddle point (the unstable manifolds). This effect is seen most dramatically in the middle pane of Fig. 8. For the full video of the quantum phase-space evolution, see this YouTube video. The stretched-out distribution in phase space is highly analogous to the separatrix chaos seen for the classical system.

Fig. 8 Phase-space portraits of the Wigner distribution for a wavepacket in a double-well potential. The packet approaches the central saddle point, where the probability density splits along the unstable manifolds.

Conclusion

A common statement often made about quantum chaos is that quantum systems tend to suppress chaos, only exhibiting chaos for special types of orbits that produce quantum scars. However, from the phase-space perspective, the opposite may be true. The stretched-out Wigner distribution functions, for critical wave packets that interact with a saddle point, are very similar to the stochastic layer that forms in separatrix chaos in classical systems. In this sense, the phase-space description brings out the similarity between classical chaos and quantum chaos.

References

1. T. Curtright, D. Fairlie, C. Zachos, A Concise Treatise on Quantum Mechanics in Phase Space.  (World Scientific, New Jersey, 2014).

2. J. R. Nagel, A Review and Application of the Finite-Difference Time-Domain Algorithm Applied to the Schrödinger Equation, ACES Journal, Vol. 24, NO. 1, pp. 1-8 (2009)

Is There a Quantum Trajectory?

Heisenberg’s uncertainty principle is a law of physics – it cannot be violated under any circumstances, no matter how much we may want it to yield or how hard we try to bend it.  Heisenberg, as he developed his ideas after his lone epiphany like a monk on the isolated island of Helgoland off the north coast of Germany in 1925, became a bit of a zealot, like a religious convert, convinced that all we can say about reality is a measurement outcome.  In his view, there was no independent existence of an electron other than what emerged from a measuring apparatus.  Reality, to Heisenberg, was just a list of numbers in a spread sheet—matrix elements.  He took this line of reasoning so far that he stated without exception that there could be no such thing as a trajectory in a quantum system.  When the great battle commenced between Heisenberg’s matrix mechanics against Schrödinger’s wave mechanics, Heisenberg was relentless, denying any reality to Schrödinger’s wavefunction other than as a calculation tool.  He was so strident that even Bohr, who was on Heisenberg’s side in the argument, advised Heisenberg to relent [1].  Eventually a compromise was struck, as Heisenberg’s uncertainty principle allowed Schrödinger’s wave functions to exist within limits—his uncertainty limits.

Disaster in the Poconos

Yet the idea of an actual trajectory of a quantum particle remained a type of heresy within the close quantum circles.  Years later in 1948, when a young Richard Feynman took the stage at a conference in the Poconos, he almost sabotaged his career in front of Bohr and Dirac—two of the giants who had invented quantum mechanics—by having the audacity to talk about particle trajectories in spacetime diagrams.

Feynman was making his first presentation of a new approach to quantum mechanics that he had developed based on path integrals. The challenge was that his method relied on space-time graphs in which “unphysical” things were allowed to occur.  In fact, unphysical things were required to occur, as part of the sum over many histories of his path integrals.  For instance, a key element in the approach was allowing electrons to travel backwards in time as positrons, or a process in which the electron and positron annihilate into a single photon, and then the photon decays back into an electron-positron pair—a process that is not allowed by mass and energy conservation.  But this is a possible history that must be added to Feynman’s sum.

It all looked like nonsense to the audience, and the talk quickly derailed.  Dirac pestered him with questions that he tried to deflect, but Dirac persisted like a raven.  A question was raised about the Pauli exclusion principle, about whether an orbital could have three electrons instead of the required two, and Feynman said that it could—all histories were possible and had to be summed over—an answer that dismayed the audience.  Finally, as Feynman was drawing another of his space-time graphs showing electrons as lines, Bohr rose to his feet and asked derisively whether Feynman had forgotten Heisenberg’s uncertainty principle that made it impossible to even talk about an electron trajectory.

It was hopeless.  The audience gave up and so did Feynman as the talk just fizzled out.  It was a disaster.  What had been meant to be Feynman’s crowning achievement and his entry to the highest levels of theoretical physics, had been a terrible embarrassment.  He slunk home to Cornell where he sank into one of his depressions.  At the close of the Pocono conference, Oppenheimer, the reigning king of physics, former head of the successful Manhattan Project and newly selected to head the prestigious Institute for Advanced Study at Princeton, had been thoroughly disappointed by Feynman.

But what Bohr and Dirac and Oppenheimer had failed to understand was that as long as the duration of unphysical processes was shorter than the energy differences involved, then it was literally obeying Heisenberg’s uncertainty principle.  Furthermore, Feynman’s trajectories—what became his famous “Feynman Diagrams”—were meant to be merely cartoons—a shorthand way to keep track of lots of different contributions to a scattering process.  The quantum processes certainly took place in space and time, conceptually like a trajectory, but only so far as time durations, and energy differences and locations and momentum changes were all within the bounds of the uncertainty principle.  Feynman had invented a bold new tool for quantum field theory, able to supply deep results quickly.  But no one at the Poconos could see it.

Fig. 1 The first Feynman diagram.

Coherent States

When Feynman had failed so miserably at the Pocono conference, he had taken the stage after Julian Schwinger, who had dazzled everyone with his perfectly scripted presentation of quantum field theory—the competing theory to Feynman’s.  Schwinger emerged the clear winner of the contest.  At that time, Roy Glauber (1925 – 2018) was a young physicist just taking his PhD from Schwinger at Harvard, and he later received a post-doc position at Princeton’s Institute for Advanced Study where he became part of a miniature revolution in quantum field theory that revolved around—not Schwinger’s difficult mathematics—but Feynman’s diagrammatic method.  So Feynman won in the end.  Glauber then went on to Caltech, where he filled in for Feynman’s lectures when Feynman was off in Brazil playing the bongos.  Glauber eventually returned to Harvard where he was already thinking about the quantum aspects of photons in 1956 when news of the photon correlations in the Hanbury-Brown Twiss (HBT) experiment were published.  Three years later, when the laser was invented, he began developing a theory of photon correlations in laser light that he suspected would be fundamentally different than in natural chaotic light. 

Because of his background in quantum field theory, and especially quantum electrodynamics, it was fairly easy to couch the quantum optical properties of coherent light in terms of Dirac’s creation and annihilation operators of the electromagnetic field. Glauber developed a “coherent state” operator that was a minimum uncertainty state of the quantized electromagnetic field, related to the minimum-uncertainty wave functions derived initially by Schrödinger in the late 1920’s. The coherent state represents a laser operating well above the lasing threshold and behaved as “the most classical” wavepacket that can be constructed.  Glauber was awarded the Nobel Prize in Physics in 2005 for his work on such “Glauber states” in quantum optics.

Fig. 2 Roy Glauber

Quantum Trajectories

Glauber’s coherent states are built up from the natural modes of a harmonic oscillator.  Therefore, it should come as no surprise that these coherent-state wavefunctions in a harmonic potential behave just like classical particles with well-defined trajectories. The quadratic potential matches the quadratic argument of the the Gaussian wavepacket, and the pulses propagate within the potential without broadening, as in Fig. 3, showing a snapshot of two wavepackets propagating in a two-dimensional harmonic potential. This is a somewhat radical situation, because most wavepackets in most potentials (or even in free space) broaden as they propagate. The quadratic potential is a special case that is generally not representative of how quantum systems behave.

Fig. 3 Harmonic potential in 2D and two examples of pairs of pulses propagating without broadening. The wavepackets in the center are oscillating in line, and the wavepackets on the right are orbiting the center of the potential in opposite directions. (Movies of the quantum trajectories can be viewed at Physics Unbound.)

To illustrate this special status for the quadratic potential, the wavepackets can be launched in a potential with a quartic perturbation. The quartic potential is anharmonic—the frequency of oscillation depends on the amplitude of oscillation unlike for the harmonic oscillator, where amplitude and frequency are independent. The quartic potential is integrable, like the harmonic oscillator, and there is no avenue for chaos in the classical analog. Nonetheless, wavepackets broaden as they propagate in the quartic potential, eventually spread out into a ring in the configuration space, as in Fig. 4.

Fig. 4 Potential with a quartic corrections. The initial gaussian pulses spread into a “ring” orbiting the center of the potential.

A potential with integrability has as many conserved quantities to the motion as there are degrees of freedom. Because the quartic potential is integrable, the quantum wavefunction may spread, but it remains highly regular, as in the “ring” that eventually forms over time. However, integrable potentials are the exception rather than the rule. Most potentials lead to nonintegrable motion that opens the door to chaos.

A classic (and classical) potential that exhibits chaos in a two-dimensional configuration space is the famous Henon-Heiles potential. This has a four-dimensional phase space which admits classical chaos. The potential has a three-fold symmetry which is one reason it is non-integral, since a particle must “decide” which way to go when it approaches a saddle point. In the quantum regime, wavepackets face the same decision, leading to a breakup of the wavepacket on top of a general broadening. This allows the wavefunction eventually to distribute across the entire configuration space, as in Fig. 5.

Fig. 5 The Henon-Heiles two-dimensional potential supports Hamiltonian chaos in the classical regime. In the quantum regime, the wavefunction spreads to eventually fill the accessible configuration space (for constant energy).

Youtube Video

Movies of quantum trajectories can be viewed at my Youtube Channel, Physics Unbound. The answer to the question “Is there a quantum trajectory?” can be seen visually as the movies run—they do exist in a very clear sense under special conditions, especially coherent states in a harmonic oscillator. And the concept of a quantum trajectory also carries over from a classical trajectory in cases when the classical motion is integrable, even in cases when the wavefunction spreads over time. However, for classical systems that display chaotic motion, wavefunctions that begin as coherent states break up into chaotic wavefunctions that fill the accessible configuration space for a given energy. The character of quantum evolution of coherent states—the most classical of quantum wavefunctions—in these cases reflects the underlying character of chaotic motion in the classical analogs. This process can be seen directly watching the movies as a wavepacket approaches a saddle point in the potential and is split. Successive splits of the multiple wavepackets as they interact with the saddle points is what eventually distributes the full wavefunction into its chaotic form.

Therefore, the idea of a “quantum trajectory”, so thoroughly dismissed by Heisenberg, remains a phenomenological guide that can help give insight into the behavior of quantum systems—both integrable and chaotic.

As a side note, the laws of quantum physics obey time-reversal symmetry just as the classical equations do. In the third movie of “A Quantum Ballet“, wavefunctions in a double-well potential are tracked in time as they start from coherent states that break up into chaotic wavefunctions. It is like watching entropy in action as an ordered state devolves into a disordered state. But at the half-way point of the movie, the imaginary part of the wavefunction has its sign flipped, and the dynamics continue. But now the wavefunctions move from disorder into an ordered state, seemingly going against the second law of thermodynamics. Flipping the sign of the imaginary part of the wavefunction at just one instant in time plays the role of a time-reversal operation, and there is no violation of the second law.

References

[1] See Chapter 8 , On the Quantum Footpath, in Galileo Unbound, D. D. Nolte (Oxford University Press, 2018)

[2] J. R. Nagel, A Review and Application of the Finite-Difference Time-Domain Algorithm Applied to the Schrödinger Equation, ACES Journal, Vol. 24, NO. 1, pp. 1-8 (2009)

Quantum Chaos and the Cheshire Cat

Alice’s disturbing adventures in Wonderland tumbled upon her like a string of accidents as she wandered a world of chaos.  Rules were never what they seemed and shifted whenever they wanted.  She even met a cat who grinned ear-to-ear and could disappear entirely, or almost entirely, leaving only its grin.

The vanishing Cheshire Cat reminds us of another famous cat—Arnold’s Cat—that introduced the ideas of stretching and folding of phase-space volumes in non-integrable Hamiltonian systems.  But when Arnold’s Cat becomes a Quantum Cat, a central question remains: What happens to the chaotic behavior of the classical system … does it survive the transition to quantum mechanics?  The answer is surprisingly like the grin of the Cheshire Cat—the cat vanishes, but the grin remains.  In the quantum world of the Cheshire Cat, the grin of the classical cat remains even after the rest of the cat vanished. 

The Cheshire Cat fades away, leaving only its grin, like a fine filament, as classical chaos fades into quantum, leaving behind a quantum scar.

The Quantum Mechanics of Classically Chaotic Systems

The simplest Hamiltonian systems are integrable—they have as many constants of the motion as degrees of freedom.  This holds for quantum systems as well as for classical.  There is also a strong correspondence between classical and quantum systems for the integrable cases—literally the Correspondence Principle—that states that quantum systems at high quantum number approach classical behavior.  Even at low quantum numbers, classical resonances are mirrored by quantum eigenfrequencies that can show highly regular spectra.

But integrable systems are rare—surprisingly rare.  Almost no real-world Hamiltonian system is integrable, because the real world warps the ideal.  No spring can displace indefinitely, and no potential is perfectly quadratic.  There are always real-world non-idealities that destroy one constant of the motion or another, opening the door to chaos.

When classical Hamiltonian systems become chaotic, they don’t do it suddenly.  Almost all transitions to chaos in Hamiltonian systems are gradual.  One of the best examples of this is the KAM theory that starts with invariant action integrals that generate invariant tori in phase space.  As nonintegrable perturbations increase, the tori break up slowly into island chains of stability as chaos infiltrates the separatrixes—first as thin filaments of chaos surrounding the islands—then growing in width to take up more and more of phase space.  Even when chaos is fully developed, small islands of stability can remain—the remnants of stable orbits of the unperturbed system.

When the classical becomes quantum, chaos softens.  Quantum wave functions don’t like to be confined—they spread and they tunnel.  The separatrix of classical chaos—that barrier between regions of phase space—cannot constrain the exponential tails of wave functions.  And the origin of chaos itself—the homoclinic point of the separatrix—gets washed out.  Then the regular orbits of the classical system reassert themselves, and they appear, like the vestige of the Cheshire Cat, as a grin.

The Quantum Circus

The empty stadium is a surprisingly rich dynamical system that has unexpected structure in both the classical and the quantum domain.  Its importance in classical dynamics comes from the fact that its periodic orbits are unstable and its non-periodic orbits are ergodic (filling all available space if given long enough).  The stadium itself is empty so that particles (classical or quantum) are free to propagate between reflections from the perfectly-reflecting walls of the stadium.  The ergodicity comes from the fact that the stadium—like a classic Roman chariot-race stadium, also known as a circus—is not a circle, but has a straight stretch between two half circles.  This simple modification takes the stable orbits of the circle into the unstable orbits of the stadium.

A single classical orbit in a stadium is shown in Fig 1. This is an ergodic orbit that is non-periodic and eventually would fill the entire stadium space. There are other orbits that are nearly periodic, such as one that bounces back and forth vertically between the linear portions, but even this orbit will eventually wander into the circular part of the stadium and then become ergodic. The big quantum-classical question is what happens to these classical orbits when the stadium is shrunk to the nanoscale?

Fig. 1 A classical trajectory in a stadium. It will eventually visit every point, a property known as ergodicity.

Simulating an evolving quantum wavefunction in free space is surprisingly simple. Given a beginning quantum wavefunction A(x,y,t0), the discrete update equation is

Perfect reflection from the boundaries of the stadium are incorporated through imposing a boundary condition that sends the wavefunction to zero. Simple!

A snap-shot of a wavefunction evolving in the stadium is shown in Fig. 2. To see a movie of the time evolution, see my YouTube episode.

Fig. 2 Snapshot of a quantum wavefunction in the stadium. (From YouTube)

The time average of the wavefunction after a long time has passed is shown in Fig. 3. Other than the horizontal nodal line down the center of the stadium, there is little discernible structure or symmetry. This is also true for the mean squared wavefunction shown in Fig. 4, although there is some structure that may be emerging in the semi-circular regions.

Fig. 3 Time-average wavefunction after a long time.
Fig. 4 Time-average of the squared wavefunction after a long time.

On the other hand, for special initial conditions that have a lot of symmetry, something remarkable happens. Fig. 5 shows several mean-squared results for special initial conditions. There is definite structure in these cases that were given the somewhat ugly name “quantum scars” in the 1980’s by Eric Heller who was one of the first to study this phenomenon [1].

Fig. 5 Quantum scars reflect periodic (but unstable) orbits of the classical system. Quantum effects tend to quench chaos and favor regular motion.

One can superpose highly-symmetric classical trajectories onto the figures, as shown in the bottom row. All of these classical orbits go through a high-symmetry point, such as the center of the stadium (on the left image) and through the focal point of the circular mirrors (in the other two images). The astonishing conclusion of this exercise is that the highly-symmetric periodic classical orbits remain behind as quantum scars—like the Cheshire Cat’s grin—when the system is in the quantum realm. The classical orbits that produce quantum scars have the important property of being periodic but unstable. A slight perturbation from the symmetric trajectory causes it to eventually become ergodic (chaotic). These scars are regions with enhanced probability density, what might be termed “quantum trajectories”, but do not show strong interference patterns.

It is important to make the distinction that it is also possible to construct special wavefunctions that are strictly periodic, such as a wave bouncing perfectly vertically between the straight portions. This leads to large-scale interference patterns that are not the same as the quantum scars.

Quantum Chaos versus Laser Speckle

In addition to the bouncing-wave cases that do not strictly produce quantum scars, there is another “neutral” phenomenon that produces interference patterns that look a lot like scars, but are simply the random addition of lots of plane waves with the same wavelength [2]. A snapshot in time of one of these superpositions is shown in Fig. 6. To see how the waves add together, see the YouTube channel episode.

Fig. 6 The sum of 100 randomly oriented plane waves of constant wavelength. (A snapshot from YouTube.)

References

[1] Heller E J, Bound-state eigenfunctions of classically chaotic hamiltonian-systems – scars of periodic-orbits, Physical Review Letters 53 ,1515 (1984)

[2] Gutzwiller M C, Chaos in classical and quantum mechanics (New York: New York : Springer-Verlag, 1990)

George Cantor meets Machine Learning: Deep Discrete Encoders

Machine learning is characterized, more than by any other aspect, by the high dimensionality of the data spaces it seeks to find patterns in.  Hence, one of the principle functions of machine learning is to reduce the dimensionality of the data to lower dimensions—a process known as dimensionality reduction.

There are two driving reasons to reduce the dimensionality of data: 

First, typical dimensionalities faced by machine learning problems can be in the hundreds or thousands.  Trying to visualize such high dimensions may sound mind expanding, but it is really just saying that a data problem may have hundreds or thousands of different data entries for a single event.  And many, or even most, of those entries may not be independent.  While many others may be pure noise—or at least not relevant to the pattern.  Deep learning dimensionality reduction seeks to find the dependences—many of them nonlinear and non-single-valued (non-invertible)—and to reject the noise channels.

Second, the geometry of high dimension is highly unintuitive.  Many of the things we take for granted in our pitifully low dimension of 3 (or 4 if you include time) just don’t hold in high dimensions.  For instance, in very high dimensions almost all random vectors in a hyperspace are orthogonal, and almost all random unit vectors in the hyperspace are equidistant.  Even the topology of landscapes in high dimensions is unintuitive—there are far more mountain ridges than mountain peaks—with profound consequences for dynamical processes such as random walks (see my Blog on a Random Walk in 10 Dimensions).  In fact, we owe our evolutionary existence to this effect!  Therefore, deep dimensionality reduction is a way to bring complex data down to a dimensionality where our intuition can be applied to “explain” the data.

But what is a dimension?  And can you find the “right” dimensionality when performing dimensionality reduction?  Once again, our intuition struggles with these questions, as first discovered by a nineteenth-century German mathematician whose mind-expanding explorations of the essence of different types of infinity shattered the very concept of dimension.

George Cantor

Georg Cantor (1845 – 1918) was born in Russia, and the family moved to Germany while Cantor was still young.  In 1863, he enrolled at the University of Berlin where he sat on lectures by Weierstrass and Kronecker.  He received his doctorate in 1867 and his Habilitation in 1869, moving into a faculty position at the University of Halle and remaining there for the rest of his career.  Cantor published a paper early in 1872 on the question of whether the representation of an arbitrary function by a Fourier series is unique.  He had found that even though the series might converge to a function almost everywhere, there surprisingly could still be an infinite number of points where the convergence failed.  Originally, Cantor was interested in the behavior of functions at these points, but his interest soon shifted to the properties of the points themselves, which became his life’s work as he developed set theory and transfinite mathematics.

Georg Cantor (1845 – 1918)

In 1878, in a letter to his friend Richard Dedekind, Cantor showed that there was a one-to-one correspondence between the real numbers and the points in any n-dimensional space.  He was so surprised by his own result that he wrote to Dedekind “I see it, but I don’t believe it.”  Previously, the ideas of a dimension, moving in a succession from one (a line) to two (a plane) to three (a volume) had been absolute.  However, with his newly-discovered mapping, the solid concepts of dimension and dimensionality began to dissolve.  This was just the beginning of a long history of altered concepts of dimension (see my Blog on the History of Fractals).

Mapping Two Dimensions to One

Cantor devised a simple mapping that is at once obvious and subtle. To take a specific example of mapping a plane to a line, one can consider the two coordinate values (x,y) both with a closed domain on [0,1]. Each can be expressed as a decimal fraction given by

These two values can be mapped to a single value through a simple “ping-pong” between the decimal digits as

If x and y are irrational, then this presents a simple mapping of a pair of numbers (two-dimensional coordinates) to a single number. In this way, the special significance of two dimensions relative to one dimension is lost. In terms of set theory nomenclature, the cardinality of the two-dimensional R2 is the same as for the one-dimensions R.

Nonetheless, intuition about dimension still has it merits, and a subtle aspect of this mapping is that it contains discontinuities. These discontinuities are countably infinite, but they do disrupt any smooth transformation from 2D to 1D, which is where the concepts of intrinsic dimension are hidden. The topological dimension of the plane is clearly equal to 2, and that of the line is clearly equal to 1, determined by the dimensionality D+1 of cuts that are required to separate the sets. The area is separated by a D = 1 line, while the line is separated by a D= 0 point.

Fig. 1 A mapping of a 2D plane to a 1D line. Every point in the plane has an associated point on the line (except for a countably infinite number of special points … see the next figure.)

While the discontinuities help preserve the notions of dimension, and they are countably infinite (with the cardinality of the rational numbers), they pertain merely to the representation of number by decimals. As an example, in decimal notation for a number a = 14159/105, one has two equivalent representations

trailing either infinitely many 0’s or infinitely many 9’s. Despite the fact that these representations point to the same number, when it is used as one of the pairs in the bijection of Fig. 1, it produces two distinctly different numbers of t in R. Fortunately, there is a way to literally sweep this under the rug. Any time one retrieves a number that has repeating 0’s or 9’s, simply sweep it to the right it by dividing by a power of 10 to a region of trailing zeros for a different number, call it b, as in Fig. 2. The shifted version of a will not overlap with the number b, because b also ends in repeating 0’s or 9’s and so is swept farther to the right, and so on to infinity, so none of these numbers overlap, each is distinct, and the mapping becomes what is known as a bijection with one-to-one correspondence.

Fig. 3 The “measure-zero” fix. Numbers that have two equivalent representations can be shifted to the right to replace other numbers that are shifted further to the right, and so on to infinity. There is infinite room within the reals to accommodate the countably infinite number of repeating-decimal numbers.

Space-Filling Curves

When the peripatetic mathematician Guiseppe Peano learned of Cantor’s result for the mapping of 2D to 1D, he sought to demonstrate the correspondence geometrically, and he constructed a continuous curve that filled space, publishing the method in Sur une courbe, qui remplit toute une aire plane [1] in 1890.  The construction of Peano’s curve proceeds by taking a square and dividing it into 9 equal sub squares.  Lines connect the centers of each of the sub squares.  Then each sub square is divided again into 9 sub squares whose centers are all connected by lines.  At this stage, the original pattern, repeated 9 times, is connected together by 8 links, forming a single curve.  This process is repeated infinitely many times, resulting in a curve that passes through every point of the original plane square.  In this way, a line is made to fill a plane.  Where Cantor had proven abstractly that the cardinality of the real numbers was the same as the points in n-dimensional space, Peano created a specific example. 

This was followed quickly by another construction, invented by David Hilbert in 1891, that divided the square into four instead of nine, simplifying the construction, but also showing that such constructions were easily generated [2].  The space-filling curves of Peano and Hilbert have the extreme properties that a one-dimensional curve approaches every point in a two-dimensional space.  These curves have topological dimensionality of 1D and a fractal dimensionality of 2D. 

Fig. 3 Peano’s (1890) and Hilbert’s (1891) plane-filling curves.  When the iterations are taken to infinity, the curves approach every point of two-dimensional space arbitrarily closely, giving them a dimension D = 2.

A One-Dimensional Deep Discrete Encoder for MNIST

When performing dimensionality reduction in deep learning it is tempting to think that there is an underlying geometry to the data—as if they reside on some submanifold of the original high-dimensional space.  While this can be the case (for which dimensionality reduction is called deep metric learning) more often there is no such submanifold.  For instance, when there is a highly conditional nature to the different data channels, in which some measurements are conditional on the values of other measurements, then there is no simple contiguous space that supports the data.

It is also tempting to think that a deep learning problem has some intrinsic number of degrees of freedom which should be matched by the selected latent dimensionality for the dimensionality reduction.  For instance, if there are roughly five degrees of freedom buried within a complicated data set, then it is tempting to believe that the appropriate latent dimensionality also should be chosen to be five.  But this also is missing the mark. 

Take, for example, the famous MNIST data set of hand-written digits from 0 to 9.  Each digit example is contained in a 28-by28 two-dimensional array that is typically flattened to a 784-element linear vector that locates that single digit example within a 784-dimensional space.  The goal is to reduce the dimensionality down to a manageable number—but what should the resulting latent dimensionality be?  How many degrees of freedom are involved in writing digits?  Furthermore, given that there are ten classes of digits, should the chosen dimensionality of the latent space be related to the number of classes?

Fig. 4 A sampling of MNIST digits.

To illustrate that the dimensionality of the latent space has little to do with degrees of freedom or the number of classes, let’s build a simple discrete encoder that encodes the MNIST data set to the one-dimensional line—following Cantor.

The deep network of the encoder can have a simple structure that terminates with a single neuron that has a piece-wise linear output.  The objective function (or loss function) measures the squared distances of the outputs of the single neuron, after transformation by the network, to the associated values 0 to 9.  And that is it!  Train the network by minimizing the loss function.

Fig. 5 A deep encoder that encodes discrete classes to a one-dimensional latent variable.

The results of the linear encoder are shown in Fig. 6 (the transverse directions are only for ease of visualization…the classification is along the long axis). The different dots are specific digits, and the colors are the digit value. There is a clear trend from 1 through 10, although with this linear encoder there is substantial overlap among the point clouds.

Fig. 6 Latent space for a one-dimensional (line) encoder. The transverse dimensions are only for visualization.

Despite appearances, this one-dimensional discrete line encoder is NOT a form of regression. There is no such thing as 1.5 as the average of a 1-digit and a 2-digit. And 5 is not the average of a 4-digit and a 6-digit. The digits are images that have no intrinsic numerical value. Therefore, this one-dimensional encoder is highly discontinuous, and intuitive ideas about intrinsic dimension for continuous data remain secure.

Summary

The purpose of this Blog (apart from introducing Cantor and his ideas on dimension theory) was to highlight a key question of dimensionality reduction in representation theory: What is the intrinsic dimensionality of a dataset? The answer, in the case of discrete classes, is that there is no intrinsic dimensionality. Having 1 degree of freedom or 10 degrees of freedom, i.e. latent dimensionalities of 1 or 10, is mostly irrelevant. In ideal cases, one is just as good as the other.

On the other hand, for real-world data with its inevitable variability and finite variance, there can be reasons to choose one latent dimensionality over another. In fact, the “best” choice of dimensionality is one less than the number of classes. For instance, in the case of MNIST with its 10 classes, that is a 9-dimensional latent space. The reason this is “best” has to do with the geometry of high-dimensional simplexes, which will be the topic of a future Blog.


[1] G.Peano: Sur une courbe, qui remplit toute une aire plane. Mathematische Annalen 36 (1890), 157–160.

[2] D. Hilbert, “Uber die stetige Abbildung einer Linie auf ein Fllichenstilck,” Mathemutische Anna/en, vol. 38, pp. 459-460, (1891)

The Anharmonic Harmonic Oscillator

Harmonic oscillators are one of the fundamental elements of physical theory.  They arise so often in so many different contexts that they can be viewed as a central paradigm that spans all aspects of physics.  Some famous physicists have been quoted to say that the entire universe is composed of simple harmonic oscillators (SHO).

Despite the physicist’s love affair with it, the SHO is pathological! First, it has infinite frequency degeneracy which makes it prone to the slightest perturbation that can tip it into chaos, in contrast to non-harmonic cyclic dynamics that actually protects us from the chaos of the cosmos (see my Blog on Chaos in the Solar System). Second, the SHO is nowhere to be found in the classical world.  Linear oscillators are purely harmonic, with a frequency that is independent of amplitude—but no such thing exists!  All oscillators must be limited, or they could take on infinite amplitude and infinite speed, which is nonsense.  Even the simplest of simple harmonic oscillators would be limited by nothing other than the speed of light.  Relativistic effects would modify the linearity, especially through time dilation effects, rendering the harmonic oscillator anharmonic.

Despite the physicist’s love affair with it, the SHO is pathological!

Therefore, for students of physics as well as practitioners, it is important to break the shackles imposed by the SHO and embrace the anharmonic harmonic oscillator as the foundation of physics. Here is a brief survey of several famous anharmonic oscillators in the history of physics, followed by the mathematical analysis of the relativistic anharmonic linear-spring oscillator.

Anharmonic Oscillators

Anharmonic oscillators have a long venerable history with many varieties.  Many of these have become central models in systems as varied as neural networks, synchronization, grandfather clocks, mechanical vibrations, business cycles, ecosystem populations and more.

Christiaan Huygens

Already by the mid 1600’s Christiaan Huygens (1629 – 1695) knew that the pendulum becomes slower when it has larger amplitudes.  The pendulum was one of the best candidates for constructing an accurate clock needed for astronomical observations and for the determination of longitude at sea.  Galileo (1564 – 1642) had devised the plans for a rudimentary pendulum clock that his son attempted to construct, but the first practical pendulum clock was invented and patented by Huygens in 1657.  However, Huygens’ modified verge escapement required his pendulum to swing with large amplitudes, which brought it into the regime of anharmonicity. The equations of the simple pendulum are truly simple, but the presence of the sinθ makes it the simplest anharmonic oscillator.

Therefore, Huygens searched for the mathematical form of a tautochrone curve for the pendulum (a curve that is traversed with equal times independently of amplitude) and in the process he invented the involutes and evolutes of a curve—precursors of the calculus.  The answer to the tautochrone question is a cycloid (see my Blog on Huygen’s Tautochrone Curve).

Hermann von Helmholtz

Hermann von Helmholtz (1821 – 1894) was possibly the greatest German physicist of his generation—an Einstein before Einstein—although he began as a medical doctor.  His study of muscle metabolism, drawing on the early thermodynamic work of Carnot, Clapeyron and Joule, led him to explore and to express the conservation of energy in its clearest form.  Because he postulated that all forms of physical processes—electricity, magnetism, heat, light and mechanics—contributed to the interconversion of energy, he sought to explore them all, bringing his research into the mainstream of physics.  His laboratory in Berlin became world famous, attracting to his laboratory the early American physicists Henry Rowland (founder and first president of the American Physical Society) and Albert Michelson (first American Nobel prize winner).

Even the simplest of simple harmonic oscillators would be limited by nothing other than the speed of light.  

Helmholtz also pursued a deep interest in the physics of sensory perception such as sound.  This research led to his invention of the Helmholtz oscillator which is a highly anharmonic relaxation oscillator in which a tuning fork was placed near an electromagnet that was powered by a mercury switch attached to the fork. As the tuning fork vibrated, the mercury came in and out of contact with it, turning on and off the magnet, which fed back on the tuning fork, and so on, enabling the device, once started, to continue oscillating without interruption. This device is called a tuning-fork resonator, and it became the first door-bell buzzers.  (These are not to be confused with Helmholtz resonances that are formed when blowing across the open neck of a beer bottle.)

Lord Rayleigh

Baron John Strutt, the Lord Rayleigh (1842 – 1919) like Helmholtz also was a generalist and had a strong interest in the physics of sound.  He was inspired by Helmholtz’ oscillator to consider general nonlinear anharmonic oscillators mathematically.  He was led to consider the effects of anharmonic terms added to the harmonic oscillator equation.  in a paper published in the Philosophical Magazine issue of 1883 with the title On Maintained Vibrations, he introduced an equation to describe the self-oscillation by adding an extra term to a simple harmonic oscillator. The extra term depended on the cube of the velocity, representing a balance between the gain of energy from a steady force and natural dissipation by friction.  Rayleigh suggested that this equation applied to a wide range of self-oscillating systems, such as violin strings, clarinet reeds, finger glasses, flutes, organ pipes, among others (see my Blog on Rayleigh’s Harp.)

Georg Duffing

The first systematic study of quadratic and cubic deviations from the harmonic potential was performed by the German engineer George Duffing (1861 – 1944) under the conditions of a harmonic drive. The Duffing equation incorporates inertia, damping, the linear spring and nonlinear deviations.

Fig. 1 The Duffing equation adds a nonlinear term to the spring force when alpha is positive, stiffening or weakening it for larger excursions when beta is positive or negative, respectively. And by making alpha negative and beta positive, it describes a damped driven double-well potential.

Duffing confirmed his theoretical predictions with careful experiments and established the lowest-order corrections to ideal masses on springs. His work was rediscovered in the 1960’s after Lorenz helped launch numerical chaos studies. Duffing’s driven potential becomes especially interesting when α is negative and β is positive, creating a double-well potential. The driven double-well is a classic chaotic system (see my blog on Duffing’s Oscillator).

Balthasar van der Pol

Autonomous oscillators are one of the building blocks of complex systems, providing the fundamental elements for biological oscillatorsneural networksbusiness cyclespopulation dynamics, viral epidemics, and even the rings of Saturn.  The most famous autonomous oscillator (after the pendulum clock) is named for a Dutch physicist, Balthasar van der Pol (1889 – 1959), who discovered the laws that govern how electrons oscillate in vacuum tubes, but the dynamical system that he developed has expanded to become the new paradigm of cyclic dynamical systems to replace the SHO (see my Blog on GrandFather Clocks.)

Fig. 2 The van der Pol equation is the standard simple harmonic oscillator with a gain term that saturates for large excursions leading to a limit cycle oscillator.

Turning from this general survey, let’s find out what happens when special relativity is added to the simplest SHO [1].

Relativistic Linear-Spring Oscillator

The theory of the relativistic one-dimensional linear-spring oscillator starts from a relativistic Lagrangian of a free particle (with no potential) yielding the generalized relativistic momentum

The Lagrangian that accomplishes this is [2]

where the invariant 4-velocity is

When the particle is in a potential, the Lagrangian becomes

The action integral that is minimized is

and the Lagrangian for integration of the action integral over proper time is

The relativistic modification in the potential energy term of the Lagrangian is not in the spring constant, but rather is purely a time dilation effect.  This is captured by the relativistic Lagrangian

where the dot is with respect to proper time τ. The classical potential energy term in the Lagrangian is multiplied by the relativistic factor γ, which is position dependent because of the non-constant speed of the oscillator mass.  The Euler-Lagrange equations are

where the subscripts in the variables are a = 0, 1 for the time and space dimensions, respectively.  The derivative of the time component of the 4-vector is

From the derivative of the Lagrangian with respect to speed, the following result is derived

where E is the constant total relativistic energy.  Therefore,

which provides an expression for the derivative of the coordinate time with respect to the proper time where

The position-dependent γ(x) factor is then

The Euler-Lagrange equation with a = 1 is

which gives

providing the flow equations for the (an)harmonic oscillator with respect to proper time

This flow represents a harmonic oscillator modified by the γ(x) factor, due to time dilation, multiplying the spring force term.  Therefore, at relativistic speeds, the oscillator is no longer harmonic even though the spring constant remains truly a constant.  The term in parentheses effectively softens the spring for larger displacement, and hence the frequency of oscillation becomes smaller. 

The state-space diagram of the anharmonic oscillator is shown in Fig. 3 with respect to proper time (the time read on a clock co-moving with the oscillator mass).  At low energy, the oscillator is harmonic with a natural period of the SHO.  As the maximum speed exceeds β = 0.8, the period becomes longer and the trajectory less sinusoidal.  The position and speed for β = 0.9999 is shown in Fig. 4.  The mass travels near the speed of light as it passes the origin, producing significant time dilation at that instant.  The average time dilation through a single cycle is about a factor of three, despite the large instantaneous γ = 70 when the mass passes the origin.

Fig. 3 State-space diagram in relativistic units relative to proper time of a relativistic (an)harmonic oscillator with a constant spring constant for several relative speeds β. The anharmonicity becomes pronounced above β = 0.8.
Fig. 4 Position and speed in relativistic units relative to proper time of a relativistic (an)harmonic oscillator with a constant spring constant for β = 0.9999.  The period of oscillation in this simulation is nearly three times longer than the natural frequency at small amplitudes.

[1] W. Moreau, R. Easther, and R. Neutze, “RELATIVISTIC (AN)HARMONIC OSCILLATOR,” American Journal of Physics, Article vol. 62, no. 6, pp. 531-535, Jun (1994)

[2] D. D. Nolte, Introduction to Modern Dynamics: Chaos, Networks, Space and Time, 2nd. ed. (Oxford University Press, 2019)

The Many Worlds of the Quantum Beam Splitter

In one interpretation of quantum physics, when you snap your fingers, the trajectory you are riding through reality fragments into a cascade of alternative universes—one for each possible quantum outcome among all the different quantum states composing the molecules of your fingers. 

This is the Many-Worlds Interpretation (MWI) of quantum physics first proposed rigorously by Hugh Everett in his doctoral thesis in 1957 under the supervision of John Wheeler at Princeton University.  Everett had been drawn to this interpretation when he found inconsistencies between quantum physics and gravitation—topics which were supposed to have been his actual thesis topic.  But his side-trip into quantum philosophy turned out to be a one-way trip.  The reception of his theory was so hostile, no less than from Copenhagen and Bohr himself, that Everett left physics and spent a career at the Pentagon.

Resurrecting MWI in the Name of Quantum Information

Fast forward by 20 years, after Wheeler had left Princeton for the University of Texas at Austin, and once again a young physicist was struggling to reconcile quantum physics with gravity.  Once again the many worlds interpretation of quantum physics seemed the only sane way out of the dilemma, and once again a side-trip became a life-long obsession.

David Deutsch, visiting Wheeler in the early 1980’s, became convinced that the many worlds interpretation of quantum physics held the key to paradoxes in the theory of quantum information.  He was so convinced, that he began a quest to find a physical system that operated on more information than could be present in one universe at a time.  If such a physical system existed, it would be because streams of information from more than one universe were coming together and combining in a way that allowed one of the universes to “borrow” the information from the other.

It took only a year or two before Deutsch found what he was looking for—a simple quantum algorithm that yielded twice as much information as would be possible if there were no parallel universes.  This is the now-famous Deutsch algorithm—the first quantum algorithm [1].  At the heart of the Deutsch algorithm is a simple quantum interference.  The algorithm did nothing useful—but it convinced Deutsch that two universes were interfering coherently in the measurement process, giving that extra bit of information that should not have been there otherwise.  A few years later, the Deutsch-Josza algorithm [2] expanded the argument to interfere an exponentially larger amount of information streams from an exponentially larger number of universes to create a result that was exponentially larger than any classical computer could produce.  This marked the beginning of the quest for the quantum computer that is running red-hot today.

Deutsch’s “proof” of the many-worlds interpretation of quantum mechanics is not a mathematical proof but is rather a philosophical proof.  It holds no sway over how physicists do the math to make their predictions.  The Copenhagen interpretation, with its “spooky” instantaneous wavefunction collapse, works just fine predicting the outcome of quantum algorithms and the exponential quantum advantage of quantum computing.  Therefore, the story of David Deutsch and the MWI may seem like a chimera—except for one fact—it inspired him to generate the first quantum algorithm that launched what may be the next revolution in the information revolution of modern society.  Inspiration is important in science, because it lets scientists create things that had been impossible before. 

But if quantum interference is the heart of quantum computing, then there is one physical system that has the ultimate simplicity that may yet inspire future generations of physicists to invent future impossible things—the quantum beam splitter.  Nothing in the study of quantum interference can be simpler than a sliver of dielectric material sending single photons one way or another.  Yet the outcome of this simple system challenges the mind and reminds us of why Everett and Deutsch embraced the MWI in the first place.

The Classical Beam Splitter

The so-called “beam splitter” is actually a misnomer.  Its name implies that it takes a light beam and splits it into two, as if there is only one input.  But every “beam splitter” has two inputs, which is clear by looking at the classical 50/50 beam splitter shown in Fig. 1.  The actual action of the optical element is the combination of beams into superpositions in each of the outputs. It is only when one of the input fields is zero, a special case, that the optical element acts as a beam splitter.  In general, it is a beam combiner.

Given two input fields, the output fields are superpositions of the inputs

The square-root of two factor ensures that energy is conserved, because optical fluence is the square of the fields.  This relation is expressed more succinctly as a matrix input-output relation

The phase factors in these equations ensure that the matrix is unitary

reflecting energy conservation.

The Quantum Beam Splitter

A quantum beam splitter is just a classical beam splitter operating at the level of individual photons.  Rather than describing single photons entering or leaving the beam splitter, it is more practical to describe the properties of the fields through single-photon quantum operators

where the unitary matrix is the same as the classical case, but with fields replaced by the famous “a” operators.  The photon operators operate on single photon modes.  For instance, the two one-photon input cases are

where the creation operators operate on the vacuum state in each of the input modes.

The fundamental combinational properties of the beam splitter are even more evident in the quantum case, because there is no such thing as a single input to a quantum beam splitter.  Even if no photons are directed into one of the input ports, that port still receives a “vacuum” input, and this vacuum input contributes to the fluctuations observed in the outputs.

The input-output relations for the quantum beam splitter are

The beam splitter operating on a one-photon input converts the input-mode creation operator into a superposition of out-mode creation operators that generates

The resulting output is entangled: either the single photon exits one port, or it exits the other.  In the many worlds interpretation, the photon exits from one port in one universe, and it exits from the other port in a different universe.  On the other hand, in the Copenhagen interpretation, the two output ports of the beam splitter are perfectly anti-correlated.

Fig. 1  Quantum Operations of a Beam Splitter.  A beam splitter creates a quantum superposition of the input modes.  The a-symbols are quantum number operators that create and annihilate photons.  A single-photon input produces an entangled output that is a quantum superposition of the photon coming out of one output or the other.

The Hong-Ou-Mandel (HOM) Interferometer

When more than one photon is incident on a beam splitter, the fascinating effects of quantum interference come into play, creating unexpected outputs for simple inputs.  For instance, the simplest example is a two photon input where a single photon is present in each input port of the beam splitter.  The input state is represented with single creation operators operating on each vacuum state of each input port

creating a single photon in each of the input ports. The beam splitter operates on this input state by converting the input-mode creation operators into out-put mode creation operators to give

The important step in this process is the middle line of the equations: There is perfect destructive interference between the two single-photon operations.  Therefore, both photons always exit the beam splitter from the same port—never split.  Furthermore, the output is an entangled two-photon state, once more splitting universes.

Fig. 2  The HOM interferometer.  A two-photon input on a beam splitter generates an entangled superposition of the two photons exiting the beam splitter always together.

The two-photon interference experiment was performed in 1987 by Chung Ki Hong and Jeff Ou, students of Leonard Mandel at the Optics Institute at the University of Rochester [3], and this two-photon operation of the beam splitter is now called the HOM interferometer. The HOM interferometer has become a center-piece for optical and photonic implementations of quantum information processing and quantum computers.

N-Photons on a Beam Splitter

Of course, any number of photons can be input into a beam splitter.  For example, take the N-photon input state

The beam splitter acting on this state produces

The quantity on the right hand side can be re-expressed using the binomial theorem

where the permutations are defined by the binomial coefficient

The output state is given by

which is a “super” entangled state composed of N multi-photon states, involving N different universes.

Surprisingly, there is a multi-photon input state that generates a non-entangled output—as if the input states were simply classical fields.  These are the so-called coherent states, introduced by Glauber and Sudarshan [4, 5].  Coherent states can be described as superpositions of multi-photon states, but when a beam splitter operates on these superpositions, the outputs are simply 50/50 mixtures of the states.  For instance, if the input scoherent tates are denoted by a and b, then the output states after the beam splitter are

This output is factorized and hence is NOT entangled.  This is one of the many reasons why coherent states in quantum optics are considered the “most classical” of quantum states.  In this case, a quantum beam splitter operates on the inputs just as if they were classical fields.


References

[1] D. Deutsch, “Quantum-theory, the church-turing principle and the universal quantum computer,” Proceedings of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, vol. 400, no. 1818, pp. 97-117, (1985)

[2] D. Deutsch and R. Jozsa, “Rapid solution of problems by quantum computation,” Proceedings of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, vol. 439, no. 1907, pp. 553-558, Dec (1992)

[3] C. K. Hong, Z. Y. Ou, and L. Mandel, “Measurement of subpicosecond time intervals between 2 photons by interference,” Physical Review Letters, vol. 59, no. 18, pp. 2044-2046, Nov (1987)

[4] Glauber, R. J. (1963). “Photon Correlations.” Physical Review Letters 10(3): 84.

[5] Sudarshan, E. C. G. (1963). “Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams.” Physical Review Letters 10(7): 277-&.; Mehta, C. L. and E. C. Sudarshan (1965). “Relation between quantum and semiclassical description of optical coherence.” Physical Review 138(1B): B274.

Post-Modern Machine Learning: The Deep Revolution

The mysteries of human intelligence are among the final frontiers of science. Despite our pride in what science has achieved across the past century, we have stalled when it comes to understanding intelligence or emulating it. The best we have done so far is through machine learning — harnessing the computational power of computers to begin to mimic the mind, attempting to answer the essential question:

How do we get machines to Know what we Know?

In modern machine learning, the answer is algorithmic.

In post-modern machine learning, the answer is manifestation.

The algorithms of modern machine learning are cause and effect, rules to follow, producing only what the programmer can imagine. But post-modern machine learning has blown past explicit algorithms to embrace deep networks. Deep networks today are defined by neural networks with thousands, or tens of thousands, or even hundreds of thousands, of neurons arrayed in multiple layers of dense connections. The interactivity of so many crossing streams of information defies direct deconstruction of what the networks are doing — they are post-modern. Their outputs manifest themselves, self-assembling into simplified structures and patterns and dependencies that are otherwise buried unseen in complicated data.

Fig. 1 A representation of a deep network with three fully-connected hidden layers. Deep networks are typically three or more layers deep, but each layer can have thousands of neurons. (Figure from the TowardsDataScience blog.)

Deep learning emerged as recently as 2006 and has opened wide new avenues of artificial intelligence that move beyond human capabilities for some tasks.  But deep learning also has pitfalls, some of which are inherited from the legacy approaches of traditional machine learning, and some of which are inherent in the massively high-dimensional spaces in which deep learning takes place.  Nonetheless, deep learning has revolutionized many aspects of science, and there is reason for optimism that the revolution will continue. Fifty years from now, looking back, we may recognize this as the fifth derivative of the industrial revolution (Phase I: Steam. Phase II: Electricity. Phase III: Automation. Phase IV: Information. Phase V: Intelligence).

From Multivariate Analysis to Deep Learning

Conventional machine learning, as we know it today, has had many names.  It began with Multivariate Analysis of mathematical population dynamics around the turn of the last century, pioneered by Francis Galton (1874), Karl Pearson (1901), Charles Spearman (1904) and Ronald Fisher (1922) among others.

The first on-line computers during World War II were developed to quickly calculate the trajectories of enemy aircraft for gunnery control, introducing the idea of feedback control of machines. This was named Cybernetics by Norbert Wiener, who had participated in the development of automated control of antiaircraft guns.

Table I. Evolution of Names for Machine Learning

A decade later, during the Cold War, it became necessary to find hidden objects in large numbers of photographs.  The embryonic electronic digital computers of the day were far too slow with far too little memory to do the task, so the Navy contracted with the Cornell Aeronautical Laboratory in Cheektowaga, New York, a suburb of Buffalo, to create an analog computer capable of real-time image analysis.  This led to the invention of the Perceptron by Frank Rosenblatt as the first neural network-inspired computer [1], building on ideas of neural logic developed by Warren McColloch and Walter Pitts. 

Fig. 2 Frank Rosenblatt working on the Perceptron. (From the Cornell Chronicle)
Fig. 3 Rosenblatt’s conceptual design of the connectionism of the perceptron (1958).

Several decades passed with fits and starts as neural networks remained too simple to accomplish anything profound.  Then in 1986, David Rumelhart and Ronald Williams at UC San Diego with Geoff Hinton at Carnegie-Mellon discovered a way to train multiple layers of neurons, in a process called error back propagation [2].  This publication opened the floodgates of Connectionism — also known as Parallel Distributed Processing.  The late 80’s and much of the 90’s saw an expansion of interest in neural networks, until the increasing size of the networks ran into limits caused by the processing speed and capacity of conventional computers towards the end of the decade.  During this time it had become clear that the most interesting computations required many layers of many neurons, and the number of neurons expanded into the thousands, but it was difficult to train such systems that had tens of thousands of adjustable parameters, and research in neural networks once again went into a lull.

The beginnings of deep learning started with two breakthroughs.  The first was by Yann Lecun at Bell Labs in 1998 who developed, with Leon Bottou, Yoshua Bengio and Patrick Haffner, a Convolutional Neural Network that had seven layers of neurons that classified hand-written digits [3]. The second was from Geoff Hinton in 2006, by then at the University of Toronto, who discovered a fast learning algorithm for training deep layers of neurons [4].  By the mid 2010’s, research on neural networks was hotter than ever, propelled in part by several very public successes, such as Deep Mind’s machine that beat the best player in the world at Go in 2017, self-driving cars, personal assistants like Siri and Alexa, and YouTube recommendations.

The Challenges of Deep Learning

Deep learning today is characterized by neural network architectures composed of many layers of many neurons.  The nature of deep learning brings with it two main challenges:  1) efficient training of the neural weights, and 2) generalization of trained networks to perform accurately on previously unseen data inputs.

Solutions to the first challenge, efficient training, are what allowed the deep revolution in the first place—the result of a combination of increasing computer power with improvements in numerical optimization. This included faster personal computers that allowed nonspecialists to work with deep network programming environments like Matlab’s Deep Learning toolbox and Python’s TensorFlow.

Solutions to the second challenge, generalization, rely heavily on a process known as “regularization”. The term “regularization” has a slippery definition, an obscure history, and an awkward sound to it. Regularization is the noun form of the verb “to regularize” or “to make regular”. Originally, regularization was used to keep certain inverse algorithms from blowing up, like inverse convolutions, also known as deconvolution. Direct convolution is a simple mathematical procedure that “blurs” ideal data based on the natural response of a measuring system. However, if one has experimental data, one might want to deconvolve the system response from the data to recover the ideal data. But this procedure is numerically unstable and can “blow up”, often because of the divide-by-zero problem. Regularization was a simple technique that kept denominators from going to zero.

Regularization became a common method for inverse problems that are notorious to solve because of the many-to-one mapping that can occur in measurement systems. There can be a number of possible causes that produce a single response. Regularization was a way of winnowing out “unphysical” solutions so that the physical inverse solution remained.

During the same time, regularization became a common tool used by quantum field theorists to prevent certain calculated quantities from diverging to infinity. The solution was again to keep denominators from going to zero by setting physical cut-off lengths on the calculations. These cut-offs were initially ad hoc, but the development of renormalization group theory by Kenneth Wilson at Cornell (Nobel Prize in 1982) provided a systematic approach to solving the infinities of quantum field theory.

With the advent of neural networks, having hundreds to thousands to millions of adjustable parameters, regularization became the catch-all term for fighting the problem of over-fitting. Over-fitting occurs when there are so many adjustable parameters that any training data can be fit, and the neural network becomes just a look-up table. Look-up tables are the ultimate hash code, but they have no predictive capability. If a slightly different set of data are fed into the network, the output can be anything. In over-fitting, there is no generalization, the network simply learns the idiosyncrasies of the training data without “learning” the deeper trends or patterns that would allow it to generalize to handle different inputs.

Over the past decades a wide collection of techniques have been developed to reduce over-fitting of neural networks. These techniques include early stopping, k-fold holdout, drop-out, L1 and L2 weight-constraint regularization, as well as physics-based constraints. The goal of all of these techniques is to keep neural nets from creating look-up tables and instead learning the deep codependencies that exist within complicated data.

Table II. Regularization Techniques in Machine Learning

By judicious application of these techniques, combined with appropriate choices of network design, amazingly complex problems can be solved by deep networks and they can generalized too (to some degree). As the field moves forward, we may expect additional regularization tricks to improve generalization, and design principles will emerge so that the networks no longer need to be constructed by trial and error.

The Potential of Deep Learning

In conventional machine learning, one of the most critical first steps performed on a dataset has been feature extraction. This step is complicated and difficult, especially when the signal is buried either in noise or in confounding factors (also known as distractors). The analysis is often highly sensitive to the choice of features, and the selected features may not even be the right ones, leading to bad generalization. In deep learning, feature extraction disappears into the net itself. Optimizing the neural weights subject to appropriate constraints forces the network to find where the relevant information lies and what to do with it.

The key to finding the right information was not just having many neurons, but having many layers, which is where the potential of deep learning emerges. It is as if each successive layer is learning a more abstract or more generalized form of the information than the last. This hierarchical layering is most evident in the construction of convolutional deep networks, where the layers are receiving a telescoping succession of information fields from lower levels. Geoff Hinton‘s Deep Belief Network, which launched the deep revolution in 2006, worked explicitly with this hierarchy in mind through direct design of the network architecture. Since then, network architecture has become more generalized, with less up-front design while relying on the increasingly sophisticated optimization techniques of training to set the neural weights. For instance, a simplified instance of a deep network is shown in Fig. 4 with three hidden layers of neurons.

Fig. 4 General structure of a deep network with three hidden layers. Layers will typically have hundreds or thousands of neurons. Each gray line represents a weight value, and each circle is a neural activation function.

The mathematical structure of a deep network is surprisingly simple. The equations for the network in Fig. 4, that convert an input xa to an output ye, are

These equations use index notation to denote vectors (single superscript) and matrices (double indexes). The repeated index (one up and one down) denotes an implicit “Einstein” summation. The function φ(.) is known as the activation function, which is nonlinear. One of the simplest activation functions to use and analyze, and the current favorite, is known as the ReLU (for rectifying linear unit). Note that these equations represent a simple neural cascade, as the output of one layer becomes the input for the next.

The training of all the matrix elements assumes a surprisingly simply optimization function, known as an objective function or a loss function, that can look like

where the first term is the mean squared error of the network output ye relative to the desired output y0 for the training set, and the second term, known as a regularization term (see section above) is a quadratic form that keeps the weights from blowing up. This loss function is minimized over the set of adjustable matrix weights.

The network in Fig. 4 is just a toy, with only 5 inputs and 5 outputs and only 23 neurons. But it has 30+36+36+30+23 = 155 adjustable weights. If this seems like overkill, but it is nothing compared to neural networks with thousands of neurons per layer and tens of layers. That massive overkill is exactly the power of deep learning — as well as its pitfall.

The Pitfalls of Deep Learning

Despite the impressive advances in deep learning, serious pitfalls remain for practitioners. One of the most challenging problems in deep learning is the saddle-point problem. A saddle-point in an objective function is like a mountain pass through the mountains: at the top of the pass it slopes downward in two opposite directions into the valleys but slopes upward in the two orthogonal directions to mountain peaks. A saddle point is an unstable equilibrium where a slight push this way or that can lead the traveller to two very different valleys separated by high mountain ridges. In our familiar three-dimensional space, saddle points are relatively rare and landscapes are dominated by valleys and mountaintops. But this intuition about landscapes fails in high dimensions.

Landscapes in high dimensions are dominated by neutral ridges that span the domain of the landscape. This key understanding about high-dimensional space actually came from the theory of evolutionary dynamics for the evolution of species. In the early days of evolutionary dynamics, there was a serious challenge to understand how genetic mutations could allow such diverse speciation to occur. If the fitness of a species were viewed as a fitness landscape, and if a highly-adapted species were viewed as a mountain peak in this landscape, then genetic mutations would need to drive the population state point into “valleys of death” that would need to be crossed to arrive at a neighboring fitness peak. It would seem that genetic mutations would likely kill off the species in the valleys before they could rise to the next equilibrium.

However, the geometry of high dimensions does not follow this simple low-dimensional intuition. As more dimensions come available, landscapes have more and more ridges of relatively constant height that span the full space (See my recent blog on random walks in 10-dimensions and my short YouTube video). For a species to move from one fitness peak to another fitness peak in a fitness landscape (in ultra-high-dimensional mutation space), all that is needed is for a genetic mutation to step the species off of the fitness peak onto a neutral ridge where many mutations can keep the species on that ridge as it moves ever farther away from the last fitness peak. Eventually, the neutral ridge brings the species near a new fitness peak where it can climb to the top, creating a new stable species. The point is that most genetic mutations are neutral — they do not impact the survivability of an individual. This is known as the neutral network theory of evolution proposed by Motoo Kimura (1924 – 1994) [5]. As these mutation accumulate, the offspring can get genetically far from the progenitor. And when a new fitness peak comes near, many of the previously neutral mutations can come together and become a positive contribution to fitness as the species climbs the new fitness peak.

The neutral network of genetic mutation was a paradigm shift in the field of evolutionary dynamics, and it also taught everyone how different random walks in high-dimensional spaces are from random walks in 3D. But although neutral networks solved the evolution problem, they become a two-edged sword in machine learning. On the positive side, fitness peaks are just like the minima of objective functions, and the ability for partial solutions to perform random walks along neutral ridges in the objective-function space allows optimal solutions to be found across a broad range of the configuration space of the neural weights. However, on the negative side, ridges are loci of unstable equilibrium. Hence there are always multiple directions that a solution state can go to minimize the objective function. Each successive run of a deep-network neural weight optimizer can find equivalent optimal solutions — but they each can be radically different. There is no hope of averaging the weights of an ensemble of networks to arrive at an “average” deep network. The averaging would simply drive all weights to zero. Certainly, the predictions of an ensemble of equivalently trained networks can be averaged—but this does not illuminate what is happening “under the hood” of the machine, which is where our own “understanding” of what the network is doing would come from.

Post-Modern Machine Learning

Post-modernism is admittedly kind of a joke — it works so hard to pull down every aspect of human endeavor that it falls victim to its own methods. The convoluted arguments made by its proponents sound like ultimate slacker talk — circuitous logic circling itself in an endless loop of denial.

But post-modernism does have its merits. It surfs on the moving crest of what passes as modernism, as modernism passes onward to its next phase. The philosophy of post-modernism moves beyond rationality in favor of a subjectivism in which cause and effect are blurred.  For instance, in post-modern semiotic theory, a text or a picture is no longer an objective element of reality, but fragments into multiple semiotic versions, each one different for each different reader or watcher — a spectrum of collaborative efforts between each consumer and the artist. The reader brings with them a unique set of life experiences that interact with the text to create an entirely new experience in each reader’s mind.

Deep learning is post-modern in the sense that deterministic algorithms have disappeared. Instead of a traceable path of sequential operations, neural nets scramble information into massively-parallel strings of partial information that cross and interact nonlinearly with other massively-parallel strings. It is difficult to impossible to trace any definable part of the information from input to output. The output simply manifests some aspect of the data that was hidden from human view.

But the Age of Artificial Intelligence is not here yet. The vast multiplicity of saddle ridges in high dimensions is one of the drivers for one of the biggest pitfalls of deep learning — the need for massive amounts of training data. Because there are so many adjustable parameters in a neural network, and hence so many dimensions, a tremendous amount of training data is required to train a network to convergence. This aspect of deep learning stands in strong contrast to human children who can be shown a single picture of a bicycle next to a tricycle, and then they can achieve almost perfect classification accuracy when shown any number of photographs of different bicycles and tricycles. Humans can generalize with an amazingly small amount of data, while deep networks often need thousands of examples. This example alone points to the marked difference between human intelligence and the current state of deep learning. There is still a long road ahead.


[1] F. Rosenblatt, “THE PERCEPTRON – A PROBABILISTIC MODEL FOR INFORMATION-STORAGE AND ORGANIZATION IN THE BRAIN,” Psychological Review, vol. 65, no. 6, pp. 386-408, (1958)

[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “LEARNING REPRESENTATIONS BY BACK-PROPAGATING ERRORS,” Nature, vol. 323, no. 6088, pp. 533-536, Oct (1986)

[3] LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). “Gradient-based learning applied to document recognition”. Proceedings of the IEEE. 86 (11): 2278–2324.

[4] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, Jul (2006)

[5] M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1968.