Posts by David D. Nolte

E. M. Purcell Distinguished Professor of Physics and Astronomy at Purdue University

Is There a Quantum Trajectory?

Heisenberg’s uncertainty principle is a law of physics – it cannot be violated under any circumstances, no matter how much we may want it to yield or how hard we try to bend it.  Heisenberg, as he developed his ideas after his lone epiphany like a monk on the isolated island of Helgoland off the north coast of Germany in 1925, became a bit of a zealot, like a religious convert, convinced that all we can say about reality is a measurement outcome.  In his view, there was no independent existence of an electron other than what emerged from a measuring apparatus.  Reality, to Heisenberg, was just a list of numbers in a spread sheet—matrix elements.  He took this line of reasoning so far that he stated without exception that there could be no such thing as a trajectory in a quantum system.  When the great battle commenced between Heisenberg’s matrix mechanics against Schrödinger’s wave mechanics, Heisenberg was relentless, denying any reality to Schrödinger’s wavefunction other than as a calculation tool.  He was so strident that even Bohr, who was on Heisenberg’s side in the argument, advised Heisenberg to relent [1].  Eventually a compromise was struck, as Heisenberg’s uncertainty principle allowed Schrödinger’s wave functions to exist within limits—his uncertainty limits.

Disaster in the Poconos

Yet the idea of an actual trajectory of a quantum particle remained a type of heresy within the close quantum circles.  Years later in 1948, when a young Richard Feynman took the stage at a conference in the Poconos, he almost sabotaged his career in front of Bohr and Dirac—two of the giants who had invented quantum mechanics—by having the audacity to talk about particle trajectories in spacetime diagrams.

Feynman was making his first presentation of a new approach to quantum mechanics that he had developed based on path integrals. The challenge was that his method relied on space-time graphs in which “unphysical” things were allowed to occur.  In fact, unphysical things were required to occur, as part of the sum over many histories of his path integrals.  For instance, a key element in the approach was allowing electrons to travel backwards in time as positrons, or a process in which the electron and positron annihilate into a single photon, and then the photon decays back into an electron-positron pair—a process that is not allowed by mass and energy conservation.  But this is a possible history that must be added to Feynman’s sum.

It all looked like nonsense to the audience, and the talk quickly derailed.  Dirac pestered him with questions that he tried to deflect, but Dirac persisted like a raven.  A question was raised about the Pauli exclusion principle, about whether an orbital could have three electrons instead of the required two, and Feynman said that it could—all histories were possible and had to be summed over—an answer that dismayed the audience.  Finally, as Feynman was drawing another of his space-time graphs showing electrons as lines, Bohr rose to his feet and asked derisively whether Feynman had forgotten Heisenberg’s uncertainty principle that made it impossible to even talk about an electron trajectory.

It was hopeless.  The audience gave up and so did Feynman as the talk just fizzled out.  It was a disaster.  What had been meant to be Feynman’s crowning achievement and his entry to the highest levels of theoretical physics, had been a terrible embarrassment.  He slunk home to Cornell where he sank into one of his depressions.  At the close of the Pocono conference, Oppenheimer, the reigning king of physics, former head of the successful Manhattan Project and newly selected to head the prestigious Institute for Advanced Study at Princeton, had been thoroughly disappointed by Feynman.

But what Bohr and Dirac and Oppenheimer had failed to understand was that as long as the duration of unphysical processes was shorter than the energy differences involved, then it was literally obeying Heisenberg’s uncertainty principle.  Furthermore, Feynman’s trajectories—what became his famous “Feynman Diagrams”—were meant to be merely cartoons—a shorthand way to keep track of lots of different contributions to a scattering process.  The quantum processes certainly took place in space and time, conceptually like a trajectory, but only so far as time durations, and energy differences and locations and momentum changes were all within the bounds of the uncertainty principle.  Feynman had invented a bold new tool for quantum field theory, able to supply deep results quickly.  But no one at the Poconos could see it.

Fig. 1 The first Feynman diagram.

Coherent States

When Feynman had failed so miserably at the Pocono conference, he had taken the stage after Julian Schwinger, who had dazzled everyone with his perfectly scripted presentation of quantum field theory—the competing theory to Feynman’s.  Schwinger emerged the clear winner of the contest.  At that time, Roy Glauber (1925 – 2018) was a young physicist just taking his PhD from Schwinger at Harvard, and he later received a post-doc position at Princeton’s Institute for Advanced Study where he became part of a miniature revolution in quantum field theory that revolved around—not Schwinger’s difficult mathematics—but Feynman’s diagrammatic method.  So Feynman won in the end.  Glauber then went on to Caltech, where he filled in for Feynman’s lectures when Feynman was off in Brazil playing the bongos.  Glauber eventually returned to Harvard where he was already thinking about the quantum aspects of photons in 1956 when news of the photon correlations in the Hanbury-Brown Twiss (HBT) experiment were published.  Three years later, when the laser was invented, he began developing a theory of photon correlations in laser light that he suspected would be fundamentally different than in natural chaotic light. 

Because of his background in quantum field theory, and especially quantum electrodynamics, it was fairly easy to couch the quantum optical properties of coherent light in terms of Dirac’s creation and annihilation operators of the electromagnetic field. Glauber developed a “coherent state” operator that was a minimum uncertainty state of the quantized electromagnetic field, related to the minimum-uncertainty wave functions derived initially by Schrödinger in the late 1920’s. The coherent state represents a laser operating well above the lasing threshold and behaved as “the most classical” wavepacket that can be constructed.  Glauber was awarded the Nobel Prize in Physics in 2005 for his work on such “Glauber states” in quantum optics.

Fig. 2 Roy Glauber

Quantum Trajectories

Glauber’s coherent states are built up from the natural modes of a harmonic oscillator.  Therefore, it should come as no surprise that these coherent-state wavefunctions in a harmonic potential behave just like classical particles with well-defined trajectories. The quadratic potential matches the quadratic argument of the the Gaussian wavepacket, and the pulses propagate within the potential without broadening, as in Fig. 3, showing a snapshot of two wavepackets propagating in a two-dimensional harmonic potential. This is a somewhat radical situation, because most wavepackets in most potentials (or even in free space) broaden as they propagate. The quadratic potential is a special case that is generally not representative of how quantum systems behave.

Fig. 3 Harmonic potential in 2D and two examples of pairs of pulses propagating without broadening. The wavepackets in the center are oscillating in line, and the wavepackets on the right are orbiting the center of the potential in opposite directions. (Movies of the quantum trajectories can be viewed at Physics Unbound.)

To illustrate this special status for the quadratic potential, the wavepackets can be launched in a potential with a quartic perturbation. The quartic potential is anharmonic—the frequency of oscillation depends on the amplitude of oscillation unlike for the harmonic oscillator, where amplitude and frequency are independent. The quartic potential is integrable, like the harmonic oscillator, and there is no avenue for chaos in the classical analog. Nonetheless, wavepackets broaden as they propagate in the quartic potential, eventually spread out into a ring in the configuration space, as in Fig. 4.

Fig. 4 Potential with a quartic corrections. The initial gaussian pulses spread into a “ring” orbiting the center of the potential.

A potential with integrability has as many conserved quantities to the motion as there are degrees of freedom. Because the quartic potential is integrable, the quantum wavefunction may spread, but it remains highly regular, as in the “ring” that eventually forms over time. However, integrable potentials are the exception rather than the rule. Most potentials lead to nonintegrable motion that opens the door to chaos.

A classic (and classical) potential that exhibits chaos in a two-dimensional configuration space is the famous Henon-Heiles potential. This has a four-dimensional phase space which admits classical chaos. The potential has a three-fold symmetry which is one reason it is non-integral, since a particle must “decide” which way to go when it approaches a saddle point. In the quantum regime, wavepackets face the same decision, leading to a breakup of the wavepacket on top of a general broadening. This allows the wavefunction eventually to distribute across the entire configuration space, as in Fig. 5.

Fig. 5 The Henon-Heiles two-dimensional potential supports Hamiltonian chaos in the classical regime. In the quantum regime, the wavefunction spreads to eventually fill the accessible configuration space (for constant energy).

Youtube Video

Movies of quantum trajectories can be viewed at my Youtube Channel, Physics Unbound. The answer to the question “Is there a quantum trajectory?” can be seen visually as the movies run—they do exist in a very clear sense under special conditions, especially coherent states in a harmonic oscillator. And the concept of a quantum trajectory also carries over from a classical trajectory in cases when the classical motion is integrable, even in cases when the wavefunction spreads over time. However, for classical systems that display chaotic motion, wavefunctions that begin as coherent states break up into chaotic wavefunctions that fill the accessible configuration space for a given energy. The character of quantum evolution of coherent states—the most classical of quantum wavefunctions—in these cases reflects the underlying character of chaotic motion in the classical analogs. This process can be seen directly watching the movies as a wavepacket approaches a saddle point in the potential and is split. Successive splits of the multiple wavepackets as they interact with the saddle points is what eventually distributes the full wavefunction into its chaotic form.

Therefore, the idea of a “quantum trajectory”, so thoroughly dismissed by Heisenberg, remains a phenomenological guide that can help give insight into the behavior of quantum systems—both integrable and chaotic.

As a side note, the laws of quantum physics obey time-reversal symmetry just as the classical equations do. In the third movie of “A Quantum Ballet“, wavefunctions in a double-well potential are tracked in time as they start from coherent states that break up into chaotic wavefunctions. It is like watching entropy in action as an ordered state devolves into a disordered state. But at the half-way point of the movie, the imaginary part of the wavefunction has its sign flipped, and the dynamics continue. But now the wavefunctions move from disorder into an ordered state, seemingly going against the second law of thermodynamics. Flipping the sign of the imaginary part of the wavefunction at just one instant in time plays the role of a time-reversal operation, and there is no violation of the second law.

References

[1] See Chapter 8 , On the Quantum Footpath, in Galileo Unbound, D. D. Nolte (Oxford University Press, 2018)

[2] J. R. Nagel, A Review and Application of the Finite-Difference Time-Domain Algorithm Applied to the Schrödinger Equation, ACES Journal, Vol. 24, NO. 1, pp. 1-8 (2009)

Quantum Chaos and the Cheshire Cat

Alice’s disturbing adventures in Wonderland tumbled upon her like a string of accidents as she wandered a world of chaos.  Rules were never what they seemed and shifted whenever they wanted.  She even met a cat who grinned ear-to-ear and could disappear entirely, or almost entirely, leaving only its grin.

The vanishing Cheshire Cat reminds us of another famous cat—Arnold’s Cat—that introduced the ideas of stretching and folding of phase-space volumes in non-integrable Hamiltonian systems.  But when Arnold’s Cat becomes a Quantum Cat, a central question remains: What happens to the chaotic behavior of the classical system … does it survive the transition to quantum mechanics?  The answer is surprisingly like the grin of the Cheshire Cat—the cat vanishes, but the grin remains.  In the quantum world of the Cheshire Cat, the grin of the classical cat remains even after the rest of the cat vanished. 

The Cheshire Cat fades away, leaving only its grin, like a fine filament, as classical chaos fades into quantum, leaving behind a quantum scar.

The Quantum Mechanics of Classically Chaotic Systems

The simplest Hamiltonian systems are integrable—they have as many constants of the motion as degrees of freedom.  This holds for quantum systems as well as for classical.  There is also a strong correspondence between classical and quantum systems for the integrable cases—literally the Correspondence Principle—that states that quantum systems at high quantum number approach classical behavior.  Even at low quantum numbers, classical resonances are mirrored by quantum eigenfrequencies that can show highly regular spectra.

But integrable systems are rare—surprisingly rare.  Almost no real-world Hamiltonian system is integrable, because the real world warps the ideal.  No spring can displace indefinitely, and no potential is perfectly quadratic.  There are always real-world non-idealities that destroy one constant of the motion or another, opening the door to chaos.

When classical Hamiltonian systems become chaotic, they don’t do it suddenly.  Almost all transitions to chaos in Hamiltonian systems are gradual.  One of the best examples of this is the KAM theory that starts with invariant action integrals that generate invariant tori in phase space.  As nonintegrable perturbations increase, the tori break up slowly into island chains of stability as chaos infiltrates the separatrixes—first as thin filaments of chaos surrounding the islands—then growing in width to take up more and more of phase space.  Even when chaos is fully developed, small islands of stability can remain—the remnants of stable orbits of the unperturbed system.

When the classical becomes quantum, chaos softens.  Quantum wave functions don’t like to be confined—they spread and they tunnel.  The separatrix of classical chaos—that barrier between regions of phase space—cannot constrain the exponential tails of wave functions.  And the origin of chaos itself—the homoclinic point of the separatrix—gets washed out.  Then the regular orbits of the classical system reassert themselves, and they appear, like the vestige of the Cheshire Cat, as a grin.

The Quantum Circus

The empty stadium is a surprisingly rich dynamical system that has unexpected structure in both the classical and the quantum domain.  Its importance in classical dynamics comes from the fact that its periodic orbits are unstable and its non-periodic orbits are ergodic (filling all available space if given long enough).  The stadium itself is empty so that particles (classical or quantum) are free to propagate between reflections from the perfectly-reflecting walls of the stadium.  The ergodicity comes from the fact that the stadium—like a classic Roman chariot-race stadium, also known as a circus—is not a circle, but has a straight stretch between two half circles.  This simple modification takes the stable orbits of the circle into the unstable orbits of the stadium.

A single classical orbit in a stadium is shown in Fig 1. This is an ergodic orbit that is non-periodic and eventually would fill the entire stadium space. There are other orbits that are nearly periodic, such as one that bounces back and forth vertically between the linear portions, but even this orbit will eventually wander into the circular part of the stadium and then become ergodic. The big quantum-classical question is what happens to these classical orbits when the stadium is shrunk to the nanoscale?

Fig. 1 A classical trajectory in a stadium. It will eventually visit every point, a property known as ergodicity.

Simulating an evolving quantum wavefunction in free space is surprisingly simple. Given a beginning quantum wavefunction A(x,y,t0), the discrete update equation is

Perfect reflection from the boundaries of the stadium are incorporated through imposing a boundary condition that sends the wavefunction to zero. Simple!

A snap-shot of a wavefunction evolving in the stadium is shown in Fig. 2. To see a movie of the time evolution, see my YouTube episode.

Fig. 2 Snapshot of a quantum wavefunction in the stadium. (From YouTube)

The time average of the wavefunction after a long time has passed is shown in Fig. 3. Other than the horizontal nodal line down the center of the stadium, there is little discernible structure or symmetry. This is also true for the mean squared wavefunction shown in Fig. 4, although there is some structure that may be emerging in the semi-circular regions.

Fig. 3 Time-average wavefunction after a long time.
Fig. 4 Time-average of the squared wavefunction after a long time.

On the other hand, for special initial conditions that have a lot of symmetry, something remarkable happens. Fig. 5 shows several mean-squared results for special initial conditions. There is definite structure in these cases that were given the somewhat ugly name “quantum scars” in the 1980’s by Eric Heller who was one of the first to study this phenomenon [1].

Fig. 5 Quantum scars reflect periodic (but unstable) orbits of the classical system. Quantum effects tend to quench chaos and favor regular motion.

One can superpose highly-symmetric classical trajectories onto the figures, as shown in the bottom row. All of these classical orbits go through a high-symmetry point, such as the center of the stadium (on the left image) and through the focal point of the circular mirrors (in the other two images). The astonishing conclusion of this exercise is that the highly-symmetric periodic classical orbits remain behind as quantum scars—like the Cheshire Cat’s grin—when the system is in the quantum realm. The classical orbits that produce quantum scars have the important property of being periodic but unstable. A slight perturbation from the symmetric trajectory causes it to eventually become ergodic (chaotic). These scars are regions with enhanced probability density, what might be termed “quantum trajectories”, but do not show strong interference patterns.

It is important to make the distinction that it is also possible to construct special wavefunctions that are strictly periodic, such as a wave bouncing perfectly vertically between the straight portions. This leads to large-scale interference patterns that are not the same as the quantum scars.

Quantum Chaos versus Laser Speckle

In addition to the bouncing-wave cases that do not strictly produce quantum scars, there is another “neutral” phenomenon that produces interference patterns that look a lot like scars, but are simply the random addition of lots of plane waves with the same wavelength [2]. A snapshot in time of one of these superpositions is shown in Fig. 6. To see how the waves add together, see the YouTube channel episode.

Fig. 6 The sum of 100 randomly oriented plane waves of constant wavelength. (A snapshot from YouTube.)

References

[1] Heller E J, Bound-state eigenfunctions of classically chaotic hamiltonian-systems – scars of periodic-orbits, Physical Review Letters 53 ,1515 (1984)

[2] Gutzwiller M C, Chaos in classical and quantum mechanics (New York: New York : Springer-Verlag, 1990)

George Cantor meets Machine Learning: Deep Discrete Encoders

Machine learning is characterized, more than by any other aspect, by the high dimensionality of the data spaces it seeks to find patterns in.  Hence, one of the principle functions of machine learning is to reduce the dimensionality of the data to lower dimensions—a process known as dimensionality reduction.

There are two driving reasons to reduce the dimensionality of data: 

First, typical dimensionalities faced by machine learning problems can be in the hundreds or thousands.  Trying to visualize such high dimensions may sound mind expanding, but it is really just saying that a data problem may have hundreds or thousands of different data entries for a single event.  And many, or even most, of those entries may not be independent.  While many others may be pure noise—or at least not relevant to the pattern.  Deep learning dimensionality reduction seeks to find the dependences—many of them nonlinear and non-single-valued (non-invertible)—and to reject the noise channels.

Second, the geometry of high dimension is highly unintuitive.  Many of the things we take for granted in our pitifully low dimension of 3 (or 4 if you include time) just don’t hold in high dimensions.  For instance, in very high dimensions almost all random vectors in a hyperspace are orthogonal, and almost all random unit vectors in the hyperspace are equidistant.  Even the topology of landscapes in high dimensions is unintuitive—there are far more mountain ridges than mountain peaks—with profound consequences for dynamical processes such as random walks (see my Blog on a Random Walk in 10 Dimensions).  In fact, we owe our evolutionary existence to this effect!  Therefore, deep dimensionality reduction is a way to bring complex data down to a dimensionality where our intuition can be applied to “explain” the data.

But what is a dimension?  And can you find the “right” dimensionality when performing dimensionality reduction?  Once again, our intuition struggles with these questions, as first discovered by a nineteenth-century German mathematician whose mind-expanding explorations of the essence of different types of infinity shattered the very concept of dimension.

George Cantor

Georg Cantor (1845 – 1918) was born in Russia, and the family moved to Germany while Cantor was still young.  In 1863, he enrolled at the University of Berlin where he sat on lectures by Weierstrass and Kronecker.  He received his doctorate in 1867 and his Habilitation in 1869, moving into a faculty position at the University of Halle and remaining there for the rest of his career.  Cantor published a paper early in 1872 on the question of whether the representation of an arbitrary function by a Fourier series is unique.  He had found that even though the series might converge to a function almost everywhere, there surprisingly could still be an infinite number of points where the convergence failed.  Originally, Cantor was interested in the behavior of functions at these points, but his interest soon shifted to the properties of the points themselves, which became his life’s work as he developed set theory and transfinite mathematics.

Georg Cantor (1845 – 1918)

In 1878, in a letter to his friend Richard Dedekind, Cantor showed that there was a one-to-one correspondence between the real numbers and the points in any n-dimensional space.  He was so surprised by his own result that he wrote to Dedekind “I see it, but I don’t believe it.”  Previously, the ideas of a dimension, moving in a succession from one (a line) to two (a plane) to three (a volume) had been absolute.  However, with his newly-discovered mapping, the solid concepts of dimension and dimensionality began to dissolve.  This was just the beginning of a long history of altered concepts of dimension (see my Blog on the History of Fractals).

Mapping Two Dimensions to One

Cantor devised a simple mapping that is at once obvious and subtle. To take a specific example of mapping a plane to a line, one can consider the two coordinate values (x,y) both with a closed domain on [0,1]. Each can be expressed as a decimal fraction given by

These two values can be mapped to a single value through a simple “ping-pong” between the decimal digits as

If x and y are irrational, then this presents a simple mapping of a pair of numbers (two-dimensional coordinates) to a single number. In this way, the special significance of two dimensions relative to one dimension is lost. In terms of set theory nomenclature, the cardinality of the two-dimensional R2 is the same as for the one-dimensions R.

Nonetheless, intuition about dimension still has it merits, and a subtle aspect of this mapping is that it contains discontinuities. These discontinuities are countably infinite, but they do disrupt any smooth transformation from 2D to 1D, which is where the concepts of intrinsic dimension are hidden. The topological dimension of the plane is clearly equal to 2, and that of the line is clearly equal to 1, determined by the dimensionality D+1 of cuts that are required to separate the sets. The area is separated by a D = 1 line, while the line is separated by a D= 0 point.

Fig. 1 A mapping of a 2D plane to a 1D line. Every point in the plane has an associated point on the line (except for a countably infinite number of special points … see the next figure.)

While the discontinuities help preserve the notions of dimension, and they are countably infinite (with the cardinality of the rational numbers), they pertain merely to the representation of number by decimals. As an example, in decimal notation for a number a = 14159/105, one has two equivalent representations

trailing either infinitely many 0’s or infinitely many 9’s. Despite the fact that these representations point to the same number, when it is used as one of the pairs in the bijection of Fig. 1, it produces two distinctly different numbers of t in R. Fortunately, there is a way to literally sweep this under the rug. Any time one retrieves a number that has repeating 0’s or 9’s, simply sweep it to the right it by dividing by a power of 10 to a region of trailing zeros for a different number, call it b, as in Fig. 2. The shifted version of a will not overlap with the number b, because b also ends in repeating 0’s or 9’s and so is swept farther to the right, and so on to infinity, so none of these numbers overlap, each is distinct, and the mapping becomes what is known as a bijection with one-to-one correspondence.

Fig. 3 The “measure-zero” fix. Numbers that have two equivalent representations can be shifted to the right to replace other numbers that are shifted further to the right, and so on to infinity. There is infinite room within the reals to accommodate the countably infinite number of repeating-decimal numbers.

Space-Filling Curves

When the peripatetic mathematician Guiseppe Peano learned of Cantor’s result for the mapping of 2D to 1D, he sought to demonstrate the correspondence geometrically, and he constructed a continuous curve that filled space, publishing the method in Sur une courbe, qui remplit toute une aire plane [1] in 1890.  The construction of Peano’s curve proceeds by taking a square and dividing it into 9 equal sub squares.  Lines connect the centers of each of the sub squares.  Then each sub square is divided again into 9 sub squares whose centers are all connected by lines.  At this stage, the original pattern, repeated 9 times, is connected together by 8 links, forming a single curve.  This process is repeated infinitely many times, resulting in a curve that passes through every point of the original plane square.  In this way, a line is made to fill a plane.  Where Cantor had proven abstractly that the cardinality of the real numbers was the same as the points in n-dimensional space, Peano created a specific example. 

This was followed quickly by another construction, invented by David Hilbert in 1891, that divided the square into four instead of nine, simplifying the construction, but also showing that such constructions were easily generated [2].  The space-filling curves of Peano and Hilbert have the extreme properties that a one-dimensional curve approaches every point in a two-dimensional space.  These curves have topological dimensionality of 1D and a fractal dimensionality of 2D. 

Fig. 3 Peano’s (1890) and Hilbert’s (1891) plane-filling curves.  When the iterations are taken to infinity, the curves approach every point of two-dimensional space arbitrarily closely, giving them a dimension D = 2.

A One-Dimensional Deep Discrete Encoder for MNIST

When performing dimensionality reduction in deep learning it is tempting to think that there is an underlying geometry to the data—as if they reside on some submanifold of the original high-dimensional space.  While this can be the case (for which dimensionality reduction is called deep metric learning) more often there is no such submanifold.  For instance, when there is a highly conditional nature to the different data channels, in which some measurements are conditional on the values of other measurements, then there is no simple contiguous space that supports the data.

It is also tempting to think that a deep learning problem has some intrinsic number of degrees of freedom which should be matched by the selected latent dimensionality for the dimensionality reduction.  For instance, if there are roughly five degrees of freedom buried within a complicated data set, then it is tempting to believe that the appropriate latent dimensionality also should be chosen to be five.  But this also is missing the mark. 

Take, for example, the famous MNIST data set of hand-written digits from 0 to 9.  Each digit example is contained in a 28-by28 two-dimensional array that is typically flattened to a 784-element linear vector that locates that single digit example within a 784-dimensional space.  The goal is to reduce the dimensionality down to a manageable number—but what should the resulting latent dimensionality be?  How many degrees of freedom are involved in writing digits?  Furthermore, given that there are ten classes of digits, should the chosen dimensionality of the latent space be related to the number of classes?

Fig. 4 A sampling of MNIST digits.

To illustrate that the dimensionality of the latent space has little to do with degrees of freedom or the number of classes, let’s build a simple discrete encoder that encodes the MNIST data set to the one-dimensional line—following Cantor.

The deep network of the encoder can have a simple structure that terminates with a single neuron that has a piece-wise linear output.  The objective function (or loss function) measures the squared distances of the outputs of the single neuron, after transformation by the network, to the associated values 0 to 9.  And that is it!  Train the network by minimizing the loss function.

Fig. 5 A deep encoder that encodes discrete classes to a one-dimensional latent variable.

The results of the linear encoder are shown in Fig. 6 (the transverse directions are only for ease of visualization…the classification is along the long axis). The different dots are specific digits, and the colors are the digit value. There is a clear trend from 1 through 10, although with this linear encoder there is substantial overlap among the point clouds.

Fig. 6 Latent space for a one-dimensional (line) encoder. The transverse dimensions are only for visualization.

Despite appearances, this one-dimensional discrete line encoder is NOT a form of regression. There is no such thing as 1.5 as the average of a 1-digit and a 2-digit. And 5 is not the average of a 4-digit and a 6-digit. The digits are images that have no intrinsic numerical value. Therefore, this one-dimensional encoder is highly discontinuous, and intuitive ideas about intrinsic dimension for continuous data remain secure.

Summary

The purpose of this Blog (apart from introducing Cantor and his ideas on dimension theory) was to highlight a key question of dimensionality reduction in representation theory: What is the intrinsic dimensionality of a dataset? The answer, in the case of discrete classes, is that there is no intrinsic dimensionality. Having 1 degree of freedom or 10 degrees of freedom, i.e. latent dimensionalities of 1 or 10, is mostly irrelevant. In ideal cases, one is just as good as the other.

On the other hand, for real-world data with its inevitable variability and finite variance, there can be reasons to choose one latent dimensionality over another. In fact, the “best” choice of dimensionality is one less than the number of classes. For instance, in the case of MNIST with its 10 classes, that is a 9-dimensional latent space. The reason this is “best” has to do with the geometry of high-dimensional simplexes, which will be the topic of a future Blog.


[1] G.Peano: Sur une courbe, qui remplit toute une aire plane. Mathematische Annalen 36 (1890), 157–160.

[2] D. Hilbert, “Uber die stetige Abbildung einer Linie auf ein Fllichenstilck,” Mathemutische Anna/en, vol. 38, pp. 459-460, (1891)

The Anharmonic Harmonic Oscillator

Harmonic oscillators are one of the fundamental elements of physical theory.  They arise so often in so many different contexts that they can be viewed as a central paradigm that spans all aspects of physics.  Some famous physicists have been quoted to say that the entire universe is composed of simple harmonic oscillators (SHO).

Despite the physicist’s love affair with it, the SHO is pathological! First, it has infinite frequency degeneracy which makes it prone to the slightest perturbation that can tip it into chaos, in contrast to non-harmonic cyclic dynamics that actually protects us from the chaos of the cosmos (see my Blog on Chaos in the Solar System). Second, the SHO is nowhere to be found in the classical world.  Linear oscillators are purely harmonic, with a frequency that is independent of amplitude—but no such thing exists!  All oscillators must be limited, or they could take on infinite amplitude and infinite speed, which is nonsense.  Even the simplest of simple harmonic oscillators would be limited by nothing other than the speed of light.  Relativistic effects would modify the linearity, especially through time dilation effects, rendering the harmonic oscillator anharmonic.

Despite the physicist’s love affair with it, the SHO is pathological!

Therefore, for students of physics as well as practitioners, it is important to break the shackles imposed by the SHO and embrace the anharmonic harmonic oscillator as the foundation of physics. Here is a brief survey of several famous anharmonic oscillators in the history of physics, followed by the mathematical analysis of the relativistic anharmonic linear-spring oscillator.

Anharmonic Oscillators

Anharmonic oscillators have a long venerable history with many varieties.  Many of these have become central models in systems as varied as neural networks, synchronization, grandfather clocks, mechanical vibrations, business cycles, ecosystem populations and more.

Christiaan Huygens

Already by the mid 1600’s Christiaan Huygens (1629 – 1695) knew that the pendulum becomes slower when it has larger amplitudes.  The pendulum was one of the best candidates for constructing an accurate clock needed for astronomical observations and for the determination of longitude at sea.  Galileo (1564 – 1642) had devised the plans for a rudimentary pendulum clock that his son attempted to construct, but the first practical pendulum clock was invented and patented by Huygens in 1657.  However, Huygens’ modified verge escapement required his pendulum to swing with large amplitudes, which brought it into the regime of anharmonicity. The equations of the simple pendulum are truly simple, but the presence of the sinθ makes it the simplest anharmonic oscillator.

Therefore, Huygens searched for the mathematical form of a tautochrone curve for the pendulum (a curve that is traversed with equal times independently of amplitude) and in the process he invented the involutes and evolutes of a curve—precursors of the calculus.  The answer to the tautochrone question is a cycloid (see my Blog on Huygen’s Tautochrone Curve).

Hermann von Helmholtz

Hermann von Helmholtz (1821 – 1894) was possibly the greatest German physicist of his generation—an Einstein before Einstein—although he began as a medical doctor.  His study of muscle metabolism, drawing on the early thermodynamic work of Carnot, Clapeyron and Joule, led him to explore and to express the conservation of energy in its clearest form.  Because he postulated that all forms of physical processes—electricity, magnetism, heat, light and mechanics—contributed to the interconversion of energy, he sought to explore them all, bringing his research into the mainstream of physics.  His laboratory in Berlin became world famous, attracting to his laboratory the early American physicists Henry Rowland (founder and first president of the American Physical Society) and Albert Michelson (first American Nobel prize winner).

Even the simplest of simple harmonic oscillators would be limited by nothing other than the speed of light.  

Helmholtz also pursued a deep interest in the physics of sensory perception such as sound.  This research led to his invention of the Helmholtz oscillator which is a highly anharmonic relaxation oscillator in which a tuning fork was placed near an electromagnet that was powered by a mercury switch attached to the fork. As the tuning fork vibrated, the mercury came in and out of contact with it, turning on and off the magnet, which fed back on the tuning fork, and so on, enabling the device, once started, to continue oscillating without interruption. This device is called a tuning-fork resonator, and it became the first door-bell buzzers.  (These are not to be confused with Helmholtz resonances that are formed when blowing across the open neck of a beer bottle.)

Lord Rayleigh

Baron John Strutt, the Lord Rayleigh (1842 – 1919) like Helmholtz also was a generalist and had a strong interest in the physics of sound.  He was inspired by Helmholtz’ oscillator to consider general nonlinear anharmonic oscillators mathematically.  He was led to consider the effects of anharmonic terms added to the harmonic oscillator equation.  in a paper published in the Philosophical Magazine issue of 1883 with the title On Maintained Vibrations, he introduced an equation to describe the self-oscillation by adding an extra term to a simple harmonic oscillator. The extra term depended on the cube of the velocity, representing a balance between the gain of energy from a steady force and natural dissipation by friction.  Rayleigh suggested that this equation applied to a wide range of self-oscillating systems, such as violin strings, clarinet reeds, finger glasses, flutes, organ pipes, among others (see my Blog on Rayleigh’s Harp.)

Georg Duffing

The first systematic study of quadratic and cubic deviations from the harmonic potential was performed by the German engineer George Duffing (1861 – 1944) under the conditions of a harmonic drive. The Duffing equation incorporates inertia, damping, the linear spring and nonlinear deviations.

Fig. 1 The Duffing equation adds a nonlinear term to the spring force when alpha is positive, stiffening or weakening it for larger excursions when beta is positive or negative, respectively. And by making alpha negative and beta positive, it describes a damped driven double-well potential.

Duffing confirmed his theoretical predictions with careful experiments and established the lowest-order corrections to ideal masses on springs. His work was rediscovered in the 1960’s after Lorenz helped launch numerical chaos studies. Duffing’s driven potential becomes especially interesting when α is negative and β is positive, creating a double-well potential. The driven double-well is a classic chaotic system (see my blog on Duffing’s Oscillator).

Balthasar van der Pol

Autonomous oscillators are one of the building blocks of complex systems, providing the fundamental elements for biological oscillatorsneural networksbusiness cyclespopulation dynamics, viral epidemics, and even the rings of Saturn.  The most famous autonomous oscillator (after the pendulum clock) is named for a Dutch physicist, Balthasar van der Pol (1889 – 1959), who discovered the laws that govern how electrons oscillate in vacuum tubes, but the dynamical system that he developed has expanded to become the new paradigm of cyclic dynamical systems to replace the SHO (see my Blog on GrandFather Clocks.)

Fig. 2 The van der Pol equation is the standard simple harmonic oscillator with a gain term that saturates for large excursions leading to a limit cycle oscillator.

Turning from this general survey, let’s find out what happens when special relativity is added to the simplest SHO [1].

Relativistic Linear-Spring Oscillator

The theory of the relativistic one-dimensional linear-spring oscillator starts from a relativistic Lagrangian of a free particle (with no potential) yielding the generalized relativistic momentum

The Lagrangian that accomplishes this is [2]

where the invariant 4-velocity is

When the particle is in a potential, the Lagrangian becomes

The action integral that is minimized is

and the Lagrangian for integration of the action integral over proper time is

The relativistic modification in the potential energy term of the Lagrangian is not in the spring constant, but rather is purely a time dilation effect.  This is captured by the relativistic Lagrangian

where the dot is with respect to proper time τ. The classical potential energy term in the Lagrangian is multiplied by the relativistic factor γ, which is position dependent because of the non-constant speed of the oscillator mass.  The Euler-Lagrange equations are

where the subscripts in the variables are a = 0, 1 for the time and space dimensions, respectively.  The derivative of the time component of the 4-vector is

From the derivative of the Lagrangian with respect to speed, the following result is derived

where E is the constant total relativistic energy.  Therefore,

which provides an expression for the derivative of the coordinate time with respect to the proper time where

The position-dependent γ(x) factor is then

The Euler-Lagrange equation with a = 1 is

which gives

providing the flow equations for the (an)harmonic oscillator with respect to proper time

This flow represents a harmonic oscillator modified by the γ(x) factor, due to time dilation, multiplying the spring force term.  Therefore, at relativistic speeds, the oscillator is no longer harmonic even though the spring constant remains truly a constant.  The term in parentheses effectively softens the spring for larger displacement, and hence the frequency of oscillation becomes smaller. 

The state-space diagram of the anharmonic oscillator is shown in Fig. 3 with respect to proper time (the time read on a clock co-moving with the oscillator mass).  At low energy, the oscillator is harmonic with a natural period of the SHO.  As the maximum speed exceeds β = 0.8, the period becomes longer and the trajectory less sinusoidal.  The position and speed for β = 0.9999 is shown in Fig. 4.  The mass travels near the speed of light as it passes the origin, producing significant time dilation at that instant.  The average time dilation through a single cycle is about a factor of three, despite the large instantaneous γ = 70 when the mass passes the origin.

Fig. 3 State-space diagram in relativistic units relative to proper time of a relativistic (an)harmonic oscillator with a constant spring constant for several relative speeds β. The anharmonicity becomes pronounced above β = 0.8.
Fig. 4 Position and speed in relativistic units relative to proper time of a relativistic (an)harmonic oscillator with a constant spring constant for β = 0.9999.  The period of oscillation in this simulation is nearly three times longer than the natural frequency at small amplitudes.

[1] W. Moreau, R. Easther, and R. Neutze, “RELATIVISTIC (AN)HARMONIC OSCILLATOR,” American Journal of Physics, Article vol. 62, no. 6, pp. 531-535, Jun (1994)

[2] D. D. Nolte, Introduction to Modern Dynamics: Chaos, Networks, Space and Time, 2nd. ed. (Oxford University Press, 2019)

The Many Worlds of the Quantum Beam Splitter

In one interpretation of quantum physics, when you snap your fingers, the trajectory you are riding through reality fragments into a cascade of alternative universes—one for each possible quantum outcome among all the different quantum states composing the molecules of your fingers. 

This is the Many-Worlds Interpretation (MWI) of quantum physics first proposed rigorously by Hugh Everett in his doctoral thesis in 1957 under the supervision of John Wheeler at Princeton University.  Everett had been drawn to this interpretation when he found inconsistencies between quantum physics and gravitation—topics which were supposed to have been his actual thesis topic.  But his side-trip into quantum philosophy turned out to be a one-way trip.  The reception of his theory was so hostile, no less than from Copenhagen and Bohr himself, that Everett left physics and spent a career at the Pentagon.

Resurrecting MWI in the Name of Quantum Information

Fast forward by 20 years, after Wheeler had left Princeton for the University of Texas at Austin, and once again a young physicist was struggling to reconcile quantum physics with gravity.  Once again the many worlds interpretation of quantum physics seemed the only sane way out of the dilemma, and once again a side-trip became a life-long obsession.

David Deutsch, visiting Wheeler in the early 1980’s, became convinced that the many worlds interpretation of quantum physics held the key to paradoxes in the theory of quantum information.  He was so convinced, that he began a quest to find a physical system that operated on more information than could be present in one universe at a time.  If such a physical system existed, it would be because streams of information from more than one universe were coming together and combining in a way that allowed one of the universes to “borrow” the information from the other.

It took only a year or two before Deutsch found what he was looking for—a simple quantum algorithm that yielded twice as much information as would be possible if there were no parallel universes.  This is the now-famous Deutsch algorithm—the first quantum algorithm [1].  At the heart of the Deutsch algorithm is a simple quantum interference.  The algorithm did nothing useful—but it convinced Deutsch that two universes were interfering coherently in the measurement process, giving that extra bit of information that should not have been there otherwise.  A few years later, the Deutsch-Josza algorithm [2] expanded the argument to interfere an exponentially larger amount of information streams from an exponentially larger number of universes to create a result that was exponentially larger than any classical computer could produce.  This marked the beginning of the quest for the quantum computer that is running red-hot today.

Deutsch’s “proof” of the many-worlds interpretation of quantum mechanics is not a mathematical proof but is rather a philosophical proof.  It holds no sway over how physicists do the math to make their predictions.  The Copenhagen interpretation, with its “spooky” instantaneous wavefunction collapse, works just fine predicting the outcome of quantum algorithms and the exponential quantum advantage of quantum computing.  Therefore, the story of David Deutsch and the MWI may seem like a chimera—except for one fact—it inspired him to generate the first quantum algorithm that launched what may be the next revolution in the information revolution of modern society.  Inspiration is important in science, because it lets scientists create things that had been impossible before. 

But if quantum interference is the heart of quantum computing, then there is one physical system that has the ultimate simplicity that may yet inspire future generations of physicists to invent future impossible things—the quantum beam splitter.  Nothing in the study of quantum interference can be simpler than a sliver of dielectric material sending single photons one way or another.  Yet the outcome of this simple system challenges the mind and reminds us of why Everett and Deutsch embraced the MWI in the first place.

The Classical Beam Splitter

The so-called “beam splitter” is actually a misnomer.  Its name implies that it takes a light beam and splits it into two, as if there is only one input.  But every “beam splitter” has two inputs, which is clear by looking at the classical 50/50 beam splitter shown in Fig. 1.  The actual action of the optical element is the combination of beams into superpositions in each of the outputs. It is only when one of the input fields is zero, a special case, that the optical element acts as a beam splitter.  In general, it is a beam combiner.

Given two input fields, the output fields are superpositions of the inputs

The square-root of two factor ensures that energy is conserved, because optical fluence is the square of the fields.  This relation is expressed more succinctly as a matrix input-output relation

The phase factors in these equations ensure that the matrix is unitary

reflecting energy conservation.

The Quantum Beam Splitter

A quantum beam splitter is just a classical beam splitter operating at the level of individual photons.  Rather than describing single photons entering or leaving the beam splitter, it is more practical to describe the properties of the fields through single-photon quantum operators

where the unitary matrix is the same as the classical case, but with fields replaced by the famous “a” operators.  The photon operators operate on single photon modes.  For instance, the two one-photon input cases are

where the creation operators operate on the vacuum state in each of the input modes.

The fundamental combinational properties of the beam splitter are even more evident in the quantum case, because there is no such thing as a single input to a quantum beam splitter.  Even if no photons are directed into one of the input ports, that port still receives a “vacuum” input, and this vacuum input contributes to the fluctuations observed in the outputs.

The input-output relations for the quantum beam splitter are

The beam splitter operating on a one-photon input converts the input-mode creation operator into a superposition of out-mode creation operators that generates

The resulting output is entangled: either the single photon exits one port, or it exits the other.  In the many worlds interpretation, the photon exits from one port in one universe, and it exits from the other port in a different universe.  On the other hand, in the Copenhagen interpretation, the two output ports of the beam splitter are perfectly anti-correlated.

Fig. 1  Quantum Operations of a Beam Splitter.  A beam splitter creates a quantum superposition of the input modes.  The a-symbols are quantum number operators that create and annihilate photons.  A single-photon input produces an entangled output that is a quantum superposition of the photon coming out of one output or the other.

The Hong-Ou-Mandel (HOM) Interferometer

When more than one photon is incident on a beam splitter, the fascinating effects of quantum interference come into play, creating unexpected outputs for simple inputs.  For instance, the simplest example is a two photon input where a single photon is present in each input port of the beam splitter.  The input state is represented with single creation operators operating on each vacuum state of each input port

creating a single photon in each of the input ports. The beam splitter operates on this input state by converting the input-mode creation operators into out-put mode creation operators to give

The important step in this process is the middle line of the equations: There is perfect destructive interference between the two single-photon operations.  Therefore, both photons always exit the beam splitter from the same port—never split.  Furthermore, the output is an entangled two-photon state, once more splitting universes.

Fig. 2  The HOM interferometer.  A two-photon input on a beam splitter generates an entangled superposition of the two photons exiting the beam splitter always together.

The two-photon interference experiment was performed in 1987 by Chung Ki Hong and Jeff Ou, students of Leonard Mandel at the Optics Institute at the University of Rochester [3], and this two-photon operation of the beam splitter is now called the HOM interferometer. The HOM interferometer has become a center-piece for optical and photonic implementations of quantum information processing and quantum computers.

N-Photons on a Beam Splitter

Of course, any number of photons can be input into a beam splitter.  For example, take the N-photon input state

The beam splitter acting on this state produces

The quantity on the right hand side can be re-expressed using the binomial theorem

where the permutations are defined by the binomial coefficient

The output state is given by

which is a “super” entangled state composed of N multi-photon states, involving N different universes.

Surprisingly, there is a multi-photon input state that generates a non-entangled output—as if the input states were simply classical fields.  These are the so-called coherent states, introduced by Glauber and Sudarshan [4, 5].  Coherent states can be described as superpositions of multi-photon states, but when a beam splitter operates on these superpositions, the outputs are simply 50/50 mixtures of the states.  For instance, if the input scoherent tates are denoted by a and b, then the output states after the beam splitter are

This output is factorized and hence is NOT entangled.  This is one of the many reasons why coherent states in quantum optics are considered the “most classical” of quantum states.  In this case, a quantum beam splitter operates on the inputs just as if they were classical fields.


References

[1] D. Deutsch, “Quantum-theory, the church-turing principle and the universal quantum computer,” Proceedings of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, vol. 400, no. 1818, pp. 97-117, (1985)

[2] D. Deutsch and R. Jozsa, “Rapid solution of problems by quantum computation,” Proceedings of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, vol. 439, no. 1907, pp. 553-558, Dec (1992)

[3] C. K. Hong, Z. Y. Ou, and L. Mandel, “Measurement of subpicosecond time intervals between 2 photons by interference,” Physical Review Letters, vol. 59, no. 18, pp. 2044-2046, Nov (1987)

[4] Glauber, R. J. (1963). “Photon Correlations.” Physical Review Letters 10(3): 84.

[5] Sudarshan, E. C. G. (1963). “Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams.” Physical Review Letters 10(7): 277-&.; Mehta, C. L. and E. C. Sudarshan (1965). “Relation between quantum and semiclassical description of optical coherence.” Physical Review 138(1B): B274.

Post-Modern Machine Learning: The Deep Revolution

The mysteries of human intelligence are among the final frontiers of science. Despite our pride in what science has achieved across the past century, we have stalled when it comes to understanding intelligence or emulating it. The best we have done so far is through machine learning — harnessing the computational power of computers to begin to mimic the mind, attempting to answer the essential question:

How do we get machines to Know what we Know?

In modern machine learning, the answer is algorithmic.

In post-modern machine learning, the answer is manifestation.

The algorithms of modern machine learning are cause and effect, rules to follow, producing only what the programmer can imagine. But post-modern machine learning has blown past explicit algorithms to embrace deep networks. Deep networks today are defined by neural networks with thousands, or tens of thousands, or even hundreds of thousands, of neurons arrayed in multiple layers of dense connections. The interactivity of so many crossing streams of information defies direct deconstruction of what the networks are doing — they are post-modern. Their outputs manifest themselves, self-assembling into simplified structures and patterns and dependencies that are otherwise buried unseen in complicated data.

Fig. 1 A representation of a deep network with three fully-connected hidden layers. Deep networks are typically three or more layers deep, but each layer can have thousands of neurons. (Figure from the TowardsDataScience blog.)

Deep learning emerged as recently as 2006 and has opened wide new avenues of artificial intelligence that move beyond human capabilities for some tasks.  But deep learning also has pitfalls, some of which are inherited from the legacy approaches of traditional machine learning, and some of which are inherent in the massively high-dimensional spaces in which deep learning takes place.  Nonetheless, deep learning has revolutionized many aspects of science, and there is reason for optimism that the revolution will continue. Fifty years from now, looking back, we may recognize this as the fifth derivative of the industrial revolution (Phase I: Steam. Phase II: Electricity. Phase III: Automation. Phase IV: Information. Phase V: Intelligence).

From Multivariate Analysis to Deep Learning

Conventional machine learning, as we know it today, has had many names.  It began with Multivariate Analysis of mathematical population dynamics around the turn of the last century, pioneered by Francis Galton (1874), Karl Pearson (1901), Charles Spearman (1904) and Ronald Fisher (1922) among others.

The first on-line computers during World War II were developed to quickly calculate the trajectories of enemy aircraft for gunnery control, introducing the idea of feedback control of machines. This was named Cybernetics by Norbert Wiener, who had participated in the development of automated control of antiaircraft guns.

Table I. Evolution of Names for Machine Learning

A decade later, during the Cold War, it became necessary to find hidden objects in large numbers of photographs.  The embryonic electronic digital computers of the day were far too slow with far too little memory to do the task, so the Navy contracted with the Cornell Aeronautical Laboratory in Cheektowaga, New York, a suburb of Buffalo, to create an analog computer capable of real-time image analysis.  This led to the invention of the Perceptron by Frank Rosenblatt as the first neural network-inspired computer [1], building on ideas of neural logic developed by Warren McColloch and Walter Pitts. 

Fig. 2 Frank Rosenblatt working on the Perceptron. (From the Cornell Chronicle)
Fig. 3 Rosenblatt’s conceptual design of the connectionism of the perceptron (1958).

Several decades passed with fits and starts as neural networks remained too simple to accomplish anything profound.  Then in 1986, David Rumelhart and Ronald Williams at UC San Diego with Geoff Hinton at Carnegie-Mellon discovered a way to train multiple layers of neurons, in a process called error back propagation [2].  This publication opened the floodgates of Connectionism — also known as Parallel Distributed Processing.  The late 80’s and much of the 90’s saw an expansion of interest in neural networks, until the increasing size of the networks ran into limits caused by the processing speed and capacity of conventional computers towards the end of the decade.  During this time it had become clear that the most interesting computations required many layers of many neurons, and the number of neurons expanded into the thousands, but it was difficult to train such systems that had tens of thousands of adjustable parameters, and research in neural networks once again went into a lull.

The beginnings of deep learning started with two breakthroughs.  The first was by Yann Lecun at Bell Labs in 1998 who developed, with Leon Bottou, Yoshua Bengio and Patrick Haffner, a Convolutional Neural Network that had seven layers of neurons that classified hand-written digits [3]. The second was from Geoff Hinton in 2006, by then at the University of Toronto, who discovered a fast learning algorithm for training deep layers of neurons [4].  By the mid 2010’s, research on neural networks was hotter than ever, propelled in part by several very public successes, such as Deep Mind’s machine that beat the best player in the world at Go in 2017, self-driving cars, personal assistants like Siri and Alexa, and YouTube recommendations.

The Challenges of Deep Learning

Deep learning today is characterized by neural network architectures composed of many layers of many neurons.  The nature of deep learning brings with it two main challenges:  1) efficient training of the neural weights, and 2) generalization of trained networks to perform accurately on previously unseen data inputs.

Solutions to the first challenge, efficient training, are what allowed the deep revolution in the first place—the result of a combination of increasing computer power with improvements in numerical optimization. This included faster personal computers that allowed nonspecialists to work with deep network programming environments like Matlab’s Deep Learning toolbox and Python’s TensorFlow.

Solutions to the second challenge, generalization, rely heavily on a process known as “regularization”. The term “regularization” has a slippery definition, an obscure history, and an awkward sound to it. Regularization is the noun form of the verb “to regularize” or “to make regular”. Originally, regularization was used to keep certain inverse algorithms from blowing up, like inverse convolutions, also known as deconvolution. Direct convolution is a simple mathematical procedure that “blurs” ideal data based on the natural response of a measuring system. However, if one has experimental data, one might want to deconvolve the system response from the data to recover the ideal data. But this procedure is numerically unstable and can “blow up”, often because of the divide-by-zero problem. Regularization was a simple technique that kept denominators from going to zero.

Regularization became a common method for inverse problems that are notorious to solve because of the many-to-one mapping that can occur in measurement systems. There can be a number of possible causes that produce a single response. Regularization was a way of winnowing out “unphysical” solutions so that the physical inverse solution remained.

During the same time, regularization became a common tool used by quantum field theorists to prevent certain calculated quantities from diverging to infinity. The solution was again to keep denominators from going to zero by setting physical cut-off lengths on the calculations. These cut-offs were initially ad hoc, but the development of renormalization group theory by Kenneth Wilson at Cornell (Nobel Prize in 1982) provided a systematic approach to solving the infinities of quantum field theory.

With the advent of neural networks, having hundreds to thousands to millions of adjustable parameters, regularization became the catch-all term for fighting the problem of over-fitting. Over-fitting occurs when there are so many adjustable parameters that any training data can be fit, and the neural network becomes just a look-up table. Look-up tables are the ultimate hash code, but they have no predictive capability. If a slightly different set of data are fed into the network, the output can be anything. In over-fitting, there is no generalization, the network simply learns the idiosyncrasies of the training data without “learning” the deeper trends or patterns that would allow it to generalize to handle different inputs.

Over the past decades a wide collection of techniques have been developed to reduce over-fitting of neural networks. These techniques include early stopping, k-fold holdout, drop-out, L1 and L2 weight-constraint regularization, as well as physics-based constraints. The goal of all of these techniques is to keep neural nets from creating look-up tables and instead learning the deep codependencies that exist within complicated data.

Table II. Regularization Techniques in Machine Learning

By judicious application of these techniques, combined with appropriate choices of network design, amazingly complex problems can be solved by deep networks and they can generalized too (to some degree). As the field moves forward, we may expect additional regularization tricks to improve generalization, and design principles will emerge so that the networks no longer need to be constructed by trial and error.

The Potential of Deep Learning

In conventional machine learning, one of the most critical first steps performed on a dataset has been feature extraction. This step is complicated and difficult, especially when the signal is buried either in noise or in confounding factors (also known as distractors). The analysis is often highly sensitive to the choice of features, and the selected features may not even be the right ones, leading to bad generalization. In deep learning, feature extraction disappears into the net itself. Optimizing the neural weights subject to appropriate constraints forces the network to find where the relevant information lies and what to do with it.

The key to finding the right information was not just having many neurons, but having many layers, which is where the potential of deep learning emerges. It is as if each successive layer is learning a more abstract or more generalized form of the information than the last. This hierarchical layering is most evident in the construction of convolutional deep networks, where the layers are receiving a telescoping succession of information fields from lower levels. Geoff Hinton‘s Deep Belief Network, which launched the deep revolution in 2006, worked explicitly with this hierarchy in mind through direct design of the network architecture. Since then, network architecture has become more generalized, with less up-front design while relying on the increasingly sophisticated optimization techniques of training to set the neural weights. For instance, a simplified instance of a deep network is shown in Fig. 4 with three hidden layers of neurons.

Fig. 4 General structure of a deep network with three hidden layers. Layers will typically have hundreds or thousands of neurons. Each gray line represents a weight value, and each circle is a neural activation function.

The mathematical structure of a deep network is surprisingly simple. The equations for the network in Fig. 4, that convert an input xa to an output ye, are

These equations use index notation to denote vectors (single superscript) and matrices (double indexes). The repeated index (one up and one down) denotes an implicit “Einstein” summation. The function φ(.) is known as the activation function, which is nonlinear. One of the simplest activation functions to use and analyze, and the current favorite, is known as the ReLU (for rectifying linear unit). Note that these equations represent a simple neural cascade, as the output of one layer becomes the input for the next.

The training of all the matrix elements assumes a surprisingly simply optimization function, known as an objective function or a loss function, that can look like

where the first term is the mean squared error of the network output ye relative to the desired output y0 for the training set, and the second term, known as a regularization term (see section above) is a quadratic form that keeps the weights from blowing up. This loss function is minimized over the set of adjustable matrix weights.

The network in Fig. 4 is just a toy, with only 5 inputs and 5 outputs and only 23 neurons. But it has 30+36+36+30+23 = 155 adjustable weights. If this seems like overkill, but it is nothing compared to neural networks with thousands of neurons per layer and tens of layers. That massive overkill is exactly the power of deep learning — as well as its pitfall.

The Pitfalls of Deep Learning

Despite the impressive advances in deep learning, serious pitfalls remain for practitioners. One of the most challenging problems in deep learning is the saddle-point problem. A saddle-point in an objective function is like a mountain pass through the mountains: at the top of the pass it slopes downward in two opposite directions into the valleys but slopes upward in the two orthogonal directions to mountain peaks. A saddle point is an unstable equilibrium where a slight push this way or that can lead the traveller to two very different valleys separated by high mountain ridges. In our familiar three-dimensional space, saddle points are relatively rare and landscapes are dominated by valleys and mountaintops. But this intuition about landscapes fails in high dimensions.

Landscapes in high dimensions are dominated by neutral ridges that span the domain of the landscape. This key understanding about high-dimensional space actually came from the theory of evolutionary dynamics for the evolution of species. In the early days of evolutionary dynamics, there was a serious challenge to understand how genetic mutations could allow such diverse speciation to occur. If the fitness of a species were viewed as a fitness landscape, and if a highly-adapted species were viewed as a mountain peak in this landscape, then genetic mutations would need to drive the population state point into “valleys of death” that would need to be crossed to arrive at a neighboring fitness peak. It would seem that genetic mutations would likely kill off the species in the valleys before they could rise to the next equilibrium.

However, the geometry of high dimensions does not follow this simple low-dimensional intuition. As more dimensions come available, landscapes have more and more ridges of relatively constant height that span the full space (See my recent blog on random walks in 10-dimensions and my short YouTube video). For a species to move from one fitness peak to another fitness peak in a fitness landscape (in ultra-high-dimensional mutation space), all that is needed is for a genetic mutation to step the species off of the fitness peak onto a neutral ridge where many mutations can keep the species on that ridge as it moves ever farther away from the last fitness peak. Eventually, the neutral ridge brings the species near a new fitness peak where it can climb to the top, creating a new stable species. The point is that most genetic mutations are neutral — they do not impact the survivability of an individual. This is known as the neutral network theory of evolution proposed by Motoo Kimura (1924 – 1994) [5]. As these mutation accumulate, the offspring can get genetically far from the progenitor. And when a new fitness peak comes near, many of the previously neutral mutations can come together and become a positive contribution to fitness as the species climbs the new fitness peak.

The neutral network of genetic mutation was a paradigm shift in the field of evolutionary dynamics, and it also taught everyone how different random walks in high-dimensional spaces are from random walks in 3D. But although neutral networks solved the evolution problem, they become a two-edged sword in machine learning. On the positive side, fitness peaks are just like the minima of objective functions, and the ability for partial solutions to perform random walks along neutral ridges in the objective-function space allows optimal solutions to be found across a broad range of the configuration space of the neural weights. However, on the negative side, ridges are loci of unstable equilibrium. Hence there are always multiple directions that a solution state can go to minimize the objective function. Each successive run of a deep-network neural weight optimizer can find equivalent optimal solutions — but they each can be radically different. There is no hope of averaging the weights of an ensemble of networks to arrive at an “average” deep network. The averaging would simply drive all weights to zero. Certainly, the predictions of an ensemble of equivalently trained networks can be averaged—but this does not illuminate what is happening “under the hood” of the machine, which is where our own “understanding” of what the network is doing would come from.

Post-Modern Machine Learning

Post-modernism is admittedly kind of a joke — it works so hard to pull down every aspect of human endeavor that it falls victim to its own methods. The convoluted arguments made by its proponents sound like ultimate slacker talk — circuitous logic circling itself in an endless loop of denial.

But post-modernism does have its merits. It surfs on the moving crest of what passes as modernism, as modernism passes onward to its next phase. The philosophy of post-modernism moves beyond rationality in favor of a subjectivism in which cause and effect are blurred.  For instance, in post-modern semiotic theory, a text or a picture is no longer an objective element of reality, but fragments into multiple semiotic versions, each one different for each different reader or watcher — a spectrum of collaborative efforts between each consumer and the artist. The reader brings with them a unique set of life experiences that interact with the text to create an entirely new experience in each reader’s mind.

Deep learning is post-modern in the sense that deterministic algorithms have disappeared. Instead of a traceable path of sequential operations, neural nets scramble information into massively-parallel strings of partial information that cross and interact nonlinearly with other massively-parallel strings. It is difficult to impossible to trace any definable part of the information from input to output. The output simply manifests some aspect of the data that was hidden from human view.

But the Age of Artificial Intelligence is not here yet. The vast multiplicity of saddle ridges in high dimensions is one of the drivers for one of the biggest pitfalls of deep learning — the need for massive amounts of training data. Because there are so many adjustable parameters in a neural network, and hence so many dimensions, a tremendous amount of training data is required to train a network to convergence. This aspect of deep learning stands in strong contrast to human children who can be shown a single picture of a bicycle next to a tricycle, and then they can achieve almost perfect classification accuracy when shown any number of photographs of different bicycles and tricycles. Humans can generalize with an amazingly small amount of data, while deep networks often need thousands of examples. This example alone points to the marked difference between human intelligence and the current state of deep learning. There is still a long road ahead.


[1] F. Rosenblatt, “THE PERCEPTRON – A PROBABILISTIC MODEL FOR INFORMATION-STORAGE AND ORGANIZATION IN THE BRAIN,” Psychological Review, vol. 65, no. 6, pp. 386-408, (1958)

[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “LEARNING REPRESENTATIONS BY BACK-PROPAGATING ERRORS,” Nature, vol. 323, no. 6088, pp. 533-536, Oct (1986)

[3] LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). “Gradient-based learning applied to document recognition”. Proceedings of the IEEE. 86 (11): 2278–2324.

[4] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, Jul (2006)

[5] M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1968.

The Physics of Starflight: Proxima Centauri b or Bust!

The ability to travel to the stars has been one of mankind’s deepest desires. Ever since we learned that we are just one world in a vast universe of limitless worlds, we have yearned to visit some of those others. Yet nature has thrown up an almost insurmountable barrier to that desire–the speed of light. Only by traveling at or near the speed of light may we venture to far-off worlds, and even then, decades or centuries will pass during the voyage. The vast distances of space keep all the worlds isolated–possibly for the better.

Yet the closest worlds are not so far away that they will always remain out of reach. The very limit of the speed of light provides ways of getting there within human lifetimes. The non-intuitive effects of special relativity come to our rescue, and we may yet travel to the closest exoplanet we know of.

Proxima Centauri b

The closest habitable Earth-like exoplanet is Proxima Centauri b, orbiting the red dwarf star Proxima Centauri that is about 4.2 lightyears away from Earth. The planet has a short orbital period of only about 11 Earth days, but the dimness of the red dwarf puts the planet in what may be a habitable zone where water is in liquid form. Its official discovery date was August 24, 2016 by the European Southern Observatory in the Atacama Desert of Chile using the Doppler method. The Alpha Centauri system is a three-star system, and even before the discovery of the planet, this nearest star system to Earth was the inspiration for the Hugo-Award winning sci-fi trilogy The Three Body Problem by Chinese author Liu Cixin, originally published in 2008.

It may seem like a coincidence that the closest Earth-like planet to Earth is in the closest star system to Earth, but it says something about how common such exoplanets may be in our galaxy.

Artist’s rendition of Proxima Centauri b. From WikiCommons.

Breakthrough Starshot

There are already plans to send centimeter-sized spacecraft to Alpha Centauri. One such project that has received a lot of press is Breakthrough Starshot, a project of the Breakthrough Initiatives. Breakthrough Starshot would send around 1000 centimeter-sized camera-carrying laser-fitted spacecraft with 5-meter-diameter solar sails propelled by a large array of high-power lasers. The reason there are so many of these tine spacecraft is because of the collisions that are expected to take place with interstellar dust during the voyage. It is possible that only a few dozen of the craft will finally make it to Alpha Centauri intact.

Relative locations of the stars of the Alpha Centauri system. From ScienceNews.

As these spacecraft fly by the Alpha Centauri system, possibly within one hundred million miles of Proxima Centauri b, their tiny HR digital cameras will take pictures of the planet’s surface with enough resolution to see surface features. The on-board lasers will then transmit the pictures back to Earth. The travel time to the planet is expected to be 20 or 30 years, plus the four years for the laser information to make it back to Earth. Therefore, it would take a quarter century after launch to find out if Proxima Centauri b is habitable or not. The biggest question is whether it has an atmosphere. The red dwarf it orbits sends out catastrophic electromagnetic bursts that could strip the planet of its atmosphere thus preventing any chance for life to evolve or even to be sustained there if introduced.

There are multiple projects under consideration for travel to the Alpha Centauri systems. Even NASA has a tentative mission plan called the 2069 Mission (100 year anniversary of the Moon landing). This would entail a single spacecraft with a much larger solar sail than the small starshot units. Some of the mission plans proposed star-drive technology, such as nuclear propulsion systems, rather than light sails. Some of these designs could sustain a 1-g acceleration throughout the entire mission. It is intriguing to do the math on what such a mission could look like, in terms of travel time. Could we get an unmanned probe to Alpha Centauri in a matter of years? Let’s find out.

Special Relativity of Acceleration

The most surprising aspect of deriving the properties of relativistic acceleration using special relativity is that it works at all. We were all taught as young physicists that special relativity deals with inertial frames in constant motion. So the idea of frames that are accelerating might first seem to be outside the scope of special relativity. But one of Einstein’s key insights, as he sought to extend special relativity towards a more general theory, was that one can define a series of instantaneously inertial co-moving frames relative to an accelerating body. In other words, at any instant in time, the accelerating frame has an inertial co-moving frame. Once this is defined, one can construct invariants, just as in usual special relativity. And these invariants unlock the full mathematical structure of accelerating objects within the scope of special relativity.

For instance, the four-velocity and the four-acceleration in a co-moving frame for an object accelerating at g are given by

The object is momentarily stationary in the co-moving frame, which is why the four-velocity has only the zeroth component, and the four-acceleration has simply g for its first component.

Armed with these four-vectors, one constructs the invariants

and

This last equation is solved for the specific co-moving frame as

But the invariant is more general, allowing the expression

which yields

From these, putting them all together, one obtains the general differential equations for the change in velocity as a set of coupled equations

The solution to these equations is

where the unprimed frame is the lab frame (or Earth frame), and the primed frame is the frame of the accelerating object, for instance a starship heading towards Alpha Centauri. These equations allow one to calculate distances, times and speeds as seen in the Earth frame as well as the distances, times and speeds as seen in the starship frame. If the starship is accelerating at some acceleration g’ other than g, then the results are obtained simply by replacing g by g’ in the equations.

Relativistic Flight

It turns out that the acceleration due to gravity on our home planet provides a very convenient (but purely coincidental) correspondence

With a similarly convenient expression

These considerably simplify the math for a starship accelerating at g.

Let’s now consider a starship accelerating by g for the first half of the flight to Alpha Centauri, turning around and decelerating at g for the second half of the flight, so that the starship comes to a stop at its destination. The equations for the times to the half-way point are

This means at the midpoint that 1.83 years have elapsed on the starship, and about 3 years have elapsed on Earth. The total time to get to Alpha Centauri (and come to a stop) is then simply

It is interesting to look at the speed at the midpoint. This is obtained by

which is solved to give

This amazing result shows that the starship is traveling at 95% of the speed of light at the midpoint when accelerating at the modest value of g for about 3 years. Of course, the engineering challenges for providing such an acceleration for such a long time are currently prohibitive … but who knows? There is a lot of time ahead of us for technology to advance to such a point in the next century or so.

Figure. Time lapsed inside the spacecraft and on Earth for the probe to reach Alpha Centauri as a function of the acceleration of the craft. At 10 g’s, the time elapsed on Earth is a little less than 5 years. However, the signal sent back will take an additional 4.37 years to arrive for a total time of about 9 years.

Matlab alphacentaur.m

% alphacentaur.m
clear
format compact

g0 = 1;
L = 4.37;

for loop = 1:100
    
    g = 0.1*loop*g0;
    
    taup = (1/g)*acosh(g*L/2 + 1);
    tearth = (1/g)*sinh(g*taup);
    
    tauspacecraft(loop) = 2*taup;
    tlab(loop) = 2*tearth;
    
    acc(loop) = g;
    
end

figure(1)
loglog(acc,tauspacecraft,acc,tlab,'LineWidth',2)
legend('Space Craft','Earth Frame','FontSize',18)
xlabel('Acceleration (g)','FontSize',18)
ylabel('Time (years)','FontSize',18)
dum = set(gcf,'Color','White');
H = gca;
H.LineWidth = 2;
H.FontSize = 18;

To Centauri and Beyond

Once we get unmanned probes to Alpha Centauri, it opens the door to star systems beyond. The next closest are Barnards star at 6 Ly away, Luhman 16 at 6.5 Ly, Wise at 7.4 Ly, and Wolf 359 at 7.9 Ly. Several of these are known to have orbiting exoplanets. Ross 128 at 11 Ly and Lyuten at 12.2 Ly have known earth-like planets. There are about 40 known earth-like planets within 40 lightyears from Earth, and likely there are more we haven’t found yet. It is almost inconceivable that none of these would have some kind of life. Finding life beyond our solar system would be a monumental milestone in the history of science. Perhaps that day will come within this century.


Further Reading

R. A. Mould, Basic Relativity. Springer (1994)

D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 2nd ed.: Oxford University Press (2019)

Democracy against Dictatorship: The Physics of Public Opinion

An old joke says that Democracy is a terrible form of government … except it’s better than all the others!

Our world today is faced with conflict between democracy and dictatorship. On the one side is the free world, where leaders are chosen by some form of representation of large numbers of citizens and sometimes even a majority. On the other side is authoritarianism where a select few are selected by a select few to govern everyone else.

[I]t has been said that democracy is the worst form of Government except all those other forms that have been tried from time to time; but there is the broad feeling in our country that the people should rule, and that public opinion expressed by all constitutional means, should shape, guide, and control the actions of Ministers who are their servants and not their masters.

Winston Churchill (1947)

An argument in favor of democracy is freedom of choice for the largest segment of the population, plus the ability to remove leaders who fail to provide for the perceived welfare of the most citizens. This makes democracy adaptive, shifting with the times. It also makes leaders accountable for their actions and crimes. An argument in favor of authoritarianism is the myth of the benevolent dictator–someone who knows what’s best for the people even if the people don’t know it themselves.

But dictators are rarely benevolent, and as they become saturated with power, they are corrupted. The criminal massacres, perpetrated by Putin, of Ukrainian civilians is one of the strongest recent arguments against authoritarianism. A single man decides, on a whim, the life and death of thousands or maybe more. The invasion of Ukraine is so egregious and unwarranted, that we wonder how the Russian people can put up with their isolated and manic leader. Yet by some measure more than 60% of the people in Russia approve of the war.

How can the free world see the invasion as the atrocity it is, while Russia’s majority sees it as a just war? The answer is a surprising result of population dynamics known as the replicator-mutator equation. The challenge for us here in the free world is to learn how to game the replicator-mutator equation to break up the monopoly of popular opinion and make Putin pay for his arrogance. This blog explains how “mass hysteria” can arise from forces within a complex environment, and how to construct a possible antidote.

Replicator-Mutator Equation

There are several simple models of population dynamics that try to explain the rise and fall of the number of individuals that belong to varying cohorts within the population. These models incorporate aspects of relative benefit of one group over another, plus the chance to change sides–defection. The dynamics under these conditions can be highly nonlinear and highly non-intuitive. One of the simplest of these models is known as the replicator-mutator model where replication follows the fitness of the cohort, and where individuals can defect to a “more fit” cohort.

The basic dynamics of the model are

where xa is the fraction of the population that is in cohort a, Wab is a transition probability, and φ is the average fitness of the full population. The transition matrix is given by

where fb is the fitness of cohort b and Qba is a stochastic matrix that allows for defection of an individual from one cohort to another. The fitness of a cohort is given by

where pbc is the pay-off matrix for the relative benefit of one cohort at the expense of another. Finally the average fitness is

The Einstein implicit summation convention is assumed in all of these equations, and the metric space in which the dynamics are embedded is “flat” so that there is no essential difference between superscripts and subscripts. There is also a conservation law that the sum over all population fractions equals unity.

In the language of population dynamics, this model has frequency-dependent fitness, with defection and pay-off, in a zero-sum game.

One of the simplest questions to answer with this model is how so many people can come to believe one thing. This is known as “opinion uniformity”.

Uniformity versus Diversity

This replicator-mutator model explains the property of opinion uniformity, as well as the opposite extreme of opinion diversity. The starting point for both is the pay-off matrix pbc which is assumed to be unity on the diagonal for b = c and to a constant factor a for b ~= c. This pay-off is symmetric so that all opinions are equally “believable”. The stochastic defection matrix is close to unity on the diagonal, and has random terms on the off-diagonal that are proportional to a constant ε. The defection matrix allows a person from one cohort to defect to the belief system of another cohort if they believe that the new cohort has more merit. Cohorts with greater merit (fitness) gain more members over time, while cohorts with lower merit have fewer members over time.

Note that the fitness increases with the number of members in the cohort. This is the bandwagon effect. A belief system is perceived to have more merit if there are more people who believe it. This clearly creates a positive feedback that would cause this cohort to grow. Even though all the cohorts have this same positive feedback, the zero-sum rule only allows one of the cohorts to grow to its maximum extent, taking members away from all the other cohorts. This is illustrated in Fig. 1. One belief system wins, taking almost the full population with it.

Fig. 1 Population fractions evolving as a function of time for a = 0.5 and a small defection rate ε = 0.02. One winner takes almost all the population. These are two views of the same data on semilog and log-log.

What allows the winner to take all is the positive feedback where the fitness of the cohort increases with the number of members, combined with the ability for that cohort to take members from other cohorts through the defection matrix.

However, all of the cohorts are trying the same thing, and the pay-off matrix is fully symmetric and equal for all cohorts, so no cohort is intrinsically “better” than another. This property opens the door to a strong alternative to opinion uniformity. In fact, as more members are allowed to defect, it creates a trend counter to winner-take-all, helping to equalize the cohorts. Suddenly, a bifurcation is passed when the winner-take-all converts discontinuously to a super-symmetric situation when all opinions are held by equal numbers of people. This is illustrated in Fig. 2 for a slightly higher defection rate ε = 0.03. The parameters are identical to those in Fig. 1, but the higher defection rate stabilizes the super-symmetric state of maximum diversity.

Fig. 2 Population fractions for higher defection rate of 0.03. In super-symmetric state, all opinions are held at the same rate with maximum diversity.

These two extreme results of the replicator-mutator equation, that switch suddenly from one to the other dependent on the defection rate, may seem to produce solutions neither of which are ideal for a healthy democracy. One the one hand, in the uniform case where the wining opinion is monolithic, everyone is a carbon-copy of everyone else, which is a form of cultural death (lack of diversity). But, on the other hand, one might argue that maximum opinion diversity is just as concerning, because no-one can agree on anything. If all opinions are equivalent, then everyone in the society believes something different and there is no common ground. But in the diversity case, at least there is no state-level control of the population. In the case of opinion uniformity, the wining opinion can be manipulated by propaganda.

The Propaganda Machine

A government can “seed” the belief networks with propaganda that favors the fitness of what they want their citizens to hear. Because of the positive feedback, any slight advantage of one opinion over others can allow that opinion to gain large numbers through the bandwagon effect. Of course, even stronger control that stifles dissent, for instance by shutting down the free press, makes it that much more likely that the state-controlled story is believed. This may be one reason for the 60% (as of the writing of this blog) support Putin’s war, despite the obvious lies that are being told. This is illustrated in Fig. 3 by boosting the payoff between two similar lies that the government wants its people to believe. These rise to take about 60% of the population. Members of the cohort are brain-washed, not by the government alone, but by all their neighbors who are parroting the same thing.

Fig. 3 Government propaganda acts as a “seed” that makes the propaganda grow faster than other beliefs, even for a defection rate of 0.03 which is above the threshold of Fig. 2.

Breaking the Monopoly of Thought

How do we fight back? Not just against the Kremlin’s propaganda, but also against QANON and Trump’s Big Lie and the pernicious fallacy of nationalism? The answer is simple: diversity of thought! The sliver bullet in the replicator-mutator model is the defection matrix. The existence of a bifurcation means that a relatively small increase in the amount of diverse opinion, and the freedom to swap opinions, can lead to a major qualitative collapse of the monolithic thought, even when supported by government propaganda, as shown in Fig. 4. More people may still believe in the state-supported propaganda than the others, but it is no longer a majority.

Fig. 4 Increasing the defection rate can help equalize free opinions against the state-supported propaganda

The above models were all very homogeneous. It is more realistic that people are connected through small-world networks. In this case, there is much more diversity, as shown in Fig. 5, although the defection rate needs to be much higher to prevent a monolithic opinion from dominating. The state-supported propaganda is buried in the resulting mix of diverse ideas. Therefore, to counteract state control, people must feel free to hop about in their choice of beliefs and have access to other beliefs.

Fig. 5 The defection matrix is multiplied by the adjacency matrix of a small-world network. There is significant diversity of thought, but a relatively high defection rate is needed. The state-supported propaganda is buried in this mix.

This is a bit paradoxical. On the one hand, the connectivity of the internet has fostered the rise of conspiracy theories and other odd-ball ideas. But sustained access to multiple sources of information is the best defense against all that crazy stuff winning out. In other words, not only do we have to put up with the lunatic fringe if we are to have full diversity of thought, but we need to encourage everyone to feel free to “shop around” for different ideas, even if some of them are crazy. Our free society shouldn’t be cancelling people who have divergent opinions, because that sets us down the path to authoritarianism. As a recent add said in the New York Times, “Cancel culture cancels culture.” Unfortunately, authoritarianism is on the rise around the world, and the US almost suffered that fate on Jan. 6, 2021. Furthermore, with Xi aligning with Putin and giving him the green light on Ukraine–cynically on the eve of the Olympic Games (of peace)–the new world order will revolve around that axis for decades to come, if the world survives that long. Diversity and freedom may be the only antidote.

Matlab Program: Repmut.m

function repmut
% https://github.itap.purdue.edu/nolte/Matlab-Programs-for-Nonlinear-Dynamics

clear
format compact

N = 63;     
p = 0.5;

mutype = 1;     % 0 = Hamming   1 = rand
pay = 1;        % 0 = Hamming   1 = 1/sqrt(N) 
ep = 0.5;      % average mutation rate: 0.1 to 0.01 typical  (0.4835)

%%%%% Set original population
x0temp = rand(1,N);    % Initial population
sx = sum(x0temp);
y0 = x0temp/sx;
Pop0 = sum(y0);


%%%%% Set Adjacency

%node = makeglobal(N);
%node = makeER(N,0.25);       % 0.5     0.25 
%node = makeSF(N,6);       % 12         6
node = makeSW(N,7,0.125);   % 15,0.5    7,0.5
[Adj,degree,Lap] = adjacency(node);

%%%%%% Set Hamming distance
for yloop = 1:N
    for xloop = 1:N
        H(yloop,xloop) = hamming(yloop-1,xloop-1);
    end
end

%%%%%%% Set Mutation matrix
if mutype == 0
    Qtemp = 1./(1+H/ep);    %Mutation matrix on Hamming
    Qsum = sum(Qtemp,2);
    mnQsum = mean(Qsum);
    
    % Normalize mutation among species
    for yloop = 1:N
        for xloop = 1:N
            Q(yloop,xloop) = Qtemp(yloop,xloop)/Qsum(xloop);
        end
    end
    
elseif mutype == 1  
    S = stochasticmatrix(N);
    Stemp = S - diag(diag(S));
    Qtemp = ep*Stemp;
    sm = sum(Qtemp,2)';
    Q = Qtemp + diag(ones(1,N) - sm);
end

figure(1)
imagesc(Q)
title('Mutation Matrix')
colormap(jet)

%%%%%%% Set payoff matrix
if pay == 1
    payoff = zeros(N,N);
    for yloop = 1:N
        payoff(yloop,yloop) = 1;
        for xloop = yloop + 1:N
            payoff(yloop,xloop) = p;
            payoff(xloop,yloop) = p;
            payoff(1,N) = 1;    % Propaganda
            payoff(N,1) = 1;
        end
    end
elseif pay == 0
    payoff = zerodiag(exp(-1*H));
end

figure(2)
imagesc(payoff)
title('Payoff Matrix')
colormap(jet)

% Run time evolution
tspan = [0 4000];
[t,x] = ode45(@quasispec,tspan,y0);

Pop0
[sz,dum] = size(t);
Popend = sum(x(sz,:))

for loop = 1:N
    fit(loop) = sum(payoff(:,loop)'.*x(sz,:));
end

phistar = sum(fit.*x(sz,:))       % final average fitness

xend = x(sz,:)
sortxend = sort(xend,'descend');
coher = sum(sortxend(1:2))

figure(3)
clf
h = colormap(lines);
for loop = 1:N
    plot(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

figure(4)
clf
for loop = 1:N
    semilogx(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

figure(5)
clf
for loop = 1:N
    semilogy(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

figure(6)
clf
for loop = 1:N
    loglog(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function yd = quasispec(~,y)
        
        for floop = 1:N
            f(floop) = sum(payoff(:,floop).*y);
        end
        
        Connect = Adj + eye(N);
        
        % Transition matrix
        for yyloop = 1:N
            for xxloop = 1:N
                W(yyloop,xxloop) = f(yyloop)*(Connect(yyloop,xxloop)*Q(yyloop,xxloop));
            end
        end
        
        phi = sum(f'.*y);   % Average fitness of population
        
        yd = W*y - phi*y;
        
    end     % end quasispec
end

Further Reading

M. A. Nowak, Evolutionary Dynamics: Exploring the Equations of Life. Cambridge, Mass.: Harvard University Press, 2006.

Life in a Solar System with a Super-sized Jupiter

There are many known super-Jupiters that orbit their stars—they are detected through a slight Doppler wobble they induce on their stars [1].  But what would become of a rocky planet also orbiting those stars as they feel the tug of both the star and the super planet?

This is not of immediate concern for us, because our solar system has had its current configuration of planets for over 4 billion years.  But there can be wandering interstellar planets or brown dwarfs that could visit our solar system, like Oumuamua did in 2017, but much bigger and able to scramble the planetary orbits. Such hypothesized astronomical objects have been given the name “Nemesis“, and it warrants thought on what living in an altered solar system might be like.

What would happen to Earth if Jupiter were 50 times bigger? Could we survive?

The Three-Body Problem

The Sun-Earth-Jupiter configuration is a three-body problem that has a long and interesting history, playing a key role in several aspects of modern dynamics [2].  There is no general analytical solution to the three-body problem.  To find the behavior of three mutually interacting bodies requires numerical solution.  However, there are subsets of the three-body problem that do yield to partial analytical approaches.  One of these is called the restricted three-body problem [3].  It consists of two massive bodies plus a third (nearly) massless body that all move in a plane.  This restricted problem was first tackled by Euler and later by Poincaré, who discovered the existence of chaos in its solutions.

The geometry of the restricted three-body problem is shown in Fig. 1. In this problem, take mass m1 = mS to be the Sun’s mass, m2 = mJ to be Jupiter’s mass, and the third (small) mass is the Earth. 

Fig. 1  The restricted 3-body problem in the plane.  The third mass is negligible relative to the first two masses that obey 2-body dynamics.

The equation of motion for the Earth is

where

and the parameter ξ characterizes the strength of the perturbation of the Earth’s orbit around the Sun.  The parameters for the Jupiter-Sun system are

with

for the 11.86 year journey of Jupiter around the Sun.  Eq. (1) is a four-dimensional non-autonomous flow

The solutions of an Earth orbit are shown in Fig.2.  The natural Earth-Sun-Jupiter system has a mass ratio mJ/mS = 0.001 for Jupiter relative to the Sun mass.  Even in this case, Jupiter causes perturbations of the Earth’s orbit by about one percent.  If the mass of Jupiter increases, the perturbations would grow larger until around ξ= 0.06 when the perturbations become severe and the orbit grows unstable.  The Earth gains energy from the momentum of the Sun-Jupiter system and can reach escape velocity.  The simulation for a mass ratio of 0.07 shows the Earth ejected from the Solar System.

Fig.2  Orbit of Earth as a function of the size of a Jupiter-like planet.  The natural system has a Jupiter-Earth mass ratio of 0.03.  As the size of Jupiter increases, the Earth orbit becomes unstable and can acquire escape velocity to escape from the Solar System. From body3.m. (Reprinted from Ref. [4])

The chances for ejection depends on initial conditions for these simulations, but generally the danger becomes severe when Jupiter is about 50 times larger than it currently is. Otherwise the Earth remains safe from ejection. However, if the Earth is to keep its climate intact, then Jupiter should not be any larger than about 5 times its current size. At the other extreme, for a planet 70 times larger than Jupiter, the Earth may not get ejected at once, but it can take a wild ride through the solar system. A simulation for a 70x Jupiter is shown in Fig. 3. In this case, the Earth is captured for a while as a “moon” of Jupiter in a very tight orbit around the super planet as it orbits the sun before it is set free again to orbit the sun in highly elliptical orbits. Because of the premise of the restricted three-body problem, the Earth has no effect on the orbit of Jupiter.

Fig. 3 Orbit of Earth for TJ = 11.86 years and ξ = 0.069. The radius of Jupiter is RJ = 5.2. Earth is “captured” for a while by Jupiter into a very tight orbit.

Resonance

If Nemesis were to swing by and scramble the solar system, then Jupiter might move closer to the Earth. More ominously, the period of Jupiter’s orbit could come into resonance with the Earth’s period. This occurs when the ratio of orbital periods is a ratio of small integers. Resonance can amplify small perturbations, so perhaps Jupiter would become a danger to Earth. However, the forces exerted by Jupiter on the Earth changes the Earth’s orbit and hence its period, preventing strict resonance to occur, and the Earth is not ejected from the solar system even for initial rational periods or larger planet mass. This is related to the famous KAM theory of resonances by Kolmogorov, Arnold and Moser that tends to protect the Earth from the chaos of the solar system. More often than not in these scenarios, the Earth is either captured by the super Jupiter, or it is thrown into a large orbit that is still bound to the sun. Some examples are given in the following figures.

Fig. 4 Orbit of Earth for an initial 8:1 resonance of TJ = 8 years and ξ = 0.073. The Radius of Jupiter is R = 4. Jupiter perturbs the Earth’s orbit so strongly that the 8:1 resonance is quickly removed.
Fig. 5 Earth orbit for TJ = 12 years and ξ = 0.071. The Earth is thrown into a nearly circular orbit beyond the orbit of Saturn.

Fig. 6 Earth Orbit for TJ = 4 years and ξ = 0.0615. Earth is thrown into an orbit of high ellipticity out to the orbit of Neptune.

Life on a planet in a solar system with two large bodies has been envisioned in dramatic detail in the science fiction novel “Three-Body Problem” by Liu Cixin about the Trisolarians of the closest known exoplanet to Earth–Proxima Centauri b.

Matlab Code: body3.m

function body3

clear

chsi0 = 1/1000;     % Earth-moon ratio = 1/317
wj0 = 2*pi/11.86;

wj = 2*pi/8;
chsi = 73*chsi0;    % (11.86,60) (11.86,67.5) (11.86,69) (11.86,70) (4,60) (4,61.5) (8,73) (12,71) 

rj = 5.203*(wj0/wj)^0.6666

rsun = chsi*rj/(1+chsi);
rjup = (1/chsi)*rj/(1+1/chsi);

r0 = 1-rsun;
y0 = [r0 0 0 2*pi/sqrt(r0)];

tspan = [0 300];
options = odeset('RelTol',1e-5,'AbsTol',1e-6);
[t,y] = ode45(@f5,tspan,y0,options);

figure(1)
plot(t,y(:,1),t,y(:,3))

figure(2)
plot(y(:,1),y(:,3),'k')
axis equal
axis([-6 6 -6 6])

RE = sqrt(y(:,1).^2 + y(:,3).^2);
stdRE = std(RE)

%print -dtiff -r800 threebody

    function yd = f5(t,y)
        
        xj = rjup*cos(wj*t);
        yj = rjup*sin(wj*t);
        xs = -rsun*cos(wj*t);
        ys = -rsun*sin(wj*t);
        rj32 = ((y(1) - xj).^2 + (y(3) - yj).^2).^1.5;
        r32 = ((y(1) - xs).^2 + (y(3) - ys).^2).^1.5;

        yp(1) = y(2);
        yp(2) = -4*pi^2*((y(1)-xs)/r32 + chsi*(y(1)-xj)/rj32);
        yp(3) = y(4);
        yp(4) = -4*pi^2*((y(3)-ys)/r32 + chsi*(y(3)-yj)/rj32);
 
        yd = [yp(1);yp(2);yp(3);yp(4)];

    end     % end f5

end



References:

[1] D. D. Nolte, “The Fall and Rise of the Doppler Effect,” Physics Today, vol. 73, no. 3, pp. 31-35, Mar (2020)

[2] J. Barrow-Green, Poincaré and the three body problem. London Mathematical Society, 1997.

[3] M. C. Gutzwiller, “Moon-Earth-Sun: The oldest three-body problem,” Reviews of Modern Physics, vol. 70, no. 2, pp. 589-639, Apr (1998)

[4] D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 1st ed. (Oxford University Press, 2015).

The Physics of Robinson Crusoe’s Economy

“What is a coconut worth to a cast-away on a deserted island?”

In the midst of the cast-away’s misfortune and hunger and exertion and food lies an answer that looks familiar to any physicist who speaks the words

“Assume a Lagrangian …”

It is the same process that determines how a bead slides along a bent wire in gravity or a skier navigates a ski hill.  The answer: find the balance of economic forces subject to constraints. 

Here is the history and the physics behind one of the simplest economic systems that can be conceived:  Robinson Crusoe spending his time collecting coconuts!

Robinson Crusoe in Economic History

Daniel Defoe published “The Life and Strange Surprizing Adventures of Robinson Crusoe” in 1719, about a man who is shipwrecked on a deserted island and survives there for 28 years before being rescued.  It was written in the first person, as if the author had actually lived through those experiences, and it was based on a real-life adventure story.  It is one of the first examples of realistic fiction, and it helped establish the genre of the English novel.

Several writers on economic theory made mention of Robinson Crusoe as an example of a labor economy, but it was in 1871 that Robinson Crusoe became an economic archetype.  In that year both William Stanley Jevons‘s The Theory of Political Economy and Carl Menger‘s Grundsätze der Volkswirtschaftslehre (Principles of Economics) used Robinson Crusoe to illustrate key principles of the budding marginalist revolution.

Marginalism in economic theory is the demarcation between classical economics and modern economics.  The key principle of marginalism is the principle of “diminishing returns” as the value of something gets less as an individual has more of it.  This principle makes functions convex, which helps to guarantee that there are equilibrium points in the economy.  Economic equilibrium is a key concept and goal because it provides stability to economic systems.

One-Product Is a Dull Diet

The Robinson Crusoe economy is  one of the simplest economic models that captures the trade-off between labor and production on one side, and leisure and consumption on the other.  The model has a single laborer for whom there are 24*7 =168 hours in the week.  Some of these hours must be spent finding food, let’s say coconuts, while the other hours are for leisure and rest.  The production of coconuts follows a production curve

that is a function of labor L.  There are diminishing returns in the finding of coconuts for a given labor, making the production curve of coconuts convex.  The amount of rest is

and there is a reciprocal production curve q(R) related to less coconuts produced for more time spent resting. In this model it is assumed that all coconuts that are produced are consumed.  This is known as market clearing when no surplus is built up. 

The production curve presents a continuous trade-off between consumption and leisure, but at first look there is no obvious way to decide how much to work and how much to rest.  A lazy person might be willing to go a little hungry if they can have more rest, while a busy person might want to use all waking hours to find coconuts.  The production curve represents something known as a Pareto frontier.  It is a continuous trade-off between two qualities.  Another example of a Pareto frontier is car engine efficiency versus cost.  Some consumers may care more about the up-front cost of the car than the cost of gas, while other consumers may value fuel efficiency and be willing to pay higher costs to get it. 

Continuous trade offs always present a bit of a problem for planning. It is often not clear what the best trade off should be. This problem is solved by introducing another concept into this little economy–the concept of “Utility”.

The utility function was introduced by the physicist Daniel Bernoulli, one of the many bountiful Bernoullis of Basel, in 1738. The utility function is a measure of how much benefit or utility a person or an enterprise gains by holding varying amounts of goods or labor. The essential problem in economic exchange is to maximize one’s utility function subject to whatever constraints are active. The utility function for Robinson Crusoe is

This function is obviously a maximum at maximum leisure (R = 1) and lots of coconuts (q = 1), but this is not allowed, because it lies off the production curve q(R). Therefore the question becomes: where on the production curve he can maximize the trade-off between coconuts and leisure?

Fig. 1 shows the dynamical space for Robinson Crusoe’s economy. The space is two dimensional with axes for coconuts q and rest R. Isoclines of the utility function are shown as contours known as “indifference” curves, because the utility is constant along these curves and hence Robinson Crusoe is indifferent to his position on it. The indifference curves are cut by the production curve q(R). The equilibrium problem is to maximize utility subject to the production curve.

Fig. 1 The production space of the Robinson Crusoe economy. The production curve q(R) cuts across the isoclines of the utility function U(q,R). The contours represent “indifference” curves because the utility is constant along a contour.

When looking at dynamics under constraints, Lagrange multipliers are the best tool. Furthermore, we can impart dynamics into the model with temporal adjustments in q and R that respond to economic forces.

The Lagrangian Economy

The approach to the Lagrangian economy is identical to the Lagrangian approach in classical physics. The equation of constraint is

All the dynamics take place on the production curve. The initial condition starts on the curve, and the state point moves along the curve until it reaches a maximum and settles into equilibrium. The dynamics is therefore one-dimensional, the link between q and R being the production curve.

The Lagrangian in this simple economy is given by the utility function augmented by the equation of constraint, such that

where λ is the Lagrangian undetermined multiplier. The Euler-Lagrange equations for dynamics are

where the term on the right-hand-side is a drag force with the relaxation rate γ.

The first term on the left is the momentum of the system. In economic dynamics, this is usually negligible, similar to dynamics in living systems at low Reynold’s number in which all objects are moving instantaneously at their terminal velocity in response to forces. The equations of motion are therefore

The Lagrange multiplier can be solved from the first equation as

and the last equation converts q-dot to R-dot to yield the single equation

which is a one-dimensional flow

where all q’s are expressed as R’s through the equation of constraint. The speed vanishes at the fixed point—the economic equilibrium—when

This is the point of Pareto efficient allocation. Any initial condition on the production curve will relax to this point with a rate given by γ. These trajectories are shown in Fig. 2. From the point of view of Robinson Crusoe, if he is working harder than he needs, then he will slack off. But if there aren’t enough coconuts to make him happy, he will work harder.

Fig. 2 Motion occurs on the one-dimensional manifold defined by the production curve such that the utility is maximized at a unique point called the Pareto Efficient Allocation.

The production curve is like a curved wire, the amount of production q is like the bead sliding on the wire. The utility function plays the role of a potential function, and the gradients of the utility function play the role of forces. Then this simple economic model is just like ordinary classical physics of point masses responding to forces constrained to lie on certain lines or surfaces. From this viewpoint, physics and economics are literally the same.

Worked Example

To make this problem specific, consider a utility function given by

that has a maximum in the upper right corner, and a production curve given by

that has diminishing returns. Then, the condition of equilibrium can be solved using

to yield

With the (fairly obvious) answer

For More Reading

[1] D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 2nd ed. Oxford : Oxford University Press (2019).

[2] Fritz Söllner; The Use (and Abuse) of Robinson Crusoe in Neoclassical Economics. History of Political Economy; 48 (1): 35–64. (2016)