Symmetry is the canvas upon which the laws of physics are written. Symmetry defines the invariants of dynamical systems. But when symmetry breaks, the laws of physics break with it, sometimes in dramatic fashion. Take the Big Bang, for example, when a highly-symmetric form of the vacuum, known as the “false vacuum”, suddenly relaxed to a lower symmetry, creating an inflationary cascade of energy that burst forth as our Universe.
The early universe was extremely hot and energetic, so much so that all the forces of nature acted as one, described by a unified Lagrangian (as yet resisting discovery by theoretical physicists) of the highest symmetry. Yet as the universe expanded and cooled, the symmetry of the Lagrangian broke, and the unified force split in two (gravity and the electro-nuclear force). As the universe cooled further, the Lagrangian (of the Standard Model) lost more symmetry as the electro-nuclear force split into the strong nuclear force and the electroweak force. Finally, at a tiny fraction of a second after the Big Bang, the universe cooled enough that the unified electroweak force broke into the electromagnetic force and the weak nuclear force. At each stage, spontaneous symmetry breaking occurred, invariants of physics were lost, and new behavior emerged. In 2008, Yoichiro Nambu received the Nobel Prize in Physics for his model of spontaneous symmetry breaking in subatomic physics.
Physics is filled with examples of spontaneous symmetry breaking. Crystallization and phase transitions are common examples. When the temperature is lowered on a fluid of molecules with high average local symmetry, the molecular interactions can suddenly impose lower-symmetry constraints on relative positions, and the liquid crystallizes into an ordered crystal. Even solid crystals can undergo a phase transition as one symmetry becomes energetically advantageous over another, and the crystal can change to a new symmetry.
In mechanics, any time a potential function evolves slowly with some parameter, it can start with one symmetry and evolve to another lower symmetry. The mechanical system governed by such a potential may undergo a discontinuous change in behavior.
In complex systems and chaos theory, sudden changes in behavior can be quite common as some parameter is changed continuously. Such a discontinuous change in behavior, in response to a continuous change in a control parameter, is known as a bifurcation. There are many types of bifurcation, carrying descriptive names like the pitchfork bifurcation, period-doubling bifurcation, Hopf bifurcation, and fold bifurcation, among others. The pitchfork bifurcation is a typical example, shown in Fig. 2. As a parameter is changed continuously (horizontal axis), a stable fixed point suddenly becomes unstable, and two new stable fixed points emerge at the same time. This type of bifurcation is called a pitchfork because the diagram looks like a three-tined pitchfork. (This is technically a supercritical pitchfork bifurcation; in a subcritical pitchfork bifurcation the solid and dashed lines are swapped.) This is exactly the bifurcation displayed by a simple mechanical model that illustrates spontaneous symmetry breaking.
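The supercritical pitchfork is easy to verify numerically. Here is a minimal sketch, assuming the standard normal form dx/dt = r·x – x^3 (which is not stated explicitly in the text), where r plays the role of the control parameter:

```python
import math

def fixed_points(r):
    """Fixed points of the pitchfork normal form dx/dt = r*x - x**3."""
    if r <= 0:
        return [0.0]
    return [-math.sqrt(r), 0.0, math.sqrt(r)]

def is_stable(x, r):
    """Linear stability: f'(x) = r - 3*x**2 < 0 means stable."""
    return r - 3 * x**2 < 0

print(fixed_points(-1.0))        # one stable point at the origin
for x in fixed_points(1.0):      # origin destabilizes, a stable pair appears
    print(x, is_stable(x, 1.0))
```

For r < 0 the origin is the only (stable) fixed point; as r passes through zero it destabilizes and the pair at ±sqrt(r) appears, tracing out the three tines of the pitchfork.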
Sliding Mass on a Rotating Hoop
One of the simplest mechanical models that displays spontaneous symmetry breaking and the pitchfork bifurcation is a bead sliding without friction on a circular hoop that spins about its vertical axis, as in Fig. 3. When the hoop spins very slowly, this is just a simple pendulum with a stable equilibrium at the bottom, and the bead oscillates with a natural frequency ω0 = sqrt(g/b), where b is the radius of the hoop and g is the acceleration due to gravity. On the other hand, when the hoop spins very fast, the bead is flung to one side or the other by the centrifugal force. The bead then oscillates around one of the two new stable fixed points, while the fixed point at the bottom of the hoop becomes unstable, because any deviation to one side or the other is amplified by the centrifugal force. (Note that the centrifugal force is a non-inertial force that arises in the rotating body frame.)
The solution uses the Euler equations for the body frame along principal axes. In order to use the standard definitions of ω1, ω2, and ω3, the angle θ must be rotated around the x-axis, which means the x-axis points out of the page in the diagram. The y-axis is tilted up from the horizontal by θ, and the z-axis is tilted from the vertical by θ. This establishes the body frame.
The components of the angular velocity are

ω1 = dθ/dt
ω2 = ω sinθ
ω3 = ω cosθ
And the moments of inertia are (assuming the bead is small)

I1 = I2 = mb^2        I3 = 0
There is only one non-trivial Euler equation: the one for the x-axis and the angle θ. With the gravitational torque τ1 = –mgb sinθ, the x-axis Euler equation is

I1 dω1/dt – (I2 – I3)ω2ω3 = τ1

which becomes

mb^2 d2θ/dt2 – mb^2 ω^2 sinθ cosθ = –mgb sinθ
and solving for the angular acceleration gives

d2θ/dt2 = sinθ (ω^2 cosθ – ω0^2)
This is a harmonic oscillator with a “phase transition” that occurs as ω increases from zero. At first the stable equilibrium is at the bottom. But when ω passes a critical threshold, the equilibrium angle moves up to a finite angle set by the rotation speed,

cosθ0 = ω0^2/ω^2 = g/(bω^2)
This can only be real if the magnitude of the argument is equal to or less than unity, which sets the critical spin rate, beyond which the system moves to one of the new stable points on either side, at

ωc = sqrt(g/b)
which interestingly is the natural frequency of the non-rotating pendulum. Note that there are two equivalent angles (positive and negative), so this problem has a degeneracy.
This is an example of a dynamical phase transition that leads to spontaneous symmetry breaking and a pitchfork bifurcation. By integrating the angular acceleration we can get the effective potential for the problem. One contribution to the potential is due to gravity. The other is centrifugal force. When combined and plotted in Fig. 4 for a family of values of the spin rate ω, a pitchfork emerges naturally by tracing the minima in the effective potential. The values of the new equilibrium angles are given in Fig. 2.
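The pitchfork of minima can be traced directly. A short numerical sketch, using the standard bead-on-hoop result that the tilted equilibrium satisfies cosθ0 = g/(bω^2) above threshold (the values g = 9.8 m/s^2 and b = 1 m are illustrative choices, not from the text):

```python
import math

g, b = 9.8, 1.0                 # illustrative values
w0 = math.sqrt(g / b)           # natural pendulum frequency

def equilibria(w):
    """Stable equilibrium angle(s) of the bead at spin rate w (rad/s)."""
    if w <= w0:
        return [0.0]                      # below threshold: bottom of the hoop
    th = math.acos(w0**2 / w**2)          # cos(theta0) = g/(b*w^2)
    return [-th, th]                      # two symmetric minima: the pitchfork

for w in (0.5 * w0, 1.1 * w0, 2.0 * w0, 4.0 * w0):
    print(round(w, 3), [round(t, 3) for t in equilibria(w)])
```

Sweeping w through the threshold reproduces the pitchfork diagram: one branch at θ = 0 that splits into two symmetric branches at ω = ω0.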
Below the transition threshold for ω, the bottom of the hoop is the equilibrium position. To find the natural frequency of oscillation, expand the acceleration expression for small θ

d2θ/dt2 ≈ (ω^2 – ω0^2) θ
For small oscillations the natural frequency is given by

Ω = sqrt(ω0^2 – ω^2)
As the effective potential gets flatter, the natural oscillation frequency decreases until it vanishes at the transition spin frequency. As the hoop spins even faster, the new equilibrium positions emerge. To find the natural frequency of the new equilibria, expand θ around the new equilibrium θ’ = θ – θ0

d2θ’/dt2 ≈ –(ω^2 – ω0^4/ω^2) θ’
which is a harmonic oscillator with oscillation angular frequency

Ω = sqrt(ω^2 – ω0^4/ω^2)
Note that this is zero frequency at the transition threshold, then rises to match the spin rate of the hoop at high frequency. The natural oscillation frequency as a function of the spin looks like Fig. 5.
This mechanical analog is highly relevant for the spontaneous symmetry breaking that occurs in ferroelectric crystals when they go through a ferroelectric transition. At high temperature, these crystals have no internal polarization. But as the crystal cools towards the ferroelectric transition temperature, the optical phonon modes “soften” as the phonon frequency decreases, vanishing at the transition temperature when the crystal spontaneously polarizes in one of several equivalent directions. The observation of mode softening in a polar crystal is one signature of an impending ferroelectric phase transition. Our mass on the hoop captures this qualitative physics nicely.
For fun, let’s find the spin frequency at which the harmonic oscillation frequency at the displaced equilibria equals the original natural frequency of the pendulum. Setting Ω = ω0 gives

ω^2 – ω0^4/ω^2 = ω0^2

and substituting x = ω^2/ω0^2 turns this into x^2 – x – 1 = 0, whose positive solution is

ω^2/ω0^2 = (1 + sqrt(5))/2 ≈ 1.618
which is the golden ratio. It’s spooky how often the golden ratio appears in random physics problems!
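This can be checked numerically: with Ω^2 = ω^2 – ω0^4/ω^2 for the oscillation about the tilted equilibrium, setting ω^2/ω0^2 equal to the golden ratio does indeed return Ω = ω0. A quick sketch:

```python
import math

phi = (1 + math.sqrt(5)) / 2        # golden ratio

def Omega(w, w0=1.0):
    """Oscillation frequency about the tilted equilibrium:
    Omega^2 = w^2 - w0^4 / w^2."""
    return math.sqrt(w**2 - w0**4 / w**2)

w0 = 1.0
w = math.sqrt(phi) * w0             # spin rate with w^2 / w0^2 = golden ratio
print(Omega(w, w0))                 # recovers w0, the pendulum's own frequency
```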
The most energetic physical processes in the universe (shy of the Big Bang itself) are astrophysical jets. These are relativistic beams of ions and radiation that shoot out across intergalactic space, emitting nearly the full spectrum of electromagnetic radiation, seen as quasars (quasi-stellar objects) that are thought to originate from supermassive black holes at the center of distant galaxies. The most powerful jets emit more energy than the light from a thousand Milky Way galaxies.
Where can such astronomical amounts of energy come from?
Black Hole Accretion Disks
The potential wells of black holes are so deep and steep that they attract matter from their entire neighborhood. If a star comes too close, the black hole can rip hydrogen and helium atoms off the star’s surface and suck them into a death spiral that can only end in oblivion beyond the Schwarzschild radius.
However, just before they disappear, these atoms and ions make one last desperate stand to resist the inevitable pull: they park themselves in an orbit just stable enough that they can survive many revolutions before losing too much energy, through collisions with the other atoms and ions, and resuming their in-spiral. This last orbit, called the innermost stable circular orbit (ISCO), is where matter accumulates into an accretion disk.
The Innermost Stable Circular Orbit (ISCO)
At what radius is the innermost stable circular orbit? To find out, write the energy equation of a particle orbiting a black hole with an effective potential function as

E = (1/2)m (dr/dt)^2 + Veff(r)
where the effective potential is

Veff(r) = –GMm/r + L^2/(2mr^2) – GML^2/(mc^2 r^3)
The first two terms of the effective potential are the usual Newtonian terms that include the gravitational potential and the repulsive contribution from the angular momentum that normally prevents the mass from approaching the origin. The third term is the GR term that is attractive and overcomes the centrifugal barrier at small values of r, allowing the orbit to collapse to the center. This is the essential danger of orbiting a black hole—not all orbits around a black hole are stable, and even circular orbits will decay and be swallowed up if too close to the black hole.
To find the conditions for circular orbits, take the derivative of the effective potential and set it to zero

dVeff/dr = GMm/r^2 – L^2/(mr^3) + 3GML^2/(mc^2 r^4) = 0
Multiplying through by r^4 turns this into a quadratic equation that can be solved for r. The innermost stable circular orbit (ISCO) is obtained when the term in the square root of the quadratic formula vanishes, which happens when the angular momentum satisfies the condition

L^2 = 12 G^2M^2m^2/c^2
which gives the simple result for the innermost stable circular orbit as

RISCO = 6GM/c^2 = 3RS
Therefore, no particle can sustain a stable circular orbit with a radius closer than three times the Schwarzschild radius; inside that, it will spiral into the black hole.
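The ISCO condition is easy to check numerically. A sketch in geometrized units, G = M = c = m = 1 (a unit choice made here for convenience, in which the Schwarzschild radius is Rs = 2), using the effective potential with the attractive GR term:

```python
import math

# Effective potential V(r) = -1/r + L^2/(2 r^2) - L^2/r^3 in units
# G = M = c = m = 1, for which the Schwarzschild radius is Rs = 2.
def dVdr(r, L):
    return 1.0 / r**2 - L**2 / r**3 + 3.0 * L**2 / r**4

# Circular orbits are the roots of r^2 - L^2 r + 3 L^2 = 0 (dVdr times r^4).
L_isco = 2 * math.sqrt(3)            # discriminant L^4 - 12 L^2 vanishes here
disc = L_isco**4 - 12 * L_isco**2
r_isco = L_isco**2 / 2               # the double root of the quadratic
print(disc, r_isco)                  # ~0 and 6, i.e. 3 Rs
```

The two circular-orbit radii (one stable, one unstable) merge at exactly r = 6, three Schwarzschild radii, when the discriminant vanishes.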
A single trajectory solution to the GR flow is shown in Fig. 4. The particle begins in an elliptical orbit outside the innermost stable circular orbit and is captured into a nearly circular orbit inside the ISCO. This orbit eventually decays and spirals with increasing speed into the black hole. Accretion disks around black holes occupy these orbits before collisions cause the particles to lose angular momentum and spiral into the black hole.
The gravity of black holes is so great that even photons can orbit them in circular orbits. The radius of the circular photon orbit defines what is known as the photon sphere. The radius of the photon sphere is RPS = 1.5RS, just a factor of 2 smaller than the ISCO.
Binding Energy of a Particle at the ISCO
So where does all the energy come from to power astrophysical jets? The explanation comes from the binding energy of a particle at the ISCO. The energy conservation equation, including angular momentum, for a massive particle of mass m orbiting a black hole of mass M is

(1/2)m (dr/dt)^2 – GMm/r + L^2/(2mr^2) – GML^2/(mc^2 r^3) = (1/2)m v∞^2

where the term on the right is the kinetic energy of the particle at infinity, and the last three terms on the left make up the effective potential.
Solving for the binding energy at the ISCO gives

EB = –Veff(RISCO) = mc^2/18 ≈ 0.056 mc^2
Therefore, nearly 6% of the rest energy of the object is given up as it spirals in to the ISCO. Remember that the fusion of hydrogen into helium releases only about 0.7% of the rest mass energy. Therefore, the energy emission per nucleon for an atom falling towards the ISCO is nearly TEN times more efficient than nuclear fusion!
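The 6% figure follows directly from evaluating the effective potential at the ISCO. A quick check, again in the geometrized units G = M = c = m = 1 (an assumption made for convenience), in which energies are fractions of the rest energy mc^2:

```python
import math

# Units G = M = c = m = 1; energies come out as fractions of mc^2.
def Veff(r, L):
    return -1.0 / r + L**2 / (2 * r**2) - L**2 / r**3

L_isco = 2 * math.sqrt(3)
E = Veff(6.0, L_isco)        # circular-orbit energy at the ISCO
print(E)                     # -1/18 ~ -0.056: about 6% of mc^2 is released
print(-E / 0.007)            # ~8x the ~0.7% of mc^2 released by fusion
```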
This incredible energy resource is where the energy for galactic jets and quasars comes from.
These equations apply for particles that are nonrelativistic. Relativistic effects become important when the orbital radius of the particle approaches the Schwarzschild radius, which introduces corrections to these equations.
One of the most important conclusions from chaos theory is that not all random-looking processes are actually random. In deterministic chaos, structures such as strange attractors are not random at all but are fractal structures determined uniquely by the dynamics. But sometimes, in nature, processes really are random, or at least have to be treated as such because of their complexity. Brownian motion is a perfect example of this. At the microscopic level, the jostling of the Brownian particle can be understood in terms of deterministic momentum transfers from liquid atoms to the particle. But there are so many liquid particles that their individual influences cannot be directly predicted. In this situation, it is more fruitful to view the atomic collisions as a stochastic process with well-defined physical parameters and then study the problem statistically. This is what Einstein did in his famous 1905 paper that explained the statistical physics of Brownian motion.
Then there is the middle ground between deterministic mechanics and stochastic mechanics, where complex dynamics gains a stochastic component. This is what Paul Langevin did in 1908 when he generalized Einstein’s treatment of Brownian motion.
Paul Langevin (1872 – 1946) had the good fortune to stand at the crossroads of modern physics, making key contributions while serving as a commentator expanding on the works of giants like Einstein, Lorentz, and Bohr. He was educated at the École Normale Supérieure and at the Sorbonne, with a year in Cambridge studying with J. J. Thomson. At the Sorbonne he worked in the laboratory of Jean Perrin (1870 – 1942), who received the Nobel Prize in 1926 for the experimental work on Brownian motion that had set the stage for Einstein’s crucial analysis of the problem, confirming the atomic nature of matter.
Langevin received his PhD in 1902 on the topic of x-ray ionization of gases and was appointed as a lecturer at the Collège de France to substitute for Éleuthère Mascart (an influential French physicist in optics). In 1905 Langevin published several papers that delved into the problems of Lorentz contraction, coming very close to expressing the principles of relativity. This work later led Einstein to say that, had he delayed publishing his own 1905 paper on the principles of relativity, then Langevin might have gotten there first.
Also in 1905, Langevin published his most influential work, providing the theoretical foundations for the physics of paramagnetism and diamagnetism. He was working closely with Pierre Curie whose experimental work on magnetism had established the central temperature dependence of the phenomena. Langevin used the new molecular model of matter to derive the temperature dependence as well as the functional dependence on magnetic field. One surprising result was that only the valence electrons, moving relativistically, were needed to contribute to the molecular magnetic moment. This later became one of the motivations for Bohr’s model of multi-electron atoms.
Langevin suffered personal tragedy during World War II when the Vichy government arrested him because of his outspoken opposition to fascism. He was imprisoned and eventually released to house arrest. In 1942, his son-in-law was executed by the Nazis, and in 1943 his daughter was sent to Auschwitz. Fearing for his own life, Langevin escaped to Switzerland. He returned shortly after the liberation of Paris and was joined after the end of the war by his daughter who had survived Auschwitz and later served in the Assemblée Consultative as a communist member. Langevin passed away in 1946 and received a national funeral. His remains lie today in the Pantheon.
The Langevin Equation
In 1908, Langevin realized that Einstein’s 1905 theory of Brownian motion could be simultaneously simplified and generalized. Langevin introduced a new quantity into theoretical physics: the stochastic force. With this new theoretical tool, he was able to treat diffusing particles in momentum space as dynamical objects with inertia buffeted by random forces, providing a Newtonian formulation for the short-time effects that were averaged out and lost in Einstein’s approach.
Stochastic processes are understood by considering a dynamical flow that includes a random function. The resulting set of equations is called the Langevin equation, namely

dxa/dt = fa(x) + σa ξa(t)
where fa is a set of N regular functions, and σa is the standard deviation of the a-th process out of N. The stochastic functions ξa are in general non-differentiable but are integrable. They have zero mean, and no temporal correlations. The solution is an N-dimensional trajectory that has properties of a random walk superposed on the dynamics of the underlying mathematical flow.
As an example, take the case of a particle moving in a one-dimensional potential, subject to drag and to an additional stochastic force

dx/dt = v
dv/dt = –γv – (1/m) dU/dx + ξ(t)

with <ξ(t)ξ(t’)> = 2B δ(t – t’)
where γ is the drag coefficient, U is a potential function and B is the velocity diffusion coefficient. The second term in the bottom equation is the classical force from a potential function, while the third term is the stochastic force. The crucial point is that the stochastic force causes jumps in velocity that integrate into displacements, creating a random walk superposed on the deterministic mechanics.
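A Langevin equation of this form can be integrated with the Euler–Maruyama scheme, where the stochastic force contributes Gaussian velocity kicks that scale as sqrt(dt). A minimal sketch (all parameter values, and the default harmonic potential, are illustrative choices, not from the text):

```python
import math, random

def langevin(steps=10000, dt=1e-3, gamma=1.0, m=1.0, B=1.0,
             dUdx=lambda x: x, x0=0.0, v0=0.0, seed=1):
    """Euler-Maruyama integration of
        dx/dt = v
        dv/dt = -gamma*v - (1/m) dU/dx + xi(t),
    where xi has zero mean and <xi xi'> = 2B delta(t - t'), so each step
    adds a Gaussian velocity kick of variance 2*B*dt."""
    rng = random.Random(seed)
    x, v = x0, v0
    xs = []
    for _ in range(steps):
        v += (-gamma * v - dUdx(x) / m) * dt + math.sqrt(2 * B * dt) * rng.gauss(0, 1)
        x += v * dt
        xs.append(x)
    return xs

traj = langevin()            # harmonic potential U = x^2/2 by default
print(len(traj), traj[-1])
```

The deterministic drift is stepped with dt while the noise is stepped with sqrt(dt), which is the essential feature distinguishing stochastic from ordinary integration.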
Random Walk in a Harmonic Potential
Diffusion of a particle in a weak harmonic potential is equivalent to a mass on a weak spring in a thermal bath. For short times, the particle motion looks like a random walk, but for long times, the mean-squared displacement must satisfy the equipartition relation

(1/2)k<x2> = (1/2)kBT
The Langevin equation is the starting point of motion under a stochastic force F’

m d2x/dt2 = –γ dx/dt – kx + F’

m x d2x/dt2 = –γ x dx/dt – kx2 + xF’
where the second equation has been multiplied through by x. For a spherical particle of radius a, the viscous drag factor is

γ = 6πηa
and η is the viscosity. The term on the left of the dynamical equation can be rewritten to give

(m/2) d2(x2)/dt2 – m(dx/dt)^2 = –(γ/2) d(x2)/dt – kx2 + xF’
It is then necessary to take averages. The last term on the right vanishes because of the random signs of xF’. However, the buffeting from the random force can be viewed as arising from an effective temperature. Then from equipartition on the velocity

<m(dx/dt)^2> = kBT
Making the substitution y = <x2> gives

(m/2) d2y/dt2 + (γ/2) dy/dt + ky = kBT
which is the dynamical equation for a particle in a harmonic potential subject to a constant effective force kBT. For small objects in viscous fluids, the inertial term is negligible relative to the others (see Life at Low Reynolds Number), so the dynamical equation is

(γ/2) dy/dt + ky = kBT

with the solution

y(t) = (kBT/k)[1 – exp(–2kt/γ)]
At short times this solution describes a diffusing particle (Fickian behavior) with diffusion coefficient D = kBT/γ, since <x2> ≈ 2Dt. However, for long times the solution asymptotes to the equipartition value <x2> = kBT/k. In the intermediate time regime, the particle is still walking randomly, but the mean-squared displacement is no longer growing linearly with time.
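The crossover from Fickian diffusion to equipartition saturation can be read off the overdamped solution <x2>(t) = (kBT/k)[1 – exp(–2kt/γ)]. A quick numerical check (parameter values are illustrative):

```python
import math

kBT, k, gamma = 1.0, 0.5, 2.0
D = kBT / gamma                      # diffusion coefficient kBT/gamma

def msd(t):
    """Overdamped mean-squared displacement in a harmonic trap."""
    return (kBT / k) * (1 - math.exp(-2 * k * t / gamma))

t = 1e-4
print(msd(t) / (2 * D * t))          # ~1: Fickian regime, <x^2> ~ 2 D t
print(msd(100.0), kBT / k)           # saturates at the equipartition value
```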
Constrained motion shows clear saturation to the size set by the physical constraints (equipartition for an oscillator, or compartment size for a freely diffusing particle). However, if the experimental data do not clearly extend into the saturation time regime, then a fit to anomalous diffusion can yield exponents that do not equal unity. This is illustrated in Fig. 3, which compares the asymptotic MSD with the anomalous-diffusion fit for the exponent β. Care must be exercised in the interpretation of exponents obtained from anomalous diffusion experiments; in particular, all constrained motion leads to subdiffusive interpretations if measured at intermediate times.
Random Walk in a Double-Well Potential
The harmonic potential has well-known asymptotic dynamics, which makes the analytic treatment straightforward. However, the Langevin equation is general and can be applied to any potential function. Take a double-well potential as another example.
The resulting Langevin equation can be solved numerically in the presence of random velocity jumps. A specific stochastic trajectory is shown in Fig. 4 that applies discrete velocity jumps using a normal distribution of jumps of variance 2B. The notable character of this trajectory, besides the random-walk character, is the ability of the particle to jump the barrier between the wells. In the deterministic system, the initial condition dictates which stable fixed point would be approached. In the stochastic system, there are random fluctuations that take the particle from one basin of attraction to the other.
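A sketch of such a stochastic double-well trajectory, assuming the common quartic form U(x) = x^4/4 – x^2/2 (a hypothetical choice; the text does not specify its potential), with noise strong enough that barrier hops are frequent:

```python
import math, random

def double_well_walk(steps=200000, dt=1e-3, gamma=1.0, B=4.0, seed=2):
    """Langevin walk in the (assumed) double-well U(x) = x**4/4 - x**2/2,
    which has minima at x = +/-1 separated by a barrier at x = 0."""
    rng = random.Random(seed)
    x, v = 1.0, 0.0                       # start in the right-hand well
    xs = []
    for _ in range(steps):
        force = x - x**3                  # -dU/dx
        v += (-gamma * v + force) * dt + math.sqrt(2 * B * dt) * rng.gauss(0, 1)
        x += v * dt
        xs.append(x)
    return xs

xs = double_well_walk()
print(min(xs), max(xs))   # the noise carries the walker over the barrier
```

With the noise this strong the walker hops between basins many times; weaker noise would make the hops rare events.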
The stochastic long-time probability distribution p(x,v) in Fig. 5 introduces an interesting new view of trajectories in state space that have a different character than typical state-space flows. If we think about starting a large number of systems with the same initial conditions, and then letting the stochastic dynamics take over, we can define a time-dependent probability distribution p(x,v,t) that describes the likely end-positions of an ensemble of trajectories on the state plane as a function of time. This introduces the idea of the trajectory of a probability cloud in state space, which has a strong analogy to time-dependent quantum mechanics. The Schrödinger equation can be viewed as a diffusion equation in complex time, which is the basis of a technique known as quantum Monte Carlo that solves for ground state wave functions using concepts of random walks. This goes beyond the topics of classical mechanics, and it shows how such diverse fields as econophysics, diffusion, and quantum mechanics can share common tools and language.
“Stochastic Chaos” sounds like an oxymoron. “Chaos” is usually synonymous with “deterministic chaos”, meaning that every next point on a trajectory is determined uniquely by its previous location; there is nothing random about the evolution of the dynamical system. It is only at long times, or when comparing two nearby trajectories, that non-repeatable and non-predictable behavior emerges, but there is still nothing stochastic about it.
On the other hand, there is nothing wrong with adding a stochastic function to the right-hand side of a deterministic flow, just as in the Langevin equation. One question immediately arises: if chaos has sensitivity to initial conditions (SIC), wouldn’t it be highly susceptible to constant buffeting by a stochastic force? Let’s take a look!
To the well-known Rössler model, add a stochastic function to one of the three equations,

dx/dt = –y – z
dy/dt = x + ay + σξ(t)
dz/dt = b + z(x – c)
in this case to the y-dot equation. This is just like the stochastic term in the random walks in the harmonic and double-well potentials. The solution is shown in Fig. 6. In addition to the familiar time-series of the Rössler model, there are stochastic jumps in the y-variable. An x-y projection similarly shows the familiar signature of the model, and the density of trajectory points is shown in the density plot on the right. The rms jump size for this simulation is approximately 10%.
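A minimal version of this simulation, using the standard Rössler parameters a = b = 0.2, c = 5.7 (assumed here; the text does not list its parameter values) and Gaussian kicks added to the y-variable:

```python
import math, random

def rossler_stochastic(steps=200000, dt=0.005, a=0.2, b=0.2, c=5.7,
                       sigma=0.1, seed=3):
    """Euler integration of the Rossler flow with a Gaussian stochastic
    term added to the y-equation only."""
    rng = random.Random(seed)
    x, y, z = 1.0, 1.0, 0.0
    traj = []
    for _ in range(steps):
        dx = -y - z
        dy = x + a * y
        dz = b + z * (x - c)
        x += dx * dt
        y += dy * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
        z += dz * dt
        traj.append((x, y, z))
    return traj

traj = rossler_stochastic()
print(max(abs(p[0]) for p in traj))   # the attractor remains bounded
```

For modest noise the trajectory stays on a slightly fuzzed-out version of the strange attractor; cranking sigma up eventually kicks the state past the saddle and the solution diverges.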
Now for the supposition that, because chaos has sensitivity to initial conditions, it should be highly susceptible to stochastic contributions: the answer can be seen in Fig. 7 in the state-space densities. Other than a slightly fuzzier density for the stochastic case, the general behavior of the Rössler strange attractor is retained. The attractor is highly stable against the stochastic fluctuations. This demonstrates just how robust deterministic chaos is.
On the other hand, there is a saddle point in the Rössler dynamics a bit below the lowest part of the strange attractor in the figure, and if the stochastic jumps are too large, then the dynamics become unstable and diverge. A hint at this is already seen in the time series in Fig. 6 that shows the nearly closed orbit that occurs transiently at large negative y values. This is near the saddle point, and this trajectory is dangerously close to going unstable. Therefore, while the attractor itself is stable, anything that drives a dynamical system to a saddle point will destabilize it, so too much stochasticity can cause a sudden destruction of the attractor.
E. M. Purcell, “Life at low Reynolds number,” American Journal of Physics, vol. 45, no. 1, pp. 3–11 (1977)
K. Ritchie, X. Y. Shan, J. Kondo, K. Iwasawa, T. Fujiwara, and A. Kusumi, “Detection of non-Brownian diffusion in the cell membrane in single molecule tracking,” Biophysical Journal, vol. 88, no. 3, pp. 2266–2277 (2005)
Physics in high dimensions is becoming the norm in modern dynamics. It is not only that string theory operates in ten dimensions (plus one for time), but virtually every complex dynamical system is described and analyzed within state spaces of high dimensionality. Population dynamics, for instance, may describe hundreds or thousands of different species, each of whose time-varying populations define a separate axis in a high-dimensional space. Coupled mechanical systems likewise may have hundreds or thousands (or more) of degrees of freedom that are described in high-dimensional phase space.
In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.
For these reasons, as physics students today are being increasingly exposed to the challenges and problems of high-dimensional dynamics, it is important to build tools they can use to give them an intuitive feeling for the highly unintuitive behavior of systems in high-D.
Within the rapidly-developing field of machine learning, which often deals with landscapes (loss functions or objective functions) in high dimensions that need to be minimized, high dimensions are usually referred to in the negative as “The Curse of Dimensionality”.
Dimensionality might be viewed as a curse for several reasons. First, it is almost impossible to visualize data in dimensions higher than d = 4 (the fourth dimension can sometimes be visualized using colors or time series). Second, too many degrees of freedom create too many variables to fit or model, leading to the classic problem of overfitting. Put simply, there is an absurdly large amount of room in high dimensions. Third, our intuition about relationships among areas and volumes is highly biased by our low-dimensional 3D experience, causing serious misconceptions about geometric objects in high-dimensional spaces. Physical processes occurring in 3D can be over-generalized into preconceived notions that simply don’t hold true in higher dimensions.
Take, for example, the random walk. It is usually taught starting from a 1-dimensional random walk (flipping a coin) that is then extended to 2D and then to 3D, with most textbooks stopping there. But random walks in high dimensions are the rule rather than the exception in complex systems. One example that is especially important in this context is molecular evolution. Each site on a genome represents an independent degree of freedom, and molecular evolution can be described as a random walk through that space, but the space of all possible genetic mutations is enormous. Faced with such an astronomically large set of permutations, it is difficult to conceive how random mutations could possibly create something as complex as, say, ATP synthase, which is the basis of all higher bioenergetics. Fortunately, the answer to this puzzle lies in the physics of random walks in high dimensions.
Why Ten Dimensions?
This blog presents the physics of random walks in 10 dimensions. Actually, there is nothing special about 10 dimensions versus 9 or 11 or 20, but it gives a convenient demonstration of high-dimensional physics for several reasons. First, it is high enough above our 3 dimensions that there is no hope to visualize it effectively, even by using projections, so it forces us to contend with the intrinsic “unvisualizability” of high dimensions. Second, ten dimensions is just big enough that it behaves roughly like any higher dimension, at least when it comes to random walks. Third, it is about as big as can be handled with typical memory sizes of computers. For instance, a ten-dimensional hypercubic lattice with 10 discrete sites along each dimension has 10^10 lattice points (10 Billion or 10 Gigs) which is about the limit of what a typical computer can handle with internal memory.
As a starting point for visualization, let’s begin with the well-known 4D hypercube but extend it to a 4D hyperlattice with three values along each dimension instead of two. The resulting 4D lattice can be displayed in 2D as a network with 3^4 = 81 nodes and 216 links or edges. The result is shown in Fig. 1, represented in two dimensions as a network graph with nodes and edges. Each node links to its nearest neighbors along each of the four dimensions (four links for a corner node, up to eight for an interior node). Despite the apparent 3D look that this graph has about it, if you look closely you will see the frustration that occurs when trying to link to neighbors in four dimensions, causing many long-distance links.
We can also look at a 10D hypercube that has 2^10 = 1024 nodes and 5120 edges, shown in Fig. 2. It is a bit difficult to see the hypercubic symmetry when presented in 2D, but each node has exactly 10 links.
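The node and edge counts quoted here follow from a simple formula: a d-dimensional hypercube graph has 2^d nodes, each of degree d, hence d·2^(d–1) edges. In code:

```python
def hypercube_counts(d):
    """A d-dimensional hypercube graph has 2**d nodes, each of degree d,
    and therefore d * 2**(d-1) edges (each edge is shared by two nodes)."""
    return 2**d, d * 2**(d - 1)

print(hypercube_counts(4))    # (16, 32): the ordinary 4D hypercube
print(hypercube_counts(10))   # (1024, 5120), matching the counts for Fig. 2
```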
Extending this 10D lattice to 10 positions instead of 2 and trying to visualize it is prohibitive, since the resulting graph in 2D just looks like a mass of overlapping circles. However, our interest extends not just to ten locations per dimension, but to an unlimited number of locations. This is the 10D infinite lattice on which we want to explore the physics of the random walk.
Diffusion in Ten Dimensions
An unconstrained random walk in 10D is just a minimal extension beyond a simple random walk in 1D. Because each dimension is independent, a single random walker takes a random step along any of the 10 dimensions at each iteration so that motion in any one of the 10 dimensions is just a 1D random walk. Therefore, a simple way to visualize this random walk in 10D is simply to plot the walk against each dimension, as in Fig. 3. There is one chance in ten that the walker will take a positive or negative step along any given dimension at each time point.
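A sketch of this walker, taking one unit step per iteration along a randomly chosen dimension, which makes the mean-squared displacement grow as <R^2> = n after n steps:

```python
import random

def walk_10d(steps, rng):
    """Random walker in 10D: each iteration takes a +/-1 step along one
    randomly chosen dimension."""
    pos = [0] * 10
    for _ in range(steps):
        pos[rng.randrange(10)] += rng.choice((-1, 1))
    return pos

rng = random.Random(4)
n, walkers = 100, 2000
msd = sum(sum(p * p for p in walk_10d(n, rng)) for _ in range(walkers)) / walkers
print(msd / n)   # close to 1, since <R^2> = n for unit steps
```

Because the dimensions are independent, the ensemble-averaged MSD is identical to that of a 1D walk with the same total number of steps.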
An alternate visualization of the 10D random walker is shown in Fig. 4 for the same data as Fig. 3. In this case the displacement is color coded, and each column is a different dimension. Time is on the vertical axis (starting at the top and increasing downward). This type of color map can easily be extended to hundreds of dimensions. Each row is the position vector of the single walker in the 10D space.
In the 10D hyperlattice in this section, all lattice sites are accessible at each time point, so there is no constraint preventing the walk from visiting a previously-visited node. There is a possible adjustment that can be made to the walk that prevents it from ever crossing its own path. This is known as a self-avoiding walk (SAW). In two dimensions, there is a major difference between the geometric and dynamical properties of an ordinary walk and a SAW. However, in dimensions larger than 4, it turns out that there are so many possibilities of where to go (high-dimensional spaces have so much free room) that it is highly unlikely that a random walk will ever cross itself. Therefore, in our 10D hyperlattice we do not need to make the distinction between an ordinary walk and a self-avoiding walk. However, there are other constraints that can be imposed that mimic how complex systems evolve in time, and these constraints can have important consequences, as we see next.
Random Walk in a Maximally Rough Landscape
In the infinite hyperlattice of the previous section, all lattice sites are the same and are all equally accessible. However, in the study of complex systems, it is common to assign a value to each node in a high-dimensional lattice. This value can be assigned by a potential function, producing a high-dimensional potential landscape over the lattice geometry. Or the value might be the survival fitness of a species, producing a high-dimensional fitness landscape that governs how species compete and evolve. Or the value might be a loss function (an objective function) in a minimization problem from multivariate analysis or machine learning. In all of these cases, the scalar value on the nodes defines a landscape over which a state point executes a walk. The question then becomes, what are the properties of a landscape in high dimensions, and how does it affect a random walker?
As an example, let’s consider a landscape that is completely random point-to-point. There are no correlations in this landscape, making it maximally rough. Then we require that a random walker walk along iso-potentials in this landscape, never increasing and never decreasing its potential. Beginning with our spatial intuition, living as we do in 3D space, we might be concerned that such a walker would quickly get confined to some small area of the landscape. Think of a 2D topo map with contour lines drawn on it: if we start at a certain elevation on a mountainside, and must walk along directions that maintain our elevation, we stay on a given contour and eventually come back to our starting point after circling the mountain peak. We are trapped! But this intuition, informed by our 3D lives, is misleading. What happens in our 10D hyperlattice?
To make the example easy to analyze, let’s assume that our potential function is restricted to N discrete values. This means that of the 10 neighbors to a given walker site, on average only 10/N are likely to have the same potential value as the given walker site. This constrains the available sites for the walker, and it converts the uniform hyperlattice into a hyperlattice site percolation problem.
Percolation theory is a fascinating topic in statistical physics. There are many deep concepts that come from asking simple questions about how nodes are connected across a network. The most important aspect of percolation theory is the concept of a percolation threshold. Starting with a complete network that is connected end-to-end, start removing nodes at random. For some critical fraction of nodes removed (on average) there will no longer be a single connected cluster that spans the network. This critical fraction is known as the percolation threshold. Above the percolation threshold, a random walker can get from one part of the network to another. Below the percolation threshold, the random walker is confined to a local cluster.
If a hyperlattice has N discrete values for the landscape potential (or height, or contour), and if a random walker can only move to a site that has the same value as the walker’s current site (i.e. remains on the level set), then only a fraction of the hyperlattice sites are available to the walker, and the question of whether the walker can find a path that spans the hyperlattice becomes simply a question of how the fraction of available sites relates to the percolation threshold.
The percolation threshold for hyperlattices is well known. For reasonably high dimensions, it is given to good accuracy by

$$p_c(d) \approx \frac{1}{2d - 1} + \frac{3}{2}\,\frac{1}{(2d - 1)^2}$$

where d is the dimension of the hyperlattice. For a 10D hyperlattice the percolation threshold is pc(10) = 0.0568, or about 6%. Therefore, if more than 6% of the sites of the hyperlattice have the same value as the walker’s current site, then the walker is free to roam about the hyperlattice.
If there are N = 5 discrete values for the potential, then 20% of the sites are available, which is above the percolation threshold, and walkers can go as far as they want. This statement holds true no matter what the starting value is. It might be 5, which means the walker is as high on the landscape as it can get. Or it might be 1, which means the walker is as low on the landscape as it can get. Yet even starting at the top, if the available site fraction is above the percolation threshold, then the walker can stay on the high mountain ridge, spanning the landscape. The same is true starting at the bottom of a valley. Therefore, mountain ridges are very common, as are deep valleys, and they allow full mobility about the geography. On the other hand, a so-called mountain peak would be a 5 surrounded by 4’s or lower. The odds of this happening in 10D are 0.2 × 0.8^10 ≈ 0.021, so the total density of mountain peaks, in a 10D hyperlattice with 5 potential values, is only about 2%. Therefore, mountain peaks are rare in 10D, while mountain ridges are common. In even higher dimensions, the percolation threshold decreases roughly inversely with the dimensionality, and mountain peaks become extremely rare and play virtually no part in walks about the landscape.
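The peak-density estimate can be checked by direct sampling. This sketch assumes, as in the text, that each site has 10 neighbors and that the 5 levels are independent and equally likely:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

N_levels = 5          # discrete potential values, equally likely
n_neighbors = 10      # neighbors per site, as in the text's hyperlattice
trials = 200_000

site = rng.integers(1, N_levels + 1, size=trials)
neighbors = rng.integers(1, N_levels + 1, size=(trials, n_neighbors))

# A "mountain peak" holds the top value while every neighbor is strictly lower
is_peak = (site == N_levels) & np.all(neighbors < N_levels, axis=1)
peak_density = is_peak.mean()     # analytic value: 0.2 * 0.8**10, about 0.021
```

With these assumptions the sampled density settles near the analytic value 0.2 × 0.8^10 ≈ 0.021.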
To illustrate this point, Fig. 5 is the same 10D network that is in Fig. 2, but only the nodes sharing the same value are shown for N = 5, which means that only 20% of the nodes are accessible to a walker who stays only on nodes with the same values. There is a “giant cluster” that remains connected, spanning the original network. If the original network is infinite, then the giant cluster is also infinite but contains a finite fraction of the nodes.
The quantitative details of the random walk can change depending on the proximity of the sub-networks (the clusters, the ridges, or the level sets) to the percolation threshold. For instance, a random walker in D = 10 with N = 5 is shown in Fig. 6. The diffusion is a bit slower than in the unconstrained walk of Figs. 3 and 4, but the ability to wander about the 10D space is retained.
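The constrained walk itself is also easy to sketch. Because the landscape is uncorrelated and self-crossings are vanishingly rare in 10D, neighbor values can be drawn fresh at each step, which is an approximation to walking on a fixed random landscape:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def level_set_walk(dim=10, n_levels=5, steps=2000):
    """Walk on an uncorrelated random landscape with n_levels discrete
    values, stepping only to neighbors that share the walker's level
    (taken as level 1; all levels are statistically equivalent)."""
    pos = np.zeros(dim, dtype=int)
    trace = [pos.copy()]
    for _ in range(steps):
        signs = rng.choice([-1, 1], size=dim)           # one candidate neighbor per axis
        vals = rng.integers(1, n_levels + 1, size=dim)  # random level at each candidate
        open_dirs = np.flatnonzero(vals == 1)           # neighbors on the level set
        if open_dirs.size > 0:
            axis = rng.choice(open_dirs)
            pos[axis] += signs[axis]
        trace.append(pos.copy())
    return np.array(trace)

path = level_set_walk()
```

With N = 5 levels, roughly 89% of steps find at least one equal-value neighbor (1 − 0.8^10 ≈ 0.89), so the walker keeps diffusing across the hyperlattice, only a bit more slowly than the unconstrained walk.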
This is then the general important result: In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.
Consequences for Evolution and Machine Learning
When the high-dimensional space is the space of possible mutations on a genome, and when the landscape is a fitness landscape that assigns a survival advantage for one mutation relative to others, then the random walk describes the evolution of a species across generations. The prevalence of ridges, or more generally level sets, in high dimensions has a major consequence for the evolutionary process, because a species can walk along a level set acquiring many possible mutations that have only neutral effects on the survivability of the species. At the same time, the genetic make-up is constantly drifting around in this “neutral network”, allowing the species’ genome to access distant parts of the space. Then, at some point, natural selection may tip the species up a nearby (but rare) peak, and a new equilibrium is attained for the species.
One of the early criticisms of fitness landscapes was the (erroneous) criticism that for a species to move from one fitness peak to another, it would have to go down and cross wide valleys of low fitness to get to another peak. But this was a left-over from thinking in 3D. In high-D, neutral networks are ubiquitous, and a mutation can take a step away from one fitness peak onto one of the neutral networks, which can be sampled by a random walk until the state is near some distant peak. It is no longer necessary to think in terms of high peaks and low valleys of fitness — just random walks. The evolution of extremely complex structures, like ATP synthase, can then be understood as a random walk along networks of nearly-neutral fitness — once our 3D biases are eliminated.
The same arguments hold for many situations in machine learning and especially deep learning. When training a deep neural network, there can be thousands of neural weights that need to be trained through the minimization of a loss function, also known as an objective function. The loss function is the equivalent of a potential, and minimizing the loss function over thousands of dimensions is the same problem as maximizing the fitness of an evolving species.
At first look, one might think that deep learning is doomed to failure. We have all learned, from the earliest days in calculus, that enough adjustable parameters can fit anything, but the fit is meaningless because it predicts nothing. Deep learning seems to be the worst example of this. How can fitting thousands of adjustable parameters be useful when the dimensionality of the optimization space is orders of magnitude larger than the degrees of freedom of the system being modeled?
The answer comes from the geometry of high dimensions. The prevalence of neutral networks in high dimensions gives lots of chances to escape local minima. In fact, local minima are actually rare in high dimensions, and when they do occur, there is a neutral network nearby onto which they can escape (if the effective temperature of the learning process is set sufficiently high). Therefore, despite the insanely large number of adjustable parameters, general solutions, that are meaningful and predictive, can be found by adding random walks around the objective landscape as a partial strategy in combination with gradient descent.
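A toy illustration of this strategy, not any particular training algorithm: pure gradient descent started exactly on a stationary point goes nowhere, while adding a random-walk kick carries the state along the nearly neutral directions and into a well. All parameter values here are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=11)

def loss(w):
    # Double well along w[0]; nearly flat ("neutral") in the other
    # 199 directions: a cartoon of a high-dimensional objective.
    return (w[0]**2 - 1.0)**2 + 1e-4 * np.sum(w[1:]**2)

def grad(w):
    g = 2e-4 * w.copy()
    g[0] = 4.0 * w[0] * (w[0]**2 - 1.0)
    return g

w = np.zeros(200)        # start on the stationary point at w = 0,
eta, sigma = 0.01, 0.05  # where the gradient vanishes exactly
for _ in range(2000):
    # gradient descent plus a random-walk kick: the noise moves the
    # state along the neutral directions and off the stationary point
    w = w - eta * grad(w) + sigma * rng.standard_normal(w.size)
```

After the loop, w[0] has settled near one of the wells at ±1, something the noiseless update could never do from this starting point.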
Given the superficial analogy of deep learning to the human mind, the geometry of random walks in ultra-high dimensions may partially explain our own intelligence and consciousness.
S. Gavrilets, Fitness Landscapes and the Origin of Species. Princeton University Press, 2004.
M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1983.
Imagine a space ship carrying a collection of highly accurate atomic clocks, set to arbitrary precision at the factory before launch. The clocks are lined up with precisely equal spacing along the axis of the space ship, which should allow the astronauts to study events in spacetime to high accuracy as they orbit neutron stars or black holes. Despite all this precision, spacetime itself will conspire to detune the clocks. Yet all is not lost. Using the physics of nonlinear synchronization, the astronauts can bring all the clocks together to a compromise frequency, locking all the clocks to a common rate. This blog post shows how this can happen.
Synchronization of Oscillators
The simplest synchronization problem is two “phase oscillators” coupled with a symmetric nonlinearity. The dynamical flow is

$$\dot{\theta}_1 = \omega_1 + g\sin(\theta_2 - \theta_1)$$

$$\dot{\theta}_2 = \omega_2 + g\sin(\theta_1 - \theta_2)$$

where the ωk are the individual angular frequencies and g is the coupling constant. The phase difference Δθ = θ2 − θ1 obeys

$$\dot{\Delta\theta} = \Delta\omega - 2g\sin\Delta\theta$$

so when 2g exceeds the frequency difference Δω = ω2 − ω1, the two oscillators, despite having different initial frequencies, find a stable fixed point and lock to a compromise frequency.
Taking this model to N phase oscillators creates the well-known Kuramoto model, which is characterized by a relatively sharp mean-field phase transition leading to global synchronization. The model averages the N phase oscillators to a mean field

$$K e^{i\Theta} = \frac{1}{N}\sum_{j=1}^{N} e^{i\theta_j} \qquad \bar{\omega} = \frac{1}{N}\sum_{j=1}^{N}\omega_j$$

where g is the coupling coefficient, K is the mean amplitude, Θ is the mean phase, and $\bar{\omega}$ is the mean frequency. The dynamics are given by

$$\dot{\theta}_k = \omega_k + g K \sin(\Theta - \theta_k)$$
The last equation is the final mean-field equation that synchronizes each individual oscillator to the mean field. For a large number of oscillators that are globally coupled to each other, increasing the coupling has little effect on the oscillators until a critical threshold is crossed, after which all the oscillators synchronize with each other. This is known as the Kuramoto synchronization transition, shown in Fig. 2 for 20 oscillators with uniformly distributed initial frequencies. Note that the critical coupling constant gc is roughly half of the spread of initial frequencies.
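The mean-field transition is easy to reproduce numerically. A minimal sketch with simple Euler integration; the frequency spread, coupling values, and random seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def kuramoto_order(g, omegas, dt=0.05, steps=4000):
    """Euler-integrate d(theta_k)/dt = omega_k + g*K*sin(Theta - theta_k)
    and return the final order parameter K (0 = incoherent, 1 = locked)."""
    theta = rng.uniform(0, 2*np.pi, size=omegas.size)
    for _ in range(steps):
        z = np.mean(np.exp(1j*theta))      # mean field K * exp(i*Theta)
        K, Theta = np.abs(z), np.angle(z)
        theta += dt * (omegas + g*K*np.sin(Theta - theta))
    return np.abs(np.mean(np.exp(1j*theta)))

omegas = np.linspace(-0.5, 0.5, 20)        # frequency spread of 1 rad/s
weak = kuramoto_order(0.05, omegas)        # well below threshold
strong = kuramoto_order(2.0, omegas)       # well above threshold
```

For 20 oscillators spread over 1 rad/s, a coupling g = 2 locks the population (K near 1), while g = 0.05 leaves it incoherent, consistent with the critical coupling being roughly half the frequency spread.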
The question that this blog seeks to answer is how this synchronization mechanism may be used in a space craft exploring the strong gravity around neutron stars or black holes. The key to answering this question is the metric tensor for this system, the Schwarzschild metric

$$ds^2 = -\left(1 - \frac{R_S}{r}\right)c^2\,dt^2 + \left(1 - \frac{R_S}{r}\right)^{-1}dr^2 + r^2\,d\Omega^2$$

where the first term is the time-like term g00 that affects ticking clocks, and the second term is the space-like term that affects the length of the space craft.
Kuramoto versus the Neutron Star
Consider the space craft holding a steady radius above a neutron star, as in Fig. 3. For simplicity, hold the craft stationary rather than in an orbit to remove the details of rotating frames. Because each clock is at a different gravitational potential, it runs at a different rate because of gravitational time dilation–clocks nearer to the neutron star run slower than clocks farther away. There is also a gravitational length contraction of the space craft, which modifies the clock rates as well.
The analysis starts by incorporating the first-order approximation of time dilation through the component g00. The component is brought in through the period of oscillations. All frequencies are referenced to the base oscillator that has the angular rate ω0, and the other frequencies are primed. As we consider oscillators higher in the space craft at positions R + h, the 1/(R+h) term in g00 decreases as does the offset between each successive oscillator.
The dynamical equations for a system of only two clocks, coupled through the constant k, are

$$\dot{\theta}_1 = \omega_1 + k\sin(\theta_2 - \theta_1)$$

$$\dot{\theta}_2 = \omega_2 + k\sin(\theta_1 - \theta_2)$$

These are combined into a single equation by considering the phase difference Δθ = θ2 − θ1

$$\dot{\Delta\theta} = \Delta\omega - 2k\sin\Delta\theta$$

The two clocks will synchronize to a compromise frequency for the critical coupling coefficient

$$k_c = \frac{\Delta\omega}{2}$$
Now, if there is a string of N clocks, as in Fig. 3, the question is how the frequencies spread out under gravitational time dilation, and what the entrainment of the frequencies to a common compromise frequency looks like. If the ship is located far from the neutron star, then the gravitational potential varies approximately linearly from one clock to the next, and coupling them produces the classic Kuramoto transition.
However, if the ship is much closer to the neutron star, so that the gravitational potential is no longer linear, then there is a “fan-out” of frequencies, with the bottom-most clock ticking much more slowly than the top-most clock. Coupling these clocks produces a modified, or “stretched”, Kuramoto transition as in Fig. 4.
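The fan-out itself follows directly from the Schwarzschild g00. This sketch computes the proper ticking rates of clocks spaced along the ship, referenced to the bottom clock (units with RS = 1; the clock spacing and ship length follow the text's setup):

```python
import numpy as np

def clock_rates(R0, ship_length, n_clocks=20, RS=1.0):
    """Proper ticking rates of clocks spaced along the ship axis,
    from the Schwarzschild g00, referenced to the bottom clock at
    radius R0 (units with RS = 1)."""
    h = np.linspace(0.0, ship_length, n_clocks)
    rate = np.sqrt(1.0 - RS/(R0 + h))
    return rate / rate[0]       # relative to the bottom clock

far = clock_rates(R0=8.0, ship_length=1.0)   # R0 = 8 RS: nearly linear fan-out
near = clock_rates(R0=4.0, ship_length=1.0)  # R0 = 4 RS: stretched fan-out
```

The fan-out at R0 = 4RS is both larger and more curved than at R0 = 8RS, which is what stretches the Kuramoto transition in Fig. 4.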
In the two examples in Fig. 4, the bottom-most clock is just above the radius of the neutron star (at R0 = 4RS for a solar-mass neutron star, where RS is the Schwarzschild radius) and at twice that radius (at R0 = 8RS). The length of the ship, along which the clocks are distributed, is RS in this example. This may seem unrealistically large, but we could imagine a regular-sized ship supporting a long stiff cable dangling below it composed of carbon nanotubes that has the clocks distributed evenly on it, with the bottom-most clock at the radius R0. In fact, this might be a reasonable design for exploring spacetime events near a neutron star (although even carbon nanotubes would not be able to withstand the strain).
Kuramoto versus the Black Hole
Against expectation, exploring spacetime around a black hole is actually easier than around a neutron star, because there is no physical surface at the Schwarzschild radius RS, and gravitational tidal forces can be small for large black holes. In fact, one of the most unintuitive aspects of black holes pertains to a space ship falling into one. A distant observer sees the space ship contracting to zero length and the clocks slowing down and stopping as the space ship approaches the Schwarzschild radius asymptotically, never crossing it. However, on board the ship, all appears normal as it crosses the Schwarzschild radius. To the astronaut inside, there is a gravitational potential inside the space ship that causes the clocks at the base to run more slowly than the upper clocks, and length contraction affects the spacing a little, but otherwise there is no singularity as the event horizon is passed. This appears as a classic “paradox” of physics, with two different observers seeing paradoxically different behaviors.
The resolution of this paradox lies in the differential geometry of the two observers. Each approximates spacetime with a Euclidean coordinate system that matches the local coordinates. The distant observer references the warped geometry to this “chart”, which produces the apparent divergence of the Schwarzschild metric at RS. However, the astronaut inside the space ship has her own flat chart to which she references the locally warped spacetime around the ship. Therefore, it is the differential changes, referenced to the ship’s coordinate origin, that capture gravitational time dilation and length contraction. Because the synchronization takes place in the local coordinate system of the ship, this is the coordinate system that goes into the dynamical equations for synchronization. Taking this approach, the shifts in the clock rates are given by the derivative of the metric as

$$\omega_n \approx \omega_0\left[1 + \frac{1}{2}\,\frac{g_{00}'(R_0)}{g_{00}(R_0)}\,h_n\right] = \omega_0\left[1 + \frac{R_S\,h_n}{2R_0^2\left(1 - R_S/R_0\right)}\right]$$

where hn is the height of the n-th clock above R0.
Fig. 5 shows the entrainment plot for the black hole. The transition is noticeably smoother: in this higher-mass case, the system does not have a single hard coupling transition but instead exhibits smooth behavior for global coupling. This is the Kuramoto “cascade”. Contrast the behavior of Fig. 5 (left) to the classic Kuramoto transition of Fig. 2. The increasing frequency separations near the black hole produce a succession of frequency locks as the coupling coefficient increases. For comparison, the case of linear coupling along the cable is shown in Fig. 5 on the right. The cascade is now accompanied by interesting oscillations as one clock entrains with a neighbor, only to be pulled back by interaction with locked subclusters.
Now let us consider what role the spatial component of the metric tensor plays in the synchronization. The spatial component causes the space between the oscillators to decrease closer to the supermassive object. This would cause the bottom oscillators, which entrain the most slowly, to entrain faster because they are closer together, while the top oscillators would entrain more slowly since they are farther apart, as in Fig. 6.
In terms of the local coordinates of the space ship, the locations of the clocks are approximately

$$h_n \approx \sum_{m=1}^{n} \Delta h\,\sqrt{1 - \frac{R_S}{R_0 + m\,\Delta h}}$$

where Δh is the nominal clock spacing. These values for hn can be put into the equation for ωn above. But it is clear that this produces a second-order effect. Even at the event horizon, this effect is only a fraction of the shifts caused by g00 directly on the clocks. This is in contrast to what a distant observer sees: the clock separations decreasing to zero, which would seem to decrease the frequency shifts. But the synchronization coupling is performed in the ship frame, not the distant frame, so the astronaut can safely ignore this contribution.
As a final exploration of the black hole, before we leave it behind, look at the behavior for different values of R0 in Fig. 7. At 4RS, the Kuramoto transition is stretched. At 2RS there is a partial Kuramoto transition for the upper clocks, that then stretch into a cascade of locking events for the lower clocks. At 1RS we see the full cascade as before.
Note from the Editor:
This blog post by Moira Andrews is based on her final project for Phys 411, upper division undergraduate mechanics, at Purdue University. Students are asked to combine two seemingly-unrelated aspects of modern dynamics and explore the results. Moira thought of synchronizing clocks that are experiencing gravitational time dilation near a massive body. This is a nice example of how GR combined with nonlinear synchronization yields the novel phenomenon of a “synchronization cascade”.
Cheng, T.-P. (2010). Relativity, Gravitation and Cosmology. Oxford University Press.
Imagine if you just discovered how to text through time, i.e. time-texting, when a close friend meets a shocking death. Wouldn’t you text yourself in the past to try to prevent it? But what if, every time you change the time-line and alter the future in untold ways, the friend continues to die, and you seemingly can never stop it? This is the premise of Stein’s Gate, a Japanese sci-fi animé bringing in the paradoxes of time travel, casting CERN as an evil clandestine spy agency, and introducing do-it-yourself inventors, hackers, and wacky characters, while it centers on a terrible death of a lovable character that can never be avoided.
It is also a good computational physics project that explores the dynamics of bifurcations, bistability and chaos. I teach a course in modern dynamics in the Physics Department at Purdue University. The topics of the course range broadly from classical mechanics to chaos theory, social networks, synchronization, nonlinear dynamics, economic dynamics, population dynamics, evolutionary dynamics, neural networks, and special and general relativity, among others, covered using a textbook that takes a modern view of dynamics.
For the final project of the second semester the students (Junior physics majors) are asked to combine two or three of the topics into a single project. Students have come up with a lot of creative combinations: population dynamics of zombies, nonlinear dynamics of negative gravitational mass, percolation of misinformation in presidential elections, evolutionary dynamics of neural architecture, and many more. In that spirit, and for a little fun, in this blog I explore the so-called physics of Stein’s Gate.
Stein’s Gate and the Divergence Meter
Stein’s Gate is a Japanese TV animé series that had a world-wide distribution in 2011. The central premise of the plot is that certain events always occur even if you are on different timelines—like trying to avoid someone’s death in an accident.
This is the problem confronting Rintaro Okabe who tries to stop an accident that kills his friend Mayuri Shiina. But every time he tries to change time, she dies in some other way. It turns out that all the nearby timelines involve her death. According to a device known as The Divergence Meter, Rintaro must get farther than 4% away from the original timeline to have a chance to avoid the otherwise unavoidable event.
This is new. Usually, time-travel sci-fi is based on the Butterfly Effect. Chaos theory is characterized by something called sensitivity to initial conditions (SIC), meaning that slightly different starting points produce trajectories that diverge exponentially from nearby trajectories. It is called the Butterfly Effect because of the whimsical notion that a butterfly flapping its wings in China can cause a hurricane in Florida. In the context of the Butterfly Effect, if you go back in time and change anything at all, the effect cascades through time until the present is unrecognizable. As an example, in one episode of the TV cartoon The Simpsons, Homer goes back in time to the age of the dinosaurs and kills a single mosquito. When he gets back to our time, everything has changed in bizarre and funny ways.
Stein’s Gate introduces a creative counter example to the Butterfly Effect. Instead of scrambling the future when you fiddle with the past, you find that you always get the same event, even when you change a lot of the conditions—Mayuri still dies. This sounds eerily familiar to a physicist who knows something about chaos theory. It means that the unavoidable event is acting like a stable fixed point in the time dynamics—an attractor! Even if you change the initial conditions, the dynamics draw you back to the fixed point—in this case Mayuri’s accident. What would this look like in a dynamical system?
The Local Basin of Attraction
Dynamical systems can be described as trajectories in a high-dimensional state space. Within state space there are special points where the dynamics are static, known as fixed points. For a stable fixed point, a slight perturbation away will relax back to the fixed point. For an unstable fixed point, on the other hand, a slight perturbation grows and the system dynamics evolve away. However, there can be regions in state space where every initial condition leads to trajectories that stay within that region. This is known as a basin of attraction, and the boundaries of these basins are called separatrices.
A high-dimensional state space can have many basins of attraction. All the physics that starts within a basin stays within that basin—almost like its own self-consistent universe, bordered by countless other universes. There are well-known physical systems that have many basins of attraction. String theory is suspected to generate many adjacent universes where the physical laws are a little different in each basin of attraction. Spin glasses, which are amorphous solid-state magnets, have this property, as do recurrent neural networks like the Hopfield network. Basins of attraction occur naturally within the physics of these systems.
It is possible to embed basins of attraction within an existing dynamical system. As an example, let’s start with one of the simplest types of dynamics, a hyperbolic fixed point

$$\dot{x} = -x \qquad \dot{y} = y$$

that has a single saddle fixed point at the origin. We want to add a basin of attraction at the origin with a domain range given by a radius r0. At the same time, we want to create a separatrix that keeps the outer hyperbolic dynamics separate from the internal basin dynamics. To keep all outer trajectories in the outer domain, we can build a dynamical barrier to prevent the trajectories from crossing the separatrix. This can be accomplished by adding a radial repulsive term

$$\dot{\vec{r}}_{\rm rep} = \frac{\vec{r}}{(r - r_0)^2}$$

In x-y coordinates this is

$$\dot{x} = \frac{x}{(r - r_0)^2} - x \qquad \dot{y} = \frac{y}{(r - r_0)^2} + y$$

We also want to keep the internal dynamics of our basin separate from the external dynamics. To do this, we can multiply by a sigmoid function, like a Heaviside function H(r − r0), to zero-out the external dynamics inside our basin. The final external dynamics is then

$$\dot{x} = H(r - r_0)\left[\frac{x}{(r - r_0)^2} - x\right] \qquad \dot{y} = H(r - r_0)\left[\frac{y}{(r - r_0)^2} + y\right]$$

Now we have to add the internal dynamics for the basin of attraction. To make it a little more interesting, let’s make the internal dynamics an autonomous oscillator

$$\dot{x} = \omega y + \omega x(1 - r) \qquad \dot{y} = -\omega x + \omega y(1 - r)$$

Putting this all together gives

$$\dot{x} = H(r - r_0)\left[\frac{x}{(r - r_0)^2} - x\right] + \left[1 - H(r - r_0)\right]\left[\omega y + \omega x(1 - r)\right]$$

$$\dot{y} = H(r - r_0)\left[\frac{y}{(r - r_0)^2} + y\right] + \left[1 - H(r - r_0)\right]\left[-\omega x + \omega y(1 - r)\right]$$

This looks a little complex for such a simple model, but it illustrates the principle. The sigmoid is best if it is differentiable, so instead of a Heaviside function it can be a Fermi function

$$F(r) = \frac{1}{1 + e^{(r - r_0)/\sigma}}$$

which replaces 1 − H(r − r0) in the equations above (the program below uses σ = 0.1 and r0 = 2).
The phase-space portrait of the final dynamics is shown in the accompanying figure.
Adding the internal dynamics does not change the far-field external dynamics, which are still hyperbolic. The repulsive term does split the central saddle point into two saddle points, one on each side of the basin, so the repulsive term actually splits the dynamics. But the internal dynamics are self-contained and separate from the external dynamics. The origin is an unstable spiral that evolves to a limit cycle. The basin boundary has marginal stability and is known as a “wall”.
To verify the stability of the external fixed points, find the fixed-point coordinates: outside the basin the time derivatives vanish where (r − r0)² = 1 along the x-axis, giving

$$(x^*, y^*) = (\pm 3,\ 0)$$

Then evaluate the Jacobian matrix there (for A = 1 and x0 = 2)

$$J = \begin{pmatrix} -6 & 0 \\ 0 & 2 \end{pmatrix}$$

which is clearly a saddle point because the determinant is negative.
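The saddle character can be double-checked numerically with a finite-difference Jacobian of the external flow (hyperbolic terms plus the repulsive barrier with r0 = 2; the small regularizing constants used in the full program are dropped here):

```python
import numpy as np

def external_flow(x, y, r0=2.0):
    """Outer dynamics only: the hyperbolic flow plus the repulsive barrier."""
    r = np.sqrt(x**2 + y**2)
    return np.array([x/(r - r0)**2 - x, y/(r - r0)**2 + y])

def jacobian(x, y, eps=1e-6):
    """Central finite-difference Jacobian of the external flow."""
    J = np.zeros((2, 2))
    J[:, 0] = (external_flow(x + eps, y) - external_flow(x - eps, y))/(2*eps)
    J[:, 1] = (external_flow(x, y + eps) - external_flow(x, y - eps))/(2*eps)
    return J

J = jacobian(3.0, 0.0)    # fixed point on the x-axis at r = r0 + 1
```

The result is approximately diag(−6, 2), with a negative determinant, confirming the saddle.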
In the context of Stein’s Gate, the basin boundary is equivalent to the 4% divergence which is necessary to escape the internal basin of attraction where Mayuri meets her fate.
Python Program: SteinsGate2D.py
# -*- coding: utf-8 -*-
"""
Created on Sat March 6, 2021
@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)
2D simulation of Stein's Gate Divergence Meter
"""
import numpy as np
from scipy import integrate
from matplotlib import pyplot as plt

def solve_flow(param, lim=(-6, 6, -6, 6), max_time=20.0):

    def flow_deriv(x_y, t0, alpha, beta, gamma):
        """Compute the time derivative of the flow."""
        x, y = x_y
        w = 1
        R = np.sqrt(x**2 + y**2)
        arg = (R - 2)/0.1
        env1 = 1/(1 + np.exp(arg))    # Fermi sigmoid: ~1 inside the basin (R < 2)
        env2 = 1 - env1               # ~1 outside the basin
        # external hyperbolic dynamics with repulsive barrier, plus
        # internal autonomous oscillator
        f = env2*(x*(1/(R - 1.99)**2 + 1e-2) - x) + env1*(w*y + w*x*(1 - R))
        g = env2*(y*(1/(R - 1.99)**2 + 1e-2) + y) + env1*(-w*x + w*y*(1 - R))
        return [f, g]

    plt.figure()
    plt.title('Steins Gate')
    xmin, xmax, ymin, ymax = lim
    plt.axis([xmin, xmax, ymin, ymax])

    # initial conditions: points along the border of the plot window
    N = 24*4 + 47
    x0 = np.zeros(shape=(N, 2))
    ind = -1
    for i in range(0, 24):
        ind = ind + 1
        x0[ind, 0] = xmin + (xmax - xmin)*i/23
        x0[ind, 1] = ymin
        ind = ind + 1
        x0[ind, 0] = xmin + (xmax - xmin)*i/23
        x0[ind, 1] = ymax
        ind = ind + 1
        x0[ind, 0] = xmin
        x0[ind, 1] = ymin + (ymax - ymin)*i/23
        ind = ind + 1
        x0[ind, 0] = xmax
        x0[ind, 1] = ymin + (ymax - ymin)*i/23

    # a point near the origin inside the basin
    ind = ind + 1
    x0[ind, 0] = 0.05
    x0[ind, 1] = 0.05

    for thetloop in range(0, 10):      # ring deep inside the basin
        ind = ind + 1
        theta = 2*np.pi*thetloop/10
        x0[ind, 0] = 0.125*np.cos(theta)
        x0[ind, 1] = 0.125*np.sin(theta)

    for thetloop in range(0, 10):      # ring just inside the separatrix
        ind = ind + 1
        theta = 2*np.pi*thetloop/10
        x0[ind, 0] = 1.7*np.cos(theta)
        x0[ind, 1] = 1.7*np.sin(theta)

    for thetloop in range(0, 20):      # ring on the separatrix at r0 = 2
        ind = ind + 1
        theta = 2*np.pi*thetloop/20
        x0[ind, 0] = 2*np.cos(theta)
        x0[ind, 1] = 2*np.sin(theta)

    # points near the external saddle points at (+/-3, 0) and on the far axis
    for xs, ys in [(-3, 0.05), (-3, -0.05), (3, 0.05), (3, -0.05),
                   (-6, 0.0), (6, 0.0)]:
        ind = ind + 1
        x0[ind, 0] = xs
        x0[ind, 1] = ys

    colors = plt.cm.prism(np.linspace(0, 1, N))

    # Solve for the trajectories
    t = np.linspace(0, max_time, int(250*max_time))
    x_t = np.asarray([integrate.odeint(flow_deriv, x0i, t, param)
                      for x0i in x0])
    for i in range(N):
        x, y = x_t[i, :, :].T
        plt.plot(x, y, '-', c=colors[i])

    return t, x_t

param = (0.02, 0.5, 0.2)  # Steins Gate (alpha, beta, gamma placeholders)
lim = (-6, 6, -6, 6)
t, x_t = solve_flow(param, lim)
The Lorenz Butterfly
Two-dimensional phase space cannot support chaos, and we would like to reconnect the central theme of Stein’s Gate, the Divergence Meter, with the Butterfly Effect. Therefore, let’s actually incorporate our basin of attraction inside the classic Lorenz Butterfly. The goal is to put an attracting domain into the midst of the three-dimensional state space of the Lorenz butterfly in a way that repels the butterfly, without destroying it, but attracts local trajectories. The question is whether the butterfly can survive if part of its state space is made unavailable to it.
The classic Lorenz dynamical system is
As in the 2D case, we will put in a repelling barrier that prevents external trajectories from moving into the local basin, and we will isolate the external dynamics by using the sigmoid function. The final flow equations looks like
where the radius is relative to the center of the attracting basin
and r0 is the radius of the basin. The center of the basin is at [x0, y0, z0] and we are assuming that x0 = 0 and y0 = 0 and z0 = 25 for the standard Butterfly parameters p = 10, r = 25 and b = 8/3. This puts our basin of attraction a little on the high side of the center of the Butterfly. If we embed it too far inside the Butterfly it does actually destroy the Butterfly dynamics.
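A hedged sketch of the embedded-basin Lorenz flow, following the same sigmoid construction as the 2D model. The sigmoid width, the barrier regularization, and the choice to confine the internal oscillator to the x-y plane through the basin center are arbitrary choices for this sketch:

```python
import numpy as np
from scipy.integrate import odeint

def lorenz_basin(xyz, t, p=10.0, r=25.0, b=8.0/3.0, r0=1.5,
                 center=(0.0, 0.0, 25.0), w=1.0):
    """Lorenz flow outside a repelling basin; an autonomous oscillator
    (in the x-y plane through the basin center) inside it."""
    x, y, z = xyz
    dx, dy, dz = x - center[0], y - center[1], z - center[2]
    rp = np.sqrt(dx**2 + dy**2 + dz**2)
    inside = 1.0/(1.0 + np.exp((rp - r0)/0.1))   # Fermi sigmoid weight
    outside = 1.0 - inside
    barrier = 1.0/((rp - 0.995*r0)**2 + 1e-2)    # regularized repulsive wall
    fx = outside*(p*(y - x) + dx*barrier) + inside*(w*dy + w*dx*(1.0 - rp))
    fy = outside*(r*x - y - x*z + dy*barrier) + inside*(-w*dx + w*dy*(1.0 - rp))
    fz = outside*(x*y - b*z + dz*barrier)
    return [fx, fy, fz]

t = np.linspace(0.0, 50.0, 5000)
traj = odeint(lorenz_basin, [5.0, 5.0, 5.0], t)  # start outside the basin
```

The trajectory launched at (5, 5, 5) traces out the Butterfly while being repelled whenever it approaches the basin; a trajectory started inside the basin instead settles onto the internal limit cycle.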
When r0 = 0, the dynamics of the Lorenz Butterfly are essentially unchanged. However, when r0 = 1.5, there is a repulsive effect on trajectories that pass close to the basin. This can be seen in Figure 2, where part of the trajectory skips around the outside of the basin.
Trajectories can begin very close to the basin, but still on the outside of the separatrix, as in the top row of Figure 3, where the basin of attraction with r0 = 1.5 lies a bit above the center of the Butterfly. The Butterfly still exists for the external dynamics. However, any trajectory that starts within the basin of attraction remains there and executes a stable limit cycle. This is the world where Mayuri dies inside the 4% divergence. But if the initial condition can exceed 4%, then the Butterfly Effect takes over. The bottom row of Figure 3 shows that the Butterfly itself is fragile. When the external dynamics are perturbed more strongly by centering the local basin more closely on the attractor, the hyperbolic dynamics of the Butterfly are impeded and the external dynamics are converted to a stable limit cycle. It is interesting that the Butterfly, so often used as an illustration of sensitivity to initial conditions (SIC), is itself sensitive to perturbations that can convert it away from chaos and back to regular motion.
Discussion and Extensions
In the examples shown here, the local basin of attraction was put in “by hand” as an isolated region inside the dynamics. It would be interesting to consider more natural systems, like a spin glass or a Hopfield network, where the basins of attraction occur naturally from the physical principles of the system. Then we could use the “Divergence Meter” to explore these physical systems to see how far the dynamics can diverge before crossing a separatrix. These systems are impossible to visualize because they are intrinsically very high dimensional systems, but Monte Carlo approaches could be used to probe the “sizes” of the basins.
Another interesting extension would be to embed these complex dynamics into spacetime. Since this all started with the idea of texting through time, it would be interesting (and challenging) to see how we could describe this process in a high-dimensional Minkowski space that had many space dimensions (but still only one time dimension). Certainly it would violate the speed-of-light criterion, but we could then take the approach of David Deutsch and view the time axis as if it had multiple branches, like the branches of the arctangent function, creating time-consistent sheets within a sheaf of flat Minkowski spaces.
It is second nature to think of integer dimensions: A line is one dimensional. A plane is two dimensional. A volume is three dimensional. A point has no dimensions.
It is harder to think in four dimensions and higher, but even here it is a simple extrapolation of lower dimensions. Consider the basis vectors spanning a three-dimensional space, consisting of the triples of numbers

e1 = (1, 0, 0)
e2 = (0, 1, 0)
e3 = (0, 0, 1)

Then a four-dimensional hyperspace is created simply by adding a new “tuple” to the list

e1 = (1, 0, 0, 0)
e2 = (0, 1, 0, 0)
e3 = (0, 0, 1, 0)
e4 = (0, 0, 0, 1)
and so on to 5 and 6 dimensions and on. Child’s play!
But how do you think of fractional dimensions? What is a fractional dimension? For that matter, what is a dimension? Even the integer dimensions began to unravel when Georg Cantor showed in 1877 that the line and the plane, which clearly had different “dimensionalities”, both had the same cardinality and could be put into a one-to-one correspondence. From then onward the concept of dimension had to be rebuilt from the ground up, leading ultimately to fractals.
Here is a short history of fractal dimension, partially excerpted from my history of dynamics in Galileo Unbound (Oxford University Press, 2018) pg. 110 ff. This blog page presents the history through a set of publications that successively altered how mathematicians thought about curves in spaces, beginning with Karl Weierstrass in 1872.
Karl Weierstrass (1872)
Karl Weierstrass (1815 – 1897) was studying convergence properties of infinite power series in 1872 when he began with a problem that Bernhard Riemann had given to his students some years earlier. Riemann had asked whether the function

f(x) = Σ_{n=1}^{∞} sin(n²x)/n²

was continuous everywhere but not differentiable. This simple question about a simple series was surprisingly hard to answer (it was not settled until Hardy provided a proof in 1916). Therefore, Weierstrass conceived of a simpler infinite sum that was continuous everywhere and for which he could calculate left and right limits of derivatives at any point. This function is

W(x) = Σ_{n=0}^{∞} aⁿ cos(bⁿπx)

where b is a large odd integer and a is positive and less than one. Weierstrass showed that the left and right derivatives failed to converge to the same value, no matter where he took his point. In short, he had discovered a function that was continuous everywhere, but had a derivative nowhere. This pathological function, called a “Monster” by Charles Hermite, is now called the Weierstrass function.
Beyond the strange properties that Weierstrass sought, the Weierstrass function would turn out to be a fractal curve (recognized much later by Besicovitch and Ursell in 1937) with a fractal (Hausdorff) dimension given by

D = 2 + ln(a)/ln(b)

although this was not proven rigorously until very recently. An example of the function is shown in Fig. 1 for a = 0.5 and b = 5. This specific curve has a fractal dimension D = 1.5693. Notably, this number is greater than 1 (the topological dimension of the curve) but smaller than 2 (the embedding dimension of the curve). The curve tends to fill more of the two-dimensional plane than a straight line, so its intermediate fractal dimension has an intuitive feel about it. The more “monstrous” the curve looks, the closer its fractal dimension approaches 2.
Fig. 1 Weierstrass’ “Monster” (1872) with a = 0.5, b = 5. This continuous function is nowhere differentiable. It is a fractal with fractal dimension D = 2 + ln(0.5)/ln(5) = 1.5693.
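Truncating the infinite sum makes the Monster easy to evaluate numerically. Here is a minimal sketch (a toy illustration of my own; the function names are not from any library) that computes a partial sum of W(x) together with the dimension formula quoted in the caption:

```python
import math

def weierstrass(x, a=0.5, b=5, n_terms=50):
    """Partial sum of W(x) = sum_n a**n * cos(b**n * pi * x).
    The neglected tail is bounded by a**n_terms/(1 - a), negligible for a = 0.5."""
    return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(n_terms))

def weierstrass_dimension(a, b):
    """Hausdorff dimension of the graph: D = 2 + ln(a)/ln(b)."""
    return 2 + math.log(a) / math.log(b)

# At x = 0 every cosine equals 1, so W(0) = sum a**n = 1/(1 - a) = 2
w0 = weierstrass(0.0)
D = weierstrass_dimension(0.5, 5)   # ≈ 1.5693, matching Fig. 1
```

Plotting weierstrass(x) over a dense grid reproduces the jagged curve of Fig. 1, and the jaggedness persists at every zoom level, which is the visual signature of self-similarity.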
Georg Cantor (1883)
Partially inspired by Weierstrass’ discovery, Georg Cantor (1845 – 1918) published an example of an unusual ternary set in 1883 in “Grundlagen einer allgemeinen Mannigfaltigkeitslehre” (“Foundations of a General Theory of Aggregates”). The set generates a function (the Cantor staircase) that has a derivative equal to zero almost everywhere, yet whose area integrates to unity. It is a striking example of a function that is not equal to the integral of its derivative! Cantor demonstrated that his set has the cardinality of the continuum, the same cardinality as the real numbers. But whereas the real numbers are uniformly distributed, Cantor’s set is “clumped”. This clumpiness is an essential feature that distinguishes it from the one-dimensional number line, and it raised important questions about dimensionality. The fractal dimension of the ternary Cantor set is DH = ln(2)/ln(3) = 0.6309.
Fig. 2 The 1883 Cantor set (below) and the Cantor staircase (above, as the indefinite integral over the set).
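The ternary construction is simple to iterate in code. A minimal sketch (my own illustration, using exact rational arithmetic) that removes open middle thirds and tracks the surviving length:

```python
import math
from fractions import Fraction

def cantor_intervals(level):
    """Apply `level` rounds of middle-third removal to [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        next_gen = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            next_gen.append((lo, lo + third))   # keep the left third
            next_gen.append((hi - third, hi))   # keep the right third
        intervals = next_gen
    return intervals

intervals = cantor_intervals(5)                      # 2**5 = 32 pieces
total_length = sum(hi - lo for lo, hi in intervals)  # (2/3)**5, -> 0 in the limit
D = math.log(2) / math.log(3)                        # 0.6309..., the fractal dimension
```

The total length shrinks to zero while the number of pieces doubles at every stage, which is exactly the “clumpiness” that the dimension ln(2)/ln(3) quantifies.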
Giuseppe Peano (1890)
In 1878, in a letter to his friend Richard Dedekind, Cantor showed that there was a one-to-one correspondence between the real numbers and the points in any n-dimensional space. He was so surprised by his own result that he wrote to Dedekind “I see it, but I don’t believe it.” The solid concepts of dimension and dimensionality were dissolving before his eyes. What does it mean to trace the path of a trajectory in an n-dimensional space, if all the points in n dimensions are just numbers on a line? What could such a trajectory look like? A graphic example of a plane-filling path was constructed in 1890 by Peano, who was a peripatetic mathematician with interests that wandered broadly across the landscape of the mathematical problems of his day, usually ahead of his time. Only two years after he had axiomatized linear vector spaces, Peano constructed a continuous curve that filled space.
The construction of Peano’s curve proceeds by taking a square and dividing it into 9 equal sub squares. Lines connect the centers of each of the sub squares. Then each sub square is divided again into 9 sub squares whose centers are all connected by lines. At this stage, the original pattern, repeated 9 times, is connected together by 8 links, forming a single curve. This process is repeated infinitely many times, resulting in a curve that passes through every point of the original plane square. In this way, a line is made to fill a plane. Where Cantor had proven abstractly that the cardinality of the real numbers was the same as the points in n-dimensional space, Peano created a specific example. This was followed quickly by another construction, invented by David Hilbert in 1891, that divided the square into four instead of nine, simplifying the construction, but also showing that such constructions were easily generated.
Fig. 3 Peano’s (1890) and Hilbert’s (1891) plane-filling curves. When the iterations are taken to infinity, the curves approach every point of two-dimensional space arbitrarily closely, giving them a dimension DH = DE = 2, although their topological dimensions are DT = 1.
Helge von Koch (1904)
The space-filling curves of Peano and Hilbert have the extreme property that a one-dimensional curve approaches every point in a two-dimensional space. This ability of a one-dimensional trajectory to fill space mirrored the ergodic hypothesis that Boltzmann relied upon as he developed statistical mechanics. These examples by Peano, Hilbert and Boltzmann inspired searches for continuous curves whose dimensionality similarly exceeded one dimension, yet without filling space. Weierstrass’ Monster was already one such curve, existing in some dimension greater than one but not filling the plane. The construction of the Monster required infinite series of harmonic functions, and the resulting curve was single valued on its domain of real numbers.
An alternative approach was proposed by Helge von Koch (1870—1924), a Swedish mathematician with an interest in number theory. He suggested in 1904 that a set of straight line segments could be joined together, and then shrunk by a scale factor to act as new segments of the original pattern . The construction of the Koch curve is shown in Fig. 4. When the process is taken to its limit, it produces a curve, differentiable nowhere, which snakes through two dimensions. When connected with other identical curves into a hexagon, the curve resembles a snowflake, and the construction is known as “Koch’s Snowflake”.
The Koch curve begins in generation 1 with N0 = 4 elements. These are shrunk by a factor of b = 1/3 to become the four elements of the next generation, and so on. The number of elements varies with the observation scale according to the equation

N(b) = b^(-D)

where D is called the fractal dimension. In the example of the Koch curve, the fractal dimension is

D = ln(4)/ln(3) = 1.26

which is a number less than its embedding dimension DE = 2. The fractal is embedded in 2D but has a fractional dimension greater than its topological dimension DT = 1.
Fig. 4 Generation of a Koch curve (1904). The fractal dimension is D = ln(4)/ln(3) = 1.26. At each stage, four elements are reduced in size by a factor of 3. The “length” of the curve approaches infinity as the features get smaller and smaller. But the scaling of the length with size is determined uniquely by the fractal dimension.
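The scaling law is easy to tabulate. A small sketch (my own illustration) generating the segment counts and lengths of successive Koch generations:

```python
import math

def koch_generations(n_gen):
    """(segments, segment length, total length) for each Koch generation."""
    table = []
    for n in range(n_gen):
        segments = 4 ** n          # four elements replace each segment
        seg_len = (1 / 3) ** n     # each shrunk by b = 1/3
        table.append((segments, seg_len, segments * seg_len))
    return table

table = koch_generations(5)
# The total length grows as (4/3)**n without bound, yet the scaling of N
# with b is captured by the single number D:
D = math.log(4) / math.log(3)      # ≈ 1.26
```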
Waclaw Sierpinski (1915)
Waclaw Sierpinski (1882 – 1969) was a Polish mathematician studying at the Jagiellonian University in Krakow for his doctorate when he came across a theorem that every point in the plane can be defined by a single coordinate. Intrigued by such an unintuitive result, he dove deep into Cantor’s set theory after he was appointed as a faculty member at the university in Lvov. He began to construct curves that had more specific properties than the Peano or Hilbert curves, such as a curve that passes through every interior point of a unit square but encloses an area of only 5/12 ≈ 0.4167. Sierpinski became interested in the topological properties of such sets.
Sierpinski considered how to define a curve that was embedded in DE = 2 but that was NOT constructed as a topological dimension DT = 1 curve, as the curves of Peano, Hilbert and Koch (and even his own) had been. To demonstrate this point, he described a construction that began with a topological dimension DT = 2 object, a planar triangle, from which the open set of its central inverted triangle is removed, leaving its boundary points. The process is continued iteratively to all scales. The resulting point set is shown in Fig. 5 and is called the Sierpinski gasket. What is left after all the internal triangles are removed is a point set that can be made discontinuous by cutting it at a finite set of points. This is shown in Fig. 5 by the red circles. Each circle, no matter the size, cuts the set at three points, making the resulting set discontinuous. Ten years later, Karl Menger would show that this property of discontinuous cuts determined the topological dimension of the Sierpinski gasket to be DT = 1. The embedding dimension is of course DE = 2, and the fractal dimension of the Sierpinski gasket is

DH = ln(3)/ln(2) = 1.5850
Fig. 5 The Sierpinski gasket. The central triangle is removed (leaving its boundary) at each scale. The pattern is self-similar with a fractal dimension DH = 1.5850. Unintuitively, it has a topological dimension DT = 1.
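A convenient discrete model of the gasket (my own illustration, not Sierpinski’s construction) keeps the cells (x, y) of a 2^n × 2^n grid for which x & y == 0; this is Pascal’s triangle mod 2, and the number of surviving cells grows as 3^n while the scale shrinks as 2^n:

```python
import math

def sierpinski_cells(order):
    """Cells of a 2**order x 2**order grid forming the discrete Sierpinski gasket.
    A cell (x, y) survives iff x & y == 0 (Pascal's triangle mod 2)."""
    size = 2 ** order
    return [(x, y) for y in range(size) for x in range(size) if x & y == 0]

order = 5
cells = sierpinski_cells(order)             # 3**5 = 243 surviving cells
# N = 3**order pieces at scale b = 1/2**order  ->  D = ln(3)/ln(2)
D = math.log(len(cells)) / math.log(2 ** order)
```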
Felix Hausdorff (1918)
The work by Cantor, Peano, von Koch and Sierpinski had created a crisis in geometry as mathematicians struggled to rescue concepts of dimensionality. An important byproduct of that struggle was a much deeper understanding of concepts of space, especially in the hands of Felix Hausdorff.
Felix Hausdorff (1868 – 1942) was born in Breslau, Prussia, and educated in Leipzig. In his early years as a doctoral student, and as an assistant professor at Leipzig, he was a practicing mathematician by day and a philosopher and playwright by night, publishing under the pseudonym Paul Mongré. He was at the University of Bonn working on set theory when the Greek mathematician Constantin Carathéodory published a paper in 1914 that showed how to construct a p-dimensional set in a q-dimensional space. Hausdorff realized that he could apply similar ideas to the Cantor set. He showed that the outer measure of the Cantor set would jump discontinuously from infinity to zero as the fractional test dimension increased smoothly. The critical value where the measure changed its character became known as the Hausdorff dimension.
For the Cantor ternary set, the Hausdorff dimension is exactly DH = ln(2)/ln(3) = 0.6309. This value for the dimension is less than the embedding dimension DE = 1 of the support (the real numbers on the interval [0, 1]), but it is also greater than DT = 0 which would hold for a countable number of points on the interval. The work by Hausdorff became well known in the mathematics community who applied the idea to a broad range of point sets like Weierstrass’s monster and the Koch curve.
It is important to keep in perspective what Hausdorff’s work meant in its own time. For instance, although the curves of Weierstrass, von Koch and Sierpinski were understood to present a challenge to concepts of dimension, it was only after Hausdorff that mathematicians began to think in terms of fractional dimensions and to calculate the fractional dimensions of these earlier point sets. Despite the fact that Sierpinski created one of the most iconic fractals, one that we use as an example every day, he was unaware at the time that he was doing so. His interest was topological: creating a curve for which any cut at any point would create disconnected subsets, starting with objects (triangles) of topological dimension DT = 2. In this way, talking about the early fractal objects tends to be anachronistic, using language to describe them that had not yet been invented at the time.
This perspective is also true for the ideas of topological dimension. For instance, even Sierpinski was not fully tuned into the problems of defining topological dimension. It turns out that what he created was a curve of topological dimension DT = 1, but that would only become clear later with the work of the Austrian mathematician Karl Menger.
Karl Menger (1926)
The day that Karl Menger (1902 – 1985) was born, his father, Carl Menger (1840 – 1921), lost his job. Carl Menger was one of the founders of the famous Viennese school that established the marginalist view of economics. However, Carl was not married to Karl’s mother, which was frowned upon by polite Viennese society, so he had to relinquish his professorship. Despite his father’s reduction in status, Karl received an excellent education at a Viennese gymnasium (high school). Among his classmates were Wolfgang Pauli (Nobel Prize in Physics in 1945) and Richard Kuhn (Nobel Prize in Chemistry in 1938). When Karl began attending the University of Vienna he studied physics, but the mathematics professor Hans Hahn opened his eyes to the fascinating work on analysis that was transforming mathematics at that time, so Karl shifted his studies to mathematical analysis, specifically concerning conceptions of “curves”.
Menger made important contributions to the history of fractal dimension as well as to the history of topological dimension. In his approach to defining the intrinsic topological dimension of a point set, he described the construction of a point set embedded in three dimensions that had zero volume, an infinite surface area, and a fractal dimension between 2 and 3. The object is shown in Fig. 6 and is called the Menger “sponge”. The Menger sponge is a fractal with a fractal dimension DH = ln(20)/ln(3) = 2.7268. Each face of the sponge is a Sierpinski carpet, whose fractal dimension is DH = ln(8)/ln(3) = 1.8928.
Fig. 6 Menger Sponge. Embedding dimension DE = 3. Fractal dimension DH = ln(20)/ln(3) = 2.7268. Topological dimension DT = 1: all one-dimensional metric spaces can be contained within the Menger sponge point set. Each face is a Sierpinski carpet with fractal dimension DH = ln(8)/ln(3) = 1.8928.
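The dimension arithmetic for the sponge follows the same pattern as the gasket: each cube is divided into 27 subcubes, of which 20 are kept. A small sketch (my own illustration):

```python
import math

def menger_count(level):
    """Subcubes remaining after `level` iterations: 20 of 27 kept each time."""
    return 20 ** level

# 20 pieces at scale b = 1/3  ->  D_H = ln(20)/ln(3)
D_sponge = math.log(20) / math.log(3)    # ≈ 2.7268
# Each face is a Sierpinski carpet: 8 pieces at scale 1/3
D_carpet = math.log(8) / math.log(3)     # ≈ 1.8928
```

The volume 20^n × (1/3)^(3n) = (20/27)^n goes to zero while the surface area grows without bound, consistent with the zero-volume, infinite-area property noted above.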
The striking feature of the Menger sponge is its topological dimension. Menger created a new definition of topological dimension that partially resolved the crisis created by Cantor when he showed that every point on the unit square can be defined by a single coordinate. This had put a one-dimensional curve in one-to-one correspondence with a two-dimensional plane. Yet the topology of a 2-dimensional object is clearly different from the topology of a line. Menger found a simple definition that showed why 2D is different, topologically, from 3D, despite Cantor’s conundrum. The answer came from the idea of making cuts on a point set and seeing if the cut created disconnected subsets.
As a simple example, take a 1D line. The removal of a single point creates two disconnected sub-lines. The intersection of the cut with the line is 0-dimensional, and Menger showed that this defined the line as 1-dimensional. Similarly, a line cuts the unit square into two parts. The intersection of the cut with the plane is 1-dimensional, signifying that the plane is 2-dimensional. In other words, an (n-1)-dimensional intersection of the boundary of a small neighborhood with the point set indicates that the point set has a dimension of n. Generalizing this idea to the Sierpinski gasket in Fig. 5, the boundary of a small circular region, if placed appropriately (as in the figure), intersects the Sierpinski gasket at three points of dimension zero. Hence, the topological dimension of the Sierpinski gasket is one. Menger was likewise able to show that his sponge also had a topology that was one-dimensional, DT = 1, despite its embedding dimension of DE = 3. In fact, all 1-dimensional metric spaces can be fit inside a Menger sponge.
Benoit Mandelbrot (1967)
Benoit Mandelbrot (1924 – 2010) was born in Warsaw and his family emigrated to Paris in 1935. He attended the Ecole Polytechnique where he studied under Gaston Julia (1893 – 1978) and Paul Levy (1886 – 1971). Both Julia and Levy made significant contributions to the field of self-similar point sets and made a lasting impression on Mandelbrot. He went to Caltech for a master’s degree in aeronautics and then earned a PhD in mathematical sciences from the University of Paris. In 1958 Mandelbrot joined the research staff of the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, where he worked for over 35 years on topics in information theory and economics, always with a view to the properties of self-similar sets and time series.
In 1967 Mandelbrot published one of his early important papers on the self-similar properties of the coastline of Britain. He proposed that many natural features exhibit statistical self-similarity, which he applied to coastlines. He published the work as “How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension” in Science magazine, where he showed that the measured length of the coastline diverged with a Hausdorff dimension equal to D = 1.25. Working at IBM, a world leader in computing, he had ready access to computational power as well as visualization capabilities. Therefore, he was one of the first to begin exploring the graphical character of self-similar maps and point sets.
During one of his sabbaticals at Harvard University he began exploring the properties of Julia sets (named after his former teacher at the Ecole Polytechnique). The Julia set is a self-similar point set that is easily visualized in the complex plane (two dimensions). As Mandelbrot studied the convergence or divergence of the infinite iterations defined by the Julia mapping, he discovered an infinitely nested pattern that was both beautiful and complex. This has since become known as the Mandelbrot set.
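The set is generated by the standard escape-time iteration z → z² + c: the point c belongs to the set if the orbit of z = 0 stays bounded (|z| never exceeds 2). A minimal sketch (my own illustration):

```python
def escape_time(c, max_iter=100):
    """Iterate z -> z*z + c from z = 0; return the step at which |z| exceeds 2,
    or max_iter if the orbit stays bounded (c is then taken to be in the set)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter

inside = escape_time(0 + 0j)     # orbit fixed at 0: never escapes
cycle = escape_time(-1 + 0j)     # orbit cycles 0, -1, 0, -1, ...: never escapes
outside = escape_time(1 + 1j)    # escapes almost immediately
```

Coloring each pixel of the complex plane by its escape time produces the familiar images of the set’s infinitely nested boundary.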
Later, in 1975, Mandelbrot coined the term fractal to describe these self-similar point sets, and he began to realize that these types of sets were ubiquitous in nature, ranging from the structure of trees and drainage basins, to the patterns of clouds and mountain landscapes. He published his highly successful and influential book The Fractal Geometry of Nature in 1982, introducing fractals to the wider public and launching a generation of hobbyists interested in computer-generated fractals. The rise of fractal geometry coincided with the rise of chaos theory that was aided by the same computing power. For instance, important geometric structures of chaos theory, known as strange attractors, have fractal geometry.
Appendix: Box Counting
When confronted by a fractal of unknown structure, one of the simplest methods to find the fractal dimension is box counting. This method is shown in Fig. 8. The fractal set is covered by a set of boxes of size b, and the number of boxes that contain at least one point of the fractal set is counted. As the boxes are reduced in size, the number of covering boxes increases as

N(b) ∝ b^(-D)
To be numerically accurate, this method must be iterated over several orders of magnitude. The number of boxes covering a fractal has this characteristic power law dependence, as shown in Fig. 8, and the fractal dimension is obtained as the slope.
Fig. 8 Calculation of the fractal dimension using box counting. At each generation, the size of the grid is reduced by a factor of 3. The number of boxes that contain some part of the fractal curve increases as N(b) ∝ b^(-D), where b is the scale.
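Box counting can be tried directly on the Cantor set. A sketch (my own illustration, using exact rational arithmetic so the box indices carry no rounding error):

```python
import itertools
import math
from fractions import Fraction

# Cantor-set points to 10 ternary digits: every digit is 0 or 2
level = 10
points = [sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))
          for digits in itertools.product((0, 2), repeat=level)]

def box_count(pts, box):
    """Number of boxes of size `box` containing at least one point."""
    return len({p // box for p in pts})

# Cover with boxes of size b = 3**-k over several scales; the slope of
# log N versus log(1/b) is the fractal dimension
scales = [Fraction(1, 3 ** k) for k in range(1, 8)]
counts = [box_count(points, b) for b in scales]
slopes = [math.log(n) / math.log(1 / b) for n, b in zip(counts, scales)]
```

Every slope comes out at ln(2)/ln(3) ≈ 0.6309, the Hausdorff dimension of the set; for an empirical fractal the slope would instead be fit over the scaling range of the data.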
 Hardy, G. (1916). “Weierstrass’s non-differentiable function.” Transactions of the American Mathematical Society 17: 301-325.
 Weierstrass, K. (1872). “Über continuirliche Functionen eines reellen Arguments, die für keinen Werth des letzteren einen bestimmten Differentialquotienten besitzen.” Communication à l’Académie Royale des Sciences II: 71-74.
 Besicovitch, A. S. and H. D. Ursell (1937). “Sets of fractional dimensions: On dimensional numbers of some continuous curves.” J. London Math. Soc. 1(1): 18-25.
 Shen, W. (2018). “Hausdorff dimension of the graphs of the classical Weierstrass functions.” Mathematische Zeitschrift. 289(1–2): 223–266.
 Cantor, G. (1883). Grundlagen einer allgemeinen Mannigfaltigkeitslehre. Leipzig, B. G. Teubner.
 Peano, G. (1890). “Sur une courbe qui remplit toute une aire plane.” Mathematische Annalen 36: 157-160.
 Peano, G. (1888). Calcolo geometrico secundo l’Ausdehnungslehre di H. Grassmann e precedutto dalle operazioni della logica deduttiva. Turin, Fratelli Bocca Editori.
 Von Koch, H. (1904). “Sur une courbe continue sans tangente obtenue par une construction géométrique élémentaire.” Arkiv för Matematik, Astronomi och Fysik 1: 681-704.
 Sierpinski, W. (1915). “Sur une courbe dont tout point est un point de ramification.” Comptes Rendus Hebdomadaires des Seances de l’Academie des Sciences de Paris 160: 302-305.
 Carathéodory, C. (1914). “Über das lineare Mass von Punktmengen – eine Verallgemeinerung des Längenbegriffs.” Gött. Nachr. IV: 404–406.
 Hausdorff, F. (1919). “Dimension und äusseres Mass.” Mathematische Annalen 79: 157-179.
 Menger, Karl (1926), “Allgemeine Räume und Cartesische Räume. I.”, Communications to the Amsterdam Academy of Sciences. English translation reprinted in Edgar, Gerald A., ed. (2004), Classics on fractals, Studies in Nonlinearity, Westview Press. Advanced Book Program, Boulder, CO
 Mandelbrot, B. (1967). “How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension.” Science 156(3775): 636-638.
The butterfly effect is one of the most widely known principles of chaos theory. It has become a meme, propagating through popular culture in movies, books, TV shows and even casual conversation.
Can a butterfly flapping its wings in Florida send a hurricane to New York?
The origin of the butterfly effect is — not surprisingly — the image of a butterfly-like set of trajectories that was generated, in one of the first computer simulations of chaos theory, by Edward Lorenz.
When Edward Lorenz (1917 – 2008) was a child, he memorized all the perfect squares up to ten thousand. This obvious interest in mathematics led him to a master’s degree in the subject at Harvard in 1940 under the supervision of George Birkhoff. Lorenz’s master’s thesis was on an aspect of Riemannian geometry, but his foray into nonlinear dynamics was triggered by the intervention of World War II. Only a few months before Lorenz would have received his doctorate in mathematics from Harvard, the Japanese bombed Pearl Harbor.
Lorenz left the PhD program at Harvard to join the United States Army Air Force to train as a weather forecaster in early 1942, and he took courses on forecasting and meteorology at MIT. After receiving a second master’s degree, this time in meteorology, Lorenz was posted to Hawaii, then to Saipan and finally to Guam. His area of expertise was in high-level winds, which were important for high-altitude bombing missions during the final months of the war in the Pacific. After the Japanese surrender, Lorenz returned to MIT, where he continued his studies in meteorology, receiving his doctorate degree in 1948 with a thesis on the application of fluid dynamical equations to predict the motion of storms.
One of Lorenz’s colleagues at MIT was Norbert Wiener (1894 – 1964), with whom he sometimes played chess during lunch at the faculty club. Wiener had published his landmark book Cybernetics: Or Control and Communication in the Animal and the Machine in 1948, which arose out of the apparently mundane problem of gunnery control during the Second World War. As an abstract mathematician, Wiener attempted to apply his cybernetic theory to the complexities of weather, and he developed a theorem concerning nonlinear fluid dynamics which appeared to show that linear interpolation, of sufficient resolution, would suffice for weather forecasting, possibly even long-range forecasting. Many on the meteorology faculty embraced this theorem because it fell in line with common practices of the day, in which tomorrow’s weather was predicted using linear regression on measurements taken today. However, Lorenz was skeptical, having acquired a detailed understanding of atmospheric energy cascades, as larger vortices induced smaller vortices all the way down to the molecular level, dissipating as heat, and then all the way back up again as heat drove large-scale convection. This was clearly not a system that would yield to linearization. Therefore, Lorenz determined to solve nonlinear fluid dynamics models to test this conjecture.
Even with a computer in hand, the atmospheric equations needed to be simplified to make the calculations tractable. Lorenz was more a scientist than an engineer, and more of a meteorologist than a forecaster. He did not hesitate to make simplifying assumptions if they retained the correct phenomenological behavior, even if they no longer allowed for accurate weather predictions.
He had simplified the number of atmospheric equations down to twelve. Progress was good, and by 1961, he had completed a large initial numerical study. He focused on nonperiodic solutions, which he suspected would deviate significantly from the predictions made by linear regression, and this hunch was vindicated by his numerical output. One day, as he was testing his results, he decided to save time by starting the computations midway by using mid-point results from a previous run as initial conditions. He typed in the three-digit numbers from a paper printout and went down the hall for a cup of coffee. When he returned, he looked at the printout of the twelve variables and was disappointed to find that they were not related to the previous full-time run. He immediately suspected a faulty vacuum tube, as often happened. But as he looked closer at the numbers, he realized that, at first, they tracked very well with the original run, but then began to diverge more and more rapidly until they lost all connection with the first-run numbers. His initial conditions were correct to a part in a thousand, but this small error was magnified exponentially as the solution progressed.
At this point, Lorenz recalled that he “became rather excited”. He was looking at a complete breakdown of predictability in atmospheric science. If radically different behavior arose from the smallest errors, then no measurements would ever be accurate enough to be useful for long-range forecasting. At a more fundamental level, this was a break with a long-standing tradition in science and engineering that clung to the belief that small differences produced small effects. What Lorenz had discovered, instead, was that the deterministic solution to his 12 equations was exponentially sensitive to initial conditions (known today as SIC).
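Lorenz’s accidental experiment is easy to repeat numerically. The sketch below (my own illustration, using a simple Runge-Kutta integrator rather than Lorenz’s original machine) integrates the three Lorenz equations from two initial conditions that differ by one part in a thousand and records their separation early and late in the run:

```python
def lorenz_deriv(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """The three Lorenz equations with his classic parameter values."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def distance(p, q):
    return sum((u - v) ** 2 for u, v in zip(p, q)) ** 0.5

dt = 0.01
a = (1.0, 1.0, 1.0)
b = (1.001, 1.0, 1.0)     # "correct to a part in a thousand"
early = late = 0.0
for step in range(2000):                    # 20 time units
    a = rk4_step(lorenz_deriv, a, dt)
    b = rk4_step(lorenz_deriv, b, dt)
    if step < 100:
        early = max(early, distance(a, b))  # first time unit: still tracking
    if step >= 1500:
        late = max(late, distance(a, b))    # last five: completely decorrelated
```

The early separation stays tiny while the late separation grows to the size of the attractor itself, which is exactly the breakdown of predictability that Lorenz saw on his printout.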
The Lorenz Equations
Over the following months, he was able to show that SIC was a result of the nonperiodic solutions. The more Lorenz became familiar with the behavior of his equations, the more he felt that the 12-dimensional trajectories had a repeatable shape. He tried to visualize this shape, to get a sense of its character, but it is difficult to visualize things in twelve dimensions, and progress was slow. Then Lorenz found that when the solution was nonperiodic (the necessary condition for SIC), four of the variables settled down to zero, leaving all the dynamics to the remaining three variables.
Lorenz narrowed the equations of atmospheric instability down to three variables: the stream function, the change in temperature and the deviation in linear temperature. The only parameter in the stream function is something known as the Prandtl number. This is a dimensionless number which is the ratio of the kinematic viscosity of the fluid to its thermal diffusion coefficient and is a physical property of the fluid. The only parameter in the change in temperature is the Rayleigh number, which is a dimensionless parameter proportional to the difference in temperature between the top and the bottom of the fluid layer. The final parameter, in the equation for the deviation in linear temperature, is the ratio of the height of the fluid layer to the width of the convection rolls. The final simplified model is given by the flow equations

dx/dt = σ(y - x)
dy/dt = x(ρ - z) - y
dz/dt = xy - βz

where σ is the Prandtl number, ρ is the Rayleigh number, and β is the geometric factor.
Lorenz finally had a 3-variable dynamical system that displayed chaos. Moreover, it had a three-dimensional state space that could be visualized directly. He ran his simulations, exploring the shape of the trajectories in three-dimensional state space for a wide range of initial conditions, and the trajectories did indeed always settle down to restricted regions of state space. They relaxed in all cases to a sort of surface that was elegantly warped, with wing-like patterns like a butterfly, as the state point of the system followed its dynamics through time. The attractor of the Lorenz equations was strange. Later, in 1971, the Belgian-French mathematical physicist David Ruelle (1935 – ), together with the Dutch mathematician Floris Takens, named such an object a “strange attractor”, and this name has become a standard part of the language of the theory of chaos.
The first graphical representation of the butterfly attractor is shown in Fig. 1 drawn by Lorenz for his 1963 publication.
Using our modern plotting abilities, the 3D character of the butterfly is shown in Fig. 2.
A projection onto the x-y plane is shown in Fig. 3. In the full 3D state space the trajectories never overlap, but in the projection onto a 2D plane the trajectories are moving above and below each other.
The reason it is called a strange attractor is because all initial conditions relax onto the strange attractor, yet every trajectory on the strange attractor separates exponentially from neighboring trajectories, displaying the classic SIC property of chaos. So here is an elegant collection of trajectories that are certainly not just random noise, yet detailed prediction is still impossible. Deterministic chaos has significant structure, and generates beautiful patterns, without actual “randomness”.
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 16 07:38:57 2018

Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)

Lorenz model of atmospheric turbulence
"""
import numpy as np
import matplotlib.colors as colors
import matplotlib.cm as cmx
from scipy import integrate
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

jet = plt.get_cmap('jet')
values = range(10)
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)

def solve_lorenz(N=12, angle=0.0, max_time=8.0, sigma=10.0, beta=8./3, rho=28.0):

    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1], projection='3d')
    ax.view_init(30, angle)

    # prepare the axes limits
    ax.set_xlim((-25, 25))
    ax.set_ylim((-35, 35))
    ax.set_zlim((5, 55))

    def lorenz_deriv(x_y_z, t0, sigma=sigma, beta=beta, rho=rho):
        """Compute the time-derivative of a Lorenz system."""
        x, y, z = x_y_z
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    # choose random starting points, uniformly distributed from -10 to 10
    x0 = -10 + 20 * np.random.random((N, 3))

    # solve for the trajectories
    t = np.linspace(0, max_time, int(500 * max_time))
    x_t = np.asarray([integrate.odeint(lorenz_deriv, x0i, t)
                      for x0i in x0])

    # choose a different color for each trajectory
    # (alternatives: plt.cm.viridis, plt.cm.rainbow)
    traj_colors = plt.cm.prism(np.linspace(0, 1, N))

    for i in range(N):
        x, y, z = x_t[i, :, :].T
        lines = ax.plot(x, y, z, '-', c=traj_colors[i])

    return t, x_t

t, x_t = solve_lorenz(angle=0, N=12)

# time series of three of the trajectories
plt.figure()
lines = plt.plot(t, x_t[1, :, 0], t, x_t[1, :, 1], t, x_t[1, :, 2])
lines = plt.plot(t, x_t[2, :, 0], t, x_t[2, :, 1], t, x_t[2, :, 2])
lines = plt.plot(t, x_t[10, :, 0], t, x_t[10, :, 1], t, x_t[10, :, 2])
plt.show()
To explore the parameter space of the Lorenz attractor, the key parameters to change are sigma (the Prandtl number), rho (the Rayleigh number), and beta, which enter as arguments of solve_lorenz() in the Python code.
 E. N. Lorenz, The Essence of Chaos (The Jessie and John Danz Lectures). Seattle: University of Washington Press, 1993.
 E. N. Lorenz, “Deterministic Nonperiodic Flow,” Journal of the Atmospheric Sciences, vol. 20, no. 2, pp. 130–141, 1963.
Well, here is another squeaker! The 2020 U.S. presidential election was a dead heat. What is most striking is that half of the past six U.S. presidential elections have been won by less than 1% of the votes cast in certain key battleground states. For instance, the 2000 election was won in Florida by less than 1/100th of a percent of the total votes cast.
How can so many elections be so close? This question is especially intriguing for the 2020 election, which should have been strongly asymmetric because one of the two candidates had such serious character flaws. It is also surprising because the country is NOT split 50/50 between urban and rural populations (it’s more like 60/40), and the Democrat/Republican registration split is about 33/29: close, but not as close as the election. So how can the vote be so close so often? Is this a coincidence, or something fundamental about our political system? The answer lies (partially) in nonlinear dynamics coupled with the libertarian tendencies of American voters.
Rabbits and Sheep
Elections are complex dynamical systems consisting of approximately 140 million degrees of freedom (the voters). Yet US elections are also surprisingly simple: they are dynamical systems with only two large political parties and, typically, a very small third party.
Voters in a political party are not too different from species in an ecosystem. There are many population-dynamics models of rabbits and sheep that seek to understand the steady-state solutions when two species vie for the same feedstock (or two parties vie for the same votes). Depending on reproduction rates and competition payoff, one species can often drive the other to extinction. Yet with fairly small modifications of the model parameters, it is often possible to find a steady-state solution in which both species live in harmony. This is a symbiotic solution to the population dynamics, perhaps because the rabbits help fertilize the grass for the sheep to eat, and the sheep keep away predators for the rabbits.
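This coexistence-versus-extinction story can be sketched with a two-species competition model. The growth rates and competition coefficients below are hypothetical, chosen only so that the competition is weak enough for a stable coexistence fixed point to exist.

```python
import numpy as np
from scipy import integrate

# Rabbits (x) and sheep (y) compete for the same grass, but only weakly,
# so a stable coexistence fixed point exists at (x, y) = (2, 1):
# it solves 3 - x - y = 0 and 2 - 0.5*x - y = 0 simultaneously.
def compete(xy, t):
    x, y = xy
    return [x * (3 - x - y),          # rabbit growth, limited by both species
            y * (2 - 0.5 * x - y)]    # sheep growth, with weaker competition

t = np.linspace(0, 50, 2000)
traj = integrate.odeint(compete, [0.5, 0.5], t)
print(traj[-1])  # relaxes to the coexistence fixed point near (2, 1)
```

Resetting the rabbit population to a smaller value mid-run and continuing the integration shows both populations relaxing back to the same fixed point, which is exactly the restoring-force behavior of a stable fixed point.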
There are two interesting features to such a symbiotic population-dynamics model. First, because there is a stable steady-state solution, if there is a perturbation of the populations, for instance if the rabbits are culled by the farmer, then the two populations will slowly relax back to the original steady-state solution. For this reason, this solution is called a “stable fixed point”. Deviations away from the steady-state values experience an effective “restoring force” that moves the population values back to the fixed point. The second feature of these models is that the steady-state values depend on the parameters of the model. Small changes in the model parameters then cause small changes in the steady-state values. In this sense, this stable fixed point is not fundamental: it depends on the parameters of the model.
But there are dynamical models whose stability maintains steady values even as the model parameters shift. These models have negative feedback, like many dynamical systems, but when the negative feedback is connected to the winner-take-all outcomes of game theory, a robustly stable fixed point can emerge at precisely the threshold where such a winner would take all.
The Replicator Equation
The replicator equation provides a simple model for competing populations. Despite its simplicity, it can model surprisingly complex behavior. The central equation is a simple growth model

$$\dot{x}_a = x_a \left( f_a - \varphi \right)$$

where the growth rate depends on the fitness $f_a$ of the a-th species relative to the average fitness $\varphi$ of all the species. The fitness is given by

$$f_a = p_{ab} x_b$$

where $p_{ab}$ is the payoff matrix among the different species (implicit Einstein summation applies). The fitness is frequency dependent through the dependence on $x_b$. The average fitness is

$$\varphi = f_a x_a$$
This model has a zero-sum rule that keeps the total population constant. Therefore, a three-species dynamics can be represented on a two-dimensional “simplex” whose three vertices are the pure populations of each species. The replicator equation can be applied easily to a three-party system: one simply defines a payoff matrix that sets the fitness of each party relative to the others.
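As a concrete sketch (with a hypothetical payoff matrix in which the first strategy strictly dominates), the replicator equation can be integrated directly. Because the sum of the growth rates vanishes on the simplex, the populations sum to one for all time.

```python
import numpy as np
from scipy import integrate

# Hypothetical payoff matrix: strategy 0 earns the highest payoff against
# every opponent, so the replicator dynamics should drive the population
# to the pure strategy-0 vertex of the simplex.
P = np.array([[1.0, 1.0, 1.0],
              [0.5, 0.5, 0.5],
              [0.2, 0.2, 0.2]])

def replicator(x, t):
    f = P @ x        # fitness  f_a = p_ab x_b
    phi = x @ f      # average fitness
    return x * (f - phi)

t = np.linspace(0, 20, 1000)
x_t = integrate.odeint(replicator, [1/3, 1/3, 1/3], t)
print(x_t[-1])  # close to the pure vertex [1, 0, 0]
```

Swapping in a payoff matrix with a mixed interior equilibrium (for example a symbiotic one) instead produces coexistence, just as in the rabbits-and-sheep models above.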
The Nonlinear Dynamics of Presidential Elections
Here we will consider the replicator equation with three political parties (Democratic, Republican and Libertarian). Even though the third party is never a serious contender, the extra degree of freedom provided by the third party helps to stabilize the dynamics between the Democrats and the Republicans.
It is already clear that an essentially symbiotic relationship is at play between Democrats and Republicans, because the elections are roughly 50/50. If this were not the case, then a winner-take-all dynamic would drive virtually everyone to one party or the other. Therefore, having 100% Democrats is actually unstable, as is 100% Republicans. When a party gets too far out of balance, becoming too monolithic and too inflexible, members defect to the other parties and rebalance the system. But this is just a general trend, not something that can explain the nearly perfect 50/50 vote of the 2020 election.
To create the ultra-stable fixed point at 50/50 requires an additional contribution to the replicator equation. This contribution must create a type of toggle switch that depends on the winner-take-all outcome of the election. If a Democrat wins 51% of the vote, they get 100% of the Oval Office. This extreme outcome then causes a back action on the electorate, which grows wary whenever one party gets too much power.
Therefore, there must be a shift in the payoff matrix when too many votes are going one way or the other. Because the winner-take-all threshold sits at exactly 50% of the vote, this becomes an equilibrium point imposed by the payoff matrix. Deviations of the numbers of voters away from 50% cause a negative feedback that drives the steady-state populations back to 50/50. This means that the payoff matrix becomes a function of the number of voters in one party or the other; in the parlance of nonlinear dynamics, the payoff matrix becomes frequency dependent. This goes one step beyond the original replicator equation, in which the population fitness was frequency dependent but the payoff matrix was not; now the payoff matrix is frequency dependent as well.
The frequency-dependent payoff matrix (in an extremely simple model of the election dynamics) takes on negative feedback between two of the species (here the Democrats and the Republicans). If these are the first and third species, then the payoff matrix becomes
where the feedback coefficient is
and where the population dependences on the off-diagonal terms guarantee that, as soon as one party gains an advantage, there is defection of voters to the other party. This establishes a 50/50 balance that is maintained even when the underlying parameters would predict a strongly asymmetric election.
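The payoff matrix and feedback coefficient used for the figures are not reproduced here, but the mechanism can be sketched with a hypothetical payoff in which the first party (D) would win decisively, plus an illustrative feedback term k*(x_D - x_R) subtracted from the leader's fitness and added to the trailing party's.

```python
import numpy as np
from scipy import integrate

# Hypothetical baseline payoff: without feedback, party D (index 0)
# out-competes both L (index 1) and R (index 2) and takes nearly all voters.
P0 = np.array([[0.0, 1.0, 1.0],
               [0.0, 0.0, 0.2],
               [0.0, 0.0, 0.0]])

def replicator(x, t, k):
    f = P0 @ x
    delta = x[0] - x[2]    # D-R imbalance
    f[0] -= k * delta      # winner-take-all backlash against the leader
    f[2] += k * delta
    phi = x @ f
    return x * (f - phi)

t = np.linspace(0, 200, 4000)
x0 = [0.4, 0.2, 0.4]
no_fb = integrate.odeint(replicator, x0, t, args=(0.0,))
with_fb = integrate.odeint(replicator, x0, t, args=(5.0,))
print(no_fb[-1], with_fb[-1])  # landslide without feedback, near 50/50 with it
```

Without feedback (k = 0) the D population approaches 100%, while the feedback run pins the D-R split close to 50/50 even though the underlying payoff is strongly asymmetric; this is the ultra-stable fixed point described in the text, with illustrative rather than fitted parameters.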
For instance, look at the dynamics in Fig. 2. For this choice of parameters, the replicator model predicts a 75/25 win for the Democrats. However, when the feedback is active, it forces the 50/50 outcome, despite the underlying advantage of the original parameters.
There are several interesting features in this model. It may seem that the Libertarians are irrelevant because they never have many voters. But their presence plays a surprisingly important role: the Libertarians tend to stabilize the dynamics so that neither the Democrats nor the Republicans get all the votes. Also, there is a saddle point not too far from the pure Libertarian vertex. That Libertarian vertex is an attractor in this model, so under some extreme conditions this could become a one-party system… maybe not Libertarian in that case, but possibly something more nefarious, of which history provides many sad examples. It’s a word of caution.
Disclaimers and Caveats
No attempt has been made to actually model the US electorate. The parameters in the modified replicator equations are chosen purely for illustrative purposes. This model illustrates a concept: that feedback in the payoff matrix can create an ultra-stable fixed point that is insensitive to changes in the underlying parameters of the model. This may explain why so many US presidential elections are so tight.
Someone interested in doing actual modeling of US elections would need to modify the parameters to match known behavior of voting registrations and voting records. The model presented here assumes a balanced negative feedback that ensures a 50/50 fixed point. This model is based on the aversion of voters to too much power in one party, an echo of the libertarian tradition in the country. A more sophisticated model would yield the fixed point as a consequence of the dynamics, rather than as a feature assumed in the model. In addition, nonlinearity could be added that would drive the vote off of the 50/50 point when the underlying parameters shift strongly enough. For instance, the 2008 election was not a close one, in part because the strong positive character of one of the candidates galvanized a large fraction of the electorate, driving the dynamics away from the 50/50 balance.
 D. D. Nolte, Introduction to Modern Dynamics: Chaos, Networks, Space and Time, 2nd ed. Oxford: Oxford University Press, 2019.
 M. A. Nowak, Evolutionary Dynamics: Exploring the Equations of Life. Cambridge, Mass.: Harvard University Press, 2006.
A chief principle of chaos theory states that even simple systems can display complex dynamics. Roughly, all that is needed for chaos is for a system to have at least three dynamical variables plus some nonlinearity.
A classic example of chaos is the driven damped pendulum: a mass at the end of a massless rod, subject to damping and driven by a sinusoidal perturbation. The three variables are the angle, the angular velocity, and the phase of the sinusoidal drive. The nonlinearity is provided by the cosine function in the potential energy, which is anharmonic for large angles. However, the driven damped pendulum is not an autonomous system, because the drive is an external time-dependent function. To find an autonomous system (one that persists in complex motion without any external driving function), one needs only to add one more mass to a simple pendulum to create what is known as a compound pendulum, or a double pendulum.
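The three-variable counting can be made explicit by writing the driven damped pendulum as an autonomous flow, promoting the drive phase to a dynamical variable. The damping, drive amplitude, and drive frequency below are illustrative values, not tied to any particular figure.

```python
import numpy as np
from scipy import integrate

# Driven damped pendulum as an autonomous three-variable flow:
# theta (angle), omega (angular velocity), phi (phase of the drive).
gamma, F, wd = 0.5, 1.2, 2.0/3.0   # illustrative damping, drive amplitude, drive frequency

def flow(state, t):
    theta, omega, phi = state
    return [omega,
            -gamma * omega - np.sin(theta) + F * np.cos(phi),
            wd]                     # the phase advances at the drive frequency

t = np.linspace(0, 200, 20000)
traj = integrate.odeint(flow, [0.1, 0.0, 0.0], t)
```

Note that the damping keeps the angular velocity bounded even when the motion is irregular, so the trajectory wanders on a bounded region of the three-dimensional state space rather than escaping to infinity.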
Daniel Bernoulli and the Discovery of Normal Modes
After the invention of the calculus by Newton and Leibniz, the first wave of calculus practitioners (Leibniz, Jakob and Johann Bernoulli and von Tschirnhaus) focused on static problems, like the functional form of the catenary (the shape of a hanging chain), or on constrained problems, like the brachistochrone (the path of least time for a mass under gravity to move between two points) and the tautochrone (the path of equal time).
The next generation of calculus practitioners (Euler, Johann and Daniel Bernoulli, and D’Alembert) focused on finding the equations of motion of dynamical systems. One of the simplest of these, that yielded the earliest equations of motion as well as the first identification of coupled modes, was the double pendulum. The double pendulum, in its simplest form, is a mass on a rigid massless rod attached to another mass on a massless rod. For small-angle motion, this is a simple coupled oscillator.
Daniel Bernoulli, the son of Johann I Bernoulli, was the first to study the double pendulum, publishing a paper on the topic in 1733 in the proceedings of the Academy in St. Petersburg just as he returned from Russia to take up a post permanently in his home town of Basel, Switzerland. Because he was a physicist first and mathematician second, he performed experiments with masses on strings to attempt to understand the qualitative as well as quantitative behavior of the two-mass system. He discovered that for small motions there was a symmetric behavior that had a low frequency of oscillation and an antisymmetric motion that had a higher frequency of oscillation. Furthermore, he recognized that any general motion of the double pendulum was a combination of the fundamental symmetric and antisymmetric motions. This work by Daniel Bernoulli represents the discovery of normal modes of coupled oscillators. It is also the first statement of the combination of motions that he would use later (1753) to express for the first time the principle of superposition.
Superposition is one of the guiding principles of linear physical systems. It provides a means for the solution of differential equations. It explains the existence of eigenmodes and their eigenfrequencies. It is the basis of all interference phenomena, whether classical, like Young’s double-slit experiment, or quantum, like Schrödinger’s cat. Today, superposition has taken center stage in the quantum information sciences and helps define the spooky (and useful) properties of quantum entanglement. Therefore, normal modes, composition of motion, superposition of harmonics on a musical string: these all date back to Daniel Bernoulli in the twenty years between 1733 and 1753. (Daniel Bernoulli is also the originator of the Bernoulli principle that explains why birds and airplanes fly.)
Johann Bernoulli and the Equations of Motion
Daniel Bernoulli’s father was Johann I Bernoulli. Daniel had been tutored by Johann, along with his friend Leonhard Euler, when Daniel was young. But as Daniel matured as a mathematician, he and his father began to compete against each other in international mathematics competitions (which were very common in the early eighteenth century). When Daniel beat his father in a competition sponsored by the French Academy, Johann threw Daniel out of his house and their relationship remained strained for the remainder of their lives.
Johann had a history of taking ideas from Daniel and never citing the source. For instance, when Johann published his work on equations of motion for masses on strings in 1742, he built on the work of his son Daniel from 1733 but never once mentioned it. Daniel, of course, was not happy.
In a letter dated 20 October 1742, Daniel wrote to Euler: “The collected works of my father are being printed, and I have just learned that he has inserted, without any mention of me, the dynamical problems I first discovered and solved (such as e.g. the descent of a sphere on a moving triangle; the linked pendulum, the center of spontaneous rotation, etc.).” And on 4 September 1743, when Daniel had finally seen his father’s works in print, he wrote: “The new mechanical problems are mostly mine, and my father saw my solutions before he solved the problems in his way …”.
Daniel clearly has priority for the discovery of the normal modes of the linked (i.e. double or compound) pendulum, but Johann often would “improve” on Daniel’s work while giving no credit for the initial discovery. As a mathematician, Johann had a more rigorous approach and could delve a little deeper into the math. For this reason, it was Johann who, in 1742, came closest to writing down the differential equations of motion for multi-mass systems, though he fell just short. It was D’Alembert, only one year later, who first wrote down the differential equations of motion for systems of masses and extended them to the loaded string, for which he was the first to derive the wave equation. The D’Alembertian operator is today named after him.
Double Pendulum Dynamics
The general dynamics of the double pendulum are best obtained from Lagrange’s equations of motion. However, setting up the Lagrangian takes careful thought, because the kinetic energy of the second mass depends on its absolute speed, which in turn depends on the motion of the first mass from which it is suspended. The velocity of the second mass is obtained through vector addition of velocities.
The resulting kinetic energy of the two masses is

$$T = \tfrac{1}{2}(M_1+M_2)R_1^2\dot{\theta}_1^2 + \tfrac{1}{2}M_2R_2^2\dot{\theta}_2^2 + M_2R_1R_2\dot{\theta}_1\dot{\theta}_2\cos(\theta_1-\theta_2)$$

The potential energy of the system is

$$V = -(M_1+M_2)gR_1\cos\theta_1 - M_2gR_2\cos\theta_2$$

so that the Lagrangian is

$$L = T - V = \tfrac{1}{2}(M_1+M_2)R_1^2\dot{\theta}_1^2 + \tfrac{1}{2}M_2R_2^2\dot{\theta}_2^2 + M_2R_1R_2\dot{\theta}_1\dot{\theta}_2\cos(\theta_1-\theta_2) + (M_1+M_2)gR_1\cos\theta_1 + M_2gR_2\cos\theta_2$$
The partial derivatives are

$$\frac{\partial L}{\partial \theta_1} = -M_2R_1R_2\dot{\theta}_1\dot{\theta}_2\sin(\theta_1-\theta_2) - (M_1+M_2)gR_1\sin\theta_1$$

$$\frac{\partial L}{\partial \theta_2} = M_2R_1R_2\dot{\theta}_1\dot{\theta}_2\sin(\theta_1-\theta_2) - M_2gR_2\sin\theta_2$$

$$\frac{\partial L}{\partial \dot{\theta}_1} = (M_1+M_2)R_1^2\dot{\theta}_1 + M_2R_1R_2\dot{\theta}_2\cos(\theta_1-\theta_2)$$

$$\frac{\partial L}{\partial \dot{\theta}_2} = M_2R_2^2\dot{\theta}_2 + M_2R_1R_2\dot{\theta}_1\cos(\theta_1-\theta_2)$$

and the time derivatives of the last two expressions are

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}_1} = (M_1+M_2)R_1^2\ddot{\theta}_1 + M_2R_1R_2\ddot{\theta}_2\cos(\theta_1-\theta_2) - M_2R_1R_2\dot{\theta}_2(\dot{\theta}_1-\dot{\theta}_2)\sin(\theta_1-\theta_2)$$

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}_2} = M_2R_2^2\ddot{\theta}_2 + M_2R_1R_2\ddot{\theta}_1\cos(\theta_1-\theta_2) - M_2R_1R_2\dot{\theta}_1(\dot{\theta}_1-\dot{\theta}_2)\sin(\theta_1-\theta_2)$$

Therefore, the equations of motion are

$$(M_1+M_2)R_1\ddot{\theta}_1 + M_2R_2\ddot{\theta}_2\cos(\theta_1-\theta_2) + M_2R_2\dot{\theta}_2^2\sin(\theta_1-\theta_2) + (M_1+M_2)g\sin\theta_1 = 0$$

$$R_1\ddot{\theta}_1\cos(\theta_1-\theta_2) + R_2\ddot{\theta}_2 - R_1\dot{\theta}_1^2\sin(\theta_1-\theta_2) + g\sin\theta_2 = 0$$
To get a sense of how this system behaves, we can make a small-angle approximation to linearize the equations and find the lowest-order normal modes. In the small-angle approximation, the equations of motion become

$$(M_1+M_2)R_1\ddot{\theta}_1 + M_2R_2\ddot{\theta}_2 + (M_1+M_2)g\theta_1 = 0$$

$$R_1\ddot{\theta}_1 + R_2\ddot{\theta}_2 + g\theta_2 = 0$$

Assuming harmonic solutions $\theta_a \propto e^{i\omega t}$, nontrivial solutions exist where the determinant vanishes

$$\begin{vmatrix} (M_1+M_2)(g-R_1\omega^2) & -M_2R_2\omega^2 \\ -R_1\omega^2 & g-R_2\omega^2 \end{vmatrix} = 0$$

This quartic equation is quadratic in $\omega^2$, and the quadratic solution is

$$\omega^2 = \frac{g(M_1+M_2)(R_1+R_2) \pm g\sqrt{(M_1+M_2)\left[(M_1+M_2)(R_1+R_2)^2 - 4M_1R_1R_2\right]}}{2M_1R_1R_2}$$

This solution is still a little opaque, so taking the special case R = R1 = R2 and M = M1 = M2 it becomes

$$\omega^2 = \left(2 \pm \sqrt{2}\right)\frac{g}{R}$$
There are two normal modes. The low-frequency mode is symmetric as both masses swing (mostly) together, while the higher frequency mode is antisymmetric with the two masses oscillating against each other. These are the motions that Daniel Bernoulli discovered in 1733.
It is interesting to note that if the middle joint were rigid, so that the two angles were forced to be equal, then the lowest frequency would satisfy ω² = 3g/5R, which is within 2% of the (2 − √2)g/R found above but is certainly not equal to it. This tells us that there is a slightly different angular deflection for the second mass relative to the first.
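The normal-mode frequencies above can be checked numerically as a generalized eigenvalue problem built from the small-angle equations of motion (a quick check for the equal-mass, equal-length case, in units g = R = 1).

```python
import numpy as np
from scipy.linalg import eigh

# Linearized equations of motion for M1 = M2 and R1 = R2, with g = R = 1:
#   2*th1'' + th2'' + 2*th1 = 0
#   th1'' + th2'' + th2 = 0
# i.e. Mmat @ th'' = -Kmat @ th, giving the generalized eigenvalue
# problem Kmat v = w^2 Mmat v for the squared mode frequencies.
Mmat = np.array([[2.0, 1.0],
                 [1.0, 1.0]])
Kmat = np.array([[2.0, 0.0],
                 [0.0, 1.0]])

w2 = eigh(Kmat, Mmat, eigvals_only=True)
print(w2)  # [2 - sqrt(2), 2 + sqrt(2)], about [0.586, 3.414]
# For comparison, the rigid-joint value 3/5 = 0.6 lies within 2% of the lower mode.
```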
Chaos in the Double Pendulum
The full expression for the nonlinear coupled dynamics is expressed in terms of four variables (θ1, θ2, ω1, ω2). For the equal-mass, equal-length case in units with g = R = 1, the dynamical equations can be put into the normal form for a four-dimensional flow:

$$\dot{\theta}_1 = \omega_1$$

$$\dot{\theta}_2 = \omega_2$$

$$\dot{\omega}_1 = \frac{\omega_2^2\sin(\theta_2-\theta_1) - 2\sin\theta_1 + \omega_1^2\sin(\theta_2-\theta_1)\cos(\theta_2-\theta_1) + \sin\theta_2\cos(\theta_2-\theta_1)}{2-\cos^2(\theta_2-\theta_1)}$$

$$\dot{\omega}_2 = \frac{\omega_2^2\sin(\theta_2-\theta_1)\cos(\theta_2-\theta_1) - 2\sin\theta_1\cos(\theta_2-\theta_1) + 2\omega_1^2\sin(\theta_2-\theta_1) + 2\sin\theta_2}{\cos^2(\theta_2-\theta_1)-2}$$
The numerical solution of these equations produces a complex interplay between the angle of the first mass and the angle of the second mass. Examples of trajectory projections in configuration space are shown in Fig. 3 for E = 1. The horizontal axis is the angle of the first mass, and the vertical axis is the angle of the second mass.
The dynamics in state space are four-dimensional, which is difficult to visualize directly. Using the technique of the Poincaré first-return map, the four-dimensional trajectories can be viewed as a two-dimensional plot of where the trajectories pierce the Poincaré plane. Poincaré sections are shown in Fig. 4.
Python Code: DoublePendulum.py
# -*- coding: utf-8 -*-
"""
Created on Oct 16 06:03:32 2020
DoublePendulum.py
"Introduction to Modern Dynamics" 2nd Edition (Oxford, 2019)
"""
import numpy as np
from scipy import integrate
from matplotlib import pyplot as plt

E = 1.0    # total initial kinetic energy. Try 0.8 to 1.5

def flow_deriv(x_y_z_w, tspan):
    """Four-dimensional flow for the double pendulum (equal masses and lengths)."""
    x, y, z, w = x_y_z_w
    A = w**2 * np.sin(y - x)
    B = -2 * np.sin(x)
    C = z**2 * np.sin(y - x) * np.cos(y - x)
    D = np.sin(y) * np.cos(y - x)
    EE = 2 - np.cos(y - x)**2
    FF = w**2 * np.sin(y - x) * np.cos(y - x)
    G = -2 * np.sin(x) * np.cos(y - x)
    H = 2 * z**2 * np.sin(y - x)
    I = 2 * np.sin(y)
    JJ = np.cos(y - x)**2 - 2
    a = z
    b = w
    c = (A + B + C + D) / EE
    d = (FF + G + H + I) / JJ
    return [a, b, c, d]

repnum = 75
for reploop in range(repnum):
    # random initial angular velocities consistent with kinetic energy E
    px1 = 2 * (np.random.random() - 0.499) * np.sqrt(E)
    py1 = -px1 + np.sqrt(2 * E - px1**2)
    xp1 = 0    # initial angle of the first mass. Try 0.1
    yp1 = 0    # initial angle of the second mass. Try -0.2
    x_y_z_w0 = [xp1, yp1, px1, py1]

    tspan = np.linspace(1, 1000, 10000)
    x_t = integrate.odeint(flow_deriv, x_y_z_w0, tspan)
    siz = np.shape(x_t)[0]

    if reploop % 50 == 0:
        plt.figure(1)
        plt.plot(x_t[:, 0], x_t[:, 1])

    # wrap all four variables into the range [-pi, pi)
    y1 = np.mod(x_t[:, 0] + np.pi, 2*np.pi) - np.pi
    y2 = np.mod(x_t[:, 1] + np.pi, 2*np.pi) - np.pi
    y3 = np.mod(x_t[:, 2] + np.pi, 2*np.pi) - np.pi
    y4 = np.mod(x_t[:, 3] + np.pi, 2*np.pi) - np.pi

    # Poincaré first-return map: sample (y2, y4) whenever y1 crosses zero upward
    py = np.zeros(shape=(10 * repnum,))
    yvar = np.zeros(shape=(10 * repnum,))
    cnt = -1
    last = y1[1]
    for loop in range(2, siz):
        if (last < 0) and (y1[loop] > 0):
            cnt = cnt + 1
            del1 = -y1[loop - 1] / (y1[loop] - y1[loop - 1])
            py[cnt] = y4[loop - 1] + del1 * (y4[loop] - y4[loop - 1])
            yvar[cnt] = y2[loop - 1] + del1 * (y2[loop] - y2[loop - 1])
        last = y1[loop]

    plt.figure(2)
    lines = plt.plot(yvar, py, 'o', ms=1)

plt.show()
You can change the energy E as well as the initial conditions xp1 and yp1 in the code. The energy E is the initial kinetic energy imparted to the two masses. For a given initial condition, what happens to the periodic orbits as the energy E increases?
 D. Bernoulli, “Theoremata de oscillationibus corporum filo flexili connexorum et catenae verticaliter suspensae,” Commentarii Academiae Scientiarum Imperialis Petropolitanae, vol. 6, 1732/1733.
 C. Truesdell, The Rational Mechanics of Flexible or Elastic Bodies, 1638–1788. Zurich: Orell Füssli, 1960. (This rare and artistically produced volume, almost impossible to find today in any library, is one of the greatest books written about the early history of dynamics.)