May 3, 2023September 8, 2023 by David D. Nolte

The Mighty Simplex

Machine Learning and Artificial Intelligence
Dantzig Simplex, Deep Learning, Equilateral, Hyperspace, Minimization, Nelder-Mead, Optimization, Pentachoron, population dynamics, Simplex, Tetrahedron

There is no greater geometric solid than the simplex. It is the paragon of efficiency, the pinnacle of symmetry, and the prototype of simplicity. If the universe were not constructed of continuous coordinates, then surely it would be tiled by tessellations of simplices.

Indeed, simplices, or simplexes, arise in a wide range of geometrical problems and real-world applications. For instance, metallic alloys are described on a simplex to identify the constituent elements [1]. Zero-sum games in game theory and ecosystems in population dynamics are described on simplexes [2], and the Dantzig simplex algorithm is a central algorithm for optimization in linear programming [3]. Simplexes also are used in nonlinear minimization (amoeba algorithm), in classification problems in machine learning, and they also raise their heads in quantum gravity. These applications reflect the special status of the simplex in the geometry of high dimensions.

… It’s Simplexes all the way down!

The reason for their usefulness is the simplicity of their construction that guarantees a primitive set that is always convex. For instance, in any space of d-dimensions, the simplest geometric figure that can be constructed of flat faces to enclose a d-volume consists of d+1 points that is the d-simplex.

Or …

In any space of d-dimensions, the simplex is the geometric figure whose faces are simplexes, whose faces are simplexes, whose faces are again simplexes, and those faces are once more simplexes … And so on.

In other words, it’s simplexes all the way down.

Simplex Geometry

In this blog, I will restrict the geometry to the regular simplex. The regular simplex is the queen of simplexes: it is the equilateral simplex for which all vertices are equivalent, and all faces are congruent, and all sub-faces are congruent, and so on. The regular simplexes have the highest symmetry properties of any polytope. A polytope is the d-dimensional generalization of a polyhedron. For instance, the regular 2-simplex is the equilateral triangle, and the regular 3-simplex is the equilateral tetrahedron.

The N-simplex is the high-dimensional generalization of the tetrahedron. It is a regular N-dimensional polytope with N+1 vertexes. Starting at the bottom and going up, the simplexes are the point (0-simplex), the unit line (1-simplex), the equilateral triangle (2-simplex), the tetrahedron (3-simplex), the pentachoron (4-simplex), the hexateron (5-simplex) and onward. When drawn on the two-dimensional plane, the simplexes are complete graphs with links connecting every node to every other node. This dual character of equidistance and completeness give simplexes their utility. Each node is equivalent and is linked to each other. There are N•(N-1)/2 links among N vertices, and there are (N-2)•(N-1)/2 triangular faces.

Fig. 1 The N-simplex structures from 1-D through 10-D. Drawn on the 2D plane, the simplexes are complete graphs with links between every node. The number of vertices is equal to the number of dimensions plus one. (Wikipedia)

Fig. 2 Coulomb-spring visualization of the energy minimization of a 12-simplex (a 12-dimensional tetrahedron). Each node is a charge. Each link is a spring. Beginning as a complete graph on the planar circle, it finds a minimum configuration with 3 internal nodes.

Construction of a d-simplex is recursive: Begin with a (d-1)-dimensional simplex and add a point along an orthogonal dimension to construct a d-simplex. For instance, to create a 2-simplex (an equilateral triangle), find the mid-point of the 1-simplex (a line segment)

Centered 1-simplex: (-1), (1)

add a point on the perpendicular that is the same distance from each original vertex as the original vertices were distant from each other

Off-centered 2-simplex: (-1,0), (1,0), (0, sqrt(3)/2)

Then shift the origin to the center of mass of the triangle

Centered 2-simplex: (-1, -sqrt(3)/6), (1, -sqrt(3)/6), (0, sqrt(3)/3)

The 2-simplex, i.e., the equilateral triangle, has a 1-simplex as each of its faces. And each of those 1-simplexes has a 0-simplex as each of its ends. Therefore, this recursive construction of ever higher-dimensional simplexes out of low-dimensional ones, provides an interesting pattern:

*Fig. 3 The entries are the same numbers that appear in Pascal’s Triangle.* *(Wikipedia)*

The coordinates of an N-simplex are not unique, although there are several convenient conventions. One convention defines standard coordinates for an N-simplex in N+1 coordinate bases. These coordinates embed the simplex into a space of one higher dimension. For instance, the standard 2-simplex is defined by the coordinates (001), (010), (100) forming a two-dimensional triangle in three dimensions, and the simplex is a submanifold in the embedding space. A more efficient coordinate choice matches the coordinate-space dimensionality to the dimensionality of the simplex. Hence the 10 vertices of a 9-simplex can be defined by 9 coordinates (also not unique). One choice is given in Fig. 4 for the 1-simplex up to the 9-simplex.

*Fig. 4 One possible set of coordinates for the 1-simplex up to the 9-simplex. The center of mass of the simplex is at the origin, and the edge lengths are equal to 2.*

The equations for the simplex coordinates are

where

is the “diagonal” vector. These coordinates are centered on the center of mass of the simplex, and the links all have length equal to 2 which can be rescaled by a multiplying factor. The internal dihedral angle between all of the coordinate vectors for an N-simplex is

For moderate to high-dimensionality, the position vectors of the simplex vertices are pseudo-orthogonal. For instance, for N = 9 the dihedral angle cosine is -1/9 = -0.111. For higher dimensions, the simplex position vectors become asymptotically orthogonal. Such orthogonality is an important feature for orthonormal decomposition of class superpositions, for instance of overlapping images.

Alloy Mixtures and Barycentric Coordinates

For linear systems, the orthonormality of basis representations is one of the most powerful features for system analysis in terms of superposition of normal modes. Neural networks, on the other hand, are intrinsically nonlinear decision systems for which linear superposition does not hold inside the network, even if the symbols presented to the network are orthonormal superpositions. This loss of orthonormality in deep networks can be partially retrieved by selecting the Simplex code. It has pseudo-orthogonal probability distribution functions located on the vertices of the simplex. There is an additional advantage to using the Simplex code: by using so-called barycentric coordinates, the simplex vertices can be expressed as independent bases. An example for the 2-simplex is shown in Fig. 5. The x-y Cartesian coordinates of the vertices (using tensor index notation) are given by (S₁¹, S₁²), (S₂¹, S₂²), and (S₃¹, S₃²). Any point (x¹, x²) on the plane can be expressed as a linear combination of the three vertices with barycentric coordinates (v¹, v², v³) by solving for these three coefficients from the equation

using Cramers rule. For instance, the three vertices of the simplex are expressed using the 3-component barycentric coordinates (1,0,0), (0,1,0) and (0,0,1). The mid-points on the edges have barycentric coordinates (1/2,1/2,0), (0,1/2,1/2), and (1/2,0,1/2). The centroid of the simplex has barycentric coordinates (1/3,1/3,1/3). Barycentric coordinates on a simplex are commonly used in phase diagrams of alloy systems in materials science. The simplex can also be used to identify crystallographic directions in three-dimensions, as in Fig. 6.

Fig. 5 Barycentric coordinates on the 2-Simplex. The vertices represent “orthogonal” pure symbols. Superpositions of 2 symbols lie on the edges. Any point on the simplex can be represented using barycentric coordinates with three indices corresponding to the mixture of the three symbols.

*Fig. 6 Crystallographic orientations expressed on a simplex. From A Treatise on Crystallography, William Miller, Cambridge (1839)*

Replicator Dynamics on the Simplex

Ecosystems are among the most complex systems on Earth. The complex interactions among hundreds or thousands of species may lead to steady homeostasis in some cases, to growth and collapse in other cases, and to oscillations or chaos in yet others. But the definition of species can be broad and abstract, referring to businesses and markets in economic ecosystems, or to cliches and acquaintances in social ecosystems, among many other examples. These systems are governed by the laws of evolutionary dynamics that include fitness and survival as well as adaptation. The dimensionality of the dynamical spaces for these systems extends to hundreds or thousands of dimensions—far too complex to visualize when thinking in four dimensions is already challenging.

A classic model of interacting species is the replicator equation. It allows for a fitness-based proliferation and for trade-offs among the individual species. The replicator dynamics equations are shown in Fig. 7.

Fig. 7 Replicator dynamics has a surprisingly simple form, but with surprisingly complicated behavior. The key elements are the fitness and the payoff matrix. The fitness relates to how likely the species will survive. The payoff matrix describes how one species gains at the loss of another (although symbiotic relationships also occur).

The population dynamics on the 2D simplex are shown in Fig. 8 for several different pay-off matrices (square matrix to the upper left of each simplex). The matrix values are shown in color and help interpret the trajectories. For instance the simplex on the upper-right shows a fixed point center. This reflects the antisymmetric character of the pay-off matrix around the diagonal. The stable spiral on the lower-left has a nearly asymmetric pay-off matrix, but with unequal off-diagonal magnitudes. The other two cases show central saddle points with stable fixed points on the boundary. A large variety of behaviors are possible for this very simple system. The Python program can be found in Trirep.py.

Fig. 8 Payoff matrix and population simplex for four random cases: Upper left is an unstable saddle. Upper right is a center. Lower left is a stable spiral. Lower right is a marginal case.

Linear Programming with the Dantzig Simplex

There is a large set of optimization problems in which a linear objective function is to be minimized subject to a set of inequalities. This is known as “Linear Programming”. These LP systems can be expressed as

The vector index goes from 1 to d, the dimension of the space. Each inequality creates a hyperplane, where two such hyperplanes intersect along a line terminated at each end by a vertex point. The set of vertexes defines a polytope in d-dimensions, and each face of the polytope, when combined with the point at the origin, defines a 3-simplex.

It is easy to visualize in lower dimensions why the linear objective function must have an absolute minimum at one of the vertexes of the polytope. And finding that minimum is a trivial exercise: Start at any vertex. Poll each neighboring vertex and move to the one that has the lowest value of the objective function. Repeat until the current vertex has a lower objective value than any neighbors. Because of the linearity of the objective function, this is a unique minimum (except for rare cases of accidental degeneracy). This iterative algorithm defines a walk on the vertexes of the polytope.

The question arises, why not just evaluate the objection function at each vertex and then just pick the vertex with the lowest value? The answer in high dimensions is that there are too many vertexes, and finding all of them is inefficient. If there are N vertexes, the walk to the solution visits only a few of the vertexes, on the order of log(N). The algorithm therefore scales as log(N), just like a search tree.

Fig. 9 Dantzig simplex approach on a convex 3D space of basic solutions in a linear programming problem.

This simple algorithm was devised by George Dantzig (1914 – 2005) in 1939 when he was a graduate student at UC Berkeley. He had arrived late to class and saw two problems written on the chalk board. He assumed that these were homework assignments, so he wrote them down and worked on them over the following week. He recalled that they seemed a bit harder than usual, but he eventually solved them and turned them in. A few weeks later, his very excited professor approached him and told him that the problems weren’t homework–they were two of the most important outstanding problems in optimization and that Dantzig had just solved them! The 1997 movie Good Will Hunting, with Matt Damon, Ben Affleck, and Robin Williams, borrowed this story for the opening scene.

The Amoeba Simplex Crawling through Hyperspace

Unlike linear programming problems with linear objective functions, multidimensional minimization of nonlinear objective functions is an art unto itself, with many approach. One of these is a visually compelling algorithm that does the trick more often than not. This is the so-called amoeba algorithm that shares much in common with the Dantzig simplex approach to linear programming, but instead of a set of fixed simplex coordinates, it uses a constantly shifting d-dimensional simplex that “crawls” over the objective function, seeking its minimum.

One of the best descriptions of the amoeba simplex algorithm is in “Numerical Recipes” [4] that describes the crawling simplex as

When it reaches a “valley floor”, the method contracts itself in the transverse direction and tries to ooze down the valley. If there is a situation where the simplex is trying to “pass through the eye of a needle”, it contracts itself in all directions, pulling itself in around its lowest (best) point.
(From Press, Numerical Recipes, Cambridge)

The basic operations for the crawling simplex are reflection and scaling. For a given evaluation of all the vertexes of the simplex, one will have the highest value and another the lowest. In a reflection, the highest point is reflected through the d-dimensional face defined by the other d vertexes. After reflection, if the new evaluation is lower than the former lowest value, then the point is expanded. If, on the other hand, it is little better than it was before reflection, then the point is contracted. The expansion and contraction are what allows the algorithm to slide through valleys or shrink to pass through the eye of a needle.

The amoeba algorithm was developed by John Nelder and Roger Mead in 1965 at a time when computing power was very limited. The algorithm works great as a first pass at a minimization problem, and it almost always works for moderately small dimensions, but for very high dimensions there are more powerful algorithms today for optimization, built into all the deep learning software environments like Tensor Flow and the Matlab toolbox.

By David D. Nolte, May 3, 2023

[1] M. Hillert, Phase equilibria, phase diagrams and phase transformations : their thermodynamic basis. (Cambridge University Press, Cambridge, UK ;, ed. 2nd ed., 2008).

[2] P. Schuster, K. Sigmund, Replicator Dynamics. Journal of Theoretical Biology 100, 533-538 (1983); P. Godfrey-Smith, The replicator in retrospect. Biology & Philosophy 15, 403-423 (2000).

[3] R. E. Stone, C. A. Tovey, The Simplex and Projective Scaling Algorithms as Iteratively Reweighted Least-squares Methods. Siam Review 33, 220-237 (1991).

[4] W. H. Press, Numerical Recipes in C++ : The Art of Scientific Computing. (Cambridge University Press, Cambridge, UK; 2nd ed., 2002).

March 8, 2023October 23, 2023 by David D. Nolte

A Short History of Multiple Dimensions

Hyperspace by any other name would sound as sweet, conjuring to the mind’s eye images of hypercubes and tesseracts, manifolds and wormholes, Klein bottles and Calabi Yau quintics. Forget the dimension of time—that may be the most mysterious of all—but consider the extra spatial dimensions that challenge the mind and open the door to dreams of going beyond the bounds of today’s physics.

The geometry of n dimensions studies reality; no one doubts that. Bodies in hyperspace are subject to precise definition, just like bodies in ordinary space; and while we cannot draw pictures of them, we can imagine and study them.
(Poincare 1895)

Here is a short history of hyperspace. It begins with advances by Möbius and Liouville and Jacobi who never truly realized what they had invented, until Cayley and Grassmann and Riemann made it explicit. They opened Pandora’s box, and multiple dimensions burst upon the world never to be put back again, giving us today the manifolds of string theory and infinite-dimensional Hilbert spaces.

August Möbius (1827)

Although he is most famous for the single-surface strip that bears his name, one of the early contributions of August Möbius was the idea of barycentric coordinates [1] , for instance using three coordinates to express the locations of points in a two-dimensional simplex—the triangle. Barycentric coordinates are used routinely today in metallurgy to describe the alloy composition in ternary alloys.

Möbius’ work was one of the first to hint that tuples of numbers could stand in for higher dimensional space, and they were an early example of homogeneous coordinates that could be used for higher-dimensional representations. However, he was too early to use any language of multidimensional geometry.

Carl Jacobi (1834)

Carl Jacobi was a master at manipulating multiple variables, leading to his development of the theory of matrices. In this context, he came to study (n-1)-fold integrals over multiple continuous-valued variables. From our modern viewpoint, he was evaluating surface integrals of hyperspheres.

In 1834, Jacobi found explicit solutions to these integrals and published them in a paper with the imposing title “De binis quibuslibet functionibus homogeneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium” [2]. The resulting (n-1)-fold integrals are

when the space dimension is even or odd, respectively. These are the surface areas of the manifolds called (n-1)-spheres in n-dimensional space. For instance, the 2-sphere is the ordinary surface 4πr² of a sphere on our 3D space.

Despite the fact that we recognize these as surface areas of hyperspheres, Jacobi used no geometric language in his paper. He was still too early, and mathematicians had not yet woken up to the analogy of extending spatial dimensions beyond 3D.

Joseph Liouville (1838)

Joseph Liouville’s name is attached to a theorem that lies at the core of mechanical systems—Liouville’s Theorem that proves that volumes in high-dimensional phase space are incompressible. Surprisingly, Liouville had no conception of high dimensional space, to say nothing of abstract phase space. The story of the convoluted path that led Liouville’s name to be attached to his theorem is told in Chapter 6, “The Tangled Tale of Phase Space”, in Galileo Unbound (Oxford University Press, 2018).

Nonetheless, Liouville did publish a pure-mathematics paper in 1838 in Crelle’s Journal [3] that identified an invariant quantity that stayed constant during the differential change of multiple variables when certain criteria were satisfied. It was only later that Jacobi, as he was developing a new mechanical theory based on William R. Hamilton’s work, realized that the criteria needed for Liouville’s invariant quantity to hold were satisfied by conservative mechanical systems. Even then, neither Liouville nor Jacobi used the language of multidimensional geometry, but that was about to change in a quick succession of papers and books by three mathematicians who, unknown to each other, were all thinking along the same lines.

Facsimile of Liouville’s 1838 paper on invariants

Arthur Cayley (1843)

Arthur Cayley was the first to take the bold step to call the emerging geometry of multiple variables to be actual space. His seminal paper “Chapters in the Analytic Theory of n-Dimensions” was published in 1843 in the Philosophical Magazine [4]. Here, for the first time, Cayley recognized that the domain of multiple variables behaved identically to multidimensional space. He used little of the language of geometry in the paper, which was mostly analysis rather than geometry, but his bold declaration for spaces of n-dimensions opened the door to a changing mindset that would soon sweep through geometric reasoning.

Hermann Grassmann (1844)

Grassmann’s life story, although not overly tragic, was beset by lifelong setbacks and frustrations. He was a mathematician literally 30 years ahead of his time, but because he was merely a high-school teacher, no-one took his ideas seriously.

Somehow, in nearly a complete vacuum, disconnected from the professional mathematicians of his day, he devised an entirely new type of algebra that allowed geometric objects to have orientation. These could be combined in numerous different ways obeying numerous different laws. The simplest elements were just numbers, but these could be extended to arbitrary complexity with arbitrary number of elements. He called his theory a theory of “Extension”, and he self-published a thick and difficult tome that contained all of his ideas [5]. He tried to enlist Möbius to help disseminate his ideas, but even Möbius could not recognize what Grassmann had achieved.

In fact, what Grassmann did achieve was vector algebra of arbitrarily high dimension. Perhaps more impressive for the time is that he actually recognized what he was dealing with. He did not know of Cayley’s work, but independently of Cayley he used geometric language for the first time describing geometric objects in high dimensional spaces. He said, “since this method of formation is theoretically applicable without restriction, I can define systems of arbitrarily high level by this method… geometry goes no further, but abstract science knows no limits.” [6]

Grassman was convinced that he had discovered something astonishing and new, which he had, but no one understood him. After years trying to get mathematicians to listen, he finally gave up, left mathematics behind, and actually achieved some fame within his lifetime in the field of linguistics. There is even a law of diachronic linguistics named after him. For the story of Grassmann’s struggles, see the blog on Grassmann and his Wedge Product .

Julius Plücker (1846)

Projective geometry sounds like it ought to be a simple topic, like the projective property of perspective art as parallel lines draw together and touch at the vanishing point on the horizon of a painting. But it is far more complex than that, and it provided a separate gateway into the geometry of high dimensions.

A hint of its power comes from homogeneous coordinates of the plane. These are used to find where a point in three dimensions intersects a plane (like the plane of an artist’s canvas). Although the point on the plane is in two dimensions, it take three homogeneous coordinates to locate it. By extension, if a point is located in three dimensions, then it has four homogeneous coordinates, as if the three dimensional point were a projection onto 3D from a 4D space.

These ideas were pursued by Julius Plücker as he extended projective geometry from the work of earlier mathematicians such as Desargues and Möbius. For instance, the barycentric coordinates of Möbius are a form of homogeneous coordinates. What Plücker discovered is that space does not need to be defined by a dense set of points, but a dense set of lines can be used just as well. The set of lines is represented as a four-dimensional manifold. Plücker reported his findings in a book in 1846 [7] and expanded on the concepts of multidimensional spaces published in 1868 [8].

Ludwig Schläfli (1851)

After Plücker, ideas of multidimensional analysis became more common, and Ludwig Schläfli (1814 – 1895), a professor at the University of Berne in Switzerland, was one of the first to fully explore analytic geometry in higher dimensions. He described multidimsnional points that were located on hyperplanes, and he calculated the angles between intersecting hyperplanes [9]. He also investigated high-dimensional polytopes, from which are derived our modern “Schläfli notation“. However, Schläffli used his own terminology for these objects, emphasizing analytic properties without using the ordinary language of high-dimensional geometry.

Some of the polytopes studied by Schläfli.

Bernhard Riemann (1854)

The person most responsible for the shift in the mindset that finally accepted the geometry of high-dimensional spaces was Bernhard Riemann. In 1854 at the university in Göttingen he presented his habilitation talk “Über die Hypothesen, welche der Geometrie zu Grunde liegen” (Over the hypotheses on which geometry is founded). A habilitation in Germany was an examination that qualified an academic to be able to advise their own students (somewhat like attaining tenure in US universities).

The habilitation candidate would suggest three topics, and it was usual for the first or second to be picked. Riemann’s three topics were: trigonometric properties of functions (he was the first to rigorously prove the convergence properties of Fourier series), aspects of electromagnetic theory, and a throw-away topic that he added at the last minute on the foundations of geometry (on which he had not actually done any serious work). Gauss was his faculty advisor and picked the third topic. Riemann had to develop the topic in a very short time period, starting from scratch. The effort exhausted him mentally and emotionally, and he had to withdraw temporarily from the university to regain his strength. After returning around Easter, he worked furiously for seven weeks to develop a first draft and then asked Gauss to set the examination date. Gauss initially thought to postpone to the Fall semester, but then at the last minute scheduled the talk for the next day. (For the story of Riemann and Gauss, see Chapter 4 “Geometry on my Mind” in the book Galileo Unbound (Oxford, 2018)).

Riemann gave his lecture on 10 June 1854, and it was a masterpiece. He stripped away all the old notions of space and dimensions and imbued geometry with a metric structure that was fundamentally attached to coordinate transformations. He also showed how any set of coordinates could describe space of any dimension, and he generalized ideas of space to include virtually any ordered set of measurables, whether it was of temperature or color or sound or anything else. Most importantly, his new system made explicit what those before him had alluded to: Jacobi, Grassmann, Plücker and Schläfli. Ideas of Riemannian geometry began to percolate through the mathematics world, expanding into common use after Richard Dedekind edited and published Riemann’s habilitation lecture in 1868 [10].

George Cantor and Dimension Theory (1878)

In discussions of multidimensional spaces, it is important to step back and ask what is dimension? This question is not as easy to answer as it may seem. In fact, in 1878, George Cantor proved that there is a one-to-one mapping of the plane to the line, making it seem that lines and planes are somehow the same. He was so astonished at his own results that he wrote in a letter to his friend Richard Dedekind “I see it, but I don’t believe it!”. A few decades later, Peano and Hilbert showed how to create area-filling curves so that a single continuous curve can approach any point in the plane arbitrarily closely, again casting shadows of doubt on the robustness of dimension. These questions of dimensionality would not be put to rest until the work by Karl Menger around 1926 when he provided a rigorous definition of topological dimension (see the Blog on the History of Fractals).

Area-filling curves by Peano and Hilbert.

Hermann Minkowski and Spacetime (1908)

Most of the earlier work on multidimensional spaces were mathematical and geometric rather than physical. One of the first examples of physical hyperspace is the spacetime of Hermann Minkowski. Although Einstein and Poincaré had noted how space and time were coupled by the Lorentz equations, they did not take the bold step of recognizing space and time as parts of a single manifold. This step was taken in 1908 [11] by Hermann Minkowski who claimed

“Gentlemen! The views of space and time which I wish to lay before you … They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”Herman Minkowski (1908)

For the story of Einstein and Minkowski, see the Blog on Minkowski’s Spacetime: The Theory that Einstein Overlooked.

Facsimile of Minkowski’s 1908 publication on spacetime.

Felix Hausdorff and Fractals (1918)

No story of multiple “integer” dimensions can be complete without mentioning the existence of “fractional” dimensions, also known as fractals. The individual who is most responsible for the concepts and mathematics of fractional dimensions was Felix Hausdorff. Before being compelled to commit suicide by being jewish in Nazi Germany, he was a leading light in the intellectual life of Leipzig, Germany. By day he was a brilliant mathematician, by night he was the author Paul Mongré writing poetry and plays.

In 1918, as the war was ending, he wrote a small book “Dimension and Outer Measure” that established ways to construct sets whose measured dimensions were fractions rather than integers [12]. Benoit Mandelbrot would later popularize these sets as “fractals” in the 1980’s. For the background on a history of fractals, see the Blog A Short History of Fractals.

Example of a fractal set with embedding dimension D_E = 2, topological dimension D_T = 1, and fractal dimension D_H = 1.585.

The Fifth Dimension of Theodore Kaluza (1921) and Oskar Klein (1926)

The first theoretical steps to develop a theory of a physical hyperspace (in contrast to merely a geometric hyperspace) were taken by Theodore Kaluza at the University of Königsberg in Prussia. He added an additional spatial dimension to Minkowski spacetime as an attempt to unify the forces of gravity with the forces of electromagnetism. Kaluza’s paper was communicated to the journal of the Prussian Academy of Science in 1921 through Einstein who saw the unification principles as a parallel of some of his own attempts [13]. However, Kaluza’s theory was fully classical and did not include the new quantum theory that was developing at that time in the hands of Heisenberg, Bohr and Born.

Oskar Klein was a Swedish physicist who was in the “second wave” of quantum physicists having studied under Bohr. Unaware of Kaluza’s work, Klein developed a quantum theory of a five-dimensional spacetime [14]. For the theory to be self-consistent, it was necessary to roll up the extra dimension into a tight cylinder. This is like a strand a spaghetti—looking at it from far away it looks like a one-dimensional string, but an ant crawling on the spaghetti can move in two dimensions—along the long direction, or looping around it in the short direction called a compact dimension. Klein’s theory was an early attempt at what would later be called string theory. For the historical background on Kaluza and Klein, see the Blog on Oskar Klein.

The wave equations of Klein-Gordon, Schrödinger and Dirac.

John Campbell (1931): Hyperspace in Science Fiction

Art has a long history of shadowing the sciences, and the math and science of hyperspace was no exception. One of the first mentions of hyperspace in science fiction was in the story “Islands in Space’, by John Campbell [15], published in the Amazing Stories quarterly in 1931, where it was used as an extraordinary means of space travel.

In 1951, Isaac Asimov made travel through hyperspace the transportation network that connected the galaxy in his Foundation Trilogy [16].

Testez-vous : Isaac Asimov avait-il (entièrement) raison ? - Sciences et Avenir — Isaac Asimov (1920 – 1992)

John von Neumann and Hilbert Space (1932)

Quantum mechanics had developed rapidly through the 1920’s, but by the early 1930’s it was in need of an overhaul, having outstripped rigorous mathematical underpinnings. These underpinnings were provided by John von Neumann in his 1932 book on quantum theory [17]. This is the book that cemented the Copenhagen interpretation of quantum mechanics, with projection measurements and wave function collapse, while also establishing the formalism of Hilbert space.

Hilbert space is an infinite dimensional vector space of orthogonal eigenfunctions into which any quantum wave function can be decomposed. The physicists of today work and sleep in Hilbert space as their natural environment, often losing sight of its infinite dimensions that don’t seem to bother anyone. Hilbert space is more than a mere geometrical space, but less than a full physical space (like five-dimensional spacetime). Few realize that what is so often ascribed to Hilbert was actually formalized by von Neumann, among his many other accomplishments like stored-program computers and game theory.

John von Neumann (1903 – 1957). Image Credits.

Einstein-Rosen Bridge (1935)

One of the strangest entities inhabiting the theory of spacetime is the Einstein-Rosen Bridge. It is space folded back on itself in a way that punches a short-cut through spacetime. Einstein, working with his collaborator Nathan Rosen at Princeton’s Institute for Advanced Study, published a paper in 1935 that attempted to solve two problems [18]. The first problem was the Schwarzschild singularity at a radius r = 2M/c² known as the Schwarzschild radius or the Event Horizon. Einstein had a distaste for such singularities in physical theory and viewed them as a problem. The second problem was how to apply the theory of general relativity (GR) to point masses like an electron. Again, the GR solution to an electron blows up at the location of the particle at r = 0.

To eliminate both problems, Einstein and Rosen (ER) began with the Schwarzschild metric in its usual form

where it is easy to see that it “blows up” when r = 2M/c² as well as at r = 0. ER realized that they could write a new form that bypasses the singularities using the simple coordinate substitution

to yield the “wormhole” metric

It is easy to see that as the new variable u goes from -inf to +inf that this expression never blows up. The reason is simple—it removes the 1/r singularity by replacing it with 1/(r + ε). Such tricks are used routinely today in computational physics to keep computer calculations from getting too large—avoiding the divide-by-zero problem. It is also known as a form of regularization in machine learning applications. But in the hands of Einstein, this simple “bypass” is not just math, it can provide a physical solution.

It is hard to imagine that an article published in the Physical Review, especially one written about a simple variable substitution, would appear on the front page of the New York Times, even appearing “above the fold”, but such was Einstein’s fame this is exactly the response when he and Rosen published their paper. The reason for the interest was because of the interpretation of the new equation—when visualized geometrically, it was like a funnel between two separated Minkowski spaces—in other words, what was named a “wormhole” by John Wheeler in 1957. Even back in 1935, there was some sense that this new property of space might allow untold possibilities, perhaps even a form of travel through such a short cut.

As it turns out, the ER wormhole is not stable—it collapses on itself in an incredibly short time so that not even photons can get through it in time. More recent work on wormholes have shown that it can be stabilized by negative energy density, but ordinary matter cannot have negative energy density. On the other hand, the Casimir effect might have a type of negative energy density, which raises some interesting questions about quantum mechanics and the ER bridge.

Edward Witten’s 10+1 Dimensions (1995)

A history of hyperspace would not be complete without a mention of string theory and Edward Witten’s unification of the variously different 10-dimensional string theories into 10- or 11-dimensional M-theory. At a string theory conference at USC in 1995 he pointed out that the 5 different string theories of the day were all related through dualities. This observation launched the second superstring revolution that continues today. In this theory, 6 extra spatial dimensions are wrapped up into complex manifolds such as the Calabi-Yau manifold.

Two-dimensional slice of a six-dimensional Calabi-Yau quintic manifold.

Prospects

There is definitely something wrong with our three-plus-one dimensions of spacetime. We claim that we have achieved the pinnacle of fundamental physics with what is called the Standard Model and the Higgs boson, but dark energy and dark matter loom as giant white elephants in the room. They are giant, gaping, embarrassing and currently unsolved. By some estimates, the fraction of the energy density of the universe comprised of ordinary matter is only 5%. The other 95% is in some form unknown to physics. How can physicists claim to know anything if 95% of everything is in some unknown form?

The answer, perhaps to be uncovered sometime in this century, may be the role of extra dimensions in physical phenomena—probably not in every-day phenomena, and maybe not even in high-energy particles—but in the grand expanse of the cosmos.

By David D. Nolte, Feb. 8, 2023

Bibliography:

M. Kaku, R. O’Keefe, Hyperspace: A scientific odyssey through parallel universes, time warps, and the tenth dimension. (Oxford University Press, New York, 1994).

A. N. Kolmogorov, A. P. Yushkevich, Mathematics of the 19th century: Geometry, analytic function theory. (Birkhäuser Verlag, Basel ; 1996).

References:

[1] F. Möbius, in Möbius, F. Gesammelte Werke,, D. M. Saendig, Ed. (oHG, Wiesbaden, Germany, 1967), vol. 1, pp. 36-49.

[2] Carl Jacobi, “De binis quibuslibet functionibus homogeneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium” (1834)

[3] J. Liouville, Note sur la théorie de la variation des constantes arbitraires. Liouville Journal 3, 342-349 (1838).

[4] A. Cayley, Chapters in the analytical geometry of n dimensions. Collected Mathematical Papers 1, 317-326, 119-127 (1843).

[5] H. Grassmann, Die lineale Ausdehnungslehre. (Wiegand, Leipzig, 1844).

[6] H. Grassmann quoted in D. D. Nolte, Galileo Unbound (Oxford University Press, 2018) pg. 105

[7] J. Plücker, System der Geometrie des Raumes in Neuer Analytischer Behandlungsweise, Insbesondere de Flächen Sweiter Ordnung und Klasse Enthaltend. (Düsseldorf, 1846).

[8] J. Plücker, On a New Geometry of Space (1868).

[9] L. Schläfli, J. H. Graf, Theorie der vielfachen Kontinuität. Neue Denkschriften der Allgemeinen Schweizerischen Gesellschaft für die Gesammten Naturwissenschaften 38. ([s.n.], Zürich, 1901).

[10] B. Riemann, Über die Hypothesen, welche der Geometrie zu Grunde liegen, Habilitationsvortrag. Göttinger Abhandlung 13, (1854).

[11] Minkowski, H. (1909). “Raum und Zeit.” Jahresbericht der Deutschen Mathematikier-Vereinigung: 75-88.

[12] Hausdorff, F.(1919).“Dimension und ausseres Mass,”Mathematische Annalen, 79: 157–79.

[13] Kaluza, Theodor (1921). “Zum Unitätsproblem in der Physik”. Sitzungsber. Preuss. Akad. Wiss. Berlin. (Math. Phys.): 966–972

[14] Klein, O. (1926). “Quantentheorie und fünfdimensionale Relativitätstheorie“. Zeitschrift für Physik. 37 (12): 895

[15] John W. Campbell, Jr. “Islands of Space“, Amazing Stories Quarterly (1931)

[16] Isaac Asimov, Foundation (Gnome Press, 1951)

[17] J. von Neumann, Mathematical Foundations of Quantum Mechanics. (Princeton University Press, ed. 1996, 1932).

[18] A. Einstein and N. Rosen, “The Particle Problem in the General Theory of Relativity,” Phys. Rev. 48(73) (1935).

Interference (New from Oxford University Press: 2023)

Read about the history of light and interference.

Available at Amazon.

Available at Oxford U Press

Avaliable at Barnes & Nobles

June 28, 2021April 3, 2022 by David D. Nolte

A Random Walk in 10 Dimensions

Physics in high dimensions is becoming the norm in modern dynamics. It is not only that string theory operates in ten dimensions (plus one for time), but virtually every complex dynamical system is described and analyzed within state spaces of high dimensionality. Population dynamics, for instance, may describe hundreds or thousands of different species, each of whose time-varying populations define a separate axis in a high-dimensional space. Coupled mechanical systems likewise may have hundreds or thousands (or more) of degrees of freedom that are described in high-dimensional phase space.

In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.

For these reasons, as physics students today are being increasingly exposed to the challenges and problems of high-dimensional dynamics, it is important to build tools they can use to give them an intuitive feeling for the highly unintuitive behavior of systems in high-D.

Within the rapidly-developing field of machine learning, which often deals with landscapes (loss functions or objective functions) in high dimensions that need to be minimized, high dimensions are usually referred to in the negative as “The Curse of Dimensionality”.

Dimensionality might be viewed as a curse for several reasons. First, it is almost impossible to visualize data in dimensions higher than d = 4 (the fourth dimension can sometimes be visualized using colors or time series). Second, too many degrees of freedom create too many variables to fit or model, leading to the classic problem of overfitting. Put simply, there is an absurdly large amount of room in high dimensions. Third, our intuition about relationships among areas and volumes are highly biased by our low-dimensional 3D experiences, causing us to have serious misconceptions about geometric objects in high-dimensional spaces. Physical processes occurring in 3D can be over-generalized to give preconceived notions that just don’t hold true in higher dimensions.

Take, for example, the random walk. It is usually taught starting from a 1-dimensional random walk (flipping a coin) that is then extended to 2D and then to 3D…most textbooks stopping there. But random walks in high dimensions are the rule rather than the exception in complex systems. One example that is especially important in this context is the problem of molecular evolution. Each site on a genome represents an independent degree of freedom, and molecular evolution can be described as a random walk through that space, but the space of all possible genetic mutations is enormous. Faced with such an astronomically large set of permutations, it is difficult to conceive of how random mutations could possibly create something as complex as, say, ATP synthase which is the basis of all higher bioenergetics. Fortunately, the answer to this puzzle lies in the physics of random walks in high dimensions.

Why Ten Dimensions?

This blog presents the physics of random walks in 10 dimensions. Actually, there is nothing special about 10 dimensions versus 9 or 11 or 20, but it gives a convenient demonstration of high-dimensional physics for several reasons. First, it is high enough above our 3 dimensions that there is no hope to visualize it effectively, even by using projections, so it forces us to contend with the intrinsic “unvisualizability” of high dimensions. Second, ten dimensions is just big enough that it behaves roughly like any higher dimension, at least when it comes to random walks. Third, it is about as big as can be handled with typical memory sizes of computers. For instance, a ten-dimensional hypercubic lattice with 10 discrete sites along each dimension has 10^10 lattice points (10 Billion or 10 Gigs) which is about the limit of what a typical computer can handle with internal memory.

As a starting point for visualization, let’s begin with the well-known 4D hypercube but extend it to a 4D hyperlattice with three values along each dimension instead of two. The resulting 4D lattice can be displayed in 2D as a network with 3^4 = 81 nodes and 216 links or edges. The result is shown in Fig. 1, represented in two dimensions as a network graph with nodes and edges. Each node has four links with neighbors. Despite the apparent 3D look that this graph has about it, if you look closely you will see the frustration that occurs when trying to link to 4 neighbors, causing many long-distance links.

[See YouTube video for movies showing evolving hyperlattices and random walks in 10D.]

Fig. 1 A 4D hyperlattice with three sites along each of the 4 dimensions. This high dimensional discrete lattice is represented as a network graph in 2D with nodes and edges.

We can also look at a 10D hypercube that has 2^10 = 1024 nodes and 5120 edges, shown in Fig. 2. It is a bit difficult to see the hypercubic symmetry when presented in 2D, but each node has exactly 10 links.

Fig. 2 A 10D hypercube of 1024 nodes and 5120 edges. Each node has exactly 10 links to neighbors

Extending this 10D lattice to 10 positions instead of 2 and trying to visualize it is prohibitive, since the resulting graph in 2D just looks like a mass of overlapping circles. However, our interest extends not just to ten locations per dimension, but to an unlimited number of locations. This is the 10D infinite lattice on which we want to explore the physics of the random walk.

Diffusion in Ten Dimensions

An unconstrained random walk in 10D is just a minimal extension beyond a simple random walk in 1D. Because each dimension is independent, a single random walker takes a random step along any of the 10 dimensions at each iteration so that motion in any one of the 10 dimensions is just a 1D random walk. Therefore, a simple way to visualize this random walk in 10D is simply to plot the walk against each dimension, as in Fig. 3. There is one chance in ten that the walker will take a positive or negative step along any given dimension at each time point.

Fig. 3 A single walker taking random unit steps in 10 dimensions. The position of the walker as a function of time is shown for all ten dimensions.

An alternate visualization of the 10D random walker is shown in Fig. 4 for the same data as Fig. 3. In this case the displacement is color coded, and each column is a different dimension. Time is on the vertical axis (starting at the top and increasing downward). This type of color map can easily be extended to hundreds of dimensions. Each row is a position vector of the single walker in the 10D space

Fig. 4 Same data as in Fig. 3 for a single 10D random walker on a hyperlattice. Distance is color coded. Time is on the vertical axis (increasing downward). Each row is a 10D position vector, and this representation is of a single 10D trajectory.

In the 10D hyperlattice in this section, all lattice sites are accessible at each time point, so there is no constraint preventing the walk from visiting a previously-visited node. There is a possible adjustment that can be made to the walk that prevents it from ever crossing its own path. This is known as a self-avoiding-walk (SAW). In two dimensions, there is a major difference in the geometric and dynamical properties of an ordinary walk and an SAW. However, in dimensions larger than 4, it turns out that there are so many possibilities of where to go (high-dimensional spaces have so much free room) that it is highly unlikely that a random walk will ever cross itself. Therefore, in our 10D hyperlattice we do not need to make the distinction between an ordinary walk and a self-avoiding-walk. However, there are other constraints that can be imposed that mimic how complex systems evolve in time, and these constraints can have important consequences, as we see next.

Random Walk in a Maximally Rough Landscape

In the infinite hyperlattice of the previous section, all lattice sites are the same and are all equally accessible. However, in the study of complex systems, it is common to assign a value to each node in a high-dimensional lattice. This value can be assigned by a potential function, producing a high-dimensional potential landscape over the lattice geometry. Or the value might be the survival fitness of a species, producing a high-dimensional fitness landscape that governs how species compete and evolve. Or the value might be a loss function (an objective function) in a minimization problem from multivariate analysis or machine learning. In all of these cases, the scalar value on the nodes defines a landscape over which a state point executes a walk. The question then becomes, what are the properties of a landscape in high dimensions, and how does it affect a random walker?

As an example, let’s consider a landscape that is completely random point-to-point. There are no correlations in this landscape, making it maximally rough. Then we require that a random walker takes a walk along iso-potentials in this landscape, never increasing and never decreasing its potential. Beginning with our spatial intuition living in 3D space, we might be concerned that such a walker would quickly get confined in some area of the lanscape. Think of a 2D topo map with countour lines drawn on it — If we start at a certain elevation on a mountain side, then if we must walk along directions that maintain our elevation, we stay on a given contour and eventually come back to our starting point after circling the mountain peak — we are trapped! But this intuition informed by our 3D lives is misleading. What happens in our 10D hyperlattice?

To make the example easy to analyze, let’s assume that our potential function is restricted to N discrete values. This means that of the 10 neighbors to a given walker site, on average only 10/N are likely to have the same potential value as the given walker site. This constrains the available sites for the walker, and it converts the uniform hyperlattice into a hyperlattice site percolation problem.

Percolation theory is a fascinating topic in statistical physics. There are many deep concepts that come from asking simple questions about how nodes are connected across a network. The most important aspect of percolation theory is the concept of a percolation threshold. Starting with a complete network that is connected end-to-end, start removing nodes at random. For some critical fraction of nodes removed (on average) there will no longer be a single connected cluster that spans the network. This critical fraction is known as the percolation threshold. Above the percolation threshold, a random walker can get from one part of the network to another. Below the percolation threshold, the random walker is confined to a local cluster.

If a hyperlattice has N discrete values for the landscape potential (or height, or contour) and if a random walker can only move to site that has the same value as the walker’s current value (remains on the level set), then only a fraction of the hyperlattice sites are available to the walker, and the question of whether the walker can find a path the spans the hyperlattice becomes simply a question of how the fraction of available sites relates to the percolation threshold.

The percolation threshold for hyperlattices is well known. For reasonably high dimensions, it is given to good accuracy by

where d is the dimension of the hyperlattice. For a 10D hyperlattice the percolation threshold is p_c(10) = 0.0568, or about 6%. Therefore, if more than 6% of the sites of the hyperlattice have the same value as the walker’s current site, then the walker is free to roam about the hyperlattice.

If there are N = 5 discrete values for the potential, then 20% of the sites are available, which is above the percolation threshold, and walkers can go as far as they want. This statement holds true no matter what the starting value is. It might be 5, which means the walker is as high on the landscape as they can get. Or it might be 1, which means the walker is as low on the landscape as they can get. Yet even if they are at the top, if the available site fraction is above the percolation threshold, then the walker can stay on the high mountain ridge, spanning the landscape. The same is true if they start at the bottom of a valley. Therefore, mountain ridges are very common, as are deep valleys, yet they allow full mobility about the geography. On the other hand, a so-called mountain peak would be a 5 surrounded by 4’s or lower. The odds for having this happen in 10D are 0.2*(1-0.8^10) = 0.18. Then the total density of mountain peaks, in a 10D hyperlattice with 5 potential values, is only 18%. Therefore, mountain peaks are rare in 10D, while mountain ridges are common. In even higher dimensions, the percolation threshold decreases roughly inversely with the dimensionality, and mountain peaks become extremely rare and play virtually no part in walks about the landscape.

To illustrate this point, Fig. 5 is the same 10D network that is in Fig. 2, but only the nodes sharing the same value are shown for N = 5, which means that only 20% of the nodes are accessible to a walker who stays only on nodes with the same values. There is a “giant cluster” that remains connected, spanning the original network. If the original network is infinite, then the giant cluster is also infinite but contains a finite fraction of the nodes.

Fig. 5 A 10D cluster that spans the network in Fig. 2 for 1/5 of the nodes sharing the same landscape value. This cluster represents a mountain ridge that spans the space. There are four additional co-existing clusters, each of which separately spans the same 10D space.

The quantitative details of the random walk can change depending on the proximity of the sub-networks (the clusters, the ridges or the level sets) to the percolation threshold. For instance, a random walker in D =10 with N = 5 is shown in Fig. 6. The diffusion is a bit slower than in the unconstrained walk of Figs. 3 and 4. But the ability to wander about the 10D space is retained.

Fig. 6 A random walker on the level-set cluster of Fig. 5

This is then the general important result: In high-dimensional landscapes, mountain ridges are much more common than mountain peaks. This has profound consequences for the evolution of life, the dynamics of complex systems, and the power of machine learning.

Consequences for Evolution and Machine Learning

When the high-dimensional space is the space of possible mutations on a genome, and when the landscape is a fitness landscape that assigns a survival advantage for one mutation relative to others, then the random walk describes the evolution of a species across generations. The prevalence of ridges, or more generally level sets, in high dimensions has a major consequence for the evolutionary process, because a species can walk along a level set acquiring many possible mutations that have only neutral effects on the survivability of the species. At the same time, the genetic make-up is constantly drifting around in this “neutral network”, allowing the species’ genome to access distant parts of the space. Then, at some point, natural selection may tip the species up a nearby (but rare) peak, and a new equilibrium is attained for the species.

One of the early criticisms of fitness landscapes was the (erroneous) criticism that for a species to move from one fitness peak to another, it would have to go down and cross wide valleys of low fitness to get to another peak. But this was a left-over from thinking in 3D. In high-D, neutral networks are ubiquitous, and a mutation can take a step away from one fitness peak onto one of the neutral networks, which can be sampled by a random walk until the state is near some distant peak. It is no longer necessary to think in terms of high peaks and low valleys of fitness — just random walks. The evolution of extremely complex structures, like ATP synthase, can then be understood as a random walk along networks of nearly-neutral fitness — once our 3D biases are eliminated.

The same arguments hold for many situations in machine learning and especially deep learning. When training a deep neural network, there can be thousands of neural weights that need to be trained through the minimization of a loss function, also known as an objective function. The loss function is the equivalent to a potential, and minimizing the loss function over the thousands of dimensions is the same problem as maximizing the fitness of an evolving species.

At first look, one might think that deep learning is doomed to failure. We have all learned, from the earliest days in calculus, that enough adjustable parameter can fit anything, but the fit is meaningless because it predicts nothing. Deep learning seems to be the worst example of this. How can fitting thousands of adjustable parameters be useful when the dimensionality of the optimization space is orders of magnitude larger than the degrees of freedom of the system being modeled?

The answer comes from the geometry of high dimensions. The prevalence of neutral networks in high dimensions gives lots of chances to escape local minima. In fact, local minima are actually rare in high dimensions, and when they do occur, there is a neutral network nearby onto which they can escape (if the effective temperature of the learning process is set sufficiently high). Therefore, despite the insanely large number of adjustable parameters, general solutions, that are meaningful and predictive, can be found by adding random walks around the objective landscape as a partial strategy in combination with gradient descent.

Given the superficial analogy of deep learning to the human mind, the geometry of random walks in ultra-high dimensions may partially explain our own intelligence and consciousness.

Biblography

S. Gravilet, Fitness Landscapes and the Origins of Species. Princeton University Press, 2004.

M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1968.

YouTube Vlog on A Random Walk in 10 Dimensions

Galileo Unbound

Companion blog to the book "Galileo Unbound" published by Oxford University Press (David D. Nolte)

Tag / Hyperspace