George Cantor meets Machine Learning: Deep Discrete Encoders

Machine learning is characterized, more than by any other aspect, by the high dimensionality of the data spaces it seeks to find patterns in.  Hence, one of the principle functions of machine learning is to reduce the dimensionality of the data to lower dimensions—a process known as dimensionality reduction.

There are two driving reasons to reduce the dimensionality of data: 

First, typical dimensionalities faced by machine learning problems can be in the hundreds or thousands.  Trying to visualize such high dimensions may sound mind expanding, but it is really just saying that a data problem may have hundreds or thousands of different data entries for a single event.  And many, or even most, of those entries may not be independent.  While many others may be pure noise—or at least not relevant to the pattern.  Deep learning dimensionality reduction seeks to find the dependences—many of them nonlinear and non-single-valued (non-invertible)—and to reject the noise channels.

Second, the geometry of high dimension is highly unintuitive.  Many of the things we take for granted in our pitifully low dimension of 3 (or 4 if you include time) just don’t hold in high dimensions.  For instance, in very high dimensions almost all random vectors in a hyperspace are orthogonal, and almost all random unit vectors in the hyperspace are equidistant.  Even the topology of landscapes in high dimensions is unintuitive—there are far more mountain ridges than mountain peaks—with profound consequences for dynamical processes such as random walks (see my Blog on a Random Walk in 10 Dimensions).  In fact, we owe our evolutionary existence to this effect!  Therefore, deep dimensionality reduction is a way to bring complex data down to a dimensionality where our intuition can be applied to “explain” the data.

But what is a dimension?  And can you find the “right” dimensionality when performing dimensionality reduction?  Once again, our intuition struggles with these questions, as first discovered by a nineteenth-century German mathematician whose mind-expanding explorations of the essence of different types of infinity shattered the very concept of dimension.

George Cantor

Georg Cantor (1845 – 1918) was born in Russia, and the family moved to Germany while Cantor was still young.  In 1863, he enrolled at the University of Berlin where he sat on lectures by Weierstrass and Kronecker.  He received his doctorate in 1867 and his Habilitation in 1869, moving into a faculty position at the University of Halle and remaining there for the rest of his career.  Cantor published a paper early in 1872 on the question of whether the representation of an arbitrary function by a Fourier series is unique.  He had found that even though the series might converge to a function almost everywhere, there surprisingly could still be an infinite number of points where the convergence failed.  Originally, Cantor was interested in the behavior of functions at these points, but his interest soon shifted to the properties of the points themselves, which became his life’s work as he developed set theory and transfinite mathematics.

Georg Cantor (1845 – 1918)

In 1878, in a letter to his friend Richard Dedekind, Cantor showed that there was a one-to-one correspondence between the real numbers and the points in any n-dimensional space.  He was so surprised by his own result that he wrote to Dedekind “I see it, but I don’t believe it.”  Previously, the ideas of a dimension, moving in a succession from one (a line) to two (a plane) to three (a volume) had been absolute.  However, with his newly-discovered mapping, the solid concepts of dimension and dimensionality began to dissolve.  This was just the beginning of a long history of altered concepts of dimension (see my Blog on the History of Fractals).

Mapping Two Dimensions to One

Cantor devised a simple mapping that is at once obvious and subtle. To take a specific example of mapping a plane to a line, one can consider the two coordinate values (x,y) both with a closed domain on [0,1]. Each can be expressed as a decimal fraction given by

These two values can be mapped to a single value through a simple “ping-pong” between the decimal digits as

If x and y are irrational, then this presents a simple mapping of a pair of numbers (two-dimensional coordinates) to a single number. In this way, the special significance of two dimensions relative to one dimension is lost. In terms of set theory nomenclature, the cardinality of the two-dimensional R2 is the same as for the one-dimensions R.

Nonetheless, intuition about dimension still has it merits, and a subtle aspect of this mapping is that it contains discontinuities. These discontinuities are countably infinite, but they do disrupt any smooth transformation from 2D to 1D, which is where the concepts of intrinsic dimension are hidden. The topological dimension of the plane is clearly equal to 2, and that of the line is clearly equal to 1, determined by the dimensionality D+1 of cuts that are required to separate the sets. The area is separated by a D = 1 line, while the line is separated by a D= 0 point.

Fig. 1 A mapping of a 2D plane to a 1D line. Every point in the plane has an associated point on the line (except for a countably infinite number of special points … see the next figure.)

While the discontinuities help preserve the notions of dimension, and they are countably infinite (with the cardinality of the rational numbers), they pertain merely to the representation of number by decimals. As an example, in decimal notation for a number a = 14159/105, one has two equivalent representations

trailing either infinitely many 0’s or infinitely many 9’s. Despite the fact that these representations point to the same number, when it is used as one of the pairs in the bijection of Fig. 1, it produces two distinctly different numbers of t in R. Fortunately, there is a way to literally sweep this under the rug. Any time one retrieves a number that has repeating 0’s or 9’s, simply sweep it to the right it by dividing by a power of 10 to a region of trailing zeros for a different number, call it b, as in Fig. 2. The shifted version of a will not overlap with the number b, because b also ends in repeating 0’s or 9’s and so is swept farther to the right, and so on to infinity, so none of these numbers overlap, each is distinct, and the mapping becomes what is known as a bijection with one-to-one correspondence.

Fig. 3 The “measure-zero” fix. Numbers that have two equivalent representations can be shifted to the right to replace other numbers that are shifted further to the right, and so on to infinity. There is infinite room within the reals to accommodate the countably infinite number of repeating-decimal numbers.

Space-Filling Curves

When the peripatetic mathematician Guiseppe Peano learned of Cantor’s result for the mapping of 2D to 1D, he sought to demonstrate the correspondence geometrically, and he constructed a continuous curve that filled space, publishing the method in Sur une courbe, qui remplit toute une aire plane [1] in 1890.  The construction of Peano’s curve proceeds by taking a square and dividing it into 9 equal sub squares.  Lines connect the centers of each of the sub squares.  Then each sub square is divided again into 9 sub squares whose centers are all connected by lines.  At this stage, the original pattern, repeated 9 times, is connected together by 8 links, forming a single curve.  This process is repeated infinitely many times, resulting in a curve that passes through every point of the original plane square.  In this way, a line is made to fill a plane.  Where Cantor had proven abstractly that the cardinality of the real numbers was the same as the points in n-dimensional space, Peano created a specific example. 

This was followed quickly by another construction, invented by David Hilbert in 1891, that divided the square into four instead of nine, simplifying the construction, but also showing that such constructions were easily generated [2].  The space-filling curves of Peano and Hilbert have the extreme properties that a one-dimensional curve approaches every point in a two-dimensional space.  These curves have topological dimensionality of 1D and a fractal dimensionality of 2D. 

Fig. 3 Peano’s (1890) and Hilbert’s (1891) plane-filling curves.  When the iterations are taken to infinity, the curves approach every point of two-dimensional space arbitrarily closely, giving them a dimension D = 2.

A One-Dimensional Deep Discrete Encoder for MNIST

When performing dimensionality reduction in deep learning it is tempting to think that there is an underlying geometry to the data—as if they reside on some submanifold of the original high-dimensional space.  While this can be the case (for which dimensionality reduction is called deep metric learning) more often there is no such submanifold.  For instance, when there is a highly conditional nature to the different data channels, in which some measurements are conditional on the values of other measurements, then there is no simple contiguous space that supports the data.

It is also tempting to think that a deep learning problem has some intrinsic number of degrees of freedom which should be matched by the selected latent dimensionality for the dimensionality reduction.  For instance, if there are roughly five degrees of freedom buried within a complicated data set, then it is tempting to believe that the appropriate latent dimensionality also should be chosen to be five.  But this also is missing the mark. 

Take, for example, the famous MNIST data set of hand-written digits from 0 to 9.  Each digit example is contained in a 28-by28 two-dimensional array that is typically flattened to a 784-element linear vector that locates that single digit example within a 784-dimensional space.  The goal is to reduce the dimensionality down to a manageable number—but what should the resulting latent dimensionality be?  How many degrees of freedom are involved in writing digits?  Furthermore, given that there are ten classes of digits, should the chosen dimensionality of the latent space be related to the number of classes?

Fig. 4 A sampling of MNIST digits.

To illustrate that the dimensionality of the latent space has little to do with degrees of freedom or the number of classes, let’s build a simple discrete encoder that encodes the MNIST data set to the one-dimensional line—following Cantor.

The deep network of the encoder can have a simple structure that terminates with a single neuron that has a piece-wise linear output.  The objective function (or loss function) measures the squared distances of the outputs of the single neuron, after transformation by the network, to the associated values 0 to 9.  And that is it!  Train the network by minimizing the loss function.

Fig. 5 A deep encoder that encodes discrete classes to a one-dimensional latent variable.

The results of the linear encoder are shown in Fig. 6 (the transverse directions are only for ease of visualization…the classification is along the long axis). The different dots are specific digits, and the colors are the digit value. There is a clear trend from 1 through 10, although with this linear encoder there is substantial overlap among the point clouds.

Fig. 6 Latent space for a one-dimensional (line) encoder. The transverse dimensions are only for visualization.

Despite appearances, this one-dimensional discrete line encoder is NOT a form of regression. There is no such thing as 1.5 as the average of a 1-digit and a 2-digit. And 5 is not the average of a 4-digit and a 6-digit. The digits are images that have no intrinsic numerical value. Therefore, this one-dimensional encoder is highly discontinuous, and intuitive ideas about intrinsic dimension for continuous data remain secure.


The purpose of this Blog (apart from introducing Cantor and his ideas on dimension theory) was to highlight a key question of dimensionality reduction in representation theory: What is the intrinsic dimensionality of a dataset? The answer, in the case of discrete classes, is that there is no intrinsic dimensionality. Having 1 degree of freedom or 10 degrees of freedom, i.e. latent dimensionalities of 1 or 10, is mostly irrelevant. In ideal cases, one is just as good as the other.

On the other hand, for real-world data with its inevitable variability and finite variance, there can be reasons to choose one latent dimensionality over another. In fact, the “best” choice of dimensionality is one less than the number of classes. For instance, in the case of MNIST with its 10 classes, that is a 9-dimensional latent space. The reason this is “best” has to do with the geometry of high-dimensional simplexes, which will be the topic of a future Blog.

[1] G.Peano: Sur une courbe, qui remplit toute une aire plane. Mathematische Annalen 36 (1890), 157–160.

[2] D. Hilbert, “Uber die stetige Abbildung einer Linie auf ein Fllichenstilck,” Mathemutische Anna/en, vol. 38, pp. 459-460, (1891)

A Short History of Fractal Dimension

It is second nature to think of integer dimensions:  A line is one dimensional.  A plane is two dimensional. A volume is three dimensional.  A point has no dimensions.

It is harder to think in four dimensions and higher, but even here it is a simple extrapolation of lower dimensions.  Consider the basis vectors spanning a three-dimensional space consisting of the triples of numbers

Then a four dimensional hyperspace is just created by adding a new “tuple” to the list

and so on to 5 and 6 dimensions and on.  Child’s play!

But how do you think of fractional dimensions?  What is a fractional dimension?  For that matter, what is a dimension?  Even the integer dimensions began to unravel when George Cantor showed in 1877 that the line and the plane, which clearly had different “dimensionalities”, both had the same cardinality and could be put into a one-to-one correspondence.  From then onward the concept of dimension had to be rebuilt from the ground up, leading ultimately to fractals.

Here is a short history of fractal dimension, partially excerpted from my history of dynamics in Galileo Unbound (Oxford University Press, 2018) pg. 110 ff.  This blog page presents the history through a set of publications that successively altered how mathematicians thought about curves in spaces, beginning with Karl Weierstrass in 1872.

Karl Weierstrass (1872)

Karl Weierstrass (1815 – 1897) was studying convergence properties of infinite power series in 1872 when he began with a problem that Bernhard Riemann had given to his students some years earlier.  Riemann had asked whether the function

was continuous everywhere but not differentiable.  This simple question about a simple series was surprisingly hard to answer (it was not solved until Hardy provided the proof in 1916 [1]).  Therefore, Weierstrass conceived of a simpler infinite sum that was continuous everywhere and for which he could calculate left and right limits of derivatives at any point.  This function is

where b is a large odd integer and a is positive and less than one.  Weierstrass showed that the left and right derivatives failed to converge to the same value, no matter where he took his point.  In short, he had discovered a function that was continuous everywhere, but had a derivative nowhere [2].  This pathological function, called a “Monster” by Charles Hermite, is now called the Weierstrass function.

Beyond the strange properties that Weierstrass sought, the Weierstrass function would turn out to be a fractal curve (recognized much later by Besicovitch and Ursell in 1937 [3]) with a fractal (Hausdorff) dimension given by

although this was not proven until very recently [4].  An example of the function is shown in Fig. 1 for a = 0.5 and b = 5.  This specific curve has a fractal dimension D = 1.5693.  Notably, this is a number that is greater than 1 dimension (the topological dimension of the curve) but smaller than 2 dimensions (the embedding dimension of the curve).  The curve tends to fill more of the two dimensional plane than a straight line, so its intermediate fractal dimension has an intuitive feel about it.  The more “monstrous” the curve looks, the closer its fractal dimension approaches 2.

Fig. 1  Weierstrass’ “Monster” (1872) with a = 0.5, b = 5.  This continuous function is nowhere differentiable.  It is a fractal with fractal dimension D = 2 + ln(0.5)/ln(5) = 1.5693.

Georg Cantor (1883)

Partially inspired by Weierstrass’ discovery, George Cantor (1845 – 1918) published an example of an unusual ternary set in 1883 in “Grundlagen einer allgemeinen Mannigfaltigkeitslehre” (“Foundations of a General Theory of Aggregates”) [5].  The set generates a function (The Cantor Staircase) that has a derivative equal to zero almost everywhere, yet whose area integrates to unity.  It is a striking example of a function that is not equal to the integral of its derivative!  Cantor demonstrated that the size of his set is aleph0 , which is the cardinality of the real numbers.  But whereas the real numbers are uniformly distributed, Cantor’s set is “clumped”.  This clumpiness is an essential feature that distinguishes it from the one-dimensional number line, and it raised important questions about dimensionality. The fractal dimension of the ternary Cantor set is DH = ln(2)/ln(3) = 0.6309.

Fig. 2  The 1883 Cantor set (below) and the Cantor staircase (above, as the indefinite integral over the set).

Giuseppe Peano (1890)

In 1878, in a letter to his friend Richard Dedekind, Cantor showed that there was a one-to-one correspondence between the real numbers and the points in any n-dimensional space.  He was so surprised by his own result that he wrote to Dedekind “I see it, but I don’t believe it.”  The solid concepts of dimension and dimensionality were dissolving before his eyes.  What does it mean to trace the path of a trajectory in an n-dimensional space, if all the points in n dimensions were just numbers on a line?  What could such a trajectory look like?  A graphic example of a plane-filling path was constructed in 1890 by Peano [6], who was a peripatetic mathematician with interests that wandered broadly across the landscape of the mathematical problems of his day—usually ahead of his time.  Only two years after he had axiomatized linear vector spaces [7], Peano constructed a continuous curve that filled space. 

The construction of Peano’s curve proceeds by taking a square and dividing it into 9 equal sub squares.  Lines connect the centers of each of the sub squares.  Then each sub square is divided again into 9 sub squares whose centers are all connected by lines.  At this stage, the original pattern, repeated 9 times, is connected together by 8 links, forming a single curve.  This process is repeated infinitely many times, resulting in a curve that passes through every point of the original plane square.  In this way, a line is made to fill a plane.  Where Cantor had proven abstractly that the cardinality of the real numbers was the same as the points in n-dimensional space, Peano created a specific example.  This was followed quickly by another construction, invented by David Hilbert in 1891, that divided the square into four instead of nine, simplifying the construction, but also showing that such constructions were easily generated.

Fig. 3 Peano’s (1890) and Hilbert’s (1891) plane-filling curves.  When the iterations are taken to infinity, the curves approach every point of two-dimensional space arbitrarily closely, giving them a dimension DH = DE = 2, although their topological dimensions are DT = 1.

Helge von Koch (1904)

The space-filling curves of Peano and Hilbert have the extreme property that a one-dimensional curve approaches every point in a two-dimensional space.  This ability of a one-dimensional trajectory to fill space mirrored the ergodic hypothesis that Boltzmann relied upon as he developed statistical mechanics.  These examples by Peano, Hilbert and Boltzmann inspired searches for continuous curves whose dimensionality similarly exceeded one dimension, yet without filling space.  Weierstrass’ Monster was already one such curve, existing in some dimension greater than one but not filling the plane.  The construction of the Monster required infinite series of harmonic functions, and the resulting curve was single valued on its domain of real numbers. 

An alternative approach was proposed by Helge von Koch (1870—1924), a Swedish mathematician with an interest in number theory.  He suggested in 1904 that a set of straight line segments could be joined together, and then shrunk by a scale factor to act as new segments of the original pattern [8].  The construction of the Koch curve is shown in Fig. 4.  When the process is taken to its limit, it produces a curve, differentiable nowhere, which snakes through two dimensions.  When connected with other identical curves into a hexagon, the curve resembles a snowflake, and the construction is known as “Koch’s Snowflake”. 

The Koch curve begins in generation 1 with N0 = 4 elements.  These are shrunk by a factor of b = 1/3 to become the four elements of the next generation, and so on.  The number of elements varies with the observation scale according to the equation

where D is called the fractal dimension.  In the example of the Koch curve, the fractal dimension is

which is a number less than its embedding dimenion DE = 2.  The fractal is embedded in 2D but has a fractional dimension that is greater than it topological dimension DT = 1.

Fig. 4  Generation of a Koch curve (1904).  The fractal dimension is D = ln(4)/ln(3) = 1.26.  At each stage, four elements are reduced in size by a factor of 3.  The “length” of the curve approaches infinity as the features get smaller and smaller.  But the scaling of the length with size is determined uniquely by the fractal dimension.

Waclaw Sierpinski (1915)

Waclaw Sierpinski (1882 – 1969) was a Polish mathematician studying at the Jagellonian University in Krakow for his doctorate when he came across a theorem that every point in the plane can be defined by a single coordinate.  Intrigued by such an unintuitive result, he dived deep into Cantor’s set theory after he was appointed as a faculty member at the university in Lvov.  He began to construct curves that had more specific properties than the Peano or Hilbert curves, such as a curve that passes through every interior point of a unit square but that encloses an area that is only equal to 5/12 = 0.4167.  Sierpinski became interested in the topological properties of such sets.

Sierpinski considered how to define a curve that was embedded in DE = 2 but that was NOT constructed as a topological dimension DT = 1 curve as the curves of Peano, Hilbert, Koch (and even his own) had been.  To demonstrate this point, he described a construction that began with a topological dimension DT = 2 object, a planar triangle, from which the open set of its central inverted triangle is removed, leaving its boundary points.  The process is continued iteratively to all scales [9].  The resulting point set is shown in Fig. 5 and is called the Sierpinski gasket.  What is left after all the internal triangles are removed is a point set that can be made discontinuous by cutting it at a finite set of points.  This is shown in Fig. 5 by the red circles.  Each circle, no matter the size, cuts the set at three points, making the resulting set discontinuous.  Ten years later, Karl Menger would show that this property of discontinuous cuts determined the topological dimension of the Sierpinski gasket to be DT = 1.  The embedding dimension is of course DE = 2, and the fractal dimension of the Sierpinski gasket is

Fig. 5 The Sierpinski gasket.  The central triangle is removed (leaving its boundary) at each scale.  The pattern is self-similar with a fractal dimension DH = 1.5850.  Unintuitively, it has a topological dimension DT = 1.

Felix Hausdorff (1918)

The work by Cantor, Peano, von Koch and Sierpinski had created a crisis in geometry as mathematicians struggled to rescue concepts of dimensionality.  An important byproduct of that struggle was a much deeper understanding of concepts of space, especially in the hands of Felix Hausdorff. 

Felix Hausdorff (1868 – 1942) was born in Breslau, Prussia, and educated in Leipzig.  In his early years as a doctoral student, and as an assistant professor at Leipzig, he was a practicing mathematician by day and a philosopher and playwright by night, publishing under the pseudonym Paul Mongré.  He was at the University of Bonn working on set theory when the Greek mathematician Constatin Carathéodory published a paper in 1914 that showed how to construct a p-dimensional set in a q-dimensional space [9].  Haussdorff realized that he could apply similar ideas to the Cantor set.  He showed that the outer measure of the Cantor set would go discontinuously from zero to infinity as the fractional dimension increased smoothly.  The critical value where the measure changed its character became known as the Hausdorff dimension [11]

For the Cantor ternary set, the Hausdorff dimension is exactly DH = ln(2)/ln(3) = 0.6309.  This value for the dimension is less than the embedding dimension DE = 1 of the support (the real numbers on the interval [0, 1]), but it is also greater than DT = 0 which would hold for a countable number of points on the interval.  The work by Hausdorff became well known in the mathematics community who applied the idea to a broad range of point sets like Weierstrass’s monster and the Koch curve.

It is important to keep a perspective of what Hausdorff’s work meant during which period of time.  For instance, although the curves of Weierstrass, von Koch and Sierpinski were understood to present a challenge to concepts of dimension, it was only after Haussdorff that mathematicians began to think in terms of fractional dimensions and to calculate the fractional dimensions of these earlier point sets.  Despite the fact that Sierpinski created one of the most iconic fractals that we use as an example every day, he was unaware at the time that he was doing so.  His interest was topological—creating a curve for which any cut at any point would create disconnected subsets starting with objects (triangles) with topological dimension DT = 2.  In this way, talking about the early fractal objects tends to be anachronistic, using language to describe them that had not yet been invented at that time.

This perspective is also true for the ideas of topological dimension.  For instance, even Sierpinski was not fully tuned into the problems of defining topological dimension.  It turns out that what he created was a curve of topological dimension DT = 1, but that would only become clear later with the work of the Austrian mathematician Karl Menger.

Karl Menger (1926)

The day that Karl Menger (1902 – 1985) was born, his father, Carl Menger (1840 – 1941) lost his job.  Carl Menger was one of the founders of the famous Viennese school that established the marginalist view of economics.  However, Carl was not married to Karl’s mother, which was frowned upon by polite Vienna society, so he had to relinquish his professorship.  Despite his father’s reduction in status, Karl received an excellent education at a Viennese gymnasium (high school).  Among of his classmates were Wolfgang Pauli (Nobel Prize for Physics in 1945)  and Richard Kuhn (Nobel Prize for Chemistry in 1938).  When Karl began attending the University of Vienna he studied physics, but the mathematics professor Hans Hahn opened his eyes to the fascinating work on analysis that was transforming mathematics at that time, so Karl shifted his studies to mathematical analysis, specifically concerning conceptions of “curves”. 

Menger made important contributions to the history of fractal dimension as well as the history of topological dimension.  In his approach to defining the intrinsic topological dimension of a point set, he described the construction of a point set embedded in three dimensions that had zero volume, an infinite surface area, and a fractal dimension between 2 and 3.  The object is shown in Fig. 6 and is called a Menger “sponge” [12].  The Menger sponge is a fractal with a fractal dimension DH = ln(20)/ln(3) = 2.7268.  The face of the sponge is also known as the Sierpinski carpt.  The fractal dimension of the Sierpinski carpet is DH = ln(8)/ln(3) = 1.8928.

Fig. 6 Menger Sponge. Embedding dimension DE = 3. Fractal dimension DH = ln(20)/ln(3) = 2.7268. Topological dimension DT = 1: all one-dimensional metric spaces can be contained within the Menger sponge point set. Each face is a Sierpinski carpet with fractal dimension DH = ln(8)/ln(3) = 1.8928.

The striking feature of the Menger sponge is its topological dimension.  Menger created a new definition of topological dimension that partially solved the crises created by Cantor when he showed that every point on the unit square can be defined by a single coordinate.  This had put a one dimensional curve in one-to-one correspondence with a two-dimensional plane.  Yet the topology of a 2-dimensional object is clearly different than the topology of a line.  Menger found a simple definition that showed why 2D is different, topologically, than 3D, despite Cantor’s conundrum.  The answer came from the idea of making cuts on a point set and seeing if the cut created disconnected subsets. 

As a simple example, take a 1D line.  The removal of a single point creates two disconnected sub-lines.  The intersection of the cut with the line is 0-dimensional, and Menger showed that this defined the line as 1-dimensional.  Similarly, a line cuts the unit square into to parts.  The intersection of the cut with the plane is 1-dimensional, signifying that the dimension of the plane is 2-dimensional.  In other words, a (n-1) dimensional intersection of the boundary of a small neighborhood with the point set indicates that the point set has a dimension of n.  Generalizing this idea, looking at the Sierpinski gasket in Fig. 5, the boundary of a small circular region, if placed appropriately (as in the figure), intersects the Sierpinski gasket at three points of dimension zero.  Hence, the topological dimension of the Sierpinski gasket is one-dimensional.  Manger was likewise able to show that his sponge also had a topology that was one-dimensional, DT = 1, despite the embedding dimension of DE = 3.  In fact, all 1-dimensional metric spaces can be fit inside a Menger Sponge.

Benoit Mandelbrot (1967)

Benoit Mandelbrot (1924 – 2010) was born in Warsaw and his family emigrated to Paris in 1935.  He attended the Ecole Polytechnique where he studied under Gaston Julia (1893 – 1978) and Paul Levy (1886 – 1971).  Both Julia and Levy made significant contributions to the field of self-similar point sets and made a lasting impression on Mandelbrot.  He went to Cal Tech for a master’s degree in aeronautics and then a PhD in mathematical sciences from the University of Paris.  In 1958 Mandelbrot joined the research staff of the IBM Thomas J. Watson Research Center in Yorktown Heights, New York where he worked for over 35 years on topics of information theory and economics, always with a view of properties of self-similar sets and time series.

In 1967 Mandelbrot published one of his early important papers on the self-similar properties of the coastline of Britain.  He proposed that many natural features had statistical self similarity, which he applied to coastlines.  He published the work as “How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension[13] in Science magazine , where he showed that the length of the coastline diverged with a Hausdorff dimension equal to D = 1.25.  Working at IBM, a world leader in computers, he had ready access to their power as well as their visualization capabilities.  Therefore, he was one of the first to begin exploring the graphical character of self-similar maps and point sets.

During one of his sabbaticals at Harvard University he began exploring the properties of Julia sets (named after his former teacher at the Ecole Polytechnique).  The Julia set is a self-similar point set that is easily visualized in the complex plane (two dimensions).  As Mandelbrot studied the convergence of divergence of infinite series defined by the Julia mapping, he discovered an infinitely nested pattern that was both beautiful and complex.  This has since become known as the Mandelbrot set.

Fig. 7 Mandelbrot set.

Later, in 1975, Mandelbrot coined the term fractal to describe these self-similar point sets, and he began to realize that these types of sets were ubiquitous in nature, ranging from the structure of trees and drainage basins, to the patterns of clouds and mountain landscapes.  He published his highly successful and influential book The Fractal Geometry of Nature in 1982, introducing fractals to the wider public and launching a generation of hobbyists interested in computer-generated fractals.  The rise of fractal geometry coincided with the rise of chaos theory that was aided by the same computing power.  For instance, important geometric structures of chaos theory, known as strange attractors, have fractal geometry. 

Appendix:  Box Counting

When confronted by a fractal of unknown structure, one of the simplest methods to find the fractal dimension is through box counting.  This method is shown in Fig. 8.  The fractal set is covered by a set of boxes of size b, and the number of boxes that contain at least one point of the fractal set are counted.  As the boxes are reduced in size, the number of covering boxes increases as 

To be numerically accurate, this method must be iterated over several orders of magnitude.  The number of boxes covering a fractal has this characteristic power law dependence, as shown in Fig. 8, and the fractal dimension is obtained as the slope.

Fig. 8  Calculation of the fractal dimension using box counting.  At each generation, the size of the grid is reduced by a factor of 3.  The number of boxes that contain some part of the fractal curve increases as  , where b is the scale


[1] Hardy, G. (1916). “Weierstrass’s non-differentiable function.” Transactions of the American Mathematical Society 17: 301-325.

[2] Weierstrass, K. (1872). “Uber continuirliche Functionen eines reellen Arguments, die fur keinen Werth des letzteren einen bestimmten Differentialquotienten besitzen.” Communication ri I’Academie Royale des Sciences II: 71-74.

[3] Besicovitch, A. S. and H. D. Ursell (1937). “Sets of fractional dimensions: On dimensional numbers of some continuous curves.” J. London Math. Soc. 1(1): 18-25.

[4] Shen, W. (2018). “Hausdorff dimension of the graphs of the classical Weierstrass functions.” Mathematische Zeitschrift. 289(1–2): 223–266.

[5] Cantor, G. (1883). Grundlagen einer allgemeinen Mannigfaltigkeitslehre. Leipzig, B. G. Teubner.

[6] Peano, G. (1890). “Sur une courbe qui remplit toute une aire plane.” Mathematische Annalen 36: 157-160.

[7] Peano, G. (1888). Calcolo geometrico secundo l’Ausdehnungslehre di H. Grassmann e precedutto dalle operazioni della logica deduttiva. Turin, Fratelli Bocca Editori.

[8] Von Koch, H. (1904). “Sur.une courbe continue sans tangente obtenue par une construction geometrique elementaire.” Arkiv for Mathematik, Astronomi och Fysich 1: 681-704.

[9] Sierpinski, W. (1915). “Sur une courbe dont tout point est un point de ramification.” Comptes Rendus Hebdomadaires des Seances de l’Academie des Sciences de Paris 160: 302-305.

[10] Carathéodory, C. (1914). “Über das lineare Mass von Punktmengen – eine Verallgemeinerung des Längenbegriffs.” Gött. Nachr. IV: 404–406.

[11] Hausdorff, F. (1919). “Dimension und ausseres Mass.” Mathematische Anna/en 79: 157-179.

[12] Menger, Karl (1926), “Allgemeine Räume und Cartesische Räume. I.”, Communications to the Amsterdam Academy of Sciences. English translation reprinted in Edgar, Gerald A., ed. (2004), Classics on fractals, Studies in Nonlinearity, Westview Press. Advanced Book Program, Boulder, CO

[13] B Mandelbrot, How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension. Science, 156 3775 (May 5, 1967): 636-638.