A Short History of Neural Networks

When it comes to questions about the human condition, the question of intelligence is at the top. What is the origin of our intelligence? How intelligent are we? And how intelligent can we make other things…things like artificial neural networks?

This is a short history of the science and technology of neural networks, not just artificial neural networks but also the natural, organic type, because theories of natural intelligence are at the core of theories of artificial intelligence. Without understanding our own intelligence, we probably have no hope of creating the artificial type.

Ramón y Cajal (1888): Visualizing Neurons

The story begins with Santiago Ramón y Cajal (1853–1934), who received the Nobel Prize in Physiology or Medicine in 1906 for his work illuminating natural neural networks. He built on work by Camillo Golgi, using a stain to give intracellular components contrast [1], and then went further, developing his own silver emulsions like those of early photography (which was one of his hobbies). Cajal was the first to show that neurons were individual constituents of neural matter and that their contacts were sequential: axons of sets of neurons contacted the dendrites of other sets of neurons, never axon-to-axon or dendrite-to-dendrite, to create a complex communication network. This became known as the neuron doctrine, and it is a central idea of neuroscience today.

Fig. 1 One of Cajal’s published plates demonstrating neural synapses. From Link.

McCulloch and Pitts (1943): Mathematical Models

In 1941, Warren S. McCulloch (1898–1969) arrived at the Department of Psychiatry at the University of Illinois at Chicago where he met with the mathematical biology group at the University of Chicago led by Nicolas Rashevsky (1899–1972), widely acknowledged as the father of mathematical biophysics in the United States.

An itinerant member of Rashevsky’s group at the time was a brilliant, young and unusual mathematician, Walter Pitts (1923– 1969). He was not enrolled as a student at Chicago, but had simply “showed up” one day as a teenager at Rashevsky’s office door.  Rashevsky was so impressed by Pitts that he invited him to attend the group meetings, and Pitts became interested in the application of mathematical logic to biological information systems.

When McCulloch met Pitts, he realized that Pitts had the mathematical background that complemented his own views of brain activity as computational processes. Pitts was homeless at the time, so McCulloch invited him to live with his family, giving the two men ample time to work together on their mutual obsession to provide a logical basis for brain activity in the way that Turing had provided it for computation.

McCulloch and Pitts simplified the operation of individual neurons to their most fundamental character, envisioning a neural computing unit with multiple inputs (received from upstream neurons) and a single on-off output (sent to downstream neurons), with the additional possibility of feedback loops as downstream neurons fed back onto upstream neurons. They also discretized the dynamics in time, using discrete logic and time-difference equations, and succeeded in devising a logical structure with rules and equations for the general operation of nets of neurons. They published their results in a 1943 paper titled “A logical calculus of the ideas immanent in nervous activity” [2], introducing computational language and logic to neuroscience. Their simplified neural unit became the basis for discrete logic, picked up a few years later by von Neumann as an elemental example of a logic gate upon which he began constructing the theory and design of the modern electronic computer.
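As a rough modern illustration (not McCulloch and Pitts' own notation), a minimal sketch of such an all-or-none threshold unit might look like the following; the weights, threshold, and inputs are hypothetical.

```python
def mcculloch_pitts_unit(inputs, weights, threshold):
    """All-or-none neuron: fire (output 1) only if the weighted sum of the
    inputs reaches the threshold, otherwise stay silent (output 0)."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# With a threshold of 2, the unit acts as a logical AND of two excitatory inputs.
print(mcculloch_pitts_unit([1, 1], [1, 1], threshold=2))  # -> 1
print(mcculloch_pitts_unit([1, 0], [1, 1], threshold=2))  # -> 0
```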

Fig. 2 The only figure in McCulloch and Pitts's “Logical Calculus”.

Donald Hebb (1949): Hebbian Learning

The basic model for learning and adjustment of synaptic weights among neurons was put forward in 1949 by the physiological psychologist Donald Hebb (1904-1985) of McGill University in Canada in a book titled The Organization of Behavior [3].

In Hebbian learning, an initially untrained network consists of many neurons with many synapses having random synaptic weights. During learning, a synapse between two neurons is strengthened when both the pre-synaptic and post-synaptic neurons are firing simultaneously. In this model, it is essential that each neuron makes many synaptic contacts with other neurons, because it requires many input neurons acting in concert to trigger the output neuron. In this way, synapses are strengthened when there is collective action among the neurons, and the synaptic strengths are altered through a form of self-organization. A collective response of the network strengthens all those synapses that are responsible for the response, while the synapses that do not contribute weaken. Despite the simplicity of this model, it has been surprisingly robust, standing up as a general principle for the training of artificial neural networks.
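In modern notation, Hebb's rule is often written as Δw = η · (post-synaptic activity) · (pre-synaptic activity). A minimal sketch of that co-activity rule follows (a modern simplification, not Hebb's own formulation; the learning rate and activity patterns are hypothetical):

```python
import numpy as np

def hebbian_update(weights, pre, post, learning_rate=0.01):
    """Strengthen synapse w_ij when post-synaptic neuron i and
    pre-synaptic neuron j are active together: dw_ij = eta * post_i * pre_j."""
    return weights + learning_rate * np.outer(post, pre)

# Random initial synaptic weights from 4 input neurons to 3 output neurons.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 4))
pre = np.array([1.0, 0.0, 1.0, 0.0])   # firing pattern of the input neurons
post = np.array([1.0, 1.0, 0.0])       # firing pattern of the output neurons
w = hebbian_update(w, pre, post)        # only co-active pairs are strengthened
```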

Fig. 3. A Figure from Hebb’s textbook on psychology (1958). From Link.

Hodgkin and Huxley (1952): Neuron Transporter Models

Alan Hodgkin (1914–1998) and Andrew Huxley (1917–2012) were English biophysicists who received the 1963 Nobel Prize in Physiology or Medicine for their work on the physics behind neural activation.  They constructed a differential equation for the spiking action potential, for which their biggest conceptual challenge was the presence of time delays in the voltage signals that were not explained by linear models of the neural conductance. As they began exploring nonlinear models, using their experiments to guide the choice of parameters, they settled on a dynamical model in a four-dimensional phase space. One dimension was the membrane voltage; the other three were gating variables that controlled the sodium and potassium conductances, the two ions they had determined were the major participants in the generation and propagation of the action potential. The nonlinear conductances of their model described the observed time delays and captured the essential neural behavior of the fast spike followed by a slow recovery. Huxley solved the equations on a hand-cranked calculator, taking over three months of tedious cranking to plot the numerical results.

Fig. 4 The Hodgkin-Huxley model of the neuron, including capacitance C, voltage V and bias current I along with the conductances of the potassium (K), sodium (Na) and leakage (L) channels.
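For reference, the membrane (current-balance) equation of the model takes the standard form below, where m, h and n are the gating variables that make the sodium and potassium conductances dynamical:

```latex
C\,\frac{dV}{dt} = I
  - g_{\mathrm{Na}}\, m^{3} h \,(V - E_{\mathrm{Na}})
  - g_{\mathrm{K}}\, n^{4} \,(V - E_{\mathrm{K}})
  - g_{\mathrm{L}}\,(V - E_{\mathrm{L}})
```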

Hodgkin and Huxley published [4] their measurements and their model (known as the Hodgkin-Huxley model) in a series of papers in 1952 that led to an explosion of research in electrophysiology. The four-dimensional Hodgkin–Huxley model stands as a classic example of the power of phenomenological modeling when combined with accurate experimental observation. Hodgkin and Huxley were able to ascertain not only the existence of ion channels in the cell membrane, but also their relative numbers, long before these molecular channels were ever directly observed using electron microscopes. The Hodgkin–Huxley model lent itself to simplifications that could capture the essential behavior of neurons while stripping off the details.

Frank Rosenblatt (1958): The Perceptron

Frank Rosenblatt (1928–1971) had a PhD in psychology from Cornell University and was in charge of the cognitive systems section of the Cornell Aeronautical Laboratory (CAL) located in Buffalo, New York.  He was tasked with fulfilling a contract from the Navy to develop an analog image processor. Drawing from the work of McCulloch and Pitts, his team constructed first a software simulation and then a hardware model that adaptively updated the strengths of its inputs, which they called neural weights, as it was trained on test images. The machine was dubbed the Mark I Perceptron, and its announcement in 1958 created a small media frenzy [5]. A New York Times article reported the perceptron was “the embryo of an electronic computer that [the navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”

The perceptron had a simple architecture, with two layers of neurons consisting of an input layer and a processing layer, and it was programmed by adjusting the synaptic weights to the inputs. This computing machine was the first to adaptively learn its functions, as opposed to following predetermined algorithms like digital computers. It seemed like a breakthrough in cognitive science and computing, as trumpeted by the New York Times.  But within a decade, the development had stalled because the architecture was too restrictive.
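A minimal sketch of the perceptron learning rule (the textbook version, not the Mark I hardware; the data, learning rate, and number of passes are hypothetical):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Adjust the input weights whenever the thresholded output is wrong."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1 if xi @ w + b > 0 else 0
            error = target - prediction      # -1, 0, or +1
            w += lr * error * xi             # strengthen or weaken the weights
            b += lr * error
    return w, b

# Learn the logical OR function from four labeled examples.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
w, b = train_perceptron(X, y)
```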

Fig. 5 Frank Rosenblatt with his Perceptron. From Link.

Richard FitzHugh and Jin-Ichi Nagumo (1961): Neural van der Pol Oscillators

In 1961 Richard FitzHugh (1922–2007), a neurophysiology researcher at the National Institute of Neurological Disease and Blindness (NINDB) of the National Institutes of Health (NIH), created a surprisingly simple model of the neuron that retained only a third order nonlinearity, just like the third-order nonlinearity that Rayleigh had proposed and solved in 1883, and that van der Pol extended in 1926. Around the same time that FitzHugh proposed his mathematical model [6], the electronics engineer Jin-Ichi Nagumo (1926-1999) in Japan created an electronic diode circuit with an equivalent circuit model that mimicked neural oscillations [7]. Together, this work by FitzHugh and Nagumo led to the so-called FitzHugh–Nagumo model. The conceptual importance of this model is that it demonstrated that the neuron was a self-oscillator, just like a violin string or wheel shimmy or the pacemaker cells of the heart. Once again, self-oscillators showed themselves to be common elements of a complex world—and especially of life.

Fig. 6 The FitzHugh-Nagumo model of the neuron simplifies the Hodgkin-Huxley model from four dimensions down to two dimensions of voltage V and channel activation n.
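A minimal numerical sketch of the FitzHugh–Nagumo equations, written here with the membrane variable v and recovery variable w (the n of the figure caption); the parameter values are illustrative choices that put the model in its self-oscillating regime, not FitzHugh's originals:

```python
import numpy as np

def fitzhugh_nagumo(I=0.5, a=0.7, b=0.8, eps=0.08, dt=0.01, steps=20000):
    """Euler integration of dv/dt = v - v^3/3 - w + I and
    dw/dt = eps*(v + a - b*w): a relaxation (self-) oscillator."""
    v, w = -1.0, 1.0
    trace = np.empty(steps)
    for k in range(steps):
        dv = v - v**3 / 3 - w + I
        dw = eps * (v + a - b * w)
        v += dt * dv
        w += dt * dw
        trace[k] = v
    return trace

spikes = fitzhugh_nagumo()   # periodic spiking in the membrane variable v
```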

John Hopfield (1982): Spin Glasses and Recurrent Networks

John Hopfield (1933–) received his PhD from Cornell University in 1958, advised by Al Overhauser in solid state theory, and he continued to work on a broad range of topics in solid state physics as he wandered from appointment to appointment at Bell Labs, Berkeley, Princeton, and Caltech. In the 1970s Hopfield’s interests broadened into the field of biophysics, where he used his expertise in quantum tunneling to study quantum effects in biomolecules, and expanded further to include information transfer processes in DNA and RNA. In the early 1980s, he became aware of aspects of neural network research and was struck by the similarities between McCulloch and Pitts’ idealized neuronal units and the physics of magnetism. For instance, there is a type of disordered magnetic material called a spin glass in which a large number of local regions of magnetism are randomly oriented. In the language of solid-state physics, one says that the potential energy function of a spin glass has a large number of local minima into which various magnetic configurations can be trapped. In the language of dynamics, one says that the dynamical system has a large number of basins of attraction [8].
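In the Hopfield network built on this analogy, each stored memory digs its own energy minimum, and recall is just relaxation into the nearest basin of attraction. A minimal sketch (the patterns below are hypothetical; storage uses the Hebbian outer-product rule, and recall uses asynchronous threshold updates):

```python
import numpy as np

def store(patterns):
    """Hebbian outer-product rule: each stored pattern carves an energy minimum."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)               # no self-connections
    return W / n

def recall(W, state, sweeps=5):
    """Asynchronous updates lower the energy until a stored memory is reached."""
    state = state.copy()
    for _ in range(sweeps):
        for i in np.random.permutation(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = store(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted version of the first pattern
print(recall(W, noisy))                   # relaxes back to [1, -1, 1, -1, 1, -1]
```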

The Parallel Distributed Processing Group (1986): Backpropagation

David Rumelhart, a mathematical psychologist at UC San Diego, was joined by James McClelland in 1974 and then by Geoffrey Hinton in 1978 to become what they called the Parallel Distributed Processing (PDP) group. The central tenets of the PDP framework they developed were: 1) processing is distributed across many semi-autonomous neural units; 2) these units learn by adjusting the weights of their interconnections based on the strengths of their signals (i.e., Hebbian learning); and 3) memories and behaviors are emergent properties of the distributed learned weights.

PDP was an exciting framework for artificial intelligence, and it captured the general behavior of natural neural networks, but it had a serious problem: How could all of the neural weights be trained?

In 1986, Rumelhart and Hinton, with the mathematician Ronald Williams, developed a mathematical procedure for training neural weights called error backpropagation [9]. The idea is conceptually simple: define a mean squared error between the response of the neural network and an ideal response, then work out, using the chain rule, how the error would change if each neural weight were tweaked, propagating these error derivatives backward from the output layer through the network. Each weight is then nudged in the direction that decreases the error, and by iterating, pass after pass, the mean squared error is minimized. In this way, large numbers of neural weights can be adjusted as the network is trained to perform a specified task.
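A minimal modern sketch of the procedure for a tiny two-layer network (NumPy, sigmoid units, mean squared error; the XOR task, learning rate, and network size are hypothetical choices, not those of the 1986 paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy task: learn XOR with one hidden layer of 4 units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error derivatives layer by layer (chain rule)
    d_out = (out - y) * out * (1 - out)      # dE/d(pre-activation) at the output
    d_h = (d_out @ W2.T) * h * (1 - h)       # dE/d(pre-activation) at the hidden layer

    # Gradient-descent updates of all the weights at once
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)
```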

Error backpropagation has come a long way from that early 1986 paper, and it now lies at the core of the AI revolution we are experiencing today, as billions of neural weights are trained on massive datasets.

Yann LeCun (1989): Convolutional Neural Networks

In 1988, I was a new post-doc at AT&T Bell Labs at Holmdel, New Jersey fresh out of my PhD in physics from Berkeley. Bell Labs liked to give its incoming employees inspirational talks and tours of their facilities, and one of the tours I took was of the neural network lab run by Lawrence Jackel that was working on computer recognition of zip-code digits. The team’s new post-doc, arriving at Bell Labs the same time as me, was Yann LeCun. It is very possible that the demo our little group watched was run by him, or at least he was there, but at the time he was a nobody, so even if I had heard his name, it wouldn’t have meant anything to me.

Fast forward to today, and Yann LeCun’s name is almost synonymous with AI. He is the Chief AI Scientist at Meta (Facebook), and his Google Scholar page reports that he gets 50,000 citations per year.

LeCun is famous for developing the convolutional neural network (CNN) in work that he published from Bell Labs in 1989 [10]. It is a biomimetic neural network that takes its inspiration from the receptive fields of the neural networks in the retina. What you think you see, when you look at something, is actually reconstructed by your brain. Your retina is a neural processor with receptive fields that are a far cry from one-to-one. Most prominent in the retina are center-surround fields, or kernels, that respond to the derivatives of the focused image instead of the image itself. It is these derivatives that are sent up your optic nerve to your brain, which then reconstructs the image. This works as a form of image compression, so that broad uniform areas in an image are reduced to their edges.

The convolutional neural network works in the same way; it is simply engineered to produce compressed, multiscale codes that capture broad areas as well as the fine details of an image. By constructing many different “kernel” operators at many different scales, it creates a set of features that captures the nuances of the image in quantitative form, which is then processed by training neural weights in downstream neural networks.

Fig. 7 Example of a receptive field of a CNN. The filter is the kernel (in this case a discrete 3×3 Laplace operator) that is stepped sequentially across the image field to produce the Laplacian feature map of the original image. One feature map for every different kernel becomes the input for the next level of kernels in a hierarchical scaling structure.
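A minimal sketch of the kernel operation described in the caption: sliding a 3×3 discrete Laplacian across an image to produce a feature map that responds only at edges (the image here is a hypothetical array, and in a real CNN the kernel values are learned rather than fixed by hand):

```python
import numpy as np

laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def convolve2d(image, kernel):
    """Step the kernel across the image (stride 1, no padding) to build a feature map."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A flat background with a bright square in the middle: the response is zero
# in the uniform regions and nonzero only along the edges of the square.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
feature_map = convolve2d(image, laplacian)
```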

Geoff Hinton (2006): Deep Belief

It seems like Geoff Hinton has had his finger in almost every pie when it comes to how we do AI today. Backpropagation? Geoff Hinton. Rectified Linear Units? Geoff Hinton. Boltzmann Machines? Geoff Hinton. t-SNE? Geoff Hinton. Dropout regularization? Geoff Hinton. AlexNet? Geoff Hinton. The 2024 Nobel Prize in Physics? Geoff Hinton! He may not have invented all of these, but he was in the midst of it all.

Hinton received his PhD in Artificial Intelligence (a rare field at the time) from the University of Edinburgh in 1978, after which he joined the PDP group at UCSD (see above) as a post-doc. After a time at Carnegie Mellon, he joined the University of Toronto, Canada, in 1987, where he established one of the leading groups in the world on neural network research. It was from here that he launched so many of the ideas and techniques that have become the core of deep learning.

A central idea of deep learning came from Hinton’s work on Boltzmann Machines, which learn the statistical distributions of complex data. This type of neural network is known as an energy-based model, similar to a Hopfield network, and it has strong ties to the statistical mechanics of spin-glass systems. Unfortunately, it is a bitch to train! So Hinton simplified it into the Restricted Boltzmann Machine (RBM), which was far more tractable, and layers of RBMs could be stacked into “Deep Belief Networks” [11] whose hierarchical structure allowed the networks to learn layers of abstraction. These were among the first deep networks able to perform complex tasks at the level of human capabilities (and sometimes beyond).
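A minimal sketch of the kind of update that makes the RBM tractable, the contrastive-divergence (CD-1) step that Hinton introduced; binary units are assumed, and the array shapes, learning rate, and data below are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b_vis, b_hid, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM on a data batch V."""
    rng = np.random.default_rng() if rng is None else rng
    # Up: sample the hidden units given the data
    p_h = sigmoid(V @ W + b_hid)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Down and up again: one step of the Markov chain (the "reconstruction")
    p_v = sigmoid(h @ W.T + b_vis)
    p_h_recon = sigmoid(p_v @ W + b_hid)
    # Nudge the weights toward the data statistics and away from the model statistics
    W += lr * (V.T @ p_h - p_v.T @ p_h_recon) / len(V)
    b_vis += lr * (V - p_v).mean(axis=0)
    b_hid += lr * (p_h - p_h_recon).mean(axis=0)
    return W, b_vis, b_hid

rng = np.random.default_rng(0)
V = (rng.random((16, 6)) < 0.5).astype(float)   # a hypothetical batch of binary data
W = rng.normal(scale=0.01, size=(6, 3))         # 6 visible units, 3 hidden units
b_vis, b_hid = np.zeros(6), np.zeros(3)
W, b_vis, b_hid = cd1_step(V, W, b_vis, b_hid, rng=rng)
```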

The breakthrough that propelled Geoff Hinton to worldwide acclaim was the success of AlexNet, a neural network constructed by his graduate student Alex Krizhevsky at Toronto in 2012, consisting of 650,000 neurons with 60 million parameters trained on two early Nvidia GPUs. It won the ImageNet challenge that year, enabled by its deep architecture, and it marked an advance that has continued unabated ever since.

Deep learning is now the rule in AI, supported by the Attention mechanism and Transformers that underpin the large language models, like ChatGPT and others, that are poised to disrupt all the legacy business models based on the previous silicon revolution of 50 years ago.

Further Reading

Sections of this article have been excerpted from Chapter 11 of Galileo Unbound (Oxford University Press).

References

[1] Ramón y Cajal S. (1888). Estructura de los centros nerviosos de las aves. Rev. Trim. Histol. Norm. Pat. 1, 1–10.

[2] McCulloch, W.S. and W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys., 1943. 5: p. 115.

[3] Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley and Sons. ISBN 978-0-471-36727-7 – via Internet Archive.

[4] Hodgkin AL, Huxley AF (August 1952). “A quantitative description of membrane current and its application to conduction and excitation in nerve”. The Journal of Physiology. 117 (4): 500–44.

[5] Rosenblatt, Frank (1957). “The Perceptron—a perceiving and recognizing automaton”. Report 85-460-1. Cornell Aeronautical Laboratory.

[6] FitzHugh, Richard (July 1961). “Impulses and Physiological States in Theoretical Models of Nerve Membrane”. Biophysical Journal. 1 (6): 445–466.

[7] Nagumo, J.; Arimoto, S.; Yoshizawa, S. (October 1962). “An Active Pulse Transmission Line Simulating Nerve Axon”. Proceedings of the IRE. 50 (10): 2061–2070.

[8] Hopfield, J. J. (1982). “Neural networks and physical systems with emergent collective computational abilities”. Proceedings of the National Academy of Sciences. 79 (8): 2554–2558.

[9] Rumelhart, D. E., Hinton, G. E. and Williams, R. J., “Learning representations by back-propagating errors,” Nature 323, 533–536 (1986).

[10] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541–551, Winter 1989.

[11] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation 18, 1527-1554 (2006).

Read more in Books by David D. Nolte at Oxford University Press

Frontiers of Physics (2024): Dark Energy Thawing

At the turn of the New Year, as I turn to the breakthroughs in physics of the previous year, sifting through the candidates, I usually narrow it down to about 4 to 6 that I find personally compelling (See, for instance 2023, 2022). In a given year, they may be related to things like supersolids, condensed atoms, or quantum entanglement. Often they relate to those awful, embarrassing gaps in physics knowledge that we give euphemistic names to, like “Dark Energy” and “Dark Matter” (although in the end they may be neither energy nor matter). But this year, as I sifted, I was struck by how many of the “physics” advances of the past year were focused on pushing limits—lower temperatures, more qubits, larger distances.

If you want something that is eventually useful, then engineering is the way to go, and many of the potential breakthroughs of 2024 did require heroic efforts. But if you are looking for a paradigm shift—a new way of seeing or thinking about our reality—then bigger, better and farther won’t give you that. We may be pushing the boundaries, but the thinking stays the same.

Therefore, for 2024, I have replaced “breakthrough” with a single “prospect” that may force us to change our thinking about the universe and the fundamental forces behind it.

This prospect is the weakening of dark energy over time.

It is a “prospect” because it is not yet absolutely confirmed. If it is confirmed in the next few years, then it changes our view of reality. If it is not confirmed, then it still forces us to think harder about fundamental questions, pointing where to look next.

Einstein’s Cosmological “Constant”

Like so much of physics today, the origins of this story go back to Einstein. At the height of WWI in 1917, as Einstein was working in Berlin, he “tweaked” his new theory of general relativity to allow the universe to be static. The tweak came in the form of a parameter he labelled Lambda (Λ), providing a counterbalance against the gravitational collapse of the universe, which at the time was assumed to have a time-invariant density. This cosmological “constant” of spacetime represented a pressure that kept the universe inflated like a balloon.

Fig. 1 Einstein’s “Field Equations” for the universe containing expressions for curvature, the metric tensor and energy density. Spacetime is warped by energy density, and trajectories within the warped spacetime follow geodesic curves. When Λ = 0, only gravitational attraction is present. When Λ ≠ 0, a “repulsive” background force exerts a pressure on spacetime, keeping it inflated like a balloon.
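Written out, the field equations with the cosmological term take the standard form below, where G_μν is the Einstein curvature tensor, g_μν the metric tensor, and T_μν the energy-momentum tensor:

```latex
G_{\mu\nu} + \Lambda\, g_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu}
```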

Later, in 1929 when Edwin Hubble discovered that the universe was not static but was expanding, and not only expanding, but apparently on a free trajectory originating at some point in the past (the Big Bang), Einstein zeroed out his cosmological constant, viewing it as one of his greatest blunders.

And so it stood until 1998 when two teams announced that the expansion of the universe is accelerating—and Einstein’s cosmological constant was back in. In addition, measurements of the energy density of the universe showed that the cosmological constant was contributing around 68% of the total energy density, which has been given the name of Dark Energy. One of the ways to measure Dark Energy is through BAO.

Baryon Acoustic Oscillations (BAO)

If the goal of science communication is to be transparent, and to engage the public in the heroic pursuit of pure science, then the moniker Baryon Acoustic Oscillations (BAO) was perhaps the wrong turn of phrase. “Cosmic Ripples” might have been a better analogy (and a bit more poetic).

In the early moments after the Big Bang, slight density fluctuations set up a balance of opposing effects between gravitational attraction, which tends to clump matter, and the homogenizing effect of the hot photon background, which tends to disperse ionized matter. Matter consists of both dark matter and the matter we are composed of, known as baryonic matter. Only baryonic matter can be ionized and hence interact with photons, so only photons and baryons experience this balance. As the universe expanded, an initial clump of baryons and photons expanded outward together, like the ripples on a millpond caused by a thrown pebble. And because the early universe had many clumps (and anti-clumps where the density was lower than average), the millpond ripples were like those from a gentle rain, with many expanding ringlets overlapping.

Fig. 2 Overlapping ripples showing galaxies formed along the shells. The size of the shells is set by the speed of “sound” in the universe. From [Ref].
Fig. 3 Left. Galaxies formed on acoustic ringlets like drops of dew on a spider’s web. Right. Many ringlets overlapping. The characteristic size of the ringlets can still be extracted statistically. From [Ref].

Then, about 400,000 years after the Big Bang, as the universe expanded and cooled, it got cold enough that the free electrons and the ionized baryons combined into neutral atoms, which are transparent to light. Light suddenly flew free, decoupled from the matter that had constrained it. Removing the balance between light and matter in the BAO caused the baryonic ripples to freeze in place, as if a sudden arctic blast froze the millpond in an instant. The residual clumps of matter in the early universe became the clumps of galaxies in the modern universe that we can see and measure. We can also see the effects of those clumps on the temperature fluctuations of the cosmic microwave background (CMB).

Between these two—the BAO and the CMB—it is possible to measure cosmic distances, and with those distances, to measure how fast the universe is expanding.

Acceleration Slowing

The Dark Energy Spectroscopic Instrument (DESI) on top of Kitt Peak in Arizona is measuring the distances to millions of galaxies using automated arrays of thousands of optical fibers. In one year it measured the distances to about 6 million galaxies.

Fig. 4 The Kitt Peak observatory, the site of DESI. From [Ref].

By focusing on seven “epochs” in galaxy formation in the universe, it measures the sizes of the BAO ripples over time, ranging in ages from 3 billion to 11 billion years ago. (The universe is about 13.8 billion years old.) The relative sizes are then compared to the predictions of the LCDM (Lambda-Cold-Dark-Matter) model. This is the “consensus” model of the day—agreed upon as being “most likely” to explain observations. If Dark Energy is a true constant, then the relative sizes of the ripples should all be the same, regardless of how far back in time we look.

But what the DESI data discovered is that relative sizes more recently (a few billion years ago) are smaller than predicted by LCDM. Given that LCDM includes the acceleration of the expansion of the universe caused by Dark Energy, it means that Dark Energy is slightly weaker in the past few billion years than it was 10 billion years ago—it’s weakening or “thawing”.

The measurements as they stand today are shown in Fig. 5, which plots the relative sizes as a function of how far back in time they look, with a dashed line showing the deviation from the LCDM prediction. The error bars in the figure are not yet that impressive, and statistical effects may be causing the trend, so it might be erased by more measurements. But the BAO results have been augmented by recent measurements of supernovae (SNe) that provide additional support for thawing Dark Energy. Combined, the BAO+SNe results currently stand at about 3.4 sigma. The gold standard for “discovery” is about 5 sigma, so there is still room for this effect to disappear. So stay tuned—the final answer may be known within a few years.

Fig. 5 Seven “epochs” in the evolution of galaxies in the universe. This plot shows relative galactic distances as a function of time looking back towards the Big Bang (older times closer to the Big Bang are to the right side of the graph). In more recent times, relative distances are smaller than predicted by the consensus theory known as Lambda-Cold-Dark-Matter (LCDM), suggesting that Dark Energy is slightly weaker today than it was billions of years ago. The three left-most data points (with error bars from early 2024) are below the LCDM line. From [Ref].
Fig. 6 Annotated version of the previous figure. From [Ref].

The Future of Physics

The gravitational constant G is considered to be a constant property of nature, as are Planck’s constant h and the charge of the electron e. None of these fundamental properties of physics are viewed as time dependent, and none can be derived from basic principles. They are simply constants of our reality. But if Λ is time dependent, then it is not a fundamental constant and should be derivable and explainable.

And that will open up new physics.

100 Years of Quantum Physics:  Pauli’s Exclusion Principle (1924)

One hundred years ago this month, in December 1924, Wolfgang Pauli submitted a paper to Zeitschrift für Physik that provided the final piece of the puzzle that connected Bohr’s model of the atom to the structure of the periodic table.  In the process, he introduced a new quantum number into physics that governs how matter as extreme as neutron stars, or as perfect as superfluid helium, organizes itself.

He was led to this crucial insight, not by his superior understanding of quantum physics, which he was grappling with as much as Bohr and Born and Sommerfeld were at that time, but through his superior understanding of relativistic physics that convinced him that the magnetism of atoms in magnetic fields could not be explained through the orbital motion of electrons alone.

Encyclopedia Article on Relativity

Bored with the topics he was being taught in high school in Vienna, Pauli was already reading Einstein on relativity and Emil Jordan on functional analysis before he arrived at the university in Munich to begin studying with Arnold Sommerfeld.  Pauli was still merely a student when Felix Klein approached Sommerfeld to write an article on relativity theory for his Encyclopedia of Mathematical Sciences.  Sommerfeld by that time was thoroughly impressed with Pauli’s command of the subject and suggested that he write the article.


Pauli’s encyclopedia article on relativity expanded to 250 pages and was published in Klein’s fifth volume in 1921, when Pauli was only 21 years old—just five years after Einstein had published his own definitive work!  Pauli’s article is still considered today one of the clearest explanations of both special and general relativity.

Pauli’s approach established the methodical use of metric space concepts that is still used today when teaching introductory courses on the topic.  This contrasts with articles written only a few years earlier that seem archaic by comparison—even Einstein’s paper itself.  As I recently read through his article, I was struck by how similar it is to what I teach from my textbook on modern dynamics to my class at Purdue University for junior physics majors.

Fig. 1 Wolfgang Pauli [Image]

Anomalous Zeeman Effect

In 1921, Pauli completed his doctoral thesis on the quantum theory of the ionized hydrogen molecule, and he soon began studying a phenomenon known as the anomalous Zeeman effect.  The Zeeman effect is the splitting of optical transitions in atoms under magnetic fields.  The orbital motion of the electron couples to the magnetic field through a semi-classical interaction between the orbital magnetic moment and the applied magnetic field, producing a contribution to the energy of the electron that is observed when it absorbs or emits light.

The Bohr model of the atom had already concluded that the angular momentum of electron orbitals was quantized into integer units.  Furthermore, the Stern-Gerlach experiment of 1922 had shown that the projection of these angular momentum states onto the direction of the magnetic field was also quantized.  This was known at the time as “space quantization”.  Therefore, in the Zeeman effect, the quantized angular momentum created quantized energy interactions with the magnetic field, producing the splittings in the optical transitions.

Fig. 2 The magnetic Zeeman splitting of Rb-87 from the weak-field to the strong-field (Paschen-Back) regime

So far so good.  But then comes the problem with the anomalous Zeeman effect.

In the Bohr model, all angular momenta have integer values.  But in the anomalous Zeeman effect, the splittings could only be explained with half integers.  For instance, if total angular momentum were equal to one-half, then in a magnetic field it would produce a “doublet” with +1/2 and -1/2 space quantization.  An integer like L = 1 would produce a triplet with +1, 0, and -1 space quantization.  Although doublets of the anomalous Zeeman effect were often observed, half-integers were unheard of (so far) in the quantum numbers of early quantum physics.

But half integers were not the only problem with “2”s in the atoms and elements.  There was also the problem of the periodic table. It, too, seemed to be constructed out of “2”s, multiplying a sequence of the difference of squares.

The Difference of Squares

The difference of squares has a long history in physics stretching all the way back to Galileo Galilei, who performed experiments around 1605 on the physics of falling bodies.  He noted that the distance traveled in successive time intervals varied as the differences 1² – 0² = 1, then 2² – 1² = 3, then 3² – 2² = 5, then 4² – 3² = 7, and so on.  In other words, the distances traveled in each successive time interval varied as the odd integers.  Galileo, ever the astute student of physics, recognized that the distance traveled by an accelerating body in a time t varied as the square of time, t².  Today, after Newton, we know that this is simply the dependence of the distance traveled by an accelerating body on the square of time, s = (1/2)gt².

By early 1924 there was another law of the difference of squares.  But this time the physics was buried deep inside the new science of the elements, put on graphic display through the periodic table. 

The periodic table is constructed on the difference of squares.  First there is 2 for hydrogen and helium.  Then another 2 for lithium and beryllium, followed by 6 for B, C, N, O, F and Ne to make a total of 8.  After that there is another 8 plus 10 for the sequence of Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu and Zn to make a total of 18.  The sequence of 2-8-18 is 2·1² = 2, 2·2² = 8, 2·3² = 18, that is, the sequence 2n².

Why the periodic table should be constructed out of the number 2 times the square of the principal quantum number n was a complete mystery.  Sommerfeld went so far as to call the number sequence of the periodic table a “cabalistic” rule. 

The Bohr Model for Many Electrons

It is easy to picture how confusing this all was to Bohr and Born and others at the time.  From Bohr’s theory of the hydrogen atom, it was clear that there were different energy levels associated with the principal quantum number n, and that this was related directly to angular momentum through the motion of the electrons in the Bohr orbitals. 

But as the periodic table is built up from H to He and then to Li and Be and B, adding in successive additional electrons, one of the simplest questions was why the electrons did not all reside on the lowest energy level.  And even if that question could not be answered, there was the question of why, after He, the elements Li and Be behaved differently from B, N, O and F, leading to the noble gas Ne.  From normal Zeeman spectroscopy as well as x-ray transitions, it was clear that the noble gases behaved as the core of succeeding elements, like He for Li and Be, and Ne for Na and Mg.

To grapple with all of this, Bohr had devised a “building up” rule for how electrons were “filling” the different energy levels as each new electron of the next element was considered.  The noble-gas core played a key role in this model, and the core was also assumed to be contributing to both the normal Zeeman effect as well as the anomalous Zeeman effect with its mysterious half-integer angular momenta.

But frankly, this core model was a mess, with ad hoc rules on how the additional electrons were filling the energy levels and how they were contributing to the total angular momentum.

This was the state of the problem when Pauli, with his exceptional understanding of special relativity, began to dig deep into the problem.  Since the Zeeman splittings were caused by the orbital motion of the electrons, the strongly bound electrons in high-Z atoms would be moving at speeds near the speed of light.  Pauli therefore calculated what the systematic effects would be on the Zeeman splittings as the Z of the atoms got larger and the relativistic effects got stronger.

He calculated this effect to high precision, and then waited for Landé to make the measurements.  When Landé finally got back to him, it was to say that there were absolutely no relativistic corrections for thallium (Z = 81).  The splitting remained simply fixed by the Bohr magneton value with no relativistic effects.

Pauli had no choice but to reject the existing core model of angular momentum and to ascribe the Zeeman effects to the outer valence electron.  But this was just the beginning.

Pauli’s Breakthrough

Fig. 5 Wolfgang Pauli [Image]

By November of 1924 Pauli had concluded, in a letter to Landé:

“In a puzzling, non-mechanical way, the valence electron manages to run about in two states with the same k but with different angular momenta.”

And in December of 1924 he submitted his work on the relativistic effects (or lack thereof) to Zeitschrift für Physik,

“From this viewpoint the doublet structure of the alkali spectra as well as the failure of Larmor’s theorem arise through a specific, classically non-describable sort of Zweideutigkeit (two-foldness) of the quantum-theoretical properties of the valence electron.” (Pauli, 1925a, pg. 385)

Around this time, he read a paper by Edmund Stoner in the Philosophical Magazine of London published in October of 1924.  Stoner’s insight was a connection between the number of states observed in a magnetic field and the number of states filled in the successive positions of elements in the periodic table.  Stoner’s insight led naturally to the 2-8-18 sequence for the table, although he was still thinking in terms of the quantum numbers of the core model of the atoms.

This is when Pauli put 2 plus 2 together: he realized that the states of the atom could be indexed by a set of 4 quantum numbers: n, the principal quantum number; k₁, the angular momentum; m₁, the space-quantization number; and a new fourth quantum number, m₂, that he introduced but that had, as yet, no mechanistic explanation.  With these four quantum numbers enumerated, he then made the major step:

It should be forbidden that more than one electron, having the same equivalent quantum numbers, can be in the same state.  When an electron takes on a set of values for the four quantum numbers, then that state is occupied.

This is the Exclusion Principle:  No two electrons can have the same set of quantum numbers.  Or equivalently, no electron state can be occupied by two electrons.
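With the exclusion principle in hand, the “cabalistic” 2n² rule of the periodic table follows from simply counting the allowed sets of quantum numbers (in modern notation): for each principal quantum number n there are n values of the orbital quantum number l, each with 2l+1 spatial orientations, and the two-valued fourth quantum number doubles the count:

```latex
N(n) \;=\; 2 \sum_{l=0}^{n-1} (2l+1) \;=\; 2n^{2}
```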

Fig. 6 Level filling for Krypton using the Pauli Exclusion Principle

Today, we know that Pauli’s Zweideutigkeit is electron spin, a concept first put forward in 1925 by the American physicist Ralph Kronig and later that year by George Uhlenbeck and Samuel Goudsmit.



And Pauli’s Exclusion Principle is a consequence of the antisymmetry of electron wavefunctions first described by Paul Dirac in 1926 after the introduction of wavefunctions into quantum theory by Erwin Schrödinger earlier that year.

Fig. 7 The periodic table today.

Timeline:

1845 – Faraday effect (rotation of light polarization in a magnetic field)

1896 – Zeeman effect (splitting of optical transition in a magnetic field)

1897 – Anomalous Zeeman effect (half-integer splittings)

1902 – Lorentz and Zeeman awarded Nobel prize (for electron theory)

1921 – Paschen-Back effect (strong-field Zeeman effect)

1922 – Stern-Gerlach (space quantization)

1924 – de Broglie matter waves

1924 – Bose statistics of photons

1924 – Stoner (conservation of number of states)

1924 – Pauli Exclusion Principle

References:

E. C. Stoner, “The Distribution of Electrons among Atomic Levels,” Philosophical Magazine 48, Issue 286, 719 (October 1924).

M. Jammer, The conceptual development of quantum mechanics (Los Angeles, Calif.: Tomash Publishers, Woodbury, N.Y. : American Institute of Physics, 1989).

M. Massimi, Pauli’s exclusion principle: The origin and validation of a scientific principle (Cambridge University Press, 2005).

Pauli, W. Über den Einfluß der Geschwindigkeitsabhängigkeit der Elektronenmasse auf den Zeemaneffekt. Z. Physik 31, 373–385 (1925). https://doi.org/10.1007/BF02980592

Pauli, W. (1925). “Über den Zusammenhang des Abschlusses der Elektronengruppen im Atom mit der Komplexstruktur der Spektren”. Zeitschrift für Physik. 31 (1): 765–783

Read more in Books by David Nolte at Oxford University Press

Science Underground: Neutrino Physics and Deep Gold Mines

“By rights, we shouldn’t even be here,” says Samwise Gamgee to Frodo Baggins in the Peter Jackson movie The Lord of the Rings: The Two Towers.

But we are!

We, our world, our Galaxy, our Universe of matter, should not exist.  The laws of physics, as we currently know them, say that all the matter created at the instant of the Big Bang should have annihilated with all the anti-matter there too.  The great flash of creation should have been followed by a great flash of destruction, and all that should be left now is a faint glow of light without matter.

Except that we are here, and so is our world, and our Galaxy and our Universe … against the laws of physics as we know them.

So, there must be more that we have yet to know.  We are not done yet with the laws of physics.

Which is why the scientists of the Sanford Underground Research Facility (SURF), a kilometer deep under the Black Hills of South Dakota, are probing the deep questions of the universe near the bottom of a century-old gold mine.

Homestake Mine

>>> Twenty of us are plunging vertically at one meter per second into the depths of the earth, packed into a steel cage, seven to a row, dressed in hard hats and fluorescent safety vests and personal protective gear plus a gas filter that will keep us alive for a mere 60 minutes if something goes very wrong.  It is dark, except for periodic fast glimpses of LED-lit mine drifts flying skyward, then rock again, repeating over and over for ten minutes.  Drops of water laced with carbonate drip from the cage ceiling, that, when dried, leave little white stalagmites on our clothing.  A loud bang tells everyone inside that a falling boulder has crashed into the top of the cage, and we all instinctively press our hard hats more tightly onto our heads.  Finally, the cage slows, eventually to a crawl, as it settles to the 4100 level of the Homestake mine. <<<

The Homestake mine was founded in 1877 on land that had been deeded for all time to the Lakota Sioux by the United States Government in the Treaty of Fort Laramie in 1868—that is, before George Custer, twice cursed, found gold in the rolling forests of Ȟe Sápa—the Black Hills, South Dakota.  The prospectors rushed in, and the Lakota were pushed out.

Gold was found washed down in the streams around the town of Deadwood, but the source of the gold was found a year later at the high Homestake site by prospectors.  The stake was too large for them to operate themselves, so they sold it to a California consortium headed by George Hearst, who moved into town and bought or stole all the land around it.  By 1890, the mine was producing the bulk of gold and silver in the US.  When George Hearst died in 1891, his wife Phoebe donated part of the fortune to building projects at the University of California at Berkeley, including the Hearst Mining Building, which was the largest building devoted to the science of mining engineering in the world.  Their son, William Randolph Hearst, became a famous newspaper magnate and a possible inspiration for Orson Welles’s Citizen Kane.

The interior of Hearst Mining Building, UC Berkeley campus.

By the end of the twentieth century, the mining company had excavated over 300 miles of tunnels and extracted nearly 40 million ounces of gold (equivalent to about $100B today).  Over the years, the mine had gone deeper and deeper, eventually reaching the 8000-foot level (about 3000 feet below sea level).

This deep structure presented a unique opportunity for a nuclear chemist, Ray Davis, at Brookhaven National Laboratory, who was interested in the physics of neutrinos, the elementary particles that accompany radioactive decay, which Enrico Fermi had named the “little neutral ones.”

Neutrinos are unlike any other fundamental particles, passing through miles of solid rock as if it were transparent, except for exceedingly rare instances when a neutrino might collide with a nucleus.  However, neutrino detectors on the surface of the Earth were overwhelmed by signals from cosmic rays.  What was needed was a thick shield to protect the neutrino detector, and what better shield than thousands of feet of rock? 

Davis approached the Homestake mining company to request space in one of their tunnels for his detector.  While a mining company would not usually be receptive to requests like this, one of its senior advisors had previously had an academic career at Harvard, and he tipped the scales in favor of Davis.  The experiment would proceed.

The Solar Neutrino Problem

>>> After we disembark onto the 4100 level (4100 feet below the surface) from the Ross Shaft, we load onto the rail cars of a toy train, the track little more than a foot wide.  The diminutive engine clunks and clangs and jerks itself forward, gathering speed as it pushes and pulls us, disappearing into a dark hole (called a drift) on a mile-long trek to our experimental site.  Twice we get stuck, the engine wheels spinning without purchase, and it is not clear if the engineers can get it going again.

At this point we have been on the track for a quarter of an hour and the prospect of walking back to the Ross is daunting.  The only other way out, the Yates Shaft, is down for repairs.  The drift is unlit except by us with our battery-powered headlamps sweeping across the rock face, and who knows how long the batteries will last?  The ground is broken and uneven, punctuated with small pools of black water.  There would be a lot of stumbling and falls if we had to walk our way out.  I guess this is why I had to initial and sign in twenty different places on six pages, filled with legal jargon nearly as dense as the rock around us, before they let me come down here. <<<

In 1965, the Homestake mining crews carved out a side cavern for Davis near the Yates shaft at the 4850 level of the mine.  He constructed a large vat to hold cleaning fluid that contained lots of chlorine atoms.  When a rare neutrino interacted with a chlorine nucleus, the nucleus would convert to a radioactive isotope of argon.  Periodically, the handful of argon atoms produced was flushed out of the vat and counted through its radioactive decay.  By tallying the argon atoms, and by calculating how likely it was for a neutrino to interact with a nucleus, the total flux of neutrinos through the vat could be back-calculated.
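The capture reaction at the heart of the experiment is the inverse beta decay of chlorine-37:

```latex
\nu_e + {}^{37}\mathrm{Cl} \;\rightarrow\; {}^{37}\mathrm{Ar} + e^{-}
```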

The main source of neutrinos in our neck of the solar system is the sun.  As hydrogen fuses into helium, it gives off neutrinos.  These pass through the overlying layers of the sun, through the Earth, and through Davis’s vat, except in those rare cases when a chlorine nucleus converts to argon.  The rate at which solar neutrinos should be detected in the vat was calculated very accurately by John Bahcall at Caltech.

By the early 1970’s, there were enough data that the total neutrino flux could be calculated and compared to the theoretical value based on the fusion reactions in the sun, and they didn’t match.  Worse, they disagreed by a factor of three!  There were three times fewer neutrino events detected than there should have been.  Where were all the missing neutrinos?

Origins and fluxes of solar neutrinos.

This came to be called the “solar neutrino problem”.  At first, everyone assumed that the experiment was wrong, but Davis knew he was right.  Then others said the theoretical values were wrong, but Bahcall knew he was right.  The problem was that Davis and Bahcall couldn’t both be right.  Or could they?

Enter neutrino oscillations

The neutrinos coming from the sun originate mostly as what are known as electron neutrinos.  These interact with a neutron in a chlorine nucleus, converting it to a proton and ejecting an electron.  But if the neutrino is of a different kind, say a muon neutrino, then there isn’t enough energy to produce a muon, and the reaction doesn’t take place.

Hydrogen fusion in the sun.

This became the leading explanation for the missing solar neutrinos.  If many of them converted to muon neutrinos on their way to the Earth, then the Davis experiment wouldn’t detect them—hence the missing events.

Neutrinos can oscillate from electron neutrinos to muon neutrinos only if they have a very small but finite mass.  This, then, was the solution to the solar neutrino problem: neutrinos have mass.  Ray Davis shared the 2002 Nobel Prize in Physics for his detection of solar neutrinos.
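In the simplest two-flavor picture, the probability that an electron neutrino of energy E has turned into a muon neutrino after traveling a distance L depends on a mixing angle θ and on the difference of the squared masses Δm², which is why oscillation requires a nonzero neutrino mass:

```latex
P(\nu_e \rightarrow \nu_\mu) \;=\; \sin^{2}(2\theta)\,
\sin^{2}\!\left(\frac{\Delta m^{2} c^{4}\, L}{4 \hbar c\, E}\right)
```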

But one solution begets another problem: the Standard Model of elementary particles says that neutrinos are massless.  What can be going on with the Standard Model?

Once again, the answer may be found deep underground.

Sanford Underground Research Facility (SURF)

>>> The rock of the Homestake is one of the hardest and densest rocks you will find, black as night yet shot through with white streaks of calcite like the tails of comets.  It is impermeable, and despite being so deep, the rock is surprisingly dry—most of the fractures are too tight to allow a trickle through. 

As our toy train picks up speed, the veins flash by in our headlamps, sometimes sparkling with pin pricks of reflected light.  A gold fleck perhaps?  Yet the drift as a whole (or as a hole) is a shabby thing, rusty wedges half buried in the ceiling to keep slabs from falling, bent and battered galvanized metal pinned to the walls by rock bolts to hold them back, flimsy metal webbing strung across the ceiling to keep boulders from crushing our heads.  It’s dirty and dark and damp and hewn haphazardly from the compressed crust.  There is no art, no sense of place.  These shafts were dynamited through, at three-to-five feet per detonation, driven by money and the need for the gold, so nobody had any sense of aesthetics. <<<

The Homestake mine closed operations in 2001 due to the low grade of ore and the sagging price of gold.  They continued pumping water from the mine for two more years in anticipation of handing the extensive underground facility over to the National Science Foundation for use as a deep underground science lab.  However, delays in the transfer and the cost of pumping forced them to turn off the pumps, and the water slowly began rising through the levels, taking a year or more to reach and flood the famous 4850 level while negotiations continued.

The surface buildings of the Sanford Underground Research Facility (SURF).
The open pit at Homestake.

Finally, the NSF took over the facility to house the Deep Underground Science and Engineering Laboratory (DUSEL) that would operate at the deepest levels, but these had already been flooded.  After a large donation from South Dakota banker T. Denny Sanford and support from Governor Mike Rounds, the facility became the Sanford Underground Research Facility (SURF).  The 4850 level was “dewatered”, and the lab was dedicated in 2009.  But things were still not settled.  NSF had second thoughts, and in 2011 the plans for DUSEL (still under water) were terminated and the lab was transferred to the Department of Energy (DOE), administered through the Lawrence Berkeley National Laboratory, to host experiments at the 4850 level and higher.

Layout of the mine levels at SURF.

Two early experiments at SURF were the Majorana Demonstrator and LUX. 

The Majorana Demonstrator was an experiment designed to look for neutrinoless double-beta decay, in which two neutrons in a nucleus decay simultaneously, each emitting an electron and an antineutrino. A theory of neutrinos proposed by the Italian physicist Ettore Majorana in 1937, which goes beyond the Standard Model, says that a neutrino is its own antiparticle. If this were the case, then the two antineutrinos emitted in the double beta decay could annihilate each other, hence a “neutrinoless” double beta decay. The Demonstrator was too small to actually see such an event, but it tested the concept and laid the groundwork for later, larger experiments. It operated between 2016 and 2021.

Neutrinoless double-beta decay.

The Large Underground Xenon (LUX) experiment was a prototype for the search for dark matter. Dark matter particles are expected to interact very weakly with ordinary matter (sort of like neutrinos, but even less interactive). Such weakly interacting massive particles (WIMPs) might scatter off the nucleus of a xenon atom, nudging the nucleus enough that it produces ionization electrons and a flash of light. These would be captured by detectors at the caps of the liquid-xenon container.

Once again, cosmic rays at the surface of the Earth would make the experiment unworkable, but deep underground there is much less background within which to look for the “needle in the haystack”. LUX operated from 2009 to 2016 and was not big enough to hope to see a WIMP, but like the Demonstrator, it was a proof-of-principle to show that the idea worked and could be expanded to a much larger 7-ton experiment called LUX-Zeplin that began in 2020 and is ongoing, looking for the biggest portion of mass in our universe. (About a quarter of the energy of the universe is composed of dark matter. The usual stuff we see around us only makes up about 4% of the energy of the universe.)

LUX-Zeplin Experiment

Deep Underground Neutrino Experiment (DUNE)

>>> “Always keep a sense of where you are,” Bill the geologist tells us, in case we must hike our way out.  But what sense is there?  I have a natural built-in compass that has served me well over the years, but it seems to run on the heavens.  When I visited South Africa, I had an eerie sense of disorientation the whole time I was there.  When you are a kilometer underground, the heavens are about as far away as Heaven.  There is no sense of orientation, only the sense of lefts and rights. 

We were told there would be signs directing us towards the Ross or Yates Shafts.  But once we are down here, it turns out that these “signs” are crudely spray-painted marks on the black rock, like bad graffiti.  When you see them, your first thought is of kids with spray cans making a mess—until you suddenly recognize an R or an O or two S’s along with an indistinct arrow that points slightly more one way than the other. <<<

Deep Underground Neutrino Experiment (DUNE).

One of the most ambitious high-energy experiments ever devised is the Long-Baseline Neutrino Facility (LBNF), which stretches 800 miles. It begins in Batavia, Illinois, at the Fermilab accelerator, which launches a beam of neutrinos through the Earth to detectors of the Deep Underground Neutrino Experiment (DUNE) at SURF in Lead, South Dakota. The neutrinos are expected to oscillate in flavor, just like solar neutrinos, and the detection rates at DUNE could finally answer one of the biggest outstanding questions of physics: Why is our universe made of matter?

At the instant of the Big Bang, equal amounts of matter and antimatter should have been generated, and these should have annihilated in equal manner, and the universe should be filled with nothing but photons. But it’s not. Matter is everywhere. Why?

In the Standard Model there are many symmetries, also known as conserved properties. One powerful symmetry is known as CPT symmetry, where C is the symmetry of changing particles into their antiparticles, P is a reflection that exchanges left-handed and right-handed particles, and T is time-reversal symmetry. You might expect CP on its own to be a symmetry too, if time reversal is taken to be a symmetric property of physics. But it’s not!

There is a strange meson called the kaon that does not decay the same way for its particle and antiparticle pair, violating CP symmetry. This was discovered in 1964 by James Cronin and Val Fitch, who won the 1980 Nobel Prize in Physics. The discovery shocked the physics world. Since then, additional violations of CP symmetry have been observed in quarks. Such a broken symmetry is allowed in the Standard Model of particles, but the effect is so exceedingly small (CP is so extremely close to being a true symmetry) that it cannot explain the size of the matter-antimatter asymmetry in the universe.

Neutrino oscillations can also violate CP symmetry, but the effects have been hard to measure; hence the need for DUNE. By creating intense beams of neutrinos, sending them 800 miles through the Earth, and detecting them in the vast liquid-argon vats in the underground caverns of SURF, the parameters of neutrino oscillation can be measured directly, possibly explaining the matter asymmetry of the universe, and answering Samwise’s question of why we are here.

Center for Understanding Subsurface Signals and Permeability (CUSSP)

>>> Finally, in the distance, as we rush down the dark drift, we see a bright glow that grows to envelop us in a string of white LED lights.  The drift is not so shabby here, with fresh pipes and electrical cables laid neatly by the side.  We had arrived at the CUSSP experimental site.  It turned out to be just a few steps away from the inactive Yates Shaft which, had it been operating, would have removed the need for the crazy train ride through black rock along broken tunnels.  But that is OK.  Because we are here, and this is what had brought us down into the Earth: down-to-Earth questions about our future existence on this planet, learning what we need to generate the power for our high-tech society without making our planet unlivable.  <<<

Not all the science at SURF is so ethereal. For instance, research on Enhanced Geothermal Systems (EGS) is funded by the DOE Office of Basic Energy Sciences.  Geothermal systems can generate power by extracting super-heated water from underground to run turbines. However, superheated water is nasty stuff, very corrosive and full of minerals that tend to block up the fractures that the water flows through. The idea of enhanced geothermal systems is to drill boreholes and use “fracking” to create fractures in the hard rock, possibly refracturing older fractures that had become blocked. If this could be done reliably, then geothermal sites could be kept operating.

The Center for Understanding Subsurface Signals and Permeability (CUSSP) was recently funded by the DOE to use the facilities at SURF to study how well fracks can be controlled. The team is led by Pacific Northwest National Lab (PNNL) with collaborators from Lawrence Berkeley Lab, Maryland, Illinois and Purdue, among others. We are installing seismic equipment as well as electrical-resistivity monitoring to track the induced fractures.

The CUSSP installation on the 4100 level was the destination of our underground journey, to see the boreholes in person and to get a sense of the fracture orientations at the drift wall. During the half hour at the site, rocks were examined, questions were answered, tall tales were told, and it was time to return.

Shooting to the Stars

>>> At the end of the tour, we pack again into the Ross cage and are thrust skyward at 2 meters per second—twice the speed of the descent because of the asymmetry of slack cables that could snag and snap.  Ears pop, and pop again, until the cage slows, and we settle to the exit level, relieved and tired and ready to see the sky. Thinking back, as we were shooting up the shaft, I imagined that the cage would never stop, flying up past the massive hoist, up and onward into the sky and to the stars.  <<<

In a video we had been shown about SURF, Jace DeCory, a scholar of the Lakota Sioux, spoke of the sacred ground of Ȟe Sápa—the Black Hills.  Are we taking again what is not ours?  This time it seems not.  The scientists of SURF are linking us to the stars, bringing knowledge instead of taking gold.  Jace quoted Carl Sagan: “We are made of star-stuff.”  Then she reminded us, the Lakota Sioux have known that all along.

Counting by the Waters of Babylon: The Secrets of the Babylonian 60-by-60 Multiplication System

Could you memorize a 60-by-60 multiplication table?  It has 1830 distinct numbers to memorize.

The answer today is an emphatic “No”!  Remember how long it took you to memorize the 12-by-12 table when you were a school child!

But 4000 years ago, the ancient Babylonians were doing it just fine—or at least “half” fine.  This is how.

How to Tally

In the ancient land of Sumer, the centralization of the economy, and the need of the government to control it, made it necessary to keep records of who owned what and who gave what to whom.  Scribes recorded transactions initially as tally marks pressed into soft clay around 5000 years ago, but one can only put so many marks on a clay tablet before it is full. 

Therefore, two inventions were needed to save space and time.  The first invention was a symbol that could stand in for a collection of tally marks.  Given the ten fingers we have on our hands, it is no surprise that this aggregate symbol stood for 10 units—almost every culture has some aspect of a base-10 number system.  With just two symbols repeated, numbers into the tens are easily depicted, as in Fig. 1. 

Figure 1.  Babylonian cuneiform numbers use agglutination and place notation

But by 4000 years ago, tallies were ranging into the millions, and a more efficient numerical notation was needed.  Hence, the second invention.

Place-value notation, an idea more abstract than the first, was so far ahead of its time that other cultures that drew from Mesopotamian mathematics, such as the Greeks and Romans, failed to recognize its power and adopt it.

Today, we are so accustomed to place-value notation that it is hard to recognize how ingenious it is—how orders of magnitude are so easily encompassed in a few groups of symbols that keep track of thousands or millions at the same time as single units.  Our own decimal place-value system is from Hindu-Arabic numerals, which seems natural enough to us, but the mathematics of Old Babylon from the time of Hammurabi (1792 – 1750 BCE) was sexagesimal, based on the number 60. 

Our symbol for one hundred (100) using sexagesimal would be a pair of numbers (1,40) meaning 1×60+4×10. 

Our symbol for 119 would be (1, 59) meaning 1×60 + 5×10 + 9. 

Very large numbers are easily expressed.  Our symbol for 13,179,661 (using eight symbols) would be expressed in the sexagesimal system using only 5 symbols as (1, 1, 1, 1, 1) for 1×60⁴ + 1×60³ + 1×60² + 1×60 + 1.

There has been much speculation on why a base-60 numeral system makes any sense.  The number 60 does stand out because it has more divisors (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, and 60 itself) than any smaller integer, and three of the divisors (2, 3, 5) are prime.  Babylonian mathematical manipulation relied heavily on fractions, and the availability of so many divisors may have been the chief advantage of the system.  The number the Babylonians used for the square root of 2 was (1; 24, 51, 10) = 1 + 24/60 + 51/60² + 10/60³ = 1.41421296, which is accurate to better than one part in a million.  It has been pointed out [1] that this sexagesimal approximation for root-2 is what would be obtained if the Newton-Raphson method were used to find the root of the equation x² − 2 = 0 starting from an initial guess of 3/2 = 1.5.
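As a quick illustration, here is a minimal Matlab sketch (illustrative only; the variable names and the choice of two iterations are mine) that converts the sexagesimal digits (1; 24, 51, 10) to decimal, runs two Newton-Raphson iterations of x² − 2 = 0 from the starting guess 3/2, and then truncates the Newton result back to three sexagesimal places, recovering the digits on the tablet.

% Sexagesimal (1; 24, 51, 10) converted to decimal
d60 = [1 24 51 10];
root2_babylon = sum(d60 ./ 60.^(0:3));       % = 1.414212963...

% Two Newton-Raphson iterations for x^2 - 2 = 0, starting from 3/2
x = 3/2;
for k = 1:2
    x = x - (x^2 - 2)/(2*x);                 % ends at 577/408 = 1.414215686...
end

% Truncate the Newton result to three sexagesimal places
f = x - floor(x);
sexa = zeros(1,3);
for k = 1:3
    f = f*60;
    sexa(k) = floor(f);
    f = f - sexa(k);
end

fprintf('Babylonian value : %.9f\n', root2_babylon)
fprintf('Newton from 3/2  : %.9f\n', x)
fprintf('sqrt(2)          : %.9f\n', sqrt(2))
fprintf('truncated digits : %d %d %d\n', sexa)   % prints 24 51 10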

Squares, Products and Differences

One of the most important quantities in any civilization is the measurement of land area.  Land ownership is a measure of wealth and power, and until recent times it was a requirement for authority or even citizenship.  This remains true today, when land possession and ownership are among the bricks in the foundation of social stability and status.  The size of a McMansion is a status symbol, and the number of acres is a statement of wealth and power.  Even renters are acutely aware of how many square feet they have in their apartment or house.

In ancient Sumer and Babylon, the possession of land was critically important, and it was necessary to measure land areas to track the accumulation or loss of ownership.  Because the measurement of area requires the multiplication of numbers, it is no surprise that multiplication was one of the first mathematical developments.

Babylonian mathematics depended heavily on squares—literally square geometric figures—and the manipulation of squares formed their central algorithm for multiplication.

The algorithm begins by associating to any pair of numbers (a, b) a unique second pair (p’, q’), where p’ = (a+b)/2 is the semi-sum (the average) and q’ = (b−a)/2 is the semi-difference.  The Babylonian mathematicians discovered that the product of the first pair is given by the difference of the squares of the second pair, ab = p’² − q’², as depicted in Fig. 2.

Figure 2.  Old Babylonian mathematics.  To a pair of numbers (a,b) is associated another pair (p’,q’): the average and the semi-difference.  The product of the first pair of numbers is equal to the difference in the squares of the second pair (ab = p’² − q’²).  A specific example is shown on the right.

This simple relation between products and differences of squares provides a significant savings in time and effort when constructing products of two large numbers—as long as the two numbers have the same parity.  That is the caveat!  The semi-sum and semi-difference must each be an integer, which happens only when the two numbers share the same parity (both even or both odd).

Therefore, while the same-parity portion of a 60-by-60 multiplication table still contains roughly half of the full 1830 entries, about 915 distinct products, far too many to memorize easily, the squares up to 60² give just 60 numbers to memorize, which is fewer than our children need to learn today. 

With just those 60 memorized squares, one could construct all of the roughly 915 same-parity products of the 60-by-60 table using only sums and differences. 

Try it yourself.
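A minimal Matlab sketch of the scribal method (the numbers a and b here are arbitrary examples) uses nothing beyond a memorized table of squares, one addition, one subtraction, two halvings and one final difference.

% Table of squares a scribe would have memorized: 1^2 ... 60^2
sq = (1:60).^2;

% Multiply two same-parity numbers a and b (with a < b <= 60) the Babylonian way
a = 27;  b = 35;                 % both odd, so p and q below are integers
p = (a + b)/2;                   % semi-sum (the average)
q = (b - a)/2;                   % semi-difference
product = sq(p) - sq(q);         % ab = p'^2 - q'^2, read off from the table

fprintf('%d x %d = %d   (direct check: %d)\n', a, b, product, a*b)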


[1] pg. 60, R. L. Cooke, The History of Mathematics: A Brief Course. (New York, John Wiley & Sons, 2012)

Read more in Books by David Nolte at Oxford University Press

Edward Purcell:  From Radiation to Resonance

As the days of winter darkened in 1945, several young physicists huddled in the basement of Harvard’s Research Laboratory of Physics, nursing a high field magnet to keep it from overheating and dumping its field.  They were working with bootstrapped equipment—begged, borrowed or “stolen” from various labs across the Harvard campus.  The physicist leading the experiment, Edward Mills Purcell, didn’t even work at Harvard—he was still on the payroll of the Radiation Laboratory at MIT, winding down from its war effort on radar research for the military in WWII, so the Harvard experiment was being done on nights and weekends.

Just before Christmas, 1945, as college students were fleeing campus for the first holiday in years without war, the signal generator, borrowed from a psychology lab, launched an electromagnetic pulse into simple paraffin—and the signal disappeared!  It had been absorbed by the nuclear spins of the copious hydrogen nuclei (protons) in the wax. 

The experiment was simple, unfunded, bootstrapped—and it launched a new field of physics that ultimately led to magnetic resonance imaging (MRI) that is now the workhorse of 3D medical imaging.

This is the story, in Purcell’s own words, of how he came to the discovery of nuclear magnetic resonance in solids, for which he was awarded the Nobel Prize in Physics in 1952.

Early Days

Edward Mills Purcell (1912 – 1997) was born in a small town in Illinois, the son of a telephone businessman, and some of his earliest memories were of rummaging around in piles of telephone equipment—wires and transformers and capacitors. He especially liked the generators:

“You could always get plenty of the bell-ringing generators that were in the old telephones, which consisted of a series of horseshoe magnets making the stator field and an armature that was wound with what must have been a mile of number 39 wire or something like that… These made good shocking machines if nothing else.”

His science education in the small town was modest, mostly chemistry, but he had a physics teacher, a rare woman at that time, who was open to searching minds. When she told the students that you couldn’t pull yourself up using a single pulley, Purcell disagreed and got together with a friend:

“So we went into the barn after school and rigged this thing up with a seat and hooked the spring scales to the upgoing rope and then pulled on the downcoming rope.”

The experiment worked, of course, with the scale reading half the weight of the boy. When they rushed back to tell the physics teacher, she accepted their results immediately—demonstration trumped mere thought, and Purcell had just done his first physics experiment.

However, physics was not a profession in the early 1920’s.

“In the ’20s the idea of chemistry as a science was extremely well publicized and popular, so the young scientist of shall we say 1928 — you’d think of him as a chemist holding up his test tube and sighting through it or something…there was no idea of what it would mean to be a physicist.

The name Steinmetz was more familiar and exciting than the name Einstein, because Steinmetz was the famous electrical engineer at General Electric and was this hunchback with a cigar who was said to know the four-place logarithm table by heart.”

Purdue University and Prof. Lark-Horovitz

Purcell entered Purdue University in the Fall of 1929. The University had only 4500 students who paid $50 a year to attend. He chose a major in electrical engineering, because

“Being a physicist…I don’t remember considering that at that time as something you could be…you couldn’t major in physics. You see, Purdue had electrical, civil, mechanical and chemical engineering. It had something called the School of Science, and you could graduate, having majored in science.”

But he was drawn to physics. The Physics Department at Purdue was going through a Renaissance under the leadership of its new department head, Prof. Lark-Horovitz:

“His [Lark-Horovitz] coming to Purdue was really quite important for American physics in many ways…  It was he who subsequently over the years brought many important and productive European physicists to this country; they came to Purdue, passed through. And he began teaching; he began having graduate students and teaching really modern physics as of 1930, in his classes.”

Purcell attended Purdue during the early years of the depression when some students didn’t have enough money to find a home:

“People were also living down there in the cellar, sleeping on cots in the research rooms, because it was the Depression and some of the graduate students had nowhere else to live. I’d come in in the morning and find them shaving.”

Lark-Horovitz was a demanding department chair, but he was bringing the department out of the dark ages and into the modern research world.

“Lark-Horovitz ran the physics department on the European style: a pyramid with the professor at the top and everybody down below taking orders and doing what the professor thought ought to be done. This made working for him rather difficult. I was insulated by one layer from that because it was people like Yearian, for whom I was working, who had to deal with the Lark.”

Hubert Yearian had built a 20-kilovolt electron diffraction camera, a Debye-Scherrer transmission camera, just a few years after Davisson and Germer had performed the Nobel-prize winning experiment at Bell Labs that proved the wavelike nature of electrons. Purcell helped Yearian build his own diffraction system, and recalled:

“When I turned on the light in the dark room, I had Debye-Scherrer rings on it from electron diffraction — and that was only five years after electron diffraction had been discovered. So it really was right in the forefront. And as just an undergraduate, to be able to do that at that time was fantastic.”

Purcell graduated from Purdue in 1933, and through contacts of Lark-Horovitz he was able to spend a year in the physics department at Karlsruhe in Germany. He returned to the US in 1934 to enter graduate school in physics at Harvard, working under Kenneth Bainbridge. His thesis topic was a bit of a bust, a dusty old problem in classical electrostatics far older than the electron diffraction he had worked on at Purdue. But it was enough to earn him his degree in 1938, and he stayed on at Harvard as a faculty instructor until the war broke out.

Radiation Laboratory, MIT

In the fall of 1940, the Radiation Lab at MIT was launched and began vacuuming up the unattached physicists of the United States, and Purcell was one of them. The Rad Lab also drew in some of the top physicists in the country, like Isidor Rabi from Columbia, to supervise the growing army of scientists committed to the war effort—even before the US was in the war.

“Our mission was to make a radar for a British night fighter using 10-centimeter magnetron that had been discovered at Birmingham.”

This research turned Purcell and his cohort into experts in radio-frequency electronics and measurement. He worked closely with Rabi (Nobel Prize 1944), Norman Ramsey (Nobel Prize 1989) and Jerrold Zacharias, who had been in the midst of measuring resonances in molecular beams. The roster at the Rad Lab read like a Who’s Who of physics at that time:

“And then there was the theoretical group, which was also under Rabi. Most of their theory was concerned with electromagnetic fields and signal to noise, things of that sort. George Uhlenbeck was in charge of it for quite a long time, and Bethe was in it for a while; Schwinger was in it; Frank Carlson; David Saxon, now president of the University of California; Goudsmit also.”

Nuclear Magnetic Resonance

The research by Rabi had established the physics of resonances in molecular beams, but there were serious doubts that such phenomena could exist in solids. This became one of the Holy Grails of physics, and only a few physicists across the country had the skill and understanding to attempt to observe it in the solid state.

Many of the physicists at the Rad Lab were wondering what they should do next, after the war was over.

“Came the end of the war and we were all thinking about what shall we do when we go back and start doing physics. In the course of knocking around with these people, I had learned enough about what they had done in molecular beams to begin thinking about what can we do in the way of resonance with what we’ve learned. And it was out of that kind of talk that I was struck with the idea for what turned into nuclear magnetic resonance.”

“Well, that’s how NMR started, with that idea which, as I say, I can trace back to all those indirect influences of talking with Rabi, Ramsey and Zacharias, thinking about what we should do next.

“We actually did the first NMR experiment here [Harvard], not at MIT. But I wasn’t officially back. In fact, I went around MIT trying to borrow a magnet from somebody, a big magnet, get access to a big magnet so we could try it there and I didn’t have any luck. So I came back and talked to Curry Street, and he invited us to use his big old cosmic ray magnet which was out in the shed. So I didn’t ask anybody else’s permission. I came back and got the shop to make us some new pole pieces, and we borrowed some stuff here and there. We borrowed our signal generator from the Psycho Acoustic Lab that Smitty Stevens had. I don’t know that it ever got back to him. And some of the apparatus was made in the Radiation Lab shops. Bob Pound got the cavity made down there. They didn’t have much to do — things were kind of closing up — and so we bootlegged a cavity down there. And we did the experiment right here on nights and week-ends.

This was in December, 1945.

“Our first experiment was done on paraffin, which I bought up the street at the First National store between here and our house. For paraffin we thought we might have to deal with a relaxation time as long as several hours, and we were prepared to detect it with a signal which was sufficiently weak so that we would not upset the spin temperature while applying the r-f field. And, in fact, in the final time when the experiment was successful, I had been over here all night … nursing the magnet generator along so as to keep the field on for many hours, that being in our view a possible prerequisite for seeing the resonances. Now, it turned out later that in paraffin the relaxation time is actually 10⁻⁴ seconds. So I had the magnet on exactly 10⁸ times longer than necessary!”

The experiment was completed just before Christmas, 1945.


E. M. Purcell, H. C. Torrey, and R. V. Pound, “RESONANCE ABSORPTION BY NUCLEAR MAGNETIC MOMENTS IN A SOLID,” Physical Review 69, 37-38 (1946).

“But the thing that we did not understand, and it gradually dawned on us later, was really the basic message in the paper that was part of Bloembergen’s thesis … came to be known as BPP (Bloembergen, Purcell and Pound). [This] was the important, dominant role of molecular motion in nuclear spin relaxation, and also its role in line narrowing. So that after that was cleared up, then one understood the physics of spin relaxation and understood why we were getting lines that were really very narrow.”

Diagram of the microwave cavity filled with paraffin.

This was the discovery of nuclear magnetic resonance (NMR) for which Purcell shared the 1952 Nobel Prize in physics with Felix Bloch.

David D. Nolte is the Edward M. Purcell Distinguished Professor of Physics and Astronomy, Purdue University. Sept. 25, 2024

References and Notes

• The quotes from EM Purcell are from the “Living Histories” interview in 1977 by the AIP.

• K. Lark-Horovitz, J. D. Howe, and E. M. Purcell, “A new method of making extremely thin films,” Review of Scientific Instruments 6, 401-403 (1935).

• E. M. Purcell, H. C. Torrey, and R. V. Pound, “RESONANCE ABSORPTION BY NUCLEAR MAGNETIC MOMENTS IN A SOLID,” Physical Review 69, 37-38 (1946).

• National Academy of Sciences Biographies: Edward Mills Purcell

Read more in Books by David Nolte at Oxford University Press

Why Do Librarians Hate Books?

Beware! 

If you love books, don’t read this post.  Close the tab and look away from the second burning of the Library of Alexandria.

If you love books, then run to your favorite library (if it is still there), and take out every book you have ever thought of.  Fill your rooms and offices with checked-out books, the older the better, and never, ever, return them.  Keep clicking on RENEW, for as long as they let you.

The librarians had paved paradise and put up a parking lot. 

If you love books, the kind of rare, commercially valueless books on topics only you care about, then beware: librarians—the former Jedi gatekeepers of knowledge—have turned to the dark side, deaccessioning the unpopular books in the stacks, pulling their loan cards like tombstones, shipping the books away in unmarked boxes like body bags to large warehouses to be sold for pennies—and you may never see them again.

The End of Physics

Just a few years ago my university, with little warning and no consultation with the physics faculty, closed the heart and soul of the Physics Department—our Physics Library.  It was a bright warm space where we met colleagues, quietly discussing deep theories, a place to escape for a minute or two, or for an hour, to browse a book picked from the shelf of new acquisitions—always something unexpected you would never think to search for online.  But that wasn’t the best part.

The best part was the three floors above, filled with dark and dusty stacks that seemed to rise higher than the building itself.  This was where you found the gems—books so old or so arcane that when you pulled them from the shelf to peer inside, they sent you back, like a time machine, to an era when physicists thought differently—not wrong, but differently.  And your understanding of your own physics was changed, seen with a longer lens, showing you things that went deeper than you expected, and you emerged from the stacks a changed person.

And then it was gone. 

They didn’t even need the space.  At a university where space is always in high demand, and turf wars erupt between departments that try to steal space in each other’s buildings, the dark cavernous rooms of the ex-physics library stood empty for years as the powers that be tried to figure out what to do with them.

This is the way a stack in a university library should look. It was too late to take a picture of a stack in my physics library, so this is from the Math library … the only topical library still left at my university among the dozen that existed only a few years ago.

So, I determined to try to understand how a room that stood empty would be more valuable to a university than a room full of books.  What I discovered was at the same time both mundane and shocking.  Mundane, because it delves into the rules and regulations that govern how universities function.  Shocking, because it is a betrayal of the very mission of universities and university libraries.

How to Get Accreditation Without Really Trying

Little strikes fear in the heart of a college administrator like the threat of losing accreditation.  Accreditation is the stamp of approval that drives sales—sales of slots in the freshman incoming class.  Without accreditation, a college is nothing more than a bunch of buildings housing over-educated educators.  But with accreditation, the college has a mandate to educate and has the moral authority to mold the minds of the next generation.

In times past—not too long past—let’s say up to the end of the last millennium, to receive accreditation, a college or university would need to spend something around 3% of its operating budget on the upkeep of its libraries.  For a moderate-sized university library system, this was on the order of $20M per year.  The requirement was a boon to the librarians who kept a constant lookout for new books to buy to populate the beloved “new acquisitions” shelf.

Librarians reveled in their leverage over the university administrators: buy books or lose accreditation.  It was a powerful negotiating position to be in.  But all that changed in the early 2000’s.  Universities are always strapped for cash (despite tuition rising at twice the rate of inflation), and the librarians’ $20M cash cow was a tempting target.  Universities are also powerful, running their billion-dollar-a-year operations, and they lobbied the very organizations that grant the accreditations, convincing them to remove the requirement for a minimum library budget.  After all, in the digital world, who needs expensive buildings filled with books, the vast majority of which never get checked out?

The Deaccessioning Wars: Double Fold

Twenty-some years ago, a bibliovisionary by the name of Nicholson Baker recognized the book armageddon of his age and wrote about it in Double Fold: Libraries and the Assault on Paper (Vintage Books/Random House, 2001).  Libraries everywhere were in the midst of an orgy of deaccessioning.  To deaccession a book means to remove it from the card catalog (an anachronism) and ship it off to second-hand book dealers.  But it was worse than that.  Many of the books, as well as rare journals and rarer newspapers, were being “guillotined” by cutting out each page and scanning it into some kind of visual/digital format before pitching all the pages into the recycle bin. The argument in favor of guillotining is that all paper must eventually decay to dust (a false assumption). 

The way to test whether a book, or a newspaper, is on its way to dissolution is to do the double-fold test on a corner of a page.  You fold the corner over, then back the other way—double fold—and repeat.  The double-fold number of a book is how many double folds it takes for the little triangular piece to fall off.  Any number less than a selected threshold gives a librarian carte blanche to deaccession the book, and maybe to guillotine it, regardless of how the book may be valued. 

Librarians generally hate Baker’s little book Double Fold because deaccessioning is always a battle.  Given finite shelf space, for every new acquisition something old needs to go.  How do you choose?  Any given item might be valued by someone, so an objective test that removes all shades of gray is the double fold.  It is a blunt instrument, one that Nicholson Baker abhorred, but it does make room for the new—if that is all that a university library is for.

As an aside, as I write this blog, my university library, which does not own a copy of Double Fold, and through which I had to request a copy via Interlibrary Loan (ILL), is threatening me with punitive action if I don’t relinquish it because it is a few weeks overdue.  If my library had actually owned a copy, I could have taken it out and kept it on my office shelf for years, as long as I kept hitting that “renew” button on the library page.  (On the other hand, my university does own a book by the archivist Cox who wrote a poorly argued screed to try to refute Baker.)

The End of Deep Knowledge

Baker is already twenty years out of date, although his message is more dire now than ever.  In his day, deaccessioning was driven by that problem of finite shelf space—one book out for one book in.  Today, when virtually all new acquisitions are digital, that argument is moot.  Yet the current rate at which books are disappearing from libraries, and libraries themselves are disappearing from campuses, is nothing short of cataclysmic, dwarfing the little double-fold problem that Baker originally railed against.

My university used to have a dozen specialized libraries scattered across campus, with the Physics Library one of them.  Now there are maybe three in total.  One of those is the Main Library which was an imposing space filled with the broadest range of topics and a truly impressive depth of coverage.  You could stand in front of any stack and find beautifully produced volumes (with high-quality paper that would never fail the double fold test) on beautifully detailed topics, going as deep as you could wish to the very foundations of knowledge.

I am a writer of the history of science and technology, and as I write, I often will form a very specific question about how a new idea emerged.  What was its context?  How did it break free of old mindsets?  Why was it just one individual who saw the path forward?  What made them special?

My old practice was to look up a few books in the library catalog that may or may not have the kinds of answers I was looking for, then walk briskly across campus to the associated library (great for exercise and getting a break from my computer).  I would scan across the call numbers on the spines of the books until I found the book I sought—and then I would step back and look at the whole stack. 

Without fail, I would find gems I never knew existed, sometimes three, four or five shelves away from the book I first sought.  They were often on topics I never would have searched online.  And to find those gems, I would take down book after book, scanning them quickly before returning them to the shelf (yes, I know, re-shelving is a no-no, but the whole stack would be emptied if I followed the rules) and moving to the next—something you could never do online.  In ten minutes, or maybe half an hour if I lost track of time, I would have three or four books crucial to my argument in the crook of my arm, ready to walk down the stairs to circulation to take them out.  Often, the book that launched my search was not even among them.

A photo from the imperiled Math Library. The publication dates of the books on this short shelf range from the 1870’s to the 1970’s. A historian of mathematics could spend a year mining the stories that these books tell.

I thought that certainly this main library was safe, and I was looking forward to years ahead of me, even past retirement, buried in its stacks, sleuthing out the mysteries of the evolution of knowledge.

And then it was gone.

Not the building or the space—they were still there.  But the rows upon rows of stacks had been replaced with study space that students didn’t even need.  Once again, empty space was somehow more valuable to the library than having that space filled with books.  The librarians had paved paradise and put up a parking lot.  To me, it was like a death in the family. 

The Main Library after the recent remodel. This photo was taken at 11 am during the first week of the Fall semester 2024. This room used to be filled with full stacks of books. Now only about 10-20% of the books remain in the library. Notice the complete absence of students.


I recently looked up a book that was luckily still available at the Main Library in one of its few remaining stacks.  So I went to find it.  The shelves all around it were only about two-thirds filled, the wide gaps looking like abandoned store-fronts in a failing city.  And what books did remain were the superficial ones—the ones that any undergrad might want to take out to get an overview of some well-worn topic (which they could probably just get on Wikipedia).  All the deep knowledge (which Wikipedia will never see) was gone. 

I walked out with exactly the one book I had gone to find—not a single surprising gem to accompany it.  But the worst part is the opportunity cost: I will never know what I had failed to discover!

The stacks in 2024 are about 1/3 empty, and only about 20% of the stacks remain. The books that survived are the obvious ones.

Shrinking Budgets and Predatory Publishers

So why is a room that stands empty more valuable to a university than a room full of books? Here are the mundane and shocking answers.

On the one hand, library budgets are under assault. The following figure shows library expenditures as a percentage of total university expenditures averaged for 40 major university libraries tabulated by the ARL (Association of Research Libraries) from 1982 to 2017. There is an exponential decrease in the library budget as a function of year, with a break right around 2000-2001 when accreditation was no longer linked to library expenditures. Then the decay accelerated.

Combine decreasing rates of library funding with predatory publishers, and the problem is compounded. The following figure shows the increasing price of journal subscriptions that universities must pay relative to the normal inflation rate. The journal publishers are increasing their prices exponentially, tripling the cost each decade, a rate that erodes library budgets even more. Therefore, it is tempting to say that librarians don’t actually hate books, but are victims of bad economics. But that is the mundane answer.

The shocking answer is that modern librarians find books to be anachronistic. The new hires are by and large “digital librarians” who are focused on providing digital content to serve students who have become much more digital, especially after Covid. There is also a prevailing opinion among university librarians that students want more space to study, hence the removal of stacks to be replaced by soft chairs and open study spaces.

And that is the betrayal. The collections of deep knowledge, which are unique and priceless and irreplaceable, were replaced by generic study space that could be put anywhere at any time, having no intrinsic value.

You can argue that I still have access to the knowledge because of Interlibrary Loan (ILL). But ILL only works as long as some other library still has the book. What happens when every library assumes that some other library has it, and so throws its own copy out? At some point that volume will have vanished from all collections, and that will be the end of it.

Or you can argue that I can find the book digitally scanned on Internet Archive or Google Books. But I have already found situations where special folio pages, the very pages that I needed to make my argument, had failed to be reproduced in the digital versions. And the books were too rare to be allowed to go through ILL. So I was stuck.

(By the way, this was a rare copy of the works of Francois Arago. In my book Interference: Optical Interferometry and the Scientists who Tamed Light (Oxford University Press, 2023), I make the case that it was Arago who invented the first interferometer in 1816 long before Albert Michelson’s work in 1880. But for the final smoking gun, to prove my case, I needed that folio page which took Herculean efforts to eventually track down. Our Physics Library had the book in its stacks just a decade ago, and I could have just walked upstairs from my office to look at it. Where it is now is anyone’s guess.)

But digital scans are no substitute for the real thing. To hold an old volume in your hands, run off the printing press when the author was still alive, and filled with scribbled notes in the margins by your colleagues from years past, is to commune with history. Why not bulldoze Williamsburg, Virginia, after digital capture? Why not burn the USS Constitution in Boston Bay after photographing it? Why not flatten the Alamo? When you immerse yourself in these historical settings, you gain an understanding that is deeper than possible by browsing an article on Wikipedia.

People react to the real, like real books. Why take that away?

Acknowledgements: This post is the product of several discussions with my brother, James Nolte, a retired reference librarian. He was an early developer of digital libraries, working at Clarkson University in Potsdam, NY in the mid 1980’s. But like Frankenstein, he sometimes worries about the consequences of his own creation.

The Vital Virial of Rudolph Clausius: From Stat Mech to Quantum Mech

I often joke with my students in class that the reason I went into physics is because I have a bad memory.  In biology you need to memorize a thousand things, but in physics you only need to memorize 10 things … and you derive everything else!

Of course, the first question they ask me is “What are those 10 things?”.

That’s a hard question to answer, and every physics professor probably has a different set of 10 things.  Obviously, energy conservation would be first on the list, followed by other conservation laws for various types of momentum.  Inverse-square laws probably come next.  But then what?  What do you need to memorize to be most useful when you are working out physics problems on the back of an envelope, when your phone is dead, and you have no access to your laptop or books?

One of my favorites is the Virial Theorem because it rears its head over and over again, whether you are working on problems in statistical mechanics, orbital mechanics or quantum mechanics.

The Virial Theorem

The Virial Theorem makes a simple statement about the balance between kinetic energy and potential energy (in a conservative mechanical system).  It summarizes in a single form many different-looking special cases we learn about in physics.  For instance, everyone learns early in their first mechanics course that the average kinetic energy <T> of a mass on a spring is equal to the average potential energy <V>.  But this seems different than the problem of a circular orbit in gravitation or electrostatics where the average kinetic energy is equal to half the average potential energy, but with the opposite sign.

Yet there is a unity to these two: the Virial Theorem, which applies whenever the potential energy V has a power-law dependence V ∝ rⁿ.  The harmonic oscillator has n = 2, leading to the well-known equality between the average kinetic and potential energies.  The inverse-square force law has a potential that varies with n = −1, leading to the flip in sign, both for a circular orbit of radius a in gravitation and for its electrostatic counterpart.
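Written compactly, with angle brackets denoting time averages over the bounded motion, the theorem and its two most familiar special cases are

\[ 2\langle T\rangle = n\,\langle V\rangle \qquad \text{for } V(r)\propto r^{n}, \]

so that

\[ n = 2:\;\; \langle T\rangle = \langle V\rangle, \qquad\qquad n = -1:\;\; \langle T\rangle = -\tfrac{1}{2}\langle V\rangle. \]

For a circular orbit of radius a, the n = −1 case works out to

\[ \langle T\rangle = \frac{GMm}{2a}, \quad \langle V\rangle = -\frac{GMm}{a} \qquad \text{(gravitation)}, \]

\[ \langle T\rangle = \frac{e^{2}}{8\pi\varepsilon_{0}a}, \quad \langle V\rangle = -\frac{e^{2}}{4\pi\varepsilon_{0}a} \qquad \text{(an electron bound to a proton)}. \]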

Yet orbital mechanics is hardly the only place where the Virial Theorem pops up.  It began its life with statistical mechanics.

Rudolph Clausius and his Virial Theorem

The pantheon of physics is a somewhat exclusive club.  It lets in the likes of Galileo, Lagrange, Maxwell, Boltzmann, Einstein, Feynman and Hawking, but it excludes many worthy candidates, like Gilbert, Stevin, Maupertuis, du Chatelet, Arago, Clausius, Heaviside and Meitner, all of whom had an outsized influence on the history of physics but who often do not get their due.  Of this latter group, Rudolph Clausius stands above the others because he was an inventor of whole new worlds and whole new terminologies that permeate physics today.

Within the German Confederation dominated by Prussia in the mid 1800’s, Clausius was among the first wave of the “modern” physicists who emerged from new or reorganized German universities that integrated mathematics with practical topics.  Franz Neumann at Königsberg, Carl Gauss and Wilhelm Weber at Göttingen, and Hermann von Helmholtz at Berlin were transforming physics from a science focused on pure mechanics and astronomy to one focused on materials and their associated phenomena, applying mathematics to these practical problems.

Clausius was educated at Berlin under Heinrich Gustav Magnus beginning in 1840, and he completed his doctorate at the University of Halle in 1847.  His doctoral thesis on light scattering in the atmosphere represented an early attempt at treating statistical fluctuations.  Though his initial approach was naïve, it helped orient Clausius to physics problems of statistical ensembles and especially to gases.  The sophistication of his physics matured rapidly and already in 1850 he published his famous paper Über die bewegende Kraft der Wärme, und die Gesetze, welche sich daraus für die Wärmelehre selbst ableiten lassen (About the moving power of heat and the laws that can be derived from it for the theory of heat itself). 

Fig. 1 Rudolph Clausius.

This was the fundamental paper that overturned the archaic theory of caloric, which had assumed that heat was a form of conserved quantity.  Clausius proved that this was not true, and he introduced what are today called the first and second laws of thermodynamics.  This early paper was one in which he was still striving to simplify thermodynamics, and his second law was mostly a qualitative statement that heat flows from higher temperatures to lower.  He refined the second law four years later in 1854 with Über eine veränderte Form des zweiten Hauptsatzes der mechanischen Wärmetheorie (On a modified form of the second law of the mechanical theory of heat).  He gave his concept the name Entropy in 1865 from the Greek word τροπη (transformation or change) with a prefix similar to Energy. 

Clausius was one of the first to consider the kinetic theory of heat, in which heat is understood as the average kinetic energy of the atoms or molecules that comprise the gas.  He published his seminal work on the topic in 1857, expanding on earlier work by August Krönig.  Maxwell, in turn, expanded on Clausius in 1860 by introducing probability distributions.  By 1870, Clausius was fully immersed in the kinetic theory as he searched for mechanical proofs of the second law of thermodynamics.  Along the way, he discovered a quantity based on action-reaction pairs of forces that was related to the kinetic energy.

At that time, kinetic energy was often called vis viva, meaning “living force”.  The Latin singular for force (vis) has the plural vires, so Clausius—always happy to coin new words—called the quantity built from the action-reaction pairs of forces the virial, and hence he proved the Virial Theorem.

The argument is relatively simple.  Consider a single molecule of the gas subject to a force F that is applied reciprocally by another molecule, and for simplicity consider only a single direction in the gas.  Clausius examined how the product of position and momentum changes over time: its time derivative splits into a kinetic term and a term containing the force times the position.  Because the motion is bounded, and because of the reciprocal nature of the action-reaction pairs, the time average of the left-hand side balances exactly to zero, leaving a relation between twice the average kinetic energy and the averaged force-times-position term.  Expanding this expression to include the other directions and all N bodies yields the Virial Theorem, in which the sum of the force-times-position terms over all molecules in the gas is what Clausius called the Virial.  The steps are sketched in modern notation below.
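A compact modern rendering of the argument, for a single particle of mass m in one dimension and then summed over directions and particles, runs as follows.

\[ \frac{d}{dt}(xp) = \dot{x}p + x\dot{p} = m\dot{x}^{2} + xF = 2T_{x} + xF \]

Averaged over a long time, the left-hand side vanishes for bounded motion, so that

\[ 2\langle T_{x}\rangle + \langle xF\rangle = 0. \]

Summed over the three directions and over all N molecules, this becomes

\[ \langle T\rangle = -\tfrac{1}{2}\Big\langle \sum_{i=1}^{N} \vec{F}_{i}\cdot\vec{r}_{i} \Big\rangle, \]

with the quantity on the right being the Virial.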

An important special case is when the force derives from a power-law potential, V ∝ xⁿ, so that xF = −nV.  The Virial Theorem then becomes (again in just one dimension) 2<T> = n<V>.

This is often the most useful form of the theorem.  For a spring force, it leads to <T> = <V>.  For gravitational or electrostatic orbits it is  <T> = -1/2<V>.

The Virial in Astrophysics

Clausius originally developed the Virial Theorem for the kinetic theory of gases, but it has applications that go far beyond.  It is already useful for simple orbital systems like masses interacting through central forces, and these can be scaled up to N-body systems like star clusters or galaxies.

Star clusters are groups of hundreds or thousands of stars that are gravitationally bound.  Such a cluster may begin in a highly non-equilibrium configuration, but the mutual interactions among the stars cause a relaxation to an equilibrium configuration of positions and velocities.  This process is known as virialization.  The time scale for virialization depends on the number of stars and on the initial configuration, such as whether there is a net angular momentum in the cluster.

A gravitational simulation of 700 stars is shown in Fig. 2. The stars start out distributed uniformly with zero velocities. The cluster collapses under gravitational attraction, rebounds, and approaches a steady state in which the Virial Theorem applies at long times. The simulation assumes all motion is confined to a plane, and a regularization term is added to the gravitational potential to keep the forces bounded.
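The regularization used in the Matlab code listed at the end of this post is a standard softening of the Newtonian potential, in which the bare 1/r between a pair of stars is replaced by

\[ V_{\mathrm{soft}}(r) = -\,\frac{G m_{1} m_{2}}{\sqrt{r^{2} + \varepsilon^{2}}} \]

so that the pair force stays finite even when two stars pass arbitrarily close. In the code, the gravitational constant and the masses are absorbed into the prefactor A, and ε is the parameter eps.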

Fig. 2 A numerical example of the Virial Theorem for a star cluster of 700 stars beginning in a uniform initial state, collapsing under gravitational attraction, rebounding and then approaching a steady state. The kinetic energy and the potential energy of the system satisfy the Virial Theorem at long times.

The Virial in Quantum Physics

Quantum theory holds strong analogs to classical mechanics.  For instance, the quantum commutation relations closely parallel classical Poisson brackets.  Similarly, the Virial of classical physics has a direct quantum analog.

Begin with the commutator between the Hamiltonian H and the “action” formed as the product of the position and momentum operators, XP.  This commutator splits into two commutators, one of H with P and one of H with X, which expand into a term containing the gradient of the potential and a term containing the kinetic energy.  By Ehrenfest’s Theorem, the same commutator also governs the time dependence of the expectation value of XP, and that time dependence vanishes when the system becomes stationary.  All that remains is to take the expectation value of the resulting balance (which can include many-body interactions as well), giving the quantum form of the Virial Theorem, identical to the classical form when the expectation value replaces the ensemble average.  The steps are written out below.
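In one dimension, with H = P²/2m + V(X), the sketch runs as follows (the many-body version simply sums over the particles).

\[ [H, XP] = X\,[H,P] + [H,X]\,P = i\hbar\left( X\frac{dV}{dX} - \frac{P^{2}}{m} \right) \]

By Ehrenfest’s Theorem,

\[ \frac{d}{dt}\langle XP\rangle = \frac{i}{\hbar}\big\langle [H, XP] \big\rangle = 2\langle T\rangle - \Big\langle X\frac{dV}{dX} \Big\rangle = 0 \quad \text{(stationary state)}, \]

so that

\[ 2\langle T\rangle = \Big\langle X\frac{dV}{dX} \Big\rangle, \]

which for a power-law potential V ∝ Xⁿ reduces to 2<T> = n<V>, exactly as in the classical case.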

For the hydrogen atom, with principal quantum number n and Bohr radius aB, the virial relation fixes the average kinetic and potential energies and, through them, the quantum energy levels, as written out below.
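For the Coulomb potential the power-law exponent is −1, and using the standard hydrogenic expectation value ⟨1/r⟩ = 1/(n²aB), the virial relation gives

\[ 2\langle T\rangle = -\langle V\rangle = \frac{e^{2}}{4\pi\varepsilon_{0}\, n^{2} a_{B}}, \]

so that the energy levels are

\[ E_{n} = \langle T\rangle + \langle V\rangle = \tfrac{1}{2}\langle V\rangle = -\,\frac{e^{2}}{8\pi\varepsilon_{0}\, n^{2} a_{B}} = -\,\frac{13.6\ \text{eV}}{n^{2}}. \]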

By David D. Nolte, July 24, 2024

References

“Ueber die bewegende Kraft der Wärme und die Gesetze, welche sich daraus für die Wärmelehre selbst ableiten lassen,” in Annalen der Physik, 79 (1850), 368–397, 500–524.

Über eine veränderte Form des zweiten Hauptsatzes der mechanischen Wärmetheorie, Annalen der Physik, 93 (1854), 481–506.

Ueber die Art der Bewegung, welche wir Wärme nennen, Annalen der Physik, 100 (1857), 497–507.

Clausius, RJE (1870). “On a Mechanical Theorem Applicable to Heat”. Philosophical Magazine. Series 4. 40 (265): 122–127.

Matlab Code

function [y0,KE,Upoten,TotE] = Nbody(N,L)   %500, 100, 0
% Two-dimensional gravitational N-body simulation used to illustrate the
% Virial Theorem for a star cluster.
%   N : number of stars (e.g. 700)
%   L : radius of the initial uniform disk of stars
% Returns the final state vector y0 = [X Y Vx Vy] along with the last-frame
% kinetic-, potential- and total-energy tallies.

A = -1;         % gravitational prefactor (attractive)
eps = 1;        % softening length for the regularized potential   % 0.1
K = 0.00001;    % weak confining spring constant   %0.000025

format compact

mov_flag = 1;
if mov_flag == 1
    moviename = 'DrawNMovie';
    aviobj = VideoWriter(moviename,'MPEG-4');
    aviobj.FrameRate = 10;
    open(aviobj);
end

hh = colormap(jet);
%hh = colormap(gray);
rie = randperm(255);             % Use this for random colors (randperm stands in for the custom randintexc helper)
%rie = 1:64;                     % Use this for sequential colors
for loop = 1:255
    h(loop,:) = hh(rie(loop),:);
end
figure(1)
fh = gcf;
clf;
set(gcf,'Color','White')
axis off

thet = 2*pi*rand(1,N);
rho = L*sqrt(rand(1,N));
X0 = rho.*cos(thet);
Y0 = rho.*sin(thet);

Vx0 = 0*Y0/L;   %1.5 for 500   2.0 for 700
Vy0 = -0*X0/L;
% X0 = L*2*(rand(1,N)-0.5);
% Y0 = L*2*(rand(1,N)-0.5);
% Vx0 = 0.5*sign(Y0);
% Vy0 = -0.5*sign(X0);
% Vx0 = zeros(1,N);
% Vy0 = zeros(1,N);

for nloop = 1:N
    y0(nloop) = X0(nloop);
    y0(nloop+N) = Y0(nloop);
    y0(nloop+2*N) = Vx0(nloop);
    y0(nloop+3*N) = Vy0(nloop);
end

T = 300;  %500
xp = zeros(1,N); yp = zeros(1,N);

for tloop = 1:T
    tloop
    
    delt = 0.005;
    tspan = [0 loop*delt];    % 'loop' keeps its final value (255) from the colormap loop above, so each frame integrates a fixed span of 255*delt
    opts = odeset('RelTol',1e-2,'AbsTol',1e-5);
    [t,y] = ode45(@f5,tspan,y0,opts);
    
    %%%%%%%%% Plot Final Positions
    
    [szt,szy] = size(y);
    
    % Set nodes
    ind = 0; xpold = xp; ypold = yp;
    for nloop = 1:N
        ind = ind+1;
        xp(ind) = y(szt,ind+N);
        yp(ind) = y(szt,ind);
    end
    delxp = xp - xpold;
    delyp = yp - ypold;
    maxdelx = max(abs(delxp));
    maxdely = max(abs(delyp));
    maxdel = max(maxdelx,maxdely);
    
    rngx = max(xp) - min(xp);
    rngy = max(yp) - min(yp);
    maxrng = max(abs(rngx),abs(rngy));
    
    difepmx = maxdel/maxrng;
    
    crad = 2.5;
    subplot(1,2,1)
    gca;
    cla;
    
    % Draw nodes
    for nloop = 1:N
        rn = rand*63+1;
        colorval = ceil(64*nloop/N);
        
        rectangle('Position',[xp(nloop)-crad,yp(nloop)-crad,2*crad,2*crad],...
            'Curvature',[1,1],...
            'LineWidth',0.1,'LineStyle','-','FaceColor',h(colorval,:))
        
    end
    
    [syy,sxy] = size(y);
    y0(:) = y(syy,:);
    
    rnv = (2.0 + 2*tloop/T)*L;    % 2.0   1.5
    
    axis equal
    axis([-rnv rnv -rnv rnv])
    box on
    drawnow
    pause(0.01)
    
    KE = sum(y0(2*N+1:4*N).^2);    % sum of squared speeds over all stars (unit masses)
    
    Upot = 0;
    for nloop = 1:N
        for mloop = nloop+1:N
            dx = y0(nloop)-y0(mloop);
            dy = y0(nloop+N) - y0(mloop+N);
            dist = sqrt(dx^2+dy^2+eps^2);
            Upot = Upot + A/dist;
        end
    end
    
    Upoten = Upot;
    
    TotE = Upoten + KE;
    
    if tloop == 1
        TotE0 = TotE;
    end

    Upotent(tloop) = Upoten;
    KEn(tloop) = KE;
    TotEn(tloop) = TotE;
    
    xx = 1:tloop;
    subplot(1,2,2)
    plot(xx,KEn,xx,Upotent,xx,TotEn,'LineWidth',3)
    legend('KE','Upoten','TotE')
    axis([0 T -26000 22000])     % 3000 -6000 for 500   6000 -8000 for 700
    
    
    fh = figure(1);
    
    if mov_flag == 1
        frame = getframe(fh);
        writeVideo(aviobj,frame);
    end
    
end

if mov_flag == 1
    close(aviobj);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function yd = f5(t,y)
        
        for n1loop = 1:N
            
            posx = y(n1loop);
            posy = y(n1loop+N);
            momx = y(n1loop+2*N);
            momy = y(n1loop+3*N);
            
            tempcx = 0; tempcy = 0;
            
            for n2loop = 1:N
                if n2loop ~= n1loop
                    cposx = y(n2loop);
                    cposy = y(n2loop+N);
                    cmomx = y(n2loop+2*N);
                    cmomy = y(n2loop+3*N);
                    
                    % softened pair separation, gravitational attraction, and a
                    % small short-range velocity damping that helps the cluster
                    % relax toward a virialized steady state
                    dis = sqrt((cposy-posy)^2 + (cposx-posx)^2 + eps^2);
                    CFx = 0.5*A*(posx-cposx)/dis^3 - 5e-5*momx/dis^4;
                    CFy = 0.5*A*(posy-cposy)/dis^3 - 5e-5*momy/dis^4;
                    
                    tempcx = tempcx + CFx;
                    tempcy = tempcy + CFy;
                    
                end
            end
                        
            ypp(n1loop) = momx;
            ypp(n1loop+N) = momy;
            ypp(n1loop+2*N) = tempcx - K*posx;
            ypp(n1loop+3*N) = tempcy - K*posy;
        end
        
        yd=ypp'; 
     
    end     % end f5

end     % end Nbody

Read more in Books by David D. Nolte at Oxford University Press

Where is IT Leading Us?

One of my favorite movies within the Star Wars movie franchise is Rogue One, the prequel to the very first movie (known originally simply as Star Wars but now called Episode IV: A New Hope). 

But I always thought there was a fundamental flaw in the plotline of Rogue One when the two main characters Jyn Erso and Cassian Andor (played by Felicity Jones and Diego Luna) are forced to climb a physical tower to retrieve a physical memory unit, like a hard drive, containing the plans to the Death Star. 

In such an advanced technological universe as Star Wars, why were the Death Star plans sitting on a single isolated hard drive, stored away like a file in a filing cabinet?  Why weren’t they encrypted and stored in bits and pieces across the cloud?  In fact, among all the technological wonders of the Star Wars universe, the cloud and the internet are conspicuously absent.  Why?

After the Microsoft IT crash of July 19, 2024, I think I know the answer: Because the internet and the cloud and computer operating systems are so fundamentally and hopelessly flawed that any advanced civilization would have dispensed with them eons ago.

Information Technology (IT)

I used to love buying a new computer.  It was a joy to boot up for the first time, like getting a new toy.  But those days may be over.

Now, when I buy a new computer through my university, the IT staff won’t deliver it until they have installed several layers of control systems overlayed on top of the OS.  And then all the problems start … incompatibilities, conflicts, permissions denied, failed software installation, failed VPN connections, unrecognized IP addresses, and on and on.

The problem, of course, is computer security.  There are so many IT hack attacks through so many different avenues that multiple layers of protection are needed to keep attackers out of the university network and off its computers.

But the security overhead is getting so burdensome, causing so many problems, that the dream from decades ago that the computer era would save all of us so much time has now become a nightmare as we spend hours per day just doing battle with IT issues.  More and more of our time is sucked into the IT black hole.

The Microsoft IT Crash of July 19, 2024

On Friday the 19th, we were in New York City, scheduled to fly out of Newark Airport around 2pm to return to Indianapolis. We knew we were in trouble when we looked at the news on Friday morning.  The top story was about an IT crash of computers running Microsoft Windows at airlines, banks and healthcare systems.

At Newark airport, we were greeted by the Blue Screen of Death (BSoD) on all the displays that should have been telling us about flight information.  Our United apps still worked on our iPhones, but our flight to Indy had been cancelled.  We took an option for a later flight and went to the United Club with two valid tickets and a lot of time to kill, but they wouldn’t let us in because their reader had crashed too. 

So we went to get pot stickers for lunch.  Our push notifications had been turned on, but we never received the alert that our second flight had been cancelled because the push notifications weren’t going out.  By the time we realized we had no flight, United had rebooked us on a flight 2 days later.

Not wanting to hang around the Newark airport for 2 days, we went online to rent a car to drive the 16 hours back to Indy, but all the cars were sold out.  In a last desperate act, we went onto Expedia and found an available car from Thrifty Car Rental—likely the very last one at the Newark airport.

So, on the road by 4pm, we had 16 hours ahead of us before getting back home.  The cost out of pocket (even after subtracting the $400 refund from United on our return flight) was $700 … all because of one faulty update to third-party security software running on Microsoft Windows.  The total estimated cost of that error worldwide is anticipated to exceed $1B. 

A House of Cards

The IT era began around 1980, about 45 years ago, when IBM launched its PC.  Operating systems were amazingly simplistic at that time, but slowly over the decades they grew into behemoths, add-ons adding to add-ons, cobbled together as if with chewing gum and baling wire.  Now they consist of millions of lines of code, patches on patches seeking to fix incompatibilities that create more incompatibilities in the process.

IT is a house of cards that takes only one bad line of code to bring the whole thing crashing down across the world.  This is particularly worrisome given the Axis of Chaos that resents seeing the free world enjoying its freedoms.  It’s an easy target.

But it doesn’t have to be this way.  It’s not unlike the early industrial revolution, when every steam engine was built differently, or early rail transportation, when competing track gauges coexisted, or electrification, when AC did battle with DC, or telecommunications, when incompatible multiplexing schemes ran on fiber-optic cables.  This always happens when a technology revolution develops rapidly.

What is needed is a restart: scrap the entire system and start from scratch.  Computer scientists know how to build an efficient and resilient network from the ground up, with certification processes to remove the anonymity that lets cyber criminals masquerade as legitimate operators.

But to do this requires a financial incentive.  The cost would be huge because the current system is so decentralized, with every laptop and tablet acting as a node in the network.  The Infrastructure Bill could still make this goal its target.  That would be revolutionary and enabling (like the Eisenhower Interstate System of the 1950s, which transformed American society), instead of spending a trillion dollars filling in potholes across a neglected infrastructure.

It may seem to be too late to start over, but a few more IT crashes like last Friday may make it mandatory.  Wouldn’t it be better to start now?

100 Years of Quantum Physics: The Statistics of Satyendra Nath Bose (1924)

One hundred years ago, in July of 1924, a brilliant Indian physicist changed the way that scientists count.  Satyendra Nath Bose (1894 – 1974) mailed a letter to Albert Einstein enclosing a manuscript that contained a new derivation of Planck’s law of blackbody radiation.  Bose had used a radical approach that went beyond the classical statistics of Maxwell and Boltzmann by counting the different ways that photons can fill a volume of space.  His key insight was the indistinguishability of photons as quantum particles.

Today, the indistinguishability of quantum particles is the foundational element of quantum statistics, governing how fundamental particles combine to make up all the matter of the universe.  At the time, neither Bose nor Einstein realized just how radical Bose’s approach was, until Einstein, using Bose’s idea, derived the behavior of material particles under conditions similar to those of black-body radiation, predicting a new state of condensed matter [1].  It would take scientists 70 years to finally demonstrate “Bose-Einstein” condensation in a laboratory in Boulder, Colorado in 1995.

Early Days of the Photon

As outlined in a previous blog (see Who Invented the Quantum? Einstein versus Planck), Max Planck was a reluctant revolutionary.  He was led, almost against his will, in 1900 to postulate a quantized interaction between electromagnetic radiation and the atoms in the walls of a black-body enclosure.  He could not break free from the hold of classical physics, assuming classical properties for the radiation and assigning the quantum only to the “interaction” with matter.  It was Einstein, five years later in 1905, who took the bold step of assigning quantum properties to the radiation field itself, inventing the idea of the “photon” (named years later by the American chemist Gilbert Lewis) as the first quantum particle. 

Despite the vast potential opened by Einstein’s theory of the photon, quantum physics languished for nearly 20 years, from 1905 to 1924, as semiclassical approaches dominated the thinking of Niels Bohr in Copenhagen, Max Born in Göttingen, and Arnold Sommerfeld in Munich while they grappled with wave-particle duality.

The existence of the photon, first doubted by almost everyone, gained strong support from Robert Millikan’s careful measurements of the photoelectric effect, published in 1916.  But even then, skepticism remained until Arthur Compton demonstrated experimentally in 1923 that the scattering of photons by electrons could only be explained if photons carried discrete energy and momentum in precisely the way that Einstein’s theory required.

Despite the success of Einstein’s photon by 1923, derivations of the Planck law still used a purely wave-based approach to count the number of electromagnetic standing waves that a cavity could support.  Bose would change that by deriving the Planck law using purely quantum methods.

The Quantum Derivation by Bose

Satyendra Nath Bose was born in 1894 in Calcutta, the old British capital city of India, now Kolkata.  He excelled at his studies, especially in mathematics, held a lecturer post at the University of Calcutta from 1916 to 1921, and then moved to a professorship at the new University of Dhaka.

One day, as he was preparing a class lecture on the derivation of Planck’s law, he became dissatisfied with the usual way it was presented in textbooks, based on standing waves in the cavity, and he flipped the problem.  Rather than deriving the number of standing-wave modes in real space, he considered counting the number of ways a photon could fill up phase space.

Phase space is the natural dynamical space of Hamiltonian systems [2], such as collections of quantum particles like photons, in which the axes of the space are defined by the positions and momenta of the particles.  The differential volume of phase space dV_PS occupied by a single photon of momentum p inside a cavity of volume V is given by

$$ dV_{PS} = 4\pi p^2\,dp\,V $$

Using Einstein’s formula for the relationship between momentum and frequency

$$ p = \frac{h\nu}{c} $$

where h is Planck’s constant, yields

$$ dV_{PS} = 4\pi \left(\frac{h\nu}{c}\right)^{2}\frac{h}{c}\,d\nu\,V = \frac{4\pi h^{3}\nu^{2}}{c^{3}}\,V\,d\nu $$

No quantum particle can have its position and momentum defined arbitrarily precisely, because of Heisenberg’s uncertainty principle, requiring phase-space volumes to be resolvable only to within a minimum irreducible volume element given by h³.

Therefore, the number of states in phase space available to a single photon is obtained by dividing dV_PS by h³ to yield

$$ dN_s = \frac{dV_{PS}}{h^{3}} = \frac{4\pi \nu^{2}}{c^{3}}\,V\,d\nu $$

which is half of the prefactor in the Planck law.  Several comments are now necessary. 

First, when Bose did this derivation, there was no Heisenberg uncertainty relation; that would come in 1927.  Bose was guided, instead, by the work of Bohr, Sommerfeld and Ehrenfest, who emphasized the role played by the action principle in quantum systems.  Phase-space dimensions are counted in units of action, and the quantized unit of action is given by Planck’s constant h, hence quantized volumes of action in phase space are given by h³.  By taking this step, Bose was anticipating Heisenberg by nearly three years.

Second, Bose knew that his count of states gave only half of the prefactor in Planck’s law.  Since he was counting states, he reasoned that each photon must carry two internal degrees of freedom.  One possibility he considered was that the photon might have a spin that could be aligned, or anti-aligned, with its momentum [3, 4].  How he thought of spin is hard to fathom, because the spin of the electron, proposed by Uhlenbeck and Goudsmit, was still two years away.

But Bose was not finished.  The derivation, so far, gives only the phase-space volume accessible to a single photon.  The next step is to count the different ways that many photons can fill up phase space.  For this he used (bringing in the factor of 2 for spin)

$$ A^{s} = \frac{8\pi \nu_{s}^{2}}{c^{3}}\,V\,d\nu_{s} = \sum_{n=0}^{\infty} p_{n}^{s} $$

where p_n^s is the number of phase-space cells in the frequency interval that contain n photons.  He also used the usual conditions on photon number and energy,

$$ N^{s} = \sum_{n=0}^{\infty} n\,p_{n}^{s}, \qquad E = \sum_{s} N^{s}\,h\nu_{s} $$

The number of distinct ways that the photons can be distributed over the cells of phase space (the thermodynamic probability W) is then given by

$$ W = \prod_{s} \frac{A^{s}!}{p_{0}^{s}!\,p_{1}^{s}!\,p_{2}^{s}!\cdots} $$

A third comment is now necessary: by counting this way, Bose was discarding any arrangement in which the photons could be distinguished from one another.  This indistinguishability of quantum particles is absolutely fundamental to our understanding of quantum statistics today, but Bose was using it implicitly, and for the first time, here.
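To make the role of indistinguishability concrete, here is a minimal counting sketch (my own illustration, not taken from Bose’s paper): placing 2 photons into 3 phase-space cells gives 9 arrangements if the photons are treated as labeled, classical particles, but only 6 if only the occupation numbers matter, as in Bose’s counting.

```python
from itertools import product, combinations_with_replacement

cells = range(3)    # three phase-space cells (purely illustrative)
photons = 2         # two photons to distribute

# Classical (Maxwell-Boltzmann) counting: each photon carries a label,
# so every ordered assignment of photons to cells is a distinct state.
mb_states = list(product(cells, repeat=photons))

# Bose counting: photons are indistinguishable, so only the set of
# occupation numbers matters and the ordering is ignored.
bose_states = list(combinations_with_replacement(cells, photons))

print(len(mb_states))    # 9 distinguishable arrangements
print(len(bose_states))  # 6 indistinguishable arrangements
```

Relative to classical counting, this way of counting gives more statistical weight to configurations in which photons pile into the same cell, the bunching that lies at the heart of Bose statistics.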

The final distribution of photons at a given temperature T is found by maximizing the entropy of the system,

$$ S = k_{B}\ln W $$

subject to the conditions on photon energy and number.  Bose found the cell occupancies to be

$$ p_{n}^{s} = B^{s}\,e^{-n h\nu_{s}/k_{B}T} $$

with a coefficient B^s found by summing the geometric series

$$ A^{s} = \sum_{n=0}^{\infty} p_{n}^{s} = B^{s}\sum_{n=0}^{\infty} e^{-n h\nu_{s}/k_{B}T} = \frac{B^{s}}{1 - e^{-h\nu_{s}/k_{B}T}} $$

yielding

$$ B^{s} = A^{s}\left(1 - e^{-h\nu_{s}/k_{B}T}\right) $$

Also, from the total number of photons,

$$ N^{s} = \sum_{n=0}^{\infty} n\,p_{n}^{s} = \frac{A^{s}\,e^{-h\nu_{s}/k_{B}T}}{1 - e^{-h\nu_{s}/k_{B}T}} = \frac{A^{s}}{e^{h\nu_{s}/k_{B}T} - 1} $$

And, from the total energy,

$$ E = \sum_{s} N^{s}\,h\nu_{s} = \sum_{s}\frac{8\pi h \nu_{s}^{3}}{c^{3}}\,\frac{V\,d\nu_{s}}{e^{h\nu_{s}/k_{B}T} - 1} $$

Bose obtained, finally, the spectral energy density

$$ \rho(\nu)\,d\nu = \frac{8\pi h \nu^{3}}{c^{3}}\,\frac{d\nu}{e^{h\nu/k_{B}T} - 1} $$

which is Planck’s law.

This derivation uses nothing but the counting of quanta in phase space.  There are no standing waves.  It is a purely quantum calculation, the first of its kind.
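For readers who like to check such results numerically, here is a short sketch (my own illustration, not part of the original derivation) that assembles the spectral energy density exactly as counted above: the number of cells per unit volume and frequency, 8πν²/c³, times the mean number of photons per cell, 1/(e^{hν/k_BT} − 1), times the energy hν per photon.  At low frequency it should approach the classical Rayleigh-Jeans form, which is an easy sanity check; the temperature and frequency used are arbitrary illustrative values.

```python
import math

h  = 6.62607015e-34   # Planck constant (J s)
kB = 1.380649e-23     # Boltzmann constant (J/K)
c  = 2.99792458e8     # speed of light (m/s)

def planck_density(nu, T):
    """Spectral energy density assembled as in the counting above:
    (cells per unit volume per unit frequency) x (mean occupancy) x (h nu)."""
    cells_per_volume = 8 * math.pi * nu**2 / c**3            # 8 pi nu^2 / c^3
    mean_occupancy   = 1.0 / (math.exp(h * nu / (kB * T)) - 1.0)
    return cells_per_volume * mean_occupancy * h * nu         # J per m^3 per Hz

# Sanity check: at low frequency the result should approach the classical
# Rayleigh-Jeans form 8 pi nu^2 kB T / c^3.
T  = 300.0            # temperature in kelvin (illustrative)
nu = 1.0e9            # 1 GHz, far below the thermal peak at room temperature
rayleigh_jeans = 8 * math.pi * nu**2 * kB * T / c**3
print(planck_density(nu, T) / rayleigh_jeans)   # close to 1
```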

Enter Einstein

As usual with revolutionary approaches, Bose’s initial manuscript, submitted to the British Philosophical Magazine, was rejected.  But he was convinced that he had achieved something significant, so he wrote his letter to Einstein containing his manuscript, asking that, if Einstein found merit in the derivation, he might have it translated into German and submitted to the Zeitschrift für Physik.  (That Bose would approach Einstein with this request seems bold, but they had communicated some years before, when Bose had translated Einstein’s theory of General Relativity into English.)

Indeed, Einstein recognized immediately what Bose had accomplished, and he translated the manuscript himself into German and submitted it to the Zeitschrift on July 2, 1924 [5].

During his translation, Einstein did not feel that Bose’s conjecture about photon spin was defensible, so he changed the wording to attribute the factor of 2 in the derivation to the two polarizations of light (a semiclassical concept).  In this sense Einstein actually backtracked a little from what Bose had intended as a fully quantum derivation.  The existence of photon spin was eventually confirmed experimentally by C. V. Raman and S. Bhagavantam in 1931 [6].

In late 1924, Einstein applied Bose’s concepts to an ideal gas of material atoms and predicted that at low temperatures the gas would condense into a new state of matter, known today as a Bose-Einstein condensate [1].  Matter differs from light because the number of atoms is conserved, which introduces a finite chemical potential into the condensation problem that has no counterpart in the Planck law.

Fig. 1 Experimental evidence for the Bose-Einstein condensate in an atomic vapor [7].
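As a rough sense of scale, the textbook result for a uniform ideal Bose gas puts the condensation temperature at T_c = (2πħ²/m k_B)(n/ζ(3/2))^{2/3}.  The sketch below is my own illustration rather than an analysis of the experiment in Fig. 1: the 1995 measurements used a magnetic trap, for which the critical-temperature formula differs, and the atom density chosen here is purely hypothetical.

```python
import math

hbar = 1.054571817e-34     # reduced Planck constant (J s)
kB   = 1.380649e-23        # Boltzmann constant (J/K)
u    = 1.66053907e-27      # atomic mass unit (kg)
zeta_3_2 = 2.612           # Riemann zeta(3/2)

def bec_critical_temperature(mass_kg, density_m3):
    """Critical temperature of a uniform ideal Bose gas:
    T_c = (2 pi hbar^2 / (m kB)) * (n / zeta(3/2))^(2/3)."""
    return (2 * math.pi * hbar**2 / (mass_kg * kB)) * (density_m3 / zeta_3_2)**(2.0 / 3.0)

# Illustrative numbers only: rubidium-87 atoms at an assumed density of 1e20 m^-3.
m_Rb87 = 87 * u
n      = 1.0e20
print(bec_critical_temperature(m_Rb87, n))   # on the order of a few hundred nanokelvin
```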

Paul Dirac, in 1945, enshrined the name of Bose by coining the term “boson” for a particle of integer spin, just as he coined “fermion”, after Enrico Fermi, for a particle of half-integer spin.  All quantum statistics were encompassed by these two types of quantum particle until 1982, when Frank Wilczek coined the term “anyon” to describe the quantum statistics of particles confined to two dimensions, whose behaviors interpolate between those of a boson and those of a fermion.

By David D. Nolte, June 26, 2024

References

[1] A. Einstein, “Quantentheorie des einatomigen idealen Gases,” Sitzungsberichte der Preussischen Akademie der Wissenschaften 1, 3 (1925).

[2] D. D. Nolte, “The tangled tale of phase space,” Physics Today 63, 33-38 (2010).

[3] P. Ghose, “The Story of Bose, Photon Spin and Indistinguishability,” arXiv:2308.01909 [physics.hist-ph].

[4] B. R. Masters, “Satyendra Nath Bose and Bose-Einstein Statistics,” Optics and Photonics News, April 2013, pp. 41-47.

[5] S. N. Bose, “Plancks Gesetz und Lichtquantenhypothese,” Zeitschrift für Physik 26 (1), 178-181 (1924).

[6] C. V. Raman and S. Bhagavantam, Indian Journal of Physics 6, 353 (1931).

[7] M. H. Anderson, J. R. Ensher, M. R. Matthews, C. E. Wieman, and E. A. Cornell, “Observation of Bose-Einstein Condensation in a Dilute Atomic Vapor,” Science 269 (5221), 198-201 (14 July 1995).


Read more in Books by David Nolte at Oxford University Press