February 5, 2025 (updated September 3, 2025) by David D. Nolte

A Short History of Neural Networks


When it comes to questions about the human condition, the question of intelligence is at the top. What is the origin of our intelligence? How intelligent are we? And how intelligent can we make other things…things like artificial neural networks?

This is a short history of the science and technology of neural networks, not just artificial neural networks but also the natural, organic type, because theories of natural intelligence are at the core of theories of artificial intelligence. Without understanding our own intelligence, we probably have no hope of creating the artificial type.

Ramon y Cajal (1888): Visualizing Neurons

The story begins with Santiago Ramon y Cajal (1853 – 1934) who received the Nobel Prize in physiology in 1906 for his work illuminating natural neural networks. He built on work by Camillo Golgi, using a stain to give intracellular components contrast [1], and then went further to develop his own silver emulsions like those of early photography (which was one of his hobbies). Cajal was the first to show that neurons were individual constituents of neural matter and that their contacts were sequential: axons of sets of neurons contacted the dendrites of other sets of neurons, never axon-to-axon or dendrite-to-dendrite, to create a complex communication network. This became known as the neuron doctrine, and it is a central idea of neuroscience today.

*Fig. 1 One of Cajal’s published plates demonstrating neural synapses. From Link.*

McCulloch and Pitts (1943): Mathematical Models

In 1941, Warren S. McCulloch (1898–1969) arrived at the Department of Psychiatry at the University of Illinois at Chicago where he met with the mathematical biology group at the University of Chicago led by Nicolas Rashevsky (1899–1972), widely acknowledged as the father of mathematical biophysics in the United States.

An itinerant member of Rashevsky’s group at the time was a brilliant, young and unusual mathematician, Walter Pitts (1923– 1969). He was not enrolled as a student at Chicago, but had simply “showed up” one day as a teenager at Rashevsky’s office door. Rashevsky was so impressed by Pitts that he invited him to attend the group meetings, and Pitts became interested in the application of mathematical logic to biological information systems.

When McCulloch met Pitts, he realized that Pitts had the mathematical background that complemented his own views of brain activity as computational processes. Pitts was homeless at the time, so McCulloch invited him to live with his family, giving the two men ample time to work together on their mutual obsession to provide a logical basis for brain activity in the way that Turing had provided it for computation.


McCulloch and Pitts simplified the operation of individual neurons to their most fundamental character, envisioning a neural computing unit with multiple inputs (received from upstream neurons) and a single on-off output (sent to downstream neurons) with the additional possibility of feedback loops as downstream neurons fed back onto upstream neurons. They also discretized the dynamics in time, using discrete logic and time-difference equations, succeeding in devising a logical structure with rules and equations for the general operation of nets of neurons. They published their results in 1943 in the paper titled “A logical calculus of the ideas immanent in nervous activity,” [2] introducing computational language and logic to neuroscience. Their simplified neural unit became the basis for discrete logic, picked up a few years later by von Neumann as an elemental example of a logic gate upon which von Neumann began constructing the theory and design of the modern electronic computer.
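The on-off unit described above can be sketched in a few lines of Python. This is only an illustration: the function names, the numerical weights and the thresholds are assumptions made here, and the negative weight used for inhibition is a modern convenience, not the notation of the 1943 paper.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 (fire) if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def logic_and(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return threshold_unit([a, b], [1, 1], threshold=2)

def logic_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return threshold_unit([a, b], [1, 1], threshold=1)

def logic_not(a):
    # One inhibitory input (weight -1) with threshold 0: fires only when the input is silent.
    return threshold_unit([a], [-1], threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logic_and(a, b), "OR:", logic_or(a, b))
```

The same unit acts as different logic gates purely through the choice of weights and threshold, which is the sense in which such a unit can serve as an elemental logic gate.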

*Fig. 2 The only figure in McCulloch and Pitts’s “Logical Calculus”.*

Donald Hebb (1949): Hebbian Learning

The basic model for learning and adjustment of synaptic weights among neurons was put forward in 1949 by the physiological psychologist Donald Hebb (1904-1985) of McGill University in Canada in a book titled The Organization of Behavior [3].

In Hebbian learning, an initially untrained network consists of many neurons with many synapses having random synaptic weights. During learning, a synapse between two neurons is strengthened when both the pre-synaptic and post-synaptic neurons are firing simultaneously. In this model, it is essential that each neuron makes many synaptic contacts with other neurons because it requires many input neurons acting in concert to trigger the output neuron. In this way, synapses are strengthened when there is collective action among the neurons. The synaptic strengths are therefore altered through a form of self-organization. A collective response of the network strengthens all those synapses that are responsible for the response, while the other synapses that do not contribute, weaken. Despite the simplicity of this model, it has been surprisingly robust, standing up as a general principle for the training of artificial neural networks.
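A minimal sketch of this kind of update is given below. The learning rate, the slow decay term that weakens unused synapses, the firing threshold of 0.3 and the four-input toy pattern are all assumptions chosen for illustration; they are not taken from Hebb's book.

```python
def hebbian_step(weights, pre, post, rate=0.1, decay=0.01):
    """Strengthen weights[i] when pre[i] and post are both active; every weight also decays slowly."""
    return [w + rate * x * post - decay * w for w, x in zip(weights, pre)]

weights = [0.2, 0.2, 0.2, 0.2]
pattern = [1, 1, 0, 0]   # inputs 0 and 1 fire in concert; inputs 2 and 3 stay silent

for _ in range(50):
    # The output neuron fires only if the combined input drive reaches the threshold.
    drive = sum(w * x for w, x in zip(weights, pattern))
    post = 1 if drive >= 0.3 else 0
    weights = hebbian_step(weights, pattern, post)

print([round(w, 3) for w in weights])
```

After repeated presentations, the two synapses that took part in the collective response have grown, while the two that did not contribute have weakened.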

*Fig. 3. A Figure from Hebb’s textbook on psychology (1958). From Link.*

Hodgkin and Huxley (1952): Neuron Transporter Models

Alan Hodgkin (1914 – 1998) and Andrew Huxley (1917 – 2012) were English biophysicists who received the 1963 Nobel Prize in physiology for their work on the physics behind neural activation. They constructed a differential equation for the spiking action potential for which their biggest conceptual challenge was the presence of time delays in the voltage signals that were not explained by linear models of the neural conductance. As they began exploring nonlinear models, using their experiments to guide the choice of parameters, they settled on a dynamical model in a four-dimensional phase space. One dimension was voltage, while another was inhibitory current. The two remaining dimensions were the conductances of sodium and potassium, the ions they had determined to be the major participants in the generation and propagation of the action potential. The nonlinear conductances of their model described the observed time delays and captured the essential neural behavior of the fast spike followed by a slow recovery. Huxley solved the equations on a hand-cranked calculator, taking over three months of tedious cranking to plot the numerical results.

*Fig. 4 The Hodgkin-Huxley model of the neuron, including capacitance C, voltage V and bias current I along with the conductances of the potassium (K), sodium (Na) and leak (L) channels.*

Hodgkin and Huxley published [4] their measurements and their model (known as the Hodgkin-Huxley model) in a series of six papers in 1952 that led to an explosion of research in electrophysiology, for which Hodgkin and Huxley won the 1963 Nobel Prize in physiology or medicine. The four-dimensional Hodgkin–Huxley model stands as a classic example of the power of phenomenological modeling when combined with accurate experimental observation. Hodgkin and Huxley were able to ascertain not only the existence of ion channels in the cell membrane, but also their relative numbers, long before these molecular channels were ever directly observed using electron microscopes. The Hodgkin–Huxley model lent itself to simplifications that could capture the essential behavior of neurons while stripping off the details.

Frank Rosenblatt (1958): The Perceptron

Frank Rosenblatt (1928–1971) had a PhD in psychology from Cornell University and was in charge of the cognitive systems section of the Cornell Aeronautical Laboratory (CAL) located in Buffalo, New York. He was tasked with fulfilling a contract from the Navy to develop an analog image processor. Drawing from the work of McCulloch and Pitts, his team constructed a software system and then constructed a hardware model that adaptively updated the strength of the inputs, that they called neural weights, as it was trained on test images. The machine was dubbed the Mark I Perceptron, and its announcement in 1958 created a small media frenzy [5]. A New York Times article reported the perceptron was “the embryo of an electronic computer that [the navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”


The perceptron had a simple architecture, with two layers of neurons consisting of an input layer and a processing layer, and it was programmed by adjusting the synaptic weights to the inputs. This computing machine was the first to adaptively learn its functions, as opposed to following predetermined algorithms like digital computers. It seemed like a breakthrough in cognitive science and computing, as trumpeted by the New York Times. But within a decade, the development had stalled because the architecture was too restrictive.
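The idea of programming by adjusting weights can be sketched with the textbook perceptron learning rule. This is a software illustration under assumed settings (a learning rate of 1, twenty passes over the data, and the logical AND as the toy task); it is not a description of the Mark I hardware.

```python
def predict(weights, bias, x):
    """Fire (1) when the weighted sum of the inputs plus the bias is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, epochs=20, rate=1.0):
    """Nudge each weight in proportion to the output error on every training sample."""
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
print(weights, bias, [predict(weights, bias, x) for x, _ in and_samples])
```

Nothing in the code states the rule for AND explicitly; the behavior emerges from the repeated weight corrections.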

Fig. 5 Frank Rosenblatt with his Perceptron. From Link.

Richard FitzHugh and Jin-Ichi Nagumo (1961): Neural van der Pol Oscillators

In 1961 Richard FitzHugh (1922–2007), a neurophysiology researcher at the National Institute of Neurological Disease and Blindness (NINDB) of the National Institutes of Health (NIH), created a surprisingly simple model of the neuron that retained only a third order nonlinearity, just like the third-order nonlinearity that Rayleigh had proposed and solved in 1883, and that van der Pol extended in 1926. Around the same time that FitzHugh proposed his mathematical model [6], the electronics engineer Jin-Ichi Nagumo (1926-1999) in Japan created an electronic diode circuit with an equivalent circuit model that mimicked neural oscillations [7]. Together, this work by FitzHugh and Nagumo led to the so-called FitzHugh–Nagumo model. The conceptual importance of this model is that it demonstrated that the neuron was a self-oscillator, just like a violin string or wheel shimmy or the pacemaker cells of the heart. Once again, self-oscillators showed themselves to be common elements of a complex world—and especially of life.

*Fig. 6 The FitzHugh-Nagumo model of the neuron simplifies the Hodgkin-Huxley model from four dimensions down to two dimensions of voltage V and channel activation n.*
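The self-oscillation can be seen in a short numerical sketch of the two-variable model. The form of the equations used here is the commonly quoted one, and the parameter values (a = 0.7, b = 0.8, eps = 0.08), the drive currents and the simple forward-Euler integrator are assumptions for illustration, not values taken from the 1961 paper.

```python
def simulate_fitzhugh_nagumo(current, t_end=200.0, dt=0.01, a=0.7, b=0.8, eps=0.08):
    """Forward-Euler integration of dV/dt = V - V^3/3 - n + I and dn/dt = eps*(V + a - b*n)."""
    v, n = 0.0, 0.0
    trace = []
    for _ in range(int(t_end / dt)):
        dv = v - v ** 3 / 3.0 - n + current
        dn = eps * (v + a - b * n)
        v, n = v + dt * dv, n + dt * dn
        trace.append(v)
    return trace

resting = simulate_fitzhugh_nagumo(current=0.0)
spiking = simulate_fitzhugh_nagumo(current=0.5)
late = len(spiking) // 2   # discard the first half of each run as a transient

print("voltage swing without drive:", max(resting[late:]) - min(resting[late:]))
print("voltage swing with drive:   ", max(spiking[late:]) - min(spiking[late:]))
```

Without a drive current the voltage settles to a resting value, while a constant drive produces a sustained oscillation with no periodic forcing, which is the signature of a self-oscillator.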


John Hopfield (1982): Spin Glasses and Recurrent Networks

John Hopfield (1933–) received his PhD from Cornell University in 1958, advised by Al Overhauser in solid state theory, and he continued to work on a broad range of topics in solid state physics as he wandered from appointment to appointment at Bell Labs, Berkeley, Princeton, and Caltech. In the 1970s Hopfield’s interests broadened into the field of biophysics, where he used his expertise in quantum tunneling to study quantum effects in biomolecules, and expanded further to include information transfer processes in DNA and RNA. In the early 1980s, he became aware of aspects of neural network research and was struck by the similarities between McCulloch and Pitts’ idealized neuronal units and the physics of magnetism. For instance, there is a type of disordered magnetic material called a spin glass in which a large number of local regions of magnetism are randomly oriented. In the language of solid-state physics, one says that the potential energy function of a spin glass has a large number of local minima into which various magnetic configurations can be trapped. In the language of dynamics, one says that the dynamical system has a large number of basins of attraction [8].
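The idea of stored patterns acting as basins of attraction can be sketched with a small network of +1/-1 units. The eight-unit patterns, the Hebbian outer-product weights and the one-unit-at-a-time update order below are assumptions chosen to keep the example small; they illustrate the type of network discussed in [8] without reproducing that paper's examples.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights for +1/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, state, sweeps=5):
    """Update units one at a time; with symmetric weights no update raises the network energy."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if field >= 0 else -1
    return state

stored = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
weights = train_hopfield(stored)

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # the first stored pattern with one unit flipped
print(recall(weights, noisy))
```

The corrupted input starts inside the basin of the first stored pattern, and the dynamics carry it back to that minimum.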


The Parallel Distributed Processing Group (1986): Backpropagation

David Rumelhart, a mathematical psychologist at UC San Diego, was joined by James McClelland in 1974 and then by Geoffrey Hinton in 1978 to become what they called the Parallel Distributed Processing (PDP) group. The central tenets of the PDP framework they developed were that 1) processing is distributed across many semi-autonomous neural units, which 2) learn by adjusting the weights of their interconnections based on the strengths of their signals (i.e., Hebbian learning), and whose memories and behaviors are 3) an emergent property of the distributed learned weights.

PDP was an exciting framework for artificial intelligence, and it captured the general behavior of natural neural networks, but it had a serious problem: How could all of the neural weights be trained?

In 1986, Rumelhart and Hinton with the mathematician Ronald Williams developed a mathematical procedure for training neural weights called error backpropagation [9]. The idea is actually very simple: create a mean squared error of the response of a neural network compared to an ideal response, then tweak one of the neural weights and see if the error increases or decreases. If the error decreases, keep the tweak for that weight and move to the next, working iteratively, tweak by tweak, to minimize the mean squared error. Backpropagation makes this practical by using the chain rule to compute, in a single backward pass from the output error through the layers, the direction in which every weight should be tweaked. In this way, large numbers of neural weights can be adjusted as the network is trained to perform a specified task.
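The link between tweaking a weight and propagating the error backward can be checked on a toy example. The sketch below assumes a hypothetical two-weight network, y = w2 * tanh(w1 * x), with a squared error on a single training pair; the network, the numbers and the function names are illustrative only. It compares the slope of the error found by tweaking each weight with the slope computed by the chain rule.

```python
import math

def forward(w1, w2, x):
    hidden = math.tanh(w1 * x)
    return hidden, w2 * hidden

def squared_error(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return (y - target) ** 2

def backprop_gradients(w1, w2, x, target):
    """Chain rule applied from the output error back toward the first weight."""
    hidden, y = forward(w1, w2, x)
    d_y = 2.0 * (y - target)                     # dE/dy at the output
    d_w2 = d_y * hidden                          # dE/dw2
    d_hidden = d_y * w2                          # error passed back to the hidden unit
    d_w1 = d_hidden * (1.0 - hidden ** 2) * x    # dE/dw1, through the tanh
    return d_w1, d_w2

def tweak_gradients(w1, w2, x, target, step=1e-6):
    """Slope of the error estimated by tweaking one weight at a time."""
    g1 = (squared_error(w1 + step, w2, x, target)
          - squared_error(w1 - step, w2, x, target)) / (2 * step)
    g2 = (squared_error(w1, w2 + step, x, target)
          - squared_error(w1, w2 - step, x, target)) / (2 * step)
    return g1, g2

w1, w2, x, target = 0.5, -0.3, 1.2, 0.8
print("chain rule:", backprop_gradients(w1, w2, x, target))
print("tweaking:  ", tweak_gradients(w1, w2, x, target))
```

The two estimates agree, but the tweak-based version needs extra evaluations of the network for every weight, while the chain-rule version obtains all the slopes from one forward and one backward pass.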

Error backpropagation has come a long way from that early 1986 paper, and it now lies at the core of the AI revolution we are experiencing today as tens of millions of neural weights are trained on massive datasets.


Yann LeCun (1989): Convolutional Neural Networks

In 1988, I was a new post-doc at AT&T Bell Labs at Holmdel, New Jersey fresh out of my PhD in physics from Berkeley. Bell Labs liked to give its incoming employees inspirational talks and tours of their facilities, and one of the tours I took was of the neural network lab run by Lawrence Jackel that was working on computer recognition of zip-code digits. The team’s new post-doc, arriving at Bell Labs the same time as me, was Yann LeCun. It is very possible that the demo our little group watched was run by him, or at least he was there, but at the time he was a nobody, so even if I had heard his name, it wouldn’t have meant anything to me.

Fast forward to today, and Yann LeCun’s name is almost synonymous with AI. He is the Chief AI Scientist at Facebook and his Google Scholar page reports that he gets 50,000 citations per year.

LeCun is famous for developing the convolutional neural network (CNN) in work that he published from Bell Labs in 1989 [10]. It is a biomimetic neural network that takes its inspiration from the receptive fields of the neural networks in the retina. What you think you see, when you look at something, is actually reconstructed by your brain. Your retina is a neural processor with receptive fields that are a far cry from one-to-one. Most prominent in the retina are center-surround fields, or kernels, that respond to the derivatives of the focused image instead of the image itself. It’s the derivatives that are sent up your optic nerve to your brain, which then reconstructs the image. It works as a form of image compression so that broad uniform areas in an image are reduced to their edges.


The convolutional neural network works in the same way; it is just engineered specifically to produce compressed and multiscale codes that capture broad areas as well as the fine details of an image. By constructing many different “kernel” operators at many different scales, it creates a set of features that capture the nuances of the image in quantitative form that is then processed by training neural weights in downstream neural networks.

Fig. 7 Example of a receptive field of a CNN. The filter is the kernel (in this case a discrete 3×3 Laplace operator) that is stepped sequentially across the image field to produce the Laplacian feature map of the original image. One feature map for every different kernel becomes the input for the next level of kernels in a hierarchical scaling structure.
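The stepping of a kernel across an image can be sketched directly. The kernel values below are the common four-neighbor form of the discrete Laplace operator, which is an assumption here since the figure's exact entries are not reproduced in the text, and the small test image with a single vertical edge is made up for illustration.

```python
LAPLACE_KERNEL = [[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]]

def feature_map(image, kernel):
    """Step a 3x3 kernel across the image (interior positions only) and sum the products."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out.append([
            sum(kernel[i][j] * image[r + i][c + j] for i in range(3) for j in range(3))
            for c in range(cols - 2)
        ])
    return out

# A 5x6 image that is uniformly dark on the left and uniformly bright on the right.
image = [[0, 0, 0, 5, 5, 5] for _ in range(5)]

for row in feature_map(image, LAPLACE_KERNEL):
    print(row)
```

The uniform regions map to zero and only the positions next to the edge give a nonzero response, which is the edge-only compression described above for center-surround fields.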

Geoff Hinton (2006): Deep Belief

It seems like Geoff Hinton has had his finger in almost every pie when it comes to how we do AI today. Backpropagation? Geoff Hinton. Rectified Linear Units? Geoff Hinton. Boltzmann Machines? Geoff Hinton. t-SNE? Geoff Hinton. Dropout regularization? Geoff Hinton. AlexNet? Geoff Hinton. The 2024 Nobel Prize in Physics? Geoff Hinton! He may not have invented all of these, but he was in the midst of it all.

Hinton received his PhD in Artificial Intelligence (a rare field at the time) from the University of Edinburgh in 1978 after which he joined the PDP group at UCSD (see above) as a post-doc. After a time at Carnegie-Mellon, he joined the University of Toronto, Canada, in 1987 where he established one of the leading groups in the world on neural network research. It was from here that he launched so many of the ideas and techniques that have become the core of deep learning.

A central idea of deep learning came from Hinton’s work on Boltzmann Machines that learn statistical distributions of complex data. This type of neural network is known as an energy-based model, similar to a Hopfield network, and it has strong ties to the statistical mechanics of spin-glass systems. Unfortunately, it is a bitch to train! So Hinton simplified it into a Restricted Boltzmann Machine (RBM) that was much more tractable and layers of RBMs could be stacked into “Deep Belief Networks” [11] that had a hierarchical structure that allowed the neural nets to learn layers of abstractions. These were among the first deep networks that were able to do complex tasks at the level of human capabilities (and sometimes beyond).

The breakthrough that propelled Geoff Hinton to world-wide acclaim was the success of AlexNet, a neural network constructed by his graduate student Alex Krizhevsky at Toronto in 2012 consisting of 650,000 neurons with 60 million parameters that were trained using two early Nvidia GPUs. It won the ImageNet challenge that year, enabled by its deep architecture and representing a marked advancement that has continued unabated to this day.


Deep learning is now the rule in AI, supported by the Attention mechanism and Transformers that underpin the large language models, like ChatGPT and others, that are poised to disrupt all the legacy business models based on the previous silicon revolution of 50 years ago.

References

[1] Ramón y Cajal S. (1888). Estructura de los centros nerviosos de las aves. Rev. Trim. Histol. Norm. Pat. 1, 1–10.

[2] McCulloch, W.S. and W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys., 1943. 5: p. 115.

[3] Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley and Sons. ISBN 978-0-471-36727-7 – via Internet Archive.

[4] Hodgkin AL, Huxley AF (August 1952). “A quantitative description of membrane current and its application to conduction and excitation in nerve”. The Journal of Physiology. 117 (4): 500–44.

[5] Rosenblatt, Frank (1957). “The Perceptron—a perceiving and recognizing automaton”. Report 85-460-1. Cornell Aeronautical Laboratory.

[6] FitzHugh, Richard (July 1961). “Impulses and Physiological States in Theoretical Models of Nerve Membrane”. Biophysical Journal. 1 (6): 445–466.

[7] Nagumo, J.; Arimoto, S.; Yoshizawa, S. (October 1962). “An Active Pulse Transmission Line Simulating Nerve Axon”. Proceedings of the IRE. 50 (10): 2061–2070.

[8] Hopfield, J. J. (1982). “Neural networks and physical systems with emergent collective computational abilities”. Proceedings of the National Academy of Sciences. 79 (8): 2554–2558.

[9] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning representations by back-propagating errors,” Nature 323, 533–536 (1986).

[10] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541–551, Winter 1989.

[11] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation 18, 1527-1554 (2006).


April 18, 2022 (updated February 9, 2023) by David D. Nolte

Post-Modern Machine Learning: The Deep Revolution

The mysteries of human intelligence are among the final frontiers of science. Despite our pride in what science has achieved across the past century, we have stalled when it comes to understanding intelligence or emulating it. The best we have done so far is through machine learning — harnessing the computational power of computers to begin to mimic the mind, attempting to answer the essential question:

How do we get machines to Know what we Know?

In modern machine learning, the answer is algorithmic.

In post-modern machine learning, the answer is manifestation.

The algorithms of modern machine learning are cause and effect, rules to follow, producing only what the programmer can imagine. But post-modern machine learning has blown past explicit algorithms to embrace deep networks. Deep networks today are defined by neural networks with thousands, or tens of thousands, or even hundreds of thousands, of neurons arrayed in multiple layers of dense connections. The interactivity of so many crossing streams of information defies direct deconstruction of what the networks are doing — they are post-modern. Their outputs manifest themselves, self-assembling into simplified structures and patterns and dependencies that are otherwise buried unseen in complicated data.

Fig. 1 A representation of a deep network with three fully-connected hidden layers. Deep networks are typically three or more layers deep, but each layer can have thousands of neurons. (Figure from the TowardsDataScience blog.)

Deep learning emerged as recently as 2006 and has opened wide new avenues of artificial intelligence that move beyond human capabilities for some tasks. But deep learning also has pitfalls, some of which are inherited from the legacy approaches of traditional machine learning, and some of which are inherent in the massively high-dimensional spaces in which deep learning takes place. Nonetheless, deep learning has revolutionized many aspects of science, and there is reason for optimism that the revolution will continue. Fifty years from now, looking back, we may recognize this as the fifth derivative of the industrial revolution (Phase I: Steam. Phase II: Electricity. Phase III: Automation. Phase IV: Information. Phase V: Intelligence).

From Multivariate Analysis to Deep Learning

Conventional machine learning, as we know it today, has had many names. It began with Multivariate Analysis of mathematical population dynamics around the turn of the last century, pioneered by Francis Galton (1874), Karl Pearson (1901), Charles Spearman (1904) and Ronald Fisher (1922) among others.

The first on-line computers during World War II were developed to quickly calculate the trajectories of enemy aircraft for gunnery control, introducing the idea of feedback control of machines. This was named Cybernetics by Norbert Wiener, who had participated in the development of automated control of antiaircraft guns.

Table I. Evolution of Names for Machine Learning

Multivariate Analysis (early-to-mid 1900’s)
Cybernetics (1940’s – 1950’s)
Connectionism (1960’s – 1970’s)
Parallel Distributed Processing (1970’s – 1980’s)
Neural Networks (1980’s – 1990’s)
Deep Learning (2000’s – today)

A decade later, during the Cold War, it became necessary to find hidden objects in large numbers of photographs. The embryonic electronic digital computers of the day were far too slow with far too little memory to do the task, so the Navy contracted with the Cornell Aeronautical Laboratory in Cheektowaga, New York, a suburb of Buffalo, to create an analog computer capable of real-time image analysis. This led to the invention of the Perceptron by Frank Rosenblatt as the first neural network-inspired computer [1], building on ideas of neural logic developed by Warren McCulloch and Walter Pitts.

Fig. 2 Frank Rosenblatt working on the Perceptron. (From the Cornell Chronicle)

Fig. 3 Rosenblatt’s conceptual design of the connectionism of the perceptron (1958).

Several decades passed with fits and starts as neural networks remained too simple to accomplish anything profound. Then in 1986, David Rumelhart and Ronald Williams at UC San Diego with Geoff Hinton at Carnegie-Mellon discovered a way to train multiple layers of neurons, in a process called error back propagation [2]. This publication opened the floodgates of Connectionism — also known as Parallel Distributed Processing. The late 80’s and much of the 90’s saw an expansion of interest in neural networks, until the increasing size of the networks ran into limits caused by the processing speed and capacity of conventional computers towards the end of the decade. During this time it had become clear that the most interesting computations required many layers of many neurons, and the number of neurons expanded into the thousands, but it was difficult to train such systems that had tens of thousands of adjustable parameters, and research in neural networks once again went into a lull.

The beginnings of deep learning started with two breakthroughs. The first was by Yann LeCun at Bell Labs in 1998 who developed, with Leon Bottou, Yoshua Bengio and Patrick Haffner, a Convolutional Neural Network that had seven layers of neurons that classified hand-written digits [3]. The second was from Geoff Hinton in 2006, by then at the University of Toronto, who discovered a fast learning algorithm for training deep layers of neurons [4]. By the mid 2010’s, research on neural networks was hotter than ever, propelled in part by several very public successes, such as DeepMind’s machine that beat the best player in the world at Go in 2017, self-driving cars, personal assistants like Siri and Alexa, and YouTube recommendations.

The Challenges of Deep Learning

Deep learning today is characterized by neural network architectures composed of many layers of many neurons. The nature of deep learning brings with it two main challenges: 1) efficient training of the neural weights, and 2) generalization of trained networks to perform accurately on previously unseen data inputs.

Solutions to the first challenge, efficient training, are what allowed the deep revolution in the first place—the result of a combination of increasing computer power with improvements in numerical optimization. This included faster personal computers that allowed nonspecialists to work with deep network programming environments like Matlab’s Deep Learning toolbox and Python’s TensorFlow.

Solutions to the second challenge, generalization, rely heavily on a process known as “regularization”. The term “regularization” has a slippery definition, an obscure history, and an awkward sound to it. Regularization is the noun form of the verb “to regularize” or “to make regular”. Originally, regularization was used to keep certain inverse algorithms from blowing up, like inverse convolutions, also known as deconvolution. Direct convolution is a simple mathematical procedure that “blurs” ideal data based on the natural response of a measuring system. However, if one has experimental data, one might want to deconvolve the system response from the data to recover the ideal data. But this procedure is numerically unstable and can “blow up”, often because of the divide-by-zero problem. Regularization was a simple technique that kept denominators from going to zero.
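The divide-by-zero problem and its cure can be sketched with a single number standing in for the system response at one frequency. The regularized form below, response / (response**2 + eps), is one common Tikhonov-style choice and is an assumption here, as are the function names and the value of eps; the text above does not specify a particular formula.

```python
def naive_inverse(response):
    """Plain deconvolution factor: blows up as the system response approaches zero."""
    return 1.0 / response

def regularized_inverse(response, eps=1e-3):
    """Close to 1/response for strong responses, but bounded when the response is weak."""
    return response / (response ** 2 + eps)

for h in (1.0, 0.1, 1e-6):
    print(h, naive_inverse(h), regularized_inverse(h))
```

The small constant eps keeps the denominator away from zero, so weak responses are suppressed instead of amplified without limit.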

Regularization became a common method for inverse problems that are notoriously difficult to solve because of the many-to-one mapping that can occur in measurement systems. There can be a number of possible causes that produce a single response. Regularization was a way of winnowing out “unphysical” solutions so that the physical inverse solution remained.

During the same time, regularization became a common tool used by quantum field theorists to prevent certain calculated quantities from diverging to infinity. The solution was again to keep denominators from going to zero by setting physical cut-off lengths on the calculations. These cut-offs were initially ad hoc, but the development of renormalization group theory by Kenneth Wilson at Cornell (Nobel Prize in 1982) provided a systematic approach to solving the infinities of quantum field theory.

With the advent of neural networks, having hundreds to thousands to millions of adjustable parameters, regularization became the catch-all term for fighting the problem of over-fitting. Over-fitting occurs when there are so many adjustable parameters that any training data can be fit, and the neural network becomes just a look-up table. Look-up tables are the ultimate hash code, but they have no predictive capability. If a slightly different set of data are fed into the network, the output can be anything. In over-fitting, there is no generalization; the network simply learns the idiosyncrasies of the training data without “learning” the deeper trends or patterns that would allow it to generalize to handle different inputs.

Over the past decades a wide collection of techniques have been developed to reduce over-fitting of neural networks. These techniques include early stopping, k-fold holdout, drop-out, L1 and L2 weight-constraint regularization, as well as physics-based constraints. The goal of all of these techniques is to keep neural nets from creating look-up tables and to have them instead learn the deep codependencies that exist within complicated data.

Table II. Regularization Techniques in Machine Learning

By judicious application of these techniques, combined with appropriate choices of network design, amazingly complex problems can be solved by deep networks and they can generalize too (to some degree). As the field moves forward, we may expect additional regularization tricks to improve generalization, and design principles will emerge so that the networks no longer need to be constructed by trial and error.

The Potential of Deep Learning

In conventional machine learning, one of the most critical first steps performed on a dataset has been feature extraction. This step is complicated and difficult, especially when the signal is buried either in noise or in confounding factors (also known as distractors). The analysis is often highly sensitive to the choice of features, and the selected features may not even be the right ones, leading to bad generalization. In deep learning, feature extraction disappears into the net itself. Optimizing the neural weights subject to appropriate constraints forces the network to find where the relevant information lies and what to do with it.

The key to finding the right information was not just having many neurons, but having many layers, which is where the potential of deep learning emerges. It is as if each successive layer is learning a more abstract or more generalized form of the information than the last. This hierarchical layering is most evident in the construction of convolutional deep networks, where the layers are receiving a telescoping succession of information fields from lower levels. Geoff Hinton’s Deep Belief Network, which launched the deep revolution in 2006, worked explicitly with this hierarchy in mind through direct design of the network architecture. Since then, network architecture has become more generalized, with less up-front design while relying on the increasingly sophisticated optimization techniques of training to set the neural weights. For example, a simplified instance of a deep network with three hidden layers of neurons is shown in Fig. 4.

Fig. 4 General structure of a deep network with three hidden layers. Layers will typically have hundreds or thousands of neurons. Each gray line represents a weight value, and each circle is a neural activation function.

The mathematical structure of a deep network is surprisingly simple. The equations for the network in Fig. 4, that convert an input x^a to an output y^e, are

These equations use index notation to denote vectors (single superscript) and matrices (double indexes). The repeated index (one up and one down) denotes an implicit “Einstein” summation. The function φ(.) is known as the activation function, which is nonlinear. One of the simplest activation functions to use and analyze, and the current favorite, is known as the ReLU (for rectified linear unit). Note that these equations represent a simple neural cascade, as the output of one layer becomes the input for the next.
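As a sketch of what such a cascade looks like in index notation, one could write the following. The weight symbol W, the per-neuron bias β, and the index labels a through e (running over the input, the three hidden layers and the output) are assumptions made here for illustration, not notation confirmed by the text.

```latex
\begin{aligned}
y^{b} &= \varphi\!\left(W^{b}{}_{a}\, x^{a} + \beta^{b}\right) \\
y^{c} &= \varphi\!\left(W^{c}{}_{b}\, y^{b} + \beta^{c}\right) \\
y^{d} &= \varphi\!\left(W^{d}{}_{c}\, y^{c} + \beta^{d}\right) \\
y^{e} &= \varphi\!\left(W^{e}{}_{d}\, y^{d} + \beta^{e}\right)
\end{aligned}
```

Each line feeds the next, and each repeated index (one up, one down) is summed over the neurons of the preceding layer.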

The training of all the matrix elements assumes a surprisingly simple optimization function, known as an objective function or a loss function, that can look like

where the first term is the mean squared error of the network output y^e relative to the desired output y⁰ for the training set, and the second term, known as a regularization term (see section above) is a quadratic form that keeps the weights from blowing up. This loss function is minimized over the set of adjustable matrix weights.
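A loss of the kind described here, a mean squared error plus a quadratic penalty on the weights, might be sketched as follows. The number of training examples N, the regularization strength λ, the example label (n), and the generic weight symbol W are assumed notation introduced for illustration.

```latex
L = \frac{1}{N}\sum_{n=1}^{N}\sum_{e}\left(y^{e}_{(n)} - y^{e}_{0\,(n)}\right)^{2}
    \;+\; \lambda \sum_{\text{all weights}} W^{2}
```

The first sum compares the network output with the desired output for every training example, and the second term grows quadratically with the weights, so minimizing L discourages large weight values.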

The network in Fig. 4 is just a toy, with only 5 inputs and 5 outputs and only 23 neurons. But it has 30+36+36+30+23 = 155 adjustable weights. This may seem like overkill, but it is nothing compared to neural networks with thousands of neurons per layer and tens of layers. That massive overkill is exactly the power of deep learning, as well as its pitfall.
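The count quoted above can be checked with a short sketch. It assumes, as an inference from the arithmetic 30+36+36+30+23 rather than from anything stated explicitly, that each of the three hidden layers has 6 neurons and that the 23 counts one bias per neuron. The function names are hypothetical, and the random weights are placeholders rather than trained values.

```python
import random


def relu(z):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return z if z > 0.0 else 0.0


def count_parameters(layer_sizes):
    """Return (number of weights, number of biases) for a fully connected cascade."""
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    n_weights = sum(n_in * n_out for n_in, n_out in pairs)
    n_biases = sum(n_out for _, n_out in pairs)
    return n_weights, n_biases


def make_network(layer_sizes, seed=0):
    """Build random weight matrices and zero biases for each layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Cascade: the output of each layer becomes the input of the next."""
    y = list(x)
    for weights, biases in layers:
        y = [relu(sum(w * v for w, v in zip(row, y)) + b)
             for row, b in zip(weights, biases)]
    return y


# Assumed layout: 5 inputs, three hidden layers of 6 neurons, 5 outputs.
LAYER_SIZES = [5, 6, 6, 6, 5]
n_weights, n_biases = count_parameters(LAYER_SIZES)
total_parameters = n_weights + n_biases
output = forward(make_network(LAYER_SIZES), [1.0, 0.0, 0.0, 0.0, 0.0])
```

Under these assumptions the matrix weights account for 132 parameters and the biases for the remaining 23.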

The Pitfalls of Deep Learning

Despite the impressive advances in deep learning, serious pitfalls remain for practitioners. One of the most challenging problems in deep learning is the saddle-point problem. A saddle point in an objective function is like a mountain pass: at the top of the pass the ground slopes downward in two opposite directions into the valleys but slopes upward in the two orthogonal directions toward mountain peaks. A saddle point is an unstable equilibrium where a slight push this way or that can lead the traveller to two very different valleys separated by high mountain ridges. In our familiar three-dimensional space, saddle points are relatively rare and landscapes are dominated by valleys and mountaintops. But this intuition about landscapes fails in high dimensions.
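The low-dimensional picture of a saddle point can be made concrete with a toy example. The function f(x, y) = x² + y⁴/4 − y² is a hypothetical choice made here for illustration: it has a saddle at the origin and two valleys at y = ±√2. Plain gradient descent started a hair to either side of the pass ends up in opposite valleys.

```python
def grad(x, y):
    """Gradient of the toy landscape f(x, y) = x**2 + y**4 / 4 - y**2."""
    return 2.0 * x, y ** 3 - 2.0 * y


def descend(x, y, learning_rate=0.1, steps=500):
    """Plain gradient descent from the starting point (x, y)."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - learning_rate * gx, y - learning_rate * gy
    return x, y


# Two starting points on the pass that differ only by a tiny push in y.
left_valley = descend(1.0, -1e-6)
right_valley = descend(1.0, +1e-6)
```

The two runs start a distance of 2e-6 apart and finish roughly 2.8 apart, one in each valley.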

Landscapes in high dimensions are dominated by neutral ridges that span the domain of the landscape. This key understanding about high-dimensional space actually came from the theory of evolutionary dynamics for the evolution of species. In the early days of evolutionary dynamics, there was a serious challenge to understand how genetic mutations could allow such diverse speciation to occur. If the fitness of a species were viewed as a fitness landscape, and if a highly-adapted species were viewed as a mountain peak in this landscape, then genetic mutations would need to drive the population state point into “valleys of death” that would need to be crossed to arrive at a neighboring fitness peak. It would seem that genetic mutations would likely kill off the species in the valleys before they could rise to the next equilibrium.

However, the geometry of high dimensions does not follow this simple low-dimensional intuition. As more dimensions become available, landscapes have more and more ridges of relatively constant height that span the full space (see my recent blog on random walks in 10 dimensions and my short YouTube video). For a species to move from one fitness peak to another fitness peak in a fitness landscape (in ultra-high-dimensional mutation space), all that is needed is for a genetic mutation to step the species off of the fitness peak onto a neutral ridge where many mutations can keep the species on that ridge as it moves ever farther away from the last fitness peak. Eventually, the neutral ridge brings the species near a new fitness peak where it can climb to the top, creating a new stable species. The point is that most genetic mutations are neutral: they do not impact the survivability of an individual. This is known as the neutral network theory of evolution proposed by Motoo Kimura (1924 – 1994) [5]. As these mutations accumulate, the offspring can get genetically far from the progenitor. And when a new fitness peak comes near, many of the previously neutral mutations can come together and become a positive contribution to fitness as the species climbs the new fitness peak.

The neutral network of genetic mutation was a paradigm shift in the field of evolutionary dynamics, and it also taught everyone how different random walks in high-dimensional spaces are from random walks in 3D. But although neutral networks solved the evolution problem, they become a two-edged sword in machine learning. On the positive side, fitness peaks are just like the minima of objective functions, and the ability for partial solutions to perform random walks along neutral ridges in the objective-function space allows optimal solutions to be found across a broad range of the configuration space of the neural weights. However, on the negative side, ridges are loci of unstable equilibrium. Hence there are always multiple directions that a solution state can go to minimize the objective function. Each successive run of a deep-network neural weight optimizer can find equivalent optimal solutions — but they each can be radically different. There is no hope of averaging the weights of an ensemble of networks to arrive at an “average” deep network. The averaging would simply drive all weights to zero. Certainly, the predictions of an ensemble of equivalently trained networks can be averaged—but this does not illuminate what is happening “under the hood” of the machine, which is where our own “understanding” of what the network is doing would come from.
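One simple way to see why weight averaging fails is the sign-flip symmetry of a network with an odd activation function. The sketch below is a deliberately tiny, hypothetical example: one input, two tanh hidden units, and hand-picked weights. It uses tanh rather than ReLU only because its odd symmetry makes the two equivalent solutions easy to write down.

```python
import math


def net(params, x):
    """One-input network: sum over hidden units of v * tanh(w * x)."""
    return sum(v * math.tanh(w * x) for w, v in params)


# Each pair is (input weight w, output weight v) for one hidden unit.
net_a = [(1.5, 0.8), (0.7, 1.2)]

# Flip the sign of both weights of every hidden unit. Because tanh is odd,
# (-v) * tanh(-w * x) == v * tanh(w * x), so the function is unchanged.
net_b = [(-w, -v) for w, v in net_a]

# Averaging the two equivalent weight sets cancels every weight.
averaged = [((wa + wb) / 2.0, (va + vb) / 2.0)
            for (wa, va), (wb, vb) in zip(net_a, net_b)]

test_inputs = [-2.0, -0.5, 0.3, 1.0]
outputs_a = [net(net_a, x) for x in test_inputs]
outputs_b = [net(net_b, x) for x in test_inputs]
outputs_avg = [net(averaged, x) for x in test_inputs]
```

The two networks give identical predictions, so averaging their outputs is harmless, while the averaged weights describe a network that outputs zero for every input.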

Post-Modern Machine Learning

Post-modernism is admittedly kind of a joke — it works so hard to pull down every aspect of human endeavor that it falls victim to its own methods. The convoluted arguments made by its proponents sound like ultimate slacker talk — circuitous logic circling itself in an endless loop of denial.

But post-modernism does have its merits. It surfs on the moving crest of what passes as modernism, as modernism passes onward to its next phase. The philosophy of post-modernism moves beyond rationality in favor of a subjectivism in which cause and effect are blurred. For instance, in post-modern semiotic theory, a text or a picture is no longer an objective element of reality, but fragments into multiple semiotic versions, each one different for each different reader or watcher — a spectrum of collaborative efforts between each consumer and the artist. The reader brings with them a unique set of life experiences that interact with the text to create an entirely new experience in each reader’s mind.

Deep learning is post-modern in the sense that deterministic algorithms have disappeared. Instead of a traceable path of sequential operations, neural nets scramble information into massively-parallel strings of partial information that cross and interact nonlinearly with other massively-parallel strings. It is difficult to impossible to trace any definable part of the information from input to output. The output simply manifests some aspect of the data that was hidden from human view.

But the Age of Artificial Intelligence is not here yet. The vast multiplicity of saddle ridges in high dimensions is one of the drivers for one of the biggest pitfalls of deep learning — the need for massive amounts of training data. Because there are so many adjustable parameters in a neural network, and hence so many dimensions, a tremendous amount of training data is required to train a network to convergence. This aspect of deep learning stands in strong contrast to human children who can be shown a single picture of a bicycle next to a tricycle, and then they can achieve almost perfect classification accuracy when shown any number of photographs of different bicycles and tricycles. Humans can generalize with an amazingly small amount of data, while deep networks often need thousands of examples. This example alone points to the marked difference between human intelligence and the current state of deep learning. There is still a long road ahead.

By David D. Nolte, April 18, 2022

[1] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386-408, (1958)

[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533-536, Oct (1986)

[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, (1998)

[4] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, Jul (2006)

[5] M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, (1983)

November 28, 2021 / February 9, 2023 by David D. Nolte

Twenty Years at Light Speed: Photonic Computing

In the epilog of my book Mind at Light Speed: A New Kind of Intelligence (Free Press, 2001), I speculated about a future computer in which sheets of light interact with others to form new meanings and logical cascades as light makes decisions in a form of all-optical intelligence.

Twenty years later, that optical computer seems vaguely quaint, not because new technology has passed it by, like looking at the naïve musings of Jules Verne from our modern vantage point, but because the optical computer seems almost as far away now as it did back in 2001.

At the turn of the millennium we were seeing tremendous advances in data rates on fiber optics (see my previous Blog) as well as the development of new types of nonlinear optical devices and switches that served the role of rudimentary logic switches. At that time, it was not unreasonable to believe that the pace of progress would remain undiminished, and that by 2020 we would have all-optical computers and signal processors in which the same optical data on the communication fibers would be involved in the logic that told the data what to do and where to go, all without the wasteful and slow conversion to electronics and back again into photons: the infamous OEO conversion.

However, the rate of increase of the transmission bandwidth on fiber optic cables slowed not long after the publication of my book, and nonlinear optics today still needs high intensities to be efficient, which remains a challenge for significant (commercial) use of all-optical logic.

That said, it’s dangerous to ever say never, and research into all-optical computing and data processing is still going strong (See Fig. 1). It’s not the dream that was wrong, it was the time-scale that was wrong, just like fiber-to-the-home. Back in 2001, fiber-to-the-home was viewed as a pipe-dream by serious technology scouts. It took twenty years, but now that vision is coming true in urban settings. Back in 2001, all-optical computing seemed about 20 years away, but now it still looks 20 years out. Maybe this time the prediction is right. Recent advances in all-optical processing give some hope for it. Here are some of those advances.

Fig. 1 Number of papers published by year with the phrase in the title: “All-Optical” or “Photonic or Optical and Neur*” according to Web of Science search. The term “All-optical” saturated around 2005. The number of papers on optical neural networks was low until 2015 but is now experiencing a strong surge. The sociology of title choices, and how favorite buzz words shift over time, can obscure underlying causes and trends, but overall there is current strong interest in all-optical systems.

The “What” and “Why” of All-Optical Processing

One of the great dreams of photonics is the use of light beams to perform optical logic in optical processors just as electronic currents perform electronic logic in transistors and integrated circuits.

Our information age, starting with the telegraph in the mid-1800’s, has been built upon electronics because the charge of the electron makes it a natural decision maker. Two charges attract or repel by Coulomb’s Law, exerting forces upon each other. Although we don’t think of currents acting in quite that way, the foundation of electronic logic remains electrical interactions.

But with these interactions also come constraints—constraining currents to be contained within wires, waiting for charging times that slow down decisions, managing electrical resistance and dissipation that generate heat (computer processing farms in some places today need to be cooled by glacier meltwater). Electronic computing is hardly a green technology.

Therefore, the advantages of optical logic are clear: broadcasting information without the need for expensive copper wires, little dissipation or heat, low latency (signals propagate at the speed of light). Furthermore, information on the internet is already in the optical domain, so why not keep it in the optical domain and have optical information packets making the decisions? All the routing and switching decisions about where optical information packets should go could be done by the optical packets themselves inside optical computers.

But there is a problem. Photons in free space don’t interact—they pass through each other unaffected. This is the opposite of what is needed for logic and decision making. The challenge of optical logic is then to find a way to get photons to interact.

Think of the scene in Star Wars: A New Hope when Obi-Wan Kenobi and Darth Vader battle to the death in a lightsaber duel, with beams of light crashing against each other and repelling each other with equal and opposite forces. This is the photonic engineer’s dream! Light controlling light. But this cannot happen in free space. On the other hand, light beams can control other light beams inside nonlinear crystals where one light beam changes the optical properties of the crystal, hence changing how another light beam travels through it. These are nonlinear optical crystals.

Nonlinear Optics

Virtually all optical control designs, for any kind of optical logic or switch, require one light beam to affect the properties of another, and that requires an intervening medium that has nonlinear optical properties. The physics of nonlinear optics is actually simple: one light beam changes the electronic structure of a material which affects the propagation of another (or even the same) beam. The key parameter is the nonlinear coefficient that determines how intense the control beam needs to be to produce a significant modulation of the other beam. This is where the challenge is. Most materials have very small nonlinear coefficients, and the intensity of the control beam usually must be very high.

Therefore, to create low-power all-optical logic gates and switches there are four main design principles: 1) increase the nonlinear susceptibility by engineering the material, 2) increase the interaction length between the two beams, 3) concentrate light into small volumes, and 4) introduce feedback to boost the internal light intensities. Let’s take these points one at a time.

Nonlinear susceptibility: The key to getting stronger interaction of light with light is in the ease with which a control beam of light can distort the crystal so that the optical conditions change for a signal beam. This is called the nonlinear susceptibility. When working with “conventional” crystals like semiconductors (e.g. CdZnSe) or ferroelectric oxides (e.g. LiNbO₃), there is only so much engineering that is possible to try to tweak the nonlinear susceptibilities. However, artificially engineered materials can offer significant increases in nonlinear susceptibilities. These include plasmonic materials, metamaterials, organic semiconductors, and photonic crystals. An increasingly important class of nonlinear optical devices is the semiconductor optical amplifier (SOA).

Interaction length: The interaction strength between two light waves is a product of the nonlinear polarization and the length over which the waves interact. Interaction lengths can be made relatively long in waveguides but can be made orders of magnitude longer in fibers. Therefore, nonlinear effects in fiber optics are a promising avenue for achieving optical logic.

Intensity Concentration: Nonlinear polarization is the product of the nonlinear susceptibility with the field amplitude of the waves. Therefore, focusing light down to small cross sections produces high intensity, as in the core of an optical fiber, again showing advantages of fibers for optical logic implementations.

Feedback: Feedback, as in a standing-wave cavity, increases the intensity as well as the effective interaction length by folding the light wave continually back on itself. Both of these effects boost the nonlinear interaction, but then there is an additional benefit: interferometry. Cavities, like a Fabry-Perot, are interferometers in which a slight change in the round-trip phase can produce large changes in output light intensity. This is an optical analog to a transistor in which a small control current acts as a gate for an exponential signal current. The feedback in the cavity of a semiconductor optical amplifier (SOA), combined with high internal intensities, long effective interaction lengths and an active medium with strong nonlinearity, makes these elements attractive for optical logic gates. Similarly, integrated ring resonators have the advantage of interferometric control for light switching. Many current optical switches and logic gates are based on SOAs and integrated ring resonators.
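The sensitivity of a cavity to round-trip phase can be illustrated with the textbook Airy transmission formula for an ideal, lossless Fabry-Perot with two identical mirrors. This is a minimal sketch under those idealized assumptions; the mirror reflectivity of 0.95 and the phase shift of 0.1 rad are illustrative values, not numbers taken from any specific device.

```python
import math


def fabry_perot_transmission(round_trip_phase, mirror_reflectivity):
    """Airy transmission of an ideal lossless cavity with identical mirrors."""
    r = mirror_reflectivity
    coefficient_of_finesse = 4.0 * r / (1.0 - r) ** 2
    return 1.0 / (1.0 + coefficient_of_finesse
                  * math.sin(round_trip_phase / 2.0) ** 2)


# On resonance the round-trip phase is a multiple of 2*pi (here simply 0).
transmission_on_resonance = fabry_perot_transmission(0.0, 0.95)

# A small phase shift, such as a control beam might induce in a nonlinear medium.
transmission_detuned = fabry_perot_transmission(0.1, 0.95)
```

With these values the transmission falls from 1 on resonance to about 0.21 after a phase change of only 0.1 rad, which is the switching action the paragraph describes.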

All-Optical Regeneration

The vision of the all-optical internet, where the logic operations that direct information to different locations are all performed by optical logic without ever converting into the electrical domain, is facing a barrier that is as challenging to overcome today as it was back in 2001: all-optical regeneration. All-optical regeneration has been and remains the Achilles heel of the all-optical internet.

Signal regeneration is currently performed through OEO conversion: Optical-to-Electronic-to-Optical. In OEO conversion, a distorted signal (distortion is caused by attenuation, dispersion and noise as signals travel down fiber optics) is received by a photodetector and interpreted as ones and zeros, which drive laser light sources that launch the optical pulses down the next stretch of fiber. The new pulses are virtually perfect, but they again degrade as they travel, until they are regenerated, and so on. The added advantage of the electrical layer is that the electronic signals can be used to drive conventional electronic logic for switching.
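The decision step at the heart of OEO regeneration can be sketched in a few lines. This is a heavily simplified illustration that assumes one detected sample per bit and a fixed decision threshold. It ignores clock recovery and retiming, and the function name and sample values are hypothetical.

```python
def regenerate(detected_levels, threshold=0.5, high=1.0, low=0.0):
    """Decide each detected sample as a bit, then relaunch clean pulse levels."""
    bits = [1 if level > threshold else 0 for level in detected_levels]
    clean_pulses = [high if bit else low for bit in bits]
    return bits, clean_pulses


# Attenuated and noisy samples after a stretch of fiber (made-up values).
degraded = [0.81, 0.12, 0.64, 0.33, 0.07, 0.92]
bits, clean = regenerate(degraded)
```

The relaunched levels are exactly 0 or 1 again, and the intermediate list of bits is what the electrical layer can hand to conventional logic for switching.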

In all-optical regeneration, on the other hand, the optical pulses need to be reamplified, reshaped and retimed––known as 3R regeneration––all by sending the signal pulses through nonlinear amplifiers and mixers, which may include short stretches of highly nonlinear fiber (HNLF) or semiconductor optical amplifiers (SOA). There have been demonstrations of 2R all-optical regeneration (reamplifying and reshaping but not retiming) at lower data rates, but getting all 3Rs at the high data rates (40 Gb/s) in the next generation telecom systems remains elusive.

Nonetheless, there is an active academic literature that is pushing the envelope on optical logic devices and regenerators [1]. Many of the systems focus on SOAs, HNLFs and interferometers. Numerical modeling of these kinds of devices is currently ahead of bench-top demonstrations, primarily because of the difficulty of fabrication and device lifetime. But the numerical models point to performance that would be competitive with OEO. If this OOO conversion (Optical-to-Optical-to-Optical) is scalable (can handle increasing bit rates and increasing numbers of channels), then the current data crunch that is facing the telecom trunk lines (see my previous Blog) may be a strong driver to implement such all-optical solutions.

It is important to keep in mind that legacy technology is not static but also continues to improve. As all-optical logic and switching and regeneration make progress, OEO conversion gets incrementally faster, creating a moving target. Therefore, we will need to wait another 20 years to see whether OEO is overtaken and replaced by all-optical.

*Fig. 3 Optical-Electronic-Optical regeneration and switching compared to all-optical control. The optical control is performed using SOAs, interferometers and nonlinear fibers.*

Photonic Neural Networks

The most exciting area of optical logic today is in analog optical computing––specifically optical neural networks and photonic neuromorphic computing [2, 3]. A neural network is a highly-connected network of nodes and links in which information is distributed across the network in much the same way that information is distributed and processed in the brain. Neural networks can take several forms––from digital neural networks that are implemented with software on conventional digital computers, to analog neural networks implemented in specialized hardware, sometimes also called neuromorphic computing systems.

Optics and photonics are well suited to the analog form of neural network because of the superior ability of light to form free-space interconnects (links) among a high number of optical modes (nodes). This essential advantage of light for photonic neural networks was first demonstrated in the mid-1980’s using recurrent neural network architectures implemented in photorefractive (nonlinear optical) crystals (see Fig. 1 for a publication timeline). But this initial period of proof-of-principle was followed by a lag of about 2 decades due to a mismatch between driver applications (like high-speed logic on an all-optical internet) and the ability to configure the highly complex interconnects needed to perform the complex computations.

Fig. 4 Optical vector-matrix multiplication. An LED array is the input vector, focused by a lens onto the spatial light modulator that is the 2D matrix. The transmitted light is refocused by the lens onto a photodiode array which is the output vector. Free-space propagation and multiplication is a key advantage to optical implementation of computing.

The rapid rise of deep machine learning over the past 5 years has removed this bottleneck, and there has subsequently been a major increase in optical implementations of neural networks. In particular, it is now possible to use conventional deep machine learning to design the interconnects of analog optical neural networks for fixed tasks such as image recognition [4]. At first look, this seems like a non-starter, because one might ask why not use the conventional trained deep network to do the recognition itself rather than using it to create a special-purpose optical recognition system. The answer lies primarily in the metrics of latency (speed) and energy cost.

In neural computing, approximately 90% of the time and energy go into matrix multiplication operations. Deep learning algorithms driving conventional digital computers need to do the multiplications at the sequential clock rate of the computer using nested loops. Optics, on the other hand, is ideally suited to perform matrix multiplications in a fully parallel manner (see Fig. 4). In addition, a hardware implementation using optics operates literally at the speed of light. The latency is limited only by the time of flight through the optical system. If the optical train is 1 meter, then the time for the complete computation is only a few nanoseconds at almost no energy dissipation. Combining the natural parallelism of light with the speed has led to unprecedented computational rates. For instance, recent implementations of photonic neural networks have demonstrated over 10 trillion operations per second (10 TOPS) [5].
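The contrast drawn here can be sketched with two small functions: the nested-loop matrix-vector product that a sequential processor steps through, and the time of flight that sets the latency of a free-space optical system. The function names are hypothetical, and the sketch assumes propagation at the vacuum speed of light.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def matvec_nested_loops(matrix, vector):
    """Sequential matrix-vector product: one multiply-accumulate per step."""
    output = []
    for row in matrix:
        total = 0.0
        for weight, value in zip(row, vector):
            total += weight * value
        output.append(total)
    return output


def multiply_accumulate_count(n_rows, n_columns):
    """Number of sequential multiply-accumulate steps in the nested loops."""
    return n_rows * n_columns


def time_of_flight_ns(path_length_m):
    """Latency of light crossing an optical train of the given length."""
    return path_length_m / SPEED_OF_LIGHT_M_PER_S * 1e9


result = matvec_nested_loops([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
steps_for_1000_by_1000 = multiply_accumulate_count(1000, 1000)
latency_ns = time_of_flight_ns(1.0)
```

A 1000 by 1000 layer costs a million sequential multiply-accumulate steps in the loop version, while the 1 meter optical train has a latency of about 3.3 ns regardless of the matrix size.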

It is important to keep in mind that although many of these photonic neural networks are characterized as all-optical, they are generally not reconfigurable, meaning that they are not adaptive to changing or evolving training sets or changing input information. Most adaptive systems use OEO conversion with electronically-addressed spatial light modulators (SLM) that are driven by digital logic. Another technology gaining recent traction is neuromorphic photonics in which neural processing is implemented on photonic integrated circuits (PICs) with OEO conversion. The integration of large numbers of light emitting sources on PICs is now routine, relieving the OEO bottleneck as electronics and photonics merge in silicon photonics.

Farther afield are all-optical systems that are adaptive through the use of optically-addressed spatial light modulators or nonlinear materials. In fact, these types of adaptive all-optical neural networks were among the first demonstrated in the late 1980’s. More recently, advanced adaptive optical materials, as well as fiber delay lines for a type of recurrent neural network known as reservoir computing, have been used to implement faster and more efficient optical nonlinearities needed for adaptive updates of neural weights. But there are still years to go before light is adaptively controlling light entirely in the optical domain at the speeds and with the flexibility needed for real-world applications like photonic packet switching in telecom fiber-optic routers.

In stark contrast to the status of classical all-optical computing, photonic quantum computing is on the cusp of revolutionizing the field of quantum information science. The recent demonstration from the Canadian company Xanadu of a programmable photonic quantum computer that operates at room temperature may be the harbinger of what is to come in the third generation Machines of Light: Quantum Optical Computers, which is the topic of my next blog.

By David D. Nolte, Nov. 28, 2021

Second Edition of Introduction to Modern Dynamics (Chaos, Networks, Space and Time)

The second edition of Introduction to Modern Dynamics: Chaos, Networks, Space and Time is available from Oxford University Press and Amazon.

Most physics majors will use modern dynamics in their careers: nonlinearity, chaos, network theory, econophysics, game theory, neural nets, geodesic geometry, among many others.

The first edition of Introduction to Modern Dynamics (IMD) was an upper-division junior-level mechanics textbook at the level of Thornton and Marion (Classical Dynamics of Particles and Systems) and Taylor (Classical Mechanics). IMD helped lead an emerging trend in physics education to update the undergraduate physics curriculum. Conventional junior-level mechanics courses emphasized Lagrangian and Hamiltonian physics, but notably missing from the classic subjects are modern dynamics topics that most physics majors will use in their careers: nonlinearity, chaos, network theory, econophysics, game theory, neural nets, geodesic geometry, among many others. These are the topics at the forefront of physics that drive high-tech businesses and start-ups, which is where more than half of all physicists work. IMD introduced these modern topics to junior-level physics majors in an accessible form that allowed them to master the fundamentals to prepare them for the modern world.

The second edition (IMD2) continues that trend by expanding the chapters to include additional material and topics. It rearranges several of the introductory chapters for improved logical flow and expands them to include key conventional topics that were missing in the first edition (e.g., Lagrange undetermined multipliers and expanded examples of Lagrangian applications). It is also an opportunity to correct several typographical errors and other errata that students have identified over the past several years. The second edition also has expanded homework problems.

The goal of IMD2 is to strengthen the sections on conventional topics (that students need to master to take their GREs) to make IMD2 attractive as a mainstream physics textbook for broader adoption at the junior level, while continuing the program of updating the topics and approaches that are relevant for the roles that physicists play in the 21st century.

(New Chapters and Sections highlighted in red.)

New Features in Second Edition:

Second Edition Chapters and Sections

Part 1 Geometric Mechanics (New Features)

• Expanded development of Lagrangian dynamics

• Lagrange multipliers

• More examples of applications

• Connection to statistical mechanics through the virial theorem

• Greater emphasis on action-angle variables

• The key role of adiabatic invariants

Part 1 Geometric Mechanics (Chapters and Sections)

Chapter 1 Physics and Geometry

1.1 State space and dynamical flows

1.2 Coordinate representations

1.3 Coordinate transformation

1.4 Uniformly rotating frames

1.5 Rigid-body motion

Chapter 2 Lagrangian Mechanics

2.1 Calculus of variations

2.2 Lagrangian applications

2.3 Lagrange’s undetermined multipliers

2.4 Conservation laws

2.5 Central force motion

2.6 Virial Theorem

Chapter 3 Hamiltonian Dynamics and Phase Space

3.1 The Hamiltonian function

3.2 Phase space

3.3 Integrable systems and action–angle variables

3.4 Adiabatic invariants

Part 2 Nonlinear Dynamics (New Features)

• New section on non-autonomous dynamics

• Entire new chapter devoted to Hamiltonian mechanics

• Added importance to Chirikov standard map

• The important KAM theory of “constrained chaos” and solar system stability

• Degeneracy in Hamiltonian chaos

• A short overview of quantum chaos

• Rational resonances and the relation to KAM theory

• Synchronized chaos

Part 2 Nonlinear Dynamics (Chapters and Sections)

Chapter 4 Nonlinear Dynamics and Chaos

4.1 One-variable dynamical systems

4.2 Two-variable dynamical systems

4.3 Limit cycles

4.4 Discrete iterative maps

4.5 Three-dimensional state space and chaos

4.6 Non-autonomous (driven) flows

4.7 Fractals and strange attractors

Chapter 5 Hamiltonian Chaos

5.1 Perturbed Hamiltonian systems

5.2 Nonintegrable Hamiltonian systems

5.3 The Chirikov Standard Map

5.4 KAM Theory

5.5 Degeneracy and the web map

5.6 Quantum chaos

Chapter 6 Coupled Oscillators and Synchronization

6.1 Coupled linear oscillators

6.2 Simple models of synchronization

6.3 Rational resonances

6.4 External synchronization

6.5 Synchronization of Chaos

Part 3 Complex Systems (New Features)

• New emphasis on diffusion on networks

• Epidemic growth on networks

• A new section of game theory in the context of evolutionary dynamics

• A new section on general equilibrium theory in economics

Part 3 Complex Systems (Chapters and Sections)

Chapter 7 Network Dynamics

7.1 Network structures

7.2 Random network topologies

7.3 Synchronization on networks

7.4 Diffusion on networks

7.5 Epidemics on networks

Chapter 8 Evolutionary Dynamics

8.1 Population dynamics

8.2 Virus infection and immune deficiency

8.3 Replicator Dynamics

8.4 Quasi-species

8.5 Game theory and evolutionary stable solutions

Chapter 9 Neurodynamics and Neural Networks

9.1 Neuron structure and function

9.2 Neuron dynamics

9.3 Network nodes: artificial neurons

9.4 Neural network architectures

9.5 Hopfield neural network

9.6 Content-addressable (associative) memory

Chapter 10 Economic Dynamics

10.1 Microeconomics and equilibrium

10.2 Macroeconomics

10.3 Business cycles

10.4 Random walks and stock prices (optional)

Part 4 Relativity and Space–Time (New Features)

• Relativistic trajectories

• Gravitational waves

Part 4 Relativity and Space–Time (Chapters and Sections)

Chapter 11 Metric Spaces and Geodesic Motion

11.1 Manifolds and metric tensors

11.2 Derivative of a tensor

11.3 Geodesic curves in configuration space

11.4 Geodesic motion

Chapter 12 Relativistic Dynamics

12.1 The special theory

12.2 Lorentz transformations

12.3 Metric structure of Minkowski space

12.4 Relativistic trajectories

12.5 Relativistic dynamics

12.6 Linearly accelerating frames (relativistic)

Chapter 13 The General Theory of Relativity and Gravitation

13.1 Riemann curvature tensor

13.2 The Newtonian correspondence

13.3 Einstein’s field equations

13.4 Schwarzschild space–time

13.5 Kinematic consequences of gravity

13.6 The deflection of light by gravity

13.7 The precession of Mercury’s perihelion

13.8 Orbits near a black hole

13.9 Gravitational waves

Synopsis of 2nd Ed. Chapters

Chapter 1. Physics and Geometry (Sample Chapter)

This chapter has been rearranged relative to the 1st edition to provide a more logical flow of the overarching concepts of geometric mechanics that guide the subsequent chapters. The central role of coordinate transformations is strengthened, as is the material on rigid-body motion with expanded examples.

Chapter 2. Lagrangian Mechanics (Sample Chapter)

Much of the structure and material is retained from the 1st edition, with two important sections added. The section on applications of Lagrangian mechanics adds many direct examples of the use of Lagrange’s equations of motion. An additional new section covers the important topic of Lagrange’s undetermined multipliers.

Chapter 3. Hamiltonian Dynamics and Phase Space (Sample Chapter)

The importance of Hamiltonian systems and dynamics merits a stand-alone chapter. The topics from the 1st edition are expanded in this new chapter, including a new section on adiabatic invariants, which play an important role in the development of quantum theory. Some topics are de-emphasized from the 1st edition, such as general canonical transformations and the symplectic structure of phase space, although the specific transformation to action-angle coordinates is retained and amplified.

Chapter 4. Nonlinear Dynamics and Chaos

The first part of this chapter is retained from the 1st edition with numerous minor corrections and updates of figures. The second part of the IMD 1st edition chapter, treating Hamiltonian chaos, will be expanded into the new Chapter 5.

Chapter 5. Hamiltonian Chaos

This new stand-alone chapter expands on the last half of Chapter 3 of the IMD 1st edition. The physical character of Hamiltonian chaos is so distinct from dissipative chaos that it deserves its own chapter. It is also a central topic of interest for complex systems that are either conservative or that have integral invariants, such as our N-body solar system that played such an important role in the history of chaos theory beginning with Poincaré. The new chapter highlights Poincaré’s homoclinic tangle, illustrated by the Chirikov Standard Map. The Standard Map is an excellent introduction to KAM theory, which is one of the crowning achievements of the theory of dynamical systems by Kolmogorov, Arnold and Moser, connecting to deeper aspects of synchronization and rational resonances that drive the structure of systems as diverse as the rotation of the Moon and the rings of Saturn. This is also a perfect lead-in to the next chapter on synchronization. An optional section at the end of this chapter briefly discusses quantum chaos to show how Hamiltonian chaos can be extended into the quantum regime.

Chapter 6. Synchronization

This is an updated version of the IMD 1st ed. chapter. It has a reduced initial section on coupled linear oscillators, retaining the key ideas about linear eigenmodes but removing some irrelevant details in the 1st edition. A new section is added that defines and emphasizes the importance of quasi-periodicity. A new section on the synchronization of chaotic oscillators is added.

Chapter 7. Network Dynamics

This chapter rearranges the structure of the chapter from the 1st edition, moving synchronization on networks earlier to connect from the previous chapter. The section on diffusion and epidemics is moved to the back of the chapter and expanded in the 2nd edition into two separate sections on these topics, adding new material on discrete matrix approaches to continuous dynamics.

Chapter 8. Neurodynamics and Neural Networks

This chapter is retained from the 1st edition with numerous minor corrections and updates of figures.

Chapter 9. Evolutionary Dynamics

Two new sections are added to this chapter. A section on game theory and evolutionary stable solutions introduces core concepts of evolutionary dynamics that merge well with the other topics of the chapter such as the pay-off matrix and replicator dynamics. A new section on nearly neutral networks introduces new types of behavior that occur in high-dimensional spaces, which are counterintuitive but important for understanding evolutionary drift.

Chapter 10. Economic Dynamics

This chapter will be significantly updated relative to the 1st edition. Most of the sections will be rewritten with improved examples and figures. Three new sections will be added. The 1st edition section on consumer market competition will be split into two new sections describing the Cournot duopoly and Pareto optimality in one section, and Walras’ Law and general equilibrium theory in another section. The concept of the Pareto frontier in economics is becoming an important part of biophysical approaches to population dynamics. In addition, new trends in economics are drawing from general equilibrium theory, first introduced by Walras in the nineteenth century, but now merging with modern ideas of fixed points and stable and unstable manifolds. A third new section is added on econophysics, highlighting the distinctions between economic dynamics (phase space dynamical approaches to economics) and the emerging field of econophysics (statistical mechanics approaches to economics).

Chapter 11. Metric Spaces and Geodesic Motion

This chapter is retained from the 1st edition with several minor corrections and updates of figures.

Chapter 12. Relativistic Dynamics

This chapter is retained from the 1st edition with minor corrections and updates of figures. More examples will be added, such as invariant mass reconstruction. The connection between relativistic acceleration and Einstein’s equivalence principle will be strengthened.

Chapter 13. The General Theory of Relativity and Gravitation

This chapter is retained from the 1st edition with minor corrections and updates of figures. A new section will derive the properties of gravitational waves, given the spectacular success of LIGO and the new field of gravitational astronomy.

Homework Problems:

All chapters will have expanded and updated homework problems. Many of the homework problems from the 1st edition will remain, but the number of problems at the end of each chapter will be nearly doubled, while some of the less interesting or problematic ones will be removed.

Bibliography

D. D. Nolte, Introduction to Modern Dynamics: Chaos, Networks, Space and Time, 2nd Ed. (Oxford University Press, 2019)

November 19, 2018 (updated August 4, 2020) by David D. Nolte
