In one interpretation of quantum physics, when you snap your fingers, the trajectory you are riding through reality fragments into a cascade of alternative universes—one for each possible quantum outcome among all the different quantum states composing the molecules of your fingers.
This is the Many-Worlds Interpretation (MWI) of quantum physics first proposed rigorously by Hugh Everett in his doctoral thesis in 1957 under the supervision of John Wheeler at Princeton University. Everett had been drawn to this interpretation when he found inconsistencies between quantum physics and gravitation—topics which were supposed to have been his actual thesis topic. But his side-trip into quantum philosophy turned out to be a one-way trip. The reception of his theory was so hostile, no less than from Copenhagen and Bohr himself, that Everett left physics and spent a career at the Pentagon.
Resurrecting MWI in the Name of Quantum Information
Fast forward by 20 years, after Wheeler had left Princeton for the University of Texas at Austin, and once again a young physicist was struggling to reconcile quantum physics with gravity. Once again the many worlds interpretation of quantum physics seemed the only sane way out of the dilemma, and once again a side-trip became a life-long obsession.
David Deutsch, visiting Wheeler in the early 1980’s, became convinced that the many worlds interpretation of quantum physics held the key to paradoxes in the theory of quantum information. He was so convinced, that he began a quest to find a physical system that operated on more information than could be present in one universe at a time. If such a physical system existed, it would be because streams of information from more than one universe were coming together and combining in a way that allowed one of the universes to “borrow” the information from the other.
It took only a year or two before Deutsch found what he was looking for: a simple quantum algorithm that yielded twice as much information as would be possible if there were no parallel universes. This is the now-famous Deutsch algorithm, the first quantum algorithm [1]. At the heart of the Deutsch algorithm is a simple quantum interference. The algorithm did nothing useful, but it convinced Deutsch that two universes were interfering coherently in the measurement process, giving that extra bit of information that should not have been there otherwise. A few years later, the Deutsch-Jozsa algorithm [2] expanded the argument to interfere an exponentially larger number of information streams from an exponentially larger number of universes to create a result that was exponentially larger than any classical computer could produce. This marked the beginning of the quest for the quantum computer that is running red-hot today.
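To make the interference concrete, here is a minimal Python sketch of the modern textbook form of Deutsch's algorithm, which may differ in detail from the 1985 formulation. The function names `hadamard_on`, `oracle`, and `deutsch` are illustrative choices and do not come from any library. A single oracle call decides whether a one-bit function f is constant or balanced, where a classical test would need two evaluations of f.

```python
import math

def hadamard_on(state, qubit):
    """Apply a Hadamard gate to one qubit of a 2-qubit state (index = 2*x + y)."""
    s = 1 / math.sqrt(2)
    new = [0.0] * 4
    for idx, amp in enumerate(state):
        x, y = idx >> 1, idx & 1
        bit = x if qubit == 0 else y
        for new_bit in (0, 1):
            # The only negative entry of the Hadamard matrix is <1|H|1>.
            sign = -1 if (bit == 1 and new_bit == 1) else 1
            nx, ny = (new_bit, y) if qubit == 0 else (x, new_bit)
            new[2 * nx + ny] += sign * s * amp
    return new

def oracle(state, f):
    """Apply U_f: |x>|y> -> |x>|y XOR f(x)>."""
    new = [0.0] * 4
    for idx, amp in enumerate(state):
        x, y = idx >> 1, idx & 1
        new[2 * x + (y ^ f(x))] += amp
    return new

def deutsch(f):
    """Classify f: {0,1} -> {0,1} as constant or balanced with one oracle call."""
    state = [0.0, 1.0, 0.0, 0.0]      # start in |x=0>|y=1>
    state = hadamard_on(state, 0)
    state = hadamard_on(state, 1)
    state = oracle(state, f)          # the single query
    state = hadamard_on(state, 0)     # interfere the two branches
    p_one = state[2] ** 2 + state[3] ** 2   # probability that x is measured as 1
    return "balanced" if p_one > 0.5 else "constant"

results = {
    "f(x)=0": deutsch(lambda x: 0),
    "f(x)=1": deutsch(lambda x: 1),
    "f(x)=x": deutsch(lambda x: x),
    "f(x)=1-x": deutsch(lambda x: 1 - x),
}
```

The final Hadamard is where the two branches interfere: for a constant f the amplitudes for x = 1 cancel, and for a balanced f the amplitudes for x = 0 cancel.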
Deutsch’s “proof” of the many-worlds interpretation of quantum mechanics is not a mathematical proof but is rather a philosophical proof. It holds no sway over how physicists do the math to make their predictions. The Copenhagen interpretation, with its “spooky” instantaneous wavefunction collapse, works just fine predicting the outcome of quantum algorithms and the exponential quantum advantage of quantum computing. Therefore, the story of David Deutsch and the MWI may seem like a chimera—except for one fact—it inspired him to generate the first quantum algorithm that launched what may be the next revolution in the information revolution of modern society. Inspiration is important in science, because it lets scientists create things that had been impossible before.
But if quantum interference is the heart of quantum computing, then there is one physical system that has the ultimate simplicity that may yet inspire future generations of physicists to invent future impossible things—the quantum beam splitter. Nothing in the study of quantum interference can be simpler than a sliver of dielectric material sending single photons one way or another. Yet the outcome of this simple system challenges the mind and reminds us of why Everett and Deutsch embraced the MWI in the first place.
The Classical Beam Splitter
The so-called “beam splitter” is actually a misnomer. Its name implies that it takes a light beam and splits it into two, as if there is only one input. But every “beam splitter” has two inputs, which is clear by looking at the classical 50/50 beam splitter shown in Fig. 1. The actual action of the optical element is the combination of beams into superpositions in each of the outputs. It is only when one of the input fields is zero, a special case, that the optical element acts as a beam splitter. In general, it is a beam combiner.
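Given two input fields $E_1$ and $E_2$, the output fields are superpositions of the inputs. In one common phase convention for a symmetric 50/50 beam splitter (assumed here; other conventions place the phase factors differently but are physically equivalent),
$$E_1^{out} = \frac{1}{\sqrt{2}}\left(E_1 + iE_2\right), \qquad E_2^{out} = \frac{1}{\sqrt{2}}\left(iE_1 + E_2\right)$$
The square-root of two factor ensures that energy is conserved, because optical fluence is the square of the fields. This relation is expressed more succinctly as a matrix input-output relation
$$\begin{pmatrix} E_1^{out} \\ E_2^{out} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} E_1 \\ E_2 \end{pmatrix}$$
The phase factors in these equations ensure that the matrix is unitary
$$U^{\dagger}U = \frac{1}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
reflecting energy conservation.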
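As a quick numerical check of the combiner picture, the sketch below builds a 50/50 beam-splitter matrix and verifies unitarity and energy conservation. The symmetric convention with a factor of i on reflection is an assumption (other equivalent conventions exist), and the helper names are made up for this example.

```python
import math

# Assumed convention: symmetric 50/50 beam splitter, factor of i on reflection.
S = 1 / math.sqrt(2)
BS = [[S, 1j * S],
      [1j * S, S]]

def apply(matrix, fields):
    """Multiply a 2x2 matrix by a column of two complex field amplitudes."""
    return [matrix[r][0] * fields[0] + matrix[r][1] * fields[1] for r in range(2)]

def dagger_times(matrix):
    """Return (matrix^dagger) * matrix for a 2x2 complex matrix."""
    return [[sum(matrix[k][r].conjugate() * matrix[k][c] for k in range(2))
             for c in range(2)] for r in range(2)]

def energy(fields):
    """Total energy, proportional to the sum of squared field magnitudes."""
    return sum(abs(e) ** 2 for e in fields)

identity_check = dagger_times(BS)

# Special case: one dark input, so the element acts as a "splitter".
single = apply(BS, [1.0, 0.0])

# General case: two inputs with a 90-degree relative phase combine into one port.
both = apply(BS, [1.0, 1j])
```

With one input dark, each output carries half the energy. With both inputs lit and the right relative phase, all of the light leaves through a single port, which is the sense in which the element is really a beam combiner.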
The Quantum Beam Splitter
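A quantum beam splitter is just a classical beam splitter operating at the level of individual photons. Rather than describing single photons entering or leaving the beam splitter, it is more practical to describe the properties of the fields through single-photon quantum operators
$$\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$$
where the unitary matrix is the same as the classical case (written here in the symmetric convention with a factor of i on reflection, which is an assumed choice), but with fields replaced by the famous “a” operators: $a_1, a_2$ for the input modes and $b_1, b_2$ for the output modes. The photon operators operate on single photon modes. For instance, the two one-photon input cases are
$$|1,0\rangle = a_1^{\dagger}|0,0\rangle, \qquad |0,1\rangle = a_2^{\dagger}|0,0\rangle$$
where the creation operators operate on the vacuum state in each of the input modes.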
The fundamental combinational properties of the beam splitter are even more evident in the quantum case, because there is no such thing as a single input to a quantum beam splitter. Even if no photons are directed into one of the input ports, that port still receives a “vacuum” input, and this vacuum input contributes to the fluctuations observed in the outputs.
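The input-output relations for the quantum beam splitter are (in the symmetric convention with a factor of i on reflection, assumed here)
$$a_1^{\dagger} \rightarrow \frac{1}{\sqrt{2}}\left(b_1^{\dagger} + i\,b_2^{\dagger}\right), \qquad a_2^{\dagger} \rightarrow \frac{1}{\sqrt{2}}\left(i\,b_1^{\dagger} + b_2^{\dagger}\right)$$
The beam splitter operating on a one-photon input converts the input-mode creation operator into a superposition of output-mode creation operators that generates
$$a_1^{\dagger}|0,0\rangle \rightarrow \frac{1}{\sqrt{2}}\left(b_1^{\dagger} + i\,b_2^{\dagger}\right)|0,0\rangle = \frac{1}{\sqrt{2}}\left(|1,0\rangle + i\,|0,1\rangle\right)$$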
The resulting output is entangled: either the single photon exits one port, or it exits the other. In the many worlds interpretation, the photon exits from one port in one universe, and it exits from the other port in a different universe. On the other hand, in the Copenhagen interpretation, the two output ports of the beam splitter are perfectly anti-correlated.
The Hong-Ou-Mandel (HOM) Interferometer
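When more than one photon is incident on a beam splitter, the fascinating effects of quantum interference come into play, creating unexpected outputs for simple inputs. The simplest example is a two-photon input where a single photon is present in each input port of the beam splitter. The input state is represented with single creation operators operating on each vacuum state of each input port
$$|1,1\rangle = a_1^{\dagger}a_2^{\dagger}|0,0\rangle$$
creating a single photon in each of the input ports. The beam splitter operates on this input state by converting the input-mode creation operators into output-mode creation operators to give (in the symmetric convention with a factor of i on reflection, assumed here)
$$a_1^{\dagger}a_2^{\dagger}|0,0\rangle \rightarrow \frac{1}{2}\left(b_1^{\dagger} + i\,b_2^{\dagger}\right)\left(i\,b_1^{\dagger} + b_2^{\dagger}\right)|0,0\rangle$$
$$= \frac{1}{2}\left(i\,b_1^{\dagger}b_1^{\dagger} + b_1^{\dagger}b_2^{\dagger} - b_2^{\dagger}b_1^{\dagger} + i\,b_2^{\dagger}b_2^{\dagger}\right)|0,0\rangle$$
$$= \frac{i}{\sqrt{2}}\left(|2,0\rangle + |0,2\rangle\right)$$
The important step in this process is the middle line of the equations: there is perfect destructive interference between the two single-photon operations, because the two cross terms cancel. Therefore, both photons always exit the beam splitter from the same port and are never split. Furthermore, the output is an entangled two-photon state, once more splitting universes.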
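The cancellation can be checked with a few lines of Python. This is only a sketch: it assumes the symmetric beam-splitter convention (a factor of i on reflection), and it stores polynomials in the commuting output creation operators as dictionaries, a representation invented for this example.

```python
import math
from collections import defaultdict

S = 1 / math.sqrt(2)

# Assumed convention: a1^dag -> (b1^dag + i b2^dag)/sqrt(2),
#                     a2^dag -> (i b1^dag + b2^dag)/sqrt(2).
# A polynomial in b1^dag, b2^dag is stored as
# {(power of b1^dag, power of b2^dag): complex coefficient}.
A1_DAG = {(1, 0): S, (0, 1): 1j * S}
A2_DAG = {(1, 0): 1j * S, (0, 1): S}

def multiply(p, q):
    """Multiply two polynomials in the commuting operators b1^dag and b2^dag."""
    out = defaultdict(complex)
    for (m1, m2), c in p.items():
        for (n1, n2), d in q.items():
            out[(m1 + n1, m2 + n2)] += c * d
    return dict(out)

def fock_amplitudes(poly):
    """Act on the vacuum, using (b^dag)^n |0> = sqrt(n!) |n>."""
    return {k: c * math.sqrt(math.factorial(k[0]) * math.factorial(k[1]))
            for k, c in poly.items()}

# One photon in each input port.
hom = fock_amplitudes(multiply(A1_DAG, A2_DAG))
probs = {k: abs(a) ** 2 for k, a in hom.items()}
```

The key `(1, 1)` is the coincidence outcome with one photon in each output port. Its amplitude is the sum of a "both transmitted" term and a "both reflected" term, and those two terms have opposite signs.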
The two-photon interference experiment was performed in 1987 by Chung Ki Hong and Jeff Ou, students of Leonard Mandel at the Optics Institute at the University of Rochester [3], and this two-photon operation of the beam splitter is now called the HOM interferometer. The HOM interferometer has become a centerpiece for optical and photonic implementations of quantum information processing and quantum computers.
N-Photons on a Beam Splitter
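Of course, any number of photons can be input into a beam splitter. For example, take the N-photon input state
$$|N,0\rangle = \frac{1}{\sqrt{N!}}\left(a_1^{\dagger}\right)^{N}|0,0\rangle$$
The beam splitter acting on this state produces (in the symmetric convention with a factor of i on reflection, assumed here)
$$\frac{1}{\sqrt{N!}}\left(a_1^{\dagger}\right)^{N}|0,0\rangle \rightarrow \frac{1}{\sqrt{N!}}\,\frac{1}{2^{N/2}}\left(b_1^{\dagger} + i\,b_2^{\dagger}\right)^{N}|0,0\rangle$$
The quantity on the right hand side can be re-expressed using the binomial theorem
$$\left(b_1^{\dagger} + i\,b_2^{\dagger}\right)^{N} = \sum_{k=0}^{N}\binom{N}{k}\left(b_1^{\dagger}\right)^{N-k}\left(i\,b_2^{\dagger}\right)^{k}$$
where the permutations are defined by the binomial coefficient
$$\binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$
The output state is given by
$$\frac{1}{2^{N/2}}\sum_{k=0}^{N} i^{k}\sqrt{\binom{N}{k}}\;|N-k,\,k\rangle$$
which is a “super” entangled state composed of N+1 multi-photon states (one for each value of k from 0 to N), involving N+1 different universes.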
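For N photons entering one port of a 50/50 beam splitter, the probability of finding k of them in a given output port is the binomial coefficient divided by 2^N, with k running from 0 to N; the phase factors drop out when the amplitudes are squared. A minimal sketch (the function name is illustrative):

```python
import math

def output_distribution(n_photons):
    """P(k photons in one output port) for the input |N,0>: C(N,k) / 2^N."""
    return [math.comb(n_photons, k) / 2 ** n_photons
            for k in range(n_photons + 1)]

dist = output_distribution(4)
# dist -> [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

This is the same distribution as N independent coin flips, so the photon statistics alone look classical. The quantum character lies in the fixed phase relations among the terms of the superposition.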
Surprisingly, there is a multi-photon input state that generates a non-entangled output, as if the input states were simply classical fields. These are the so-called coherent states, introduced by Glauber and Sudarshan [4, 5]. Coherent states can be described as superpositions of multi-photon states, but when a beam splitter operates on these superpositions, the outputs are simply 50/50 mixtures of the states. For instance, if the input coherent states are denoted by $\alpha$ and $\beta$, then the output states after the beam splitter are (in the symmetric convention with a factor of i on reflection, assumed here)
$$|\alpha\rangle_1|\beta\rangle_2 \rightarrow \left|\frac{\alpha + i\beta}{\sqrt{2}}\right\rangle_1\left|\frac{i\alpha + \beta}{\sqrt{2}}\right\rangle_2$$
This output is factorized and hence is NOT entangled. This is one of the many reasons why coherent states in quantum optics are considered the “most classical” of quantum states. In this case, a quantum beam splitter operates on the inputs just as if they were classical fields.
[1] D. Deutsch, “Quantum theory, the Church-Turing principle and the universal quantum computer,” Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 400, no. 1818, pp. 97-117 (1985)
[2] D. Deutsch and R. Jozsa, “Rapid solution of problems by quantum computation,” Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 439, no. 1907, pp. 553-558 (1992)
[3] C. K. Hong, Z. Y. Ou, and L. Mandel, “Measurement of subpicosecond time intervals between two photons by interference,” Physical Review Letters, vol. 59, no. 18, pp. 2044-2046 (1987)
[4] R. J. Glauber, “Photon correlations,” Physical Review Letters, vol. 10, no. 3, p. 84 (1963)
[5] E. C. G. Sudarshan, “Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams,” Physical Review Letters, vol. 10, no. 7, p. 277 (1963); C. L. Mehta and E. C. G. Sudarshan, “Relation between quantum and semiclassical description of optical coherence,” Physical Review, vol. 138, no. 1B, p. B274 (1965)
If you are a fan of the Doppler effect, then time trials at the Indy 500 Speedway will floor you. Even if you have experienced the fall in pitch of a passing train whistle while stopped in your car at a railroad crossing, or heard the falling whine of a jet passing overhead, I can guarantee that you have never heard anything like an Indy car passing you by at 225 miles an hour.
Indy 500 Time Trials and the Doppler Effect
The Indy 500 time trials are the best way to experience the effect, rather than on race day when there is so much crowd noise and the overlapping sounds of all the cars. During the week before the race, the cars go out on the track, one by one, in time trials to decide the starting order in the pack on race day. Fans are allowed to wander around the entire complex, so you can get right up to the fence at track level on the straight-away. The cars go by only thirty feet away, so they are coming almost straight at you as they approach and straight away from you as they leave. The whine of the car as it approaches is about 43% higher than when it is standing still, and it drops to about 23% lower than the standing frequency, a ratio approaching a factor of two. And they go past so fast, it is almost a step function, going from a steady high note to a steady low note in less than a second. That is the Doppler effect!
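The size of the pitch change follows from the moving-source Doppler formula, f = f0 / (1 ∓ v/c), for a listener standing still. The sketch below assumes a speed of sound of 343 m/s (air at roughly 20 °C) and a car moving directly toward or away from the listener; both are simplifications, and the function name is illustrative.

```python
MPH_TO_MS = 0.44704          # exact conversion from miles per hour to m/s

def moving_source_ratio(source_speed, sound_speed, approaching):
    """Observed/emitted frequency for a moving source and a stationary listener."""
    beta = source_speed / sound_speed
    return 1 / (1 - beta) if approaching else 1 / (1 + beta)

car_speed = 225 * MPH_TO_MS  # about 100.6 m/s
sound_speed = 343.0          # assumed value for air near 20 degrees C

up = moving_source_ratio(car_speed, sound_speed, approaching=True)
down = moving_source_ratio(car_speed, sound_speed, approaching=False)
pitch_ratio = up / down      # high note divided by low note
```

Under these assumptions the approaching pitch is a little over 40% high, the receding pitch is roughly 23% low, and the ratio between the two notes is about 1.8, which is most of an octave. A slightly lower speed of sound (cooler air) pushes the rise toward 43%.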
But as obvious as the acoustic Doppler effect is to us today, it was far from obvious when it was proposed in 1842 by Christian Doppler at a time when trains, the fastest mode of transport at the time, ran at 20 miles per hour or less. In fact, Doppler’s theory generated so much controversy that the Academy of Sciences of Vienna held a trial in 1853 to decide its merit—and Doppler lost! For the surprising story of Doppler and the fate of his discovery, see my Physics Today article.
From that fraught beginning, the effect has grown so much in importance that today it is a daily part of our lives. From Doppler weather radar, to speed traps on the highway, to ultrasound images of babies: Doppler is everywhere.
Development of the Doppler-Fizeau Effect
When Doppler proposed the shift in color of the light from stars in 1842 [1], depending on their motion towards or away from us, he may have been inspired by his walk to work every morning, watching the ripples on the surface of the Vltava River in Prague as the water slipped by the bridge piers. The drawings in his early papers are reminiscent of the patterns you see there, with ripples compressed on the upstream side of the pier and stretched out on the downstream side. Taking this principle to the night sky, Doppler proposed that the colors of binary stars, where one companion was blue and the other was red, were caused by their relative motion. He could not have known at that time that typical binary star speeds were too small to cause this effect, but his principle was far more general, applying to all wave phenomena.
Six years later in 1848 [2], the French physicist Armand Hippolyte Fizeau, soon to be famous for making the first direct measurement of the speed of light, proposed the same principle, unaware of Doppler’s publications in German. As Fizeau was preparing his famous measurement, he originally worked with a spinning mirror (he would ultimately use a toothed wheel instead) and was thinking about what effect the moving mirror might have on the reflected light. He considered the effect of star motion on starlight, just as Doppler had, but realized that it was more likely that the speed of the star would affect the locations of the spectral lines rather than change the color. This is in fact the correct argument, because a Doppler shift on the black-body spectrum of a white or yellow star shifts a bit of the infrared into the visible red portion, while shifting a bit of the ultraviolet out of the visible, so that the overall color of the star remains the same, but Fraunhofer lines would shift in the process. Because of the independent development of the phenomenon by both Doppler and Fizeau, and because Fizeau was a bit clearer in the consequences, the effect is more accurately called the Doppler-Fizeau Effect, and in France sometimes only as the Fizeau Effect. Here in the US, we tend to forget the contributions of Fizeau, and it is all Doppler.
Doppler and Exoplanet Discovery
It is fitting that many of today’s applications of the Doppler effect are in astronomy. His original idea on binary star colors was wrong, but his idea that relative motion changes frequencies was right, and it has become one of the most powerful astrometric techniques in astronomy today. One of its important recent applications was in the discovery of extrasolar planets orbiting distant stars.
When a large planet like Jupiter orbits a star, the center of mass of the two-body system remains at a constant point, but the individual centers of mass of the planet and the star both orbit the common point. This makes it look like the star has a wobble, first moving towards our viewpoint on Earth, then moving away. Because of this relative motion of the star, its light appears alternately blueshifted and redshifted by the Doppler effect with a set periodicity. This was observed by Mayor and Queloz in 1995 for the star 51 Pegasi, which represented the first detection of an exoplanet orbiting a Sun-like star [3]. The duo won the Nobel Prize in 2019 for the discovery.
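To get a feel for how small the stellar wobble signal is, the sketch below evaluates the non-relativistic Doppler shift Δλ = λ v / c. The radial-velocity amplitude of 56 m/s is an assumed round figure of the order reported for 51 Pegasi, and the 550 nm line is simply a representative visible wavelength.

```python
C = 299_792_458.0            # speed of light in m/s

def doppler_shift(wavelength, radial_velocity):
    """Non-relativistic wavelength shift; positive velocity means receding (redshift)."""
    return wavelength * radial_velocity / C

wobble_speed = 56.0          # m/s, assumed amplitude of the stellar wobble
line = 550e-9                # m, a representative visible spectral line

shift = doppler_shift(line, wobble_speed)
fractional = shift / line    # equals v / c
```

The fractional shift is about two parts in ten million, or roughly a ten-thousandth of a nanometer at visible wavelengths, which is why the detection required very stable, high-resolution spectrographs.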
Doppler and Vera Rubin’s Galaxy Velocity Curves
In the late 1960’s and early 1970’s Vera Rubin at the Carnegie Institution of Washington used newly developed spectrographs and the Doppler effect to study the speeds of ionized hydrogen gas surrounding massive stars in individual galaxies [4]. From simple Newtonian dynamics it is well understood that the speed of stars as a function of distance from the galactic center should increase with increasing distance up to the average radius of the galaxy, and then should decrease at larger distances. This trend in speed as a function of radius is called a rotation curve. As Rubin constructed the rotation curves for many galaxies, the increase of speed with increasing radius at small radii emerged as a clear trend, but the stars farther out in the galaxies were all moving far too fast. In fact, they were moving so fast that they exceeded escape velocity and should have flown off into space long ago. This disturbing pattern was repeated consistently in one rotation curve after another for many galaxies.
A simple fix to the problem of the rotation curves is to assume that there is significant mass present in every galaxy that is not observable either as luminous matter or as interstellar dust. In other words, there is unobserved matter, dark matter, in all galaxies that keeps all their stars gravitationally bound. Estimates of the amount of dark matter needed to fix the velocity curves are about five times the amount of observable matter. In short, roughly 80% of the mass of a galaxy is not normal. It is neither a perturbation nor an artifact, but something fundamental and large. The discovery of the rotation curve anomaly by Rubin using the Doppler effect stands as one of the strongest pieces of evidence for the existence of dark matter.
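The argument can be made quantitative with Newtonian circular orbits, v² = G M(r) / r. The sketch below compares the falling speed expected outside a fixed central mass with the enclosed mass needed to keep the speed flat. The 220 km/s speed and the radii are illustrative values typical of a Milky Way-like galaxy, not Rubin's data.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.0857e19      # one kiloparsec in meters

def keplerian_speed(mass, radius):
    """Circular speed outside a fixed enclosed mass: falls as 1/sqrt(r)."""
    return math.sqrt(G * mass / radius)

def enclosed_mass(speed, radius):
    """Mass that must lie inside the radius to sustain a given circular speed."""
    return speed ** 2 * radius / G

flat_speed = 220e3                                   # m/s, assumed flat rotation speed
m_10 = enclosed_mass(flat_speed, 10 * KPC)           # mass inside 10 kpc
m_20 = enclosed_mass(flat_speed, 20 * KPC)           # mass inside 20 kpc

# If no mass were added beyond 10 kpc, the speed at 20 kpc would drop instead.
expected_at_20 = keplerian_speed(m_10, 20 * KPC)
```

A flat curve means the enclosed mass keeps growing in proportion to radius, even where the starlight has faded away. Without that extra mass, the speed at twice the radius would be lower by a factor of the square root of two.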
There is so much dark matter in the Universe that it must have a major effect on the overall curvature of space-time according to Einstein’s field equations. One of the best probes of the large-scale structure of the Universe is the afterglow of the Big Bang, known as the cosmic microwave background (CMB).
Doppler and the Big Bang
The Big Bang was astronomically hot, but as the Universe expanded it cooled. About 380,000 years after the Big Bang, the Universe cooled sufficiently that the electron-proton plasma that filled space at that time condensed into hydrogen. Plasma is charged and opaque to photons, while hydrogen is neutral and transparent. Therefore, when the hydrogen condensed, the thermal photons suddenly flew free and have continued unimpeded, continuing to cool. Today the thermal glow has reached about three degrees above absolute zero. Photons in thermal equilibrium with this low temperature have an average wavelength of a few millimeters corresponding to microwave frequencies, which is why the afterglow of the Big Bang got its name: the Cosmic Microwave Background (CMB).
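Two standard relations connect these numbers: the radiation temperature falls as 1/(1+z) as the Universe expands, and Wien's displacement law gives the wavelength at which a black-body spectrum peaks. The sketch below uses round assumed values of about 3000 K and z ≈ 1100 for the moment the photons were released.

```python
WIEN_B = 2.897771955e-3      # Wien displacement constant, m K

def temperature_today(temperature_then, redshift):
    """Black-body temperature after cosmological expansion by a factor (1 + z)."""
    return temperature_then / (1 + redshift)

def wien_peak(temperature):
    """Wavelength (m) at which the black-body spectrum peaks."""
    return WIEN_B / temperature

t_now = temperature_today(3000.0, 1100.0)   # assumed round inputs
peak_now = wien_peak(2.725)                 # measured CMB temperature in kelvin
peak_then = wien_peak(3000.0)               # near-infrared at release
```

With these inputs the temperature today comes out near 2.7 K, the peak wavelength is close to one millimeter, and the same light peaked at about a micron when it was released.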
Not surprisingly, the CMB has no preferred reference frame, because every point in space is expanding relative to every other point in space. In other words, space itself is expanding. Yet soon after the CMB was discovered by Arno Penzias and Robert Wilson (for which they were awarded the Nobel Prize in Physics in 1978), an anisotropy was discovered in the background that had a dipole symmetry caused by the Doppler effect as the Solar System moves at 368±2 km/sec relative to the rest frame of the CMB. Our direction is towards galactic longitude 263.85° and latitude 48.25°, or a bit southwest of Virgo. Interestingly, the local group of about 100 galaxies, of which the Milky Way and Andromeda are the largest members, is moving at 627±22 km/sec in the direction of galactic longitude 276° and latitude 30°. Therefore, it seems like we are a bit slack in our speed compared to the rest of the local group. This is in part because we are being pulled towards Andromeda in roughly the opposite direction, but also because of the speed of the solar system in our Galaxy.
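To first order in v/c, motion through a black-body background makes the sky look warmer ahead and cooler behind, T(θ) ≈ T0 (1 + (v/c) cos θ). A small sketch using the 368 km/sec speed quoted above and the measured mean temperature of 2.725 K (function names are illustrative):

```python
import math

C_KM_S = 299_792.458         # speed of light in km/s

def dipole_amplitude(mean_temperature, speed_km_s):
    """First-order Doppler dipole amplitude: T0 * v / c."""
    return mean_temperature * speed_km_s / C_KM_S

def sky_temperature(mean_temperature, speed_km_s, angle_rad):
    """Temperature seen at an angle from the direction of motion (first order)."""
    return mean_temperature * (1 + speed_km_s / C_KM_S * math.cos(angle_rad))

amplitude = dipole_amplitude(2.725, 368.0)           # kelvin
ahead = sky_temperature(2.725, 368.0, 0.0)
behind = sky_temperature(2.725, 368.0, math.pi)
```

The amplitude is a few millikelvin, about a part in a thousand of the mean temperature, which makes the dipole far larger than the intrinsic fluctuations of the background.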
Aside from the dipole anisotropy, the CMB is amazingly uniform when viewed from any direction in space, but not perfectly uniform. At the level of 0.005 percent, there are variations in the temperature depending on the location on the sky. These fluctuations in background temperature are called the CMB anisotropy, and they help interpret current models of the Universe. For instance, the average angular size of the fluctuations is related to the overall curvature of the Universe. This is because, in the early Universe, not all parts of it were in communication with each other. This set an original spatial size to thermal discrepancies. As the Universe continued to expand, the size of the regional variations expanded with it, and the sizes observed today would appear larger or smaller, depending on how the universe is curved. Therefore, to measure the energy density of the Universe, and hence to find its curvature, required measurements of the CMB temperature that were accurate to better than a part in 10,000.
Equivalently, parts of the early universe had greater mass density than others, causing the gravitational infall of matter towards these regions. Then, through the Doppler effect, light emitted (or scattered) by matter moving towards these regions contributes to the anisotropy, producing what are known as “Doppler peaks” in the spatial frequency spectrum of the CMB anisotropy.
The examples discussed in this blog (exoplanet discovery, galaxy rotation curves, and cosmic background) are just a small sampling of the many ways that the Doppler effect is used in Astronomy. But clearly, Doppler has played a key role in the long history of the universe.
[1] C. A. Doppler, “Über das farbige Licht der Doppelsterne und einiger anderer Gestirne des Himmels (About the coloured light of the binary stars and some other stars of the heavens),” Proceedings of the Royal Bohemian Society of Sciences, vol. V, no. 2, pp. 465–482 (1842, reissued 1903)
[2] H. Fizeau, “Acoustique et optique,” presented at the Société Philomathique de Paris, Paris (1848)
[3] M. Mayor and D. Queloz, “A Jupiter-mass companion to a solar-type star,” Nature, vol. 378, no. 6555, pp. 355-359 (1995)
[4] V. Rubin and W. K. Ford, Jr., “Rotation of the Andromeda Nebula from a spectroscopic survey of emission regions,” The Astrophysical Journal, vol. 159, p. 379 (1970)
In the epilog of my book Mind at Light Speed: A New Kind of Intelligence (Free Press, 2001), I speculated about a future computer in which sheets of light interact with others to form new meanings and logical cascades as light makes decisions in a form of all-optical intelligence.
Twenty years later, that optical computer seems vaguely quaint, not because new technology has passed it by, like looking at the naïve musings of Jules Verne from our modern vantage point, but because the optical computer seems almost as far away now as it did back in 2001.
At the turn of the Millennium we were seeing tremendous advances in data rates on fiber optics (see my previous Blog) as well as the development of new types of nonlinear optical devices and switches that served the role of rudimentary logic switches. At that time, it was not unreasonable to believe that the pace of progress would remain undiminished, and that by 2020 we would have all-optical computers and signal processors in which the same optical data on the communication fibers would be involved in the logic that told the data what to do and where to go, all without the wasteful and slow conversion to electronics and back again into photons: the infamous OEO conversion.
However, the rate of increase of the transmission bandwidth on fiber optic cables slowed not long after the publication of my book, and nonlinear optics today still needs high intensities to be efficient, which remains a challenge for significant (commercial) use of all-optical logic.
That said, it’s dangerous to ever say never, and research into all-optical computing and data processing is still going strong (See Fig. 1). It’s not the dream that was wrong, it was the time-scale that was wrong, just like fiber-to-the-home. Back in 2001, fiber-to-the-home was viewed as a pipe-dream by serious technology scouts. It took twenty years, but now that vision is coming true in urban settings. Back in 2001, all-optical computing seemed about 20 years away, but now it still looks 20 years out. Maybe this time the prediction is right. Recent advances in all-optical processing give some hope for it. Here are some of those advances.
The “What” and “Why” of All-Optical Processing
One of the great dreams of photonics is the use of light beams to perform optical logic in optical processors just as electronic currents perform electronic logic in transistors and integrated circuits.
Our information age, starting with the telegraph in the mid-1800’s, has been built upon electronics because the charge of the electron makes it a natural decision maker. Two charges attract or repel by Coulomb’s Law, exerting forces upon each other. Although we don’t think of currents acting in quite that way, the foundation of electronic logic remains electrical interactions.
But with these interactions also come constraints—constraining currents to be contained within wires, waiting for charging times that slow down decisions, managing electrical resistance and dissipation that generate heat (computer processing farms in some places today need to be cooled by glacier meltwater). Electronic computing is hardly a green technology.
Therefore, the advantages of optical logic are clear: broadcasting information without the need for expensive copper wires, little dissipation or heat, low latency (signals propagate at the speed of light). Furthermore, information on the internet is already in the optical domain, so why not keep it in the optical domain and have optical information packets making the decisions? All the routing and switching decisions about where optical information packets should go could be done by the optical packets themselves inside optical computers.
But there is a problem. Photons in free space don’t interact—they pass through each other unaffected. This is the opposite of what is needed for logic and decision making. The challenge of optical logic is then to find a way to get photons to interact.
Think of the scene in Star Wars: A New Hope when Obi-Wan Kenobi and Darth Vader battle to the death in a light saber duel: beams of light crashing against each other and repelling each other with equal and opposite forces. This is the photonic engineer’s dream! Light controlling light. But this cannot happen in free space. On the other hand, light beams can control other light beams inside nonlinear crystals where one light beam changes the optical properties of the crystal, hence changing how another light beam travels through it. These are nonlinear optical crystals.
Virtually all optical control designs, for any kind of optical logic or switch, require one light beam to affect the properties of another, and that requires an intervening medium that has nonlinear optical properties. The physics of nonlinear optics is actually simple: one light beam changes the electronic structure of a material which affects the propagation of another (or even the same) beam. The key parameter is the nonlinear coefficient that determines how intense the control beam needs to be to produce a significant modulation of the other beam. This is where the challenge is. Most materials have very small nonlinear coefficients, and the intensity of the control beam usually must be very high.
Therefore, to create low-power all-optical logic gates and switches there are four main design principles: 1) increase the nonlinear susceptibility by engineering the material, 2) increase the interaction length between the two beams, 3) concentrate light into small volumes, and 4) introduce feedback to boost the internal light intensities. Let’s take these points one at a time.
Nonlinear susceptibility: The key to getting stronger interaction of light with light is in the ease with which a control beam of light can distort the crystal so that the optical conditions change for a signal beam. This is called the nonlinear susceptibility. When working with “conventional” crystals like semiconductors (e.g. CdZnSe) or ferroelectric oxides (e.g. LiNbO3), there is only so much engineering that is possible to try to tweak the nonlinear susceptibilities. However, artificially engineered materials can offer significant increases in nonlinear susceptibilities; these include plasmonic materials, metamaterials, organic semiconductors, and photonic crystals. An increasingly important class of nonlinear optical devices is the semiconductor optical amplifier (SOA).
Interaction length: The interaction strength between two light waves is a product of the nonlinear polarization and the length over which the waves interact. Interaction lengths can be made relatively long in waveguides but can be made orders of magnitude longer in fibers. Therefore, nonlinear effects in fiber optics are a promising avenue for achieving optical logic.
Intensity Concentration: Nonlinear polarization is the product of the nonlinear susceptibility with the field amplitude of the waves. Therefore, focusing light down to small cross sections produces high intensity for a given power, as in the core of an optical fiber, again showing advantages of fibers for optical logic implementations.
Feedback: Feedback, as in a standing-wave cavity, increases the intensity as well as the effective interaction length by folding the light wave continually back on itself. Both of these effects boost the nonlinear interaction, but then there is an additional benefit: interferometry. Cavities, like a Fabry-Perot, are interferometers in which a slight change in the round-trip phase can produce large changes in output light intensity. This is an optical analog to a transistor in which a small control current acts as a gate for an exponential signal current. The feedback in the cavity of a semiconductor optical amplifier (SOA), with high internal intensities and long effective interaction lengths and an active medium with strong nonlinearity make these elements attractive for optical logic gates. Similarly, integrated ring resonators have the advantage of interferometric control for light switching. Many current optical switches and logic gates are based on SOAs and integrated ring resonators.
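Two of these design principles can be put into rough numbers. The sketch below evaluates the Kerr-effect phase shift, φ = 2π n2 I L / λ, which grows with both intensity and interaction length, and the Airy transmission of an ideal lossless Fabry-Perot cavity, which converts a small phase change into a large intensity change. The nonlinear index n2 ≈ 2.6e-20 m²/W is an assumed typical value for silica fiber, and the power, core area, and mirror reflectivity are illustrative.

```python
import math

def kerr_phase(n2, intensity, length, wavelength):
    """Nonlinear phase shift (radians) from the optical Kerr effect."""
    return 2 * math.pi * n2 * intensity * length / wavelength

def airy_transmission(reflectivity, round_trip_phase):
    """Transmission of an ideal lossless Fabry-Perot cavity (Airy function)."""
    finesse_coeff = 4 * reflectivity / (1 - reflectivity) ** 2
    return 1 / (1 + finesse_coeff * math.sin(round_trip_phase / 2) ** 2)

# Long interaction length: 1 W in a 50 square-micron fiber core over 1 km.
n2_silica = 2.6e-20                  # m^2/W, assumed typical value
intensity = 1.0 / 50e-12             # W/m^2
fiber_phase = kerr_phase(n2_silica, intensity, 1000.0, 1.55e-6)

# Feedback: a 0.1 rad phase change in a cavity with 90% mirrors.
on_resonance = airy_transmission(0.9, 0.0)
detuned = airy_transmission(0.9, 0.1)
```

Even with the tiny nonlinear index of glass, a kilometer of fiber accumulates a phase shift of about two radians at a power of one watt. In the cavity, a phase change of only a tenth of a radian cuts the transmission roughly in half, which is the gate-like behavior that makes interferometric feedback attractive for switching.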
The vision of the all-optical internet, where the logic operations that direct information to different locations are all performed by optical logic without ever converting into the electrical domain, is facing a barrier that is as challenging to overcome today as it was back in 2001: all-optical regeneration. All-optical regeneration has been and remains the Achilles heel of the all-optical internet.
Signal regeneration is currently performed through OEO conversion: Optical-to-Electronic-to-Optical. In OEO conversion, a distorted signal (distortion is caused by attenuation and dispersion and noise as signals travel down fiber optics) is received by a photodetector and interpreted as ones and zeros, which then drive laser light sources that launch the optical pulses down the next stretch of fiber. The new pulses are virtually perfect, but they again degrade as they travel, until they are regenerated, and so on. The added advantage of the electrical layer is that the electronic signals can be used to drive conventional electronic logic for switching.
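The regenerate-and-relaunch cycle can be mimicked with a toy model. This is only a sketch: the attenuation factor, Gaussian noise level, decision threshold, and function names are all assumptions chosen for illustration, and dispersion and timing jitter are ignored.

```python
import random

def transmit(pulses, attenuation, noise_sigma, rng):
    """Degrade pulses over one fiber span: attenuate and add Gaussian noise."""
    return [attenuation * p + rng.gauss(0.0, noise_sigma) for p in pulses]

def regenerate(received, threshold, launch_level=1.0):
    """Decide each bit electronically, then relaunch a clean optical pulse."""
    return [launch_level if r > threshold else 0.0 for r in received]

rng = random.Random(0)               # fixed seed so the run is repeatable
bits = [1, 0, 1, 1, 0, 0, 1, 0]
pulses = [float(b) for b in bits]

for _ in range(3):                   # three spans, each followed by a regenerator
    degraded = transmit(pulses, attenuation=0.5, noise_sigma=0.02, rng=rng)
    pulses = regenerate(degraded, threshold=0.25)

recovered = [int(p) for p in pulses]
```

Because each regenerator makes a hard decision and launches fresh pulses, noise does not accumulate from span to span as long as it stays well below the decision margin. An all-optical regenerator has to reproduce this thresholding with nonlinear optics instead of electronics.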
In all-optical regeneration, on the other hand, the optical pulses need to be reamplified, reshaped and retimed––known as 3R regeneration––all by sending the signal pulses through nonlinear amplifiers and mixers, which may include short stretches of highly nonlinear fiber (HNLF) or semiconductor optical amplifiers (SOA). There have been demonstrations of 2R all-optical regeneration (reamplifying and reshaping but not retiming) at lower data rates, but getting all 3Rs at the high data rates (40 Gb/s) in the next generation telecom systems remains elusive.
Nonetheless, there is an active academic literature that is pushing the envelope on optical logical devices and regenerators. Many of the systems focus on SOAs, HNLFs, and interferometers. Numerical modeling of these kinds of devices is currently ahead of bench-top demonstrations, primarily because of the difficulty of fabrication and device lifetime. But the numerical models point to performance that would be competitive with OEO. If this OOO conversion (Optical-to-Optical-to-Optical) is scalable (can handle increasing bit rates and increasing numbers of channels), then the current data crunch that is facing the telecom trunk lines (see my previous Blog) may be a strong driver to implement such all-optical solutions.
It is important to keep in mind that legacy technology is not static but also continues to improve. As all-optical logic and switching and regeneration make progress, OEO conversion gets incrementally faster, creating a moving target. Therefore, we will need to wait another 20 years to see whether OEO is overtaken and replaced by all-optical.
Photonic Neural Networks
The most exciting area of optical logic today is in analog optical computing––specifically optical neural networks and photonic neuromorphic computing [2, 3]. A neural network is a highly-connected network of nodes and links in which information is distributed across the network in much the same way that information is distributed and processed in the brain. Neural networks can take several forms––from digital neural networks that are implemented with software on conventional digital computers, to analog neural networks implemented in specialized hardware, sometimes also called neuromorphic computing systems.
Optics and photonics are well suited to the analog form of neural network because of the superior ability of light to form free-space interconnects (links) among a high number of optical modes (nodes). This essential advantage of light for photonic neural networks was first demonstrated in the mid-1980’s using recurrent neural network architectures implemented in photorefractive (nonlinear optical) crystals (see Fig. 1 for a publication timeline). But this initial period of proof-of-principle was followed by a lag of about 2 decades due to a mismatch between driver applications (like high-speed logic on an all-optical internet) and the ability to configure the highly complex interconnects needed to perform the complex computations.
The rapid rise of deep machine learning over the past 5 years has removed this bottleneck, and there has subsequently been a major increase in optical implementations of neural networks. In particular, it is now possible to use conventional deep machine learning to design the interconnects of analog optical neural networks for fixed tasks such as image recognition. At first glance, this seems like a non-starter, because one might ask why not use the conventional trained deep network to do the recognition itself rather than using it to create a special-purpose optical recognition system. The answer lies primarily in the metrics of latency (speed) and energy cost.
In neural computing, approximately 90% of the time and energy go into matrix multiplication operations. Deep learning algorithms driving conventional digital computers need to do the multiplications at the sequential clock rate of the computer using nested loops. Optics, on the other hand, is ideally suited to perform matrix multiplications in a fully parallel manner (see Fig. 4). In addition, a hardware implementation using optics operates literally at the speed of light. The latency is limited only by the time of flight through the optical system. If the optical train is 1 meter, then the time for the complete computation is only a few nanoseconds at almost no energy dissipation. Combining the natural parallelism of light with the speed has led to unprecedented computational rates. For instance, recent implementations of photonic neural networks have demonstrated over 10 tera-operations per second (TOPS).
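To make the contrast concrete, here is a minimal sketch that compares a sequential nested-loop matrix product with a single vectorized product (standing in, loosely, for the parallel optical operation) and evaluates the time of flight through a 1-meter optical train. The function names and the small test matrices are illustrative assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum (m/s)

def time_of_flight_ns(path_length_m):
    """Latency of an optical system limited only by the transit time of light."""
    return path_length_m / C * 1e9

def matmul_nested(A, B):
    """Sequential matrix product: one multiply-accumulate per loop iteration."""
    n, k = A.shape
    _, m = B.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += A[i, p] * B[p, j]
    return out

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

sequential = matmul_nested(A, B)   # n*m*k sequential steps
parallel = A @ B                   # one conceptual "pass", as in an optical multiplier
latency = time_of_flight_ns(1.0)   # about 3.3 ns for a 1 m optical train

print(np.allclose(sequential, parallel), f"{latency:.2f} ns")
```

The two products agree, but the loop version needs a number of steps that grows with the matrix size, while the optical latency depends only on the path length.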
It is important to keep in mind that although many of these photonic neural networks are characterized as all-optical, they are generally not reconfigurable, meaning that they are not adaptive to changing or evolving training sets or changing input information. Most adaptive systems use OEO conversion with electronically-addressed spatial light modulators (SLMs) that are driven by digital logic. Another technology gaining recent traction is neuromorphic photonics in which neural processing is implemented on photonic integrated circuits (PICs) with OEO conversion. The integration of large numbers of light emitting sources on PICs is now routine, relieving the OEO bottleneck as electronics and photonics merge in silicon photonics.
Farther afield are all-optical systems that are adaptive through the use of optically-addressed spatial light modulators or nonlinear materials. In fact, these types of adaptive all-optical neural networks were among the first demonstrated in the late 1980’s. More recently, advanced adaptive optical materials, as well as fiber delay lines for a type of recurrent neural network known as reservoir computing, have been used to implement faster and more efficient optical nonlinearities needed for adaptive updates of neural weights. But there are still years to go before light is adaptively controlling light entirely in the optical domain at the speeds and with the flexibility needed for real-world applications like photonic packet switching in telecom fiber-optic routers.
In stark contrast to the status of classical all-optical computing, photonic quantum computing is on the cusp of revolutionizing the field of quantum information science. The recent demonstration from the Canadian company Xanadu of a programmable photonic quantum computer that operates at room temperature may be the harbinger of what is to come in the third generation Machines of Light: Quantum Optical Computers, which is the topic of my next blog.
 V. Sasikala and K. Chitra, “All optical switching and associated technologies: a review,” Journal of Optics-India, vol. 47, no. 3, pp. 307-317, Sep (2018)
 C. Huang et al., “Prospects and applications of photonic neural networks,” Advances in Physics-X, vol. 7, no. 1, Jan (2022), Art no. 1981155
 X. Y. Xu, M. X. Tan, B. Corcoran, J. Y. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature, vol. 589, no. 7840, pp. 44-51, Jan (2021)
One of the hardest aspects to grasp about relativity theory is the question of whether an event “looks as if” it is doing something, or whether it “actually is” doing something.
Take, for instance, the classic twin paradox of relativity theory in which there are twins who wear identical high-precision wrist watches. One of them rockets off to Alpha Centauri at relativistic speeds and returns while the other twin stays on Earth. Each twin sees the other twin’s clock running slowly because of relativistic time dilation. Yet when they get back together and compare their watches standing side by side, the twin who went to Alpha Centauri is actually younger than the other, despite the paradox. The relativistic effect of time dilation is “real”, not just apparent, regardless of whether they come back together to do the comparison.
Yet this understanding of relativistic effects took many years, even decades, to gain acceptance after Einstein proposed them. He was aware himself that key experiments were required to prove that relativistic effects are real and not just apparent.
Einstein and the Transverse Doppler Effect
In 1905 Einstein used his new theory of special relativity to predict observable consequences that included a general treatment of the relativistic Doppler effect. This included the effects of time dilation in addition to the longitudinal effect of the source chasing the wave. Time dilation produced a correction to Doppler’s original expression for the longitudinal effect that became significant at speeds approaching the speed of light. More significantly, it predicted a transverse Doppler effect for a source moving along a line perpendicular to the line of sight to an observer. This effect had not been predicted either by Christian Doppler (1803 – 1853) or by Woldemar Voigt (1850 – 1919).
Despite the generally positive reception of Einstein’s theory of special relativity, some of its consequences were anathema to many physicists at the time. A key stumbling block was the question of whether relativistic effects, like moving clocks running slowly, were only apparent or were actually real, and Einstein had to fight to convince others of their reality. When Johannes Stark (1874 – 1957) observed Doppler line shifts in ion beams called “canal rays” in 1906 (Stark received the 1919 Nobel prize in part for this discovery), Einstein promptly published a paper suggesting how the canal rays could be used in a transverse geometry to directly detect time dilation through the transverse Doppler effect. Thirty years passed before the experiment was performed with sufficient accuracy by Herbert Ives and G. R. Stilwell in 1938 to measure the transverse Doppler effect. Ironically, even at this late date, Ives and Stilwell were convinced that their experiment had disproved Einstein’s time dilation by supporting Lorentz’s contraction theory of the electron. The Ives-Stilwell experiment was the first direct test of time dilation, followed in 1940 by muon lifetime measurements.
A) Transverse Doppler Shift Relative to Emission Angle
The Doppler effect varies between blue shifts in the forward direction to red shifts in the backward direction, with a smooth variation in Doppler shift as a function of the emission angle. Consider the configuration shown in Fig. 1 for light emitted from a source moving at speed v and emitting at an angle θ0 in the receiver frame. The source moves a distance vT in the time of a single emission cycle (assume a harmonic wave). In that time T (which is the period of oscillation of the light source — or the period of a clock if we think of it putting out light pulses) the light travels a distance cT before another cycle begins (or another pulse is emitted).
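The observed wavelength in the receiver frame is thus given by
$$\lambda = cT - vT\cos\theta_0 = cT\left(1 - \beta\cos\theta_0\right)$$
where T is the emission period of the moving source and β = v/c. Importantly, the emission period is time dilated relative to the proper emission time T0 of the source
$$T = \gamma T_0 = \frac{T_0}{\sqrt{1-\beta^2}}$$
so that, in terms of the proper wavelength λ0 = cT0,
$$\lambda = \gamma\lambda_0\left(1 - \beta\cos\theta_0\right)$$
This expression can be evaluated for several special cases:
a) θ0 = 0 for forward emission
$$\lambda = \gamma\lambda_0\left(1-\beta\right) = \lambda_0\sqrt{\frac{1-\beta}{1+\beta}}$$
which is the relativistic blue shift for longitudinal motion in the direction of the receiver.
b) θ0 = π for backward emission
$$\lambda = \gamma\lambda_0\left(1+\beta\right) = \lambda_0\sqrt{\frac{1+\beta}{1-\beta}}$$
which is the relativistic red shift for longitudinal motion away from the receiver.
c) θ0 = π/2 for transverse emission
$$\lambda = \gamma\lambda_0 = \frac{\lambda_0}{\sqrt{1-\beta^2}}$$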
This transverse Doppler effect for emission at right angles is a red shift, caused only by the time dilation of the moving light source. This is the effect that Einstein proposed to detect using Stark’s canal rays, and that Ives and Stilwell later measured, showing that moving clocks tick slowly. But it is not the only way to view the transverse Doppler effect.
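The three special cases can be checked numerically. The sketch below assumes the standard relativistic Doppler relation λ/λ0 = γ(1 − β cos θ0), with β = v/c and γ = 1/√(1 − β²), which follows from the geometry described above (source displacement vT, light path cT, and a time-dilated period). The function name and the choice β = 0.6 are illustrative.

```python
import math

def wavelength_ratio(beta, theta0):
    """Observed-to-proper wavelength ratio for emission angle theta0 (receiver frame)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (1.0 - beta * math.cos(theta0))

beta = 0.6  # illustrative speed, v = 0.6c, so gamma = 1.25

forward = wavelength_ratio(beta, 0.0)              # about 0.5  (blue shift)
backward = wavelength_ratio(beta, math.pi)         # about 2.0  (red shift)
transverse = wavelength_ratio(beta, math.pi / 2)   # about 1.25 (pure time dilation)

print(forward, backward, transverse)
```

Note that the forward and backward ratios are reciprocals of each other, while the transverse ratio is just γ.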
B) Transverse Doppler Shift Relative to Angle at Reception
A different option for viewing the transverse Doppler effect is the angle to the moving source at the moment that the light is detected. The geometry of this configuration relative to the previous is illustrated in Fig. 2.
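The transverse distance to the detection point is
$$y = c\,\Delta t\,\sin\theta_0$$
where Δt is the time of flight of the light from emission to detection. The length of the line connecting the detection point P with the location of the light source at the moment of detection is (using the law of cosines)
$$r_1^2 = \left(c\,\Delta t\right)^2 + \left(v\,\Delta t\right)^2 - 2\left(c\,\Delta t\right)\left(v\,\Delta t\right)\cos\theta_0$$
Combining with the first equation gives, with β = v/c,
$$\sin\theta_1 = \frac{y}{r_1} = \frac{\sin\theta_0}{\sqrt{1 + \beta^2 - 2\beta\cos\theta_0}}$$
An equivalent expression is obtained as
$$\cos\theta_1 = \frac{\cos\theta_0 - \beta}{\sqrt{1 + \beta^2 - 2\beta\cos\theta_0}}$$
Note that this result, relating θ1 to θ0, is independent of the distance to the observation point.
When θ1 = π/2, then
$$\cos\theta_0 = \beta$$
for which the Doppler effect is
$$\lambda = \frac{\lambda_0\left(1 - \beta^2\right)}{\sqrt{1-\beta^2}} = \lambda_0\sqrt{1-\beta^2}$$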
which is a blue shift. This creates the unexpected result that θ0 = π/2 produces a red shift, while θ1 = π/2 produces a blue shift. The question could be asked: which one represents time dilation? In fact, it is θ0 = π/2 that produces time dilation exclusively, because in that configuration there is no foreshortening effect on the wavelength, only on the emission time.
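The relation between the two angles can be explored numerically. This sketch assumes a simple geometry: the source emits at the origin and moves along the x-axis at speed βc, so that at the moment of detection the light has reached (cos θ0, sin θ0) and the source has reached (β, 0), in units of the light travel distance. The function names are illustrative, and the Doppler relation λ/λ0 = γ(1 − β cos θ0) is assumed.

```python
import math

def reception_angle(beta, theta0):
    """Angle theta1 to the source at the moment of detection, for emission angle theta0."""
    return math.atan2(math.sin(theta0), math.cos(theta0) - beta)

def wavelength_ratio(beta, theta0):
    """Observed-to-proper wavelength ratio for emission angle theta0."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (1.0 - beta * math.cos(theta0))

beta = 0.6

# Emission angle for which the source is seen at right angles at reception
theta0 = math.acos(beta)
theta1 = reception_angle(beta, theta0)   # about pi/2
ratio = wavelength_ratio(beta, theta0)   # about 0.8 = sqrt(1 - beta^2): blue shift

print(theta1, ratio)
```

Because only the ratio of the two legs enters `atan2`, the result is independent of how far the light has traveled, consistent with the remark above about the distance to the observation point.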
C) Compromise: The Null Transverse Doppler Shift
The previous two configurations each could be used as a definition for the transverse Doppler effect. But one gives a red shift and one gives a blue shift, which seems contradictory. Therefore, one might try to strike a compromise between these two cases so that sin θ1 = sin θ0, and the configuration is shown in Fig. 3.
This is the case when θ0 + θ1 = π. The sines of the two angles are equal, yielding (with β = v/c)
$$\frac{\sin\theta_0}{\sqrt{1 + \beta^2 - 2\beta\cos\theta_0}} = \sin\theta_0$$
which is solved for
$$\cos\theta_0 = \frac{\beta}{2}$$
Inserting this into the Doppler equation gives
$$\lambda = \lambda_0\,\frac{1 - \beta^2/2}{\sqrt{1-\beta^2}}$$
where the Taylor’s expansion of the denominator (at low speed) cancels the numerator to give zero net Doppler shift. This compromise configuration represents the condition of null Doppler frequency shift. However, for speeds approaching the speed of light, the net effect is a lengthening of the wavelength, dominated by time dilation, causing a red shift.
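A quick numerical check shows how good the cancellation is. The sketch assumes the null condition cos θ0 = β/2 (the emission angle at which the sines of the emission and reception angles coincide) inserted into λ/λ0 = γ(1 − β cos θ0); the function name is illustrative.

```python
import math

def null_configuration_ratio(beta):
    """Wavelength ratio at the symmetric configuration cos(theta0) = beta/2."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (1.0 - beta**2 / 2.0)

for beta in (0.01, 0.1, 0.5, 0.9):
    ratio = null_configuration_ratio(beta)
    # The residual shift grows like beta**4 / 8 at low speed
    print(f"beta = {beta:4.2f}   lambda/lambda0 - 1 = {ratio - 1.0:.3e}")
```

At low speed the second-order terms cancel and the residual scales as β⁴/8, so the shift is null to the order at which time dilation first appears. At β = 0.9 the ratio is well above unity, which is the red shift mentioned above.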
D) Source in Circular Motion Around Receiver
An interesting twist can be added to the problem of the transverse Doppler effect: put the source or receiver into circular motion, one about the other. In the case of a source in circular motion around the receiver, it is easy to see that this looks just like case A) above for θ0 = π/2, which is the red shift caused by the time dilation of the moving source
$$\lambda = \frac{\lambda_0}{\sqrt{1 - v^2/c^2}}$$
However, there is the possible complication that the source is no longer in an inertial frame (it experiences centripetal acceleration) and therefore it is in the realm of general relativity instead of special relativity. In fact, it was Einstein’s solution to this problem that led him to propose the Equivalence Principle and make his first calculations on the deflection of light by gravity. His solution was to think of an infinite number of inertial frames, each of which was instantaneously co-moving with the same linear velocity as the source. These co-moving frames are inertial and can be analyzed using the principles of special relativity. The general relativistic effects come from slipping from one inertial co-moving frame to the next. But in the case of the circular transverse Doppler effect, each instantaneously co-moving frame has exactly the configuration of case A) above, and so the wavelength is red shifted exactly by the time dilation.
E) Receiver in Circular Motion Around Source
With the notion of co-moving inertial frames now in hand, this configuration is exactly the same as case B) above, and the wavelength is blue shifted
$$\lambda = \lambda_0\sqrt{1 - v^2/c^2}$$
 A. Einstein, “On the electrodynamics of moving bodies,” Annalen Der Physik, vol. 17, no. 10, pp. 891-921, Sep (1905)
Einstein’s theory of gravity came from a simple happy thought that occurred to him as he imagined an unfortunate worker falling from a roof, losing hold of his hammer, only to find both the hammer and himself floating motionless relative to each other as if gravity had ceased to exist. With this one thought, Einstein realized that the falling (i.e. accelerating) reference frame was in fact an inertial frame, and hence all the tricks that he had learned and invented to deal with inertial relativistic frames could apply just as well to accelerating frames in gravitational fields.
Armed with this new perspective, one of the earliest discoveries that Einstein made was that gravity must bend light paths. This phenomenon is fundamentally post-Newtonian, because there can be no possible force of gravity on a massless photon, yet Einstein’s argument for why gravity should bend light is so obvious that it is manifestly true, as demonstrated by Arthur Eddington during the solar eclipse of 1919, launching Einstein to world-wide fame. It is also demonstrated by the beautiful gravitational lensing phenomenon of Einstein arcs. Einstein arcs are the distorted images of bright distant light sources in the universe caused by an intervening massive object, like a galaxy or galaxy cluster, that bends the light rays. A number of these arcs are seen in images of the Abell cluster of galaxies in Fig. 1.
Gravitational lensing (and microlensing) has become a major tool of discovery in astrophysics, applied to the study of quasars, dark matter and even the search for exoplanets. However, as soon as Einstein conceived of gravitational lensing, in 1912, he abandoned the idea as too small and too unlikely to ever be useful, much like he abandoned the idea of gravitational waves in 1915 as similarly being too small ever to detect. It was only through the persistence of an amateur Czech scientist more than twenty years later that Einstein reluctantly agreed to publish his calculations on gravitational lensing.
The History of Gravitational Lensing
In 1912, only a few years after his “happy thought”, and fully three years before he published his definitive work on General Relativity, Einstein derived how light would be affected by a massive object, causing light from a distant source to be deflected as if by a lens. The historian of physics Jürgen Renn discovered these derivations in Einstein’s notebooks while at the Max Planck Institute for the History of Science in Berlin in 1996. However, Einstein also calculated the magnitude of the effect and dismissed it as too small, and so he never published it.
Years later, in 1936, Einstein received a visit from Rudi Mandl, a Czech electrical engineer and amateur scientist who had actually obtained a small stipend from the Czech government to visit Einstein at the Institute for Advanced Study at Princeton. Mandl had conceived of the possibility of gravitational lensing and wished to bring it to Einstein’s attention, thinking that the master would certainly know what to do with the idea. Einstein was obliging, redoing his calculations of 1912 and obtaining once again the results that made him believe that the effect would be too small to be seen. However, Mandl was persistent and pressed Einstein to publish the results, which he did. In his submission letter to the editor of Science, Einstein stated “Let me also thank you for your cooperation with the little publication, which Mister Mandl squeezed out of me. It is of little value, but it makes the poor guy happy”. Einstein’s pessimism was based on his thinking that isolated stars would be the only source of the gravitational lens (he did not “believe” in black holes), but in 1937 Fritz Zwicky at Caltech (a gadfly genius) suggested that the newly discovered phenomenon of “galaxy clusters” might provide the massive gravity that would be required to produce the effect. To be visible, however, a distant source would need to be extremely bright.
Potential sources were discovered in the 1960’s using radio telescopes that detected quasi-stellar objects (known as quasars) that are extremely bright and extremely far away. Quasars also appear in the visible range, and in 1979 a twin quasar was discovered by astronomers using the telescope at the Kitt Peak Observatory in Arizona: two quasars very close together that shared identical spectral fingerprints. The astronomers realized that it could be a twin image of a single quasar caused by gravitational lensing, which they published as a likely explanation. Although the finding was originally controversial, the twin image was later confirmed, and many additional examples of gravitational lensing have since been discovered.
The Optics of Gravity and Light
Gravitational lenses are terrible optical instruments. A good imaging lens has two chief properties: 1) It produces increasing delay on a wavefront as the radial distance from the optic axis decreases; and 2) it deflects rays with increasing deflection angle as the radial distance of a ray increases away from the optic axis (the center of the lens). Both properties are part of the same effect: the conversion, by a lens, of an incident plane wave into a converging spherical wave. A third property of a good lens ensures minimal aberrations of the converging wave: a quadratic dependence of wavefront delay on radial distance from the optic axis. For instance, a parabolic lens produces a diffraction-limited focal spot.
Now consider the optical effects of gravity around a black hole. One of Einstein’s chief discoveries during his early investigations into the effects of gravity on light is the analogy of warped space-time as having an effective refractive index. Light propagates through space affected by gravity as if there were a refractive index associated with the gravitational potential. In a previous blog on the optics of gravity, I showed the simple derivation of the refractive effects of gravity on light based on the Schwarzschild metric applied to a null geodesic of a light ray. The effective refractive index near a black hole is
This effective refractive index diverges at the Schwarzschild radius of the black hole. It produces the maximum delay, not on the optic axis as for a good lens, but at the finite distance RS. Furthermore, the maximum deflection also occurs at RS, and the deflection decreases with increasing radial distance. Both of these properties of gravitational lensing are opposite to the properties of a good lens. For this reason, the phrase “gravitational lensing” is a bit of a misnomer. Gravitating bodies certainly deflect light rays, but the resulting optical behavior is far from that of an imaging lens.
The path of a ray from a distant quasar, through the thin gravitational lens of a galaxy, and intersecting the location of the Earth, is shown in Fig. 2. The location of the quasar is a distance R from the “optic axis”. The un-deflected angular position is θ0, and with the intervening galaxy the image appears at the angular position θ. The angular magnification is therefore M = θ/θ0.
The deflection angles are related through
where b is the “impact parameter”
These two equations are solved to give an expression that relates the unmagnified angle θ0 to the magnified angle θ as
where θE is the angular size of the Einstein ring when the source is on the optic axis. The quadratic equation has two solutions that give two images of the distant quasar. This is the origin of the “double image” that led to the first discovery of gravitational lensing in 1979.
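The two-image structure can be illustrated with a short sketch. It assumes the standard point-mass thin-lens form of the quadratic, θ² − θ0 θ − θE² = 0, which may differ in notation from the expression intended above; the function name and the angular values are purely illustrative.

```python
import math

def image_angles(theta0, theta_E):
    """Two image angles for a point-mass lens, assuming theta^2 - theta0*theta - theta_E^2 = 0."""
    root = math.sqrt(theta0**2 + 4.0 * theta_E**2)
    return 0.5 * (theta0 + root), 0.5 * (theta0 - root)

theta_E = 5e-6   # hypothetical Einstein-ring angle in radians

# Source on the optic axis: the two roots sit at +theta_E and -theta_E (the Einstein ring)
on_axis = image_angles(0.0, theta_E)

# Source slightly off axis: two images on opposite sides of the lens
theta0 = 3e-6
plus, minus = image_angles(theta0, theta_E)

print(on_axis, plus, minus)
```

Under this assumed form the two image angles always sum to θ0 and their product is −θE², so one image lies outside the Einstein ring and the other inside it on the opposite side.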
When the distant quasar is on the optic axis, then θ0 = 0 and the deflection of the rays produces, not a double image, but an Einstein ring with an angular size of θE. For typical lensing objects, the angular sizes of Einstein rings are in the range of tens of microradians. The angular magnification for decreasing distance R diverges as
But this divergence is more a statement of the bad lens behavior than of actual image size. Because the gravitational lens is inverted (with greater deflection closer to the optic axis) compared to an ideal thin lens, it produces a virtual image ring that is closer than the original object, as in Fig. 3.
The location of the virtual image behind the gravitational lens (when the quasar is on the optic axis) is obtained from
If the quasar is much further from the lens than the Earth, then the image location is zi = -L1/2, or behind the lens by half the distance from the Earth to the lens. The longitudinal magnification is then
Note that while the transverse (angular) magnification diverges as the object approaches the optic axis, the longitudinal magnification remains finite but always greater than unity.
The Caustic Curves of Einstein Rings
Because gravitational lenses have such severe aberration relative to an ideal lens, and because the angles are so small, an alternate approach to understanding the optics of gravity is through the theory of light caustics. In a previous blog on the optics of caustics I described how sets of deflected rays of light become enclosed in envelopes that produce regions of high and low intensity. These envelopes are called caustics. Gravitational light deflection also causes caustics.
In addition to envelopes, it is also possible to trace the time delays caused by gravity on wavefronts. In the regions of the caustic envelopes, these wavefronts can fold back onto themselves so that different parts of the image arrive at different times coming from different directions.
An example of gravitational caustics is shown in Fig. 4. Rays are incident vertically on a gravitational thin lens which deflects the rays so that they overlap in the region below the lens. The red curves are selected wavefronts at three successively later times. The gravitational potential causes a time delay on the propagating front, with greater delays in regions of stronger gravitational potential. The envelope function that is tangent to the rays is called the caustic, here shown as the dense blue mesh. In this case there is a cusp in the caustic near z = -1 below the lens. The wavefronts become multiple-valued past the cusp.
The intensity of the distant object past the lens is concentrated near the caustic envelope. The intensity of the caustic at z = -6 is shown in Fig. 5. The ring structure is the cross-sectional spatial intensity at the fixed observation plane, but a transform to an angular image is one-to-one, so the caustic intensity distribution is also similar to the view of the Einstein ring from a position at z = -6 on the optic axis.
The gravitational potential is a function of the mass distribution in the gravitational lens. A different distribution with a flatter distribution of mass near the optic axis is shown in Fig. 6. There are multiple caustics in this case with multi-valued wavefronts. Because caustics are sensitive to mass distribution in the gravitational lens, astronomical observations of gravitational caustics can be used to back out the mass distribution, including dark matter or even distant exoplanets.
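Python Code gravfront.py
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 30 19:47:31 2021
@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)
"""
import numpy as np
from matplotlib import pyplot as plt

def refindex(x):
    # Effective refractive-index profile of the gravitational thin lens
    return n0/(1 + abs(x)**expon)**(1/expon)

delt = 0.001    # step for the centered finite difference
Ly = 10         # propagation distance below the lens
Lx = 5          # half-width of the lens plane
n0 = 1          # index on the optic axis
expon = 2       # adjust this from 1 to 10

delx = 0.01
rng = int(Lx/delx)                   # np.int has been removed from NumPy
x = delx*np.arange(-rng, rng + 1)    # grid from -Lx to Lx in steps of delx

n = refindex(x)
dndx = np.diff(n)/np.diff(x)

plt.figure(1)
lines = plt.plot(x, n)               # index profile
plt.figure(2)
lines2 = plt.plot(x[:-1], dndx)      # index gradient

plt.figure(3)
Nloop = 160
xd = np.zeros((Nloop, 3))
yd = np.zeros((Nloop, 3))
for loop in range(0, Nloop):
    xp = -Lx + 2*Lx*(loop/Nloop)
    plt.plot([xp, xp], [2, 0], 'b', linewidth=0.25)      # incident ray
    # deflection angle taken as the local gradient of the index
    thet = (refindex(xp+delt) - refindex(xp-delt))/(2*delt)
    xb = xp + np.tan(thet)*Ly
    plt.plot([xp, xb], [0, -Ly], 'b', linewidth=0.25)    # deflected ray
    for sloop in range(0, 3):
        # wavefront position at three successive times, offset by the local delay
        delay = n0/(1 + abs(xp)**expon)**(1/expon) - n0
        dis = 0.75*(sloop+1)**2 - delay
        xfront = xp + np.sin(thet)*dis
        yfront = -dis*np.cos(thet)
        xd[loop, sloop] = xfront
        yd[loop, sloop] = yfront
for sloop in range(0, 3):
    plt.plot(xd[:, sloop], yd[:, sloop], 'r', linewidth=0.5)   # wavefronts
plt.show()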
 J. Renn, T. Sauer and J. Stachel, “The Origin of Gravitational Lensing: A Postscript to Einstein’s 1936 Science Paper,” Science 275, 184 (1997)
 A. Einstein, “Lens-Like Action of a Star by the Deviation of Light in the Gravitational Field”, Science 84, 506 (1936)
 (Here is an excellent review article on the topic.) J. Wambsganss, “Gravitational lensing as a powerful astrophysical tool: Multiple quasars, giant arcs and extrasolar planets,” Annalen Der Physik, vol. 15, no. 1-2, pp. 43-59, Jan-Feb (2006)
The quantum of light, the photon, is a little over 100 years old. It was born in 1905 when Einstein merged Planck’s blackbody quantum hypothesis with statistical mechanics and concluded that light itself must be quantized. No one believed him! Fast forward to today, and the photon is a workhorse of modern quantum technology. Quantum encryption and communication are performed almost exclusively with photons, and many prototype quantum computers are optics based. Quantum optics also underpins atomic and molecular optics (AMO), which is one of the hottest and most rapidly advancing frontiers of physics today.
Only after the availability of “quantum” light sources … could photon numbers be manipulated at will, launching the modern era of quantum optics.
This blog tells the story of the early days of the photon and of quantum optics. It begins with Einstein in 1905 and ends with the demonstration of photon anti-bunching that was the first fundamentally quantum optical phenomenon observed seventy years later in 1977. Across that stretch of time, the photon went from a nascent idea in Einstein’s fertile brain to the most thoroughly investigated quantum particle in the realm of physics.
The Photon: Albert Einstein (1905)
When Planck presented his quantum hypothesis in 1900 to the German Physical Society, his model of black body radiation retained all its classical properties but one: the quantized interaction of light with matter. He did not think yet in terms of quanta, only in terms of steps in a continuous interaction.
The quantum break came from Einstein when he published his 1905 paper proposing the existence of the photon—an actual quantum of light that carried with it energy and momentum . His reasoning was simple and iron-clad, resting on Planck’s own blackbody relation that Einstein combined with simple reasoning from statistical mechanics. He was led inexorably to the existence of the photon. Unfortunately, almost no one believed him (see my blog on Einstein and Planck).
This was before wave-particle duality in quantum thinking, so the notion that light, so clearly a wave phenomenon, could be a particle was unthinkable. It had taken half of the 19th century to rid physics of Newton’s corpuscles and emissionist theories of light, so to bring them back at the beginning of the 20th century seemed like a great blunder. However, Einstein persisted.
In 1909 he published a paper on the fluctuation properties of light in which he proposed that the fluctuations observed in light intensity had two contributions: one from the discreteness of the photons (what we call “shot noise” today) and one from the fluctuations in the wave properties. Einstein was proposing that both particle and wave properties contributed to intensity fluctuations, exhibiting simultaneous particle-like and wave-like properties. This was one of the first expressions of wave-particle duality in modern physics.
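The two contributions can be seen in a simple modern setting. As a sketch, assume a single mode of thermal light whose photon number follows the Bose-Einstein (geometric) distribution; its variance is then the sum of a term linear in the mean (the particle-like shot noise) and a term quadratic in the mean (the wave-like fluctuation). The single-mode model, the function name and the mean photon number of 2 are assumptions for illustration, not a reproduction of Einstein’s own calculation.

```python
import numpy as np

def thermal_photon_stats(mean_n, n_max=2000):
    """Mean and variance of the Bose-Einstein photon-number distribution for one mode."""
    n = np.arange(n_max + 1)
    p = (1.0 / (1.0 + mean_n)) * (mean_n / (1.0 + mean_n))**n   # geometric distribution
    mean = np.sum(n * p)
    var = np.sum((n - mean)**2 * p)
    return mean, var

mean, var = thermal_photon_stats(2.0)

shot_term = mean        # particle-like contribution (shot noise)
wave_term = mean**2     # wave-like contribution

print(var, shot_term + wave_term)   # both close to 6
```

For dim light (mean much less than one) the shot-noise term dominates, while for bright light the wave term takes over.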
In 1916 and 1917 Einstein took another bold step and proposed the existence of stimulated emission. Once again, his arguments were based on simple physics (this time the principle of detailed balance), and he was led to the audacious conclusion that one photon can stimulate the emission of another. This would become the basis of the laser forty-five years later.
While Einstein was confident in the reality of the photon, others sincerely doubted its existence. Robert Millikan (1868 – 1953) decided to put Einstein’s theory of photoelectron emission to the most stringent test ever performed. In 1915 he painstakingly acquired the definitive dataset with the goal of refuting Einstein’s hypothesis, only to confirm it in spectacular fashion. Partly based on Millikan’s confirmation of Einstein’s theory of the photon, Einstein was awarded the Nobel Prize in Physics in 1921.
From that point onward, the physical existence of the photon was accepted and was incorporated routinely into other physical theories. Compton used the energy and the momentum of the photon in 1922 to predict and measure Compton scattering of x-rays off of electrons. The photon was given its modern name by Gilbert Lewis in 1926.
Single-Photon Interference: Geoffrey Taylor (1909)
If a light beam is made up of a group of individual light quanta, then in the limit of very dim light, there should just be one photon passing through an optical system at a time. Therefore, to do optical experiments on single photons, one just needs to reach the ultimate dim limit. As simple and clear as this argument sounds, it has problems that only were sorted out after the Hanbury Brown and Twiss experiments in the 1950’s and the controversy they launched (see below). However, in 1909, this thinking seemed like a clear approach for looking for deviations in optical processes in the single-photon limit.
In 1909, Geoffrey Ingram Taylor (1886 – 1975) was an undergraduate student at Cambridge University and performed a low-intensity Young’s double-slit experiment (encouraged by J. J. Thomson). At that time the idea of Einstein’s photon was only 4 years old, and Bohr’s theory of the hydrogen atom was still four years away. But Thomson believed that if photons were real, then their existence could possibly show up as deviations in experiments involving single photons. Young’s double-slit experiment is the classic demonstration of the classical wave nature of light, so performing it under conditions when (on average) only a single photon was in transit between a light source and a photographic plate seemed like the best place to look.
The experiment was performed by finding an optimum exposure of photographic plates in a double slit experiment, then reducing the flux while increasing the exposure time, until the single-photon limit was achieved while retaining the same net exposure of the photographic plate. Under the lowest intensity, when only a single photon was in transit at a time (on average), Taylor performed the exposure for three months. To his disappointment, when he developed the film, there was no significant difference between high intensity and low intensity interference fringes . If photons existed, then their quantized nature was not showing up in the low-intensity interference experiment.
There is no single-photon-limit deviation in the behavior of the Young double-slit experiment because Young’s experiment measures only first-order coherence properties. The average over many single-photon detection events is described equally well either by classical waves or by quantum mechanics. Quantized effects in the Young experiment could only appear in fluctuations in the arrivals of photons, but in Taylor’s day there was no way to detect the arrival of single photons.
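A minimal numerical sketch of this point, assuming an idealized two-slit pattern with no single-slit envelope and hypothetical values for the wavelength, slit separation, and screen distance (none of these numbers come from Taylor’s experiment): photons are detected one at a time at random positions, yet the accumulated histogram reproduces the classical fringes.

```python
import numpy as np

def fringe_intensity(x, wavelength=500e-9, slit_sep=0.5e-3, screen_dist=1.0):
    """Classical two-slit intensity, normalized to a peak of 1 (no slit envelope)."""
    return np.cos(np.pi * slit_sep * x / (wavelength * screen_dist)) ** 2

def sample_photon_arrivals(n_photons, x_max=3e-3, seed=0):
    """Detect photons one at a time: rejection-sample positions from the intensity."""
    rng = np.random.default_rng(seed)
    arrivals = np.empty(0)
    while arrivals.size < n_photons:
        x = rng.uniform(-x_max, x_max, size=n_photons)
        accept = rng.uniform(0.0, 1.0, size=n_photons) < fringe_intensity(x)
        arrivals = np.concatenate([arrivals, x[accept]])
    return arrivals[:n_photons]

def fringe_correlation(n_photons, bins=60, x_max=3e-3, seed=0):
    """Correlation between the photon-count histogram and the classical fringes."""
    arrivals = sample_photon_arrivals(n_photons, x_max, seed)
    counts, edges = np.histogram(arrivals, bins=bins, range=(-x_max, x_max))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.corrcoef(counts, fringe_intensity(centers))[0, 1]

if __name__ == "__main__":
    # The histogram approaches the classical pattern as more photons accumulate.
    for n in (100, 10_000, 1_000_000):
        print(n, round(fringe_correlation(n), 3))
```

Only the mean count per bin is compared here, which is a first-order quantity. The sketch says nothing about fluctuations in arrival times, which is where quantized effects would have to appear.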
Quantum Theory of Radiation: Paul Dirac (1927)
After Paul Dirac (1902 – 1984) was awarded his doctorate from Cambridge in 1926, he received a stipend that sent him to work with Niels Bohr (1885 – 1962) in Copenhagen. His attention focused on the electromagnetic field and how it interacted with the quantized states of atoms. Although the electromagnetic field was the classical field of light, it was also the quantum field of Einstein’s photon, and he wondered how the quantized harmonic oscillators of the electromagnetic field could be generated by quantum wavefunctions acting as operators. He decided that, to generate a photon, the wavefunction must operate on a state that had no photons—the ground state of the electromagnetic field known as the vacuum state.
Dirac put these thoughts into their appropriate mathematical form and began work on two manuscripts. The first manuscript contained the theoretical details of the non-commuting electromagnetic field operators. He called the process of generating photons out of the vacuum “second quantization”. In second quantization, the classical field of electromagnetism is converted to an operator that generates quanta of the associated quantum field out of the vacuum (and also annihilates photons back into the vacuum). The creation operators can be applied again and again to build up an N-photon state containing N photons that obey Bose-Einstein statistics, as required by their integer spin and in agreement with Planck’s blackbody radiation law.
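In modern textbook notation (a standard form, not taken from Dirac’s manuscripts), with the creation and annihilation operators of a single field mode written as a-dagger and a, and the vacuum state as |0⟩, the construction of an N-photon state can be summarized as:

```latex
\hat{a}^\dagger |n\rangle = \sqrt{n+1}\,|n+1\rangle, \qquad
\hat{a}\,|n\rangle = \sqrt{n}\,|n-1\rangle, \qquad
[\hat{a}, \hat{a}^\dagger] = 1, \qquad
|N\rangle = \frac{(\hat{a}^\dagger)^N}{\sqrt{N!}}\,|0\rangle .
```

The non-vanishing commutator is what makes the field operators non-commuting, and repeated application of the creation operator to the vacuum builds the N-photon state.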
Dirac then showed how an interaction of the quantized electromagnetic field with quantized energy levels involved the annihilation and creation of photons as they promoted electrons to higher atomic energy levels, or demoted them through stimulated emission. Very significantly, Dirac’s new theory explained the spontaneous emission of light from an excited electron level as a direct physical process that creates a photon carrying away the energy as the electron falls to a lower energy level. Spontaneous emission had been explained first by Einstein more than ten years earlier when he derived the famous A and B coefficients , but the physical mechanism for these processes was inferred rather than derived. Dirac, in late 1926, had produced the first direct theory of photon exchange with matter .
Einstein-Podolsky-Rosen (EPR) and Bohr (1935)
The famous dialog between Einstein and Bohr at the Solvay Conferences culminated in the now famous “EPR” paradox of 1935 when Einstein published (together with B. Podolsky and N. Rosen) a paper that contained a particularly simple and cunning thought experiment. In this paper, not only was quantum mechanics under attack, but so was the concept of reality itself, as reflected in the paper’s title “Can Quantum Mechanical Description of Physical Reality Be Considered Complete?” .
Einstein considered an experiment on two quantum particles that had become “entangled” (meaning they interacted) at some time in the past, and then had flown off in opposite directions. By the time their properties are measured, the two particles are widely separated. Two observers each make measurements of certain properties of the particles. For instance, the first observer could choose to measure either the position or the momentum of one particle. The other observer likewise can choose to make either measurement on the second particle. Each measurement is made with perfect accuracy. The two observers then travel back to meet and compare their measurements. When the two experimentalists compare their data, they find perfect agreement in their values every time that they had chosen (unbeknownst to each other) to make the same measurement. This agreement occurred either when they both chose to measure position or both chose to measure momentum.
It would seem that the state of the particle prior to the second measurement was completely defined by the results of the first measurement. In other words, the state of the second particle is set into a definite state (using quantum-mechanical jargon, the state is said to “collapse”) the instant that the first measurement is made. This implies that there is instantaneous action at a distance −− violating everything that Einstein believed about reality (and violating the law that nothing can travel faster than the speed of light). He therefore had no choice but to consider this conclusion of instantaneous action to be false. Therefore quantum mechanics could not be a complete theory of physical reality −− some deeper theory, yet undiscovered, was needed to resolve the paradox.
Bohr, on the other hand, did not hold “reality” so sacred. In his rebuttal to the EPR paper, which he published six months later under the identical title , he rejected Einstein’s criterion for reality. He had no problem with the two observers making the same measurements and finding identical answers. Although one measurement may affect the conditions of the second despite their great distance, no information could be transmitted by this dual measurement process, and hence there was no violation of causality. Bohr’s mind-boggling viewpoint was that reality was nonlocal, meaning that in the quantum world the measurement at one location does influence what is measured somewhere else, even at great distance. Einstein, on the other hand, could not accept a nonlocal reality.
The Intensity Interferometer: Hanbury Brown and Twiss (1956)
Optical physics was surprisingly dormant from the 1930’s through the 1940’s. Most of the research during this time was either on physical optics, like lenses and imaging systems, or on spectroscopy, which was more interested in the physical properties of the materials than in light itself. This hiatus from the photon was about to change dramatically, not driven by physicists, but driven by astronomers.
The development of radar technology during World War II enabled the new field of radio astronomy both with high-tech receivers and with a large cohort of scientists and engineers trained in radio technology. In the late 1940’s and early 1950’s radio astronomy was starting to work with long baselines to better resolve radio sources in the sky using interferometry. The first attempts used coherent references between two separated receivers to provide a common mixing signal to perform field-based detection. However, the stability of the reference was limiting, especially for longer baselines.
In 1950, a doctoral student in the radio astronomy department of the University of Manchester, R. Hanbury Brown, was given the task of designing baselines that could work at longer distances to resolve smaller radio sources. After struggling with the technical difficulties of providing a coherent “local” oscillator for distant receivers, Hanbury Brown had a sudden epiphany one evening. Instead of trying to reference the field of one receiver to the field of another, what if, instead, one were to reference the intensity of one receiver to the intensity of the other, specifically correlating the noise on the intensity? To measure intensity requires no local oscillator or reference field. The size of an astronomical source would then show up in how well the intensity fluctuations correlated with each other as the distance between the receivers was changed. He did a back-of-the-envelope calculation that gave him hope that his idea might work, but he needed more rigorous proof if he was to ask for money to try out his idea. He tracked down Richard Twiss at a defense research lab and the two worked out the theory of intensity correlations for long-baseline radio interferometry. Using facilities at the famous Jodrell Bank Radio Observatory at Manchester, they demonstrated the principle of their intensity interferometer and measured the angular size of Cygnus A and Cassiopeia A, two of the strongest radio sources in the Northern sky.
One of the surprising side benefits of the intensity interferometer over field-based interferometry was insensitivity to environmental phase fluctuations. For radio astronomy the biggest source of phase fluctuations was the ionosphere, and the new intensity interferometer was immune to its fluctuations. Phase fluctuations had also been the limiting factor for the Michelson stellar interferometer which had limited its use to only about half a dozen stars, so Hanbury Brown and Twiss decided to revisit visible stellar interferometry using their new concept of intensity interferometry.
To illustrate the principle for visible wavelengths, Hanbury Brown and Twiss performed a laboratory experiment to correlate intensity fluctuations in two receivers illuminated by a common source through a beam splitter. The intensity correlations were detected and measured as a function of path length change, illustrating an excess correlation in noise for short path lengths that decayed as the path length increased. They published their results in Nature in 1956, which immediately ignited a firestorm of protest from physicists.
In the 1950’s, many physicists had embraced the discrete properties of the photon and had developed a misleading mental picture of photons as individual and indivisible particles that could only go one way or another from a beam splitter, but not both. Therefore, the argument went, if the photon in an attenuated beam was detected in one detector at the output of a beam splitter, then it could not be detected at the other. This would produce an anticorrelation in coincidence counts at the two detectors. However, the Hanbury Brown and Twiss (HBT) data showed a correlation between the two detectors. This launched an intense controversy in which some of those who accepted the results called for a radical new theory of the photon, while most others dismissed the HBT results as due to systematics in the light source. The heart of this controversy was quickly understood by the Nobel laureate E. M. Purcell. He correctly pointed out that photons are bosons and are indistinguishable discrete particles and hence are likely to “bunch” together, according to quantum statistics, even under low light conditions. Therefore, attenuated “chaotic” light would indeed show photodetector correlations: even if the average photon number was less than a single photon at a time, the photons would still bunch.
The bunching of photons in light is a second-order effect that moves beyond the first-order interference effects of Young’s double slit, but even here the quantum nature of light is not required. A semiclassical theory of light emission from a spectral line with a natural bandwidth also predicts intensity correlations, and the correlations are precisely what would be observed for photon bunching. Therefore, even the second-order HBT results, when performed with natural light sources, do not distinguish between classical and quantum effects in the experimental results. But this reliance on natural light sources was about to change fundamentally with the invention of the laser.
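The semiclassical statement can be checked numerically with a short sketch. It assumes chaotic light is modeled as a classical field whose complex amplitude is Gaussian noise, and it uses the normalized zero-delay intensity correlation, often written g2(0), as the measure of excess correlation. The function names and sample sizes are illustrative choices, not part of the original analysis.

```python
import numpy as np

def g2_zero(intensity):
    """Normalized zero-delay intensity correlation <I^2> / <I>^2."""
    intensity = np.asarray(intensity, dtype=float)
    return np.mean(intensity ** 2) / np.mean(intensity) ** 2

def chaotic_intensity(n_samples, seed=1):
    """Intensity of a classical field whose complex amplitude is Gaussian noise."""
    rng = np.random.default_rng(seed)
    field = rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)
    return np.abs(field) ** 2

def stable_intensity(n_samples):
    """Intensity of an ideal constant-amplitude classical wave."""
    return np.ones(n_samples)

if __name__ == "__main__":
    print("chaotic light g2(0):", g2_zero(chaotic_intensity(200_000)))
    print("stable wave   g2(0):", g2_zero(stable_intensity(200_000)))
```

In this model the fluctuating classical intensity alone gives a value close to 2, the same excess correlation attributed to photon bunching, while a constant-amplitude wave gives exactly 1.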
Invention of the Laser: Ted Maiman (1960)
One of the great scientific breakthroughs of the 20th century was the nearly simultaneous yet independent realization by several researchers around 1951 (by Charles H. Townes of Columbia University, by Joseph Weber of the University of Maryland, and by Alexander M. Prokhorov and Nikolai G. Basov at the Lebedev Institute in Moscow) that clever techniques and novel apparatus could be used to produce collections of atoms that had more electrons in excited states than in ground states. Such a situation is called a population inversion. If this situation could be attained, then according to Einstein’s 1917 theory of photon emission, a single photon would stimulate a second photon, which in turn would stimulate two additional electrons to emit two identical photons to give a total of four photons, and so on. Clearly this process turns a single photon into a host of photons, all with identical energy and phase.
Charles Townes and his research group were the first to succeed in 1953 in producing a device based on ammonia molecules that could work as an intense source of coherent photons. The initial device did not amplify visible light, but amplified microwave photons that had wavelengths of about 3 centimeters. They called the process microwave amplification by stimulated emission of radiation, hence the acronym “MASER”. Despite the significant breakthrough that this invention represented, the devices were very expensive and difficult to operate. The maser did not revolutionize technology, and some even quipped that the acronym stood for “Means of Acquiring Support for Expensive Research”. The maser did, however, launch a new field of study, called quantum electronics, that was the direct descendant of Einstein’s 1917 paper. Most importantly, the existence and development of the maser became the starting point for a device that could do the same thing for light.
The race to develop an optical maser (later to be called laser, for light amplification by stimulated emission of radiation) was intense. Many groups actively pursued this holy grail of quantum electronics. Most believed that it was possible, which made its invention merely a matter of time and effort. This race was won by Theodore H. Maiman at Hughes Research Laboratory in Malibu, California in 1960. He used a ruby crystal that was excited into a population inversion by an intense flash tube (like a flash bulb) that had originally been invented for flash photography. His approach was amazingly simple (blast the ruby with a high-intensity pulse of light and see what comes out), which explains why he was the first. Most other groups had been pursuing much more difficult routes because they believed that laser action would be difficult to achieve.
Perhaps the most important aspect of Maiman’s discovery was that it demonstrated that laser action was actually much simpler than people anticipated, and that laser action is a fairly common phenomenon. His discovery was quickly repeated by other groups, and then additional laser media were discovered such as helium-neon gas mixtures, argon gas, carbon dioxide gas, garnet lasers and others. Within several years, over a dozen different material and gas systems were made to lase, opening up wide new areas of research and development that continue unabated to this day. It also called for new theories of optical coherence to explain how coherent laser light interacted with matter.
Coherent States: Glauber (1963)
The HBT experiment had been performed with attenuated chaotic light that had residual coherence caused by the finite linewidth of the filtered light source. The theory of intensity correlations for this type of light was developed in the 1950’s by Emil Wolf and Leonard Mandel using a semiclassical theory in which the statistical properties of the light were based on electromagnetics without a direct need for quantized photons. The HBT results were fully consistent with this semiclassical theory. However, after the invention of the laser, new “coherent” light sources became available that required a fundamentally quantum depiction.
Roy Glauber was a theoretical physicist who received his PhD working with Julian Schwinger at Harvard. He spent several years as a post-doc at Princeton’s Institute for Advanced Study starting in 1949 at the time when quantum field theory was being developed by Schwinger, Feynman and Dyson. While Feynman was off in Brazil for a year learning to play the bongo drums, Glauber filled in for his lectures at Caltech. He returned to Harvard in 1952 in the position of an assistant professor. He was already thinking about the quantum aspects of photons in 1956 when news of the photon correlations in the HBT experiment was published, and when the laser was invented four years later, he began developing a theory of photon correlations in laser light that he suspected would be fundamentally different from those in natural chaotic light.
Because of his background in quantum field theory, and especially quantum electrodynamics, it was a fairly easy task to couch the quantum optical properties of coherent light in terms of Dirac’s creation and annihilation operators of the electromagnetic field. Related to the minimum-uncertainty wave functions derived initially by Schrödinger in the late 1920’s, Glauber developed a “coherent state” operator that was a minimum uncertainty state of the quantized electromagnetic field . This coherent state represents a laser operating well above the lasing threshold and predicted that the HBT correlations would vanish. Glauber was awarded the Nobel Prize in Physics in 2005 for his work on such “Glauber” states in quantum optics.
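In standard modern notation (the symbols α and |n⟩ are conventional choices, not taken from this text), a coherent state is an eigenstate of the annihilation operator, has Poissonian photon statistics, and gives a normalized zero-delay intensity correlation of exactly one, meaning no excess HBT correlation:

```latex
\hat{a}\,|\alpha\rangle = \alpha\,|\alpha\rangle, \qquad
|\alpha\rangle = e^{-|\alpha|^2/2} \sum_{n=0}^{\infty} \frac{\alpha^n}{\sqrt{n!}}\,|n\rangle, \qquad
P(n) = e^{-|\alpha|^2}\,\frac{|\alpha|^{2n}}{n!}, \qquad
g^{(2)}(0) = \frac{\langle \hat{a}^\dagger \hat{a}^\dagger \hat{a}\,\hat{a} \rangle}{\langle \hat{a}^\dagger \hat{a} \rangle^2} = 1 .
```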
Single-Photon Optics: Kimble and Mandel (1977)
Beyond introducing coherent states, Glauber’s new theoretical approach, and parallel work by George Sudarshan around the same time , provided a new formalism for exploring quantum optical properties in which fundamentally quantum processes could be explored that could not be predicted using only semiclassical theory. For instance, one could envision producing photon states in which the photon arrivals at a detector could display the kind of anti-bunching that had originally been assumed (in error) by the critics of the HBT experiment. A truly one-photon state, also known as a Fock state or a number state, would be the extreme limit in which the quantum field possessed a single quantum that could be directed at a beam splitter and would emerge either from one side or the other with complete anti-correlation. However, generating such a state in the laboratory remained a challenge.
In 1975, Carmichael and Walls predicted that resonance fluorescence could produce quantized fields that had lower correlations than coherent states. In 1977 H. J. Kimble, M. Dagenais and L. Mandel demonstrated, for the first time, photon antibunching between two photodetectors at the two ports of a beam splitter. They used a beam of sodium atoms pumped by a dye laser.
This first demonstration of photon antibunching represents a major milestone in the history of quantum optics. Taylor’s first-order experiments in 1909 showed no difference between classical electromagnetic waves and a flux of photons. Similarly the second-order HBT experiment of 1956 using chaotic light could be explained equally well using classical or quantum approaches to explain the observed photon correlations. Even laser light (when the laser is operated far above threshold) produced classic “classical” wave effects with only the shot noise demonstrating the discreteness of photon arrivals. Only after the availability of “quantum” light sources, beginning with the work of Kimble and Mandel, could photon numbers be manipulated at will, launching the modern era of quantum optics. Later experiments by them and others have continually improved the control of photon states.
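The three regimes described in this history can be compared in a small photon-counting sketch. It assumes an ideal lossless 50/50 beam splitter with vacuum at the unused input, so that each incoming photon independently exits one of the two ports, and it assumes idealized photon-number statistics for each source. The function names, mean photon number, and trial count are hypothetical choices for illustration.

```python
import numpy as np

def draw_photon_numbers(kind, mean=1.0, trials=400_000, seed=3):
    """Photon number per trial for three idealized light sources."""
    rng = np.random.default_rng(seed)
    if kind == "fock":       # exactly one photon per trial
        return np.ones(trials, dtype=int)
    if kind == "coherent":   # Poisson statistics (ideal laser far above threshold)
        return rng.poisson(mean, size=trials)
    if kind == "thermal":    # Bose-Einstein (geometric) statistics, single mode
        return rng.geometric(1.0 / (1.0 + mean), size=trials) - 1
    raise ValueError(f"unknown source kind: {kind}")

def beam_splitter_g2(photon_numbers, seed=2):
    """Normalized coincidence rate <n1 n2> / (<n1><n2>) behind a 50/50 splitter."""
    rng = np.random.default_rng(seed)
    photon_numbers = np.asarray(photon_numbers)
    n1 = rng.binomial(photon_numbers, 0.5)  # each photon independently picks a port
    n2 = photon_numbers - n1
    return np.mean(n1 * n2) / (np.mean(n1) * np.mean(n2))

if __name__ == "__main__":
    for kind in ("fock", "coherent", "thermal"):
        print(kind, beam_splitter_g2(draw_photon_numbers(kind)))
```

In this idealized model the single-photon source never produces a coincidence (antibunching), the Poissonian source gives a ratio near 1, and the thermal source gives a ratio near 2 (bunching).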
1900 – Planck (1901). “Law of energy distribution in normal spectra.” Annalen Der Physik 4(3): 553-563.
1905 – A. Einstein (1905). “Generation and conversion of light with regard to a heuristic point of view.” Annalen Der Physik 17(6): 132-148.
1909 – A. Einstein (1909). “On the current state of radiation problems.” Physikalische Zeitschrift 10: 185-193.
1909 – G.I. Taylor: Proc. Cam. Phil. Soc. Math. Phys. Sci. 15 , 114 (1909) Single photon double-slit experiment
1915 – Millikan, R. A. (1916). “A direct photoelectric determination of Planck’s ‘h’.” Physical Review 7(3): 355-388. Photoelectric effect.
1916 – Einstein, A. (1916). “Strahlungs-Emission und -Absorption nach der Quantentheorie.” Verh. Deutsch. Phys. Ges. 18: 318. Einstein predicts stimulated emission.
1923 – Compton, Arthur H. (May 1923). “A Quantum Theory of the Scattering of X-Rays by Light Elements.” Physical Review 21(5): 483–502.
1926 – Lewis, G. N. (1926). “The conservation of photons.” Nature 118: 874-875. Gilbert Lewis named the “photon”.
1927 – Dirac, P. A. M. (1927). “The quantum theory of the emission and absorption of radiation.” Proceedings of the Royal Society of London Series A 114(767): 243-265.
1932 – E. P. Wigner: Phys. Rev. 40, 749 (1932)
1935 – A. Einstein, B. Podolsky, N. Rosen: Phys. Rev. 47 , 777 (1935). EPR paradox.
1935 – N. Bohr: Phys. Rev. 48 , 696 (1935). Bohr’s response to the EPR paradox.
Einstein, A. (1916). “Strahlungs-Emission und -Absorption nach der Quantentheorie.” Verh. Deutsch. Phys. Ges. 18: 318; Einstein, A. (1917). “Quantum theory of radiation.” Physikalische Zeitschrift 18: 121-128.
Brown, R. H. and R. Q. Twiss (1956). “Correlation Between Photons in 2 Coherent Beams of Light.” Nature 177(4497): 27-29; Brown, R. H. and R. Q. Twiss (1956). “Test of a new type of stellar interferometer on Sirius.” Nature 178(4541): 1046-1048.
Glauber, R. J. (1963). “Photon Correlations.” Physical Review Letters 10(3): 84.
Sudarshan, E. C. G. (1963). “Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams.” Physical Review Letters 10(7): 277; Mehta, C. L. and E. C. G. Sudarshan (1965). “Relation between quantum and semiclassical description of optical coherence.” Physical Review 138(1B): B274.
Nature loves the path of steepest descent. Place a ball on a smooth curved surface and release it, and it will instantaneously accelerate in the direction of steepest descent. Shoot a laser beam from an oblique angle onto a piece of glass to hit a target inside, and the path taken by the beam is the only path that decreases the distance to the target in the shortest time. Diffract a stream of electrons from the surface of a crystal, and quantum detection events are greatest at the positions where the troughs and peaks of the de Broglie waves converge the most. The first example is Newton’s second law. The second example is Fermat’s principle and Snell’s Law. The third example is Feynman’s path-integral formulation of quantum mechanics. They all share in common a minimization principle, the principle of least action: the path of a dynamical system is the one that minimizes a property known as “action”.
The principle of least action, first proposed by the French physicist Maupertuis through mechanical analogy, became a principle of Lagrangian mechanics in the hands of Lagrange, but was still restricted to mechanical systems of particles. The principle was generalized forty years later by Hamilton, who began by considering the propagation of light waves, and ended by transforming mechanics into a study of pure geometry divorced from forces and inertia. Optics played a key role in the development of mechanics, and mechanics returned the favor by giving optics the Eikonal Equation. The Eikonal Equation is the “F = ma” of ray optics. Its solutions describe the paths of light rays through complicated media.
Anyone who has taken a course in optics knows that Étienne-Louis Malus (1775-1812) discovered the polarization of light, but little else is taught about this French mathematician who was one of the savants Napoleon had taken along with him when he invaded Egypt in 1798. After experiencing numerous horrors of war and plague, Malus returned to France damaged but wiser. He discovered the polarization of light in the Fall of 1808 as he was playing with crystals of Iceland spar at sunset and happened to view the last rays of the sun reflected from the windows of the Luxembourg palace. Iceland spar produces double images in natural light because it is birefringent. Malus discovered that he could extinguish one of the double images of the Luxembourg windows by rotating the crystal a certain way, demonstrating that light is polarized by reflection. The degree to which light is extinguished as a function of the angle of the polarizing crystal is known as Malus’ Law.
Malus had picked up an interest in the general properties of light and imaging during lulls in his ordeal in Egypt. He was an emissionist following his compatriot Laplace, rather than an undulationist following Thomas Young. It is ironic that the French scientists were staunchly supporting Newton on the nature of light, while the British scientist Thomas Young was trying to upend Newtonian optics. Almost all physicists at that time were emissionists, only a few years after Young’s double-slit experiment of 1804, and few serious scientists accepted Young’s theory of the wave nature of light until Fresnel and Arago supplied the rigorous theory and experimental proofs much later in 1819.
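In its standard modern form (the symbols are conventional, not from this text), for linearly polarized light of intensity I_0 passing through an ideal analyzer whose axis makes an angle θ with the polarization direction, Malus’ Law gives the transmitted intensity as:

```latex
I(\theta) = I_0 \cos^2\theta
```

The image is fully extinguished when the analyzer is crossed with the polarization, at θ = 90°.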
As a prelude to his later discovery of polarization, Malus had earlier proven a theorem about trajectories that particles of light take through an optical system. One of the key questions about the particles of light in an optical system was how they formed images. The physics of light particles moving through lenses was too complex to treat at that time, but reflection was relatively easy based on the simple reflection law. Malus proved a theorem mathematically that after a reflection from a curved mirror, a set of rays perpendicular to an initial nonplanar surface would remain perpendicular at a later surface after reflection (this property is closely related to the conservation of optical etendue). This is known as Malus’ Theorem, and he thought it only held true after a single reflection, but later mathematicians proved that it remains true even after an arbitrary number of reflections, even in cases when the rays intersect to form an optical effect known as a caustic. The mathematics of caustics would catch the interest of an Irish mathematician and physicist who helped launch a new field of mathematical physics.
Hamilton’s Characteristic Function
William Rowan Hamilton (1805 – 1865) was a child prodigy who taught himself thirteen languages by the time he was thirteen years old (with the help of his linguist uncle), but mathematics became his primary focus at Trinity College, Dublin. His mathematical prowess was so great that he was made the Astronomer Royal of Ireland while still an undergraduate student. He also became fascinated by the theory of envelopes of curves and in particular by the mathematics of caustic curves in optics.
In 1823 at the age of 18, he wrote a paper titled Caustics that was read to the Royal Irish Academy. In this paper, Hamilton gave an exceedingly simple proof of Malus’ Theorem, but that was perhaps the simplest part of the paper. Other aspects were mathematically obscure and reviewers requested further additions and refinements before publication. Over the next four years, as Hamilton expanded this work on optics, he developed a new theory of optics, the first part of which was published as Theory of Systems of Rays in 1827 with two following supplements completed by 1833 but never published.
Hamilton’s most important contribution to optical theory (and eventually to mechanics) he called his characteristic function. By applying the principle of Fermat’s least time, which he called his principle of stationary action, he sought to find a single unique function that characterized every path through an optical system. By first proving Malus’ Theorem and then applying the theorem to any system of rays using the principle of stationary action, he was able to construct two partial differential equations whose solution, if it could be found, defined every ray through the optical system. This result was completely general and could be extended to include curved rays passing through inhomogeneous media. Because it mapped input rays to output rays, it was the most general characterization of any defined optical system. The characteristic function defined surfaces of constant action whose normal vectors were the rays of the optical system. Today these surfaces of constant action are described by the Eikonal function (but how it got its name is the next chapter of this story). Using his characteristic function, Hamilton predicted a phenomenon known as conical refraction in 1832, which was subsequently observed, launching him to an unusual level of fame.
Once Hamilton had established his principle of stationary action of curved light rays, it was an easy step to extend it to apply to mechanical systems of particles with curved trajectories. This step produced his most famous work On a General Method in Dynamics published in two parts in 1834 and 1835  in which he developed what became known as Hamiltonian dynamics. As his mechanical work was extended by others including Jacobi, Darboux and Poincaré, Hamilton’s work on optics was overshadowed, overlooked and eventually lost. It was rediscovered when Schrödinger, in his famous paper of 1926, invoked Hamilton’s optical work as a direct example of the wave-particle duality of quantum mechanics . Yet in the interim, a German mathematician tackled the same optical problems that Hamilton had seventy years earlier, and gave the Eikonal Equation its name.
The German mathematician Heinrich Bruns (1848-1919) was engaged chiefly with the measurement of the Earth, or geodesy. He was a professor of mathematics in Berlin and later Leipzig. One claim to fame was that one of his graduate students was Felix Hausdorff, who would go on to much greater fame in the field of set theory and measure theory (the Hausdorff dimension was a precursor to the fractal dimension). Possibly motivated by his studies done with Hausdorff on refraction of light by the atmosphere, Bruns became interested in Malus’ Theorem for the same reasons and with the same goals as Hamilton, yet was unaware of Hamilton’s work in optics.
The mathematical process of creating “images”, in the sense of a mathematical mapping, made Bruns think of the Greek word εἰκών (eikon), which literally means “icon” or “image”, and he published a small book in 1895 with the title Das Eikonal in which he derived a general equation for the path of rays through an optical system. His approach was heavily geometrical and is not easily recognized as an equation arising from variational principles. It rediscovered most of the results of Hamilton’s paper on the Theory of Systems of Rays and was thus not groundbreaking in the sense of new discovery. But it did reintroduce the world to the problem of systems of rays, and his name of Eikonal for the equations of the ray paths stuck, and was used with increasing frequency in subsequent years. Arnold Sommerfeld (1868 – 1951) was one of the early proponents of the Eikonal equation and recognized its connection with action principles in mechanics. He discussed the Eikonal equation in a 1911 optics paper with Runge and in 1916 used action principles to extend Bohr’s model of the hydrogen atom. While the Eikonal approach was not used often, it became popular in the 1960’s when computational optics made numerical solutions possible.
Lagrangian Dynamics of Light Rays
In physical optics, one of the most important properties of a ray passing through an optical system is known as the optical path length (OPL). The OPL is the central quantity that is used in problems of interferometry, and it is the central property that appears in Fermat’s principle that leads to Snell’s Law. The OPL played an important role in the history of the calculus when Johann Bernoulli in 1697 used it to derive the path taken by a light ray as an analogy of a brachistochrone curve – the curve of least time taken by a particle between two points.
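The OPL between two points in a refractive medium is the sum of the piecewise product of the refractive index n with infinitesimal elements of the path length ds. In integral form, this is expressed as

$$\mathrm{OPL} = \int n(x^a)\,ds = \int n(x^a)\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\,ds$$

where the “dot” is a derivative with respect to s. The optical Lagrangian is recognized as

$$L(x^a, \dot{x}^a) = n(x^a)\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}$$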
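As a small illustration of how Fermat’s principle applied to the OPL leads to Snell’s Law, the sketch below minimizes the OPL of a two-segment path across a flat interface and then checks that n1 sin θ1 = n2 sin θ2 at the minimum. The geometry (interface along y = 0, the end points A and B, and the index values n1 and n2) is a hypothetical choice made only for this example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical setup: medium 1 (index n1) fills y > 0, medium 2 (index n2) fills y < 0
n1, n2 = 1.0, 1.5
A = np.array([0.0, 1.0])     # start point in medium 1
B = np.array([1.0, -1.0])    # end point in medium 2

def opl(x):
    """OPL of the path A -> (x, 0) -> B: sum of index times segment length."""
    crossing = np.array([x, 0.0])
    return n1 * np.linalg.norm(crossing - A) + n2 * np.linalg.norm(B - crossing)

# Fermat's principle: the physical ray makes the OPL stationary (here, a minimum)
res = minimize_scalar(opl, bounds=(A[0], B[0]), method="bounded",
                      options={"xatol": 1e-10})
x_star = res.x

# Sines of the ray angles measured from the interface normal (the y axis)
sin1 = (x_star - A[0]) / np.hypot(x_star - A[0], A[1])
sin2 = (B[0] - x_star) / np.hypot(B[0] - x_star, B[1])

snell_residual = n1 * sin1 - n2 * sin2
print(f"crossing point x = {x_star:.6f}")
print(f"n1*sin(theta1) - n2*sin(theta2) = {snell_residual:.2e}")
```

The residual printed at the end is the derivative of the OPL with respect to the crossing point, so it vanishes (to the tolerance of the minimizer) exactly when Snell’s Law holds.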
The Lagrangian is inserted into the Euler equations to yield (after some algebra, see Introduction to Modern Dynamics pg. 336)

$$\frac{d}{ds}\left(n\,\frac{dx^a}{ds}\right) = \frac{\partial n}{\partial x^a}$$

This is a second-order ordinary differential equation in the variables $x^a$ that define the ray path through the system. It is literally a “trajectory” of the ray, and the Eikonal equation becomes the F = ma of ray optics.
In a paraxial system (in which the rays never make large angles relative to the optic axis) it is common to select the position z as a single parameter to define the curve of the ray path so that the trajectory is parameterized as

$$\mathrm{OPL} = \int n(x,y,z)\sqrt{1 + x'^2 + y'^2}\,dz$$

where the derivatives are with respect to z, and the effective Lagrangian is recognized as

$$L = n(x,y,z)\sqrt{1 + x'^2 + y'^2}$$

The Hamiltonian formulation is derived from the Lagrangian by defining an optical Hamiltonian as the Legendre transform of the Lagrangian.
To start, the Lagrangian is expressed in terms of the generalized coordinates and momenta. The generalized optical momenta are defined as

$$p_a = \frac{\partial L}{\partial x'^a} = \frac{n\,x'^a}{\sqrt{1 + x'^2 + y'^2}}$$

This relationship leads to an alternative expression for the Eikonal equation (also known as the scalar Eikonal equation) expressed as

$$\left|\nabla S\right|^2 = n^2, \qquad \vec{p} = \nabla S$$

where S(x,y,z) is the eikonal function. The momentum vectors are perpendicular to the surfaces S(x,y,z) = const., which are recognized as the wavefronts of a propagating wave.
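A quick numerical check can make the scalar Eikonal equation concrete. The sketch below takes the equation in its standard form, |∇S|² = n², and assumes a uniform medium with a point source at the origin, for which the eikonal function is S = n0 r. The index value n0, the grid spacing h, and the grid extent are hypothetical choices for illustration only.

```python
import numpy as np

n0 = 1.5      # hypothetical uniform refractive index
h = 0.01      # grid spacing

# Grid kept away from the point source at the origin, where S is not smooth
x = np.arange(1.0, 2.0 + h / 2, h)
y = np.arange(1.0, 2.0 + h / 2, h)
X, Y = np.meshgrid(x, y, indexing="ij")

# Eikonal function of a point source in a uniform medium: S = n0 * r
S = n0 * np.hypot(X, Y)

# Finite-difference gradient of S gives the optical momenta p = grad S
Sx, Sy = np.gradient(S, h, h, edge_order=2)

# Scalar Eikonal equation: |grad S| should equal n0 everywhere on the grid
max_err = np.max(np.abs(np.hypot(Sx, Sy) - n0))

# The momenta should point radially, i.e. normal to the circular wavefronts
max_cross = np.max(np.abs(X * Sy - Y * Sx))

print(f"max | |grad S| - n0 | = {max_err:.2e}")
print(f"max | r x p |         = {max_cross:.2e}")
```

Both quantities are zero up to finite-difference error, which shows the momenta have magnitude n0 and are perpendicular to the circular surfaces of constant S.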
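The Lagrangian can be restated as a function of the generalized momenta as

$$L = \frac{n^2}{\sqrt{n^2 - p_x^2 - p_y^2}}$$

and the Legendre transform that takes the Lagrangian into the Hamiltonian is

$$H = p_x x' + p_y y' - L = -\sqrt{n^2 - p_x^2 - p_y^2}$$

The trajectory of the rays is the solution to Hamilton’s equations of motion applied to this Hamiltonian.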
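The Legendre transform can be checked numerically. The sketch below assumes the standard paraxial Lagrangian L = n sqrt(1 + x'² + y'²), with primes denoting derivatives with respect to z, and compares the transform p·x' − L with the closed form −sqrt(n² − p_x² − p_y²). The sampled index values and slopes are arbitrary test numbers, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1.0 + rng.random(5)             # sample refractive indices between 1 and 2
xp = rng.uniform(-0.3, 0.3, 5)      # ray slopes x' = dx/dz (paraxial: small)
yp = rng.uniform(-0.3, 0.3, 5)      # ray slopes y' = dy/dz

def lagrangian(n, xp, yp):
    """Assumed paraxial optical Lagrangian L = n * sqrt(1 + x'^2 + y'^2)."""
    return n * np.sqrt(1 + xp**2 + yp**2)

# Generalized momenta p_a = dL/dx'^a, from the analytic derivative
root = np.sqrt(1 + xp**2 + yp**2)
px = n * xp / root
py = n * yp / root

# Cross-check p_x against a central finite difference of the Lagrangian
eps = 1e-6
px_fd = (lagrangian(n, xp + eps, yp) - lagrangian(n, xp - eps, yp)) / (2 * eps)

# Legendre transform versus the closed-form expressions in terms of momenta
L = lagrangian(n, xp, yp)
H_legendre = px * xp + py * yp - L
H_closed = -np.sqrt(n**2 - px**2 - py**2)
L_closed = n**2 / np.sqrt(n**2 - px**2 - py**2)

print(np.max(np.abs(H_legendre - H_closed)))
```

The negative sign of the Hamiltonian is a consequence of using z as the evolution parameter: −H plays the role of the z-component of the optical momentum.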
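If the optical rays are restricted to the x-y plane, then Hamilton’s equations of motion can be expressed relative to the path length ds, and the momenta are $p_a = n\,dx^a/ds$. The ray equations are (simply expressing the two second-order Eikonal equations as four first-order equations)

$$\dot{x} = \frac{p_x}{n}, \qquad \dot{y} = \frac{p_y}{n}, \qquad \dot{p}_x = \frac{\partial n}{\partial x}, \qquad \dot{p}_y = \frac{\partial n}{\partial y}$$

where the dot is a derivative with respect to the element ds.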
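To see the four first-order ray equations in action, the sketch below integrates dx/ds = p_x/n, dy/ds = p_y/n, dp_x/ds = ∂n/∂x, dp_y/ds = ∂n/∂y for a linear index profile n = n0 + g y, a case where the answer is known in closed form. The profile, the values of n0 and g, and the launch conditions are hypothetical, and `solve_ivp` is used here simply as one convenient integrator.

```python
import numpy as np
from scipy.integrate import solve_ivp

n0, g = 1.0, 0.05    # hypothetical linear profile n(x, y) = n0 + g*y

def refindex(x, y):
    """Return the index n and its gradient (nx, ny) for the linear profile."""
    return n0 + g * y, 0.0, g

def ray_rhs(s, state):
    """Right-hand side of the four first-order ray equations."""
    x, y, px, py = state
    n, nx, ny = refindex(x, y)
    return [px / n, py / n, nx, ny]

# Launch the ray from the origin along +x with |p| = n, so s is the arc length
n_start, _, _ = refindex(0.0, 0.0)
state0 = [0.0, 0.0, n_start, 0.0]
s_eval = np.linspace(0.0, 10.0, 101)
sol = solve_ivp(ray_rhs, (0.0, 10.0), state0, t_eval=s_eval,
                rtol=1e-10, atol=1e-12)
x, y, px, py = sol.y

# Closed-form checks: p_x is constant, p_y grows linearly in s, and |p| = n
y_exact = (np.sqrt(n0**2 + (g * s_eval)**2) - n0) / g
invariant = px**2 + py**2 - (n0 + g * y)**2

print(f"max |y - y_exact|   = {np.max(np.abs(y - y_exact)):.2e}")
print(f"max ||p|^2 - n^2|   = {np.max(np.abs(invariant)):.2e}")
```

The ray bends toward the region of higher index, which is the same mechanism that produces mirages in a thermally stratified atmosphere.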
As an example, consider a radial refractive index profile in the x-y plane, such as the Gaussian profile used in the code below,

$$n(r) = 1 + \exp\left(-\frac{r^2}{2\sigma^2}\right)$$

where r is the radius on the x-y plane. Putting this refractive index profile into the Eikonal equations creates a two-dimensional orbit in the x-y plane. The Eikonal Equation is the “F = ma” of ray optics. Its solutions describe the paths of light rays through complicated media, including the phenomenon of gravitational lensing, described in my blog post here, and the orbits of photons around black holes, described in my other blog post here.
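Because a radial index profile has no preferred direction, the ray equations conserve an optical analog of angular momentum, x p_y − y p_x, along with the quantity |p|² − n². The sketch below checks both for the Gaussian profile and the initial conditions used in the raysimple.py listing. The integration length of 200 and the tolerances are hypothetical choices, and `solve_ivp` stands in for the `odeint` call of the listing.

```python
import numpy as np
from scipy.integrate import solve_ivp

sig = 10.0    # width of the Gaussian profile, as in raysimple.py

def refindex(x, y):
    """Gaussian index n = 1 + exp(-r^2 / 2 sig^2) and its gradient (nx, ny)."""
    expon = np.exp(-(x**2 + y**2) / (2 * sig**2))
    return 1 + expon, -x / sig**2 * expon, -y / sig**2 * expon

def ray_rhs(s, state):
    """Ray equations: dx/ds = p/n and dp/ds = grad n."""
    x, y, px, py = state
    n, nx, ny = refindex(x, y)
    return [px / n, py / n, nx, ny]

# Initial conditions taken from the raysimple.py listing
v1 = 0.707
state0 = [12.0, 0.0, v1, np.sqrt(1 - v1**2)]
sol = solve_ivp(ray_rhs, (0.0, 200.0), state0, rtol=1e-10, atol=1e-12)
x, y, px, py = sol.y

# Quantities that should stay constant along the orbit
ang_mom = x * py - y * px
n_along = refindex(x, y)[0]
energy_like = px**2 + py**2 - n_along**2

ang_mom_drift = ang_mom.max() - ang_mom.min()
energy_drift = energy_like.max() - energy_like.min()
print(f"angular momentum drift = {ang_mom_drift:.2e}")
print(f"|p|^2 - n^2 drift      = {energy_drift:.2e}")
```

Monitoring these two invariants is a cheap way to judge whether the integrator tolerances are tight enough for a long orbit.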
Python Code: raysimple.py
The following Python code solves for individual trajectories.
# -*- coding: utf-8 -*-
"""
raysimple.py
Created on Tue May 28 11:50:24 2019
D. D. Nolte, Introduction to Modern Dynamics: Chaos, Networks, Space and Time, 2nd ed. (Oxford,2019)
"""

import numpy as np
from scipy import integrate
from matplotlib import pyplot as plt
from matplotlib import cm

# selection 1 = Gaussian
# selection 2 = Donut
selection = 1

def refindex(x, y):
    """Return the refractive index n and its gradient (nx, ny) at (x, y)."""
    if selection == 1:
        sig = 10
        n = 1 + np.exp(-(x**2 + y**2)/2/sig**2)
        nx = (-2*x/2/sig**2)*np.exp(-(x**2 + y**2)/2/sig**2)
        ny = (-2*y/2/sig**2)*np.exp(-(x**2 + y**2)/2/sig**2)
    elif selection == 2:
        sig = 10
        r2 = (x**2 + y**2)
        r1 = np.sqrt(r2)
        expon = np.exp(-r2/2/sig**2)
        n = 1 + 0.3*r1*expon
        # d(r1)/dx = x/r1, so the second term carries no factor of 2
        nx = 0.3*r1*(-2*x/2/sig**2)*expon + 0.3*expon*x/r1
        ny = 0.3*r1*(-2*y/2/sig**2)*expon + 0.3*expon*y/r1
    return [n, nx, ny]

def flow_deriv(x_y_z, tspan):
    """Ray equations: state is (x, y, px, py), derivative is with respect to s."""
    x, y, z, w = x_y_z
    n, nx, ny = refindex(x, y)
    yp = np.zeros(shape=(4,))
    yp[0] = z/n     # dx/ds  = px/n
    yp[1] = w/n     # dy/ds  = py/n
    yp[2] = nx      # dpx/ds = dn/dx
    yp[3] = ny      # dpy/ds = dn/dy
    return yp

# Sample the refractive index on a 100 x 100 grid covering -20 to 20
V = np.zeros(shape=(100,100))
for xloop in range(100):
    xx = -20 + 40*xloop/100
    for yloop in range(100):
        yy = -20 + 40*yloop/100
        n, nx, ny = refindex(xx, yy)
        V[yloop,xloop] = n

fig = plt.figure(1)
contr = plt.contourf(V, 100, cmap=cm.coolwarm, vmin=1, vmax=3)
fig.colorbar(contr, shrink=0.5, aspect=5)
plt.show()

v1 = 0.707      # Change this initial condition
v2 = np.sqrt(1-v1**2)
y0 = [12, 0, v1, v2]     # Change these initial conditions
tspan = np.linspace(1,1700,1700)
y = integrate.odeint(flow_deriv, y0, tspan)

# Plot the ray trajectory in the x-y plane
plt.figure(2)
lines = plt.plot(y[1:1550,0], y[1:1550,1])
plt.show()
An excellent textbook on geometric optics from Hamilton’s point of view is K. B. Wolf, Geometric Optics in Phase Space (Springer, 2004). Another is H. A. Buchdahl, An Introduction to Hamiltonian Optics (Dover, 1992).
A rather older textbook on geometrical optics is by J. L. Synge, Geometrical Optics: An Introduction to Hamilton’s Method (Cambridge University Press, 1962) showing the derivation of the ray equations in the final chapter using variational methods. Synge takes a dim view of Bruns’ term “Eikonal” since Hamilton got there first and Bruns was unaware of it.
A book that makes an especially strong case for the Optical-Mechanical analogy of Fermat’s principle, connecting the trajectories of mechanics to the paths of optical rays is Daryl Holm, Geometric Mechanics: Part I Dynamics and Symmetry (Imperial College Press 2008).
Hamilton, W. R. “On a general method in dynamics I.” Mathematical Papers, I, 103-161: 247-308 (1834); Hamilton, W. R. “On a general method in dynamics II.” Mathematical Papers, I, 103-161: 95-144 (1835).
Schrödinger, E. “Quantification of the eigen-value problem.” Annalen der Physik 79(6): 489-527 (1926).