Twenty Years at Light Speed: The Future of Photonic Quantum Computing

Now is exactly the wrong moment to be reviewing the state of photonic quantum computing–the field is moving so rapidly that everything I say here will probably be out of date within a few years. On the other hand, now is exactly the right time to be doing this review, because so much has happened in just the past few years that it is worth taking a moment to look at where this field is today and where it is going.

At the 20-year anniversary of the publication of my book Mind at Light Speed (Free Press, 2001), this blog is the third in a series reviewing progress in three generations of Machines of Light over the past 20 years (see my previous blogs on the future of the photonic internet and on all-optical computers). This third and final update reviews progress on the third generation of the Machines of Light: the Quantum Optical Generation. Of the three generations, this is the one that is changing the fastest.

Quantum computing is almost here … and it will be at room temperature, using light, in photonic integrated circuits!

Quantum Computing with Linear Optics

Twenty years ago in 2001, Gerald Milburn at the University of Queensland, Australia, with Emanuel Knill and Raymond Laflamme at Los Alamos National Lab, published a revolutionary theoretical paper (known as KLM) in Nature on quantum computing with linear optics: “A scheme for efficient quantum computation with linear optics” [1]. Up until that time, it was believed that a quantum computer–if it was going to have the power of a universal Turing machine–needed to have at least some nonlinear interactions among qubits in a quantum gate. For instance, an example of a two-qubit gate is the controlled-NOT, or CNOT, gate shown in Fig. 1 with its truth table and the equivalent unitary matrix. It is clear that one qubit is controlling the other, telling it what to do.

The quantum CNOT gate gets interesting when the control line carries a quantum superposition: then the two outputs become entangled.

Entanglement is a strange process that is unique to quantum systems and has no classical analog. It also has no simple intuitive explanation. By any normal logic, if the control line passes through the gate unaltered, then absolutely nothing interesting should be happening on the Control-Out line. But that’s not the case. The control line going in was a separate state. If some measurement were made on it, either a 1 or a 0 would be seen with equal probability. But coming out of the CNOT, the Control-Out has somehow become perfectly correlated with whatever value is on the Signal-Out line. If the Signal-Out is measured, the measurement process collapses the state of the Control-Out to a value equal to the measured signal. The outcome of the control line becomes 100% certain even though nothing was ever done to it! This entanglement generation is one reason the CNOT is often the gate of choice when constructing quantum circuits to perform interesting quantum algorithms.
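To make that concrete, here is a minimal numpy sketch (hardware-agnostic, not tied to any photonic implementation) that applies the CNOT unitary to a control qubit prepared in an equal superposition. The separable input comes out as a Bell state whose only possible measurement outcomes are the perfectly correlated pairs 00 and 11.

```python
import numpy as np

# CNOT in the computational basis |control, signal>: it flips the signal only when the control is 1.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)        # control prepared in an equal superposition

state_in = np.kron(plus, ket0)           # separable input: (|0> + |1>)|0> / sqrt(2)
state_out = CNOT @ state_in              # output: (|00> + |11>) / sqrt(2), a Bell state

probs = np.abs(state_out) ** 2
for label, p in zip(["00", "01", "10", "11"], probs):
    print(label, round(float(p), 3))     # 0.5, 0.0, 0.0, 0.5: only the correlated outcomes survive
```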

However, optical implementation of a CNOT is a problem, because light beams and photons really do not like to interact with each other. This is the problem with all-optical classical computers too (see my previous blog). There are ways of getting light to interact with light, for instance inside nonlinear optical materials. And in the case of quantum optics, a single atom in an optical cavity can interact with single photons in ways that can act like a CNOT or related gates. But the efficiencies are very low and the costs to implement it are very high, making it difficult or impossible to scale such systems up into whole networks needed to make a universal quantum computer.

Therefore, when KLM published their idea for quantum computing with linear optics, it caused a shift in the way people were thinking about optical quantum computing. A universal optical quantum computer could be built using just light sources, beam splitters and photon detectors.

The way that KLM gets around the need for a direct nonlinear interaction between two photons is to use postselection. They run a set of photons–signal photons and ancilla (control) photons–through their linear optical system and they detect (i.e., theoretically…the paper is purely a theoretical proposal) the ancilla photons. If these photons are not detected where they are wanted, then that iteration of the computation is thrown out, and it is tried again and again, until the photons end up where they need to be. When the ancilla outcomes are finally what they need to be, that run is selected. The signal photons are still unmeasured at this point and are therefore in quantum superpositions that are useful for quantum computation. Postselection uses entanglement and measurement collapse to put the signal photons into desired quantum states: it provides an effective nonlinearity induced by the wavefunction collapse of the entangled state. Of course, the downside of this approach is that many iterations are thrown out–the computation becomes non-deterministic.
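The cost of that non-determinism is easy to quantify with a toy model: if a postselected gate heralds success with probability p, you need on average 1/p attempts per accepted run, and cascading gates multiplies the overhead. The sketch below is purely illustrative bookkeeping, not a simulation of the KLM optics; the quoted success probabilities (1/4 for the basic nonlinear-sign gate, 1/16 for a CNOT built from two of them) are the textbook values associated with the original scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_attempts(p_success, trials=100_000):
    """Average number of runs before the ancilla detection pattern comes out right."""
    return rng.geometric(p_success, size=trials).mean()   # retries until first success

# Illustrative success probabilities for postselected linear-optics gates.
for label, p in [("nonlinear-sign gate", 1/4), ("postselected CNOT", 1/16)]:
    print(f"{label:20s} p = {p:.4f} -> ~{mean_attempts(p):.1f} attempts per accepted run")
```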

KLM could get around most of the non-determinism by using more and more ancilla photons, but this has the cost of blowing up the size and cost of the implementation, so their scheme was not immediately practical. But the important point was that it introduced the idea of linear optical quantum computing. (For this, Milburn and his collaborators have my vote for a future Nobel Prize.) Once that idea was out, others refined it, improved upon it, and found clever ways to make it more efficient and more scalable. Many of these ideas relied on a technology that was co-evolving with quantum computing–photonic integrated circuits (PICs).

Quantum Photonic Integrated Circuits (QPICs)

Never underestimate the power of silicon. The amount of time and energy and resources that have now been invested in silicon device fabrication is so astronomical that almost nothing in this world can displace it as the dominant technology of the present day and the future. Therefore, when a photon can do something better than an electron, you can guess that eventually that photon will be encased in a silicon chip–on a photonic integrated circuit (PIC).

The dream of integrated optics (the optical analog of integrated electronics) has been around for decades: waveguides take the place of conducting wires, and interferometers take the place of transistors–all miniaturized and fabricated in the thousands on silicon wafers. The advantages of PICs are obvious, but the technology has taken a long time to develop. When I was a post-doc at Bell Labs in the late 1980’s, everyone was talking about PICs, but they suffered from terrible fabrication challenges and terrible attenuation losses. Fortunately, these are just technical problems, not limited by any fundamental laws of physics, so time (and an army of researchers) has chipped away at them.

One of the driving forces behind the maturation of PIC technology is fiber-optic communications (as discussed in a previous blog). Photons are clear winners when it comes to long-distance communications. In that sense, photonic information technology is a close cousin to silicon–photons are no less likely to be replaced by a future technology than silicon is. Therefore, it made sense to bring the photons onto the silicon chips, tapping into the full array of silicon fab resources so that there could be seamless integration between the fiber optics doing the communications and the photonic chips directing the information. Admittedly, photonic chips are not yet all-optical. They still use electronics to control the optical devices on the chip, but this niche for photonics has provided a driving force for advancements in PIC fabrication.

Fig. 2 Schematic of a silicon photonic integrated circuit (PIC). The waveguides can be silica or nitride deposited on the silicon chip. From the Comsol WebSite.

One side effect of improved PIC fabrication is low light loss. In telecommunications, this loss is not so critical because the systems use OEO (optical-electronic-optical) regeneration. But less loss is always good, and PICs can now safeguard almost every photon that comes on chip–exactly what is needed for a quantum PIC. In a quantum photonic circuit, every photon is valuable and informative and needs to be protected. The new PIC fabrication can do this. In addition, light switches for telecom applications are built from integrated interferometers on the chip. It turns out that interferometers at the single-photon level are unitary quantum gates that can be used to build universal photonic quantum computers. So the same technology and control that were developed for telecom are just what is needed for photonic quantum computers. Finally, integrated optical cavities on the PICs, which look just like wavelength filters when used for classical optics, are perfect for producing quantum states of light known as squeezed light, which turn out to be valuable for certain specialty types of quantum computing.
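A single integrated interferometer makes the point: a Mach-Zehnder built from two 50:50 beam splitters and a thermo-optic phase shifter is, at the single-photon level, just a 2x2 unitary on the pair of waveguide modes, and meshes of such elements compose into the larger unitaries a photonic processor needs. A generic textbook sketch (not any particular foundry's component model):

```python
import numpy as np

BS = np.array([[1, 1j],
               [1j, 1]]) / np.sqrt(2)        # symmetric 50:50 beam splitter on two waveguide modes

def phase(phi):
    """Phase shifter on the upper waveguide (thermo-optic in many PICs)."""
    return np.diag([np.exp(1j * phi), 1])

def mach_zehnder(theta, phi):
    """Phase, splitter, internal phase, splitter: the standard MZI building block."""
    return BS @ phase(theta) @ BS @ phase(phi)

U = mach_zehnder(0.3, 1.1)
print(np.allclose(U.conj().T @ U, np.eye(2)))           # True: unitary, so no photons are lost
print(np.round(np.abs(mach_zehnder(np.pi, 0))**2, 3))   # theta = pi: each photon stays in its own waveguide
print(np.round(np.abs(mach_zehnder(0.0, 0))**2, 3))     # theta = 0: the photon crosses over
```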

Therefore, as the concepts of linear optical quantum computing advanced over the last 20 years, the hardware to implement those concepts also advanced, driven by a highly lucrative market segment that provided the resources to tap into the vast miniaturization capabilities of silicon chip fabrication. Very fortuitous!

Room-Temperature Quantum Computers

There are many radically different ways to make a quantum computer. Some are built of superconducting circuits, others are made from semiconductors, or arrays of trapped ions, or the nuclear spins of atoms in molecules, and of course from photons. Up until about 5 years ago, optical quantum computers seemed like long shots. Perhaps the most advanced technology was the superconducting approach. Superconducting quantum interference devices (SQUIDs) have exquisite sensitivity that makes them robust quantum information devices. But the drawback is the cold temperatures that are needed for them to work. Many of the other approaches likewise need cold temperatures–sometimes astronomically cold temperatures that are only a few thousandths of a degree above absolute zero.

Cold temperatures and quantum computing seemed a foregone conclusion–you weren’t ever going to separate them–and for good reason. The single greatest threat to quantum information is decoherence–the draining away of the kind of quantum coherence that allows interferences and quantum algorithms to work. In this way, entanglement is a two-edged sword. On the one hand, entanglement provides one of the essential resources for the exponential speed-up of quantum algorithms. But on the other hand, if a qubit “sees” any environmental disturbance, then it becomes entangled with that environment. The entangling of quantum information with the environment causes the coherence to drain away–hence decoherence. Hot environments disturb quantum systems much more than cold environments, so there is a premium on cooling the environment of quantum computers to as low a temperature as possible. Even so, decoherence times can be microseconds to milliseconds under even the best conditions–quantum information dissipates almost as fast as you can make it.

Enter the photon! The bottom line is that photons don’t interact. They are blind to their environment. This is what makes them perfect information carriers down fiber optics. It is also what makes them such good qubits for carrying quantum information. You can prepare a photon in a quantum superposition just by sending it through a lossless polarizing crystal, and then the superposition will last for as long as you can let the photon travel (at the speed of light). Sometimes this means putting the photon into a coil of fiber many kilometers long to store it, but that is OK–a kilometer of coiled fiber in the lab is no bigger than a few tens of centimeters. So the same properties that make photons excellent at carrying information also give them very small decoherence. And after the KLM schemes began to be developed, the non-interacting property of photons was no longer a handicap.

In the past 5 years there has been an explosion, as well as an implosion, of quantum photonic computing advances. The implosion is the level of integration which puts more and more optical elements into smaller and smaller footprints on silicon PICs. The explosion is the number of first-of-a-kind demonstrations: the first universal optical quantum computer [2], the first programmable photonic quantum computer [3], and the first (true) quantum computational advantage [4].

All of these “firsts” operate at room temperature. (There is a slight caveat: the photon-number detectors are actually superconducting wire detectors that do need to be cooled. But these can be housed off-chip and off-rack in a separate cooled system that is coupled to the quantum computer by–no surprise–fiber optics.) These are the advantages of photonic quantum computers: hundreds of qubits integrated onto chips, room-temperature operation, long decoherence times, compatibility with telecom light sources and PICs, compatibility with silicon chip fabrication, universal gates using postselection, and more. Despite the head start of some of the other quantum computing systems, photonics looks like it will overtake the others within only a few years to become the dominant technology for the future of quantum computing. And part of that future is being helped along by a new kind of quantum algorithm that is perfectly suited to optics.

Fig. 3 Superconducting photon counting detector. From WebSite

A New Kind of Quantum Algorithm: Boson Sampling

In 2011, Scott Aaronson (then at MIT) published a landmark paper titled “The Computational Complexity of Linear Optics” with his post-doc, Anton Arkhipov [5].  The authors asked whether there could be an application of linear optics, not requiring the costly step of postselection, that was still useful while simultaneously demonstrating quantum computational advantage.  In other words, could one find a linear optical system working with photons that could solve problems intractable for a classical computer?  To their own amazement, they did!  The answer was something they called “boson sampling”.

To get an idea of what boson sampling is, and why it is very hard to do on a classical computer, think of the classic demonstration of the normal probability distribution found at almost every science museum you visit, illustrated in Fig. 4.  A large number of ping-pong balls are dropped one at a time through a forest of regularly-spaced posts, bouncing randomly this way and that until they are collected into bins at the bottom.  Bins near the center collect many balls, while bins farther to the side have fewer.  If there are many balls, then the stacked heights of the balls in the bins map out a Gaussian probability distribution.  The path of a single ping-pong ball represents a series of “decisions” as it hits each post and goes left or right, and the number of permutations of all the possible decisions among all the ping-pong balls grows exponentially—a hard problem to tackle on a classical computer.
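A few lines of Python reproduce the museum demonstration: every ball makes a string of independent left/right decisions at the posts, and the bin counts pile up into the familiar bell curve. (This is the easy, classical version of the problem; the counting only becomes hard when the "balls" obey quantum statistics, as described next.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_balls, n_rows = 10_000, 12                 # 10,000 ping-pong balls through 12 rows of posts

# Each post is one left/right decision; the final bin index is just the number of rightward bounces.
bins = rng.integers(0, 2, size=(n_balls, n_rows)).sum(axis=1)
counts = np.bincount(bins, minlength=n_rows + 1)

for k, c in enumerate(counts):               # crude text histogram of the Gaussian-shaped pile
    print(f"bin {k:2d}: {'#' * (c // 40)}")
```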

Fig. 4 Ping-pong ball normal distribution. Watch the YouTube video.


In the paper, Aaronson and Arkhipov considered a quantum analog to the ping-pong problem in which the ping-pong balls are replaced by photons, and the posts are replaced by beam splitters.  In its simplest possible implementation, there could be two photon channels incident on a single beam splitter.  The well-known result in this case is the “HOM dip” [6], which is a consequence of the boson statistics of the photon.  Now scale this system up to many channels and a cascade of beam splitters, and one has an N-channel multi-photon HOM cascade.  The output of this photonic “circuit” is a sampling of the vast number of permutations allowed by Bose statistics—boson sampling.
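The classical hardness can be stated in one line: the probability of a given arrangement of photons at the output is proportional to the squared modulus of the permanent of a submatrix of the interferometer's unitary, and permanents, unlike determinants, have no known efficient algorithm. The brute-force sketch below (feasible only for a handful of photons) recovers the HOM dip as the simplest special case: the coincidence probability for two photons on a 50:50 beam splitter is exactly zero.

```python
import numpy as np
from itertools import permutations

def permanent(M):
    """Brute-force permanent: like a determinant but with no minus signs. Exponential cost."""
    n = M.shape[0]
    return sum(np.prod([M[i, p[i]] for i in range(n)]) for p in permutations(range(n)))

# 50:50 beam splitter with one photon entering each input port (the HOM experiment).
U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)

p_coincidence = abs(permanent(U)) ** 2             # one photon in each output port
p_bunched = abs(permanent(U[:, [0, 0]])) ** 2 / 2  # both photons in output port 0 (divide by 2!)

print(p_coincidence)   # 0.0: the photons always leave together
print(p_bunched)       # 0.5: half the time both exit port 0, half the time both exit port 1
```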

To make the problem more interesting, Aaronson and Arkhipov allowed the photons to be launched from any channel at the top (as opposed to dropping all the ping-pong balls at the same spot), and they allowed each beam splitter to have adjustable phases (paths and phases are the key elements of an interferometer).  By adjusting the locations of the photon channels and the phases of the beam splitters, it would be possible to “program” this boson cascade to mimic interesting quantum systems or even to solve specific problems, although they were not thinking that far ahead.  The main point of the paper was the proposal that implementing boson sampling in a photonic circuit uses resources that scale linearly in the number of photon channels, while the problems that can be solved grow exponentially—a clear quantum computational advantage [4].

On the other hand, it turned out that boson sampling is not universal—one cannot construct a universal quantum computer out of boson sampling.  The first proposal was a specialty algorithm whose main function was to demonstrate quantum computational advantage rather than do something specifically useful—just like Deutsch’s first algorithm.  But just like Deutsch’s algorithm, which led ultimately to Shor’s very useful prime factoring algorithm, boson sampling turned out to be the start of a new wave of quantum applications.

Shortly after the publication of Aaronson’s and Arkhipov’s paper in 2011, there was a flurry of experimental papers demonstrating boson sampling in the laboratory [7, 8].  And it was discovered that boson sampling could solve important and useful problems, such as the energy levels of quantum systems, network similarity, and quantum random-walk problems. Therefore, even though boson sampling is not strictly universal, it solves a broad class of problems. It can be viewed more like a specialty chip than a universal computer, much as the now-ubiquitous GPUs are specialty chips in virtually every desktop and laptop computer today. And the room-temperature operation significantly reduces cost, so you don’t need a whole government agency to afford one. Just as CPU costs followed Moore’s Law to the point where a Raspberry Pi computer costs $40 today, the photonic chips may get onto their own Moore’s Law that will reduce costs over the next several decades until they are common (but still specialty and probably not cheap) computers in academia and industry. A first step along that path was a recently-demonstrated general programmable room-temperature photonic quantum computer.

Fig. 5 A classical Galton board on the left, and a photonic boson-sampling analog on the right. From the Walmsley (Oxford) WebSite.

A Programmable Photonic Quantum Computer: Xanadu’s X8 Chip

I don’t usually talk about specific companies, but the new photonic quantum computer chip from Xanadu, based in Toronto, Canada, feels to me like the start of something big. In the March 4, 2021 issue of Nature, researchers at the company published the experimental results of their X8 photonic chip [3]. The chip uses boson sampling of strongly non-classical light. This was the first generally programmable photonic quantum computing chip, programmed using a quantum software framework they developed called Strawberry Fields. By simply changing the quantum code (using a simple conventional computer interface), they switched the computer output among three different quantum applications: transitions among states (spectra of molecular states), quantum docking, and similarity between graphs that represent two different molecules. These are radically different physics and math problems, yet the single chip can be programmed on the fly to solve each one.

The chip is constructed of nitride waveguides on silicon, shown in Fig. 6. The input lasers drive ring resonators that produce squeezed states through four-wave mixing. The key to the reprogrammability of the chip is the set of phase modulators that use simple thermal changes on the waveguides. These phase modulators are changed in response to commands from the software to reconfigure the application. Although they switch slowly, once they are set to their new configuration, the computations take place “at the speed of light”. The photonic chip is at room temperature, but the outputs of the four channels are sent by fiber optics to a cooled unit containing the superconducting nanowire photon counters.
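To give a flavor of what programming a photonic chip looks like, here is a minimal sketch written against the public Strawberry Fields documentation. The gate choices and parameters are my own illustrative assumptions, not the programs used in Ref. [3], and it runs on the local Gaussian simulator rather than the actual X8 hardware.

```python
# Illustrative sketch only: gates and parameters are assumptions, not the code from Ref. [3].
import strawberryfields as sf
from strawberryfields import ops

prog = sf.Program(4)                        # four optical modes
with prog.context as q:
    ops.S2gate(1.0) | (q[0], q[1])          # two-mode squeezed light, standing in for the on-chip squeezers
    ops.S2gate(1.0) | (q[2], q[3])
    ops.BSgate(0.5, 0.0) | (q[0], q[2])     # programmable interferometer: beam splitters...
    ops.BSgate(0.5, 0.0) | (q[1], q[3])
    ops.Rgate(0.8) | q[0]                   # ...and phase shifts (thermo-optic on the real chip)
    ops.MeasureFock() | q                   # photon-number detection on every mode

eng = sf.Engine("gaussian")                 # local simulator; hardware jobs go through a remote engine
result = eng.run(prog)                      # run repeatedly to build up sampling statistics
print(result.samples)                       # photon counts in the four modes, e.g. [[0 0 1 1]]
```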

Fig. 6 The Xanadu X8 photonic quantum computing chip. From Ref.
Fig. 7 To see the chip in operation, see the YouTube video.

Admittedly, the four channels of the X8 chip are not large enough to solve the kinds of problems that would require a quantum computer, but the company has plans to scale the chip up to 100 channels. One of the challenges is to reduce the amount of photon loss in a multiplexed chip, but standard silicon fabrication approaches are expected to reduce loss in the next generation chips by an order of magnitude.

Additional companies are also in the process of entering the photonic quantum computing business, such as PsiQuantum, which recently closed a $450M funding round to produce photonic quantum chips with a million qubits. The company is led by Jeremy O’Brien from Bristol University who has been a leader in photonic quantum computing for over a decade.

Stay tuned!

Further Reading

• David D. Nolte, “Interference: Taking Nature’s Measure-The History and Physics of Optical Interferometry” (Oxford University Press, to be published in 2023)

• J. L. O’Brien, A. Furusawa, and J. Vuckovic, “Photonic quantum technologies,” Nature Photonics, vol. 3, no. 12, pp. 687-695, Dec (2009)

• T. C. Ralph and G. J. Pryde, “Optical Quantum Computation,” in Progress in Optics, vol. 54, E. Wolf, Ed., (2010), pp. 209-269.

• S. Barz, “Quantum computing with photons: introduction to the circuit model, the one-way quantum computer, and the fundamental principles of photonic experiments,” Journal of Physics B: Atomic, Molecular and Optical Physics, vol. 48, no. 8, Art. no. 083001, Apr (2015)

References

[1] E. Knill, R. Laflamme, and G. J. Milburn, “A scheme for efficient quantum computation with linear optics,” Nature, vol. 409, no. 6816, pp. 46-52, Jan (2001)

[2] J. Carolan, J. L. O’Brien et al, “Universal linear optics,” Science, vol. 349, no. 6249, pp. 711-716, Aug (2015)

[3] J. M. Arrazola, et al, “Quantum circuits with many photons on a programmable nanophotonic chip,” Nature, vol. 591, no. 7848, pp. 54-+, Mar (2021)

[4] H.-S. Zhong, J.-W. Pan, et al., “Quantum computational advantage using photons,” Science, vol. 370, no. 6523, p. 1460, (2020)

[5] S. Aaronson and A. Arkhipov, “The Computational Complexity of Linear Optics,” in Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), San Jose, CA, Jun 2011, pp. 333-342

[6] C. K. Hong, Z. Y. Ou, and L. Mandel, “Measurement of subpicosecond time intervals between 2 photons by interference,” Physical Review Letters, vol. 59, no. 18, pp. 2044-2046, Nov (1987)

[7] J. B. Spring, I. A. Walmsley et al, “Boson Sampling on a Photonic Chip,” Science, vol. 339, no. 6121, pp. 798-801, Feb (2013)

[8] M. A. Broome, A. Fedrizzi, S. Rahimi-Keshari, J. Dove, S. Aaronson, T. C. Ralph, and A. G. White, “Photonic Boson Sampling in a Tunable Circuit,” Science, vol. 339, no. 6121, pp. 794-798, Feb (2013)

Twenty Years at Light Speed: Optical Computing

In the epilog of my book Mind at Light Speed: A New Kind of Intelligence (Free Press, 2001), I speculated about a future computer in which sheets of light interact with others to form new meanings and logical cascades as light makes decisions in a form of all-optical intelligence.

Twenty years later, that optical computer seems vaguely quaint, not because new technology has passed it by, like looking at the naïve musings of Jules Verne from our modern vantage point, but because the optical computer seems almost as far away now as it did back in 2001.

At the turn of the Millennium we were seeing tremendous advances in data rates on fiber optics (see my previous Blog) as well as the development of new types of nonlinear optical devices and switches that served as rudimentary logic gates.  At that time, it was not unreasonable to believe that the pace of progress would remain undiminished, and that by 2020 we would have all-optical computers and signal processors in which the same optical data on the communication fibers would be involved in the logic that told the data what to do and where to go—all without the wasteful and slow conversion to electronics and back again into photons.

However, the rate of increase of the transmission bandwidth on fiber optic cables slowed not long after the publication of my book, and nonlinear optics today still needs high intensities to be efficient, which remains a challenge for significant (commercial) use of all-optical logic any time soon.

That said, it’s dangerous to ever say never, and research into all-optical computing and data processing is still going strong (see Fig. 1).  It’s not the dream that was wrong; it was the time scale, just like fiber-to-the-home.  Back in 2001, fiber-to-the-home was viewed as a pipe-dream by serious technology scouts.  It took twenty years, but now that vision is coming true in urban settings.  Back in 2001, all-optical computing seemed about 20 years away, and now it still looks 20 years out.  Maybe this time the prediction is right.  Recent advances in all-optical processing give some hope for it.  Here are some of those advances.

Fig. 1 Number of papers published per year with the phrase “All-Optical” or “Photonic or Optical and Neur*” in the title, according to a Web of Science search. The term “all-optical” saturated around 2005. Publication on optical neural networks was low until about 2015 but is now experiencing a strong surge. The sociology of title choices, and how favorite buzz words shift over time, can obscure underlying causes and trends, but overall there is strong current interest in all-optical systems.

The “What” and “Why” of All-Optical Processing

One of the great dreams of photonics is the use of light beams to perform optical logic in optical processors just as electronic currents perform electronic logic in transistors and integrated circuits. 

Our information age, starting with the telegraph in the mid-1800’s, has been built upon electronics because the charge of the electron makes it a natural decision maker.  Two charges attract or repel by Coulomb’s Law, exerting forces upon each other.  Although we don’t think of currents acting in quite that way, the foundation of electronic logic remains electrical interactions. 

But with these interactions also come constraints—constraining currents to be contained within wires, waiting for charging times that slow down decisions, managing electrical resistance and dissipation that generate heat (computer processing farms in some places today need to be cooled by glacier meltwater).  Electronic computing is hardly a green technology.

Therefore, the advantages of optical logic are clear: broadcasting information without the need for expensive copper wires, little dissipation or heat, low latency (signals propagate at the speed of light).  Furthermore, information on the internet is already in the optical domain, so why not keep it in the optical domain and have optical information packets making the decisions?  All the routing and switching decisions about where optical information packets should go could be done by the optical packets themselves inside optical computers.

But there is a problem.  Photons in free space don’t interact—they pass through each other unaffected.  This is the opposite of what is needed for logic and decision making.  The challenge of optical logic is then to find a way to get photons to interact.

Think of the scene in Star Wars: A New Hope when Obi-Wan Kenobi and Darth Vader battle to the death in a light saber duel—beams of light crashing against each other and repelling each other with equal and opposite forces.  This is the photonic engineer’s dream!  Light controlling light.  But this cannot happen in free space. On the other hand, light beams can control other light beams inside nonlinear crystals, where one light beam changes the optical properties of the crystal, hence changing how another light beam travels through it.  These are nonlinear optical crystals.

Nonlinear Optics

Virtually all optical control designs, for any kind of optical logic or switch, require one light beam to affect the properties of another, and that requires an intervening medium that has nonlinear optical properties.  The physics of nonlinear optics is actually simple: one light beam changes the electronic structure of a material which affects the propagation of another (or even the same) beam.  The key parameter is the nonlinear coefficient that determines how intense the control beam needs to be to produce a significant modulation of the other beam.  This is where the challenge is.  Most materials have very small nonlinear coefficients, and the intensity of the control beam usually must be very high. 
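A back-of-the-envelope example shows the scale of the problem, using the standard Kerr relation n = n0 + n2·I and a textbook value of n2 for fused silica (an assumed value, for illustration only): accumulating even one radian of nonlinear phase takes either enormous intensity or a very long interaction length.

```python
import numpy as np

n2 = 2.6e-20            # m^2/W, typical textbook Kerr coefficient of fused silica (assumed value)
wavelength = 1.55e-6    # m, telecom band

def nonlinear_phase(power_W, core_area_m2, length_m):
    """Self-phase-modulation phase shift: phi = 2*pi*n2*I*L / lambda."""
    intensity = power_W / core_area_m2
    return 2 * np.pi * n2 * intensity * length_m / wavelength

# 100 mW confined to a standard fiber core (~80 square microns):
print(nonlinear_phase(0.1, 80e-12, 1_000))   # ~0.13 rad after a full kilometer of fiber
print(nonlinear_phase(0.1, 80e-12, 0.01))    # ~1.3e-6 rad in a centimeter-long crystal path
```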

Fig. 2 Nonlinear optics: light controlling light. Light does not interact in free space, but inside a nonlinear crystal the polarizability can create an effective interaction that can be surprisingly strong. Two-wave mixing (exchange of energy between laser beams) is shown in the upper pane. Optical associative holographic memory (four-wave mixing) is an example of light controlling light. The hologram is written when exposed to both “Light” and “Guang/Hikari”. When the recorded hologram is later presented with only “Guang/Hikari”, it immediately translates it to “Light”, and vice versa.

Therefore, to create low-power all-optical logic gates and switches there are four main design principles: 1) increase the nonlinear susceptibility by engineering the material, 2) increase the interaction length between the two beams, 3) concentrate light into small volumes, and 4) introduce feedback to boost the internal light intensities.  Let’s take these points one at a time.

Nonlinear susceptibility: The key to getting stronger interaction of light with light is the ease with which a control beam of light can distort the crystal so that the optical conditions change for a signal beam. This is called the nonlinear susceptibility. When working with “conventional” crystals like semiconductors (e.g., CdZnSe) or ferroelectric oxides (e.g., LiNbO3), there is only so much engineering that can be done to tweak the nonlinear susceptibilities. However, artificially engineered materials can offer significant increases in nonlinear susceptibility; these include plasmonic materials, metamaterials, organic semiconductors, and photonic crystals. An increasingly important class of nonlinear optical devices is the semiconductor optical amplifier (SOA).

Interaction length: The interaction strength between two light waves is a product of the nonlinear polarization and the length over which the waves interact. Interaction lengths can be made relatively long in waveguides but can be made orders of magnitude longer in fibers. Therefore, nonlinear effects in fiber optics are a promising avenue for achieving optical logic.

Intensity concentration:  Nonlinear polarization is the product of the nonlinear susceptibility and the field amplitude of the waves. Therefore, focusing light down to small cross sections produces high intensity, as in the core of an optical fiber, again showing the advantages of fibers for optical logic implementations.

Feedback: Feedback, as in a standing-wave cavity, increases the intensity as well as the effective interaction length by folding the light wave continually back on itself. Both of these effects boost the nonlinear interaction, but then there is an additional benefit: interferometry. Cavities, like a Fabry-Perot, are interferometers in which a slight change in the round-trip phase can produce large changes in output light intensity. This is an optical analog to a transistor, in which a small control current acts as a gate for an exponential signal current. The feedback in the cavity of a semiconductor optical amplifier (SOA), with its high internal intensities, long effective interaction lengths, and an active medium with strong nonlinearity, makes these elements attractive for optical logic gates. Similarly, integrated ring resonators have the advantage of interferometric control for light switching. Many current optical switches and logic gates are based on SOAs and integrated ring resonators.
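The interferometric benefit of feedback can be seen in the textbook Airy transmission of a Fabry-Perot cavity: near resonance, a control-induced phase shift of a few hundredths of a radian swings the transmission over most of its full range, which is the transistor-like behavior described above. A generic sketch:

```python
import numpy as np

def fabry_perot_transmission(round_trip_phase, R=0.95):
    """Airy function for a lossless cavity with mirror reflectivity R."""
    F = 4 * R / (1 - R) ** 2                 # coefficient of finesse
    return 1.0 / (1.0 + F * np.sin(round_trip_phase / 2) ** 2)

# A small nonlinear phase shift near resonance produces a large change in output intensity.
for dphi in (0.0, 0.01, 0.05, 0.2):
    print(f"detuning {dphi:4.2f} rad -> transmission {fabry_perot_transmission(dphi):.3f}")
```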

All-Optical Regeneration

The vision of the all-optical internet, where the logic operations that direct information to different locations are all performed by optical logic without ever converting into the electrical domain, is facing a barrier that is as challenging to overcome today as it was back in 2001: all-optical regeneration. All-optical regeneration has been and remains the Achilles’ heel of the all-optical internet.

Signal regeneration is currently performed through OEO conversion: Optical-to-Electronic-to-Optical. In OEO conversion, a distorted signal (degraded by attenuation, dispersion, and noise as it travels down the fiber) is received by a photodetector, interpreted as ones and zeros, and used to drive laser light sources that launch fresh optical pulses down the next stretch of fiber. The new pulses are virtually perfect, but they again degrade as they travel, until they are regenerated, and so on. The added advantage of the electrical layer is that the electronic signals can be used to drive conventional electronic logic for switching.

In all-optical regeneration, on the other hand, the optical pulses need to be reamplified, reshaped and retimed––known as 3R regeneration––all by sending the signal pulses through nonlinear amplifiers and mixers, which may include short stretches of highly nonlinear fiber (HNLF) or semiconductor optical amplifiers (SOAs). There have been demonstrations of 2R all-optical regeneration (reamplifying and reshaping but not retiming) at lower data rates, but getting all 3Rs at the high data rates (40 Gbit/sec) of next-generation telecom systems remains elusive.
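A toy numerical model illustrates what the first two Rs do: an attenuated, noisy bit stream pushed through an idealized S-shaped power transfer function comes out with its levels restored and its noise suppressed. This is only a cartoon of 2R regeneration, not a model of any particular SOA or HNLF device.

```python
import numpy as np

rng = np.random.default_rng(2)

bits = rng.integers(0, 2, size=16)                       # the intended data
received = 0.4 * bits + rng.normal(0, 0.06, size=16)     # attenuated and noisy after a fiber span

def regenerate_2R(x, threshold=0.2, steepness=40):
    """Idealized S-shaped transfer function: reamplifies and reshapes (no retiming)."""
    return 1.0 / (1.0 + np.exp(-steepness * (x - threshold)))

print(np.round(received, 2))                   # smeared levels around 0.0 and 0.4
print(np.round(regenerate_2R(received), 2))    # pushed back toward clean 0s and 1s
```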

Nonetheless, there is an active academic literature that is pushing the envelope on optical logic devices and regenerators [1]. Many of the systems focus on SOAs, HNLFs and interferometers. Numerical modeling of these kinds of devices is currently ahead of bench-top demonstrations, primarily because of the difficulty of fabrication and limited device lifetimes. But the numerical models point to performance that would be competitive with OEO. If this OOO conversion (Optical-to-Optical-to-Optical) is scalable (can handle increasing bit rates and increasing numbers of channels), then the current data crunch that is facing the telecom trunk lines (see my previous Blog) may be a strong driver to implement such all-optical solutions.

It is important to keep in mind that legacy technology is not static but also continues to improve. As all-optical logic and switching and regeneration make progress, OEO conversion gets incrementally faster, creating a moving target. Therefore, we will need to wait another 20 years to see whether OEO is overtaken and replaced by all-optical.

Fig. 3 Optical-Electronic-Optical regeneration and switching compared to all-optical control. The optical control is performed using SOA’s, interferometers and nonlinear fibers.

Photonic Neural Networks

The most exciting area of optical logic today is in analog optical computing––specifically optical neural networks and photonic neuromorphic computing [2, 3]. A neural network is a highly-connected network of nodes and links in which information is distributed across the network in much the same way that information is distributed and processed in the brain. Neural networks can take several forms––from digital neural networks that are implemented with software on conventional digital computers, to analog neural networks implemented in specialized hardware, sometimes also called neuromorphic computing systems.

Optics and photonics are well suited to the analog form of neural network because of the superior ability of light to form free-space interconnects (links) among a high number of optical modes (nodes). This essential advantage of light for photonic neural networks was first demonstrated in the mid-1980’s using recurrent neural network architectures implemented in photorefractive (nonlinear optical) crystals (see Fig. 1 for a publication timeline). But this initial period of proof-of-principle was followed by a lag of about 2 decades due to a mismatch between driver applications (like high-speed logic on an all-optical internet) and the ability to configure the highly complex interconnects needed to perform the complex computations.

Fig. 4 Optical vector-matrix multiplication. An LED array is the input vector, focused by a lens onto the spatial light modulator that is the 2D matrix. The transmitted light is refocused by a lens onto a photodiode array, which is the output vector. Free-space propagation and multiplication is a key advantage of optical implementations of computing.

The rapid rise of deep machine learning over the past 5 years has removed this bottleneck, and there has subsequently been a major increase in optical implementations of neural networks. In particular, it is now possible to use conventional deep machine learning to design the interconnects of analog optical neural networks for fixed tasks such as image recognition [4]. At first look, this seems like a non-starter, because one might ask why not use the conventional trained deep network to do the recognition itself rather than using it to create a special-purpose optical recognition system. The answer lies primarily in the metrics of latency (speed) and energy cost.

In neural computing, approximately 90% of the time and energy go into matrix multiplication operations. Deep learning algorithms driving conventional digital computers need to do the multiplications at the sequential clock rate of the computer using nested loops. Optics, on the other hand, is ideally suited to perform matrix multiplications in a fully parallel manner (see Fig. 4). In addition, a hardware implementation using optics operates literally at the speed of light. The latency is limited only by the time of flight through the optical system. If the optical train is 1 meter, then the time for the complete computation is only a few nanoseconds at almost no energy dissipation. Combining the natural parallelism of light with this speed has led to unprecedented computational rates. For instance, recent implementations of photonic neural networks have demonstrated over 10 trillion operations per second (TOPS) [5].
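Stripped of the optics, the operation in Fig. 4 is an ordinary matrix-vector product, which is also the kernel that dominates neural-network inference; the advantage of the optical version is that every multiply-accumulate happens during a single pass of the light rather than in nested loops. A minimal numpy sketch of the equivalent computation, with the operation count spelled out (layer sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

n_inputs, n_outputs = 256, 64
x = rng.random(n_inputs)                  # LED array brightnesses: the input vector
W = rng.random((n_outputs, n_inputs))     # spatial-light-modulator transmissions: the weight matrix

y = W @ x                                 # what the lens system delivers to the photodiode array

# A digital processor steps through these multiply-accumulates; free-space optics
# performs all of them in parallel during one transit of the light.
print(f"{n_outputs * n_inputs} multiply-accumulate operations in this single layer")
print(y.shape)                            # (64,)
```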

It is important to keep in mind that although many of these photonic neural networks are characterized as all-optical, they are generally not reconfigurable, meaning that they are not adaptive to changing or evolving training sets or changing input information. Most adaptive systems use OEO conversion with electronically-addressed spatial light modulators (SLM) that are driven by digital logic.

Farther afield are all-optical systems that are adaptive through the use of optically-addressed spatial light modulators or nonlinear materials. In fact, these types of adaptive all-optical neural networks were among the first demonstrated in the late 1980’s. More recently, advanced adaptive optical materials, as well as fiber delay lines for a type of recurrent neural network known as reservoir computing, have been used to implement faster and more efficient optical nonlinearities needed for adaptive updates of neural weights. But there are still years to go before light is adaptively controlling light entirely in the optical domain at the speeds and with the flexibility needed for real-world applications like photonic packet switching in telecom fiber-optic routers.

In stark contrast to the status of classical all-optical computing, photonic quantum computing is on the cusp of revolutionizing the field of quantum information science. The recent demonstration from the Canadian company Xanadu of a programmable photonic quantum computer that operates at room temperature may be the harbinger of what is to come in the third generation Machines of Light: Quantum Optical Computers, which is the topic of my next blog.

Further Reading

[1] V. Sasikala and K. Chitra, “All optical switching and associated technologies: a review,” Journal of Optics-India, vol. 47, no. 3, pp. 307-317, Sep (2018)

[2] C. Huang et al., “Prospects and applications of photonic neural networks,” Advances in Physics-X, vol. 7, no. 1, Jan (2022), Art no. 1981155

[3] G. Wetzstein, A. Ozcan, S. Gigan, S. H. Fan, D. Englund, M. Soljacic, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature, vol. 588, no. 7836, pp. 39-47, Dec (2020)

[4] X. Lin, Y. Rivenson, N. T. Yardimei, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science, vol. 361, no. 6406, pp. 1004-+, Sep (2018)

[5] X. Y. Xu, M. X. Tan, B. Corcoran, J. Y. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature, vol. 589, no. 7840, pp. 44-+, Jan (2021)

Twenty Years at Light Speed: Fiber Optics and the Future of the Photonic Internet

Twenty years ago this November, my book Mind at Light Speed: A New Kind of Intelligence was published by The Free Press (Simon & Schuster, 2001).  The book described the state of optical science at the turn of the Millennium through three generations of Machines of Light:  The Optoelectronic Generation of electronic control meshed with photonic communication; The All-Optical Generation of optical logic; and The Quantum Optical Generation of quantum communication and computing.

To mark the occasion of the publication, this Blog Post begins a three-part series that updates the state-of-the-art of optical technology, looking at the advances in optical science and technology over the past 20 years since the publication of Mind at Light Speed.  This first blog reviews fiber optics and the photonic internet.  The second blog reviews all-optical communication and computing.  The third and final blog reviews the current state of photonic quantum communication and computing.

The Wabash Yacht Club

During late 1999 and early 2000, while I was writing Mind at Light Speed, my wife Laura and I would often have lunch at the ironically-named Wabash Yacht Club.  Not only was it not a yacht club, but it was a dark and dingy college-town bar located in a drab ’70s-era plaza in West Lafayette, Indiana, far from any navigable body of water.  But it had a great garlic burger and we loved the atmosphere.

The Wabash River. No yachts. (https://www.riverlorian.com/wabash-river)

One of the TV monitors in the bar was always tuned to a station that covered stock news, and almost every day we would watch the NASDAQ rise 100 points just over lunch.  This was the time of the great dot-com stock-market bubble—one of the greatest speculative bubbles in the history of world economics.  In the second quarter of 2000, total US venture capital investments exceeded $30B as everyone chased the revolution in consumer market economics.

Fiber optics will remain the core technology of the internet for the foreseeable future.

Part of that dot-com bubble was a massive bubble in optical technology companies, because everyone knew that the dot-com era would ride on the back of fiber optics telecommunications.  Fiber optics at that time had already revolutionized transatlantic telecommunications, and there seemed to be no obstacle for it to do the same land-side with fiber optics to every home bringing every dot-com product to every house and every movie ever made.  What would make this possible was the tremendous information bandwidth that can be crammed into tiny glass fibers in the form of photon packets traveling at the speed of light.

Doing optics research at that time was a heady experience.  My research on real-time optical holography was only on the fringe of optical communications, but at the CLEO conference on lasers and electro-optics, I was invited by tiny optics companies to giant parties, like a fully-catered sunset cruise on a schooner sailing Baltimore’s inner harbor.  Venture capital scouts took me to dinner in San Francisco with an eye to scoop up whatever patents I could dream of.  And this was just the side show.  At the flagship fiber-optics conference, the Optical Fiber Communication Conference (OFC) of the OSA, things were even crazier.  One tiny company that made a simple optical switch went almost overnight from being worth a couple of million dollars to being bought out by Nortel (the giant Canadian telecommunications conglomerate of the day) for over 4 billion dollars.

The Telecom Bubble and Bust

On the other side from the small mom-and-pop optics companies were the giants like Corning (who made the glass for the glass fiber optics) and Nortel.  At the height of the telecom bubble in September 2000, Nortel had a capitalization of almost $400B Canadian dollars due to massive speculation about the markets around fiber-optic networks.

One of the central questions of the optics bubble of Y2K was what the new internet market would look like.  Back then, fiber was only beginning to connect to distribution nodes that branched off the main cross-country trunk lines.  Cable TV dominated the market with fixed programming where you had to watch whatever they transmitted whenever they transmitted it.  Google was only 2 years old, and YouTube didn’t even exist then—it was founded in 2005.  Everyone still shopped at malls, while Amazon had only gone public three years before.

There were fortune tellers who predicted that fiber-to-the-home would tap a vast market of online commerce where you could buy anything you wanted and have it delivered to your door.  They foretold of movies-on-demand, where anyone could stream any movie they wanted at any time.  They also foretold of phone calls and video chats that never went over the phone lines ruled by the telephone monopolies.  The bandwidth, the data rates, that these markets would drive were astronomical.  The only technology at that time that could support such high data rates was fiber optics.

At first, these fortune tellers drove an irrational exuberance.  But as the stocks inflated, there were doomsayers who pointed out that the costs at that time of bringing fiber into homes were prohibitive, and that the idea that people would be willing to pay for movies-on-demand was laughable.  The cost of the equipment and the installation just didn’t match what then seemed to be a sparse market demand.  Furthermore, the fiber technology in the year 2000 couldn’t even reach the kind of data rates that could support these dreams.

In March of 2000 the NASDAQ hit a high of 5000, and then the bottom fell out.

By November 2001 the NASDAQ had fallen to 1500.  One of the worst cases of the telecom bust was Nortel whose capitalization plummeted from $400B at its high to $5B Canadian by August 2002.  Other optics companies fared little better.

The main questions, as we stand now looking back from 20 years in the future, are: What in real life motivated the optics bubble of 2000?  And how far has optical technology come since then?  The surprising answer is that the promise of optics in 2000 was not wrong—the time scale was just off. 

Fiber to the Home

Today, fixed last-mile broadband service is an assumed part of life in metro areas in the US.  This broadband takes on three forms: legacy coaxial cable, 4G wireless soon to be upgraded to 5G, and fiber optics.  There are arguments pro and con for each of these technologies, especially moving forward 10 or 20 years or more, and a lot is at stake.  The global market revenue was $108 Billion in 2020 and is expected to reach $200 Billion in 2027, growing at over 9% from 2021 to 2027.


To sort through the pros and cons and pick the winning technology, several key performance parameters must be understood for each technology.  The two most important performance measures are bandwidth and latency.  Bandwidth is the data rate—how many bits per second you can get to the home.  Latency is a little more subtle.  It is the time it takes to complete a transmission.  This time includes the actual time for information to travel from a transmitter to a receiver, but that is rarely the major contributor.  Currently, almost all of the latency is caused by the logical operations needed to move the information onto and off of the home data links.

Coax (short for coaxial cable) is attractive because so much of the last-mile legacy hardware is based on the old cable services.  But coax cable has very limited bandwidth and high latency. As a broadband technology, it is slowly disappearing.

Wireless is attractive because the information is transmitted in the open air without any need for physical wires or fibers.  But high data rates require high frequencies.  For instance, 4G wireless operates at frequencies between 700 MHz and 2.6 GHz.  Current WiFi is 2.4 GHz or 5 GHz, next-generation 5G will add 26 GHz using millimeter-wave technology, and WiGig is even more extreme at 60 GHz.  While WiGig will deliver up to 10 Gbit/sec, as everyone with a wireless router in their home knows, the higher the frequency, the more it is blocked by walls or other obstacles.  Even 5 GHz is mostly attenuated by walls, and the attenuation gets worse as the frequency gets higher.  Testing of 5G networks has shown that cell towers need to be closely spaced to allow seamless coverage.  And the crazy high frequency of WiGig all but guarantees that it will only be usable for line-of-sight communication within a home or in an enterprise setting.

Fiber for the last mile, on the other hand, has multiple advantages.  Chief among these is that fiber is passive.  It is a light pipe that has ten thousand times more usable bandwidth than a coaxial cable.  For instance, lab tests have pushed up to 100 Tbit/sec over kilometers of fiber.  To access that bandwidth, the input and output hardware can be continually upgraded, while the installed fiber is there to handle pretty much any amount of increasing data rates for the next 10 or 20 years.  Fiber installed today is supporting 1 Gbit/sec data rates, and the existing protocol will work up to 10 Gbit/sec—data rates that can only be hoped for with WiFi.  Furthermore, optical communications on fiber have latencies of around 1.5 msec over 20 kilometers compared with 4G LTE that has a latency of 8 msec over 1 mile.  The much lower latency is key to support activities that cannot stand much delay, such as voice over IP, video chat, remote controlled robots, and virtual reality (i.e., gaming).  On top of all of that, the internet technology up to the last mile is already almost all optical.  So fiber just extends the current architecture across the last mile.
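A quick calculation makes the latency point concrete: the light-in-fiber propagation delay itself is small, so most of a link's latency budget comes from the electronics and protocol handling at the ends, as noted above. (The group index of 1.47 for silica fiber is an assumed textbook value.)

```python
c = 299_792_458        # m/s, speed of light in vacuum
n_group = 1.47         # approximate group index of silica fiber (assumed textbook value)

distance_m = 20_000    # the 20 km span quoted above
delay_ms = distance_m * n_group / c * 1e3
print(f"pure propagation over 20 km of fiber: {delay_ms:.3f} ms")   # ~0.1 ms of the ~1.5 ms total
```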

Therefore, fixed fiber last-mile broadband service is a technology winner.  Though the costs can be higher than for WiFi or coax in the short run for installation, the long-run costs are lower when amortized over the lifetime of the installed fiber which can exceed 25 years.

It is becoming routine to have fiber-to-the-curb (FTTC) where a connection box converts photons in fibers into electrons on copper to take the information into the home.  But a market also exists in urban settings for fiber-to-the-home (FTTH) where the fiber goes directly into the house to a receiver and only then would the information be converted from photons to electrons and electronics.

Shortly after Mind at Light Speed was published in 2001, I was called up by a reporter for the Seattle Times who wanted to know my thoughts about FTTH.  When I extolled its virtue, he nearly hung up on me.  He was in the middle of debunking the telecom bubble and his premise was that FTTH was a fraud.  In 2001 he might have been right.  But in 2021, FTTH is here, it is expanding, and it will continue to do so for at least another quarter century.  Fiber to the home will become the legacy that some future disruptive technology will need to displace.

Fig. 1 Optical data rates on optical links, trunk lines and submarine cables over the past 30 years and projecting into the future. Redrawn from Refs. [1, 2]

Trunk-Line Fiber Optics

Despite the rosy picture for Fiber to the Home, a storm is brewing for the optical trunk lines.  The total traffic on the internet topped a billion Terabytes in 2019 and is growing fast, doubling about every 2 years on an exponential growth curve.  Twenty years of doubling every 2 years is another factor of a thousand, meaning a thousand times more traffic in 2040 than today.  Therefore, the technology companies that manage and supply the internet worry about a capacity crunch that is fast approaching, when there will be more demand than the internet can supply.

Over the past 20 years, the data rates on the fiber trunk lines—the major communication links that span the United States—matched demand by packing more bits in more ways into the fibers.  Up to 2009, increased data rates were achieved using dispersion-managed wavelength-division multiplexing (WDM) which means that they kept adding more lasers of slightly different colors to send the optical bits down the fiber.  For instance, in 2009 the commercial standard was 80 colors each running at 40 Gbit/sec for a total of 3.2 Tbit/sec down a single fiber. 

Since 2009, increased bandwidth has been achieved through coherent WDM, where not only the amplitude of light but also the phase of the light is used to encode bits of information using interferometry.  We are still in the coherent WDM era as improved signal processing is helping to fill the potential coherent bandwidth of a fiber.  Commercial protocols using phase-shift keying, quadrature phase-shift keying, and 16-quadrature amplitude modulation currently support 50 Gbit/sec, 100 Gbit/sec and 200 Gbit/sec, respectively.  But the capacity remaining is shrinking, and several years from now, a new era will need to begin in order to keep up with demand.  But if fibers are already using time, color, polarization and phase to carry information, what is left? 

The answer is space!

Coming soon will be commercial fiber trunk lines that use space-division multiplexing (SDM).  The simplest form is already happening now as fiber bundles are replacing single-mode fibers.  If you double the number of fibers in a cable, then you double the data rate of the cable.  But the problem with this simple approach is the scaling.  If you double just 10 times, then you need 1024 fibers in a single cable—each fiber needing its own hardware to launch the data and retrieve it at the other end.  This is linear scaling, which is bad scaling for commercial endeavors. 

Fig. 2 Fiber structures for space-division multiplexing (SDM). Fiber bundles are cables of individual single-mode fibers. Multi-element fibers (MEF) are single-mode fibers formed together inside the coating. Multi-core fibers (MCF) have multiple cores within the cladding. Few-mode fibers (FMF) are multi-mode fibers with small mode numbers. Coupled core (CC) fibers are multi-core fibers in which the cores are close enough that the light waves are coupled into coupled spatial modes. Redrawn from Ref. [3]

Therefore, alternatives for tapping into SDM are being explored in lab demonstrations now that have sublinear scaling (costs don’t rise as fast as improved capacity).  These include multi-element fibers where multiple fiber optical elements are manufactured as a group rather than individually and then combined into a cable.  There are also multi-core fibers, where multiple fibers share the same cladding.  These approaches provide multiple fibers for multiple channels without a proportional rise in cost.

More exciting are approaches that use few-mode-fibers (FMF) to support multiple spatial modes traveling simultaneously down the same fiber.  In the same vein are coupled-core fibers which is a middle ground between multi-core fibers and few-mode fibers in that individual cores can interact within the cladding to support coupled spatial modes that can encode separate spatial channels.  Finally, combinations of approaches can use multiple formats.  For instance, a recent experiment combined FMF and MCF that used 19 cores each supporting 6 spatial modes for a total of 114 spatial channels.

However, space-division multiplexing has been under development for several years now, yet it has not fully moved into commercial systems. This may be a sign that the doubling rate of bandwidth may be starting to slow down, just as Moore’s Law slowed down for electronic chips.  But there were doomsayers foretelling the end of Moore’s Law for decades before it actually slowed down, because new ideas cannot be predicted. But even if the full capacity of fiber is being approached, there is certainly nothing that will replace fiber with any better bandwidth.  So fiber optics will remain the core technology of the internet for the foreseeable future. 

But what of the other generations of Machines of Light: the all-optical and the quantum-optical generations?  How have optics and photonics fared in those fields?  Stay tuned for my next blogs to find out.

Bibliography

[1] P. J. Winzer, D. T. Neilson, and A. R. Chraplyvy, “Fiber-optic transmission and networking: the previous 20 and the next 20 years,” Optics Express, vol. 26, no. 18, pp. 24190-24239, Sep (2018) [Link]

[2] W. Shi, Y. Tian, and A. Gervais, “Scaling capacity of fiber-optic transmission systems via silicon photonics,” Nanophotonics, vol. 9, no. 16, pp. 4629-4663, Nov (2020)

[3] E. Agrell, M. Karlsson, A. R. Chraplyvy, D. J. Richardson, P. M. Krummrich, P. Winzer, K. Roberts, J. K. Fischer, S. J. Savory, B. J. Eggleton, M. Secondini, F. R. Kschischang, A. Lord, J. Prat, I. Tomkos, J. E. Bowers, S. Srinivasan, M. Brandt-Pearce, and N. Gisin, “Roadmap of optical communications,” Journal of Optics, vol. 18, no. 6, p. 063002, 2016/05/04 (2016) [Link]

The Secret Life of Snow: Laser Speckle

If you have ever seen euphemistically-named “snow”—the black and white dancing pixels on television screens in the old days of cathode-ray tubes—you may think it is nothing but noise.  But the surprising thing about noise is that it is a good place to hide information. 

Shine a laser pointer on any rough surface and look at the scattered light on a distant wall, and you will see the same kind of pattern of light and dark, known as laser speckle.  If you move your head or move the pointer, then the speckle shimmers—just like the snow on the old TVs.  This laser speckle—this snow—is providing fundamentally new ways to extract information hidden inside three-dimensional translucent objects—objects like biological tissue, priceless paintings, or silicon chips.

Snow Crash

The science fiction novel Snow Crash, published in 1992 by Neal Stephenson, is famous for popularizing virtual reality and the role of avatars.  The central mystery of the novel is the mind-destroying mental crash that is induced by Snow—white noise in the metaverse.  The protagonist hero of the story—a hacker with an avatar improbably named Hiro Protagonist—must find the source of snow and thwart the nefarious plot behind it.

Fig. 1 Book cover of Snow Crash

If Hiro’s snow in his VR headset is caused by laser speckle, then the seemingly random pattern is composed of amplitudes and phases that vary spatially and temporally.  There are many ways to make computer-generated versions of speckle.  One of the simplest is simply to add together many sinusoidal functions with varying orientations and periodicities.  This is a “Fourier” approach to speckle, which views it as a random superposition of two-dimensional spatial frequencies.  An example is shown in Fig. 2 for one such sinusoid and for the superposition of N = 20 of them, which generates the speckle pattern on the right.  There is still residual periodicity in the speckle for N = 20, but as N increases, the speckle pattern becomes effectively random, like noise. 

If the sinusoids being added together link their periodicity to their amplitude through some functional relationship, then the final speckle can be analyzed using a 2D Fourier transform to find its spatial-frequency spectrum.  The functional form of this spectrum can tell a lot about the underlying processes of speckle formation.  This is part of the information hidden inside snow.

Fig. 2 Sinusoidal addition to generate random speckle.  a) One example of a spatial periodicity.  b) The superposition of 20 random sinusoids.

(To watch animations of the figures in real time, See YouTube video.)
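For readers who want to experiment with this Fourier construction, the following minimal Python sketch (not the code used to generate the figures) adds together N random two-dimensional sinusoids and then looks at the spatial-frequency content of the resulting pattern with a 2D FFT.

import numpy as np
import matplotlib.pyplot as plt

# Fourier-domain construction of speckle: a random superposition of 2D sinusoids
N = 20                        # number of sinusoids; increase N for fully developed randomness
L = 256                       # grid size in pixels
X, Y = np.meshgrid(np.arange(L), np.arange(L))

rng = np.random.default_rng(0)
field = np.zeros((L, L))
for _ in range(N):
    kx, ky = rng.uniform(-0.3, 0.3, size=2)    # random spatial frequency (rad/pixel)
    phase = rng.uniform(0, 2*np.pi)            # random phase
    field += np.cos(kx*X + ky*Y + phase)       # add one random sinusoid

intensity = field**2
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(intensity)))**2

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].imshow(intensity, cmap='gray'); ax[0].set_title('speckle-like pattern (N = 20)')
ax[1].imshow(np.log10(spectrum + 1), cmap='gray'); ax[1].set_title('log spatial-frequency spectrum')
plt.show()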

An alternative viewpoint for generating a laser speckle pattern thinks in terms of spatially localized patches that add together with random amplitudes and phases.  This is a space-domain view of speckle formation, in contrast to the Fourier-space view of the previous construction.  Sinusoids are “global,” extending spatially without bound.  The underlying spatially localized functions can be almost any local function.  Gaussians spring to mind, but so do Airy functions, because they are common point-spread functions that participate in the formation of images through lenses.  The example in Fig. 3a shows one such Airy function, and Fig. 3b shows speckle generated from N = 20 Airy functions with varying amplitudes, phases and locations.

Fig. 3  Generating speckle by random superposition of point spread functions (spatially-localized functions) of varying amplitude, phase, position and bandwidth.

These two examples are complementary ways of generating speckle, where the 2D Fourier-domain approach is conjugate to the 2D space-domain approach.
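A complementary sketch for the space-domain picture, again purely illustrative, sums spatially localized patches with random complex amplitudes and positions; Gaussians are used here in place of Airy functions only for simplicity.

import numpy as np
import matplotlib.pyplot as plt

# Space-domain construction of speckle: a random sum of localized complex patches
N = 20                        # number of patches
L = 256
X, Y = np.meshgrid(np.arange(L), np.arange(L))
w = 12                        # patch width in pixels

rng = np.random.default_rng(1)
field = np.zeros((L, L), dtype=complex)
for _ in range(N):
    x0, y0 = rng.uniform(0, L, size=2)                  # random position
    amp = rng.uniform(0.5, 1.0)                         # random amplitude
    phase = rng.uniform(0, 2*np.pi)                     # random phase
    field += amp*np.exp(1j*phase)*np.exp(-((X - x0)**2 + (Y - y0)**2)/(2*w**2))

plt.imshow(np.abs(field)**2, cmap='gray')
plt.title('speckle from 20 random localized patches')
plt.show()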

However, laser speckle is actually a 3D phenomenon, and the two-dimensional speckle patterns are just 2D cross sections intersecting a complex 3D pattern of light filaments.  To get a sense of how laser speckle is formed in a physical system, one can solve the propagation of a laser beam through a random optical medium.  In this way you can visualize the physical formation of the regions of brightness and darkness when the fragmented laser beam exits the random material. 

Fig. 4 Propagation of a coherent beam into a random optical medium. Speckle is intrinsically three dimensional while 2D speckle is the cross section of the light filaments.

Coherent Patch

For a quantitative understanding of 2D laser speckle formed by an optical system, the central question is: how big are the regions of brightness and darkness?  This is a question of spatial coherence, and one way to define spatial coherence is through the coherence area at the observation plane

where A is the source emitting area, z is the distance to the observation plane, and Ωs is the solid angle subtended by the source emitting area as seen from the observation point. This expression assumes that the angular spread of the light scattered from the illumination area is very broad. Larger distances and smaller emitting areas (pinholes in an optical diffuser or focused laser spots on a rough surface) produce larger coherence areas in the speckle pattern. For a Gaussian intensity distribution at the emission plane, the coherence area is

for beam waist w0 at the emission plane.  To put some numbers to these parameters and give an intuitive sense of the size of the speckle spots, assume a wavelength of 1 micron, a focused beam waist of 0.1 mm, and a viewing distance of 1 meter.  This gives coherence patches with a radius of about 2 millimeters.  Examples of laser speckle are shown in Fig. 5 for a variety of beam-waist values w0.

Fig. 5 Speckle intensities for Gaussian illumination of a random phase screen for changing illumination radius w0 = 64, 32, 16 and 8 microns, for f = 1 cm and λ = 500 nm with a field-of-view of 5 mm. (Reproduced from Ref. [1])
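Because the expressions above appeared as images in the original post, the little estimate below uses a common textbook form of the Gaussian result, Ac ≈ λ²z²/(πw0²); the exact numerical prefactor depends on how the coherence area is defined, so the output should be read only as an order-of-magnitude check of the millimeter-scale patch size quoted in the text.

import numpy as np

# Order-of-magnitude estimate of the speckle (coherence) patch size.
# Assumed form: A_c ~ lambda^2 * z^2 / (pi * w0^2) for a Gaussian emission spot;
# the exact prefactor depends on the definition of the coherence area.
wavelength = 1e-6      # 1 micron
w0 = 1e-4              # 0.1 mm beam waist on the rough surface
z = 1.0                # 1 meter to the observation plane

A_c = wavelength**2 * z**2 / (np.pi * w0**2)     # coherence area
r_c = np.sqrt(A_c / np.pi)                       # equivalent patch radius
print(f"coherence area ~ {A_c*1e6:.1f} mm^2, patch radius ~ {r_c*1e3:.1f} mm")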

Speckle Holograms

Associated with any intensity modulation there must be a phase modulation, through the Kramers-Kronig relations [2].  Phase cannot be detected directly in the speckle intensity pattern, but it can be measured by using interferometry.  One of the easiest interferometric techniques is holography, in which a coherent plane wave intersects, at a small angle, a speckle pattern generated from the same laser source.  An example of a speckle hologram and its associated phase is shown in Fig. 6.  The fringes of the hologram are formed when a plane reference wave interferes with the speckle field.  The fringes are not parallel, because of the varying phase of the speckle field, but the average spatial frequency is still recognizable in Fig. 6a.  The associated phase map is shown in Fig. 6b.

Fig. 6 Speckle hologram and speckle phase.  a) A coherent plane-wave reference added to fully-developed speckle (unity contrast) produces a speckle hologram.  b) The phase of the speckle varies through 2π.
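To see how these fringes arise, the illustrative sketch below builds a fully developed complex speckle field (by low-pass filtering a random phasor field, a standard numerical trick) and interferes it with a tilted plane-wave reference; it is not the code used for Fig. 6.

import numpy as np
import matplotlib.pyplot as plt

# Speckle hologram: interference of a tilted plane-wave reference with a speckle field
L = 256
X, Y = np.meshgrid(np.arange(L), np.arange(L))

# generate a complex speckle field by low-pass filtering a random phasor field
rng = np.random.default_rng(2)
random_phasors = np.exp(1j*rng.uniform(0, 2*np.pi, (L, L)))
kx = np.fft.fftfreq(L)[:, None]; ky = np.fft.fftfreq(L)[None, :]
lowpass = (np.sqrt(kx**2 + ky**2) < 0.05)                     # keep only low spatial frequencies
speckle = np.fft.ifft2(np.fft.fft2(random_phasors)*lowpass)
speckle /= np.sqrt(np.mean(np.abs(speckle)**2))               # normalize to unit mean intensity

reference = np.exp(1j*2*np.pi*0.15*X)                          # tilted plane-wave reference
hologram = np.abs(speckle + reference)**2                      # recorded intensity fringes

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].imshow(hologram, cmap='gray');              ax[0].set_title('speckle hologram')
ax[1].imshow(np.angle(speckle), cmap='twilight'); ax[1].set_title('speckle phase')
plt.show()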

Optical Vortex Physics

In the speckle intensity field there are locations where the intensity vanishes and the phase becomes undefined.  In the neighborhood of such a singular point the phase wraps around it through a full 2π range.  Because of this wrapping phase, the singular point is called an optical vortex [3].  Vortices always come in pairs with opposite helicity (defined by the direction of the wrapping phase), with a line of neutral phase between them, as shown in Fig. 7.  The helicity defines the topological charge of the vortex, and vortices can have topological charges larger than ±1 if the phase wraps multiple times.  In dynamic speckle these vortices are also dynamic and move with speeds related to the underlying dynamics of the scattering medium [4].  Vortices of opposite helicity can annihilate, and they can be created in pairs.  Studies of singular optics have merged with structured illumination [5] to create an active field of topological optics with applications in biological microscopy as well as materials science.

Fig. 7 Optical vortex patterns. a) Log intensity showing zeros in the intensity field.  The circles identify the intensity nulls, which are the optical vortices.  b) Associated phase with a 2π phase wrapping around each singularity.  c) Associated hologram showing dislocations in the fringes that occur at the vortices.
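A simple numerical way to find these vortices, using the same kind of computer-generated speckle field as above, is to sum the wrapped phase differences around each elementary 2×2 plaquette of pixels: a winding of ±2π marks a vortex of topological charge ±1.

import numpy as np

# Locate optical vortices in a complex speckle field by the phase winding
# around each 2x2 plaquette of pixels (a winding of +-2*pi indicates charge +-1).
def vortex_charges(field):
    phase = np.angle(field)
    def wrap(d):                                    # wrap a phase difference to (-pi, pi]
        return (d + np.pi) % (2*np.pi) - np.pi
    d1 = wrap(phase[:-1, 1:]  - phase[:-1, :-1])    # step right along the top edge
    d2 = wrap(phase[1:, 1:]   - phase[:-1, 1:])     # step down the right edge
    d3 = wrap(phase[1:, :-1]  - phase[1:, 1:])      # step left along the bottom edge
    d4 = wrap(phase[:-1, :-1] - phase[1:, :-1])     # step up the left edge
    winding = d1 + d2 + d3 + d4                     # ~ +-2*pi at a vortex, ~0 elsewhere
    return np.round(winding / (2*np.pi)).astype(int)

# example: a low-pass-filtered random-phasor speckle field
rng = np.random.default_rng(3)
L = 128
phasors = np.exp(1j*rng.uniform(0, 2*np.pi, (L, L)))
kx = np.fft.fftfreq(L)[:, None]; ky = np.fft.fftfreq(L)[None, :]
speckle = np.fft.ifft2(np.fft.fft2(phasors)*(np.sqrt(kx**2 + ky**2) < 0.05))

charges = vortex_charges(speckle)
print("vortices of charge +1:", np.sum(charges == 1))
print("vortices of charge -1:", np.sum(charges == -1))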

References

[1] D. D. Nolte, Optical Interferometry for Biology and Medicine. (Springer, 2012)

[2] A. Mecozzi, C. Antonelli, and M. Shtaif, “Kramers-Kronig coherent receiver,” Optica, vol. 3, no. 11, pp. 1220-1227, Nov (2016)

[3] M. R. Dennis, R. P. King, B. Jack, K. O’Holleran, and M. J. Padgett, “Isolated optical vortex knots,” Nature Physics, vol. 6, no. 2, pp. 118-121, Feb (2010)

[4] S. J. Kirkpatrick, K. Khaksari, D. Thomas, and D. D. Duncan, “Optical vortex behavior in dynamic speckle fields,” Journal of Biomedical Optics, vol. 17, no. 5, May (2012), Art no. 050504

[5] H. Rubinsztein-Dunlop, A. Forbes, M. V. Berry, M. R. Dennis, D. L. Andrews, M. Mansuripur, C. Denz, C. Alpmann, P. Banzer, T. Bauer, E. Karimi, L. Marrucci, M. Padgett, M. Ritsch-Marte, N. M. Litchinitser, N. P. Bigelow, C. Rosales-Guzman, A. Belmonte, J. P. Torres, T. W. Neely, M. Baker, R. Gordon, A. B. Stilgoe, J. Romero, A. G. White, R. Fickler, A. E. Willner, G. D. Xie, B. McMorran, and A. M. Weiner, “Roadmap on structured light,” Journal of Optics, vol. 19, no. 1, Jan (2017), Art no. 013001

The Transverse Doppler Effect and Relativistic Time Dilation

One of the hardest aspects to grasp about relativity theory is the question of whether something only “looks as if” it is happening, or whether it “actually is” happening. 

Take, for instance, the classic twin paradox of relativity theory, in which twins wear identical high-precision wrist watches.  One of them rockets off to Alpha Centauri at relativistic speeds and returns, while the other twin stays on Earth.  Each twin sees the other twin’s clock running slowly because of relativistic time dilation.  Yet when they get back together and compare their watches, standing side by side, the twin who went to Alpha Centauri is actually younger than the twin who stayed home.  The relativistic effect of time dilation is “real”, not just apparent, regardless of whether they come back together to do the comparison.

Yet this understanding of relativistic effects took many years, even decades, to gain acceptance after Einstein proposed them.  He was aware himself that key experiments were required to prove that relativistic effects are real and not just apparent.

Einstein and the Transverse Doppler Effect

In 1905 Einstein used his new theory of special relativity to predict observable consequences that included a general treatment of the relativistic Doppler effect [1].  This included the effects of time dilation in addition to the longitudinal effect of the source chasing the wave.  Time dilation produced a correction to Doppler’s original expression for the longitudinal effect that became significant at speeds approaching the speed of light.  More significantly, it predicted a transverse Doppler effect for a source moving along a line perpendicular to the line of sight to an observer.  This effect had not been predicted either by Christian Doppler (1803 – 1853) or by Woldemar Voigt (1850 – 1919). 

( Read article in Physics Today on the history of the Doppler effect [2] )

Despite the generally positive reception of Einstein’s theory of special relativity, some of its consequences were anathema to many physicists at the time.  A key stumbling block was the question of whether relativistic effects, like moving clocks running slowly, were only apparent or were actually real, and Einstein had to fight to convince others of their reality.  When Johannes Stark (1874 – 1957) observed Doppler line shifts in ion beams called “canal rays” in 1906 (Stark received the 1919 Nobel prize in part for this discovery) [3], Einstein promptly published a paper suggesting how the canal rays could be used in a transverse geometry to directly detect time dilation through the transverse Doppler effect [4].  Thirty years passed before the experiment was performed with sufficient accuracy, by Herbert Ives and G. R. Stilwell in 1938, to measure the transverse Doppler effect [5].  Ironically, even at this late date, Ives and Stilwell were convinced that their experiment had disproved Einstein’s time dilation by supporting Lorentz’s contraction theory of the electron.  The Ives-Stilwell experiment was the first direct test of time dilation, followed by the muon lifetime measurements published in 1941 [6].

A) Transverse Doppler Shift Relative to Emission Angle

The Doppler effect varies from blue shifts in the forward direction to red shifts in the backward direction, with a smooth variation in Doppler shift as a function of the emission angle.  Consider the configuration shown in Fig. 1 for light emitted from a source moving at speed v and emitting at an angle θ0 in the receiver frame. The source moves a distance vT in the time of a single emission cycle (assume a harmonic wave). In that time T (which is the period of oscillation of the light source—or the period of a clock if we think of it putting out light pulses) the light travels a distance cT before another cycle begins (or another pulse is emitted).

Fig. 1 Configuration for detection of Doppler shifts for emission angle θ0. The light source travels a distance vT during the time of a single cycle, while the wavefront travels a distance cT towards the detector.

[ See YouTube video on the derivation of the transverse Doppler Effect.]

The observed wavelength in the receiver frame is thus given by

where T is the emission period of the moving source.  Importantly, the emission period is time dilated relative to the proper emission time of the source

Therefore,

This expression can be evaluated for several special cases:

a) θ0 = 0 for forward emission

which is the relativistic blue shift for longitudinal motion in the direction of the receiver.

b) θ0 = π for backward emission

which is the relativistic red shift for longitudinal motion away from the receiver.

c) θ0 = π/2 for transverse emission

This transverse Doppler effect for emission at right angles is a red shift, caused only by the time dilation of the moving light source.  This is the effect proposed by Einstein, using Stark’s canal rays, and eventually measured by Ives and Stilwell, proving that moving clocks tick slowly.  But it is not the only way to view the transverse Doppler effect.
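Since the equations in this post were rendered as images, the sketch below assumes the standard relativistic result λ = γ λ0 (1 − β cos θ0) for the wavelength observed at emission angle θ0 in the receiver frame (consistent with the construction of Fig. 1) and evaluates the three special cases numerically.

import numpy as np

# Relativistic Doppler shift versus emission angle in the receiver frame.
# Assumed standard result: lambda_obs = gamma * lambda_0 * (1 - beta*cos(theta0))
beta = 0.6
gamma = 1/np.sqrt(1 - beta**2)
lam0 = 1.0                                   # proper (rest-frame) wavelength

def doppler(theta0):
    return gamma*lam0*(1 - beta*np.cos(theta0))

print("forward    (theta0 = 0)   :", doppler(0.0))        # 0.5  < 1 : blue shift
print("backward   (theta0 = pi)  :", doppler(np.pi))      # 2.0  > 1 : red shift
print("transverse (theta0 = pi/2):", doppler(np.pi/2))    # 1.25 = gamma : red shift from time dilation alone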

B) Transverse Doppler Shift Relative to Angle at Reception

A different option for defining the transverse Doppler effect uses the angle to the moving source at the moment the light is detected.  The geometry of this configuration relative to the previous one is illustrated in Fig. 2.

Fig. 2 The detection point is drawn at a finite distance. However, the relationship between θ0 and θ1 is independent of the distance to the detector

The transverse distance to the detection point is

The length of the line connecting the detection point P with the location of the light source at the moment of detection is (using the law of cosines)

Combining with the first equation gives

An equivalent expression is obtained as

Note that this result, relating θ1 to θ0, is independent of the distance to the observation point.

When θ1 = π/2, then

yielding

for which the Doppler effect is

which is a blue shift.  This creates the unexpected result that θ0 = π/2 produces a red shift, while θ1 = π/2 produces a blue shift. The question could be asked: which one represents time dilation? In fact, it is θ0 = π/2 that produces time dilation exclusively, because in that configuration there is no foreshortening effect on the wavelength—only the time dilation of the emission period.

C) Compromise: The Null Transverse Doppler Shift

The previous two configurations each could be used as a definition for the transverse Doppler effect. But one gives a red shift and one gives a blue shift, which seems contradictory. Therefore, one might try to strike a compromise between these two cases so that sin θ1 = sin θ0, and the configuration is shown in Fig. 3.

This is the case when θ0 + θ1 = π.  The sines of the two angles are equal, yielding

and

which is solved for

Inserting this into the Doppler equation gives

where the Taylor expansion of the denominator (at low speed) cancels the numerator to give zero net Doppler shift. This compromise configuration represents the condition of null Doppler frequency shift. However, for speeds approaching the speed of light, the net effect is a lengthening of the wavelength, dominated by time dilation, causing a red shift.
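The statements in sections B) and C) can be checked numerically with the sketch below, which assumes the same standard formula λ = γ λ0 (1 − β cos θ0) together with the geometric relation sin θ1 = sin θ0 / √(1 + β² − 2β cos θ0) implied by the construction of Fig. 2; both expressions are my assumptions, since the original equations appear only as images.

import numpy as np

# Numerical check of cases B) and C), assuming
#   lambda_obs = gamma*lambda_0*(1 - beta*cos(theta0))
#   sin(theta1) = sin(theta0)/sqrt(1 + beta**2 - 2*beta*cos(theta0))
beta = 0.6
gamma = 1/np.sqrt(1 - beta**2)

def lam(theta0):
    return gamma*(1 - beta*np.cos(theta0))            # lambda/lambda_0

# Case B: reception at right angles (theta1 = pi/2) occurs when cos(theta0) = beta
theta0_B = np.arccos(beta)
sin_theta1 = np.sin(theta0_B)/np.sqrt(1 + beta**2 - 2*beta*np.cos(theta0_B))
print("sin(theta1) at cos(theta0) = beta :", sin_theta1)   # -> 1.0, i.e. theta1 = pi/2
print("wavelength ratio (case B)         :", lam(theta0_B))# = 1/gamma < 1 : blue shift

# Case C: the compromise sin(theta1) = sin(theta0) occurs when cos(theta0) = beta/2
for b in (0.01, 0.6, 0.99):
    g = 1/np.sqrt(1 - b**2)
    ratio = g*(1 - b**2/2)                              # lambda/lambda_0 at cos(theta0) = b/2
    print(f"beta = {b:4.2f}: lambda/lambda0 = {ratio:.6f}")  # ~1 at low speed, >1 (red) near c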

D) Source in Circular Motion Around Receiver

An interesting twist can be added to the problem of the transverse Doppler effect: put the source or receiver into circular motion, one about the other. In the case of a source in circular motion around the receiver, it is easy to see that this looks just like case A) above for θ0 = π/2, which is the red shift caused by the time dilation of the moving source

However, there is the possible complication that the source is no longer in an inertial frame (it experiences angular acceleration) and therefore it is in the realm of general relativity instead of special relativity. In fact, it was Einstein’s solution to this problem that led him to propose the Equivalence Principle and make his first calculations on the deflection of light by gravity. His solution was to think of an infinite number of inertial frames, each of which was instantaneously co-moving with the same linear velocity as the source. These co-moving frames are inertial and can be analyzed using the principles of special relativity. The general relativistic effects come from slipping from one inertial co-moving frame to the next. But in the case of the circular transverse Doppler effect, each instantaneously co-moving frame has the exact configuration as case A) above, and so the wavelength is red shifted exactly by the time dilation.

E) Receiver in Circular Motion Around Source

With the notion of co-moving inertial frames now in hand, this configuration is exactly the same as case B) above, and the wavelength is blue shifted

References

[1] A. Einstein, “On the electrodynamics of moving bodies,” Annalen Der Physik, vol. 17, no. 10, pp. 891-921, Sep (1905)

[2] D. D. Nolte, “The Fall and Rise of the Doppler Effect,” Physics Today, vol. 73, no. 3, pp. 31-35, Mar (2020)

[3] J. Stark, W. Hermann, and S. Kinoshita, “The Doppler effect in the spectrum of mercury,” Annalen Der Physik, vol. 21, pp. 462-469, Nov 1906.

[4] A. Einstein, “Possibility of a new examination of the relativity principle,” Annalen Der Physik, vol. 23, no. 6, pp. 197-198, May (1907)

[5] H. E. Ives and G. R. Stilwell, “An experimental study of the rate of a moving atomic clock,” Journal of the Optical Society of America, vol. 28, p. 215, 1938.

[6] B. Rossi and D. B. Hall, “Variation of the Rate of Decay of Mesotrons with Momentum,” Physical Review, vol. 59, pp. 223–228, 1941.

Hermann Minkowski’s Spacetime: The Theory that Einstein Overlooked

“Society is founded on hero worship”, wrote Thomas Carlyle (1795 – 1881) in his 1840 lecture on “Hero as Divinity”—and the society of physicists is no different.  Among physicists, the hero is the genius—the monomyth who journeys into the supernatural realm of high mathematics, engages in single combat against chaos and confusion, gains enlightenment in the mysteries of the universe, and returns home to share the new understanding.  If the hero is endowed with unusual talent and achieves greatness, then mythologies are woven, creating shadows that can grow and eclipse the truth and the work of others, bestowing upon the hero recognitions that are not entirely deserved.

      “Gentlemen! The views of space and time which I wish to lay before you … They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”

Hermann Minkowski (1908)

The greatest hero of physics of the twentieth century, without question, is Albert Einstein.  He is the person most responsible for the development of “Modern Physics” that encompasses:

  • Relativity theory (both special and general),
  • Quantum theory (he invented the quantum in 1905—see my blog),
  • Astrophysics (his field equations of general relativity were solved by Schwarzschild in 1916 to predict event horizons of black holes, and he solved his own equations to predict gravitational waves that were discovered in 2015),
  • Cosmology (his cosmological constant is now recognized as the mysterious dark energy that was discovered in 1998), and
  • Solid state physics (his explanation of the specific heat of crystals inaugurated the field of quantum matter). 

Einstein made so many seminal contributions to so many sub-fields of physics that it defies comprehension—hence he is mythologized as a genius, able to see into the depths of reality with unique insight. He deserves his reputation as the greatest physicist of the twentieth century—he has my vote, and Time magazine named him Person of the Century in 1999.  But as his shadow has grown, it has eclipsed and even assimilated the work of others—work that he initially criticized and dismissed, yet later embraced so whole-heartedly that he is mistakenly given credit for its discovery.

For instance, when we think of Einstein, the first thing that pops into our minds is probably “spacetime”.  He himself wrote several popular accounts of relativity that incorporated the view that spacetime is the natural geometry within which so many of the non-intuitive properties of relativity can be understood.  When we think of time being mixed with space, making it seem that position coordinates and time coordinates share an equal place in the description of relativistic physics, it is common to attribute this understanding to Einstein.  Yet Einstein initially resisted this viewpoint and even disparaged it when he first heard it! 

Spacetime was the brain-child of Hermann Minkowski.

Minkowski in Königsberg

Hermann Minkowski was born in 1864 in Russia to German parents who moved to the city of Königsberg (King’s Mountain) in East Prussia when he was eight years old.  He entered the university in Königsberg in 1880 when he was sixteen.  Within a year, when he was only seventeen years old and still a student at the university, Minkowski responded to the 1881 announcement of the Mathematics Prize of the French Academy of Sciences.  When he submitted his prize-winning memoir, he could have had no idea that it was starting him down a path that would lead him, years later, to revolutionary views.

A view of Königsberg in 1581. Six of the seven bridges of Königsberg—which Euler famously described in the first essay on topology—are seen in this picture. The University is in the center distance behind the castle.

The specific Prize challenge of 1881 was to find the number of representations of an integer as a sum of five squares of integers.  For instance, every integer n > 33 can be expressed as the sum of five nonzero squares.  As an example, 42 = 2² + 2² + 3² + 3² + 4², which is the only such representation for that number.  However, there are five representations for n = 53

The task of enumerating these representations draws on the theory of quadratic forms.  A quadratic form is a function of products of numbers with integer coefficients, such as ax² + bxy + cy² and ax² + by² + cz² + dxy + exz + fyz.  In number theory, one seeks integer solutions for which the quadratic form equals an integer.  For instance, the Pythagorean relation x² + y² = n² is a quadratic form for which there are many integer solutions (x, y, n), known as Pythagorean triplets, such as (3, 4, 5) and (5, 12, 13).

The topic of quadratic forms gained special significance after the work of Bernhard Riemann who established the properties of metric spaces based on the metric expression

for infinitesimal distance in a D-dimensional metric space.  This is a generalization of Euclidean distance to more general non-Euclidean spaces that may have curvature.  Minkowski would later use this expression to great advantage, developing a “Geometry of Numbers” [1] as he delved ever deeper into quadratic forms and their uses in number theory.

Minkowski in Göttingen

After graduating with a doctoral degree in 1885 from Königsberg, Minkowski did his habilitation at the University of Bonn and began teaching, moving back to Königsberg in 1892 and then to Zurich in 1894 (where one of his students was a somewhat lazy and unimpressive Albert Einstein).  A few years later he was given an offer that he could not refuse.

At the turn of the 20th century, the place to be in mathematics was at the University of Göttingen.  It had a long tradition of mathematical giants that included Carl Friedrich Gauss, Bernhard Riemann, Peter Dirichlet, and Felix Klein.  Under the guidance of Felix Klein, Göttingen mathematics had undergone a renaissance. For instance, Klein had attracted Hilbert from the University of Königsberg in 1895.  David Hilbert had known Minkowski when they were both students in Königsberg, and Hilbert extended an invitation to Minkowski to join him in Göttingen, which Minkowski accepted in 1902.

The University of Göttingen

A few years after Minkowski arrived at Göttingen, the relativity revolution broke, and both Minkowski and Hilbert began working on mathematical aspects of the new physics. They organized a colloquium dedicated to relativity and related topics, and on Nov. 5, 1907 Minkowski gave his first tentative address on the geometry of relativity.

Because Minkowski’s specialty was quadratic forms, and given his understanding of Riemann’s work, he was perfectly situated to apply his theory of quadratic forms and invariants to the Lorentz transformations derived by Poincaré and Einstein.  Although Poincaré had published a paper in 1906 that showed that the Lorentz transformation was a generalized rotation in four-dimensional space [2], Poincaré continued to discuss space and time as separate phenomena, as did Einstein.  For them, simultaneity was no longer an invariant, but events in time were still events in time and not somehow mixed with space-like properties. Minkowski recognized that Poincaré had missed an opportunity to define a four-dimensional vector space filled by four-vectors that captured all possible events in a single coordinate description without the need to separate out time and space. 

Minkowski’s first attempt, presented in his 1907 colloquium, at constructing velocity four-vectors was flawed because (like so many of my mechanics students when they first take a time derivative of the four-position) he had not yet understood the correct use of proper time. But the research program he outlined paved the way for the great work that was to follow.

On Feb. 21, 1908, only 3 months after his first halting steps, Minkowski delivered a thick manuscript to the printers for an article to appear in the Göttinger Nachrichten. The title “Die Grundgleichungen für die elektromagnetischen Vorgänge in bewegten Körpern” (The Basic Equations for Electromagnetic Processes of Moving Bodies) belies the impact and importance of this very dense article [3]. In its 60 pages (with no figures), Minkowski presents the correct form for four-velocity by taking derivatives relative to proper time, and he formalizes his four-dimensional approach to relativity that became the standard afterwards. He introduces the terms spacelike vector, timelike vector, light cone and world line. He also presents the complete four-tensor form for the electromagnetic fields. The foundational work of Levi-Civita and Ricci-Curbastro on tensors was not yet well known, so Minkowski invents his own terminology of Traktor to describe the four-tensor. Most importantly, he invents the terms spacetime (Raum-Zeit) and events (Ereignisse) [4].

Minkowski’s four-dimensional formalism of relativistic electromagnetics was more than a mathematical trick—it uncovered the presence of a multitude of invariants that were obscured by the conventional mathematics of Einstein and Lorentz and Poincaré. In Minkowski’s approach, whenever a proper four-vector is contracted with itself (its inner product), an invariant emerges. Because there are many fundamental four-vectors, there are many invariants. These invariants provide the anchors from which to understand the complex relative properties amongst relatively moving frames.

Minkowski’s master work appeared in the Nachrichten on April 5, 1908. If he had thought that physicists would embrace his visionary perspective, he was about to be woefully disabused of that notion.

Einstein’s Reaction

Despite his impressive ability to see into the foundational depths of the physical world, Einstein did not view mathematics as the root of reality. Mathematics for him was a tool to reduce physical intuition into quantitative form. In 1908 his fame was rising as the acknowledged leader in relativistic physics, and he was not impressed or pleased with the abstract mathematical form that Minkowski was trying to stuff the physics into. Einstein called it “superfluous erudition” [5], and complained “since the mathematics pounced on the relativity theory, I no longer understand it myself! [6]”

With his collaborator Jakob Laub (also a former student of Minkowski’s), Einstein objected to more than the hard-to-follow mathematics—they believed that Minkowski’s form of the ponderomotive force was incorrect. They then proceeded to re-translate Minkowski’s elegant four-vector derivations back into ordinary vector analysis, publishing two papers in Annalen der Physik in the summer of 1908 that were politely critical of Minkowski’s approach [7-8]. Yet another of Minkowski’s students from Zurich, Gunnar Nordström, showed how to derive Minkowski’s field equations without any of the four-vector formalism.

One can only wonder why so many of his former students so easily dismissed Minkowski’s revolutionary work. Einstein had actually avoided Minkowski’s mathematics classes as a student at ETH [5], which may say something about Minkowski’s reputation among the students, although Einstein did appreciate the class on mechanics that he took from Minkowski. Nonetheless, Einstein missed the point! Rather than realizing the power and universality of the four-dimensional spacetime formulation, he dismissed it as obscure and irrelevant—perhaps prejudiced by his earlier dim view of his former teacher.

Raum und Zeit

It is clear that Minkowski was stung by the poor reception of his spacetime theory. It is also clear that he truly believed that he had uncovered an essential new approach to physical reality. While mathematicians were generally receptive of his work, he knew that if physicists were to adopt his new viewpoint, he needed to win them over with the elegant results.

In 1908, Minkowski presented a now-famous paper Raum und Zeit at the 80th Assembly of German Natural Scientists and Physicians (21 September 1908).  In his opening address, he stated [9]:

“Gentlemen!  The views of space and time which I wish to lay before you have sprung from the soil of experimental physics, and therein lies their strength. They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”

To illustrate his arguments Minkowski constructed the most recognizable visual icon of relativity theory—the space-time diagram in which the trajectories of particles appear as “world lines”, as in Fig. 1.  On this diagram, one spatial dimension is plotted along the horizontal axis, and the value ct (the speed of light times time) is plotted along the vertical axis.  In these units, a photon travels along a line oriented at 45 degrees, and the world-line (the name Minkowski gave to trajectories) of any massive particle must have a slope steeper than this.  For instance, a stationary particle, which appears to have no trajectory at all, executes a vertical trajectory on the space-time diagram as it travels forward through time.  Within this new formulation by Minkowski, space and time were mixed together in a single manifold—spacetime—and were no longer separate entities.

Fig. 1 The First “Minkowski diagram” of spacetime.

In addition to the spacetime construct, Minkowski’s great discovery was the plethora of invariants that followed from his geometry. For instance, the spacetime hyperbola

is invariant to Lorentz transformations of the coordinates.  This is just a simple statement that a vector is an entity of reality that is independent of how it is described.  The length of a vector in our normal three-space does not change if we flip the coordinates around or rotate them, and the same is true for four-vectors in Minkowski space subject to Lorentz transformations. 

In relativity theory, this property of invariance becomes especially useful because part of the mental challenge of relativity is that everything looks different when viewed from different frames.  How do you get a good grip on a phenomenon if it is always changing, always relative to one frame or another?  The invariants become the anchors that we can hold on to as reference frames shift and morph about us. 

Fig. 2 Any event on an invariant hyperbola is transformed by the Lorentz transformation onto another point on the same hyperbola. Events that are simultaneous in one frame are each on a separate hyperbola. After transformation, simultaneity is lost, but each event stays on its own invariant hyperbola (Figure reprinted from [10]).

As an example of a fundamental invariant, the mass of a particle in its rest frame becomes an invariant mass, always with the same value.  In earlier relativity theory, even in Einstein’s papers, the mass of an object was a function of its speed.  How is the mass of an electron a fundamental property of physics if it is a function of how fast it is traveling?  The construction of invariant mass removes this problem, and the mass of the electron becomes an immutable property of physics, independent of the frame.  Invariant mass is just one of many invariants that emerge from Minkowski’s space-time description.  The study of relativity, where all things seem relative, became a study of invariants, where many things never change.  In this sense, the theory of relativity is a misnomer.  Ironically, relativity theory became the motivation of post-modern relativism that denies the existence of absolutes, even as relativity theory, as practiced by physicists, is all about absolutes.
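As a quick numerical illustration of this idea (not tied to any particular equation in the text), the sketch below applies a Lorentz boost to an event and to a particle’s energy-momentum and confirms that (ct)² − x² and E² − (pc)² come out unchanged.

import numpy as np

# Minimal check that Lorentz boosts preserve Minkowski invariants.
def boost(ct, x, beta):
    gamma = 1/np.sqrt(1 - beta**2)
    return gamma*(ct - beta*x), gamma*(x - beta*ct)

# an event (ct, x) viewed from a frame moving at beta = 0.8
ct, x = 5.0, 3.0
ct2, x2 = boost(ct, x, 0.8)
print("interval before:", ct**2 - x**2, " after:", ct2**2 - x2**2)       # identical

# energy-momentum (E, pc) transforms the same way; the invariant is the rest mass
E, pc = 5.0, 4.0                 # units where m c^2 = 3, so E^2 - (pc)^2 = 9
E2, pc2 = boost(E, pc, 0.8)      # beta = 0.8 happens to be the particle's speed, so pc2 = 0
print("invariant mass^2 before:", E**2 - pc**2, " after:", E2**2 - pc2**2)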

Despite his audacious gambit to win over the physicists, Minkowski would not live to see the fruits of his effort. He died suddenly, of a ruptured appendix, on Jan. 12, 1909 at the age of 44.

Arnold Sommerfeld (who went on to play a central role in the development of quantum theory) took up Minkowski’s four-vectors and systematized them in a way that was palatable to physicists.  Then Max von Laue extended the formalism while working with Sommerfeld in Munich, publishing the first physics textbook on relativity theory in 1911 and establishing the space-time formalism for future generations of German physicists.  Further support for Minkowski’s work came from his distinguished colleagues at Göttingen (Hilbert, Klein, Wiechert, Schwarzschild) as well as his former students (Born, Laue, Kaluza, Frank, Noether).  With such champions, Minkowski’s work was immortalized in the methodology (and mythology) of physics, representing one of the crowning achievements of the Göttingen mathematical community.

Einstein Relents

Already in 1907 Einstein was beginning to grapple with the role of gravity in the context of relativity theory, and he knew that the special theory was just a beginning. Yet between 1908 and 1910 Einstein’s focus was on the quantum of light as he defended and extended his unique view of the photon and prepared for the first Solvay Congress of 1911. As he returned his attention to the problem of gravitation after 1910, he began to realize that Minkowski’s formalism provided a framework from which to understand the role of accelerating frames. In 1912 Einstein wrote to Sommerfeld to say [5]

I occupy myself now exclusively with the problem of gravitation. One thing is certain, that I have never before had to toil anywhere near as much, and that I have been infused with great respect for mathematics, which I had up until now in my naivety looked upon as a pure luxury in its more subtle parts. Compared to this problem, the original theory of relativity is child’s play.

By the time Einstein had finished his general theory of relativity and gravitation in 1915, he fully acknowledged his indebtedness to Minkowski’s spacetime formalism, without which his general theory might never have appeared.

References

[1] H. Minkowski, Geometrie der Zahlen. Leipzig and Berlin: R. G. Teubner, 1910.

[2] Poincaré, H. (1906). “Sur la dynamique de l’électron.” Rendiconti del circolo matematico di Palermo 21: 129–176.

[3] H. Minkowski, “Die Grundgleichungen für die electromagnetischen Vorgänge in bewegten Körpern,” Nachrichten von der Königlichen Gesellschaft der Wissenschaften zu Göttingen, pp. 53–111, (1908)

[4] S. Walter, “Minkowski’s Modern World,” in Minkowski Spacetime: A Hundred Years Later, Petkov Ed.: Springer, 2010, ch. 2, pp. 43-61.

[5] L. Corry, “The influence of David Hilbert and Hermann Minkowski on Einstein’s views over the interrelation between physics and mathematics,” Endeavour, vol. 22, no. 3, pp. 95-97, (1998)

[6] A. Pais, Subtle is the Lord: The Science and the Life of Albert Einstein. Oxford, 2005.

[7] A. Einstein and J. Laub, “Electromagnetic basic equations for moving bodies,” Annalen Der Physik, vol. 26, no. 8, pp. 532-540, Jul (1908)

[8] A. Einstein and J. Laub, “Electromagnetic fields on bodies at rest with ponderomotive energy,” Annalen Der Physik, vol. 26, no. 8, pp. 541-550, Jul (1908)

[9] Minkowski, H. (1909). “Raum und Zeit.” Jahresbericht der Deutschen Mathematiker-Vereinigung: 75-88.

[10] D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 2nd ed. Oxford: Oxford University Press, 2019.

The Lens of Gravity: Einstein’s Rings

Einstein’s theory of gravity came from a simple happy thought that occurred to him as he imagined an unfortunate worker falling from a roof, losing hold of his hammer, only to find both the hammer and himself floating motionless relative to each other as if gravity had ceased to exist.  With this one thought, Einstein realized that the falling (i.e. accelerating) reference frame was in fact an inertial frame, and hence all the tricks that he had learned and invented to deal with inertial relativistic frames could apply just as well to accelerating frames in gravitational fields.

Gravitational lensing (and microlensing) have become a major tool of discovery in astrophysics applied to the study of quasars, dark matter and even the search for exoplanets.

Armed with this new perspective, one of the earliest discoveries that Einstein made was that gravity must bend light paths.  This phenomenon is fundamentally post-Newtonian, because there can be no Newtonian force of gravity on a massless photon—yet Einstein’s argument for why gravity should bend light is so obvious that it is manifestly true, as demonstrated by Arthur Eddington during the solar eclipse of 1919, launching Einstein to world-wide fame. It is also demonstrated by the beautiful gravitational lensing phenomenon of Einstein arcs. Einstein arcs are the distorted images of bright distant light sources in the universe caused by an intervening massive object, like a galaxy or galaxy cluster, that bends the light rays. A number of these arcs are seen in images of the Abell cluster of galaxies in Fig. 1.

Fig. 1 Numerous Einstein arcs seen in the Abell cluster of galaxies.

Gravitational lensing (and microlensing) have become a major tool of discovery in astrophysics applied to the study of quasars, dark matter and even the search for exoplanets.  However, as soon as Einstein conceived of gravitational lensing, in 1912, he abandoned the idea as too small and too unlikely to ever be useful, much like he abandoned the idea of gravitational waves in 1915 as similarly being too small ever to detect.  It was only through the persistence of an amateur Czech scientist twenty years later that Einstein reluctantly agreed to publish his calculations on gravitational lensing.

The History of Gravitational Lensing

In 1912, only a few years after his “happy thought”, and fully three years before he published his definitive work on General Relativity, Einstein derived how light would be affected by a massive object, causing light from a distant source to be deflected like a lens. The historian of physics, Jürgen Renn discovered these derivations in Einstein’s notebooks while at the Max Planck Institute for the History of Science in Berlin in 1996 [1]. However, Einstein also calculated the magnitude of the effect and dismissed it as too small, and so he never published it.

Years later, in 1936, Einstein received a visit from a Czech electrical engineer, Rudi Mandl, an amateur scientist who had actually obtained a small stipend from the Czech government to visit Einstein at the Institute for Advanced Study in Princeton. Mandl had conceived of the possibility of gravitational lensing and wished to bring it to Einstein’s attention, thinking that the master would certainly know what to do with the idea. Einstein was obliging, redoing his calculations of 1912 and obtaining once again the results that made him believe the effect would be too small to be seen. However, Mandl was persistent and pressed Einstein to publish the results, which he did [2]. In his submission letter to the editor of Science, Einstein stated, “Let me also thank you for your cooperation with the little publication, which Mister Mandl squeezed out of me. It is of little value, but it makes the poor guy happy”. Einstein’s pessimism was based on his thinking that isolated stars would be the only sources of gravitational lensing (he did not “believe” in black holes), but in 1937 Fritz Zwicky at Caltech (a gadfly genius) suggested that the newly discovered phenomenon of galaxy clusters might provide the massive gravity required to produce the effect—although, to be visible, a distant source would need to be extremely bright.

Potential sources turned up in the 1960s, when radio telescopes discovered quasi-stellar objects (known as quasars) that are extremely bright and extremely far away. Quasars also appear in the visible range, and in 1979 a twin quasar was discovered by astronomers using the telescope at the Kitt Peak Observatory in Arizona—two quasars very close together that shared identical spectral fingerprints. The astronomers realized that it could be a twin image of a single quasar caused by gravitational lensing, which they published as the likely explanation. Although the finding was originally controversial, the twin image was later confirmed, and many additional examples of gravitational lensing have since been discovered.

The Optics of Gravity and Light

Gravitational lenses are terrible optical instruments.  A good imaging lens has two chief properties: 1) It produces increasing delay on a wavefront as the radial distance from the optic axis decreases; and 2) it deflects rays with increasing deflection angle as the radial distance of a ray increases away from the optic axis (the center of the lens).  Both properties are part of the same effect: the conversion, by a lens, of an incident plane wave into a converging spherical wave.  A third property of a good lens ensures minimal aberrations of the converging wave: a quadratic dependence of wavefront delay on radial distance from the optic axis.  For instance, a parabolic lens produces a diffraction-limited focal spot.

Now consider the optical effects of gravity around a black hole.  One of Einstein’s chief discoveries during his early investigations into the effects of gravity on light is the analogy of warped space-time as having an effective refractive index.  Light propagates through space affected by gravity as if there were a refractive index associated with the gravitational potential.  In a previous blog on the optics of gravity, I showed the simple derivation of the refractive effects of gravity on light based on the Schwarzschild metric applied to a null geodesic of a light ray.  The effective refractive index near a black hole is

This effective refractive index diverges at the Schwarzschild radius of the black hole. It produces the maximum delay, not on the optic axis as for a good lens, but at the finite distance RS.  Furthermore, the maximum deflection also occurs at RS, and the deflection decreases with increasing radial distance.  Both of these properties of gravitational lensing are opposite to the properties of a good lens.  For this reason, the phrase “gravitational lensing” is a bit of a misnomer.  Gravitating bodies certainly deflect light rays, but the resulting optical behavior is far from that of an imaging lens.

The path of a ray from a distant quasar, through the thin gravitational lens of a galaxy, and intersecting the location of the Earth, is shown in Fig. 2. The location of the quasar is a distance R from the “optic axis”. The un-deflected angular position is θ0, and with the intervening galaxy the image appears at the angular position θ. The angular magnification is therefore M = θ/θ0.

Fig. 2 Optical ray layout for gravitational lensing and Einstein rings. All angles are greatly exaggerated; typical angles are in the range of several arcseconds.

The deflection angles are related through

where b is the “impact parameter”

These two equations are solved to give an expression that relates the unmagnified angle θ0 to the magnified angle θ as

where

is the angular size of the Einstein ring when the source is on the optic axis. The quadratic equation has two solutions that give two images of the distant quasar. This is the origin of the “double image” that led to the first discovery of gravitational lensing in 1979.
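Because the lens equations appear here only as images, the sketch below assumes the standard point-mass thin-lens relation θ² − θθ0 − θE² = 0 and solves the quadratic for the two image angles, along with the angular magnification M = θ/θ0 defined above.

import numpy as np

# Two images of a point source behind a point-mass gravitational lens,
# assuming the standard thin-lens equation theta**2 - theta0*theta - thetaE**2 = 0.
thetaE = 1.0                                   # Einstein-ring angle (angles in units of thetaE)
for theta0 in (2.0, 1.0, 0.5, 0.1):
    disc = np.sqrt(theta0**2 + 4*thetaE**2)    # discriminant of the quadratic
    theta_plus  = 0.5*(theta0 + disc)          # image on the same side as the source
    theta_minus = 0.5*(theta0 - disc)          # image on the opposite side (negative angle)
    print(f"theta0 = {theta0:4.1f}: images at {theta_plus:+.3f} and {theta_minus:+.3f}, "
          f"M = theta/theta0 = {theta_plus/theta0:.2f}")
# As theta0 -> 0 the two images approach +-thetaE and the angular magnification diverges,
# consistent with the formation of an Einstein ring on the optic axis.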

When the distant quasar is on the optic axis, then θ0 = 0 and the deflection of the rays produces, not a double image, but an Einstein ring with an angular size of θE. For typical lensing objects, the angular sizes of Einstein rings are in the range of tens of microradians (several arcseconds). The angular magnification for decreasing distance R diverges as

But this divergence is more a statement of the bad lens behavior than of actual image size. Because the gravitational lens is inverted (with greater deflection closer to the optic axis) compared to an ideal thin lens, it produces a virtual image ring that is closer than the original object, as in Fig. 3.

Fig. 3 Gravitational lensing does not produce an “image” but rather an Einstein ring that is virtual and magnified (appears closer).

The location of the virtual image behind the gravitational lens (when the quasar is on the optic axis) is obtained from

If the quasar is much further from the lens than the Earth, then the image location is zi = -L1/2, or behind the lens by half the distance from the Earth to the lens. The longitudinal magnification is then

Note that while the transverse (angular) magnification diverges as the object approaches the optic axis, the longitudinal magnification remains finite but always greater than unity.

The Caustic Curves of Einstein Rings

Because gravitational lenses have such severe aberration relative to an ideal lens, and because the angles are so small, an alternate approach to understanding the optics of gravity is through the theory of light caustics. In a previous blog on the optics of caustics I described how sets of deflected rays of light become enclosed in envelopes that produce regions of high and low intensity. These envelopes are called caustics. Gravitational light deflection also causes caustics.

In addition to envelopes, it is also possible to trace the time delays caused by gravity on wavefronts. In the regions of the caustic envelopes, these wavefronts can fold back onto themselves so that different parts of the image arrive at different times coming from different directions.

An example of gravitational caustics is shown in Fig. 4. Rays are incident vertically on a gravitational thin lens, which deflects the rays so that they overlap in the region below the lens. The red curves are selected wavefronts at three successively later times. The gravitational potential causes a time delay on the propagating front, with greater delays in regions of stronger gravitational potential. The envelope function that is tangent to the rays is called the caustic, here shown as the dense blue mesh. In this case there is a cusp in the caustic near z = -1 below the lens. The wavefronts become multiple-valued past the cusp.

Fig. 4 Wavefronts (in red) perpendicular to the rays (in blue) from gravitational deflection of light. A cusp in the wavefront forms at the apex of the caustic ray envelope near z = -1. Farther from the lens the wavefront becomes double-valued, leading to different time delays for the two images if the object is off the optic axis. (All angles are greatly exaggerated.)

The intensity of the distant object past the lens is concentrated near the caustic envelope. The intensity of the caustic at z = -6 is shown in Fig. 5. The ring structure is the cross-sectional spatial intensity at the fixed observation plane, but the transform to an angular image is one-to-one, so the caustic intensity distribution is also similar to the view of the Einstein ring from a position at z = -6 on the optic axis.

Fig. 5 Simulated caustic of an Einstein arc. This is the cross-sectional intensity at z = -6 from Fig. 4.

The gravitational potential is a function of the mass distribution in the gravitational lens. A different case, with a flatter distribution of mass near the optic axis, is shown in Fig. 6. There are multiple caustics in this case, with multi-valued wavefronts. Because caustics are sensitive to the mass distribution in the gravitational lens, astronomical observations of gravitational caustics can be used to back out the mass distribution, including dark matter or even distant exoplanets.

Fig. 6 Wavefronts and caustic for a much flatter mass distribution in the galaxy. The wavefront has multiple cusps in this case and the caustic has a double ring. The details of the caustics caused by the gravitational lens can provide insight into the mass distribution of the lensing object.

Python Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 30 19:47:31 2021

gravfront.py

@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)

Gravitational Lensing
"""

import numpy as np
from matplotlib import pyplot as plt

plt.close('all')

def refindex(x):
    # effective refractive-index profile of the lens: peaked on the optic axis (x = 0)
    n = n0/(1 + abs(x)**expon)**(1/expon)
    return n


delt = 0.001
Ly = 10
Lx = 5
n0 = 1
expon = 2   # adjust this from 1 to 10


delx = 0.01
rng = int(Lx/delx)          # use the built-in int (np.int has been removed from NumPy)
x = delx*np.linspace(-rng,rng,2*rng+1)

n = refindex(x)

dndx = np.diff(n)/np.diff(x)

plt.figure(1)
lines = plt.plot(x,n)

plt.figure(2)
lines2 = plt.plot(dndx)

plt.figure(3)
plt.xlim(-Lx, Lx)
plt.ylim(-Ly, 2)
Nloop = 160;
xd = np.zeros((Nloop,3))
yd = np.zeros((Nloop,3))
for loop in range(0,Nloop):
    xp = -Lx + 2*Lx*(loop/Nloop)       # transverse launch position of this ray
    plt.plot([xp, xp],[2, 0],'b',linewidth = 0.25)

    # deflection angle from the transverse gradient of the index (centered finite difference)
    thet = (refindex(xp+delt) - refindex(xp-delt))/(2*delt)
    xb = xp + np.tan(thet)*Ly
    plt.plot([xp, xb],[0, -Ly],'b',linewidth = 0.25)
    
    for sloop in range(0,3):
        # wavefront position along the ray: the front lags most where the
        # effective index (gravitational potential) is largest, near the optic axis
        delay = n0/(1 + abs(xp)**expon)**(1/expon) - n0
        dis = 0.75*(sloop+1)**2 - delay
        xfront = xp + np.sin(thet)*dis
        yfront = -dis*np.cos(thet)
                
        xd[loop,sloop] = xfront
        yd[loop,sloop] = yfront
        
for sloop in range(0,3):
    plt.plot(xd[:,sloop],yd[:,sloop],'r',linewidth = 0.5)

References

[1] J. Renn, T. Sauer, and J. Stachel, “The Origin of Gravitational Lensing: A Postscript to Einstein’s 1936 Science Paper,” Science 275, 184 (1997)

[2] A. Einstein, “Lens-Like Action of a Star by the Deviation of Light in the Gravitational Field”, Science 84, 506 (1936)

[3] (Here is an excellent review article on the topic.) J. Wambsganss, “Gravitational lensing as a powerful astrophysical tool: Multiple quasars, giant arcs and extrasolar planets,” Annalen Der Physik, vol. 15, no. 1-2, pp. 43-59, Jan-Feb (2006) SpringerLink

Caustic Curves and the Optics of Rays

Snorkeling above a shallow reef on a clear sunny day transports you to an otherworldly galaxy of spectacular deep colors and light reverberating off of the rippled surface.  Playing across the underwater floor of the reef is a fabulous light show of bright filaments entwining and fluttering, creating random mesh networks of light and dark.  These same patterns appear on the bottom of swimming pools in summer and in deep fountains in parks.

Johann Bernoulli had a stormy career and a problematic personality–but he was brilliant even among the bountiful Bernoulli clan. Using methods of tangents, he found the analytic solution of the caustic of the circle.

Something similar happens when a bare overhead light reflects from the sides of a circular glass of water.  The pattern no longer moves, but a dazzling filament splays across the bottom of the glass with a sharp bright cusp at the center. These bright filaments of light have an age-old name—caustics—meaning burning, as in burning with light. The study of caustics goes back to Archimedes of Syracuse and his apocryphal burning mirrors that are supposed to have torched the invading triremes of the Roman navy in 212 BC.

Fig. 1 (left) Archimedes supposedly burning the Roman navy with caustics formed by a “burning mirror”. A wall painting from the Uffizi Gallery, Stanzino delle Matematiche, in Florence, Italy, painted in 1600 by Giulio Parigi. (right) The Mojave solar-thermal farm uses 3000 acres of mirrors to actually do the trick.

Caustics in optics are concentrations of light rays that form bright filaments, often with cusp singularities. Mathematically, they are envelope curves that are tangent to a set of lines. Cata-caustics are caustics caused by light reflecting from curved surfaces. Dia-caustics are caustics caused by light refracting from transparent curved materials.

From Leonardo to Huygens

Even after Archimedes, burning mirrors remained an interest for a broad range of scientists, artists and engineers. Leonardo Da Vinci took an interest around 1503 – 1506 when he drew reflected caustics from a circular mirror in his many notebooks.

Fig. 2 Drawings of caustics of the circle in Leonardo Da Vinci’s notebooks circa 1503 – 1506. Digitized by the British Museum.

Almost two centuries later, Christian Huygens constructed the caustic of a circle in his Treatise on Light: in which are explained the causes of that which occurs in reflection, & in refraction and particularly in the strange refraction of Iceland crystal. This is the famous treatise in which he explained his principle for light propagation as wavefronts. He was able to construct the caustic geometrically, but did not arrive at a functional form. He mentions that it has a cusp like a cycloid, but without being a cycloid. He first presented this work at the Paris Academy in 1678, and news of his lecture traveled as far as Italy, where a young German mathematician happened to be traveling.

Fig. 3 Christian Huygens’ construction of the cusp of the caustic of the circle from his Treatise on Light (1690).

The Cata-caustics of Tschirnhaus and Bernoulli

In the decades after Newton and Leibniz invented the calculus, a small cadre of mathematicians strove to apply the new method to understand aspects of the physical world. At a time when Newton had left the calculus behind to follow more arcane pursuits, Leibniz, Jakob and Johann Bernoulli, Guillaume de l’Hôpital, Émilie du Châtelet and Walter von Tschirnhaus were pushing notation reform (mainly following Leibniz) to make the calculus easier to learn and use, as well as finding new applications, of which there were many.

Ehrenfried Walter von Tschirnhaus (1651 – 1708) was a German mathematician and physician and a lifelong friend of Leibniz, whom he met in Paris in 1675. He was one of only five mathematicians to provide a solution to Johann Bernoulli’s brachistochrone problem. One of the recurring interests of von Tschirnhaus, which he revisited throughout his career, was burning glasses and mirrors. A burning glass is a high-quality magnifying lens that brings the focus of the sun to a fine point to burn or anneal various items. Burning glasses were used to heat small items for manufacture or for experimentation. For instance, Priestley and Lavoisier routinely used burning glasses in their chemistry experiments. Low optical aberrations were required for the lenses to bring the light to the finest possible focus, so the study of optical focusing was an important topic both academically and practically. Tschirnhaus had his own laboratory to build and test burning mirrors, and he became aware of the cata-caustic patterns of light reflected from a circular mirror or glass surface. Given his parallel interest in the developing calculus methods, he published a paper in Acta Eruditorum in 1682 that constructed the envelope function created by the cata-caustics of a circle. However, Tschirnhaus did not produce the analytic function–that was provided by Johann Bernoulli ten years later in 1692.

Fig. 4 Excerpt from Acta Eruditorum 1682 by von Tschirnhaus.

Johann Bernoulli had a stormy career and a problematic personality–but he was brilliant even among the bountiful Bernoulli clan. Using methods of tangents, he found the analytic solution of the caustic of the circle. He did this by stating the general equation for all reflected rays and then finding when their y values are independent of changing angle … in other words, using the principle of stationarity, which would later become a potent tool in the hands of Lagrange as he developed Lagrangian physics.

Fig. 5 Bernoulli’s construction of the equations of rays reflected by the unit circle.

The equation for the reflected ray, expressing y as a function of x for a given angle α in Fig. 5, is

The condition of the caustic envelope requires the change in y with respect to the angle α to vanish while treating x as a constant. This is a partial derivative, and Johann Bernoulli is giving an early use of this method in 1692 to ensure the stationarity of y with respect to the changing angle. The partial derivative is

This is solved to give

Plugging this into the equation at the top equation above yields

These last two expressions for x and y in terms of the angle α are a parametric representation of the caustic. Combining them gives the solution to the caustic of the circle

The square root provides the characteristic cusp at the center of the caustic.
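Because the equations in this post are rendered as images, here is a minimal symbolic sketch of the same stationarity argument, written in Python with sympy. It assumes a unit circle with vertical rays incident from above, so the sign conventions (and the variable names) are mine and may differ from Fig. 5.

import sympy as sp

x, alpha = sp.symbols('x alpha', positive=True)

# A vertical ray reflected at the point (sin(alpha), -cos(alpha)) on the unit circle
# leaves with slope tan(pi/2 + 2*alpha) = -cot(2*alpha)
y_ray = -sp.cos(alpha) - sp.cot(2*alpha)*(x - sp.sin(alpha))

# Bernoulli's stationarity condition: dy/d(alpha) = 0 while holding x fixed
x_caustic = sp.simplify(sp.solve(sp.diff(y_ray, alpha), x)[0])   # expect sin(alpha)**3
y_caustic = sp.simplify(y_ray.subs(x, x_caustic))                # expect cos(alpha)**3 - 3*cos(alpha)/2

print('x =', x_caustic)
print('y =', y_caustic)

Eliminating α by substituting sin α = x^(1/3) then gives the Cartesian form of the caustic, whose square root produces the cusp at the center.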

Fig. 6 Caustic of a circle. Image was generated using the Python program raycaustic.py.

Python Code: raycaustic.py

There are lots of options here. Try them all … then add your own!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 16 16:44:42 2021

raycaustic.py

@author: nolte

D. D. Nolte, Optical Interferometry for Biology and Medicine (Springer,2011)
"""

import numpy as np
from matplotlib import pyplot as plt

plt.close('all')

# model_case 1 = cosine
# model_case 2 = circle
# model_case 3 = square root
# model_case 4 = inverse power law
# model_case 5 = ellipse
# model_case 6 = secant
# model_case 7 = parabola
# model_case 8 = Cauchy

model_case = int(input('Input Model Case (1-8)'))
if model_case == 1:
    model_title = 'cosine'
    xleft = -np.pi
    xright = np.pi
    ybottom = -1
    ytop = 1.2

elif model_case == 2:
    model_title = 'circle'
    xleft = -1
    xright = 1
    ybottom = -1
    ytop = .2

elif model_case == 3:
    model_title = 'square-root'
    xleft = 0
    xright = 4
    ybottom = -2
    ytop = 2

elif model_case == 4:
    model_title = 'Inverse Power Law'
    xleft = 1e-6
    xright = 4
    ybottom = 0
    ytop = 4
    
elif model_case == 5:
    model_title = 'ellipse'
    a = 0.5
    b = 2
    xleft = -b
    xright = b
    ybottom = -a
    ytop = 0.5*b**2/a
    
elif model_case == 6:
    model_title = 'secant'
    xleft = -np.pi/2
    xright = np.pi/2
    ybottom = 0.5
    ytop = 4
    
elif model_case == 7:
    model_title = 'Parabola'
    xleft = -2
    xright = 2
    ybottom = 0
    ytop = 4

elif model_case == 8:
    model_title = 'Cauchy'
    xleft = 0
    xright = 4
    ybottom = 0
    ytop = 4
    
def feval(x):

    if model_case == 1:
        y = -np.cos(x)

    elif model_case == 2:
        y = -np.sqrt(1-x**2)

    elif model_case == 3:
        y = -np.sqrt(x)
        
    elif model_case == 4:
        y = x**(-0.75)
        
    elif model_case == 5:
        y = -a*np.sqrt(1-x**2/b**2)

    elif model_case == 6:
        y = 1.0/np.cos(x)

    elif model_case == 7:
        y = 0.5*x**2  
        
    elif model_case == 8:
        y = 1/(1 + x**2)

    return y

xx = np.arange(xleft,xright,0.01)
yy = feval(xx)

lines = plt.plot(xx,yy)
plt.xlim(xleft, xright)
plt.ylim(ybottom, ytop)

delx = 0.001
N = 75

# Launch N+1 vertical rays across the aperture and reflect each one from the surface
for i in range(N+1):
    
    x = xleft + (xright-xleft)*(i-1)/N     # position of the incident vertical ray
    
    # Central-difference estimate of the surface slope at x
    val = feval(x)
    valp = feval(x+delx/2)
    valm = feval(x-delx/2)
    deriv = (valp-valm)/delx
    
    # Tangent angle of the surface and slope of the reflected ray
    phi = np.arctan(deriv)
    slope = np.tan(np.pi/2 + 2*phi)

    # Terminate the reflected ray at the top or bottom edge of the plot window
    if np.abs(deriv) < 1:
        xf = (ytop-val+slope*x)/slope
        yf = ytop
    else:
        xf = (ybottom-val+slope*x)/slope
        yf = ybottom

    plt.plot([x, x],[ytop, val],linewidth = 0.5)    # incident ray
    plt.plot([x, xf],[val, yf],linewidth = 0.5)     # reflected ray

plt.gca().set_aspect('equal', adjustable='box')
plt.show()
    

The Dia-caustics of Swimming Pools

A caustic is understood mathematically as the envelope function of multiple rays that converge in the Fourier domain (angular deflection measured at far distances).  These are points of mathematical stationarity, in which the ray density is invariant to first order in deviations in the refracting surface.  The rays themselves are the trajectories of the Eikonal Equation as rays of light thread their way through complicated optical systems.

The basic geometry is shown in Fig. 7 for a ray incident on a nonplanar surface emerging into a less-dense medium.  From Snell’s law we have the relation for light entering a dense medium like light into water

where n is the relative index (ratio), and the small-angle approximation has been made.  The incident angle θ1 is simply related to the slope of the interface dh/dx as

where the small-angle approximation is used again.  The angular deflection relative to the optic axis is then

which is equal to the transverse gradient of the optical path difference through the sample.

Fig. 7 The geometry of ray deflection by a random surface. Reprinted from Optical Interferometry, Ref. [1].

In two dimensions, the optical path difference can be replaced with a general potential

and the two orthogonal angular deflections (measured in the far field on a Fourier plane) are

These angles describe the deflection of the rays across the sample surface. They are also the right-hand side of the Eikonal Equation, the equation governing ray trajectories through optical systems.

Caustics are lines of stationarity, meaning that the density of rays is independent of first-order changes in the refracting sample.  The condition of stationarity is defined by the Jacobian of the transformation from (x,y) to (θx, θy) with

where the second expression is the Hessian determinant of the refractive power of the uneven surface. When this condition is satisfied, the envelope function bounding groups of collected rays is stationary to perturbations in the inhomogeneous sample.
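The equations in this section appeared as images in the original post. Under the notation I am assuming here (relative index n, surface height h, and a deflection potential φ), the small-angle relations being described take the form

\theta_2 \approx n\,\theta_1, \qquad \theta_1 \approx \frac{dh}{dx}, \qquad \Delta\theta \approx (n-1)\frac{dh}{dx} = \frac{d}{dx}\big[(n-1)\,h(x)\big]

and, in two dimensions,

\varphi(x,y) = (n-1)\,h(x,y), \qquad \theta_x = \frac{\partial\varphi}{\partial x}, \qquad \theta_y = \frac{\partial\varphi}{\partial y}, \qquad J = \frac{\partial(\theta_x,\theta_y)}{\partial(x,y)} = \det\begin{pmatrix} \varphi_{xx} & \varphi_{xy} \\ \varphi_{yx} & \varphi_{yy} \end{pmatrix}

with the caustic (stationarity) condition J = 0. This is the quantity that the program caustic.py below evaluates numerically from the gradients and second derivatives of the random surface.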

An example of dia-caustic formation from a random surface is shown in Fig. 8, generated by the Python program caustic.py. The Jacobian density (center) outlines regions in which the ray density is independent of small changes in the surface. These are the positions of the zeros of the Hessian determinant, the regions of zero Gaussian curvature of the surface or potential function. These high-intensity regions spread out and are intercepted at some distance by a surface, like the bottom of a swimming pool, where the concentrated rays create bright filaments. As the wavelets on the surface of the swimming pool move, the caustic filaments on the bottom of the pool dance about.

Optical caustics also occur in the gravitational lensing of distant quasars by galaxy clusters in the formation of Einstein rings and arcs seen by deep field telescopes, as described in my following blog post.

Fig. 8 Formation of diacaustics by transmission through a transparent material of random thickness (left). The Jacobian density is shown at the center. These are regions of constant ray density. A near surface displays caustics (right) as on the bottom of a swimming pool. Images were generated using the Python program caustic.py.

Python Code: caustic.py

This Python code was used to generate the caustic patterns in Fig. 8. You can change the surface roughness by changing the divisors in the last two arguments of the call to speckle2(N,N,N/16,N/16). The distance to the bottom of the swimming pool can be changed through the parameter d defined inside the ray-mapping loop.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 16 19:50:54 2021

caustic.py

@author: nolte

D. D. Nolte, Optical Interferometry for Biology and Medicine (Springer,2011)
"""

import numpy as np
from matplotlib import pyplot as plt
from numpy import random as rnd
from scipy import signal as signal

plt.close('all')

N = 256

def gauss2(sy,sx,wy,wx):
    
    x = np.arange(-sx/2,sx/2,1)
    y = np.arange(-sy/2,sy/2,1)
    y = y[..., None]
    
    ex = np.ones(shape=(sy,1))
    x2 = np.kron(ex,x**2/(2*wx**2));
    
    ey = np.ones(shape=(1,sx));
    y2 = np.kron(y**2/(2*wy**2),ey);

    rad2 = (x2+y2);

    A = np.exp(-rad2);

    return A

def speckle2(sy,sx,wy,wx):

    Btemp = 2*np.pi*rnd.rand(sy,sx);
    B = np.exp(complex(0,1)*Btemp);

    C = gauss2(sy,sx,wy,wx);

    Atemp = signal.convolve2d(B,C,'same');

    Intens = np.mean(np.mean(np.abs(Atemp)**2));

    D = np.real(Atemp/np.sqrt(Intens));

    Dphs = np.arctan2(np.imag(Atemp),np.real(Atemp))    # phase of the complex speckle field

    return D, Dphs


Sp, Sphs = speckle2(N,N,N/16,N/16)

plt.figure(2)
plt.matshow(Sp,2,cmap=plt.cm.get_cmap('seismic'))  # hsv, seismic, bwr
plt.show()

fx, fy = np.gradient(Sp)        # gradients of the surface: proportional to the ray deflection angles

fxx,fxy = np.gradient(fx)       # second derivatives of the surface
fyx,fyy = np.gradient(fy)

J = fxx*fyy - fxy*fyx           # Hessian determinant: caustics form where J approaches zero

D = np.abs(1/J)                 # "Jacobian density" displayed in the center panel of Fig. 8

plt.figure(3)
plt.matshow(D,3,cmap=plt.cm.get_cmap('gray'))  # hsv, seismic, bwr
plt.clim(0,0.5e7)
plt.show()

# Map each surface point to the observation plane a distance d below: the local
# deflections (fx, fy) displace the ray, and E accumulates the number of rays per pixel
E = np.zeros(shape=(N,N))
for yloop in range(0,N-1):
    for xloop in range(0,N-1):
        
        d = N/2        # distance to the bottom of the "swimming pool"
        
        indx = int(N/2 + (d*(fx[yloop,xloop])+(xloop-N/2)/2))
        indy = int(N/2 + (d*(fy[yloop,xloop])+(yloop-N/2)/2))
        
        if ((indx > 0) and (indx < N)) and ((indy > 0) and (indy < N)):
            E[indy,indx] = E[indy,indx] + 1

plt.figure(4)
plt.imshow(E,interpolation='bicubic',cmap=plt.cm.get_cmap('gray'))
plt.clim(0,30)
plt.xlim(N/4, 3*N/4)
plt.ylim(N/4,3*N/4)

References

[1] D. D. Nolte, “Speckle and Spatial Coherence,” Chapter 3 in Optical Interferometry for Biology and Medicine (Springer, 2012), pp. 95-121.

[2] E. Hairer and G. Wanner, Analysis by its history. (Springer, 1996)

[3] C. Huygens (1690), Treatise on light : in which are explained the causes of that which occurs in reflection, & in refraction and particularly in the strange refraction of Iceland crystal. Ed. S. P. Thompson, (University of Chicago Press, 1950).

A Short History of the Photon

The quantum of light—the photon—is a little over 100 years old.  It was born in 1905 when Einstein merged Planck’s blackbody quantum hypothesis with statistical mechanics and concluded that light itself must be quantized.  No one believed him!  Fast forward to today, and the photon is a workhorse of modern quantum technology.  Quantum encryption and communication are performed almost exclusively with photons, and many prototype quantum computers are optics based.  Quantum optics also underpins atomic, molecular and optical (AMO) physics, which is one of the hottest and most rapidly advancing frontiers of physics today.

Only after the availability of “quantum” light sources … could photon numbers be manipulated at will, launching the modern era of quantum optics.

This blog tells the story of the early days of the photon and of quantum optics.  It begins with Einstein in 1905 and ends some seventy years later, in 1977, with the demonstration of photon anti-bunching, the first fundamentally quantum optical phenomenon to be observed.  Across that stretch of time, the photon went from a nascent idea in Einstein’s fertile brain to the most thoroughly investigated quantum particle in the realm of physics.

The Photon: Albert Einstein (1905)

When Planck presented his quantum hypothesis in 1900 to the German Physical Society [1], his model of black body radiation retained all its classical properties but one—the quantized interaction of light with matter.  He did not think yet in terms of quanta, only in terms of steps in a continuous interaction.

The quantum break came from Einstein when he published his 1905 paper proposing the existence of the photon—an actual quantum of light that carried with it energy and momentum [2].  His reasoning was simple and iron-clad, resting on Planck’s own blackbody relation that Einstein combined with simple reasoning from statistical mechanics.  He was led inexorably to the existence of the photon.  Unfortunately, almost no one believed him (see my blog on Einstein and Planck). 

This was before wave-particle duality in quantum thinking, so the notion that light—so clearly a wave phenomenon—could be a particle was unthinkable.  It had taken half of the 19th century to rid physics of Newton’s corpuscles and emissionist theories of light, so to bring it back at the beginning of the 20th century seemed like a great blunder.  However, Einstein persisted.

In 1909 he published a paper on the fluctuation properties of light [3] in which he proposed that the fluctuations observed in light intensity had two contributions: one from the discreteness of the photons (what we call “shot noise” today) and one from the fluctuations in the wave properties.  Einstein was proposing that both particle and wave properties contributed to intensity fluctuations, exhibiting simultaneous particle-like and wave-like properties.  This was one of the first expressions of wave-particle duality in modern physics.

In 1916 and 1917 Einstein took another bold step and proposed the existence of stimulated emission [4].  Once again, his arguments were based on simple physics—this time the principle of detailed balance—and he was led to the audacious conclusion that one photon can stimulate the emission of another.  This would become the basis of the laser forty-five years later.

While Einstein was confident in the reality of the photon, others sincerely doubted its existence.  Robert Millikan (1868 – 1953) decided to put Einstein’s theory of photoelectron emission to the most stringent test ever performed.  In 1915 he painstakingly acquired the definitive dataset with the goal of refuting Einstein’s hypothesis, only to confirm it in spectacular fashion [5].  Partly based on Millikan’s confirmation of Einstein’s theory of the photon, Einstein was awarded the Nobel Prize in Physics in 1921.

Einstein at a blackboard.

From that point onward, the physical existence of the photon was accepted and was incorporated routinely into other physical theories.  Compton used the energy and the momentum of the photon in 1922 to predict and measure Compton scattering of x-rays off of electrons [6].  The photon was given its modern name by Gilbert Lewis in 1926 [7].

Single-Photon Interference: Geoffrey Taylor (1909)

If a light beam is made up of a group of individual light quanta, then in the limit of very dim light, there should just be one photon passing through an optical system at a time.  Therefore, to do optical experiments on single photons, one just needs to reach the ultimate dim limit.  As simple and clear as this argument sounds, it has problems that were only sorted out after the Hanbury Brown and Twiss experiments in the 1950’s and the controversy they launched (see below).  However, in 1909, this thinking seemed like a clear approach for looking for deviations in optical processes in the single-photon limit.

In 1909, Geoffrey Ingram Taylor (1886 – 1975) was an undergraduate student at Cambridge University and performed a low-intensity Young’s double-slit experiment (encouraged by J. J. Thomson).  At that time the idea of Einstein’s photon was only 4 years old, and Bohr’s theory of the hydrogen atom was still four years away.  But Thomson believed that if photons were real, then their existence could possibly show up as deviations in experiments involving single photons.  Young’s double-slit experiment is the classic demonstration of the classical wave nature of light, so performing it under conditions when (on average) only a single photon was in transit between a light source and a photographic plate seemed like the best place to look.

G. I. Taylor

The experiment was performed by finding an optimum exposure of photographic plates in a double slit experiment, then reducing the flux while increasing the exposure time, until the single-photon limit was achieved while retaining the same net exposure of the photographic plate.  Under the lowest intensity, when only a single photon was in transit at a time (on average), Taylor performed the exposure for three months.  To his disappointment, when he developed the film, there was no significant difference between high intensity and low intensity interference fringes [8].  If photons existed, then their quantized nature was not showing up in the low-intensity interference experiment.

The reason that there is no single-photon-limit deviation in the behavior of the Young double-slit experiment is because Young’s experiment only measures first-order coherence properties.  The average over many single-photon detection events is described equally well either by classical waves or by quantum mechanics.  Quantized effects in the Young experiment could only appear in fluctuations in the arrivals of photons, but in Taylor’s day there was no way to detect the arrival of single photons. 
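As an aside (my own numerical illustration, not a model of Taylor’s actual apparatus), the following short Python sketch shows why the accumulated single-photon record reproduces the classical fringes: arrival positions are drawn one at a time from the classical double-slit intensity, and the histogram looks the same whether the exposure uses a hundred photons or a hundred thousand.

import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-10, 10, 1000)                        # screen position (arbitrary units)
intensity = np.cos(np.pi*x/2)**2 * np.exp(-x**2/25)   # fringes under a smooth envelope
prob = intensity/np.sum(intensity)

rng = np.random.default_rng(0)
for nphotons in [100, 100000]:                        # "dim" and "bright" exposures
    hits = rng.choice(x, size=nphotons, p=prob)       # one photon detected at a time
    plt.hist(hits, bins=100, density=True, histtype='step', label=f'{nphotons} photons')

plt.plot(x, prob/(x[1]-x[0]), 'k--', linewidth=0.5, label='classical intensity')
plt.legend()
plt.show()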

Quantum Theory of Radiation : Paul Dirac (1927)

After Paul Dirac (1902 – 1984) was awarded his doctorate from Cambridge in 1926, he received a stipend that sent him to work with Niels Bohr (1885 – 1962) in Copenhagen. His attention focused on the electromagnetic field and how it interacted with the quantized states of atoms.  Although the electromagnetic field was the classical field of light, it was also the quantum field of Einstein’s photon, and he wondered how the quantized harmonic oscillators of the electromagnetic field could be generated by quantum wavefunctions acting as operators.  He decided that, to generate a photon, the wavefunction must operate on a state that had no photons—the ground state of the electromagnetic field known as the vacuum state.

Dirac put these thoughts into their appropriate mathematical form and began work on two manuscripts.  The first manuscript contained the theoretical details of the non-commuting electromagnetic field operators.  He called the process of generating photons out of the vacuum “second quantization”.  In second quantization, the classical field of electromagnetism is converted to an operator that generates quanta of the associated quantum field out of the vacuum (and also annihilates photons back into the vacuum).  The creation operators can be applied again and again to build up an N-photon state containing N photons that obey Bose-Einstein statistics, as required by their integer spin and in agreement with Planck’s blackbody radiation.
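In the notation that later became standard (a modern gloss, not Dirac’s own symbols), the creation and annihilation operators act on the photon-number states as

a^\dagger\,|n\rangle = \sqrt{n+1}\,|n+1\rangle, \qquad a\,|n\rangle = \sqrt{n}\,|n-1\rangle, \qquad |N\rangle = \frac{(a^\dagger)^N}{\sqrt{N!}}\,|0\rangle

so that repeated application of the creation operator to the vacuum builds up the N-photon states described above.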

Dirac then showed how an interaction of the quantized electromagnetic field with quantized energy levels involved the annihilation and creation of photons as they promoted electrons to higher atomic energy levels, or demoted them through stimulated emission.  Very significantly, Dirac’s new theory explained the spontaneous emission of light from an excited electron level as a direct physical process that creates a photon carrying away the energy as the electron falls to a lower energy level.  Spontaneous emission had been explained first by Einstein more than ten years earlier when he derived the famous A and B coefficients [4], but the physical mechanism for these processes was inferred rather than derived. Dirac, in late 1926, had produced the first direct theory of photon exchange with matter [9].

Paul Dirac in his early days.

Einstein-Podolsky-Rosen (EPR) and Bohr (1935)

The famous dialog between Einstein and Bohr at the Solvay Conferences culminated in the now famous “EPR” paradox of 1935 when Einstein published (together with B. Podolsky and N. Rosen) a paper that contained a particularly simple and cunning thought experiment. In this paper, not only was quantum mechanics under attack, but so was the concept of reality itself, as reflected in the paper’s title “Can Quantum Mechanical Description of Physical Reality Be Considered Complete?” [10].

Bohr and Einstein at Paul Ehrenfest’s house in 1925.

Einstein considered an experiment on two quantum particles that had become “entangled” (meaning they interacted) at some time in the past, and then had flown off in opposite directions. By the time their properties are measured, the two particles are widely separated. Two observers each make measurements of certain properties of the particles. For instance, the first observer could choose to measure either the position or the momentum of one particle. The other observer likewise can choose to make either measurement on the second particle. Each measurement is made with perfect accuracy. The two observers then travel back to meet and compare their measurements.   When the two experimentalists compare their data, they find perfect agreement in their values every time that they had chosen (unbeknownst to each other) to make the same measurement. This agreement occurred either when they both chose to measure position or both chose to measure momentum.

It would seem that the state of the particle prior to the second measurement was completely defined by the results of the first measurement. In other words, the state of the second particle is set into a definite state (using quantum-mechanical jargon, the state is said to “collapse”) the instant that the first measurement is made. This implies that there is instantaneous action at a distance −− violating everything that Einstein believed about reality (and violating the law that nothing can travel faster than the speed of light). He therefore had no choice but to consider this conclusion of instantaneous action to be false.  Therefore quantum mechanics could not be a complete theory of physical reality −− some deeper theory, yet undiscovered, was needed to resolve the paradox.

Bohr, on the other hand, did not hold “reality” so sacred. In his rebuttal to the EPR paper, which he published six months later under the identical title [11], he rejected Einstein’s criterion for reality. He had no problem with the two observers making the same measurements and finding identical answers. Although one measurement may affect the conditions of the second despite their great distance, no information could be transmitted by this dual measurement process, and hence there was no violation of causality. Bohr’s mind-boggling viewpoint was that reality was nonlocal, meaning that in the quantum world the measurement at one location does influence what is measured somewhere else, even at great distance. Einstein, on the other hand, could not accept a nonlocal reality.

Entangled versus separable states. When the states are separable, no measurement on photon A has any relation to measurements on photon B. However, in the entangled case, all measurements on A are related to measurements on B (and vice versa) regardless of what decision is made to make what measurement on either photon, or whether the photons are separated by great distance. The entangled wave-function is “nonlocal” in the sense that it encompasses both particles at the same time, no matter how far apart they are.
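In modern notation (again a present-day gloss rather than anything in the 1935 papers), the caption’s distinction for two polarization-encoded photons can be written as

\text{separable:}\quad |\psi\rangle_{AB} = |\psi\rangle_A \otimes |\phi\rangle_B \qquad \text{vs.} \qquad \text{entangled:}\quad |\Phi^{+}\rangle = \tfrac{1}{\sqrt{2}}\left(|H\rangle_A|H\rangle_B + |V\rangle_A|V\rangle_B\right)

where the entangled example is one of the Bell states: it cannot be factored into a state of photon A times a state of photon B.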

The Intensity Interferometer:  Hanbury Brown and Twiss (1956)

Optical physics was surprisingly dormant from the 1930’s through the 1940’s. Most of the research during this time was either on physical optics, like lenses and imaging systems, or on spectroscopy, which was more interested in the physical properties of the materials than in light itself. This hiatus from the photon was about to change dramatically, not driven by physicists, but driven by astronomers.

The development of radar technology during World War II enabled the new field of radio astronomy both with high-tech receivers and with a large cohort of scientists and engineers trained in radio technology. In the late 1940’s and early 1950’s radio astronomy was starting to work with long baselines to better resolve radio sources in the sky using interferometry. The first attempts used coherent references between two separated receivers to provide a common mixing signal to perform field-based detection. However, the stability of the reference was limiting, especially for longer baselines.

In 1950, a doctoral student in the radio astronomy department of the University of Manchester, R. Hanbury Brown, was given the task to design baselines that could work at longer distances to resolve smaller radio sources. After struggling with the technical difficulties of providing a coherent “local” oscillator for distant receivers, Hanbury Brown had a sudden epiphany one evening. Instead of trying to reference the field of one receiver to the field of another, what if, instead, one were to reference the intensity of one receiver to the intensity of the other, specifically correlating the noise on the intensity? To measure intensity requires no local oscillator or reference field. The size of an astronomical source would then show up in how well the intensity fluctuations correlated with each other as the distance between the receivers was changed. He did a back-of-the-envelope calculation that gave him hope that his idea might work, but he needed more rigorous proof if he was to ask for money to try out his idea. He tracked down Richard Twiss at a defense research lab, and the two worked out the theory of intensity correlations for long-baseline radio interferometry. Using facilities at the famous Jodrell Bank Radio Observatory at Manchester, they demonstrated the principle of their intensity interferometer and measured the angular size of Cygnus A and Cassiopeia A, two of the strongest radio sources in the Northern sky.

R. Hanbury Brown

One of the surprising side benefits of the intensity interferometer over field-based interferometry was insensitivity to environmental phase fluctuations. For radio astronomy the biggest source of phase fluctuations was the ionosphere, and the new intensity interferometer was immune to its fluctuations. Phase fluctuations had also been the limiting factor for the Michelson stellar interferometer which had limited its use to only about half a dozen stars, so Hanbury Brown and Twiss decided to revisit visible stellar interferometry using their new concept of intensity interferometry.

To illustrate the principle for visible wavelengths, Hanbury Brown and Twiss performed a laboratory experiment to correlate intensity fluctuations in two receivers illuminated by a common source through a beam splitter. The intensity correlations were detected and measured as a function of path length change, illustrating an excess correlation in noise for short path lengths that decayed as the path length increased. They published their results in Nature in 1956, and the paper immediately ignited a firestorm of protest from physicists [12].

In the 1950’s, many physicists had embraced the discrete properties of the photon and had developed a misleading mental picture of photons as individual and indivisible particles that could only go one way or another from a beam splitter, but not both. Therefore, the argument went, if the photon in an attenuated beam was detected in one detector at the output of a beam splitter, then it could not be detected at the other. This would produce an anticorrelation in coincidence counts at the two detectors. However, the Hanbury Brown Twiss (HBT) data showed a correlation from the two detectors. This launched an intense controversy in which some of those who accepted the results called for a radical new theory of the photon, while most others dismissed the HBT results as due to systematics in the light source. The heart of this controversy was quickly understood by the Nobel laureate E. M. Purcell. He correctly pointed out that photons are bosons and are indistinguishable discrete particles and hence are likely to “bunch” together, according to quantum statistics, even under low light conditions [13]. Therefore, attenuated “chaotic” light would indeed show photodetector correlations: even when the average photon number was less than one photon at a time, the photons would still bunch.

The bunching of photons in light is a second order effect that moves beyond the first-order interference effects of Young’s double slit, but even here the quantum nature of light is not required. A semiclassical theory of light emission from a spectral line with a natural bandwidth also predicts intensity correlations, and the correlations are precisely what would be observed for photon bunching. Therefore, even the second-order HBT results, when performed with natural light sources, do not distinguish between classical and quantum effects in the experimental results. But this reliance on natural light sources was about to change fundamentally with the invention of the laser.
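A quick numerical sketch makes the semiclassical point concrete (this is my own illustration with arbitrary parameters): intensity samples drawn from chaotic, thermal statistics show the HBT excess correlation g²(0) ≈ 2, while an ideal constant-intensity “coherent” beam gives g²(0) = 1, and neither calculation ever mentions a photon.

import numpy as np

rng = np.random.default_rng(1)
nsamples = 200000

# Chaotic (thermal) light: complex field with Gaussian quadratures -> exponentially distributed intensity
field = rng.normal(size=nsamples) + 1j*rng.normal(size=nsamples)
I_thermal = np.abs(field)**2

# Ideal coherent light: constant intensity (shot noise enters only at detection)
I_coherent = np.ones(nsamples)

def g2(I):
    # normalized zero-delay intensity correlation <I^2>/<I>^2
    return np.mean(I**2)/np.mean(I)**2

print('thermal  g2(0) =', g2(I_thermal))    # ~ 2 : photon bunching
print('coherent g2(0) =', g2(I_coherent))   #   1 : no excess correlation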

Invention of the Laser : Ted Maiman (1960)

One of the great scientific breakthroughs of the 20th century was the nearly simultaneous yet independent realization by several researchers around 1951 (by Charles H. Townes of Columbia University, by Joseph Weber of the University of Maryland, and by Alexander M. Prokhorov and Nikolai G. Basov at the Lebedev Institute in Moscow) that clever techniques and novel apparatus could be used to produce collections of atoms that had more electrons in excited states than in ground states. Such a situation is called a population inversion. If this situation could be attained, then according to Einstein’s 1917 theory of photon emission, a single photon would stimulate the emission of a second identical photon, and these two would in turn stimulate two more excited electrons to emit, giving a total of four photons −− and so on. Clearly this process turns a single photon into a host of photons, all with identical energy and phase.

Theodore Maiman

Charles Townes and his research group were the first to succeed in 1953 in producing a device based on ammonia molecules that could work as an intense source of coherent photons. The initial device did not amplify visible light, but amplified microwave photons that had wavelengths of about 3 centimeters. They called the process microwave amplification by stimulated emission of radiation, hence the acronym “MASER”. Despite the significant breakthrough that this invention represented, the devices were very expensive and difficult to operate. The maser did not revolutionize technology, and some even quipped that the acronym stood for “Means of Acquiring Support for Expensive Research”. The maser did, however, launch a new field of study, called quantum electronics, that was the direct descendant of Einstein’s 1917 paper. Most importantly, the existence and development of the maser became the starting point for a device that could do the same thing for light.

The race to develop an optical maser (later to be called laser, for light amplification by stimulated emission of radiation) was intense. Many groups actively pursued this holy grail of quantum electronics. Most believed that it was possible, which made its invention merely a matter of time and effort. This race was won by Theodore H. Maiman at Hughes Research Laboratory in Malibu California in 1960 [14]. He used a ruby crystal that was excited into a population inversion by an intense flash tube (like a flash bulb) that had originally been invented for flash photography. His approach was amazingly simple −− blast the ruby with a high-intensity pulse of light and see what comes out −− which explains why he was the first. Most other groups had been pursuing much more difficult routes because they believed that laser action would be difficult to achieve.

Perhaps the most important aspect of Maiman’s discovery was that it demonstrated that laser action was actually much simpler than people anticipated, and that laser action is a fairly common phenomenon. His discovery was quickly repeated by other groups, and then additional laser media were discovered such as helium-neon gas mixtures, argon gas, carbon dioxide gas, garnet lasers and others. Within several years, over a dozen different material and gas systems were made to lase, opening up wide new areas of research and development that continue unabated to this day. It also called for new theories of optical coherence to explain how coherent laser light interacted with matter.

Coherent States : Glauber (1963)

The HBT experiment had been performed with attenuated chaotic light that had residual coherence caused by the finite linewidth of the filtered light source. The theory of intensity correlations for this type of light was developed in the 1950’s by Emil Wolf and Leonard Mandel using a semiclassical theory in which the statistical properties of the light were based on electromagnetics without a direct need for quantized photons. The HBT results were fully consistent with this semiclassical theory. However, after the invention of the laser, new “coherent” light sources became available that required a fundamentally quantum depiction.

Roy Glauber was a theoretical physicist who received his PhD working with Julian Schwinger at Harvard. He spent several years as a post-doc at Princeton’s Institute for Advanced Study starting in 1949 at the time when quantum field theory was being developed by Schwinger, Feynman and Dyson. While Feynman was off in Brazil for a year learning to play the bongo drums, Glauber filled in for his lectures at Cal Tech. He returned to Harvard in 1952 as an assistant professor. He was already thinking about the quantum aspects of photons in 1956 when news of the photon correlations in the HBT experiment was published, and when the laser was invented four years later, he began developing a theory of photon correlations in laser light that he suspected would be fundamentally different than in natural chaotic light.

Roy Glauber

Because of his background in quantum field theory, and especially quantum electrodynamics, it was a fairly easy task to couch the quantum optical properties of coherent light in terms of Dirac’s creation and annihilation operators of the electromagnetic field. Related to the minimum-uncertainty wave functions derived initially by Schrödinger in the late 1920’s, Glauber developed a “coherent state” operator that was a minimum uncertainty state of the quantized electromagnetic field [15]. This coherent state represents a laser operating well above the lasing threshold, and it predicts that the HBT correlations vanish. Glauber was awarded the Nobel Prize in Physics in 2005 for his work on such “Glauber” states in quantum optics.
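In the notation that has become standard (a compact summary rather than Glauber’s derivation), the coherent state and its photon statistics are

|\alpha\rangle = e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\,|n\rangle, \qquad a\,|\alpha\rangle = \alpha\,|\alpha\rangle, \qquad P(n) = e^{-|\alpha|^2}\frac{|\alpha|^{2n}}{n!}, \qquad g^{(2)}(0) = 1

so the excess correlation of chaotic light (g^{(2)}(0) = 2) disappears for an ideal laser far above threshold, while a number state |n⟩ goes the other way, with g^{(2)}(0) = 1 - 1/n < 1, which is the antibunching of the next section.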

Single-Photon Optics: Kimble and Mandel (1977)

Beyond introducing coherent states, Glauber’s new theoretical approach, and parallel work by George Sudarshan around the same time [16], provided a new formalism for exploring quantum optical properties in which fundamentally quantum processes could be explored that could not be predicted using only semiclassical theory. For instance, one could envision producing photon states in which the photon arrivals at a detector could display the kind of anti-bunching that had originally been assumed (in error) by the critics of the HBT experiment. A truly one-photon state, also known as a Fock state or a number state, would be the extreme limit in which the quantum field possessed a single quantum that could be directed at a beam splitter and would emerge either from one side or the other with complete anti-correlation. However, generating such a state in the laboratory remained a challenge.

In 1975, Carmichael and Walls predicted that resonance fluorescence could produce quantized fields that had lower correlations than coherent states [17]. In 1977 H. J. Kimble, M. Dagenais and L. Mandel demonstrated, for the first time, photon antibunching between two photodetectors at the two ports of a beam splitter [18]. They used a beam of sodium atoms pumped by a dye laser.

This first demonstration of photon antibunching represents a major milestone in the history of quantum optics. Taylor’s first-order experiments in 1909 showed no difference between classical electromagnetic waves and a flux of photons. Similarly, the photon correlations observed in the second-order HBT experiment of 1956 using chaotic light could be explained equally well using classical or quantum approaches. Even laser light (when the laser is operated far above threshold) produces essentially classical wave effects, with only the shot noise demonstrating the discreteness of photon arrivals. Only after the availability of “quantum” light sources, beginning with the work of Kimble and Mandel, could photon numbers be manipulated at will, launching the modern era of quantum optics. Later experiments by them and others have continually improved the control of photon states.

TimeLine:

  • 1900 – Planck (1901). “Law of energy distribution in normal spectra.” Annalen Der Physik 4(3): 553-563.
  • 1905 – A. Einstein (1905). “Generation and conversion of light with regard to a heuristic point of view.” Annalen Der Physik 17(6): 132-148.
  • 1909 – A. Einstein (1909). “On the current state of radiation problems.” Physikalische Zeitschrift 10: 185-193.
  • 1909 – G.I. Taylor: Proc. Cam. Phil. Soc. Math. Phys. Sci. 15 , 114 (1909) Single photon double-slit experiment
  • 1915 – Millikan, R. A. (1916). “A direct photoelectric determination of Planck’s ‘h’.” Physical Review 7(3): 0355-0388. Photoelectric effect.
  • 1916 – Einstein, A. (1916). “Strahlungs-Emission und -Absorption nach der Quantentheorie.” Verh. Deutsch. Phys. Ges. 18: 318. Einstein predicts stimulated emission
  • 1923 – Compton, Arthur H. (May 1923). “A Quantum Theory of the Scattering of X-Rays by Light Elements”. Physical Review. 21 (5): 483–502.
  • 1926 – Lewis, G. N. (1926). “The conservation of photons.” Nature 118: 874-875. Gilbert Lewis named “photon”
  • 1927 – Dirac, P. A. M. (1927). “The quantum theory of the emission and absorption of radiation.” Proceedings of the Royal Society of London Series a-Containing Papers of a Mathematical and Physical Character 114(767): 243-265.
  • 1932 – E. P. Wigner: Phys. Rev. 40, 749 (1932)
  • 1935 – A. Einstein, B. Podolsky, N. Rosen: Phys. Rev. 47 , 777 (1935). EPR paradox.
  • 1935 – N. Bohr: Phys. Rev. 48 , 696 (1935). Bohr’s response to the EPR paradox.
  • 1956 – R. Hanbury Brown, R. Q. Twiss: Nature 177 , 27 (1956) Photon bunching
  • 1963 – R. J. Glauber: Phys. Rev. 130 , 2529 (1963) Coherent states
  • 1963 – E. C. G. Sudarshan: Phys. Rev. Lett. 10, 277 (1963) Coherent states
  • 1964 – P. L. Kelley, W.H. Kleiner: Phys. Rev. 136 , 316 (1964)
  • 1966 – F. T. Arecchi, E. Gatti, A. Sona: Phys. Rev. Lett. 20 , 27 (1966); F.T. Arecchi, Phys. Lett. 16 , 32 (1966)
  • 1966 – J. S. Bell: Physics 1 , 195 (1964); Rev. Mod. Phys. 38 , 447 (1966) Bell inequalities
  • 1967 – R. F. Pfleegor, L. Mandel: Phys. Rev. 159 , 1084 (1967) Interference at single photon level
  • 1967 – M. O. Scully, W.E. Lamb: Phys. Rev. 159 , 208 (1967).  Quantum theory of laser
  • 1967 – B. R. Mollow, R. J. Glauber: Phys. Rev. 160, 1097 (1967); 162, 1256 (1967) Parametric converter
  • 1969 – M. O. Scully, W.E. Lamb: Phys. Rev. 179 , 368 (1969).  Quantum theory of laser
  • 1969 – M. Lax, W.H. Louisell: Phys. Rev. 185 , 568 (1969).  Quantum theory of laser
  • 1975 – Carmichael, H. J. and D. F. Walls (1975). Journal of Physics B-Atomic Molecular and Optical Physics 8(6): L77-L81. Photon anti-bunching predicted in resonance fluorescence
  • 1977 – H. J. Kimble, M. Dagenais and L. Mandel (1977) Photon antibunching in resonance fluorescence. Phys. Rev. Lett. 39, 691-5:  Kimble, Dagenais and Mandel demonstrate the effect of antibunching

References

• Parts of this blog are excerpted from Mind at Light Speed, D. Nolte (Free Press, 2001) that tells the story of light’s central role in telecommunications and in the future of optical and quantum computers.

[1] Planck (1901). “Law of energy distribution in normal spectra.” Annalen Der Physik 4(3): 553-563.

[2] A. Einstein (1905). “Generation and conversion of light with regard to a heuristic point of view.” Annalen Der Physik 17(6): 132-148

[3] A. Einstein (1909). “On the current state of radiation problems.” Physikalische Zeitschrift 10: 185-193.

[4] Einstein, A. (1916). “Strahlungs-Emission und -Absorption nach der Quantentheorie.” Verh. Deutsch. Phys. Ges. 18: 318; Einstein, A. (1917). “Quantum theory of radiation.” Physikalische Zeitschrift 18: 121-128.

[5] Millikan, R. A. (1916). “A direct photoelectric determination of Planck’s ‘h’.” Physical Review 7(3): 0355-0388.

[6] Compton, A. H. (1923). “A quantum theory of the scattering of x-rays by light elements.” Physical Review 21(5): 0483-0502.

[7] Lewis, G. N. (1926). “The conservation of photons.” Nature 118: 874-875.

[8] Taylor, G. I. (1910). “Interference fringes with feeble light.” Proceedings of the Cambridge Philosophical Society 15: 114-115.

[9] Dirac, P. A. M. (1927). “The quantum theory of the emission and absorption of radiation.” Proceedings of the Royal Society of London Series a-Containing Papers of a Mathematical and Physical Character 114(767): 243-265.

[10] Einstein, A., B. Podolsky and N. Rosen (1935). “Can quantum-mechanical description of physical reality be considered complete?” Physical Review 47(10): 0777-0780.

[11] Bohr, N. (1935). “Can quantum-mechanical description of physical reality be considered complete?” Physical Review 48(8): 696-702.

[12] Brown, R. H. and R. Q. Twiss (1956). “Correlation Between Photons in 2 Coherent Beams of Light.” Nature 177(4497): 27-29; Brown, R. H. and R. Q. Twiss (1956). “Test of a new type of stellar interferometer on Sirius.” Nature 178(4541): 1046-1048.

[13] Purcell, E. M. (1956). “Question of Correlation Between Photons in Coherent Light Rays.” Nature 178(4548): 1448-1450.

[14] Maiman, T. H. (1960). “Stimulated optical radiation in ruby.” Nature 187: 493.

[15] Glauber, R. J. (1963). “Photon Correlations.” Physical Review Letters 10(3): 84.

[16] Sudarshan, E. C. G. (1963). “Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams.” Physical Review Letters 10(7): 277; Mehta, C. L. and E. C. Sudarshan (1965). “Relation between quantum and semiclassical description of optical coherence.” Physical Review 138(1B): B274.

[17] Carmichael, H. J. and D. F. Walls (1975). “Quantum treatment of spontaneous emission from a strongly driven 2-level atom.” Journal of Physics B-Atomic Molecular and Optical Physics 8(6): L77-L81.

[18] Kimble, H. J., M. Dagenais and L. Mandel (1977). “Photon antibunching in resonance fluorescence.” Physical Review Letters 39(11): 691-695.

Quantum Seeing without Looking? The Strange Physics of Quantum Sensing

Quantum sensors have amazing powers.  They can detect the presence of an obstacle without ever interacting with it.  For instance, consider a bomb that is coated with a light sensitive layer that sets off the bomb if it absorbs just a single photon.  Then put this bomb inside a quantum sensor system and shoot photons at it.  Remarkably, using the weirdness of quantum mechanics, it is possible to design the system in such a way that you can detect the presence of the bomb using photons without ever setting it off.  How can photons see the bomb without illuminating it?  The answer is a bizarre side effect of quantum physics in which the quantum wavefunction itself, rather than the wavefunction collapse at the moment of measurement, is recognized as the root of reality.

The ability for a quantum system to see an object with light, without exposing it, is uniquely a quantum phenomenon that has no classical analog.

All Paths Lead to Feynman

When Richard Feynman was working on his PhD under John Archibald Wheeler at Princeton in the early 1940’s, he came across an obscure paper written by Paul Dirac in 1933 that connected quantum physics with classical Lagrangian physics.  Dirac had recognized that the phase of a quantum wavefunction was analogous to the classical quantity called the “Action” that arises from Lagrangian physics.  Building on this concept, Feynman constructed a new interpretation of quantum physics, known as the “many histories” interpretation, that occupies the middle ground between Schrödinger’s wave mechanics and Heisenberg’s matrix mechanics.  One of the striking consequences of the many histories approach is the emergence of the principle of least action—a classical concept—into interpretations of quantum phenomena.  In this approach, Feynman considered ALL possible histories for the propagation of a quantum particle from one point to another, tabulated the quantum action in the phase factor, and then summed all of these histories.

One of the simplest consequences of the sum over histories is a quantum interpretation of Snell’s law of refraction in optics.  When summing over all possible trajectories of a photon from a point above to a point below an interface, there is a subset of paths for which the action integral varies very little from one path in the subset to another.  The consequence of this is that the phases of all these paths add constructively, producing a large amplitude to the quantum wavefunction along the centroid of these trajectories.  Conversely, for paths far away from this subset, the action integral takes on many values and the phases tend to interfere destructively, canceling the wavefunction along these other paths.  Therefore, the most likely path of the photon between the two points is the path of maximum constructive interference and hence the path of stationary action.  It is simple to show that this path is none other than the classical path determined by Snell’s Law and equivalently by Fermat’s principle of least time.  With the many histories approach, we can add the principle of least (or stationary) action to the list of explanations of Snell’s Law.  This argument holds as well for an electron (with mass and a de Broglie wavelength) as it does for a photon, so this is not just a coincidence specific to optics but is a fundamental part of quantum physics.
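To make the stationary-phase argument concrete, here is a small numerical sketch (my own construction, with an arbitrary geometry): summing the phase factors exp(ikL) over straight-line paths that cross the interface at different points, only the paths near the point of stationary optical path length survive the cancellation, and that crossing point satisfies Snell’s law.

import numpy as np

n1, n2 = 1.0, 1.33                     # e.g. air above, water below
A = np.array([0.0, 1.0])               # source above the interface (y = 0)
B = np.array([2.0, -1.0])              # detector below the interface

xc = np.linspace(-2, 4, 200001)        # candidate crossing points on the interface
L = n1*np.hypot(xc - A[0], A[1]) + n2*np.hypot(B[0] - xc, -B[1])   # optical path length

# The stationary (here minimum) path reproduces Snell's law
x_star = xc[np.argmin(L)]
th1 = np.arctan2(x_star - A[0], A[1])
th2 = np.arctan2(B[0] - x_star, -B[1])
print('n1*sin(th1) =', n1*np.sin(th1))
print('n2*sin(th2) =', n2*np.sin(th2))

# Phasor sum over all paths: dominated by paths near x_star, cancels elsewhere
k = 2*np.pi/0.01                       # short wavelength relative to the geometry
phasors = np.exp(1j*k*L)
near = np.abs(xc - x_star) < 0.2
far = np.abs(xc - (x_star + 1.5)) < 0.2
print('|sum over all paths|       =', np.abs(np.sum(phasors)))
print('|sum near stationary path| =', np.abs(np.sum(phasors[near])))
print('|sum far from it|          =', np.abs(np.sum(phasors[far])))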

A more subtle consequence of the sum over histories view of quantum phenomena is Young’s double slit experiment for electrons, shown at the top of Fig. 1.  The experiment consists of a source that emits only a single electron at a time that passes through a double-slit mask to impinge on an electron detection screen.  The wavefunction for a single electron extends continuously throughout the full spatial extent of the apparatus, passing through both slits.  When the two paths intersect at the screen, the difference in the quantum phases of the two paths causes the combined wavefunction to have regions of total constructive interference and other regions of total destructive interference.  The probability of detecting an electron is proportional to the squared amplitude of the wavefunction, producing a pattern of bright stripes separated by darkness.  At positions of destructive interference, no electrons are detected when both slits are open.  However, if an opaque plate blocks the upper slit, then the interference pattern disappears, and electrons can be detected at those previously dark locations.  Therefore, the presence of the object can be deduced by the detection of electrons at locations that should be dark.

Fig. 1  Demonstration of the sum over histories in a double-slit experiment for electrons. In the upper frame, the electron interference pattern on the phosphorescent screen produces bright and dark stripes.  No electrons hit the screen in a dark stripe.  When the upper slit is blocked (bottom frame), the interference pattern disappears, and an electron can arrive at the location that had previously been dark.
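The effect of blocking the upper slit can be seen in a few lines of numerics (an illustrative sketch in arbitrary units, with idealized point slits, not a simulation of a real apparatus): with both slits open the two amplitudes interfere and produce dark fringes; with the upper slit blocked only one amplitude survives and the dark fringes fill in.

import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-10, 10, 2000)              # position on the detection screen
d, Lscreen, wavelength = 1.0, 100.0, 0.05   # slit separation, screen distance, de Broglie wavelength
k = 2*np.pi/wavelength

r1 = np.hypot(Lscreen, x - d/2)             # path length from the lower slit
r2 = np.hypot(Lscreen, x + d/2)             # path length from the upper slit

both_open = np.abs(np.exp(1j*k*r1) + np.exp(1j*k*r2))**2
upper_blocked = np.abs(np.exp(1j*k*r1))**2  # only the lower-slit amplitude survives

plt.plot(x, both_open, label='both slits open')
plt.plot(x, upper_blocked, label='upper slit blocked')
plt.xlabel('screen position')
plt.ylabel('detection probability (arb. units)')
plt.legend()
plt.show()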

Consider now when the opaque plate is an electron-sensitive detector.  In this case, a single electron emitted by the source can be detected at the screen or at the plate.  If it is detected at the screen, it can appear at the location of a dark fringe, heralding the presence of the opaque plate.  Yet the quantum conundrum is that when the electron arrives at a dark fringe, it must be detected there as a whole; it cannot be detected at the electron-sensitive plate too.  So how does the electron sense the presence of the detector without exposing it, without setting it off?

In Feynman’s view, the electron does set off the detector as one possible history.  And that history interferes with the other possible history when the electron arrives at the screen.  While that interpretation may seem weird, mathematically it is a simple statement that the plate blocks the wavefunction from passing through the upper slit, so the wavefunction in front of the screen, resulting from all possible paths, has no interference fringes (other than possible diffraction from the lower slit).  From this point of view, the wavefunction samples all of space, including the opaque plate, and the eventual absorption of an electron one place or another has no effect on the wavefunction.  In this sense, it is the wavefunction, prior to any detection event, that samples reality.  If the single electron happens to show up at a dark fringe at the screen, the plate, through its effects on the total wavefunction, has been detected without interacting with the electron.

This phenomenon is known as an interaction-free measurement, but there are definitely some semantic issues here.  Just because the plate doesn’t absorb an electron, it doesn’t mean that the plate plays no role.  The plate certainly blocks the wavefunction from passing through the upper slit.  This might be called an “interaction”, but that phrase is better reserved for when the electron is actually absorbed, while the role of the plate in shaping the wavefunction is better described as one of the possible histories.

Quantum Seeing in the Dark

Although Feynman was thinking hard (and clearly) about these issues as he presented his famous lectures in physics at Cal Tech during 1961 to 1963, the specific possibility of interaction-free measurement dates more recently to 1993, when Avshalom C. Elitzur and Lev Vaidman at Tel Aviv University suggested a simple interferometer configuration that could detect an object half of the time without interacting with it [1].  They are the ones who first pressed this point home by thinking of a light-sensitive bomb.  There is no mistaking when a bomb goes off, so it tends to give an exaggerated demonstration of the interaction-free measurement.

The Michelson interferometer for interaction-free measurement is shown in Fig. 2.  This configuration uses a half-silvered beamsplitter to split the possible photon paths.  When photons hit the beamsplitter, they either continue traveling to the right, or are deflected upwards.  After reflecting off the mirrors, the photons again encounter the beamsplitter, where, in each case, they continue undeflected or are reflected.  The result is that two paths combine at the beamsplitter to travel to the detector, while two other paths combine to travel back along the direction of the incident beam. 

Fig. 2 A quantum-seeing in the dark (QSD) detector with a photo-sensitive bomb. A single photon is sent into the interferometer at a time. If the bomb is NOT present, destructive interference at the detector guarantees that the photon is not detected. However, if the bomb IS present, it destroys the destructive interference and the photon can arrive at the detector. That photon heralds the presence of the bomb without setting it off. (Reprinted from Mind @ Light Speed)

The paths of the light beams can be adjusted so that the beams that combine to travel to the detector experience perfect destructive interference.  In this situation, the detector never detects light, and all the light returns back along the direction of the incident beam.  Quantum mechanically, when only a single photon is present in the interferometer at a time, we would say that the quantum wavefunction of the photon interferes destructively along the path to the detector, and constructively along the path opposite to the incident beam, and the detector would detect no photons.  It is clear that the unobstructed path of both beams results in the detector making no detections.

Now place the light sensitive bomb in the upper path.  Because this path is no longer available to the photon wavefunction, the destructive interference of the wavefunction along the detector path is removed.  Now when a single photon is sent into the interferometer, three possible things can happen.  One, the photon is reflected by the beamsplitter and detonates the bomb.  Two, the photon is transmitted by the beamsplitter, reflects off the right mirror, and is transmitted again by the beamsplitter to travel back down the incident path without being detected by the detector.  Three, the photon is transmitted by the beamsplitter, reflects off the right mirror, and is reflected off the beamsplitter to be detected by the detector. 

In this third case, the photon is detected AND the bomb does NOT go off, which succeeds at quantum seeing in the dark.  The odds are much better than for Young’s experiment.  If the bomb is present, it will detonate a maximum of 50% of the time.  The other 50%, you will either detect a photon (signifying the presence of the bomb), or else you will not detect a photon (giving an ambiguous answer and requiring you to perform the experiment again).  When you perform the experiment again, you again have a 50% chance of detonating the bomb, and a 25% chance of detecting it without it detonating, but again a 25% chance of not detecting it, and so forth.  All in all, every time you send in a photon, you have one chance in four of seeing the bomb without detonating it.  These are much better odds than for the Young’s apparatus where only exact detection of the photon at a forbidden location would signify the presence of the bomb.

It is possible to increase your odds above one chance in four by decreasing the reflectivity of the beamsplitter.  In practice, this is easy to do simply by depositing less and less aluminum on the surface of the glass plate.  When the reflectivity gets very low, let us say at the level of 1%, then most of the time the photon just travels back along the direction it came and you have an ambiguous result.  On the other hand, when the photon does not return, there is an equal probability of detonation and detection.  This means that, though you may send in many photons, your odds for eventually seeing the bomb without detonating it are nearly 50%, roughly a factor of two better than the one-chance-in-four per-photon odds for the half-silvered beamsplitter.  A version of this experiment was performed by Paul Kwiat in 1995 as a postdoc at Innsbruck with Anton Zeilinger.  It was Kwiat who coined the phrase “quantum seeing in the dark” as a catchier version of “interaction-free measurement” [2].
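The bookkeeping behind these odds can be written compactly (a sketch that assumes an ideal, lossless beamsplitter of reflectivity R and transmissivity T = 1 - R in the geometry of Fig. 2). With the bomb present, each photon gives

P_{\text{detonate}} = R, \qquad P_{\text{detect}} = T\,R, \qquad P_{\text{ambiguous}} = T^{2}

and repeating the experiment whenever the result is ambiguous sums a geometric series, so the probability of eventually detecting the bomb without detonating it is

P_{\text{success}} = \frac{TR}{1 - T^{2}} = \frac{T}{1+T}

For R = 1/2 these per-photon probabilities are the 1/2, 1/4, 1/4 split quoted above, and the eventual success probability climbs toward 1/2 as the reflectivity is made small.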

A 50% chance of detecting the bomb without setting it off sounds amazing, until you realize that there is a 50% chance that it will go off and kill you.  Then those odds don’t look so good.  But optical phenomena never fail to surprise, and they never let you down.  A crucial set of elements missing from the simple Michelson experiment is polarization control, using polarizing beamsplitters and polarization rotators.  These are common elements in many optical systems, and when they are added to the Michelson quantum sensor, they can give almost a 100% chance of detecting the bomb without setting it off, by exploiting the quantum Zeno effect.

The Quantum Zeno Effect

Photons carry polarization as their prime quantum number, with two possible orientations.  These can be defined in different ways, but the two possible polarizations are orthogonal to each other.  For instance, the polarization pairs can be vertical (V) and horizontal (H), or they can be right circular and left circular.  One of the principles of quantum state evolution is that a quantum wavefunction can be maintained in a specific state, even if it has a natural tendency to drift out of that state, by repeatedly making a quantum measurement that looks for deviations from that state.  In practice, the polarization of a photon can be maintained by repeatedly passing it through a polarizing beamsplitter with the polarization direction parallel to the original polarization of the photon.  If the photon polarization direction deviates by a small angle, then a detector on the side port of the polarizing beamsplitter will fire with a probability equal to the square of the sine of the deviation.  If the deviation angle is very small, say Δθ, then the probability of measuring the deviation is proportional to (Δθ)², which is an even smaller number.  Furthermore, the probability that the photon will transmit through the polarizing beamsplitter is equal to 1 − (Δθ)², which is nearly 100%.
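
In numbers (a small sketch of my own; the 5° rotation per pass is an arbitrary illustrative choice):

```python
import numpy as np

dtheta = np.radians(5.0)   # small polarization rotation per pass (illustrative value)

p_side_port   = np.sin(dtheta) ** 2   # side port of the polarizing beamsplitter fires
p_transmitted = np.cos(dtheta) ** 2   # photon transmits with its H polarization restored

print(p_side_port,   dtheta ** 2)       # sin^2(Δθ) ≈ (Δθ)^2 for small angles
print(p_transmitted, 1 - dtheta ** 2)   # cos^2(Δθ) ≈ 1 − (Δθ)^2, i.e. nearly 100%
```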

This is what happens in Fig. 3 when the photo-sensitive bomb IS present. A single H-polarized photon is injected through a switchable mirror into the interferometer on the right. In the path of the photon is a polarization rotator that rotates the polarization by a small angle Δθ. There is nearly a 100% chance that the photon will transmit through the polarizing beamsplitter with perfect H-polarization, reflect from the mirror, and return through the polarizing beamsplitter, again with perfect H-polarization, to pass through the polarization rotator to the switchable mirror, where it reflects, gains another small increment to its polarization angle, transmits through the beamsplitter, and so on. At each pass, the photon polarization is repeatedly “measured” to be horizontal. After a number of passes N = π/(2Δθ), the photon is switched out of the interferometer and is transmitted through the external polarizing beamsplitter, where it is detected at the H-photon detector.

Now consider what happens when the bomb IS NOT present. This time, even though there is a high amplitude for the photon to transmit, there is a small amplitude, proportional to Δθ, for reflection out of the V port. This small V-amplitude, when it reflects from the mirror, recombines with the H-amplitude at the polarizing beamsplitter to produce a polarization with the same tilted orientation it started with, sending the photon back in the direction from which it came. (In this situation, the detector on the “dark” port of the internal beamsplitter never sees the photon because of destructive interference along this path.) The photon polarization is then rotated once more by the polarization rotator, and so on. Now, after a number of passes N = π/(2Δθ), the photon has acquired a V polarization and is switched out of the interferometer. At the external polarizing beamsplitter it is reflected out of the V-port, where it is detected at the V-photon detector.

Fig. 3  Quantum Zeno effect for interaction-free measurement.  If the bomb is present, the H-photon detector detects the output photon without setting it off.  The switchable mirror ejects the photon after it makes π/(2Δθ) round trips in the polarizing interferometer.

The two end results of this thought experiment are absolutely distinct, giving a clear answer to the question whether the bomb is present or not. If the bomb IS present, the H-detector fires. If the bomb IS NOT present, then the V-detector fires. Through all of this, the chance to set off the bomb is almost zero. Therefore, this quantum Zeno interaction-free measurement detects the bomb with nearly 100% efficiency with almost no chance of setting it off. This is the amazing consequence of quantum physics. The wavefunction is affected by the presence of the bomb, altering the interference effects that allow the polarization to rotate. But the likelihood of a photon being detected by the bomb is very low.
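
To put numbers on the two cases (my own sketch; the 2° rotation per pass and the idealized lossless optics are assumptions made purely for illustration):

```python
import numpy as np

dtheta = np.radians(2.0)                # polarization rotation per pass (illustrative choice)
N = int(round(np.pi / (2 * dtheta)))    # number of passes so that N·Δθ ≈ 90°

# Case 1: bomb present. Each pass "measures" the polarization back to H, and the
# photon avoids triggering the bomb on that pass with probability cos^2(Δθ).
p_H = np.cos(dtheta) ** (2 * N)
print(f"Bomb present: H-detector fires with probability {p_H:.3f}, "
      f"bomb detonates with probability {1 - p_H:.3f}")

# Case 2: bomb absent. Nothing measures the polarization, the rotations add
# coherently, and after N passes the photon has rotated by N·Δθ ≈ 90° into V.
p_V = np.sin(N * dtheta) ** 2
print(f"Bomb absent:  V-detector fires with probability {p_V:.3f}")
```

Shrinking the rotation per pass, at the cost of more round trips, pushes the bomb-present success probability as close to 100% as you like.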

On a side note: Although ultrafast switchable mirrors do exist, the experiment was much easier to perform by creating a helix in the optical path through the system, so that the photon makes only a finite number of bounces inside the cavity. See Ref. [2] for details.

In conclusion, the ability to see an object with light, without exposing it to that light, is a uniquely quantum phenomenon that has no classical analog.  No description in terms of classical E&M waves can explain this effect.


Further Reading

I first wrote about quantum seeing in the dark in my 2001 book on the future of optical physics and technology: Nolte, D. D. (2001). Mind at Light Speed: A New Kind of Intelligence. (New York, Free Press)

More on the story of Feynman and Wheeler and what they were trying to accomplish is told in Chapter 8 of Galileo Unbound on the physics and history of dynamics: Nolte, D. D. (2018). Galileo Unbound: A Path Across Life, the Universe and Everything (Oxford University Press).

Paul Kwiat introduced the world to interaction-free measurements in this illuminating 1996 Scientific American article: Kwiat, P., H. Weinfurter and A. Zeilinger (1996). “Quantum seeing in the dark – Quantum optics demonstrates the existence of interaction-free measurements: the detection of objects without light-or anything else-ever hitting them.” Scientific American 275(5): 72-78.


References

[1] Elitzur, A. C. and L. Vaidman (1993). “Quantum-mechanical interaction-free measurements.” Foundations of Physics 23(7): 987-997.

[2] Kwiat, P., H. Weinfurter, T. Herzog, A. Zeilinger and M. A. Kasevich (1995). “Interaction-free measurement.” Physical Review Letters 74(24): 4763-4766.