Posts by Nolte

E. M. Purcell Distinguished Professor of Physics and Astronomy at Purdue University

The Physics of Starflight: Proxima Centauri b or Bust!

The ability to travel to the stars has been one of mankind’s deepest desires. Ever since we learned that we are just one world in a vast universe of limitless worlds, we have yearned to visit some of those others. Yet nature has thrown up an almost insurmountable barrier to that desire–the speed of light. Only by traveling at or near the speed of light may we venture to far-off worlds, and even then, decades or centuries will pass during the voyage. The vast distances of space keep all the worlds isolated–possibly for the better.

Yet the closest worlds are not so far away that they will always remain out of reach. The very limit of the speed of light provides ways of getting there within human lifetimes. The non-intuitive effects of special relativity come to our rescue, and we may yet travel to the closest exoplanet we know of.

Proxima Centauri b

The closest habitable Earth-like exoplanet is Proxima Centauri b, orbiting the red dwarf star Proxima Centauri that is about 4.2 lightyears away from Earth. The planet has a short orbital period of only about 11 Earth days, but the dimness of the red dwarf puts the planet in what may be a habitable zone where water is in liquid form. Its official discovery date was August 24, 2016 by the European Southern Observatory in the Atacama Desert of Chile using the Doppler method. The Alpha Centauri system is a three-star system, and even before the discovery of the planet, this nearest star system to Earth was the inspiration for the Hugo-Award winning sci-fi trilogy The Three Body Problem by Chinese author Liu Cixin, originally published in 2008.

It may seem like a coincidence that the closest Earth-like planet to Earth is in the closest star system to Earth, but it says something about how common such exoplanets may be in our galaxy.

Artist’s rendition of Proxima Centauri b. From WikiCommons.

Breakthrough Starshot

There are already plans to send centimeter-sized spacecraft to Alpha Centauri. One such project that has received a lot of press is Breakthrough Starshot, a project of the Breakthrough Initiatives. Breakthrough Starshot would send around 1000 centimeter-sized camera-carrying laser-fitted spacecraft with 5-meter-diameter solar sails propelled by a large array of high-power lasers. The reason there are so many of these tine spacecraft is because of the collisions that are expected to take place with interstellar dust during the voyage. It is possible that only a few dozen of the craft will finally make it to Alpha Centauri intact.

Relative locations of the stars of the Alpha Centauri system. From ScienceNews.

As these spacecraft fly by the Alpha Centauri system, possibly within one hundred million miles of Proxima Centauri b, their tiny HR digital cameras will take pictures of the planet’s surface with enough resolution to see surface features. The on-board lasers will then transmit the pictures back to Earth. The travel time to the planet is expected to be 20 or 30 years, plus the four years for the laser information to make it back to Earth. Therefore, it would take a quarter century after launch to find out if Proxima Centauri b is habitable or not. The biggest question is whether it has an atmosphere. The red dwarf it orbits sends out catastrophic electromagnetic bursts that could strip the planet of its atmosphere thus preventing any chance for life to evolve or even to be sustained there if introduced.

There are multiple projects under consideration for travel to the Alpha Centauri systems. Even NASA has a tentative mission plan called the 2069 Mission (100 year anniversary of the Moon landing). This would entail a single spacecraft with a much larger solar sail than the small starshot units. Some of the mission plans proposed star-drive technology, such as nuclear propulsion systems, rather than light sails. Some of these designs could sustain a 1-g acceleration throughout the entire mission. It is intriguing to do the math on what such a mission could look like, in terms of travel time. Could we get an unmanned probe to Alpha Centauri in a matter of years? Let’s find out.

Special Relativity of Acceleration

The most surprising aspect of deriving the properties of relativistic acceleration using special relativity is that it works at all. We were all taught as young physicists that special relativity deals with inertial frames in constant motion. So the idea of frames that are accelerating might first seem to be outside the scope of special relativity. But one of Einstein’s key insights, as he sought to extend special relativity towards a more general theory, was that one can define a series of instantaneously inertial co-moving frames relative to an accelerating body. In other words, at any instant in time, the accelerating frame has an inertial co-moving frame. Once this is defined, one can construct invariants, just as in usual special relativity. And these invariants unlock the full mathematical structure of accelerating objects within the scope of special relativity.

For instance, the four-velocity and the four-acceleration in a co-moving frame for an object accelerating at g are given by

The object is momentarily stationary in the co-moving frame, which is why the four-velocity has only the zeroth component, and the four-acceleration has simply g for its first component.

Armed with these four-vectors, one constructs the invariants

and

This last equation is solved for the specific co-moving frame as

But the invariant is more general, allowing the expression

which yields

From these, putting them all together, one obtains the general differential equations for the change in velocity as a set of coupled equations

The solution to these equations is

where the unprimed frame is the lab frame (or Earth frame), and the primed frame is the frame of the accelerating object, for instance a starship heading towards Alpha Centauri. These equations allow one to calculate distances, times and speeds as seen in the Earth frame as well as the distances, times and speeds as seen in the starship frame. If the starship is accelerating at some acceleration g’ other than g, then the results are obtained simply by replacing g by g’ in the equations.

Relativistic Flight

It turns out that the acceleration due to gravity on our home planet provides a very convenient (but purely coincidental) correspondence

With a similarly convenient expression

These considerably simplify the math for a starship accelerating at g.

Let’s now consider a starship accelerating by g for the first half of the flight to Alpha Centauri, turning around and decelerating at g for the second half of the flight, so that the starship comes to a stop at its destination. The equations for the times to the half-way point are

This means at the midpoint that 1.83 years have elapsed on the starship, and about 3 years have elapsed on Earth. The total time to get to Alpha Centauri (and come to a stop) is then simply

It is interesting to look at the speed at the midpoint. This is obtained by

which is solved to give

This amazing result shows that the starship is traveling at 95% of the speed of light at the midpoint when accelerating at the modest value of g for about 3 years. Of course, the engineering challenges for providing such an acceleration for such a long time are currently prohibitive … but who knows? There is a lot of time ahead of us for technology to advance to such a point in the next century or so.

Figure. Time lapsed inside the spacecraft and on Earth for the probe to reach Alpha Centauri as a function of the acceleration of the craft. At 10 g’s, the time elapsed on Earth is a little less than 5 years. However, the signal sent back will take an additional 4.37 years to arrive for a total time of about 9 years.

Matlab alphacentaur.m

% alphacentaur.m
clear
format compact

g0 = 1;
L = 4.37;

for loop = 1:100
    
    g = 0.1*loop*g0;
    
    taup = (1/g)*acosh(g*L/2 + 1);
    tearth = (1/g)*sinh(g*taup);
    
    tauspacecraft(loop) = 2*taup;
    tlab(loop) = 2*tearth;
    
    acc(loop) = g;
    
end

figure(1)
loglog(acc,tauspacecraft,acc,tlab,'LineWidth',2)
legend('Space Craft','Earth Frame','FontSize',18)
xlabel('Acceleration (g)','FontSize',18)
ylabel('Time (years)','FontSize',18)
dum = set(gcf,'Color','White');
H = gca;
H.LineWidth = 2;
H.FontSize = 18;

To Centauri and Beyond

Once we get unmanned probes to Alpha Centauri, it opens the door to star systems beyond. The next closest are Barnards star at 6 Ly away, Luhman 16 at 6.5 Ly, Wise at 7.4 Ly, and Wolf 359 at 7.9 Ly. Several of these are known to have orbiting exoplanets. Ross 128 at 11 Ly and Lyuten at 12.2 Ly have known earth-like planets. There are about 40 known earth-like planets within 40 lightyears from Earth, and likely there are more we haven’t found yet. It is almost inconceivable that none of these would have some kind of life. Finding life beyond our solar system would be a monumental milestone in the history of science. Perhaps that day will come within this century.


Further Reading

R. A. Mould, Basic Relativity. Springer (1994)

D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 2nd ed.: Oxford University Press (2019)

Democracy against Authoritarians: The Physics of Public Opinion

An old joke goes that Democracy is a terrible form of government … except it’s better than all the others!

Our world today is faced with conflict between democracy and dictatorship. On the one side is the free world, where leaders are chosen by some form of representation of large numbers of citizens and sometimes even a majority. On the other side is authoritarianism where a select few are selected by a select few to govern everyone else.

[I]t has been said that democracy is the worst form of Government except all those other forms that have been tried from time to time; but there is the broad feeling in our country that the people should rule, and that public opinion expressed by all constitutional means, should shape, guide, and control the actions of Ministers who are their servants and not their masters.

Winston Churchill (1947)

An argument in favor of democracy is freedom of choice for the largest segment of the population, plus the ability to remove leaders who fail to provide for the perceived welfare of the most citizens. This makes democracy adaptive, shifting with the times. It also makes leaders accountable for their actions and crimes. An argument in favor of authoritarianism is the myth of the benevolent dictator–someone who knows what’s best for the people even if the people don’t know it themselves.

But dictators are rarely benevolent, and as they become saturated with power, they are corrupted. The criminal massacres, perpetrated by Putin, of Ukrainian civilians is one of the strongest recent arguments against authoritarianism. A single man decides, on a whim, the life and death of thousands or maybe more. The invasion of Ukraine is so egregious and unwarranted, that we wonder how the Russian people can put up with their isolated and manic leader. Yet by some measure more than 60% of the people in Russia approve of the war.

How can the free world see the invasion as the atrocity it is, while Russia’s majority sees it as a just war? The answer is a surprising result of population dynamics known as the replicator-mutator equation. The challenge for us here in the free world is to learn how to game the replicator-mutator equation to break up the monopoly of popular opinion and make Putin pay for his arrogance. This blog explains how “mass hysteria” can arise from forces within a complex environment, and how to construct a possible antidote.

Replicator-Mutator Equation

There are several simple models of population dynamics that try to explain the rise and fall of the number of individuals that belong to varying cohorts within the population. These models incorporate aspects of relative benefit of one group over another, plus the chance to change sides–defection. The dynamics under these conditions can be highly nonlinear and highly non-intuitive. One of the simplest of these models is known as the replicator-mutator model where replication follows the fitness of the cohort, and where individuals can defect to a “more fit” cohort.

The basic dynamics of the model are

where xa is the fraction of the population that is in cohort a, Wab is a transition probability, and φ is the average fitness of the full population. The transition matrix is given by

where fb is the fitness of cohort b and Qba is a stochastic matrix that allows for defection of an individual from one cohort to another. The fitness of a cohort is given by

where pbc is the pay-off matrix for the relative benefit of one cohort at the expense of another. Finally the average fitness is

The Einstein implicit summation convention is assumed in all of these equations, and the metric space in which the dynamics are embedded is “flat” so that there is no essential difference between superscripts and subscripts. There is also a conservation law that the sum over all population fractions equals unity.

In the language of population dynamics, this model has frequency-dependent fitness, with defection and pay-off, in a zero-sum game.

One of the simplest questions to answer with this model is how so many people can come to believe one thing. This is known as “opinion uniformity”.

Uniformity versus Diversity

This replicator-mutator model explains the property of opinion uniformity, as well as the opposite extreme of opinion diversity. The starting point for both is the pay-off matrix pbc which is assumed to be unity on the diagonal for b = c and to a constant factor a for b ~= c. This pay-off is symmetric so that all opinions are equally “believable”. The stochastic defection matrix is close to unity on the diagonal, and has random terms on the off-diagonal that are proportional to a constant ε. The defection matrix allows a person from one cohort to defect to the belief system of another cohort if they believe that the new cohort has more merit. Cohorts with greater merit (fitness) gain more members over time, while cohorts with lower merit have fewer members over time.

Note that the fitness increases with the number of members in the cohort. This is the bandwagon effect. A belief system is perceived to have more merit if there are more people who believe it. This clearly creates a positive feedback that would cause this cohort to grow. Even though all the cohorts have this same positive feedback, the zero-sum rule only allows one of the cohorts to grow to its maximum extent, taking members away from all the other cohorts. This is illustrated in Fig. 1. One belief system wins, taking almost the full population with it.

Fig. 1 Population fractions evolving as a function of time for a = 0.5 and a small defection rate ε = 0.02. One winner takes almost all the population. These are two views of the same data on semilog and log-log.

What allows the winner to take all is the positive feedback where the fitness of the cohort increases with the number of members, combined with the ability for that cohort to take members from other cohorts through the defection matrix.

However, all of the cohorts are trying the same thing, and the pay-off matrix is fully symmetric and equal for all cohorts, so no cohort is intrinsically “better” than another. This property opens the door to a strong alternative to opinion uniformity. In fact, as more members are allowed to defect, it creates a trend counter to winner-take-all, helping to equalize the cohorts. Suddenly, a bifurcation is passed when the winner-take-all converts discontinuously to a super-symmetric situation when all opinions are held by equal numbers of people. This is illustrated in Fig. 2 for a slightly higher defection rate ε = 0.03. The parameters are identical to those in Fig. 1, but the higher defection rate stabilizes the super-symmetric state of maximum diversity.

Fig. 2 Population fractions for higher defection rate of 0.03. In super-symmetric state, all opinions are held at the same rate with maximum diversity.

These two extreme results of the replicator-mutator equation, that switch suddenly from one to the other dependent on the defection rate, may seem to produce solutions neither of which are ideal for a healthy democracy. One the one hand, in the uniform case where the wining opinion is monolithic, everyone is a carbon-copy of everyone else, which is a form of cultural death (lack of diversity). But, on the other hand, one might argue that maximum opinion diversity is just as concerning, because no-one can agree on anything. If all opinions are equivalent, then everyone in the society believes something different and there is no common ground. But in the diversity case, at least there is no state-level control of the population. In the case of opinion uniformity, the wining opinion can be manipulated by propaganda.

The Propaganda Machine

A government can “seed” the belief networks with propaganda that favors the fitness of what they want their citizens to hear. Because of the positive feedback, any slight advantage of one opinion over others can allow that opinion to gain large numbers through the bandwagon effect. Of course, even stronger control that stifles dissent, for instance by shutting down the free press, makes it that much more likely that the state-controlled story is believed. This may be one reason for the 60% (as of the writing of this blog) support Putin’s war, despite the obvious lies that are being told. This is illustrated in Fig. 3 by boosting the payoff between two similar lies that the government wants its people to believe. These rise to take about 60% of the population. Members of the cohort are brain-washed, not by the government alone, but by all their neighbors who are parroting the same thing.

Fig. 3 Government propaganda acts as a “seed” that makes the propaganda grow faster than other beliefs, even for a defection rate of 0.03 which is above the threshold of Fig. 2.

Breaking the Monopoly of Thought

How do we fight back? Not just against the Kremlin’s propaganda, but also against QANON and Trump’s Big Lie and the pernicious fallacy of nationalism? The answer is simple: diversity of thought! The sliver bullet in the replicator-mutator model is the defection matrix. The existence of a bifurcation means that a relatively small increase in the amount of diverse opinion, and the freedom to swap opinions, can lead to a major qualitative collapse of the monolithic thought, even when supported by government propaganda, as shown in Fig. 4. More people may still believe in the state-supported propaganda than the others, but it is no longer a majority.

Fig. 4 Increasing the defection rate can help equalize free opinions against the state-supported propaganda

The above models were all very homogeneous. It is more realistic that people are connected through small-world networks. In this case, there is much more diversity, as shown in Fig. 5, although the defection rate needs to be much higher to prevent a monolithic opinion from dominating. The state-supported propaganda is buried in the resulting mix of diverse ideas. Therefore, to counteract state control, people must feel free to hop about in their choice of beliefs and have access to other beliefs.

Fig. 5 The defection matrix is multiplied by the adjacency matrix of a small-world network. There is significant diversity of thought, but a relatively high defection rate is needed. The state-supported propaganda is buried in this mix.

This is a bit paradoxical. On the one hand, the connectivity of the internet has fostered the rise of conspiracy theories and other odd-ball ideas. But sustained access to multiple sources of information is the best defense against all that crazy stuff winning out. In other words, not only do we have to put up with the lunatic fringe if we are to have full diversity of thought, but we need to encourage everyone to feel free to “shop around” for different ideas, even if some of them are crazy. Our free society shouldn’t be cancelling people who have divergent opinions, because that sets us down the path to authoritarianism. As a recent add said in the New York Times, “Cancel culture cancels culture.” Unfortunately, authoritarianism is on the rise around the world, and the US almost suffered that fate on Jan. 6, 2021. Furthermore, with Xi aligning with Putin and giving him the green light on Ukraine–cynically on the eve of the Olympic Games (of peace)–the new world order will revolve around that axis for decades to come, if the world survives that long. Diversity and freedom may be the only antidote.

Matlab Program: Repmut.m

function repmut
% https://github.itap.purdue.edu/nolte/Matlab-Programs-for-Nonlinear-Dynamics

clear
format compact

N = 63;     
p = 0.5;

mutype = 1;     % 0 = Hamming   1 = rand
pay = 1;        % 0 = Hamming   1 = 1/sqrt(N) 
ep = 0.5;      % average mutation rate: 0.1 to 0.01 typical  (0.4835)

%%%%% Set original population
x0temp = rand(1,N);    % Initial population
sx = sum(x0temp);
y0 = x0temp/sx;
Pop0 = sum(y0);


%%%%% Set Adjacency

%node = makeglobal(N);
%node = makeER(N,0.25);       % 0.5     0.25 
%node = makeSF(N,6);       % 12         6
node = makeSW(N,7,0.125);   % 15,0.5    7,0.5
[Adj,degree,Lap] = adjacency(node);

%%%%%% Set Hamming distance
for yloop = 1:N
    for xloop = 1:N
        H(yloop,xloop) = hamming(yloop-1,xloop-1);
    end
end

%%%%%%% Set Mutation matrix
if mutype == 0
    Qtemp = 1./(1+H/ep);    %Mutation matrix on Hamming
    Qsum = sum(Qtemp,2);
    mnQsum = mean(Qsum);
    
    % Normalize mutation among species
    for yloop = 1:N
        for xloop = 1:N
            Q(yloop,xloop) = Qtemp(yloop,xloop)/Qsum(xloop);
        end
    end
    
elseif mutype == 1  
    S = stochasticmatrix(N);
    Stemp = S - diag(diag(S));
    Qtemp = ep*Stemp;
    sm = sum(Qtemp,2)';
    Q = Qtemp + diag(ones(1,N) - sm);
end

figure(1)
imagesc(Q)
title('Mutation Matrix')
colormap(jet)

%%%%%%% Set payoff matrix
if pay == 1
    payoff = zeros(N,N);
    for yloop = 1:N
        payoff(yloop,yloop) = 1;
        for xloop = yloop + 1:N
            payoff(yloop,xloop) = p;
            payoff(xloop,yloop) = p;
            payoff(1,N) = 1;    % Propaganda
            payoff(N,1) = 1;
        end
    end
elseif pay == 0
    payoff = zerodiag(exp(-1*H));
end

figure(2)
imagesc(payoff)
title('Payoff Matrix')
colormap(jet)

% Run time evolution
tspan = [0 4000];
[t,x] = ode45(@quasispec,tspan,y0);

Pop0
[sz,dum] = size(t);
Popend = sum(x(sz,:))

for loop = 1:N
    fit(loop) = sum(payoff(:,loop)'.*x(sz,:));
end

phistar = sum(fit.*x(sz,:))       % final average fitness

xend = x(sz,:)
sortxend = sort(xend,'descend');
coher = sum(sortxend(1:2))

figure(3)
clf
h = colormap(lines);
for loop = 1:N
    plot(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

figure(4)
clf
for loop = 1:N
    semilogx(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

figure(5)
clf
for loop = 1:N
    semilogy(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

figure(6)
clf
for loop = 1:N
    loglog(t,x(:,loop),'Color',h(round(loop*64/N),:),'LineWidth',1.25)
    hold on
end
hold off

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function yd = quasispec(~,y)
        
        for floop = 1:N
            f(floop) = sum(payoff(:,floop).*y);
        end
        
        Connect = Adj + eye(N);
        
        % Transition matrix
        for yyloop = 1:N
            for xxloop = 1:N
                W(yyloop,xxloop) = f(yyloop)*(Connect(yyloop,xxloop)*Q(yyloop,xxloop));
            end
        end
        
        phi = sum(f'.*y);   % Average fitness of population
        
        yd = W*y - phi*y;
        
    end     % end quasispec
end

Further Reading

M. A. Nowak, Evolutionary Dynamics: Exploring the Equations of Life. Cambridge, Mass.: Harvard University Press, 2006.

Life in a Solar System with a Super-sized Jupiter

There are many known super-Jupiters that orbit their stars—they are detected through a slight Doppler wobble they induce on their stars [1].  But what would become of a rocky planet also orbiting those stars as they feel the tug of both the star and the super planet?

This is not of immediate concern for us, because our solar system has had its current configuration of planets for over 4 billion years.  But there can be wandering interstellar planets or brown dwarfs that could visit our solar system, like Oumuamua did in 2017, but much bigger and able to scramble the planetary orbits. Such hypothesized astronomical objects have been given the name “Nemesis“, and it warrants thought on what living in an altered solar system might be like.

What would happen to Earth if Jupiter were 50 times bigger? Could we survive?

The Three-Body Problem

The Sun-Earth-Jupiter configuration is a three-body problem that has a long and interesting history, playing a key role in several aspects of modern dynamics [2].  There is no general analytical solution to the three-body problem.  To find the behavior of three mutually interacting bodies requires numerical solution.  However, there are subsets of the three-body problem that do yield to partial analytical approaches.  One of these is called the restricted three-body problem [3].  It consists of two massive bodies plus a third (nearly) massless body that all move in a plane.  This restricted problem was first tackled by Euler and later by Poincaré, who discovered the existence of chaos in its solutions.

The geometry of the restricted three-body problem is shown in Fig. 1. In this problem, take mass m1 = mS to be the Sun’s mass, m2 = mJ to be Jupiter’s mass, and the third (small) mass is the Earth. 

Fig. 1  The restricted 3-body problem in the plane.  The third mass is negligible relative to the first two masses that obey 2-body dynamics.

The equation of motion for the Earth is

where

and the parameter ξ characterizes the strength of the perturbation of the Earth’s orbit around the Sun.  The parameters for the Jupiter-Sun system are

with

for the 11.86 year journey of Jupiter around the Sun.  Eq. (1) is a four-dimensional non-autonomous flow

The solutions of an Earth orbit are shown in Fig.2.  The natural Earth-Sun-Jupiter system has a mass ratio mJ/mS = 0.001 for Jupiter relative to the Sun mass.  Even in this case, Jupiter causes perturbations of the Earth’s orbit by about one percent.  If the mass of Jupiter increases, the perturbations would grow larger until around ξ= 0.06 when the perturbations become severe and the orbit grows unstable.  The Earth gains energy from the momentum of the Sun-Jupiter system and can reach escape velocity.  The simulation for a mass ratio of 0.07 shows the Earth ejected from the Solar System.

Fig.2  Orbit of Earth as a function of the size of a Jupiter-like planet.  The natural system has a Jupiter-Earth mass ratio of 0.03.  As the size of Jupiter increases, the Earth orbit becomes unstable and can acquire escape velocity to escape from the Solar System. From body3.m. (Reprinted from Ref. [4])

The chances for ejection depends on initial conditions for these simulations, but generally the danger becomes severe when Jupiter is about 50 times larger than it currently is. Otherwise the Earth remains safe from ejection. However, if the Earth is to keep its climate intact, then Jupiter should not be any larger than about 5 times its current size. At the other extreme, for a planet 70 times larger than Jupiter, the Earth may not get ejected at once, but it can take a wild ride through the solar system. A simulation for a 70x Jupiter is shown in Fig. 3. In this case, the Earth is captured for a while as a “moon” of Jupiter in a very tight orbit around the super planet as it orbits the sun before it is set free again to orbit the sun in highly elliptical orbits. Because of the premise of the restricted three-body problem, the Earth has no effect on the orbit of Jupiter.

Fig. 3 Orbit of Earth for TJ = 11.86 years and ξ = 0.069. The radius of Jupiter is RJ = 5.2. Earth is “captured” for a while by Jupiter into a very tight orbit.

Resonance

If Nemesis were to swing by and scramble the solar system, then Jupiter might move closer to the Earth. More ominously, the period of Jupiter’s orbit could come into resonance with the Earth’s period. This occurs when the ratio of orbital periods is a ratio of small integers. Resonance can amplify small perturbations, so perhaps Jupiter would become a danger to Earth. However, the forces exerted by Jupiter on the Earth changes the Earth’s orbit and hence its period, preventing strict resonance to occur, and the Earth is not ejected from the solar system even for initial rational periods or larger planet mass. This is related to the famous KAM theory of resonances by Kolmogorov, Arnold and Moser that tends to protect the Earth from the chaos of the solar system. More often than not in these scenarios, the Earth is either captured by the super Jupiter, or it is thrown into a large orbit that is still bound to the sun. Some examples are given in the following figures.

Fig. 4 Orbit of Earth for an initial 8:1 resonance of TJ = 8 years and ξ = 0.073. The Radius of Jupiter is R = 4. Jupiter perturbs the Earth’s orbit so strongly that the 8:1 resonance is quickly removed.
Fig. 5 Earth orbit for TJ = 12 years and ξ = 0.071. The Earth is thrown into a nearly circular orbit beyond the orbit of Saturn.

Fig. 6 Earth Orbit for TJ = 4 years and ξ = 0.0615. Earth is thrown into an orbit of high ellipticity out to the orbit of Neptune.

Life on a planet in a solar system with two large bodies has been envisioned in dramatic detail in the science fiction novel “Three-Body Problem” by Liu Cixin about the Trisolarians of the closest known exoplanet to Earth–Proxima Centauri b.

Matlab Code: body3.m

function body3

clear

chsi0 = 1/1000;     % Earth-moon ratio = 1/317
wj0 = 2*pi/11.86;

wj = 2*pi/8;
chsi = 73*chsi0;    % (11.86,60) (11.86,67.5) (11.86,69) (11.86,70) (4,60) (4,61.5) (8,73) (12,71) 

rj = 5.203*(wj0/wj)^0.6666

rsun = chsi*rj/(1+chsi);
rjup = (1/chsi)*rj/(1+1/chsi);

r0 = 1-rsun;
y0 = [r0 0 0 2*pi/sqrt(r0)];

tspan = [0 300];
options = odeset('RelTol',1e-5,'AbsTol',1e-6);
[t,y] = ode45(@f5,tspan,y0,options);

figure(1)
plot(t,y(:,1),t,y(:,3))

figure(2)
plot(y(:,1),y(:,3),'k')
axis equal
axis([-6 6 -6 6])

RE = sqrt(y(:,1).^2 + y(:,3).^2);
stdRE = std(RE)

%print -dtiff -r800 threebody

    function yd = f5(t,y)
        
        xj = rjup*cos(wj*t);
        yj = rjup*sin(wj*t);
        xs = -rsun*cos(wj*t);
        ys = -rsun*sin(wj*t);
        rj32 = ((y(1) - xj).^2 + (y(3) - yj).^2).^1.5;
        r32 = ((y(1) - xs).^2 + (y(3) - ys).^2).^1.5;

        yp(1) = y(2);
        yp(2) = -4*pi^2*((y(1)-xs)/r32 + chsi*(y(1)-xj)/rj32);
        yp(3) = y(4);
        yp(4) = -4*pi^2*((y(3)-ys)/r32 + chsi*(y(3)-yj)/rj32);
 
        yd = [yp(1);yp(2);yp(3);yp(4)];

    end     % end f5

end



References:

[1] D. D. Nolte, “The Fall and Rise of the Doppler Effect,” Physics Today, vol. 73, no. 3, pp. 31-35, Mar (2020)

[2] J. Barrow-Green, Poincaré and the three body problem. London Mathematical Society, 1997.

[3] M. C. Gutzwiller, “Moon-Earth-Sun: The oldest three-body problem,” Reviews of Modern Physics, vol. 70, no. 2, pp. 589-639, Apr (1998)

[4] D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 1st ed. (Oxford University Press, 2015).

The Physics of Robinson Crusoe’s Economy

“What is a coconut worth to a cast-away on a deserted island?”

In the midst of the cast-away’s misfortune and hunger and exertion and food lies an answer that looks familiar to any physicist who speaks the words

“Assume a Lagrangian …”

It is the same process that determines how a bead slides along a bent wire in gravity or a skier navigates a ski hill.  The answer: find the balance of economic forces subject to constraints. 

Here is the history and the physics behind one of the simplest economic systems that can be conceived:  Robinson Crusoe spending his time collecting coconuts!

Robinson Crusoe in Economic History

Daniel Defoe published “The Life and Strange Surprizing Adventures of Robinson Crusoe” in 1719, about a man who is shipwrecked on a deserted island and survives there for 28 years before being rescued.  It was written in the first person, as if the author had actually lived through those experiences, and it was based on a real-life adventure story.  It is one of the first examples of realistic fiction, and it helped establish the genre of the English novel.

Several writers on economic theory made mention of Robinson Crusoe as an example of a labor economy, but it was in 1871 that Robinson Crusoe became an economic archetype.  In that year both William Stanley Jevons‘s The Theory of Political Economy and Carl Menger‘s Grundsätze der Volkswirtschaftslehre (Principles of Economics) used Robinson Crusoe to illustrate key principles of the budding marginalist revolution.

Marginalism in economic theory is the demarcation between classical economics and modern economics.  The key principle of marginalism is the principle of “diminishing returns” as the value of something gets less as an individual has more of it.  This principle makes functions convex, which helps to guarantee that there are equilibrium points in the economy.  Economic equilibrium is a key concept and goal because it provides stability to economic systems.

One-Product Is a Dull Diet

The Robinson Crusoe economy is  one of the simplest economic models that captures the trade-off between labor and production on one side, and leisure and consumption on the other.  The model has a single laborer for whom there are 24*7 =168 hours in the week.  Some of these hours must be spent finding food, let’s say coconuts, while the other hours are for leisure and rest.  The production of coconuts follows a production curve

that is a function of labor L.  There are diminishing returns in the finding of coconuts for a given labor, making the production curve of coconuts convex.  The amount of rest is

and there is a reciprocal production curve q(R) related to less coconuts produced for more time spent resting. In this model it is assumed that all coconuts that are produced are consumed.  This is known as market clearing when no surplus is built up. 

The production curve presents a continuous trade-off between consumption and leisure, but at first look there is no obvious way to decide how much to work and how much to rest.  A lazy person might be willing to go a little hungry if they can have more rest, while a busy person might want to use all waking hours to find coconuts.  The production curve represents something known as a Pareto frontier.  It is a continuous trade-off between two qualities.  Another example of a Pareto frontier is car engine efficiency versus cost.  Some consumers may care more about the up-front cost of the car than the cost of gas, while other consumers may value fuel efficiency and be willing to pay higher costs to get it. 

Continuous trade offs always present a bit of a problem for planning. It is often not clear what the best trade off should be. This problem is solved by introducing another concept into this little economy–the concept of “Utility”.

The utility function was introduced by the physicist Daniel Bernoulli, one of the many bountiful Bernoullis of Basel, in 1738. The utility function is a measure of how much benefit or utility a person or an enterprise gains by holding varying amounts of goods or labor. The essential problem in economic exchange is to maximize one’s utility function subject to whatever constraints are active. The utility function for Robinson Crusoe is

This function is obviously a maximum at maximum leisure (R = 1) and lots of coconuts (q = 1), but this is not allowed, because it lies off the production curve q(R). Therefore the question becomes: where on the production curve he can maximize the trade-off between coconuts and leisure?

Fig. 1 shows the dynamical space for Robinson Crusoe’s economy. The space is two dimensional with axes for coconuts q and rest R. Isoclines of the utility function are shown as contours known as “indifference” curves, because the utility is constant along these curves and hence Robinson Crusoe is indifferent to his position on it. The indifference curves are cut by the production curve q(R). The equilibrium problem is to maximize utility subject to the production curve.

Fig. 1 The production space of the Robinson Crusoe economy. The production curve q(R) cuts across the isoclines of the utility function U(q,R). The contours represent “indifference” curves because the utility is constant along a contour.

When looking at dynamics under constraints, Lagrange multipliers are the best tool. Furthermore, we can impart dynamics into the model with temporal adjustments in q and R that respond to economic forces.

The Lagrangian Economy

The approach to the Lagrangian economy is identical to the Lagrangian approach in classical physics. The equation of constraint is

All the dynamics take place on the production curve. The initial condition starts on the curve, and the state point moves along the curve until it reaches a maximum and settles into equilibrium. The dynamics is therefore one-dimensional, the link between q and R being the production curve.

The Lagrangian in this simple economy is given by the utility function augmented by the equation of constraint, such that

where λ is the Lagrangian undetermined multiplier. The Euler-Lagrange equations for dynamics are

where the term on the right-hand-side is a drag force with the relaxation rate γ.

The first term on the left is the momentum of the system. In economic dynamics, this is usually negligible, similar to dynamics in living systems at low Reynold’s number in which all objects are moving instantaneously at their terminal velocity in response to forces. The equations of motion are therefore

The Lagrange multiplier can be solved from the first equation as

and the last equation converts q-dot to R-dot to yield the single equation

which is a one-dimensional flow

where all q’s are expressed as R’s through the equation of constraint. The speed vanishes at the fixed point—the economic equilibrium—when

This is the point of Pareto efficient allocation. Any initial condition on the production curve will relax to this point with a rate given by γ. These trajectories are shown in Fig. 2. From the point of view of Robinson Crusoe, if he is working harder than he needs, then he will slack off. But if there aren’t enough coconuts to make him happy, he will work harder.

Fig. 2 Motion occurs on the one-dimensional manifold defined by the production curve such that the utility is maximized at a unique point called the Pareto Efficient Allocation.

The production curve is like a curved wire, the amount of production q is like the bead sliding on the wire. The utility function plays the role of a potential function, and the gradients of the utility function play the role of forces. Then this simple economic model is just like ordinary classical physics of point masses responding to forces constrained to lie on certain lines or surfaces. From this viewpoint, physics and economics are literally the same.

Worked Example

To make this problem specific, consider a utility function given by

that has a maximum in the upper right corner, and a production curve given by

that has diminishing returns. Then, the condition of equilibrium can be solved using

to yield

With the (fairly obvious) answer

For More Reading

[1] D. D. Nolte, Introduction to Modern Dynamics : Chaos, Networks, Space and Time, 2nd ed. Oxford : Oxford University Press (2019).

[2] Fritz Söllner; The Use (and Abuse) of Robinson Crusoe in Neoclassical Economics. History of Political Economy; 48 (1): 35–64. (2016)

The Doppler Universe

If you are a fan of the Doppler effect, then time trials at the Indy 500 Speedway will floor you.  Even if you have experienced the fall in pitch of a passing train whistle while stopped in your car at a railroad crossing, or heard the falling whine of a jet passing overhead, I can guarantee that you have never heard anything like an Indy car passing you by at 225 miles an hour.

Indy 500 Time Trials and the Doppler Effect

The Indy 500 time trials are the best way to experience the effect, rather than on race day when there is so much crowd noise and the overlapping sounds of all the cars.  During the week before the race, the cars go out on the track, one by one, in time trials to decide the starting order in the pack on race day.  Fans are allowed to wander around the entire complex, so you can get right up to the fence at track level on the straight-away.  The cars go by only thirty feet away, so they are coming almost straight at you as they approach and straight away from you as they leave.  The whine of the car as it approaches is 43% higher than when it is standing still, and it drops to 33% lower than the standing frequency—a ratio almost approaching a factor of two.  And they go past so fast, it is almost a step function, going from a steady high note to a steady low note in less than a second.  That is the Doppler effect!

But as obvious as the acoustic Doppler effect is to us today, it was far from obvious when it was proposed in 1842 by Christian Doppler at a time when trains, the fastest mode of transport at the time, ran at 20 miles per hour or less.  In fact, Doppler’s theory generated so much controversy that the Academy of Sciences of Vienna held a trial in 1853 to decide its merit—and Doppler lost!  For the surprising story of Doppler and the fate of his discovery, see my Physics Today article

From that fraught beginning, the effect has expanded in such importance, that today it is a daily part of our lives.  From Doppler weather radar, to speed traps on the highway, to ultrasound images of babies—Doppler is everywhere.

Development of the Doppler-Fizeau Effect

When Doppler proposed the shift in color of the light from stars in 1842 [1], depending on their motion towards or away from us, he may have been inspired by his walk to work every morning, watching the ripples on the surface of the Vltava River in Prague as the water slipped by the bridge piers.  The drawings in his early papers look reminiscently like the patterns you see with compressed ripples on the upstream side of the pier and stretched out on the downstream side.  Taking this principle to the night sky, Doppler envisioned that binary stars, where one companion was blue and the other was red, was caused by their relative motion.  He could not have known at that time that typical binary star speeds were too small to cause this effect, but his principle was far more general, applying to all wave phenomena. 

Six years later in 1848 [2], the French physicist Armand Hippolyte Fizeau, soon to be famous for making the first direct measurement of the speed of light, proposed the same principle, unaware of Doppler’s publications in German.  As Fizeau was preparing his famous measurement, he originally worked with a spinning mirror (he would ultimately use a toothed wheel instead) and was thinking about what effect the moving mirror might have on the reflected light.  He considered the effect of star motion on starlight, just as Doppler had, but realized that it was more likely that the speed of the star would affect the locations of the spectral lines rather than change the color.  This is in fact the correct argument, because a Doppler shift on the black-body spectrum of a white or yellow star shifts a bit of the infrared into the visible red portion, while shifting a bit of the ultraviolet out of the visible, so that the overall color of the star remains the same, but Fraunhofer lines would shift in the process.  Because of the independent development of the phenomenon by both Doppler and Fizeau, and because Fizeau was a bit clearer in the consequences, the effect is more accurately called the Doppler-Fizeau Effect, and in France sometimes only as the Fizeau Effect.  Here in the US, we tend to forget the contributions of Fizeau, and it is all Doppler.

Fig. 1 The title page of Doppler’s 1842 paper [1] proposing the shift in color of stars caused by their motions. (“On the colored light of double stars and a few other stars in the heavens: Study of an integral part of Bradley’s general aberration theory”)
Fig. 2 Doppler used simple proportionality and relative velocities to deduce the first-order change in frequency of waves caused by motion of the source relative to the receiver, or of the receiver relative to the source.
Fig. 3 Doppler’s drawing of what would later be called the Mach cone generating a shock wave. Mach was one of Doppler’s later champions, making dramatic laboratory demonstrations of the acoustic effect, even as skepticism persisted in accepting the phenomenon.

Doppler and Exoplanet Discovery

It is fitting that many of today’s applications of the Doppler effect are in astronomy. His original idea on binary star colors was wrong, but his idea that relative motion changes frequencies was right, and it has become one of the most powerful astrometric techniques in astronomy today. One of its important recent applications was in the discovery of extrasolar planets orbiting distant stars.

When a large planet like Jupiter orbits a star, the center of mass of the two-body system remains at a constant point, but the individual centers of mass of the planet and the star both orbit the common point. This makes it look like the star has a wobble, first moving towards our viewpoint on Earth, then moving away. Because of this relative motion of the star, the light can appear blueshifted caused by the Doppler effect, then redshifted with a set periodicity. This was observed by Queloz and Mayer in 1995 for the star 51 Pegasi, which represented the first detection of an exoplanet [3]. The duo won the Nobel Prize in 2019 for the discovery.

Fig. 4 A gas giant (like Jupiter) and a star obit a common center of mass causing the star to wobble. The light of the star when viewed at Earth is periodically red- and blue-shifted by the Doppler effect. From Ref.

Doppler and Vera Rubins’ Galaxy Velocity Curves

In the late 1960’s and early 1970’s Vera Rubin at the Carnegie Institution of Washington used newly developed spectrographs to use the Doppler effect to study the speeds of ionized hydrogen gas surrounding massive stars in individual galaxies [4]. From simple Newtonian dynamics it is well understood that the speed of stars as a function of distance from the galactic center should increase with increasing distance up to the average radius of the galaxy, and then should decrease at larger distances. This trend in speed as a function of radius is called a rotation curve. As Rubin constructed the rotation curves for many galaxies, the increase of speed with increasing radius at small radii emerged as a clear trend, but the stars farther out in the galaxies were all moving far too fast. In fact, they are moving so fast that they exceeded escape velocity and should have flown off into space long ago. This disturbing pattern was repeated consistently in one rotation curve after another for many galaxies.

Fig. 5 Locations of Doppler shifts of ionized hydrogen measured by Vera Rubin on the Andromeda galaxy. From Ref.
Fig. 6 Vera Rubin’s velocity curve for the Andromeda galaxy. From Ref.
Fig. 7 Measured velocity curves relative to what is expected from the visible mass distribution of the galaxy. From Ref.

A simple fix to the problem of the rotation curves is to assume that there is significant mass present in every galaxy that is not observable either as luminous matter or as interstellar dust. In other words, there is unobserved matter, dark matter, in all galaxies that keeps all their stars gravitationally bound. Estimates of the amount of dark matter needed to fix the velocity curves is about five times as much dark matter as observable matter. In short, 80% of the mass of a galaxy is not normal. It is neither a perturbation nor an artifact, but something fundamental and large. The discovery of the rotation curve anomaly by Rubin using the Doppler effect stands as one of the strongest evidence for the existence of dark matter.

There is so much dark matter in the Universe that it must have a major effect on the overall curvature of space-time according to Einstein’s field equations. One of the best probes of the large-scale structure of the Universe is the afterglow of the Big Bang, known as the cosmic microwave background (CMB).

Doppler and the Big Bang

The Big Bang was astronomically hot, but as the Universe expanded it cooled. About 380,000 years after the Big Bang, the Universe cooled sufficiently that the electron-proton plasma that filled space at that time condensed into hydrogen. Plasma is charged and opaque to photons, while hydrogen is neutral and transparent. Therefore, when the hydrogen condensed, the thermal photons suddenly flew free and have continued unimpeded, continuing to cool. Today the thermal glow has reached about three degrees above absolute zero. Photons in thermal equilibrium with this low temperature have an average wavelength of a few millimeters corresponding to microwave frequencies, which is why the afterglow of the Big Bang got its name: the Cosmic Microwave Background (CMB).

Not surprisingly, the CMB has no preferred reference frame, because every point in space is expanding relative to every other point in space. In other words, space itself is expanding. Yet soon after the CMB was discovered by Arno Penzias and Robert Wilson (for which they were awarded the Nobel Prize in Physics in 1978), an anisotropy was discovered in the background that had a dipole symmetry caused by the Doppler effect as the Solar System moves at 368±2 km/sec relative to the rest frame of the CMB. Our direction is towards galactic longitude 263.85o and latitude 48.25o, or a bit southwest of Virgo. Interestingly, the local group of about 100 galaxies, of which the Milky Way and Andromeda are the largest members, is moving at 627±22 km/sec in the direction of galactic longitude 276o and latitude 30o. Therefore, it seems like we are a bit slack in our speed compared to the rest of the local group. This is in part because we are being pulled towards Andromeda in roughly the opposite direction, but also because of the speed of the solar system in our Galaxy.

Fig. 8 The CMB dipole anisotropy caused by the Doppler effect as the Earth moves at 368 km/sec through the rest frame of the CMB.

Aside from the dipole anisotropy, the CMB is amazingly uniform when viewed from any direction in space, but not perfectly uniform. At the level of 0.005 percent, there are variations in the temperature depending on the location on the sky. These fluctuations in background temperature are called the CMB anisotropy, and they help interpret current models of the Universe. For instance, the average angular size of the fluctuations is related to the overall curvature of the Universe. This is because, in the early Universe, not all parts of it were in communication with each other. This set an original spatial size to thermal discrepancies. As the Universe continued to expand, the size of the regional variations expanded with it, and the sizes observed today would appear larger or smaller, depending on how the universe is curved. Therefore, to measure the energy density of the Universe, and hence to find its curvature, required measurements of the CMB temperature that were accurate to better than a part in 10,000.

Equivalently, parts of the early universe had greater mass density than others, causing the gravitational infall of matter towards these regions. Then, through the Doppler effect, light emitted (or scattered) by matter moving towards these regions contributes to the anisotropy. They contribute what are known as “Doppler peaks” in the spatial frequency spectrum of the CMB anisotropy.

Fig. 9 The CMB small-scale anisotropy, part of which is contributed by Doppler shifts of matter falling into denser regions in the early universe.

The examples discussed in this blog (exoplanet discovery, galaxy rotation curves, and cosmic background) are just a small sampling of the many ways that the Doppler effect is used in Astronomy. But clearly, Doppler has played a key role in the long history of the universe.


References:

[1] C. A. DOPPLER, “Über das farbige Licht der Doppelsterne und einiger anderer Gestirne des Himmels (About the coloured light of the binary stars and some other stars of the heavens),” Proceedings of the Royal Bohemian Society of Sciences, vol. V, no. 2, pp. 465–482, (Reissued 1903) (1842)

[2] H. Fizeau, “Acoustique et optique,” presented at the Société Philomathique de Paris, Paris, 1848.

[3] M. Mayor and D. Queloz, “A JUPITER-MASS COMPANION TO A SOLAR-TYPE STAR,” Nature, vol. 378, no. 6555, pp. 355-359, Nov (1995)

[4] Rubin, Vera; Ford, Jr., W. Kent (1970). “Rotation of the Andromeda Nebula from a Spectroscopic Survey of Emission Regions”. The Astrophysical Journal. 159: 379


Further Reading

D. D. Nolte, “The Fall and Rise of the Doppler Effect,” Physics Today, vol. 73, no. 3, pp. 31-35, Mar (2020)

M. Tegmark, “Doppler peaks and all that: CMB anisotropies and what they can tell us,” in International School of Physics Enrico Fermi Course 132 on Dark Matter in the Universe, Varenna, Italy, Jul 25-Aug 04 1995, vol. 132, in Proceedings of the International School of Physics Enrico Fermi, 1996, pp. 379-416

Twenty Years at Light Speed: The Future of Photonic Quantum Computing

Now is exactly the wrong moment to be reviewing the state of photonic quantum computing — the field is moving so rapidly, at just this moment, that everything I say here now will probably be out of date in just a few years. On the other hand, now is exactly the right time to be doing this review, because so much has happened in just the past few years, that it is important to take a moment and look at where this field is today and where it will be going.

At the 20-year anniversary of the publication of my book Mind at Light Speed (Free Press, 2001), this blog is the third in a series reviewing progress in three generations of Machines of Light over the past 20 years (see my previous blogs on the future of the photonic internet and on all-optical computers). This third and final update reviews progress on the third generation of the Machines of Light: the Quantum Optical Generation. Of the three generations, this is the one that is changing the fastest.

Quantum computing is almost here … and it will be at room temperature, using light, in photonic integrated circuits!

Quantum Computing with Linear Optics

Twenty years ago in 2001, Emanuel Knill and Raymond LaFlamme at Los Alamos National Lab, with Gerald Mulburn at the University of Queensland, Australia, published a revolutionary theoretical paper (known as KLM) in Nature on quantum computing with linear optics: “A scheme for efficient quantum computation with linear optics” [1]. Up until that time, it was believed that a quantum computer — if it was going to have the property of a universal Turing machine — needed to have at least some nonlinear interactions among qubits in a quantum gate. For instance, an example of a two-qubit gate is a controlled-NOT, or CNOT, gate shown in Fig. 1 with the Truth Table and the equivalent unitary matrix. It clear that one qubit is controlling the other, telling it what to do.

The quantum CNOT gate gets interesting when the control line has a quantum superposition, then the two outputs become entangled.

Entanglement is a strange process that is unique to quantum systems and has no classical analog. It also has no simple intuitive explanation. By any normal logic, if the control line passes through the gate unaltered, then absolutely nothing interesting should be happening on the Control-Out line. But that’s not the case. The control line going in was a separate state. If some measurement were made on it, either a 1 or 0 would be seen with equal probability. But coming out of the CNOT, the signal has somehow become perfectly correlated with whatever value is on the Signal-Out line. If the Signal-Out is measured, the measurement process collapses the state of the Control-Out to a value equal to the measured signal. The outcome of the control line becomes 100% certain even though nothing was ever done to it! This entanglement generation is one reason the CNOT is often the gate of choice when constructing quantum circuits to perform interesting quantum algorithms.

However, optical implementation of a CNOT is a problem, because light beams and photons really do not like to interact with each other. This is the problem with all-optical classical computers too (see my previous blog). There are ways of getting light to interact with light, for instance inside nonlinear optical materials. And in the case of quantum optics, a single atom in an optical cavity can interact with single photons in ways that can act like a CNOT or related gates. But the efficiencies are very low and the costs to implement it are very high, making it difficult or impossible to scale such systems up into whole networks needed to make a universal quantum computer.

Therefore, when KLM published their idea for quantum computing with linear optics, it caused a shift in the way people were thinking about optical quantum computing. A universal optical quantum computer could be built using just light sources, beam splitters and photon detectors.

The way that KLM gets around the need for a direct nonlinear interaction between two photons is to use postselection. They run a set of photons — signal photons and ancilla (test) photons — through their linear optical system and they detect (i.e., theoretically…the paper is purely a theoretical proposal) the ancilla photons. If these photons are not detected where they are wanted, then that iteration of the computation is thrown out, and it is tried again and again, until the photons end up where they need to be. When the ancilla outcomes are finally what they need to be, this run is selected because the signal state are known to have undergone a known transformation. The signal photons are still unmeasured at this point and are therefore in quantum superpositions that are useful for quantum computation. Postselection uses entanglement and measurement collapse to put the signal photons into desired quantum states. Postselection provides an effective nonlinearity that is induced by the wavefunction collapse of the entangled state. Of course, the down side of this approach is that many iterations are thrown out — the computation becomes non-deterministic.

KLM could get around most of the non-determinism by using more and more ancilla photons, but this has the cost of blowing up the size and cost of the implementation, so their scheme was not imminently practical. But the important point was that it introduced the idea of linear quantum computing. (For this, Milburn and his collaborators have my vote for a future Nobel Prize.) Once that idea was out, others refined it, and improved upon it, and found clever ways to make it more efficient and more scalable. Many of these ideas relied on a technology that was co-evolving with quantum computing — photonic integrated circuits (PICs).

Quantum Photonic Integrated Circuits (QPICs)

Never underestimate the power of silicon. The amount of time and energy and resources that have now been invested in silicon device fabrication is so astronomical that almost nothing in this world can displace it as the dominant technology of the present day and the future. Therefore, when a photon can do something better than an electron, you can guess that eventually that photon will be encased in a silicon chip–on a photonic integrated circuit (PIC).

The dream of integrated optics (the optical analog of integrated electronics) has been around for decades, where waveguides take the place of conducting wires, and interferometers take the place of transistors — all miniaturized and fabricated in the thousands on silicon wafers. The advantages of PICs are obvious, but it has taken a long time to develop. When I was a post-doc at Bell Labs in the late 1980’s, everyone was talking about PICs, but they had terrible fabrication challenges and terrible attenuation losses. Fortunately, these are just technical problems, not limited by any fundamental laws of physics, so time (and an army of researchers) has chipped away at them.

One of the driving forces behind the maturation of PIC technology is photonic fiber optic communications (as discussed in a previous blog). Photons are clear winners when it comes to long-distance communications. In that sense, photonic information technology is a close cousin to silicon — photons are no less likely to be replaced by a future technology than silicon is. Therefore, it made sense to bring the photons onto the silicon chips, tapping into the full array of silicon fab resources so that there could be seamless integration between fiber optics doing the communications and the photonic chips directing the information. Admittedly, photonic chips are not yet all-optical. They still use electronics to control the optical devices on the chip, but this niche for photonics has provided a driving force for advancements in PIC fabrication.

Fig. 2 Schematic of a silicon photonic integrated circuit (PIC). The waveguides can be silica or nitride deposited on the silicon chip. From the Comsol WebSite.

One side-effect of improved PIC fabrication is low light losses. In telecommunications, this loss is not so critical because the systems use OEO regeneration. But less loss is always good, and the PICs can now safeguard almost every photon that comes on chip — exactly what is needed for a quantum PIC. In a quantum photonic circuit, every photon is valuable and informative and needs to be protected. The new PIC fabrication can do this. In addition, light switches for telecom applications are built from integrated interferometers on the chip. It turns out that interferometers at the single-photon level are unitary quantum gates that can be used to build universal photonic quantum computers. So the same technology and control that was used for telecom is just what is needed for photonic quantum computers. In addition, integrated optical cavities on the PICs, which look just like wavelength filters when used for classical optics, are perfect for producing quantum states of light known as squeezed light that turn out to be valuable for certain specialty types of quantum computing.

Therefore, as the concepts of linear optical quantum computing advanced through that last 20 years, the hardware to implement those concepts also advanced, driven by a highly lucrative market segment that provided the resources to tap into the vast miniaturization capabilities of silicon chip fabrication. Very fortuitous!

Room-Temperature Quantum Computers

There are many radically different ways to make a quantum computer. Some are built of superconducting circuits, others are made from semiconductors, or arrays of trapped ions, or nuclear spins on nuclei on atoms in molecules, and of course with photons. Up until about 5 years ago, optical quantum computers seemed like long shots. Perhaps the most advanced technology was the superconducting approach. Superconducting quantum interference devices (SQUIDS) have exquisite sensitivity that makes them robust quantum information devices. But the drawback is the cold temperatures that are needed for them to work. Many of the other approaches likewise need cold temperature–sometimes astronomically cold temperatures that are only a few thousandths of a degree above absolute zero Kelvin.

Cold temperatures and quantum computing seemed a foregone conclusion — you weren’t ever going to separate them — and for good reason. The single greatest threat to quantum information is decoherence — the draining away of the kind of quantum coherence that allows interferences and quantum algorithms to work. In this way, entanglement is a two-edged sword. On the one hand, entanglement provides one of the essential resources for the exponential speed-up of quantum algorithms. But on the other hand, if a qubit “sees” any environmental disturbance, then it becomes entangled with that environment. The entangling of quantum information with the environment causes the coherence to drain away — hence decoherence. Hot environments disturb quantum systems much more than cold environments, so there is a premium for cooling the environment of quantum computers to as low a temperature as they can. Even so, decoherence times can be microseconds to milliseconds under even the best conditions — quantum information dissipates almost as fast as you can make it.

Enter the photon! The bottom line is that photons don’t interact. They are blind to their environment. This is what makes them perfect information carriers down fiber optics. It is also what makes them such good qubits for carrying quantum information. You can prepare a photon in a quantum superposition just by sending it through a lossless polarizing crystal, and then the superposition will last for as long as you can let the photon travel (at the speed of light). Sometimes this means putting the photon into a coil of fiber many kilometers long to store it, but that is OK — a kilometer of coiled fiber in the lab is no bigger than a few tens of centimeters. So the same properties that make photons excellent at carrying information also gives them very small decoherence. And after the KLM schemes began to be developed, the non-interacting properties of photons were no longer a handicap.

In the past 5 years there has been an explosion, as well as an implosion, of quantum photonic computing advances. The implosion is the level of integration which puts more and more optical elements into smaller and smaller footprints on silicon PICs. The explosion is the number of first-of-a-kind demonstrations: the first universal optical quantum computer [2], the first programmable photonic quantum computer [3], and the first (true) quantum computational advantage [4].

All of these “firsts” operate at room temperature. (There is a slight caveat: The photon-number detectors are actually superconducting wire detectors that do need to be cooled. But these can be housed off-chip and off-rack in a separate cooled system that is coupled to the quantum computer by — no surprise — fiber optics.) These are the advantages of photonic quantum computers: hundreds of qubits integrated onto chips, room-temperature operation, long decoherence times, compatibility with telecom light sources and PICs, compatibility with silicon chip fabrication, universal gates using postselection, and more. Despite the head start of some of the other quantum computing systems, photonics looks like it will be overtaking the others within only a few years to become the dominant technology for the future of quantum computing. And part of that future is being helped along by a new kind of quantum algorithm that is perfectly suited to optics.

Fig. 3 Superconducting photon counting detector. From WebSite

A New Kind of Quantum Algorithm: Boson Sampling

In 2011, Scott Aaronson (then at at MIT) published a landmark paper titled “The Computational Complexity of Linear Optics” with his post-doc, Anton Arkhipov [5].  The authors speculated on whether there could be an application of linear optics, not requiring the costly step of post-selection, that was still useful for applications, while simultaneously demonstrating quantum computational advantage.  In other words, could one find a linear optical system working with photons that could solve problems intractable to a classical computer?  To their own amazement, they did!  The answer was something they called “boson sampling”.

To get an idea of what boson sampling is, and why it is very hard to do on a classical computer, think of the classic demonstration of the normal probability distribution found at almost every science museum you visit, illustrated in Fig. 2.  A large number of ping-pong balls are dropped one at a time through a forest of regularly-spaced posts, bouncing randomly this way and that until they are collected into bins at the bottom.  Bins near the center collect many balls, while bins farther to the side have fewer.  If there are many balls, then the stacked heights of the balls in the bins map out a Gaussian probability distribution.  The path of a single ping-pong ball represents a series of “decisions” as it hits each post and goes left or right, and the number of permutations of all the possible decisions among all the other ping-pong balls grows exponentially—a hard problem to tackle on a classical computer.

Fig. 4 Ping-pont ball normal distribution. Watch the YouTube video.

         

In the paper, Aaronson considered a quantum analog to the ping-pong problem in which the ping-pong balls are replaced by photons, and the posts are replaced by beam splitters.  As its simplest possible implementation, it could have two photon channels incident on a single beam splitter.  The well-known result in this case is the “HOM dip” [6] which is a consequence of the boson statistics of the photon.  Now scale this system up to many channels and a cascade of beam splitters, and one has an N-channel multi-photon HOM cascade.  The output of this photonic “circuit” is a sampling of the vast number of permutations allowed by bose statistics—boson sampling. 

To make the problem more interesting, Aaronson allowed the photons to be launched from any channel at the top (as opposed to dropping all the ping-pong balls at the same spot), and they allowed each beam splitter to have adjustable phases (photons and phases are the key elements of an interferometer).  By adjusting the locations of the photon channels and the phases of the beam splitters, it would be possible to “program” this boson cascade to mimic interesting quantum systems or even to solve specific problems, although they were not thinking that far ahead.  The main point of the paper was the proposal that implementing boson sampling in a photonic circuit used resources that scaled linearly in the number of photon channels, while the problems that could be solved grew exponentially—a clear quantum computational advantage [4]. 

On the other hand, it turned out that boson sampling is not universal—one cannot construct a universal quantum computer out of boson sampling.  The first proposal was a specialty algorithm whose main function was to demonstrate quantum computational advantage rather than do something specifically useful—just like Deutsch’s first algorithm.  But just like Deutsch’s algorithm, which led ultimately to Shor’s very useful prime factoring algorithm, boson sampling turned out to be the start of a new wave of quantum applications.

Shortly after the publication of Aaronson’s and Arkhipov’s paper in 2011, there was a flurry of experimental papers demonstrating boson sampling in the laboratory [7, 8].  And it was discovered that boson sampling could solve important and useful problems, such as the energy levels of quantum systems, and network similarity, as well as quantum random-walk problems. Therefore, even though boson sampling is not strictly universal, it solves a broad class of problems. It can be viewed more like a specialty chip than a universal computer, like the now-ubiquitous GPU’s are specialty chips in virtually every desktop and laptop computer today. And the room-temperature operation significantly reduces cost, so you don’t need a whole government agency to afford one. Just like CPU costs followed Moore’s Law to the point where a Raspberry Pi computer costs $40 today, the photonic chips may get onto their own Moore’s Law that will reduce costs over the next several decades until they are common (but still specialty and probably not cheap) computers in academia and industry. A first step along that path was a recently-demonstrated general programmable room-temperature photonic quantum computer.

Fig. 5 A classical Galton board on the left, and a photon-based boson sampling on the right. From the Walmsley (Oxford) WebSite.

A Programmable Photonic Quantum Computer: Xanadu’s X8 Chip

I don’t usually talk about specific companies, but the new photonic quantum computer chip from Xanadu, based in Toronto, Canada, feels to me like the start of something big. In the March 4, 2021 issue of Nature magazine, researchers at the company published the experimental results of their X8 photonic chip [3]. The chip uses boson sampling of strongly non-classical light. This was the first generally programmable photonic quantum computing chip, programmed using a quantum programming language they developed called Strawberry Fields. By simply changing the quantum code (using a simple conventional computer interface), they switched the computer output among three different quantum applications: transitions among states (spectra of molecular states), quantum docking, and similarity between graphs that represent two different molecules. These are radically different physics and math problems, yet the single chip can be programmed on the fly to solve each one.

The chip is constructed of nitride waveguides on silicon, shown in Fig. 6. The input lasers drive ring oscillators that produce squeezed states through four-wave mixing. The key to the reprogrammability of the chip is the set of phase modulators that use simple thermal changes on the waveguides. These phase modulators are changed in response to commands from the software to reconfigure the application. Although they switch slowly, once they are set to their new configuration, the computations take place “at the speed of light”. The photonic chip is at room temperature, but the outputs of the four channels are sent by fiber optic to a cooled unit containing the superconductor nanowire photon counters.

Fig. 6 The Xanadu X8 photonic quantum computing chip. From Ref.
Fig. 7 To see the chip in operation, see the YouTube video.

Admittedly, the four channels of the X8 chip are not large enough to solve the kinds of problems that would require a quantum computer, but the company has plans to scale the chip up to 100 channels. One of the challenges is to reduce the amount of photon loss in a multiplexed chip, but standard silicon fabrication approaches are expected to reduce loss in the next generation chips by an order of magnitude.

Additional companies are also in the process of entering the photonic quantum computing business, such as PsiQuantum, which recently closed a $450M funding round to produce photonic quantum chips with a million qubits. The company is led by Jeremy O’Brien from Bristol University who has been a leader in photonic quantum computing for over a decade.

Stay tuned!

Further Reading

• David D. Nolte, “Interference: A History of Interferometry and the Scientists who Tamed Light” (Oxford University Press, to be published in 2023)

• J. L. O’Brien, A. Furusawa, and J. Vuckovic, “Photonic quantum technologies,” Nature Photonics, Review vol. 3, no. 12, pp. 687-695, Dec (2009)

• T. C. Ralph and G. J. Pryde, “Optical Quantum Computation,” in Progress in Optics, Vol 54, vol. 54, E. Wolf Ed.,  (2010), pp. 209-269.

• S. Barz, “Quantum computing with photons: introduction to the circuit model, the one-way quantum computer, and the fundamental principles of photonic experiments,” (in English), Journal of Physics B-Atomic Molecular and Optical Physics, Article vol. 48, no. 8, p. 25, Apr (2015), Art no. 083001

References

[1] E. Knill, R. Laflamme, and G. J. Milburn, “A scheme for efficient quantum computation with linear optics,” Nature, vol. 409, no. 6816, pp. 46-52, Jan (2001)

[2] J. Carolan, J. L. O’Brien et al, “Universal linear optics,” Science, vol. 349, no. 6249, pp. 711-716, Aug (2015)

[3] J. M. Arrazola, et al, “Quantum circuits with many photons on a programmable nanophotonic chip,” Nature, vol. 591, no. 7848, pp. 54-+, Mar (2021)

[4] H.-S. Zhong J.-W. Pan et al, “Quantum computational advantage using photons,” Science, vol. 370, no. 6523, p. 1460, (2020)

[5] S. Aaronson and A. Arkhipov, “The Computational Complexity of Linear Optics,” in 43rd ACM Symposium on Theory of Computing, San Jose, CA, Jun 06-08 2011, NEW YORK: Assoc Computing Machinery, in Annual ACM Symposium on Theory of Computing, 2011, pp. 333-342

[6] C. K. Hong, Z. Y. Ou, and L. Mandel, “Measurement of subpicosecond time intervals between 2 photons by interference,” Physical Review Letters, vol. 59, no. 18, pp. 2044-2046, Nov (1987)

[7] J. B. Spring, I. A. Walmsley et al, “Boson Sampling on a Photonic Chip,” Science, vol. 339, no. 6121, pp. 798-801, Feb (2013)

[8] M. A. Broome, A. Fedrizzi, S. Rahimi-Keshari, J. Dove, S. Aaronson, T. C. Ralph, and A. G. White, “Photonic Boson Sampling in a Tunable Circuit,” Science, vol. 339, no. 6121, pp. 794-798, Feb (2013)

Twenty Years at Light Speed: Photonic Computing

In the epilog of my book Mind at Light Speed: A New Kind of Intelligence (Free Press, 2001), I speculated about a future computer in which sheets of light interact with others to form new meanings and logical cascades as light makes decisions in a form of all-optical intelligence.

Twenty years later, that optical computer seems vaguely quaint, not because new technology has passed it by, like looking at the naïve musings of Jules Verne from our modern vantage point, but because the optical computer seems almost as far away now as it did back in 2001.

At the the turn of the Millennium we were seeing tremendous advances in data rates on fiber optics (see my previous Blog) as well as the development of new types of nonlinear optical devices and switches that served the role of rudimentary logic switches.  At that time, it was not unreasonable to believe that the pace of progress would remain undiminished, and that by 2020 we would have all-optical computers and signal processors in which the same optical data on the communication fibers would be involved in the logic that told the data what to do and where to go—all without the wasteful and slow conversion to electronics and back again into photons—the infamous OEO conversion.

However, the rate of increase of the transmission bandwidth on fiber optic cables slowed not long after the publication of my book, and nonlinear optics today still needs high intensities to be efficient, which remains a challenge for significant (commercial) use of all-optical logic.

That said, it’s dangerous to ever say never, and research into all-optical computing and data processing is still going strong (See Fig. 1).  It’s not the dream that was wrong, it was the time-scale that was wrong, just like fiber-to-the-home.  Back in 2001, fiber-to-the-home was viewed as a pipe-dream by serious technology scouts.  It took twenty years, but now that vision is coming true in urban settings.  Back in 2001, all-optical computing seemed about 20 years away, but now it still looks 20 years out.  Maybe this time the prediction is right.  Recent advances in all-optical processing give some hope for it.  Here are some of those advances.

Fig. 1 Number of papers published by year with the phrase in the title: “All-Optical” or “Photonic or Optical and Neur*” according to Web of Science search. The term “All-optical” saturated around 2005. Papers written around optical neural networks was low to 2015 but now is experiencing a strong surge. The sociology of title choices, and how favorite buzz words shift over time, can obscure underlying causes and trends, but overall there is current strong interest in all-optical systems.

The “What” and “Why” of All-Optical Processing

One of the great dreams of photonics is the use of light beams to perform optical logic in optical processors just as electronic currents perform electronic logic in transistors and integrated circuits. 

Our information age, starting with the telegraph in the mid-1800’s, has been built upon electronics because the charge of the electron makes it a natural decision maker.  Two charges attract or repel by Coulomb’s Law, exerting forces upon each other.  Although we don’t think of currents acting in quite that way, the foundation of electronic logic remains electrical interactions. 

But with these interactions also come constraints—constraining currents to be contained within wires, waiting for charging times that slow down decisions, managing electrical resistance and dissipation that generate heat (computer processing farms in some places today need to be cooled by glacier meltwater).  Electronic computing is hardly a green technology.

Therefore, the advantages of optical logic are clear: broadcasting information without the need for expensive copper wires, little dissipation or heat, low latency (signals propagate at the speed of light).  Furthermore, information on the internet is already in the optical domain, so why not keep it in the optical domain and have optical information packets making the decisions?  All the routing and switching decisions about where optical information packets should go could be done by the optical packets themselves inside optical computers.

But there is a problem.  Photons in free space don’t interact—they pass through each other unaffected.  This is the opposite of what is needed for logic and decision making.  The challenge of optical logic is then to find a way to get photons to interact.

Think of the scene in Star Wars: The New Hope when Obiwan Kenobi and Darth Vader battle to the death in a light saber duel—beams of light crashing against each other and repelling each other with equal and opposite forces.  This is the photonic engineer’s dream!  Light controlling light.  But this cannot happen in free space. On the other hand, light beams can control other light beams inside nonlinear crystals where one light beam changes the optical properties of the crystal, hence changing how another light beam travels through it.  These are nonlinear optical crystals.

Nonlinear Optics

Virtually all optical control designs, for any kind of optical logic or switch, require one light beam to affect the properties of another, and that requires an intervening medium that has nonlinear optical properties.  The physics of nonlinear optics is actually simple: one light beam changes the electronic structure of a material which affects the propagation of another (or even the same) beam.  The key parameter is the nonlinear coefficient that determines how intense the control beam needs to be to produce a significant modulation of the other beam.  This is where the challenge is.  Most materials have very small nonlinear coefficients, and the intensity of the control beam usually must be very high. 

Fig. 2 Nonlinear optics: Light controlling light. Light does not interact in free space, but inside a nonlinear crystal, polarizability can create an effect interaction that can be surprisingly strong. Two-wave mixing (exchange of energy between laser beams) is shown in the upper pane. Optical associative holographic memory (four-wave mixing) is an example of light controlling light. The hologram is written when exposed by both “Light” and “Guang/Hikari”. When the recorded hologram is presented later only with “Guang/Hikari” it immediately translates it to “Light”, and vice versa.

Therefore, to create low-power all-optical logic gates and switches there are four main design principles: 1) increase the nonlinear susceptibility by engineering the material, 2) increase the interaction length between the two beams, 3) concentrate light into small volumes, and 4) introduce feedback to boost the internal light intensities.  Let’s take these points one at a time.

Nonlinear susceptibility: The key to getting stronger interaction of light with light is in the ease with which a control beam of light can distort the crystal so that the optical conditions change for a signal beam. This is called the nonlinear susceptibility . When working with “conventional” crystals like semiconductors (e.g. CdZnSe) or rare-Earths (e.g. LiNbO3), there is only so much engineering that is possible to try to tweak the nonlinear susceptibilities. However, artificially engineered materials can offer significant increases in nonlinear susceptibilities, these include plasmonic materials, metamaterials, organic semiconductors, photonic crystals. An increasingly important class of nonlinear optical devices are semiconductor optical amplifiers (SOA).

Interaction length: The interaction strength between two light waves is a product of the nonlinear polarization and the length over which the waves interact. Interaction lengths can be made relatively long in waveguides but can be made orders of magnitude longer in fibers. Therefore, nonlinear effects in fiber optics are a promising avenue for achieving optical logic.

Intensity Concentration:  Nonlinear polarization is the product of the nonlinear susceptibility with the field amplitude of the waves. Therefore, focusing light down to small cross sections produces high power, as in the core of a fiber optic, again showing advantages of fibers for optical logic implementations.

Feedback: Feedback, as in a standing-wave cavity, increases the intensity as well as the effective interaction length by folding the light wave continually back on itself. Both of these effects boost the nonlinear interaction, but then there is an additional benefit: interferometry. Cavities, like a Fabry-Perot, are interferometers in which a slight change in the round-trip phase can produce large changes in output light intensity. This is an optical analog to a transistor in which a small control current acts as a gate for an exponential signal current. The feedback in the cavity of a semiconductor optical amplifier (SOA), with high internal intensities and long effective interaction lengths and an active medium with strong nonlinearity make these elements attractive for optical logic gates. Similarly, integrated ring resonators have the advantage of interferometric control for light switching. Many current optical switches and logic gates are based on SOAs and integrated ring resonators.

All-Optical Regeneration

The vision of the all-optical internet, where the logic operations that direct information to different locations is all performed by optical logic without ever converting into the electrical domain, is facing a barrier that is as challenging to overcome today as it was back in 2001: all-optical regeneration. All-optical regeneration has been and remains the Achilles Heal of the all-optical internet.

Signal regeneration is currently performed through OEO conversion: Optical-to-Electronic-to-Optical. In OEO conversion, a distorted signal (distortion is caused by attenuation and dispersion and noise as signals travel down fiber optics) is received by a photodetector, is interpreted as ones and zeros that drive laser light sources that launch the optical pulses down the next stretch of fiber. The new pulses are virtually perfect, but they again degrade as they travel, until they are regenerated, and so on. The added advantage of the electrical layer is that the electronic signals can be used to drive conventional electronic logic for switching.

In all-optical regeneration, on the other hand, the optical pulses need to be reamplified, reshaped and retimed––known as 3R regeneration––all by sending the signal pulses through nonlinear amplifiers and mixers, which may include short stretches of highly nonlinear fiber (HNLF) or semiconductor optical amplifiers (SOA). There have been demonstrations of 2R all-optical regeneration (reamplifying and reshaping but not retiming) at lower data rates, but getting all 3Rs at the high data rates (40 Gb/s) in the next generation telecom systems remains elusive.

Nonetheless, there is an active academic literature that is pushing the envelope on optical logical devices and regenerators [1]. Many of the systems focus on SOA’s, HNLF’s and Interferometers. Numerical modeling of these kinds of devices is currently ahead of bench-top demonstrations, primarily because of the difficulty of fabrication and device lifetime. But the numerical models point to performance that would be competitive with OEO. If this OOO conversion (Optical-to-Optical-to-Optical) is scalable (can handle increasing bit rates and increasing numbers of channels), then the current data crunch that is facing the telecom trunk lines (see my previous Blog) may be a strong driver to implement such all-optical solutions.

It is important to keep in mind that legacy technology is not static but also continues to improve. As all-optical logic and switching and regeneration make progress, OEO conversion gets incrementally faster, creating a moving target. Therefore, we will need to wait another 20 years to see whether OEO is overtaken and replaced by all-optical.

Fig. 3 Optical-Electronic-Optical regeneration and switching compared to all-optical control. The optical control is performed using SOA’s, interferometers and nonlinear fibers.

Photonic Neural Networks

The most exciting area of optical logic today is in analog optical computing––specifically optical neural networks and photonic neuromorphic computing [2, 3]. A neural network is a highly-connected network of nodes and links in which information is distributed across the network in much the same way that information is distributed and processed in the brain. Neural networks can take several forms––from digital neural networks that are implemented with software on conventional digital computers, to analog neural networks implemented in specialized hardware, sometimes also called neuromorphic computing systems.

Optics and photonics are well suited to the analog form of neural network because of the superior ability of light to form free-space interconnects (links) among a high number of optical modes (nodes). This essential advantage of light for photonic neural networks was first demonstrated in the mid-1980’s using recurrent neural network architectures implemented in photorefractive (nonlinear optical) crystals (see Fig. 1 for a publication timeline). But this initial period of proof-of-principle was followed by a lag of about 2 decades due to a mismatch between driver applications (like high-speed logic on an all-optical internet) and the ability to configure the highly complex interconnects needed to perform the complex computations.

Fig. 4 Optical vector-matrix multiplication. An LED array is the input vector, focused by a lens onto the spatial light modulator that is the 2D matrix. The transmitted light is refocussed by the lens onto a photodiode array with is the output vector. Free-space propagation and multiplication is a key advantage to optical implementation of computing.

The rapid rise of deep machine learning over the past 5 years has removed this bottleneck, and there has subsequently been a major increase in optical implementations of neural networks. In particular, it is now possible to use conventional deep machine learning to design the interconnects of analog optical neural networks for fixed tasks such as image recognition [4]. At first look, this seems like a non-starter, because one might ask why not use the conventional trained deep network to do the recognition itself rather than using it to create a special-purpose optical recognition system. The answer lies primarily in the metrics of latency (speed) and energy cost.

In neural computing, approximately 90% of the time and energy go into matrix multiplication operations. Deep learning algorithms driving conventional digital computers need to do the multiplications at the sequential clock rate of the computer using nested loops. Optics, on the other had, is ideally suited to perform matrix multiplications in a fully parallel manner (see Fig. 4). In addition, a hardware implementation using optics operates literally at the speed of light. The latency is limited only by the time of flight through the optical system. If the optical train is 1 meter, then the time for the complete computation is only a few nanoseconds at almost no energy dissipation. Combining the natural parallelism of light with the speed has led to unprecedented computational rates. For instance, recent implementations of photonic neural networks have demonstrated over 10 Trillion operations per second (TOPS) [5].

It is important to keep in mind that although many of these photonic neural networks are characterized as all-optical, they are generally not reconfigurable, meaning that they are not adaptive to changing or evolving training sets or changing input information. Most adaptive systems use OEO conversion with electronically-addressed spatial light modulators (SLM) that are driven by digital logic. Another technology gaining recent traction is neuromorphic photonics in which neural processing is implemented on photonic integrated circuits (PICS) with OEO conversion. The integration of large numbers of light emitting sources on PICs is now routine, relieving the OEO bottleneck as electronics and photonics merge in silicon photonics.

Farther afield are all-optical systems that are adaptive through the use of optically-addressed spatial light modulators or nonlinear materials. In fact, these types of adaptive all-optical neural networks were among the first demonstrated in the late 1980’s. More recently, advanced adaptive optical materials, as well as fiber delay lines for a type of recurrent neural network known as reservoir computing, have been used to implement faster and more efficient optical nonlinearities needed for adaptive updates of neural weights. But there are still years to go before light is adaptively controlling light entirely in the optical domain at the speeds and with the flexibility needed for real-world applications like photonic packet switching in telecom fiber-optic routers.

In stark contrast to the status of classical all-optical computing, photonic quantum computing is on the cusp of revolutionizing the field of quantum information science. The recent demonstration from the Canadian company Xanadu of a programmable photonic quantum computer that operates at room temperature may be the harbinger of what is to come in the third generation Machines of Light: Quantum Optical Computers, which is the topic of my next blog.

Further Reading

[1] V. Sasikala and K. Chitra, “All optical switching and associated technologies: a review,” Journal of Optics-India, vol. 47, no. 3, pp. 307-317, Sep (2018)

[2] C. Huang et a., “Prospects and applications of photonic neural networks,” Advances in Physics-X, vol. 7, no. 1, Jan (2022), Art no. 1981155

[3] G. Wetzstein, A. Ozcan, S. Gigan, S. H. Fan, D. Englund, M. Soljacic, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature, vol. 588, no. 7836, pp. 39-47, Dec (2020)

[4] X. Lin, Y. Rivenson, N. T. Yardimei, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science, vol. 361, no. 6406, pp. 1004-+, Sep (2018)

[5] X. Y. Xu, M. X. Tan, B. Corcoran, J. Y. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature, vol. 589, no. 7840, pp. 44-+, Jan (2021)

Twenty Years at Light Speed: Fiber Optics and the Future of the Photonic Internet

Twenty years ago this November, my book Mind at Light Speed: A New Kind of Intelligence was published by The Free Press (Simon & Schuster, 2001).  The book described the state of optical science at the turn of the Millennium through three generations of Machines of Light:  The Optoelectronic Generation of electronic control meshed with photonic communication; The All-Optical Generation of optical logic; and The Quantum Optical Generation of quantum communication and computing.

To mark the occasion of the publication, this Blog Post begins a three-part series that updates the state-of-the-art of optical technology, looking at the advances in optical science and technology over the past 20 years since the publication of Mind at Light Speed.  This first blog reviews fiber optics and the photonic internet.  The second blog reviews all-optical communication and computing.  The third and final blog reviews the current state of photonic quantum communication and computing.

The Wabash Yacht Club

During late 1999 and early 2000, while I was writing Mind at Light Speed, my wife Laura and I would often have lunch at the ironically-named Wabash Yacht Club.  Not only was it not a Yacht Club, but it was a dark and dingy college-town bar located in a drab 70-‘s era plaza in West Lafayette, Indiana, far from any navigable body of water.  But it had a great garlic burger and we loved the atmosphere.

The Wabash River. No yachts. (https://www.riverlorian.com/wabash-river)

One of the TV monitors in the bar was always tuned to a station that covered stock news, and almost every day we would watch the NASDAQ rise 100 points just over lunch.  This was the time of the great dot-com stock-market bubble—one of the greatest speculative bubbles in the history of world economics.  In the second quarter of 2000, total US venture capital investments exceeded $30B as everyone chased the revolution in consumer market economics.

Fiber optics will remain the core technology of the internet for the foreseeable future.

Part of that dot-com bubble was a massive bubble in optical technology companies, because everyone knew that the dot-com era would ride on the back of fiber optics telecommunications.  Fiber optics at that time had already revolutionized transatlantic telecommunications, and there seemed to be no obstacle for it to do the same land-side with fiber optics to every home bringing every dot-com product to every house and every movie ever made.  What would make this possible was the tremendous information bandwidth that can be crammed into tiny glass fibers in the form of photon packets traveling at the speed of light.

Doing optics research at that time was a heady experience.  My research on real-time optical holography was only on the fringe of optical communications, but at the CLEO conference on lasers and electro-optics, I was invited by tiny optics companies to giant parties, like a fully-catered sunset cruise on a schooner sailing Baltimore’s inner harbor.  Venture capital scouts took me to dinner in San Francisco with an eye to scoop up whatever patents I could dream of.  And this was just the side show.  At the flagship fiber-optics conference, the Optical Fiber Conference (OFC) of the OSA, things were even crazier.  One tiny company that made a simple optical switch went almost overnight from a company worth a couple of million to being bought out by Nortel (the giant Canadian telecommunications conglomerate of the day) for over 4 billion dollars.

The Telecom Bubble and Bust

On the other side from the small mom-and-pop optics companies were the giants like Corning (who made the glass for the glass fiber optics) and Nortel.  At the height of the telecom bubble in September 2000, Nortel had a capitalization of almost $400B Canadian dollars due to massive speculation about the markets around fiber-optic networks.

One of the central questions of the optics bubble of Y2K was what the new internet market would look like.  Back then, fiber was only beginning to connect to distribution nodes that were connected off the main cross-country trunk lines.  Cable TV dominated the market with fixed programming where you had to watch whatever they transmitted whenever they transmitted it.  Google was only 2 years old, and Youtube didn’t even exist then—it was founded in 2005.  Everyone still shopped at malls, while Amazon had only gone public three years before.

There were fortune tellers who predicted that fiber-to-the-home would tap a vast market of online commerce where you could buy anything you wanted and have it delivered to your door.  They foretold of movies-on-demand, where anyone could stream any movie they wanted at any time.  They also foretold of phone calls and video chats that never went over the phone lines ruled by the telephone monopolies.  The bandwidth, the data rates, that these markets would drive were astronomical.  The only technology at that time that could support such high data rates was fiber optics.

At first, these fortune tellers drove an irrational exuberance.  But as the stocks inflated, there were doomsayers who pointed out that the costs at that time of bringing fiber into homes was prohibitive. And the idea that people would be willing to pay for movies-on-demand was laughable.  The cost of the equipment and the installation just didn’t match what then seemed to be a sparse market demand.  Furthermore, the fiber technology in the year 2000 couldn’t even get to the kind of data rates that could support these dreams.

In March of 2000 the NASDAQ hit a high of 5000, and then the bottom fell out.

By November 2001 the NASDAQ had fallen to 1500.  One of the worst cases of the telecom bust was Nortel whose capitalization plummeted from $400B at its high to $5B Canadian by August 2002.  Other optics companies fared little better.

The main questions, as we stand now looking back from 20 years in the future, are: What in real life motivated the optics bubble of 2000?  And how far has optical technology come since then?  The surprising answer is that the promise of optics in 2000 was not wrong—the time scale was just off. 

Fiber to the Home

Today, fixed last-mile broadband service is an assumed part of life in metro areas in the US.  This broadband takes on three forms: legacy coaxial cable, 4G wireless soon to be upgraded to 5G, and fiber optics.  There are arguments pro and con for each of these technologies, especially moving forward 10 or 20 years or more, and a lot is at stake.  The global market revenue was $108 Billion in 2020 and is expected to reach $200 Billion in 2027, growing at over 9% from 2021 to 2027.

(ShutterStock_75369058.jpg)

To sort through the pros and cons to pick the wining technology, several key performance parameters must be understood for each technology.  The two most important performance measures are bandwidth and latency.  Bandwidth is the data rate—how many bits per second can you get to the home.  Latency is a little more subtle.  It is the time it takes to complete a transmission.  This time includes the actual time for information to travel from a transmitter to a receiver, but that is rarely the major contributor.  Currently, almost all of the latency is caused by the logical operations needed to move the information onto and off of the home data links. 

Coax (short for coaxial cable) is attractive because so much of the last-mile legacy hardware is based on the old cable services.  But coax cable has very limited bandwidth and high latency. As a broadband technology, it is slowly disappearing.

Wireless is attractive because the information is transmitted in the open air without any need for physical wires or fibers.  But high data rates require high frequency.  For instance, 4G wireless operates at frequencies between 700 MHz to 2.6 GHz.  Current WiFi is 2.4 GHz or 5 GHz, and next-generation 5G will have 26 GHz using millimeter wave technology, and WiGig is even more extreme at 60 GHz.  While WiGig will deliver up to 10 Gbits per second, as everyone with wireless routers in their homes knows, the higher the frequency, the more it is blocked by walls or other obstacles.  Even 5 GHz is mostly attenuated by walls, and the attenuation gets worse as the frequency gets higher.  Testing of 5G networks has shown that cell towers need to be closely spaced to allow seamless coverage.  And the crazy high frequency of WiGig all but guarantees that it will only be usable for line-of-sight communication within a home or in an enterprise setting. 

Fiber for the last mile, on the other hand, has multiple advantages.  Chief among these is that fiber is passive.  It is a light pipe that has ten thousand times more usable bandwidth than a coaxial cable.  For instance, lab tests have pushed up to 100 Tbit/sec over kilometers of fiber.  To access that bandwidth, the input and output hardware can be continually upgraded, while the installed fiber is there to handle pretty much any amount of increasing data rates for the next 10 or 20 years.  Fiber installed today is supporting 1 Gbit/sec data rates, and the existing protocol will work up to 10 Gbit/sec—data rates that can only be hoped for with WiFi.  Furthermore, optical communications on fiber have latencies of around 1.5 msec over 20 kilometers compared with 4G LTE that has a latency of 8 msec over 1 mile.  The much lower latency is key to support activities that cannot stand much delay, such as voice over IP, video chat, remote controlled robots, and virtual reality (i.e., gaming).  On top of all of that, the internet technology up to the last mile is already almost all optical.  So fiber just extends the current architecture across the last mile.

Therefore, fixed fiber last-mile broadband service is a technology winner.  Though the costs can be higher than for WiFi or coax in the short run for installation, the long-run costs are lower when amortized over the lifetime of the installed fiber which can exceed 25 years.

It is becoming routine to have fiber-to-the-curb (FTTC) where a connection box converts photons in fibers into electrons on copper to take the information into the home.  But a market also exists in urban settings for fiber-to-the-home (FTTH) where the fiber goes directly into the house to a receiver and only then would the information be converted from photons to electrons and electronics.

Shortly after Mind at Light Speed was published in 2001, I was called up by a reporter for the Seattle Times who wanted to know my thoughts about FTTH.  When I extolled its virtue, he nearly hung up on me.  He was in the middle of debunking the telecom bubble and his premise was that FTTH was a fraud.  In 2001 he might have been right.  But in 2021, FTTH is here, it is expanding, and it will continue to do so for at least another quarter century.  Fiber to the home will become the legacy that some future disruptive technology will need to displace.

Fig. 1 Optical data rates on optical links, trunk lines and submarine cables over the past 30 years and projecting into the future. Redrawn from Refs. [1, 2]

Trunk-Line Fiber Optics

Despite the rosy picture for Fiber to the Home, a storm is brewing for the optical trunk lines.  The total traffic on the internet topped a billion Terrabytes in 2019 and is growing fast, doubling about every 2 years on an exponential growth curve.  In 20 years, that becomes another factor of a thousand more traffic in 2040 than today.  Therefore, the technology companies that manage and supply the internet worry about a capacity crunch that is fast approaching when there will be more demand than the internet can supply.

Over the past 20 years, the data rates on the fiber trunk lines—the major communication links that span the United States—matched demand by packing more bits in more ways into the fibers.  Up to 2009, increased data rates were achieved using dispersion-managed wavelength-division multiplexing (WDM) which means that they kept adding more lasers of slightly different colors to send the optical bits down the fiber.  For instance, in 2009 the commercial standard was 80 colors each running at 40 Gbit/sec for a total of 3.2 Tbit/sec down a single fiber. 

Since 2009, increased bandwidth has been achieved through coherent WDM, where not only the amplitude of light but also the phase of the light is used to encode bits of information using interferometry.  We are still in the coherent WDM era as improved signal processing is helping to fill the potential coherent bandwidth of a fiber.  Commercial protocols using phase-shift keying, quadrature phase-shift keying, and 16-quadrature amplitude modulation currently support 50 Gbit/sec, 100 Gbit/sec and 200 Gbit/sec, respectively.  But the capacity remaining is shrinking, and several years from now, a new era will need to begin in order to keep up with demand.  But if fibers are already using time, color, polarization and phase to carry information, what is left? 

The answer is space!

Coming soon will be commercial fiber trunk lines that use space-division multiplexing (SDM).  The simplest form is already happening now as fiber bundles are replacing single-mode fibers.  If you double the number of fibers in a cable, then you double the data rate of the cable.  But the problem with this simple approach is the scaling.  If you double just 10 times, then you need 1024 fibers in a single cable—each fiber needing its own hardware to launch the data and retrieve it at the other end.  This is linear scaling, which is bad scaling for commercial endeavors. 

Fig. 2 Fiber structures for space-division multiplexing (SDM). Fiber bundles are cables of individual single-mode fibers. Multi-element fibers (MEF) are single-mode fibers formed together inside the coating. Multi-core fibers (MCF) have multiple cores within the cladding. Few-mode fibers (FMF) are multi-mode fibers with small mode numbers. Coupled core (CC) fibers are multi-core fibers in which the cores are close enough that the light waves are coupled into coupled spatial modes. Redrawn from Ref. [3]

Therefore, alternatives for tapping into SDM are being explored in lab demonstrations now that have sublinear scaling (costs don’t rise as fast as improved capacity).  These include multi-element fibers where multiple fiber optical elements are manufactured as a group rather than individually and then combined into a cable.  There are also multi-core fibers, where multiple fibers share the same cladding.  These approaches provide multiple fibers for multiple channels without a proportional rise in cost.

More exciting are approaches that use few-mode-fibers (FMF) to support multiple spatial modes traveling simultaneously down the same fiber.  In the same vein are coupled-core fibers which is a middle ground between multi-core fibers and few-mode fibers in that individual cores can interact within the cladding to support coupled spatial modes that can encode separate spatial channels.  Finally, combinations of approaches can use multiple formats.  For instance, a recent experiment combined FMF and MCF that used 19 cores each supporting 6 spatial modes for a total of 114 spatial channels.

However, space-division multiplexing has been under development for several years now, yet it has not fully moved into commercial systems. This may be a sign that the doubling rate of bandwidth may be starting to slow down, just as Moore’s Law slowed down for electronic chips.  But there were doomsayers foretelling the end of Moore’s Law for decades before it actually slowed down, because new ideas cannot be predicted. But even if the full capacity of fiber is being approached, there is certainly nothing that will replace fiber with any better bandwidth.  So fiber optics will remain the core technology of the internet for the foreseeable future. 

But what of the other generations of Machines of Light: the all-optical and the quantum-optical generations?  How have optics and photonics fared in those fields?  Stay tuned for my next blogs to find out.

Bibliography

[1] P. J. Winzer, D. T. Neilson, and A. R. Chraplyvy, “Fiber-optic transmission and networking: the previous 20 and the next 20 years,” Optics Express, vol. 26, no. 18, pp. 24190-24239, Sep (2018) [Link]

[2] W. Shi, Y. Tian, and A. Gervais, “Scaling capacity of fiber-optic transmission systems via silicon photonics,” Nanophotonics, vol. 9, no. 16, pp. 4629-4663, Nov (2020)

[3] E. Agrell, M. Karlsson, A. R. Chraplyvy, D. J. Richardson, P. M. Krummrich, P. Winzer, K. Roberts, J. K. Fischer, S. J. Savory, B. J. Eggleton, M. Secondini, F. R. Kschischang, A. Lord, J. Prat, I. Tomkos, J. E. Bowers, S. Srinivasan, M. Brandt-Pearce, and N. Gisin, “Roadmap of optical communications,” Journal of Optics, vol. 18, no. 6, p. 063002, 2016/05/04 (2016) [Link]

Of Solar Flares, Cosmic Ray Physics and American Vikings

Exactly a thousand years ago this year an American Viking living in the Norse outpost on Straumfjord, on the northern tip of Newfoundland, took a metal axe and cut a tree.  The trimmed parts of the tree were cast away and, almost a thousand years later, were found by archeologists and stored for later study. What that study found was an exact date of the felling of the tree, in AD 1021.

How can that date be known to such precision?  The answer comes from a confluence of modern science: solar flares, cosmic ray physics, archeology, recent advances in dendrochronology, and the historiography of Icelandic sagas. The new findings were reported in the Oct. 20, 2021 issue of Nature.

American Vikings

Snorri Thorfinnsson was the first American Viking born in the Western Hemisphere.  He was born in Newfoundland sometime around AD 1007, the son of Thorfinn Karlsefni and his wife Gudrid Thorbjarnardottir, who were exploring the wooded coasts of Labrador and Newfoundland for timber to bring back to the Norse settlements in Greenland which had no wood for building.  Thorfinn and Gudrid traveled in a small fleet of Norse trading vessels known as knarrs.   

Knarrs were not the sleek long boats of Viking raiders, but were instead the semi-trailer trucks of the Viking expansion between AD 793 and 1066.  A knarr was an open planked boat about 50 feet long and 15 feet wide with a single mast and square-rigged sail.  It had a keel and could be rigged with a sprit to run close-hauled to the wind.  Its cargo was typically walrus ivory, wood, wool, wheat and furs with enough mid-ship room for a few livestock.

By using the technique of latitude sailing, that is by sailing to a known latitude and then keeping the North Star at a fixed angle above the horizon, knarrs could traverse the North Atlantic in a matter of weeks, sailing back and forth between Norway and Iceland and Greenland.  The trip from Greenland’s eastern settlement to Norway was 3000 km and took about 4 weeks (compare that to the two months it took the Mayflower to cross the Atlantic 600 years later).  Storms and bad weather put a crimp in this type of latitude sailing when the North Star could be obscured for days or weeks, and the sailors could end up somewhere they didn’t expect.  This is what happened to the merchant Bjarni Herjólfsson circa 985 when his ships were blown west in a terrible storm and he came upon a land of white beaches and green forests stretching to the horizon.  To get home, he sailed north along the new-discovered coast to the known latitude of Greenland and then headed east until he hit land. 

Map of the Norse voyages. Yellow: 3000 km between Greenland and Norway (about 4 weeks by knarr) was a “routine” voyage. Red: 3000 km between Greenland and the Norse outpost at Straumfjord in Newfoundland (about 4 weeks by knarr). Green: 2000 km from the northern tip of Newfoundland to Long Island Sound (about 3 weeks by knarr). Butternut wood remnants discovered at Straumfjord likely came from the southern coast of Maine or the coast of Connecticut.

Bjarni never set foot on the new land, but his tale inspired Leif Eriksson, the son of Erik the Red, to explore the new world.  Leif bought Bjarni’s knarr and with a small fleet sailed up the west coast of Greenland to where Bjarni had landed, then headed due west along the latitude of what is today the Davis Straight.  Leif made landfall on Baffin Island and sailed south down the Labrador coast to Belle Island in the Gulf of St. Lawrence, that he named Straumfjord, and then across to the northern tip of Newfoundland on the edge of a shallow bay where they could run their ships onto shore.  There, sometime around AD1000 they built a small settlement of wood houses that they used as a base for wider explorations of the land they called Vinland.  Later expeditions returned to the Straumfjord settlement and expanded it, including Thorfinn and Gudrid, where their son Snorri was born. 

View of the reconstructed Norse outpost at L’Anse aux Meadows in Newfoundland, Canada, and the Gulf of St. Lawrence (Straumfjord).

The voyage one-way between Newfoundland and Greenland took only 3 to 4 weeks, and each successive group repaired the damage from the ravages of the Newfoundland weather.  One of these repairs happened in the year AD 1021, long after Thorfinn and Gudrid and Snorri had resettled in northern Iceland, where their descendants crafted a saga of their exploits that was passed down by oral tradition through the generations until they were written down around AD 1400 and then almost forgotten…until the archeologist Anne Stine Ingstad with her husband Helge Ingstad found the remains of wood houses in 1960 buried under the turf at a place called L’Anse aux Meadows on Newfoundland’s northern tip. 

The Icelandic Saga of Erik the Red written around 1387-1394 and known as the Flateyjarbók (The Flatley Book).

The outpost at L’Anse aux Meadows was used on and off for decades as a base for the timber and fur trade. In addition to the dwellings, vast numbers of wood chips and discarded tree parts were uncovered, pointing to an active timber operation. Some of the wood is from the butternut tree which does not grow in Newfoundland nor anywhere along the shores of the Gulf of St. Lawrence. The modern areas of the butternut tree within range of Norse excursions are from the southern coast of Maine and the coast of Connecticut on Long Island Sound. Given how freely the Norse sailed their knarrs, making routine voyages of several weeks duration, the three-week trip from L’Anse aux Meadows to Long Island Sound seems easy, and there were plenty of bays to slip into for provisions as they went. Although there is no direct evidence for the Norse presence along the northeastern coast of the US, it seems highly likely that they plied these waterways and brought back the butternut timber to L’Anse aux Meadows.

Carbon 14 dating placed the age of the outpost at L’Anse aux Meadows at around AD 1000, consistent with the chronology of the Icelandic Sagas. But with an accuracy of plus or minus several decades it was not possible to know where it fit into the story…except for a lucky accident of solar physics.

Miyake Events and Solar Physics

In 2012, while studying tree rings from two cedar trees in Japan, Fuse Miyake of Nagoya University and his team from the Solar-Terrestrial Environment Laboratory made the unexpected discovery that a single tree ring, shared in common between the two specimens, had 20% more Carbon 14 than any of the other rings.  The ratio of Carbon 14 in nature to the usual Carbon 12 is very stable, with a variation of about 2% year to year, mostly due to measurement accuracy.  Therefore, the 20% spike in Carbon 14 was a striking anomaly.  By comparing the known ages of the cedars to the rings, using the techniques of dendrochronology, the date of the ring growth was pinpointed to the year 774-775.

A solar flare like this may generate a solar proton event (SPE).

Such a sudden increase in Carbon 14 over only a year’s time could only be caused by a sudden and massive influx of high-energy cosmic rays into the Earth’s upper atmosphere.  Carbon 14 is generated by the capture of 10-40 MeV neutrons by Nitrogen 14 followed by proton decay of the excited nitrogen nucleus.  The high-energy neutrons are generated as byproducts of even higher energy processes.  Miyake and his team considered high-energy gamma photons from a local super nova, but that was not consistent with the timing or amount of Carbon 14 that was generated.  They next considered a massive generation of high-energy solar protons when the sun spits out a massive surge of high-energy protons.    The exact cause of a solar proton event is still debated, but it is likely to be associated with solar flares that accelerate the protons to high energy.  The high-energy protons can penetrate the Earth’s magnetic field and cause particle cascades in the upper atmosphere.  They called it a Solar Proton Event (SPE), but it has since been renamed a Miyake Event.

Solar proton events may be associated with the Aurora Borealis. In the year of the Miyake event of 774 there were historical reports of unusual atmospheric lights and patterns. The Aurora is caused by electron currents which may be associated with the proton event.
High-energy protons from the sun cause high-altitude cosmic ray cascades that also produce high-energy neutrons. The neutrons are captured by Nitrogen 14 which decays rapidly into Carbon 14. Carbon 14 eventually decays back to Nitrogen 14 with a half life of about 5000 years.

Miyake Events are extremely rare.  There have been only about 3 or 4 SPE’s in the past 10,000 years.  By luck, another Miyake Event occurred in 993, about 8 years after Bjarni Herjólfsson was blown off course and about 7 years before Leif Eriksson began exploring the new world.  The excess Carbon 14 rained down on Earth and was incorporated into the fresh growth of juniper and fir trees growing near the northern Newfoundland shore.  Twenty seven years later, while repairing Leif Eriksson’s wood houses, a Viking felled the trees with a metal axe.  Chunks of the trees were discarded, with the traces of the metal axe blade as well as the outer bark of the tree intact.

The intact bark on the wood pieces was an essential part of the dating. Simply by counting the number of tree rings from the ring of 993, it was possible to know not only the year the tree was cut down, but even the season. Furthermore, the existence of the marks from the metal axe confirmed that the tree was felled by someone from the settlement because there were no metal tools among the indigenous people.

The Norse timber traders treated the indigenous people terribly from the very first expeditions, with tales of wanton murder recorded proudly in the later sagas. This was ultimately their undoing. Resistance from the local tribes could be fierce, and the Norse could spare few casualties in their small expeditions. Eventually, the Norse were driven off. The wood structures at L’Anse aux Meadows were burned before they sank beneath the turf, and human remains with arrow wounds have been uncovered from the site, hinting at how this bold tale ended.

Physics of the Flipping iPhone and the Fate of the Earth

Find an iPhone, then flip it face upwards (hopefully over a soft cushion or mattress).  What do you see?

An iPhone is a rectangular parallelepiped with three unequal dimensions and hence three unequal principle moments of inertia I1 < I2 < I3.  These axes are: vertical to the face, horizontal through the small dimension, and horizontal through the long dimension. So now spin the iPhone around its long axis, it keeps a nice and steady spin.  And then spin it around an axis point out of the face, again it’s a nice steady spin. But flip it face upwards, and it almost always does a half twist. Why?

The answer is variously known as the Tennis Racket Theorem or the Intermediate Axis Theorem or even the Dzhanibekov Effect. If you don’t have an iPhone or Samsung handy, then watch this NASA video of the effect.

Stability Analysis

The flipping iPhone is a rigid body experiencing force-free motion. The Euler equations are an easy way to approach the somewhat complicated physics. These equations are

They all equal zero because there is no torque. First let’s assume the object is rotating mainly around the x1 axis so that ω2 and ω3 are small (rotating mainly around ω1).  Then solving for the angular accelerations yields

This is a two-dimensional flow equation in the variables  ω2, ω3.  Hence we can apply classic stability analysis for rotation mainly about the x1 axis. The Jacobian matrix is

This matrix has a trace τ = 0 and a determinant Δ given by

Because of the ordering I1 < I2 < I3 we know that this is quantity is positive. 

Armed with the trace and the determinant of a two-dimensional flow, we simply need to look at the 2D “stability space” as shown in Fig. 1. The horizontal axis is the determinant of the Jacobian matrix evaluated at the fixed point of the motion, and the vertical axis is the trace. In the case of the flipping iPhone, the Jacobian matrix is independent of both ω2 and ω3 (if they are remain small), so it has a global stability. When the determinant is positive, the stability depends on the trace. If the trace is positive, all motions are unstable (deviations grow exponentially). If the trace is negative, all motions are stable. The sideways parabola in the figure is known as the discriminant. If solutions are within the discriminant, they are spirals. As the trace approaches the origin, the spirals get slower and slow, until they become simple harmonic motions when the trace goes to zero. This kind of marginal stability is also known as centers. Centers have a stead-state stability without dissipation.

Fig. 1 The stability space for two-dimensional dynamics. The vertical axis is the trace of the Jacobian matrix and the horizontal axis is the determinant. If the determinant is negative, all motions are unstable saddle points. Otherwise, stability depends on the sign of the trace, unless the trace is zero, for which case the motion has steady-state stability like celestial orbits or harmonic oscillators. (Reprinted from Ref. [1])

For the flipping iPhone (or tennis racket or book), the trace is zero and the determinant is positive for rotation mainly about the x1 axis, and the stability is therefore a “center”.  This is why the iPhone spins nicely about its axis with the smallest moment.

Let’s permute the indices to get the motion about the x3 axis with the largest moment. Then

The trace and determinant are

where the determinant is again positive and the stability is again a center.

But now let’s permute again so that the motion is mainly about the x2 axis with the intermediate moment.  In this case

And the trace and determinant are

The determinant is now negative, and from Fig. 1, this means that the stability is a saddle point. 

Saddle points in 2D have one stable manifold and one unstable manifold.  If the initial condition is just a little off the stability point, then the deviation will grow as the dynamical trajectory moves away from the equilibrium point along the unstable manifold.

The components of the angular frequencies of each of these cases is shown in Fig. 2 for rotation mainly around x1, then x2 and then x3. A small amount of rotation is given as an initial condition about the other two axes for each case. For these calculations, no approximations were made, using the full Euler equations, and the motion is fully three-dimensional.

Fig. 2 Angular frequency components for motion with initial conditions of spin mainly about, respecitvely, the x1, x2 and x3 axes. The x2 case shows strong nonlinearity and slow unstable dynamics that periodically reverse. (I1 = 0.3, I2 = 0.5, I3 = 0.7)

Fate of the Spinning Earth

When two of the axes have very similar moments of inertia, that is, when the object becomes more symmetric, then the unstable dynamics can get very slow. An example is shown in Fig. 3 for I2 just a bit smaller than I3. The high frequency spin remains the same for long times and then quickly reverses. During the time when the spin is nearly stable, the other angular frequencies are close to zero, and the object would have only a slight wobble to it. Yet, in time, the wobble goes from bad to worse, until the whole thing flips over. It’s inevitable for almost any real-world solid…like maybe the Earth.

Fig. 3 Angular frequencies for a slightly asymmetric rigid body. The spin remains the same for long times and then flips suddenly.

The Earth is an oblate spheroid, wider at the equator because of the centrifugal force of the rotation. If it were a perfect spheroid, then the two moments orthogonal to the spin axis would be identically equal. However, the Earth has landmasses, continents, that make the moments of inertia slightly unequal. This would have catastrophic consequences, because if the Earth were perfectly rigid, then every few million years it should flip over, scrambling the seasons!

But that doesn’t happen. The reason is that the Earth has a liquid mantel and outer core that very slowly dissipate any wobble. The Earth, and virtually every celestial object that has any type of internal friction, always spins about its axis with the highest moment of inertia, which also means the system relaxes to its lowest kinetic energy for conserved L through the simple equation

So we are safe!

Python Code

Here is a simple Python code to explore the intermediate axis theorem. Change the moments of inertia and change the initial conditions. Note that this program does not solve for the actual motions–the configuration-space trajectories. The solution of the Euler equations gives the time evolution of the three components of the angular velocity. Incremental rotations could be applied through rotation matrices operating on the configuration space to yield the configuration-space trajectory of the flipping iPhone (link to the technical details here).

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thurs Oct 7 19:38:57 2021

@author: David Nolte
Introduction to Modern Dynamics, 2nd edition (Oxford University Press, 2019)

FlipPhone Example
"""
import numpy as np
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from scipy import integrate
from matplotlib import pyplot as plt

plt.close('all')
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1], projection='3d')
ax.axis('on')

I1 = 0.45   # Moments of inertia can be changed here
I2 = 0.5
I3 = 0.55

def solve_lorenz(max_time=300.0):

# Flip Phone
    def flow_deriv(x_y_z, t0):

        x, y, z = x_y_z
        
        yp1 = ((I2-I3)/I1)*y*z;
        yp2 = ((I3-I1)/I2)*z*x;
        yp3 = ((I1-I2)/I3)*x*y;
        
        return [yp1, yp2, yp3]
    
    model_title = 'Flip Phone'

    t = np.linspace(0, max_time/4, int(250*max_time/4))

    # Solve for trajectories
    x0 = [[0.01,1,0.01]]   # Initial Conditions:  Change the major rotation axis here ....
    t = np.linspace(0, max_time, int(250*max_time))
    x_t = np.asarray([integrate.odeint(flow_deriv, x0i, t)
                      for x0i in x0])
     
    x, y, z = x_t[0,:,:].T
    lines = ax.plot(x, y, z, '-')
    plt.setp(lines, linewidth=0.5)

    ax.view_init(30, 30)
    plt.show()
    plt.title(model_title)
    plt.savefig('Flow3D')

    return t, x_t

ax.set_xlim((-1.1, 1.1))
ax.set_ylim((-1.1, 1.1))
ax.set_zlim((-1.1, 1.1))

t, x_t = solve_lorenz()

plt.figure(2)
lines = plt.plot(t,x_t[0,:,0],t,x_t[0,:,1],t,x_t[0,:,2])
plt.setp(lines, linewidth=1)


[1] D. D. Nolte, Introduction to Modern Dynamics, 2nd Edition (Oxford, 2019)

To see more on the Intermediate Axis Theorem, watch this amazing Youtube.

And here is another description of the Intermediate Axis Theorem.