How we turned a broadcast camera into a biomechanical sensor for the Winter Olympics, and what that means for every piece of video ever recorded.
In 2017, Shaun White landed a Cab Double Cork 1440 at the U.S. Open in Vail. The trick's name is a specification: two off-axis inversions plus two horizontal rotations, each assigned 360 degrees, totaling 1,440. That is what he set out to do. It is also what the broadcast camera recorded: a two-dimensional video file of a snowboarder moving through space, stored on servers, rewatched on highlight reels, uncontroversial.
Nine years later, my team at Google Cloud ran that same 2017 footage through an AI pipeline we built for the Milan Cortina 2026 Winter Olympics. The pipeline reconstructed White's skeleton in three dimensions at every frame, built a rigid body frame anchored on his spine and shoulders, and summed the angular displacement of that body frame from takeoff to landing. The number came back at 1,122 degrees.
That 318-degree gap is a measure of mastery: the fewer degrees an athlete needs to complete a trick, the more precisely they have controlled the axis, and the more margin they have for style and a clean landing. It is a physical quantity that was present in the 2017 footage and yet, for every practical purpose, absent from it, because no one could read it. The camera captured it. The technology to extract it did not exist.
That is the observation this essay is about. Humanity has been recording two-dimensional video at industrial scale for roughly a century. Broadcast archives, surveillance feeds, security cameras, phone videos, drone passes, dashcams, factory floor cameras, telemedicine sessions, agricultural monitoring, YouTube. A staggering corpus. Until recently, almost none of it was readable as physical data; it was visual content, useful for human viewing and essentially opaque to machines that needed to understand what was happening in three-dimensional space.
The Olympics work demonstrated that this has changed. Given a modern multimodal model and a reasonable amount of compute, a standard broadcast camera is a biomechanical sensor. It always was. We just did not have the instrument to read it.
This is not primarily a story about broadcast innovation, though that is how it appeared on stage at NAB Show 2026 when I joined Darryl Jefferson of NBC Sports and Jane Day of Google Cloud for a panel called "Augmenting the Game." It is a story about what happens to the medium of video itself when every frame becomes a latent spatial dataset, and about what that unlocks for the part of AI that has to operate in the physical world.
The engineering problem is this. You have two-dimensional video, pixels changing over time. You want to recover the three-dimensional physical event that produced it: a body moving through space, its joints in specific positions, rotating around axes that are themselves moving. One camera angle. No sensors on the athlete.
Until recently, this was considered intractable. Traditional motion capture solved it by changing the problem: put reflective markers on a suit, calibrate multiple cameras, control the lighting, triangulate from known viewpoints. Useful in a laboratory. Useless on a halfpipe in sub-zero weather. And a non-starter at the Olympic level, where athletes will not wear anything that alters their weight or aerodynamics by a fraction of an ounce. You cannot ask Chloe Kim to change her gear so your data pipeline works.
The pipeline we built does not solve the problem differently. It dissolves the framing. A modern multimodal model, trained on enough video of bodies in motion, learns to infer depth and three-dimensional structure from a single monocular view. It does not triangulate. It predicts, drawing on everything it has learned about how bodies are shaped, how joints connect, how limbs occlude each other from specific angles, and produces a three-dimensional skeleton at every frame, including joints hidden from the camera.
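To make the shape of that step concrete, here is a minimal sketch of the inference loop in Python. `model.infer` is a stand-in for the actual model call, which is not public, and the four-joint subset is an assumption of this sketch, not a description of the production skeleton:

```python
import numpy as np

# The four joints the downstream geometry needs. A hypothetical subset;
# the production model's actual skeleton layout is not public.
JOINTS = ("l_hip", "r_hip", "l_shoulder", "r_shoulder")

def reconstruct_skeletons(video_frames, model):
    """Per-frame monocular 3D pose inference.

    `model.infer(frame)` is a stand-in for the real inference call. It is
    assumed to return a dict mapping joint names to 3D positions in meters,
    with occluded joints predicted rather than dropped.
    """
    skeletons = []
    for frame in video_frames:
        joints = model.infer(frame)  # hypothetical API
        skeletons.append(tuple(np.asarray(joints[j], float) for j in JOINTS))
    return skeletons
```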
From there, the math is classical. A rigid body frame anchored on the spine and shoulders for stable orientation. Quaternions to represent that orientation cleanly, avoiding the gimbal-lock singularities that break Euler-angle approaches when an athlete is rotating around three axes at once. Angular displacements summed across frames to compute total rotation. Axis tilt measured against global vertical, weighted toward the fastest-spinning phases, to describe the style of the rotation.
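The classical half fits in a few dozen lines. A sketch using SciPy's rotation utilities, under stated assumptions: global up is +Y, the body frame is anchored on the hip and shoulder joints from the sketch above, and the tilt average is weighted by instantaneous angular speed, matching the description:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def body_frame(l_hip, r_hip, l_sho, r_sho):
    """Orthonormal body frame anchored on the spine and shoulder line."""
    up = (l_sho + r_sho) / 2 - (l_hip + r_hip) / 2   # spine direction
    up /= np.linalg.norm(up)
    fwd = np.cross(r_sho - l_sho, up)                # perpendicular to spine
    fwd /= np.linalg.norm(fwd)                       #   and shoulder line
    right = np.cross(up, fwd)                        # completes the frame
    return np.column_stack([right, up, fwd])         # proper rotation matrix

def rotation_metrics(skeletons, fps):
    """Total rotational degrees and speed-weighted axis tilt from vertical."""
    quats = [R.from_matrix(body_frame(*s)) for s in skeletons]
    total_deg = 0.0
    tilt_num = tilt_den = 0.0
    for q0, q1 in zip(quats, quats[1:]):
        delta = q1 * q0.inv()              # frame-to-frame rotation
        rotvec = delta.as_rotvec()         # unit axis scaled by angle (rad)
        angle = float(np.linalg.norm(rotvec))
        total_deg += np.degrees(angle)
        if angle > 1e-8:
            axis = rotvec / angle
            tilt = np.degrees(np.arccos(min(1.0, abs(axis[1]))))  # vs +Y up
            weight = angle * fps           # instantaneous angular speed
            tilt_num += tilt * weight      # weights the fastest-spinning
            tilt_den += weight             #   phases most heavily
    return total_deg, (tilt_num / tilt_den if tilt_den else 0.0)
```

A sum accumulated this way is the kind of number that comes back at 1,122 degrees when the trick name promises 1,440.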
That is the primitive. Multimodal model for reconstruction, body frame for stable reference, quaternions for the math, reasoning model for the translation into language. All of it running at the edge, on infrastructure installed at the venue, because sub-second latency is not optional when the output has to appear in a broadcast replay window. The edge constraint is not a deployment choice. It is structural. The same constraint that shaped See & Spray's per-pass inference on moving machinery shaped this.
The important thing about the primitive is not its sophistication. It is its scope. Nothing in the description above is specific to snowboarding. Replace "spine and shoulder axes" with "trunk and pelvic axes" and you have figure skating. Replace "rotation axis" with "joint angle trajectory" and you have gymnastics, diving, physical therapy. Replace "human body" with "tractor implement" or "robotic arm" or "animal gait" and you have something else. The pipeline does not know what sport it is analyzing. It knows bodies in space.
Horizontal intelligence, vertical domain. Google and DeepMind provided the primitive. U.S. Ski & Snowboard provided the knowledge of what to measure, what to name it, and when to trust it. That division of labor is the economics of Physical AI, and it is not specific to this partnership. Manufacturing, agriculture, logistics, healthcare; the horizontal layer is the same. The domain experts on top are different. The pipeline here happens to be pointed at a halfpipe. Next year it will be pointed at a factory floor and a row crop field.
The public demonstration of this primitive at Milan Cortina was a graphic called the Cork Ribbon. It is worth describing precisely, because the Ribbon is where the argument stops being abstract.
A cork is an off-axis rotation. The snowboarder's axis of spin is not vertical; it tilts 45 to 60 degrees off global up, sending the body into a diagonal spiral through the air. The sport's naming convention, a 1080, a 1440, a double cork, counts planned rotation in 180-degree increments and credits each full rotation with a clean 360 degrees. It is a useful shorthand. It is also a fiction about what is actually happening geometrically.
The Ribbon is a visualization of the body's rotational plane through the trick, rendered as a continuous surface traced from takeoff to landing. A clean axis produces a ribbon that stays consistent in orientation. A drifting axis produces a ribbon that twists. For the first time, a viewer can see the thing that coaches have been feeling for decades and could not describe: the difference between a rider who is controlling a trick and a rider who is fighting it.
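The geometry is easy to sketch, even though the broadcast rendering pipeline is not public. One plausible construction: at each frame, lay a short segment in the rotational plane, perpendicular to the instantaneous spin axis from the previous sketch, centered on the athlete, and sweep it from takeoff to landing. The width and the continuity heuristic here are illustrative:

```python
import numpy as np

def ribbon_strip(centers, spin_axes, half_width=0.5):
    """Vertices of a triangle-strip ribbon traced through a trick.

    centers:   per-frame 3D position of the athlete (e.g., the mid-hip point)
    spin_axes: per-frame unit rotation axes (e.g., from delta.as_rotvec())
    A stable axis keeps consecutive segments coplanar; a drifting axis
    visibly twists the strip.
    """
    verts, seed = [], np.array([1.0, 0.0, 0.0])
    for c, axis in zip(centers, spin_axes):
        axis = axis / np.linalg.norm(axis)
        d = seed - np.dot(seed, axis) * axis     # project into the plane
        if np.linalg.norm(d) < 1e-6:             # seed was parallel to axis
            d = np.cross(axis, np.array([0.0, 1.0, 0.0]))
        d /= np.linalg.norm(d)
        seed = d                                 # keep the strip continuous
        verts.append(c - half_width * d)
        verts.append(c + half_width * d)
    return np.asarray(verts)  # consecutive vertex pairs form the strip
```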
Paired with the Ribbon is a second metric, Rotational Degrees, which sums the total physical rotation the body traveled through space. This is the computation that returned 1,122 degrees for White's Cab Double Cork 1440, nine years after the footage was shot. The 318-degree gap is not an error; it is the measure of mastery described earlier: fewer degrees means a more tightly controlled axis, and more margin for style, amplitude, and a clean landing.
Google's team deliberately did not call this new metric "Rotation." The trick name captures convention. Rotational Degrees captures geometry. Both are true. Both belong on the broadcast. The restraint in that naming choice is the difference between AI that corrects a domain and AI that augments it.
The measurement was made additive, not authoritative. That is the standard that should apply to every instance of this primitive, in every other domain it reaches.
Here is the honest accounting.
What exists today is a pipeline that extracts three-dimensional biomechanical data from standard broadcast video, running on infrastructure installed at specific venues, producing specific metrics for specific sports, integrated into a broadcast graphics workflow by a network operating at Olympic scale. It works. The Ribbon is real. The 1,122 degrees are measured, not simulated. Nine years ago, this would have required a research project. Today it runs in a production pipeline on a timeline a broadcast network can plan around. The cost curve has bent.
The conditions that made the Olympics work possible are not trivial, but they are also not exotic. Known camera angles. Managed lighting. High-contrast subjects against snow. A motion vocabulary, cork rotations, grabs, inversions, bounded enough that a training set can cover it well. Latency measured in the broadcast replay window, which is generous. A production budget that allowed domain experts to sit alongside engineers long enough to decide what was worth measuring. None of these conditions are permanent features of the partnership. They are starting conditions for a primitive that is on a clear path to generalization.
Broadcast is, in fact, one of the cleaner environments this primitive will ever encounter. Professional sports run on instrumented venues with predictable athlete trajectories. The cameras are already deployed. The lighting is already managed. The athletes, competing at the highest level, are moving in ways the model can recognize. The hardest domains, arbitrary video, arbitrary angles, arbitrary subjects, arbitrary conditions, are somewhere down the road. But the spectrum between "Olympic halfpipe" and "random YouTube clip" is not a cliff; it is a gradient. The next domains are close.
College sports. Professional leagues below the top tier. Gymnastics, figure skating, diving, every sport where physics and geometry are what the judges are trying to measure, and where the video infrastructure already exists. Then clinical gait analysis, physical therapy, rehabilitation, surgical technique review, domains where the subjects are cooperative, the environments are controlled, and the economic case for extracting spatial data from video is immediate. Then industrial quality control, where cameras on factory floors become inspection sensors. Each of these is easier than the Olympics in some respects and harder in others, but none of them are a full step-change.
The genuinely hard domains, unstructured outdoor environments, biological subjects that do not cooperate, conditions that change minute to minute, are where this primitive meets agriculture and robotics in the wild. That is the frontier my other writing has focused on, because that is where the constraints are most unforgiving. The halfpipe is not that frontier. It is the demonstration that the primitive is ready to leave the research lab and operate in production environments. It will reach the harder domains, because the direction of travel is now clear and the architecture that gets there is understood. But it will get there in stages, and most of the intermediate stages are still more tractable than anything Physical AI has had to deal with in agriculture or robotics.
What makes the Olympics case compelling is not that it solves the general problem. It is that it is the first large-scale public demonstration that the primitive works in production, on live events, at broadcast quality, with the extracted data integrated into workflows that commercial operations can build around.
A century of 2D video sits in archives, on servers, in cloud storage, in camera rolls. Some of it was shot in conditions close to what Milan Cortina required. Much more of it was not. The gap between "reliably extractable in controlled conditions" and "reliably extractable at scale in arbitrary conditions" is narrower than the gap Physical AI has to close in a crop field, and it is closing faster. That is the news.
The halfpipe is not a clean room. The halfpipe is the first field site where the primitive got its production debut. The clean rooms were the research labs where monocular 3D pose estimation was being iterated on for years before any of this reached a broadcast. The crop fields are still ahead. What the Olympics proved is that the distance between the lab and the most forgiving field sites has been closed, and that the primitive is now working outside the lab, under real production constraints, in ways that are going to generalize faster than most people expect.
The primitive is this: a standard broadcast camera, pointed at a body moving through space, can now produce three-dimensional biomechanical data in real time. No sensors. No suits. No calibrated arrays. Just video, a multimodal model, and enough compute at the edge to keep up with the physics. The Milan Cortina work proved it runs in production. Everything that comes next is generalization.
What the Olympics demonstrated, more than anything, is that the sensor was always there. A century of 2D footage, broadcast archives, phone videos, security feeds, factory floor cameras, drone passes, everything, is now a latent spatial dataset waiting to be read. The hardware was never the bottleneck. The instrument to read it is what we were missing. We have it now.
This is the work I have been writing about under the name Physical AI. The horizontal primitive: spatial reconstruction from commodity sensors. The vertical domains: sports, agriculture, manufacturing, medicine, logistics, everywhere the physical world is the primary environment. The economics: Big Tech builds the primitive, domain experts build the value on top. The constraint that shapes all of it: intelligence that has to operate in the real world, under the conditions of the real world, without the luxuries of the cloud.
A halfpipe in Italy turned out to be the right place to prove the primitive in public. The next domains will prove themselves elsewhere, quieter, less telegenic, harder to put on a broadcast, but no less consequential. The question I am carrying out of this work is not whether the generalization happens. It is how fast, and in what order, and who is ready for what it makes possible.