Sora 2 vs Veo 3.1: An In-Depth Look at AI Video

This isn't the same article as "Sora 2 vs Veo 3.1: Which Model to Pick." That one's a quick decision guide. This one's a deep dive: 12 test scenes, what exactly breaks, where models lie about themselves, and why the same prompt gives different results on the fifth generation.

All tests were run in the Quantium video generator. One subscription, no separate OpenAI or Google accounts. Each scene ran at least 3 times; I took the median result.
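
The "run at least 3 times, take the median" step can be sketched roughly as below. This is only an illustration under assumptions: the article does not say how runs were ranked, so the numeric per-run scores, their scale, and the helper name `median_run` are all hypothetical.

```python
from statistics import median_low


def median_run(scores):
    """Return (index, score) of the median-scored run for one scene.

    `scores` holds one reviewer-assigned quality score per generation
    (hypothetical; the article does not define a scoring scale).
    median_low is used so the result is always an actual run,
    even when the number of runs is even.
    """
    if len(scores) < 3:
        raise ValueError("need at least 3 runs per scene")
    target = median_low(scores)
    # index() returns the first run that has the median score
    return scores.index(target), target


# Hypothetical scores for three generations of the same scene
runs = [6.5, 8.0, 7.0]
index, score = median_run(runs)
print(f"median run: #{index + 1} with score {score}")
```

Picking the median rather than the best run keeps a single lucky generation from flattering either model, which is the point of repeating each scene.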

12 Test Scenes — What We Generated

To avoid comparing apples to oranges, I set a fixed list of scenes. Each one's a specific real-world production case:

Cinematic night flight over a city
Surfer on a wave in slow-motion
Barista making coffee, pouring milk
Athlete at the start, then a sudden move
Cafe interior with two people at a table
Product animation (watch rotating on white background)
Landscape with clouds moving (timelapse)
Animal action — dog running on a beach
Dancer turns, clothing flowing
Image-to-video: bringing a couple's photo to life
Dialogue: two characters speaking and gesturing
Complex 18-second scene with multiple actions

Movement Physics

Sora 2 wins here in 8 out of 12 scenes, especially with water, fabric, and inertia. When a surfer leaves the water and drops fly off, Sora shows them following a realistic trajectory. Veo's drops sometimes stick to the body or hang in the air. When a dancer turns, Sora's dress fabric continues moving with inertia; Veo's
