This isn't the same article as "Sora 2 vs Veo 3.1: Which Model to Pick." That one's a quick decision guide. This one's a deep dive: 12 test scenes, what exactly breaks, where models lie about themselves, and why the same prompt gives different results on the fifth generation.

All tests were run in the Quantium video generator. One subscription, no separate OpenAI or Google accounts. Each scene was run at least 3 times; I took the median result.
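To make "median result" concrete, here is a minimal sketch of picking the median run for one scene. It assumes each run gets a numeric quality score from a reviewer. The 1-10 scale, the `median_run` helper, and the sample scores are illustrative, not the scoring actually used in these tests.

```python
from statistics import median_low


def median_run(scores):
    """Return the index of the run whose score is the (low) median.

    `scores` holds one hypothetical reviewer score per run.
    median_low always returns an actual data point, so the result maps
    back to a real generation even when the number of runs is even.
    """
    if not scores:
        raise ValueError("need at least one run")
    target = median_low(scores)
    return scores.index(target)


# Hypothetical scores for three runs of one scene (assumed 1-10 scale).
runs = [6, 9, 7]
print(median_run(runs))  # index of the run scored 7
```

Picking an actual run rather than averaging the scores keeps the comparison tied to a clip you can watch, which matters when one generation out of three is an outlier.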

12 Test Scenes: What I Generated

To avoid comparing apples to oranges, I set a fixed list of scenes. Each one's a specific real-world production case:

  • Cinematic night flight over a city
  • Surfer on a wave in slow motion
  • Barista making coffee, pouring milk
  • Athlete at the start, then a sudden move
  • Cafe interior with two people at a table
  • Product animation (watch rotating on white background)
  • Landscape with clouds moving (timelapse)
  • Animal action: dog running on a beach
  • Dancer turns, clothing flowing
  • Image-to-video: bringing a couple's photo to life
  • Dialogue: two characters speaking and gesturing
  • Complex 18-second scene with multiple actions
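
The scene list implies a fixed test matrix. A rough sketch of it, assuming exactly two models and the minimum of three runs per scene; the `build_jobs` helper and the shortened scene labels are illustrative only:

```python
from itertools import product

# Shortened labels for the 12 scenes listed above.
SCENES = [
    "night flight over a city",
    "surfer in slow motion",
    "barista pouring milk",
    "athlete at the start",
    "cafe interior, two people",
    "product animation (watch)",
    "landscape timelapse",
    "dog running on a beach",
    "dancer turning",
    "image-to-video (couple's photo)",
    "dialogue with gestures",
    "complex 18-second scene",
]
MODELS = ["Sora 2", "Veo 3.1"]
MIN_RUNS = 3  # "at least 3 times" per scene; assumed here to be exactly 3


def build_jobs(scenes, models, runs):
    """Enumerate every (scene, model, run number) combination."""
    return list(product(scenes, models, range(1, runs + 1)))


jobs = build_jobs(SCENES, MODELS, MIN_RUNS)
print(len(jobs))  # 12 scenes x 2 models x 3 runs = 72
```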

Movement Physics

Sora 2 wins here in 8 out of 12 scenes, especially those involving water, fabric, and inertia. When a surfer leaves the water and droplets fly off, Sora shows them following a realistic trajectory, while Veo's droplets sometimes stick to the body or hang in the air. When a dancer turns, Sora's dress fabric continues moving with inertia; Veo's