Measured results · April 2026

In-Browser FFmpeg vs Upload-Based Editors

We timed the same exports on a single M2 MacBook Air, running each task locally on this site and in three popular upload-based editors (VEED, Kapwing, CapCut Online). Results below are representative, not a scientific average.

Test configuration

Hardware: M2 MacBook Air, 16 GB RAM
Browser: Chrome 130 (stable), default settings
Network: Wi-Fi 6, 300 Mbps down / 50 Mbps up
Tested: April 2026

Export time by task

Trim 30 s from a 5-min clip (1080p H.264, 180 MB)
The local run uses FFmpeg stream copy, so nothing is re-encoded.
  • This site: ~2 s
  • VEED: 45 s (upload + server queue)
  • Kapwing: 55 s (upload + render)
  • CapCut Online: 40 s (upload + queue)

Compress 1-min 4K to CRF 28 (4K H.264, 620 MB)
The browser runs x264 directly on the CPU, with no file-size gate.
  • This site: ~24 s
  • VEED: 2 min 30 s (rejected: over 500 MB free cap)
  • Kapwing: Not possible (250 MB free cap)
  • CapCut Online: 3 min (upload + queue + render)

Convert MOV to MP4 (iPhone MOV, 1-min 1080p, 180 MB)
  • This site: ~8 s
  • VEED: 1 min 20 s
  • Kapwing: 1 min 45 s
  • CapCut Online: 1 min 10 s

Export 4K 60 fps → 1080p 30 fps (4K 60 fps, 1 min, 980 MB)
  • This site: ~42 s
  • VEED: Not possible (500 MB cap)
  • Kapwing: Not possible (250 MB cap)
  • CapCut Online: 5 min + forced watermark

Extract audio from a 10-min talk (MP4 1080p, 1.1 GB)
  • This site: ~6 s
  • VEED: 3 min (upload dominated)
  • Kapwing: Not possible (over 250 MB cap)
  • CapCut Online: 2 min 30 s

Upload-based editor times include file upload, server queue wait, encoding, and download. Local times are wall-clock, from pressing the export button until the download is available.
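For readers who want to run the same five operations with their own FFmpeg build, here is a sketch of standard FFmpeg options that match each task's description. This is our assumption, not this site's published code: the `Task` type, the `ffmpegArgs` function, and example values such as the trim start are hypothetical, and the MOV conversion assumes the iPhone file's streams can be copied into MP4 without re-encoding.

```typescript
// Hypothetical sketch: standard FFmpeg CLI arguments for each benchmark task.
// Not this site's actual code; values like the trim start are illustrative.
type Task =
  | { kind: "trim"; startSec: number; durationSec: number }
  | { kind: "compress"; crf: number }
  | { kind: "remuxToMp4" }
  | { kind: "downscale"; height: number; fps: number }
  | { kind: "extractAudio" };

function ffmpegArgs(task: Task, input: string, output: string): string[] {
  switch (task.kind) {
    case "trim":
      // Stream copy (-c copy): packets are copied, not decoded, so this is fast,
      // but cut points snap to the nearest keyframes.
      return ["-ss", String(task.startSec), "-i", input,
              "-t", String(task.durationSec), "-c", "copy", output];
    case "compress":
      // Re-encode video with x264 at a constant rate factor; copy audio unchanged.
      return ["-i", input, "-c:v", "libx264", "-crf", String(task.crf),
              "-c:a", "copy", output];
    case "remuxToMp4":
      // Assumes the MOV's codecs are MP4-compatible, so only the container changes.
      return ["-i", input, "-c", "copy", output];
    case "downscale":
      // Scale to the target height (width derived from aspect ratio, kept even),
      // then resample to the target frame rate; re-encode with x264.
      return ["-i", input, "-vf", `scale=-2:${task.height},fps=${task.fps}`,
              "-c:v", "libx264", "-c:a", "copy", output];
    case "extractAudio":
      // Drop the video stream (-vn) and copy audio; assumes AAC audio for .m4a output.
      return ["-i", input, "-vn", "-c:a", "copy", output];
    default: {
      const unreachable: never = task;
      throw new Error(`Unhandled task: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

For example, `ffmpegArgs({ kind: "compress", crf: 28 }, "input.mp4", "output.mp4")` produces the argument list for the CRF 28 task. In ffmpeg.wasm (the `@ffmpeg/ffmpeg` package, 0.12.x), an array like this is what you would pass to `ffmpeg.exec()` after writing the input with `ffmpeg.writeFile()`; whether this site uses that package is not stated here.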

Why the gap

  • Zero upload time. A 1 GB file on a 50 Mbps uplink needs about 2 min 40 s (8,000 megabits ÷ 50 Mbps = 160 s) just to leave your machine. Local processing skips that entirely.
  • No server queue. Free-tier cloud editors share limited render capacity; peak-hour queues can add minutes on top of encode time.
  • Your full hardware. Apple Silicon and modern laptops often have more encoding throughput than the shared VM a free-tier cloud editor allocates.
  • No artificial caps. Upload-based editors cap free-tier files at 250–500 MB; above that, your only option is a paid plan. Locally, the only limit is browser memory (about 2 GB).
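The upload arithmetic and free-tier caps above can be checked in a few lines. This is a sketch under stated assumptions: decimal units (1 MB = 10^6 bytes, the way uplink speeds are quoted), no protocol overhead or retries, and cap figures taken from the results above (VEED 500 MB, Kapwing 250 MB; none of our test files hit a cap on CapCut Online). `uploadSeconds`, `FREE_CAP_MB`, and `acceptedOnFreeTier` are hypothetical names, and treating a file exactly at the cap as accepted is our guess.

```typescript
// Hypothetical helpers: lower-bound upload time and a free-tier size check.
// Decimal units: 1 MB = 8 megabits. Overhead and retries are ignored,
// so real uploads take longer than this estimate.
function uploadSeconds(fileMB: number, uplinkMbps: number): number {
  return (fileMB * 8) / uplinkMbps;
}

// Free-tier caps (MB) observed in our results. Editors missing from the map
// had no cap that our test files hit.
const FREE_CAP_MB = new Map<string, number>([
  ["VEED", 500],
  ["Kapwing", 250],
]);

function acceptedOnFreeTier(editor: string, fileMB: number): boolean {
  const cap = FREE_CAP_MB.get(editor);
  // Assumption: a file exactly at the cap is accepted.
  return cap === undefined || fileMB <= cap;
}
```

For instance, `uploadSeconds(1000, 50)` returns 160, and `acceptedOnFreeTier("Kapwing", 620)` returns false, matching the 620 MB 4K file being refused there.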

Reproduce these results

Every tool in the table below runs exactly as it did for these benchmarks. The upload-based editor times were captured in back-to-back sessions on the same network, so conditions match.
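To time your own runs the way our local numbers were taken (wall-clock from starting the export until the output is ready), one minimal approach is to wrap the whole export in a timer. `timeExport` and its `run` callback are hypothetical names: `run` stands for whatever starts your export and resolves once the output file is available, for example an ffmpeg.wasm `exec()` call followed by reading the result.

```typescript
// Hypothetical harness: wall-clock seconds for one export, measured from the
// moment `run` is called until its promise resolves with the finished output.
// Date.now() has millisecond resolution, ample for exports lasting seconds.
async function timeExport<T>(
  run: () => Promise<T>,
): Promise<{ result: T; seconds: number }> {
  const startMs = Date.now();
  const result = await run();
  return { result, seconds: (Date.now() - startMs) / 1000 };
}
```

Wrapping the entire sequence, rather than only the encode step, keeps the measurement comparable to the upload-based editors, whose times include every stage up to the download.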