M1 Ultra Benchmarks — The Ugly Truth

Ok, real talk, so don’t judge… I was testing M1 Max vs. M1 Ultra GPU performance on the Mac Studio and… I just wasn’t seeing the difference I expected. Then I realized I was testing GFXBench Aztec at 1440p on-screen, and… that just wasn’t enough load. Like… testing towing capacity between a Camry and a… Tacoma… with all the weight of a MacBook Air. I was barely putting a dent in the Max, never mind the Ultra. I needed to test at 4K off-screen in order to actually see what the Ultra could do. Then… it was like… boom… up to 1.9x. Same with Wild Life. I had to put it into Unlimited mode.

Then I was trying to decipher Apple’s M1 Ultra vs. GeForce RTX 3090 graphs. Like, what were they even saying? Performance per watt, but only up to the Ultra’s max wattage? Is that like… gas mileage with a 100 mile-per-hour speed limit? Couldn’t the Nvidia card just keep burning fuel to 300? Did I care most about the peak number, or about being able to actually fit into a small enclosure without melting into Super Mario lava?

Until I realized it was actually only measuring power on the GPU, not power through the whole system, which varies radically between Apple’s SoC-based approach and Nvidia’s discrete card-in-a-slot. Like… gas in both tanks vs. everything consumed to get that gas into the tank as well.
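To make the GPU-power vs. system-power point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `perf_per_watt`, `machine_a`, `machine_b`, and every number are made-up placeholders, not measurements of the M1 Ultra, the 3090, or anything else. It only shows how the same scores can rank differently depending on which watts go in the denominator.

```python
def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score divided by the power drawn while producing it."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return score / watts


# Placeholder numbers, NOT measurements of any real hardware.
machines = {
    "machine_a": {"score": 100.0, "gpu_watts": 100.0, "rest_of_system_watts": 20.0},
    "machine_b": {"score": 150.0, "gpu_watts": 125.0, "rest_of_system_watts": 175.0},
}

results = {}
for name, m in machines.items():
    # Same score, two different denominators.
    gpu_only = perf_per_watt(m["score"], m["gpu_watts"])
    whole_system = perf_per_watt(
        m["score"], m["gpu_watts"] + m["rest_of_system_watts"]
    )
    results[name] = (gpu_only, whole_system)
    print(f"{name}: {gpu_only:.2f} per GPU watt, {whole_system:.2f} per system watt")
```

With these placeholders, machine_b looks better per GPU watt and machine_a looks better per system watt. Same scores, opposite headline, which is why it matters to know what a graph is actually dividing by.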

And it gets even weirder. Because I was looking at Shadow of the Tomb Raider tests, which run cross-platform, but run as x86 through Rosetta translation on M1 Macs. They do target Apple’s low-level Metal API, which can theoretically perform as well or even better on M1 than it does on Intel Macs, but that depends entirely on the quality of the API implementation, and how well it’s optimized for Mac compared to Windows… and wow look at all those frames Nvidia gets. But then Topaz Video Enhance AI frames on the M1 Ultra, when you take the limiters off… and…

Honestly, so what if M1 Ultra does slightly better than the 3090 on Aztec 4K off-screen and slightly worse on Wild Life Unlimited? No one who wants CUDA cores or high-end, ray-traced gaming really cares how the M1 Ultra compares anyway. And anyone who just wants massive GPU with massive RAM on their Mac… can’t even use Nvidia… much less get it there.

So, while UltraFusing two separate 32-core GPU blocks into one massive 64-core GPU Metal target with up to 128GB of RAM and 800GB/s of memory bandwidth is an unprecedented table slap of silicon nerdery, it doesn’t really change anything fundamental about either ecosystem. It’s just chum in the headlines and comments sections for people who don’t really get how benchmarking works anyway.

Mostly because so many benchmarks have now been wrapped up into neat little apps or games that literally anyone (including, terrifyingly, me!) can just download, run once and done. They spit out numbers, sometimes highly relative and abstract, with almost zero time and effort, all pretty for posting. But they don’t really tell you how to run them, what the numbers actually mean, or give you any of the context needed to interpret them. They’ve become pop culture, or what I’ve been calling Benchmark LARP, live action role-play. Especially when compared to say… what outlets like AnandTech and a few others still manage to produce, with incredible talent, and a ton of high-order-bit work and effort.

Because the ugly truth is, while running benchmarks might be easier than ever, understanding them is significantly more complex.

I’m not even talking about the simple stuff, like if you’re testing single-core perf, realizing all the M1 SoCs have the exact same cores, it’s just the bigger ones literally have way, way, n-to-the-way more of them. Or… like… if you’re comparing the M1 Ultra package size to the Intel or AMD CPU package size… when M1 Ultra is a whole entire SoC with CPU, yes, but also 64 GPU cores, 32 ANE cores, never mind the media engines, the I/O controllers… the RAM chips. Because you never want your take to be so hot it burns you.

Or if you’ve been paid for coverage by the Intels and Qualcomms, but also super salty Apple doesn’t pay for that, you’ve got to at least disclose that in your snark-tweets or you’re basically putting the PC in NPC.

But usually it’s way more complicated and nuanced. Like… don’t laugh. Testing the 13-inch M1 MacBook Pro against the Intel 10th Gen 13-inch MacBook Pro with video rendering, and realizing H.264 encoding doesn’t hit the M1 Firestorm cores or the Intel Ice Lake cores, it hits the A14-generation media engines on the M1 box and the A10-generation media engines on the T2 coprocessor on the Intel box. To test the M1 vs. Intel, you’d have to test with something like ProRes, which is still CPU-bound on those boxes. Otherwise, the only thing it’s testing is Apples vs. Apple’s older Apples.

Never mind figuring out what’s hitting efficiency cores vs. performance cores, which may matter given M1 has 4 e-cores and M1 Pro and Max have 2 e-cores but M1 Ultra has 2 times 2 e-cores. We saw that blow up spectacularly with early A14 vs. A15 hot takes, where the efficiency cores ended up being significantly faster, and the performance cores, only slightly faster but quite a bit more efficient, which along with double the system cache, just ended up making the whole die way better for battery life. That burned a lot of bloggers and re-bloggers kinda badly… and since M2 may well be based on A15, like M1 was based on A14, history could repeat itself with the next Mac mini and MacBook Air as well.

Also, what’s hitting the Apple Neural Engine or ANE cores, vs. the CPU cores, even the AMX accelerators on the CPU cores. Because the machine learning controller taps into all of them, and they all vary in scale, to lesser or greater degrees, across the M1 family. And unlike Metal and the GPU, Core ML can’t treat them as a single target, but it can dispatch between them, so where you’d expect up to 1.9x scaling on M1 Ultra graphics workloads, you can really only expect close to 1.5x scaling on M1 Ultra machine learning workloads. Similar if not the same with CPU and media engine scalability, which can handle as many compute units as you throw at them… give you close to linear scalability, until they don’t… until some gnarly bit of code or codec hits them and flattens those hockey sticks.
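One way to read those 1.9x and 1.5x figures is as a fraction of perfect doubling. The sketch below is just that arithmetic. The helper name `scaling_efficiency` is made up for illustration, and the 2.0 assumes M1 Ultra has exactly twice the relevant units of M1 Max, which holds for the GPU and ANE core counts mentioned above.

```python
def scaling_efficiency(observed_speedup: float, unit_ratio: float) -> float:
    """Fraction of ideal linear scaling achieved.

    observed_speedup: how much faster the bigger chip ran (e.g. 1.9).
    unit_ratio: how many times more compute units it has (e.g. 2.0).
    """
    if unit_ratio <= 0:
        raise ValueError("unit_ratio must be positive")
    return observed_speedup / unit_ratio


# Speedups quoted in the text; the 2.0 is the assumed Max -> Ultra doubling.
graphics = scaling_efficiency(1.9, 2.0)
machine_learning = scaling_efficiency(1.5, 2.0)

print(f"graphics: {graphics:.0%} of linear")
print(f"machine learning: {machine_learning:.0%} of linear")
```

So "up to 1.9x" is about 95% of a perfect doubling, and "close to 1.5x" is about 75%. Neither number says why the gap exists, only how big it is.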

Never mind how many other things can affect benchmarks and performance in general. Like ambient temperature. Radios, if you forget to go into Airplane Mode first. Errant processes, if you forget to reboot first. Other tasks, if you forget to make sure no other tasks are running at the same time. Even settings, if you forget to triple check they’re exactly the same between machines and tests.
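If you do end up running your own tests, a rough way to catch some of that noise is to run the same thing several times and look at the spread, not just one number. This is a minimal sketch, not anyone's official methodology. `time_runs`, `summarize`, and `placeholder_workload` are hypothetical names, and the workload is a stand-in for whatever you'd actually be measuring.

```python
import statistics
import time
from typing import Callable


def time_runs(workload: Callable[[], object], runs: int = 5, warmup: int = 1) -> list[float]:
    """Time `workload` `runs` times, after `warmup` untimed runs."""
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Median plus relative spread, so one odd run doesn't become the headline."""
    median = statistics.median(samples)
    spread = (max(samples) - min(samples)) / median if median > 0 else 0.0
    return {
        "median_s": median,
        "min_s": min(samples),
        "max_s": max(samples),
        "relative_spread": spread,
    }


def placeholder_workload() -> int:
    # Stand-in for a real test; swap in the thing you actually care about.
    return sum(i * i for i in range(200_000))


samples = time_runs(placeholder_workload, runs=5, warmup=1)
summary = summarize(samples)
print(summary)
```

A large `relative_spread` is a hint that something else (heat, background processes, radios) was changing between runs, and that the result probably isn't ready for posting yet.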

Which is why it’s so easy to mark those benches but so damn hard to do it right, and why I don’t do very much of it, not beyond superficially validating performance claims at least. I just don’t want to contribute to the LARP culture or comment toxicity, and would rather just point you to AnandTech and the other experts. But also point out that if none of those numbers are meaningful or important to you, that’s fine too.

Just simplify it all the way down. Start with the M1. Is it enough for you? For the vast majority of people, the answer is an easy yes. But if you really do need more built-in ports and more performance, step up to M1 Pro. Especially when and if Apple rolls out tweener Mac mini Pros, maybe even iMac Pros, with that as an option. It’ll cover almost everyone at that point. But if you know your workload, and you know even M1 Pro isn’t enough, and you want basically double the GPU, media engines, and memory, jump up to the M1 Max. And only… only in the rare cases where even that isn’t enough, but double everything again will be, including price, only then leap up to the M1 Ultra. Or, if you really, truly need PCIe expansion slots, wait on the Apple Silicon Mac Pro.

But if you’re concerned about the price, if you’re worried about the money it’ll be costing you instead of the money it’ll be making you… if the time it’ll be saving you or the scale it’ll be giving isn’t worth way, way more than the money it’ll be costing you, that’s a giant, neon, omega-level mutant alert. Unless you already have all the money, and just want to flex the new shiny, in which case, you spend you!