2 Avalanche high performance cores. 4 Blizzard high efficiency cores. Up to 5 Graphics Cores. 16 Neural Engine Cores. But, increasingly, it’s the non-big compute core features that are the most interesting…
Ok, so, A15 is Apple’s fifth-generation Bionic system-on-a-chip, or SoC. An SoC just means the components, like CPU, GPU, and memory aren’t all laid out on a board like… a charcuterie plate. They’re all on the same die or package, like… a sandwich. That sacrifices modularity for some very real advantages in economy and efficiency. Which… as you’ll see, is going to be a bit of a theme.
Now, the base Bionic architecture has been consistent over the last few years. High-efficiency cores, high performance cores, graphics cores, neural engine cores, and a bunch of other, much more specific silicon features to support the experiences Apple wants to deliver with every new iPhone.
But it didn’t start out that way. It started out with Steve Jobs, way back in the day, understanding that Apple had to own the technologies that would become critical differentiators for their products. Sure, he wanted the best Sushi chef from Japan for Caffe Macs, but he needed the best silicon engineers in the world for Apple silicon.
In 2007, the original iPhone launched with an off-the-shelf Samsung ARM11 processor, re-purposed from… set top boxes. But when the original iPad was set to debut in 2010, just a few years later, Apple’s first in-house processor was set to debut with it — the A4. And go into the iPhone 4 just a few months later.
And if you want a video on that, the decision to use the A4 in both the iPad and the iPhone, and why it was such a huge… quad major key to Apple’s eventual silicon dominance… including everything from S1 to M2, let me know in the comments below the like button.
Anyway, A4 was also ARM, a reduced instruction set or RISC architecture as opposed to the complex instruction set, or CISC, of x86 that Intel and AMD were using to own the PC world at the time. But Intel, who Apple had just finished transitioning to on the Mac, couldn’t make anything nearly efficient enough for devices as small as the iPhone or iPad.
So, Apple licensed ARM’s Cortex A8 design and came up with a single core CPU based on Hummingbird, 1 GHz for the iPad, 800 MHz for the iPhone 4, fabricated, or fabbed on Samsung’s 45 nanometer process.
But Apple, and more specifically Johny Srouji’s silicon team, had their sights set further out. Much further out.
In 2011, the A5 launched with dual ARM Cortex A9 cores, fabbed on Samsung’s 45 nanometer process. Again, 1 GHz in the iPad 2, 800 MHz in the iPhone 4s. Also, Apple’s first Image Signal Processor, or ISP, which would go on to be... beyond important for Apple’s upcoming work on computational photography.
Now, ARM has two different kinds of licenses. A design license, where you get access to ARM’s own cores, like Cortex. But also an instruction set architecture, or ISA license, where you get access to the instruction set ARM defines but are free to make your own, custom core designs.
In 2012, with the A6, Apple switched from licensing ARM’s Cortex designs to licensing the ARMv7 ISA, and launched their first custom cores. Dual 1.3 GHz Swift cores for the iPhone 5, fabbed on Samsung’s 32 nanometer process.
Apple’s ISA license was or became incredibly open-ended as well. Where they could pretty much do whatever they wanted to do. Something we and the industry would see pay off real soon…
For their designs, Apple went wide and slow… relatively speaking. More cores. More bandwidth. Lower clock-speeds. Which let them handle more instructions at lower power and with much less heat. Something that’s also going to come up again for the A15. Also highly out-of-order and superscalar.
Because it was never about raw, brutish performance, it was the efficiency, dammit. Like… instead of trying to force everything out a firehose in high-pressure linear bursts, they envisioned a river. Instead of a 2 lane highway with a ton of super cars stuck in traffic, a 4 lane or 8 lane boulevard of SUVs with way more throughput.
Apple also started pushing ARM for a 64-bit instruction set. Hard. Something the existing merchant silicon vendors like Qualcomm and even in-house fabs like Samsung just hadn’t been doing. Because they made money on the chips, and the longer they could keep the same 32-bit designs on the market, the more money they’d make. Apple didn’t care about profit or loss at the chipset level. They made their money off the whole device. So they had no reason to slow walk their designs. They had every reason to run.
And that’s when we got the shot heard around the silicon world — in 2013 with the A7 and its 1.3 GHz Cyclone cores, on the ARMv8 instruction set — on ARM64, all fabbed on Samsung’s 28 nanometer process. Not just a more modern architecture but a cleaner, more targeted one that would let Apple really start scaling for the future. And the key thing here is — not because of Apple using ARM, but because of Apple starting to drive ARM.
To the point where not only were Qualcomm and Samsung totally blindsided, ARM themselves didn’t even have 64-bit core designs ready to license yet. And Apple was already racing ahead.
In 2014, the A8 didn’t go with an entirely new CPU architecture, but with an enhanced version of Cyclone. The 1.4 GHz dual Typhoon cores were notable because they switched from Samsung’s process to Taiwan Semiconductor Manufacturing Company, or TSMC’s, 20 nanometer process for the big and bigger iPhone 6 and iPhone 6 Plus.
In 2015, Samsung came back into the mix briefly for the A9 and its dual 1.9 GHz Twister cores. Supply constraints led Apple to dual-source for the iPhone 6s and 6s Plus, with some of the chipsets fabbed on Samsung’s 14 nanometer process, some on TSMC’s 16 nanometer process.
Despite it sounding like Samsung’s process was smaller, and therefore better, TSMC’s process ended up making cores that beat Samsung out on power efficiency. Which highlights a few critically important aspects of silicon fabs: process size isn’t a physical convention, it’s a marketing convention. And different fabs, even on the same or better sounding processes, might not end up producing the same performance or efficiency. Thus ended Samsung as a fab for Apple silicon.
But beyond that, Apple benefitted from generation after generation of process shrinks. From 45 nanometer on the A4 to 16 nanometer on the A9. And when the process shrinks, it means you can either fit the same number of transistors in a smaller space, which makes for less heat and even better efficiency, or you can fit more transistors in the same amount of space and heat, which gives you a bigger transistor budget to spend on faster cores or, as we’ll soon see, a plethora of other features.
In 2016, the A10 was a milestone. It debuted a brand new “Fusion” architecture. Akin to what ARM markets as big.LITTLE. Basically fusing dual high-efficiency 1 GHz Zephyr cores with dual high-performance 2.3 GHz Hurricane cores, fabbed on Taiwan Semiconductor Manufacturing Company, or TSMC’s 16 nanometer process.
The idea of the efficiency cores, or e-cores, was, as the performance cores, or p-cores got bigger and faster, they wouldn’t leave a giant, battery-draining gap beneath them for tasks that didn’t need cores anywhere nearly that big and fast. And a new Apple Performance controller, the secret sauce, would figure out which tasks would go to which pair of cores. But with A10, only the performance cores or the efficiency cores could be used at any given time, because Fusion.
Then, in 2017, we got the A11 and the Bionic architecture, which is what the A15 still uses today. It de… fused the new quad 1.6 GHz Mistral high-efficiency cores and dual 2.4 GHz Monsoon high-performance cores, which meant any or all of the e-cores and p-cores could be used separately or together for any task at any given time.
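To make the Fusion versus Bionic difference concrete, here’s a minimal sketch in Python. It’s a toy model, not Apple’s performance controller: the function name `usable_cores`, the mode strings, and the simple heavy-or-light load flag are all assumptions made up for illustration.

```python
def usable_cores(mode: str, p_cores: int, e_cores: int, heavy_load: bool) -> dict:
    """Toy model of which cores can be active at once (hypothetical)."""
    if mode == "fusion":
        # A10-style: only one cluster runs at a time, never both.
        if heavy_load:
            return {"p": p_cores, "e": 0}
        return {"p": 0, "e": e_cores}
    if mode == "bionic":
        # A11-style: e-cores and p-cores can be active together.
        return {"p": p_cores if heavy_load else 0, "e": e_cores}
    raise ValueError(f"unknown mode: {mode}")


# A10 Fusion: 2 p-cores + 2 e-cores, one pair at a time.
a10_heavy = usable_cores("fusion", p_cores=2, e_cores=2, heavy_load=True)
# A11 Bionic: 2 p-cores + 4 e-cores, all six available together.
a11_heavy = usable_cores("bionic", p_cores=2, e_cores=4, heavy_load=True)
```

Under heavy load the toy A10 gets 2 cores and the toy A11 gets all 6, which is the whole point of de-fusing.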
In 2018, Apple built on Bionic with the A12 and its quad 1.6 GHz Tempest high-efficiency cores, and dual 2.5 GHz Vortex high-performance cores, fabbed on TSMC’s 7 nanometer process. In 2019, they followed with the A13 and its quad 1.7 GHz Thunder high-efficiency cores, and dual 2.7 GHz Lightning high performance cores, fabbed on TSMC’s second generation 7 nanometer process. Which, for the first time, included dedicated Apple machine-learning accelerator, or AMX, blocks.
Then, in 2020, just last year, we got to A14 Bionic, with quad 1.8 GHz Icestorm high-efficiency cores and dual 3.1 GHz Firestorm high-performance cores, fabbed on TSMC’s 5 nanometer process. For the iPad Air 4 and iPhone 12. And… and… which would also form the foundation of the M1 in the first generation of Apple Silicon Macs and the current iPad Pro.
And now we have A15 Bionic, with quad Blizzard high-efficiency cores, probably still just under 2 GHz, and dual 3.2 GHz Avalanche high-performance cores. At least in the iPhone 13. They’re down clocked to 2.9 GHz on the iPad mini, maybe for design reasons, maybe just binned for yield and supply reasons.
Weirdly, Apple didn’t do their typical year-over-year comparisons between the A15 and A14, instead choosing to claim 50% better CPU perf over the competition, which is almost certainly Qualcomm’s Snapdragon 888.
But, if you do the synthetic benchmark math, those Avalanche p-Cores come out to roughly 10% faster single core perf than last year’s Firestorm p-Cores and just under 20% for multi-core. It’s not the leaps and bounds we saw in the early years when Apple moved to fully custom cores, or added cores, or benefited from process shrinks. But that Apple moved from… a song of Ice and Firestorm… to Avalanche and Blizzard, or double the cold codenames, might be less coincidence and more of a hint at how the bandwidth increases have once again enabled a leap forward not in terms of pure performance but in terms of performance per watt.
Before we can talk about that, though, we have to talk about the…
Apple’s been doing GPU hardware acceleration since before it was fashionable, leaning heavily on it for things like interface animations and, back in the day, making sure the original iPhone ran at a solid, consistent 60 frames per second. Steve Jobs insisted on it. But Apple didn’t get into custom GPUs for a while.
Not even with the 2010 A4, which used an Imagination PowerVR SGX 535 and they stayed with PowerVR even all the way through the 2016 A10 Fusion, which was based on a hexacore PowerVR 7XT GT7600 Plus. Based on, because Apple started customizing the GPU with their own, in-house shader cores and half-precision floating point to increase performance and power efficiency. Especially for new features like the depth effect computational photography behind the iPhone 7’s Portrait Mode.
In 2017, the A11 took it not just a step further but a shader-fueled leap further, with the first fully custom 3-core GPU, the Apple G10. The A12 took that custom GPU to 4 cores with the G11P.
The A13 and A14 kept that same 4-GPU core configuration, and so does this year’s A15. Kinda!
The iPhone 13 Pro (and iPad mini) GPU, presumably the G14, has a 5th core that gives the Pro a whopping 55% increase in Metal performance over the A14. Again, according to synthetic benchmarks. What we see in the real world will vary depending on workloads.
But like I said, GPU accelerates a lot of the iOS experience. And Apple’s making sure all those cores are being fed by double the already beefy, beefy system cache in the A15 as well, which should mean 32 MB this year. And, yeah, hot damn. Or rather, cold damn.
Anyway, with the A15 GPU, Apple’s also added support for lossy texture compression. The A12 previously added lossless support, but lossy support means half the memory for the same resolution textures or, better still, double the resolution for the same memory.
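Here’s the back-of-the-envelope math on that, as a small Python sketch. The 2048 by 2048 texture, the 4 bytes per texel (RGBA8), and the helper name `texture_bytes` are assumptions for illustration; the 2:1 ratio is just the “half the memory” figure above, not a measured number.

```python
def texture_bytes(width, height, bytes_per_texel=4, compression_ratio=1.0):
    """Approximate memory for one texture level (hypothetical helper)."""
    return width * height * bytes_per_texel / compression_ratio


# Assumed example: a 2048 x 2048 RGBA8 texture.
uncompressed = texture_bytes(2048, 2048)                       # 16 MiB
lossy = texture_bytes(2048, 2048, compression_ratio=2.0)       # 8 MiB

# Or keep the memory budget and spend it on more texels instead.
texels_before = 2048 * 2048
texels_in_same_budget = uncompressed / (4 / 2.0)

print(uncompressed / 2**20, lossy / 2**20)       # 16.0 8.0
print(texels_in_same_budget / texels_before)     # 2.0
```

Note that “double the resolution” here means double the texel count, not double the width and the height, which would be four times the texels.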
Also, sparse depth and stencil textures, which save memory by not rendering textures below UI elements, for example, or shadows that fall outside the camera area. And SIMD shuffle and fill.
Now, we still need to talk about the A15’s most impressive new capability, but there’s one more set of cores to dive into first!
A15 Neural Engine
In the beginning, in the early days of hardware accelerated machine learning, Apple relied on the GPU. But in 2017, for the A11, they debuted a new, dual core Neural Engine, or ANE, to better handle the massive processing required for all the new algorithms and adversarial neural networks behind new features like Face ID on the iPhone X, and in a faster and more efficient way than the general purpose GPUs ever could. That’s why Apple called this architecture Bionic, because of Steve Austin. Not Stone Cold, the 6-million dollar man. Wikipedia it.
And what was particularly interesting here was that, at a time when a lot of pundits were piling on the “Apple is way behind on artificial intelligence”… pile on, Apple had been busy implementing it not only in software for things like battery optimization, but baking it right into the silicon starting 2-3 years earlier.
It was more of a proto-neural engine and wasn’t accessible to developers yet, but that changed in 2018 with A12. It went from 2 to 8 cores, and from 600 billion operations per second to 5 trillion. In 2019, with the A13 and the AMX blocks added to the CPU, Apple also added a Machine Learning controller that, similar to the performance controller, dispatches tasks to the AMX, GPU, and the ANE in real time. The secret… saucier?
In 2020 with the A14, Apple doubled the Neural Engine count again, going to 16 cores and 11 trillion operations per second.
And now, with the A15, they’re sticking with 16 cores but have increased the number of operations per second by over 40%, from 11 trillion to almost 15.8 trillion.
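If you want to check that math, here’s a quick Python sketch using only the Neural Engine figures quoted in this section, in trillions of operations per second. The dictionary and function names are just illustrative.

```python
# Neural Engine throughput quoted above, in trillions of operations per second.
ane_tops = {"A11": 0.6, "A12": 5.0, "A14": 11.0, "A15": 15.8}


def percent_increase(old, new):
    """Percentage growth from old to new."""
    return (new - old) / old * 100


a14_to_a15 = percent_increase(ane_tops["A14"], ane_tops["A15"])
a11_to_a12_factor = ane_tops["A12"] / ane_tops["A11"]

print(round(a14_to_a15, 1))         # 43.6
print(round(a11_to_a12_factor, 1))  # 8.3
```

So “over 40%” checks out at roughly 43.6%, and the earlier A11 to A12 jump was more than 8x.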
And that makes the kind of sense that does, given the ANE is what Apple’s leaning on for new A15 features like Cinematic Mode, which applies bokeh and rack focus to videos. Unlike Portrait Mode, which uses depth data to create a segmentation mask for a still frame, then applies a custom lens model to it, Cinematic Mode has to process not only the current frame, but adjacent frames before and after it, to make sure the bokeh stays consistent and isn’t jumping around, and the rack focus is not only detecting things like shifts in gaze, but moving smoothly between them. Even when a new gaze is coming into a shot from outside the crop, so it can anticipate and lock on immediately. Again, all in real time, in the viewfinder, so what you shoot is as close as possible to what you get. That’s some extreme silicon heavy lifting.
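To get a feel for why looking at adjacent frames keeps things from jumping around, here’s a minimal Python sketch of temporal smoothing. To be loud about it: this is not Apple’s algorithm. It’s a plain centered moving average, and the function name `smooth_focus`, the `radius` parameter, and the sample depth values are all made up for illustration.

```python
def smooth_focus(depths, radius=2):
    """Centered moving average over per-frame focus depths (toy example)."""
    smoothed = []
    for i in range(len(depths)):
        # Look at frames before and after the current one, clamped at the ends.
        lo = max(0, i - radius)
        hi = min(len(depths), i + radius + 1)
        window = depths[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical focus depths in meters, with a one-frame glitch in the middle.
raw = [2.0, 2.0, 5.0, 2.0, 2.0]
smoothed = smooth_focus(raw, radius=1)
print(smoothed)  # [2.0, 3.0, 3.0, 3.0, 2.0]
```

The one-frame spike to 5.0 gets spread out and capped at 3.0 instead of snapping the focus. Doing anything like this live means holding on to a few frames at a time, which is part of why it’s such heavy lifting.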
Especially when you consider how long it took phones like Google’s Pixel to begin doing even basic Portrait Mode previews.
But it’s also more than that. Because for a while now, Apple’s been recognizing that the era of big, general-purpose compute cores is behind us. Sure, there’ll be future process shrinks, like TSMC’s 3 and… who knows Quantum Realm nanometer, and architectural advancements, and instruction set bonuses, but there will also be a point where the laws of physics, the thermal envelope of the iPhone, and the need to avoid browning out iPhone-sized batteries will all come into play.
Which is why, I think, Apple is increasingly spending their transistor budget, especially this year, not on the traditional big cores but on specific features.
Everyone knows about the Image Signal Processor, or ISP by now. For the vast majority of people, cameras are probably the second most important feature of any modern phone. Apple added it with the A5 for things like auto white balance, focus, face detection… all the basics. But it’s gotten more and more sophisticated over the years, especially in the age of image stacking and bracketing, and, yeah, computational photography.
With the A12 in the iPhone XS, Apple introduced Smart HDR, their first, more sophisticated imaging pipeline. With the A13 in the iPhone 11, Smart HDR 2 and Night Mode for extreme low-light and astro-photography. With the A14, Smart HDR 3 and Deep Fusion, for indoor lighting. And they tied it into the performance controller and unified memory so the pipeline could round-trip to the GPU, ANE, anything else it needed to accomplish the increasingly complex features Apple was providing.
Unified memory architecture allowed those tasks to not only be assigned to the best possible cores at any given time, but even round-tripped between the different cores without wasting time and energy on copying, which significantly reduced overhead and increased capability. That’s the big advantage of SoCs… of silicon sandwiches.
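As a loose analogy for that no-copy round trip, here’s a tiny Python sketch. It says nothing about how Apple’s memory system is actually built: `bytes(...)` stands in for copying into a separate memory pool, `memoryview(...)` stands in for two engines sharing one pool, and the variable names are made up.

```python
# One "frame" sitting in memory, written by a pretend ISP.
frame = bytearray([10, 20, 30, 40])

# Separate memory pools: the GPU gets its own copy of the data.
copy_for_gpu = bytes(frame)

# Unified memory: the GPU looks at the very same bytes, no copy made.
view_for_gpu = memoryview(frame)

# The ISP updates the frame in place.
frame[0] = 99

print(copy_for_gpu[0])  # 10, the copy is stale and would need re-copying
print(view_for_gpu[0])  # 99, the shared view sees the update immediately
```

Every copy you skip is time and energy you keep, and that adds up fast when the pipeline is bouncing between several different engines.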
This year, with the A15, the ISP and Smart HDR 4 can handle semantic rendering for multiple people in the same shot. In other words, identify all the different faces, skin tones, textures, and other elements in the scene, and process them separately and individually to provide the best results for each one.
Apple’s also been using custom encode and decode blocks for hardware accelerated video for years as well. For H.264, then H.265, aka HEVC, then VP9 for YouTube. This year, they’re all new, and in addition to improving performance efficiency, there’s hardware acceleration for ProRes video. Something Apple was using a reprogrammable ASIC Afterburner card for on the Mac Pro just a couple years ago.
There’s also a new custom NVMe storage controller. The original one debuted in 2015 with the A9. It was brought over from the Mac and used PCIe internally and SSD, instead of embedded flash chips. The goal was to make sure every photo in every burst and every frame in every video was safely and properly recorded, the whole time, without dropping a frame, at any time. Something other phones, including and especially Google’s Pixel, struggled with for years.
The new version is… just on Hulk serum. Meant to support Cinematic Video, along with the depth maps and focus data that you can use to edit the aperture and rack focus in post, and not only sustain 10-bit 4K HDR ProRes data, at 6 GB per minute (6 GB per minute!), for extended periods of time, but deliver all the throughput needed to record all that video in real time.
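To put 6 GB per minute in perspective, here’s the arithmetic as a short Python sketch. It assumes decimal gigabytes (1 GB = 1,000,000,000 bytes), and the 128 GB capacity is a hypothetical example that ignores the OS and everything else already on the phone. The function names are illustrative.

```python
GB = 1000 ** 3  # decimal gigabyte, an assumption
MB = 1000 ** 2


def sustained_rate_mb_per_s(gb_per_minute):
    """Convert a per-minute data rate into megabytes per second."""
    return gb_per_minute * GB / 60 / MB


def storage_minutes(capacity_gb, gb_per_minute=6):
    """Minutes of footage that fit in a given capacity (ignores other data)."""
    return capacity_gb / gb_per_minute


rate = sustained_rate_mb_per_s(6)
print(rate)                            # 100.0 MB per second
print(rate * 8)                        # 800.0 megabits per second
print(round(storage_minutes(128), 1))  # 21.3 minutes in a hypothetical 128 GB
```

That’s 100 MB written every second, sustained, without dropping a frame.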
And that’s not the type of innovation that typically gets called out on stage or in comments, but it’s the kind that makes a huge difference every day, whether it’s instant shutter making sure you get exactly the photo of your kid or pet, in exactly the fun moment you were hoping to, or you have usable video for the client work you’re being paid for or social media work that’s paying you.
Like Portrait Mode, Face ID, Spatial Audio and Dolby Atmos, HDR and Dolby Vision, it’s an example of Apple’s silicon team working with the hardware and software teams, years in advance, not to deliver spec bumps — though they’ll certainly brag about those too and at every opportunity — but specific features and experiences they think can only be delivered in that way. Or, at least, best delivered and differentiated in that way.
The biggest example with A15, is battery life. A15 manages to deliver 10%, 20%, 55% better performance across single, multi, and graphics, but it does it while also helping provide for 1.5 hours of additional battery life on the iPhone 13 mini and iPhone 13 Pro, and 2.5 hours on the iPhone 13 and iPhone 13 Pro Max.
It’s literally making this year’s mini last longer than last year’s non-mini, midi, whatever! And that’s just for mixed, everyday workloads. For highly optimized workloads like streaming hardware accelerated video, you’re getting 3 extra hours for this year’s mini over last year’s, and a brain boggling 13 extra hours for the Max.
Now, sure, the batteries are slightly bigger this year. Software optimizations are better. The Qualcomm X60 modem is on Samsung’s 5 nanometer process, which isn’t quite as good as TSMC’s but is way better and more efficient than last year’s X55 modem, especially for 5G. Also, the iPhone 13 Pro and Pro Max have adaptive refresh displays now, which can ramp up to 120Hz for high frame rate scrolling or gaming, but also ramp down to 10Hz for high efficiency idling. And that’s driven by the A15’s new Display Engine and guided by its new always-on touch controller, which adjusts that refresh rate in real-time based on how fast your finger is moving on the display at any given time.
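Here’s a rough Python sketch of the idea of tying refresh rate to finger speed. Only the 10Hz and 120Hz endpoints come from the numbers above. The linear mapping, the points-per-second unit, the `full_speed` threshold, and the function name `refresh_rate` are assumptions for illustration, not how the Display Engine or touch controller actually decides.

```python
MIN_HZ, MAX_HZ = 10, 120  # refresh range quoted above


def refresh_rate(finger_speed, full_speed=1000.0):
    """Map finger speed (hypothetical points per second) to a refresh rate."""
    # Clamp to 0..1 so a still finger idles low and a fast flick maxes out.
    fraction = min(max(finger_speed / full_speed, 0.0), 1.0)
    return MIN_HZ + fraction * (MAX_HZ - MIN_HZ)


print(refresh_rate(0))     # 10.0, idle
print(refresh_rate(500))   # 65.0, slow scroll
print(refresh_rate(5000))  # 120.0, fast flick, clamped at the maximum
```

The efficiency win is all at the bottom of that range: every second the display sits near 10Hz instead of 120Hz is power it isn’t spending.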
And that’s kinda Apple silicon’s big secret — they don’t focus on performance. They focus on efficiency and the performance comes from that.
They want to be able to deliver most tasks, most of the time, at the lowest voltage and frequency, but still be ready to ramp up, to spike even, if and when you need it. And not just now, the year when the chipset is introduced, but for the 5 or more years that follow, when new versions of iOS and apps will deliver increasingly valuable and demanding features. That’s why the Apple A9 from the iPhone 6s and original SE, which shipped with iOS 9, will still be getting iOS 15 this year. And why the A15 Bionic, which ships with iOS 15, will almost certainly be getting iOS 19, 20, maybe 21 one day.
In other words, barely supporting smooth scrolling on the current software stack isn’t cool, being able to support it 3-5 years from now, with heavier future stacks… that’s cool. That’s the customer-facing advantage of Apple’s chipset lead — the headroom it delivers for us.
It’s why Apple’s silicon team has never really cared about MAXIMUM PERF in terms of a spec sheet number, especially not if it comes at the expense of maximizing efficiency. They’ll even go to the time, effort and expense of swapping out components if they can find a version that’s more efficient. Because of the efficiency, even modest increases in performance end up feeling significant.
They’re not architecting for the number, for the highest right point on the graph, but for the experience.
It’s why Apple’s philosophy is to move the whole chip forward, all the silicon IP forward, every generation. Year over year, to leave no corner untouched. And a lot of that is informed by looking at the kind of apps people are using, both Apple apps and App Store apps, what the OS, the operating system teams inside Apple are planning for the next few years out, and what sorts of tasks and workloads seem to be coming onto the market, or they expect will be coming onto the market. The trends they’re anticipating.
And that relentless drive towards efficiency isn’t even just at the chipset level, but for the whole entire architecture. Just like Apple extended the A14 into M1 for the first generation of ultra-low power custom Silicon Macs, it’s not hard to imagine them extending A15 into M2 for a second generation. Increasing the amount of Avalanche pCores and graphics cores, including those Thunderbolt controllers and x86 translation accelerators, and supporting new features for the next MacBook Air and 24-inch iMac, and the next iPad Pro.
But more on that in a follow up…