Intel Lunar Lake ‘Skymont’ E-core Microarchitecture
Lion Cove delivers an impressive number of improvements to the core microarchitecture, but Skymont sees an even bigger advance over the prior gen with a 38% IPC improvement in integer workloads and a 68% IPC gain in floating point work, fueling an up to 2x increase in single-threaded performance and up to 4x more peak performance in multi-threaded workloads than the Meteor Lake LP E-cores. Intel also doubled throughput in vectorized AVX and VNNI workloads.
The Skymont architecture marks Intel’s third E-core design for x86 hybrid processors, following Gracemont in Alder Lake and Crestmont in Meteor Lake. (That’s not counting theTremont cores in Lakefield, which seem to be more of a proof of concept in hindsight.) The Meteor Lake design employed two E-cores placed in the SoC tile for extreme low-power workloads, with eight additional E-cores on the compute tile along with the P-cores (two quad-core clusters). With Lunar Lake, Intel employs a single quad core cluster on the compute tile to addresses both the low power E-core and high-power E-core roles with an expanded dynamic range.Intel optimized the branch prediction engine by incorporating a 96-instruction byte parallel fetch to feed the decode engine. The decode clusters are also expanded from 6-wide (2x3) with Crestmont to 9-wide (3x3) with Skymont, so any core in the new design can sustain nine instruction decodes per clock. Skymont also now employs nanocode to enable parallel microcode generation to allow the three decode clusters to execute in parallel more frequently. Micro-op capacity was also increased from 64 to 96 entries to add more buffering between the front end and back end.Skymont has an 8-wide allocation in the out of order engine, an increase from Crestmont’s 6-wide allocation. Skymont also expands to a 16-wide retire, a doubling over Crestmont’s 8-wide retire, to free up resources as quickly as possible after stalls, which improves power and area efficiency. The out of order window is 60% larger than the prior-gen, and the architecture has bigger register files, deeper reservation stations, and deeper load and store buffering. Parallelism is boosted by employing 26 dispatch ports, including eight ALUs, three jump ports, and support for three loads/cycle.Intel targeted a 2X improvement in vector performance, made by going from the two 128-bit FP and SIMD vector pipes to four with Skymont. Other improvements to the vector engine targeted latency reductions and adding support for floating point rounding. Intel also enhanced its load/store engine with a few enhancements listed in the slide.Previous E-core clusters had a shared 2MB L2 cache, but that has now been expanded to 4MB with double the L2 bandwidth. L1 to L1 transfer bandwidth was also improved.
The final results are impressive, with the aforementioned 38% and 68% improvement in single-threaded integer and floating-point performance, though this is notably compared to the low-power (LP) e-cores in the Meteor Lake SoC, not the standard quad-core cluster on the compute die. For perspective, the LP e-cores only have 2MB of cache compared to the standard e-core cluster with 4MB of cache. Again, Intel gives itself a rather large +/- 10% margin of error.Skymont’s power and single-threaded performance curve is vastly enhanced over Crestmont, but the comparisons are once again being made to the low-power Meteor Lake E-core instead of the full E-core. Compared to Crestmont’s peak performance, Skymont consumes one-third the power to deliver the same level of performance. However, it has more gas in the tank with 1.7X more performance at the same power level. Overall, Skymont’s peak single-threaded performance is twice that of the Crestmont LP E-cores.The multi-threaded power/performance metrics are skewed, as Intel compares Skymont’s quad-core cluster to Meteor Lake’s dual-core low-power E-core cluster instead of comparing it to the quad-core cluster. As such, we would expect to see half the stated advantages in these areas over the standard Meteor Lake quad-core cluster.Intel also provided comparisons for Skymont vs Raptor Lake’s P-core, which uses the Raptor Cove architecture. Intel claims a 2% IPC advantage for Skymont in integer and floating point.
Intel’s power-to-performance slides for the Skymont/Raptor Cove comparisons are easily misconstrued. In the last two slides, we can see that Intel zoomed in on an area of the performance curve that it says is the proper envelope for multi-threaded acceleration on a low power island. That yields the final slide, where Intel says that Skymont consumes 0.6X the power at the same performance as Raptor Cove, or 1.2X the performance at the same power. Again, we see the same high margin of error, so take these extrapolated comparisons with a healthy serving of salt.
Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.
Current page:Intel Lunar Lake ‘Skymont’ E-core Microarchitecture
Paul Alcorn is the Editor-in-Chief for Tom’s Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.