Memory Performance Across CPU Microarchitectures

The modern evolution of CPUs and memory shows that Moore's Law still holds: transistor counts have kept increasing year after year. Clock frequency, however, has not enjoyed the same growth because of thermal and power constraints, and has barely doubled over the last decade. From 2000 to 2009, CPU clock speeds went from 1.3 GHz to 2.8 GHz, while transistor counts grew from 37.5 million to 904 million over the same period. In other words, a higher transistor count does not automatically translate into higher raw CPU speed. Frequency climbed slowly but steadily until around 2004, when heat build-up in the chips led Intel to abandon the pursuit of ever-higher clock speeds and move to designs with multiple processors (cores) on the same chip; the rest of the industry soon followed.

For memory subsystems, Moore's Law originally applied to random access memory (RAM); it has since been generalized to CPUs and to disk storage capacity as well. Disk capacity in particular has improved by leaps and bounds, roughly 100-fold over the last decade. Disks now spin three times faster and are five times smaller than they were 15 years ago, yet the data rate has improved only about 30-fold over the same period.

Processor speed and core count are important factors when designing a new server platform. On virtualization platforms, however, the memory subsystem can have an equal or sometimes even greater impact on application performance than processor speed. Application performance is also tied to the QuickPath Interconnect (QPI) speed and to the Non-Uniform Memory Access (NUMA) topology.
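
As a concrete illustration of the NUMA side of that statement, the short Python sketch below reads the node distance matrix that Linux exposes under /sys/devices/system/node. The sysfs layout is Linux-specific and the two-node figures in the comments are assumed values for a typical dual-socket server, not measurements from this article; treat it as a minimal sketch rather than a finished tool.

    import glob
    import os
    import re

    def numa_distances():
        """Return {node id: list of distances to every node} from Linux sysfs."""
        distances = {}
        for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
            node_id = int(re.search(r"node(\d+)$", node_dir).group(1))
            with open(os.path.join(node_dir, "distance")) as f:
                # Each file holds one space-separated row of the distance
                # matrix, e.g. "10 21" on an assumed two-socket machine.
                distances[node_id] = [int(d) for d in f.read().split()]
        return distances

    if __name__ == "__main__":
        for node, row in numa_distances().items():
            # A remote distance roughly double the local value means memory
            # accesses to the other socket cross the QPI link and pay a
            # latency penalty that shows up in application performance.
            print(f"node{node}: {row}")

On a typical dual-socket machine this prints something like node0: [10, 21] and node1: [21, 10], confirming that remote memory is roughly twice as "far" as local memory.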

Memory configuration, or memory population, has a direct impact on server and application performance. The CPU type and generation determine which memory configurations are possible and how they perform, so there is a wide variety of options to weigh when specifying a new server. Memory channels, memory bus frequency, and DIMM rank are just a few of the variables you will encounter. The number of DIMMs used and how they are populated on the server board affect both performance and the maximum supported memory capacity, and all of these factors should be taken into consideration when evaluating system or application performance.
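
To make the effect of channel count and bus frequency concrete, the sketch below applies the standard DDR bandwidth formula (transfers per second × 8 bytes per 64-bit transfer × populated channels). The DDR3-1333 and DDR3-1066 figures are illustrative values chosen here, not configurations taken from this article.

    def peak_bandwidth_gb_s(mts, channels, bus_bytes=8):
        """Peak theoretical bandwidth in GB/s for a DDR memory configuration.

        mts       -- transfer rate in megatransfers per second (1333 for DDR3-1333)
        channels  -- number of populated memory channels
        bus_bytes -- bytes per transfer; a standard 64-bit channel moves 8 bytes
        """
        return mts * 1_000_000 * bus_bytes * channels / 1e9

    if __name__ == "__main__":
        # Illustrative example: four channels of DDR3-1333 peak at ~42.7 GB/s,
        print(peak_bandwidth_gb_s(1333, channels=4))
        # while down-clocking the same channels to DDR3-1066 (as happens when
        # mixed-frequency DIMMs force the lower speed) drops the ceiling by ~20%.
        print(peak_bandwidth_gb_s(1066, channels=4))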

Diagnosing and troubleshooting memory issues in enterprise server configurations is an important process that can help prevent unnecessary replacement of hardware components. A sound troubleshooting methodology helps distinguish good components from bad ones and identifies component mismatches that may allow the system to boot but not run optimally. For example, when DIMMs of different frequencies are mixed, the memory controller down-clocks all of them to the lower frequency, stranding the performance of the faster DIMMs. Standard diagnostic tools usually help in troubleshooting memory problems by isolating the specific DIMMs causing the problem, which avoids replacing unaffected DIMMs or, in some cases, entire banks of memory. In addition, systematic troubleshooting can help determine whether a firmware or other software update can resolve a problem without replacing hardware.

Many server manufacturers provide a diagram showing the proper order in which to populate DIMMs so the system will pass Power On Self Test (POST) and perform optimally across the memory controller, but these diagrams do not note the performance degradation that occurs when DIMMs of different ranks or frequencies are mixed. Since this mismatched state leads to performance differences between servers of the same model within a given application cluster, we wanted to evaluate what those differences are so we can detect them more accurately across the larger data center environment. While optimal memory placement is recommended, the upgrades and reconfigurations that result in mismatches are common in a real-world environment.
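
One way to spot such mismatches from the operating system, without opening the chassis, is to parse the SMBIOS memory-device records reported by dmidecode. The Python sketch below groups installed DIMMs by reported speed and rank and flags mixed populations. It assumes a Linux host, root privileges to run dmidecode, and the field labels used by recent dmidecode releases (older versions label some fields differently, for example "Configured Clock Speed"), so treat it as a starting point for fleet sweeps rather than a finished diagnostic.

    import subprocess
    from collections import defaultdict

    def installed_dimms():
        """Parse `dmidecode -t memory` into one dictionary per DIMM slot."""
        out = subprocess.run(["dmidecode", "-t", "memory"],
                             capture_output=True, text=True, check=True).stdout
        dimms, current = [], None
        for line in out.splitlines():
            if line.strip() == "Memory Device":
                current = {}
                dimms.append(current)
            elif current is not None and ":" in line:
                key, value = line.split(":", 1)
                current[key.strip()] = value.strip()
        # Empty slots report "No Module Installed" for Size, so keep only
        # entries whose Size ends in MB or GB.
        return [d for d in dimms if d.get("Size", "").endswith(("MB", "GB"))]

    if __name__ == "__main__":
        by_speed, by_rank = defaultdict(list), defaultdict(list)
        for d in installed_dimms():
            # Field names ("Speed", "Rank", "Locator") follow current dmidecode
            # output and may vary with SMBIOS/dmidecode versions.
            by_speed[d.get("Speed", "unknown")].append(d.get("Locator", "?"))
            by_rank[d.get("Rank", "unknown")].append(d.get("Locator", "?"))
        if len(by_speed) > 1:
            print("Mixed DIMM speeds detected:", dict(by_speed))
        if len(by_rank) > 1:
            print("Mixed DIMM ranks detected:", dict(by_rank))

Run across a fleet, a report like this makes it straightforward to find servers whose memory was left in a mismatched state after an upgrade or reconfiguration.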

