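NUMA (non-uniform memory access) is a computer memory architecture in which memory access time depends on the distance between the processor and the memory holding the data. Memory attached to the processor's own node is faster to reach than memory attached to another node.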

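Processor caches can be used to reduce the variance in access latencies: an access that hits in the cache does not go to memory at all, so it costs the same whether the data lives on a local or a remote node.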
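
The effect of caching on the latency gap can be sketched with a little arithmetic. The example below is only an illustration: the function name effective_latency and all latency figures are assumptions made up for this sketch, not measurements of any real machine.

```python
def effective_latency(hit_rate, cache_ns, memory_ns):
    """Average latency when a fraction hit_rate of accesses is served by the cache."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


# Hypothetical figures, chosen only for illustration.
CACHE_NS, LOCAL_NS, REMOTE_NS = 5, 100, 200

# Without a cache, every access pays the full local or remote memory latency.
gap_without_cache = REMOTE_NS - LOCAL_NS

# With a 90% hit rate, only the misses see the local/remote difference.
gap_with_cache = (effective_latency(0.9, CACHE_NS, REMOTE_NS)
                  - effective_latency(0.9, CACHE_NS, LOCAL_NS))
```

Under these assumptions the gap between local and remote data shrinks in proportion to the miss rate, which is why a good hit rate hides much of the non-uniformity.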

Optimising NUMA

Interleaved memory allocation

  • Memory is placed in a round-robin fashion across all the nodes
  • Ensures memory access times are uniform on average

Node-local memory allocation

  • Assumes the memory is likely to be accessed by threads running on cores in the same NUMA region
  • Allocates memory close to the processor which executes the malloc()
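
The difference between interleaved and node-local placement can be sketched as a small simulation. This is a toy model rather than a real allocator: the function names, the 4-node machine, and the 100 ns / 200 ns latencies are all assumptions chosen for illustration, and every page is assumed to be accessed exactly once.

```python
def interleaved_placement(num_pages, num_nodes):
    """Place page i on node i mod num_nodes (round-robin across all nodes)."""
    return [page % num_nodes for page in range(num_pages)]


def node_local_placement(num_pages, allocating_node):
    """Place every page on the node of the processor that allocated it."""
    return [allocating_node] * num_pages


def average_latency(placement, accessing_node, local_ns, remote_ns):
    """Mean latency seen from accessing_node, with one access per page."""
    total = sum(local_ns if node == accessing_node else remote_ns
                for node in placement)
    return total / len(placement)


# Hypothetical machine: 4 nodes, 100 ns local access, 200 ns remote access.
NUM_NODES, LOCAL_NS, REMOTE_NS = 4, 100, 200

interleaved = interleaved_placement(8, NUM_NODES)
node_local = node_local_placement(8, allocating_node=0)

# Average latency as seen from each node in turn.
interleaved_latencies = [average_latency(interleaved, n, LOCAL_NS, REMOTE_NS)
                         for n in range(NUM_NODES)]
node_local_latencies = [average_latency(node_local, n, LOCAL_NS, REMOTE_NS)
                        for n in range(NUM_NODES)]
```

In this model every node sees the same average latency under interleaving (175 ns), while node-local placement gives the allocating node the best case (100 ns) and every other node the worst case (200 ns). That is why node-local placement pays off only when the assumption about where the accessing threads run actually holds.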

Distributed memory issue: each processor has its own memory, so a processor that needs data held in another processor's memory must communicate with that remote processor to obtain it.
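
A minimal sketch of this cost, assuming a toy model in which each processor can read only its own memory directly. The Processor class, its read and serve methods, and the message counter are hypothetical names invented for this illustration; a real system would use a message-passing library rather than direct method calls.

```python
class Processor:
    """Toy processor with private memory and a count of messages it has sent."""

    def __init__(self, rank):
        self.rank = rank
        self.memory = {}        # data owned by this processor only
        self.messages_sent = 0

    def read(self, key, owner):
        """Read key from owner; a remote owner costs a request and a reply."""
        if owner is self:
            return self.memory[key]   # local access, no communication
        self.messages_sent += 1       # request sent to the remote owner
        return owner.serve(key)

    def serve(self, key):
        """Answer a remote request with a reply message carrying the value."""
        self.messages_sent += 1
        return self.memory[key]


p0, p1 = Processor(0), Processor(1)
p0.memory["x"] = 1
p1.memory["y"] = 2

local_value = p0.read("x", owner=p0)    # served from p0's own memory
remote_value = p0.read("y", owner=p1)   # needs a round trip to p1
```

The local read involves no messages, whereas the remote read costs one request and one reply. That communication overhead is the issue described above.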