Last year I posted a blog entry entitled “Linux Thinks It’s a CPU, But What Is It Really – Part I. Mapping Xeon 5500 (Nehalem) Processor Threads to Linux OS CPUs,” in which I discussed the Intel CPU Topology Tool. The topology tool is most helpful when you need to quickly map Linux OS processors to physical processor cores or threads. That post has been read close to 20 times per day, on average, since it was posted (10,000+ views), so I thought it deserved a follow-up covering more recent Intel processors and, more importantly, more recent Linux releases.
I’m happy to report that the tool still works fine with Intel Xeon 7500 series processors (a.k.a. Nehalem EX; see also Sun Oracle’s Sun Fire X4800). However, with recent Linux releases the tool is no longer as necessary. On both Enterprise Linux Enterprise Linux Server release 5.5 (Oracle Enterprise Linux 5.5) and Red Hat Enterprise Linux Server release 5.5, the numactl(8) command now produces output that makes it quite clear which sockets are associated with which OS processors.
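If you would rather script the socket-to-processor mapping than read it by eye, the Linux kernel also exposes it through sysfs. The following is a minimal Python sketch, assuming the standard `/sys/devices/system/cpu/cpuN/topology/physical_package_id` files are present; `socket_map` is a hypothetical helper name, not part of numactl or the Intel tool.

```python
import os
from collections import defaultdict


def socket_map(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {physical_package_id: sorted OS CPU numbers} read from sysfs.

    Assumes the standard Linux layout cpuN/topology/physical_package_id.
    CPUs whose topology file is missing (e.g. offline CPUs) are skipped.
    """
    sockets = defaultdict(list)
    for name in os.listdir(sysfs_cpu_root):
        # Keep only cpu0, cpu1, ...; skip entries such as cpufreq or cpuidle.
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        pkg_path = os.path.join(sysfs_cpu_root, name, "topology", "physical_package_id")
        if not os.path.isfile(pkg_path):
            continue
        with open(pkg_path) as f:
            # int() tolerates the trailing newline in the sysfs file.
            sockets[int(f.read())].append(int(name[3:]))
    return {pkg: sorted(cpus) for pkg, cpus in sorted(sockets.items())}
```

Calling `socket_map()` with no argument reads the live sysfs tree. On a machine like the one below, where a NUMA node and a socket coincide, its lists would be expected to match numactl's per-node CPU lists; the `sysfs_cpu_root` parameter exists only so the function can be pointed at a copy of the tree for testing.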
The following output was captured from an 8-socket Nehalem EX machine:
```
$ numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
node 0 size: 131062 MB
node 0 free: 122879 MB
node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
node 1 size: 131072 MB
node 1 free: 125546 MB
node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
node 2 size: 131072 MB
node 2 free: 125312 MB
node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
node 3 size: 131072 MB
node 3 free: 126543 MB
node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
node 4 size: 131072 MB
node 4 free: 125454 MB
node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
node 5 size: 131072 MB
node 5 free: 124881 MB
node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
node 6 size: 131072 MB
node 6 free: 123862 MB
node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
node 7 size: 131072 MB
node 7 free: 126054 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  15  20  15  15  20  20  20
  1:  15  10  15  20  20  15  20  20
  2:  20  15  10  15  20  20  15  20
  3:  15  20  15  10  20  20  20  15
  4:  15  20  20  20  10  15  15  20
  5:  20  15  20  20  15  10  20  15
  6:  20  20  15  20  15  20  10  15
  7:  20  20  20  15  20  15  15  10
```
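To use this numactl output in a script, it has to be parsed. Here is a minimal Python sketch, assuming the line layout shown in the listing (numactl as shipped with the 5.5 releases; other versions may format things differently). `parse_numactl_hardware` is a hypothetical helper, not part of numactl.

```python
import re


def parse_numactl_hardware(text):
    """Parse `numactl --hardware` text into (node_cpus, distances).

    node_cpus: {node: [os_cpu, ...]}
    distances: {node: [distance to node 0, node 1, ...]}
    """
    node_cpus = {}
    distances = {}
    in_distances = False
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"node (\d+) cpus:(.*)$", line)
        if m:
            # A memory-only node would have an empty CPU list here.
            node_cpus[int(m.group(1))] = [int(c) for c in m.group(2).split()]
            continue
        if line.startswith("node distances:"):
            in_distances = True
            continue
        if in_distances:
            # Matrix rows look like "0:  10  15 ..."; the "node 0 1 ..." header does not match.
            m = re.match(r"(\d+):\s+(.*)$", line)
            if m:
                distances[int(m.group(1))] = [int(d) for d in m.group(2).split()]
    return node_cpus, distances
```

Live input could come from something like `subprocess.run(["numactl", "--hardware"], capture_output=True, text=True).stdout` on a host where numactl is installed.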
A node is synonymous with a socket in this case. So, as the output shows, socket 0 maps to OS processors 0-7 and 64-71, the latter range being the second hardware thread of each of the socket’s eight cores. Let’s see how this output compares with that of the Intel CPU Topology Tool:
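On this particular box the numbering is regular enough to compute: OS CPUs 0-63 are the first thread of every core, eight cores per socket, and OS CPUs 64-127 are their sibling threads (the Intel CPU Topology Tool labels them c0_t1 through c7_t1). A minimal Python sketch of that arithmetic follows; the constants are read off this machine's listing and the function names are hypothetical, so other systems (or other kernels) may enumerate CPUs differently.

```python
# Constants read off the numactl listing for this particular 8-socket box.
SOCKETS = 8
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2
FIRST_THREAD_CPUS = SOCKETS * CORES_PER_SOCKET  # OS CPUs 0-63 are thread 0 of every core


def locate(os_cpu):
    """Return (socket, core, thread) for an OS CPU number.

    Only valid for the enumeration seen on this machine: thread 0 of all
    cores first (0-63), then thread 1 (64-127), eight cores per socket.
    """
    if not 0 <= os_cpu < FIRST_THREAD_CPUS * THREADS_PER_CORE:
        raise ValueError(f"no such OS CPU on this box: {os_cpu}")
    thread, rest = divmod(os_cpu, FIRST_THREAD_CPUS)
    socket, core = divmod(rest, CORES_PER_SOCKET)
    return socket, core, thread


def socket_cpus(socket):
    """All OS CPUs on one socket, thread-0 CPUs first (the order numactl lists them)."""
    return [t * FIRST_THREAD_CPUS + socket * CORES_PER_SOCKET + c
            for t in range(THREADS_PER_CORE)
            for c in range(CORES_PER_SOCKET)]
```

For example, `locate(100)` returns `(4, 4, 1)`: socket 4, core 4, second thread, which agrees with node 4's CPU list. `socket_cpus(0)` reproduces node 0's list, 0-7 followed by 64-71.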
```
Package 0 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
L1D is Level 1 Data cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 8
L1I is Level 1 Instruction cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 8
L2 is Level 2 Unified cache, size(KBytes)= 256, Cores/cache= 2, Caches/package= 8
L3 is Level 3 Unified cache, size(KBytes)= 24576, Cores/cache= 16, Caches/package= 1
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|       0       64|       1       65|       2       66|       3       67|       4       68|       5       69|       6       70|       7       71|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|       1     1z16|       2     2z16|       4     4z16|       8     8z16|      10     1z17|      20     2z17|      40     4z17|      80     8z17|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ff

Package 1 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|       8       72|       9       73|      10       74|      11       75|      12       76|      13       77|      14       78|      15       79|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|     100     1z18|     200     2z18|     400     4z18|     800     8z18|     1z3     1z19|     2z3     2z19|     4z3     4z19|     8z3     8z19|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ff00

Package 2 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|      16       80|      17       81|      18       82|      19       83|      20       84|      21       85|      22       86|      23       87|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|     1z4     1z20|     2z4     2z20|     4z4     4z20|     8z4     8z20|     1z5     1z21|     2z5     2z21|     4z5     4z21|     8z5     8z21|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ffz4

Package 3 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|      24       88|      25       89|      26       90|      27       91|      28       92|      29       93|      30       94|      31       95|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|     1z6     1z22|     2z6     2z22|     4z6     4z22|     8z6     8z22|     1z7     1z23|     2z7     2z23|     4z7     4z23|     8z7     8z23|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ffz6

Package 4 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|      32       96|      33       97|      34       98|      35       99|      36      100|      37      101|      38      102|      39      103|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|     1z8     1z24|     2z8     2z24|     4z8     4z24|     8z8     8z24|     1z9     1z25|     2z9     2z25|     4z9     4z25|     8z9     8z25|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ffz8

Package 5 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|      40      104|      41      105|      42      106|      43      107|      44      108|      45      109|      46      110|      47      111|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|    1z10     1z26|    2z10     2z26|    4z10     4z26|    8z10     8z26|    1z11     1z27|    2z11     2z27|    4z11     4z27|    8z11     8z27|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ffz10

Package 6 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|      48      112|      49      113|      50      114|      51      115|      52      116|      53      117|      54      118|      55      119|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|    1z12     1z28|    2z12     2z28|    4z12     4z28|    8z12     8z28|    1z13     1z29|    2z13     2z29|    4z13     4z29|    8z13     8z29|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
Combined socket AffinityMask= 0xff00000000000000ffz12

Package 7 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |       L1D       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
OScpu#|      56      120|      57      121|      58      122|      59      123|      60      124|      61      125|      62      126|      63      127|
Core  |   c0_t0    c0_t1|   c1_t0    c1_t1|   c2_t0    c2_t1|   c3_t0    c3_t1|   c4_t0    c4_t1|   c5_t0    c5_t1|   c6_t0    c6_t1|   c7_t0    c7_t1|
AffMsk|    1z14     1z30|    2z14     2z30|    4z14     4z30|    8z14     8z30|    1z15     1z31|    2z15     2z31|    4z15     4z31|    8z15     8z31|
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |       L1I       |
Size  |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |       32K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |       L2        |
Size  |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |      256K       |
      +-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
Cache |       L3                                                                                                                                      |
Size  |       24M                                                                                                                                     |
      +-----------------------------------------------------------------------------------------------------------------------------------------------+
```
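The topology tool's Box Description defines its "extended hex" notation: a trailing `z#` stands for `#` hexadecimal zeros, so `8z5` is `0x800000`. To cross-check the tool's affinity masks against numactl's CPU lists, a small decoder helps. This is a minimal Python sketch under that stated rule; `decode_extended_hex` and `mask_to_cpus` are hypothetical names, not functions provided by the Intel tool.

```python
import re


def decode_extended_hex(text):
    """Decode the tool's 'extended hex' notation, e.g. '8z5' -> 0x800000.

    A trailing 'z<n>' stands for n hexadecimal zero digits. An optional
    '0x' prefix is accepted, as in the 'Combined socket AffinityMask' lines.
    """
    m = re.fullmatch(r"(?:0x)?([0-9a-fA-F]+)(?:z(\d+))?", text.strip())
    if m is None:
        raise ValueError(f"not extended hex: {text!r}")
    digits, zeros = m.group(1), int(m.group(2) or 0)
    return int(digits, 16) << (4 * zeros)  # each hex zero digit is 4 bits


def mask_to_cpus(mask):
    """List the OS CPU numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]
```

For instance, `mask_to_cpus(decode_extended_hex("8z5"))` gives `[23]`, matching the AffMsk entry the tool prints for OS CPU 23, and decoding Package 0's combined socket mask yields OS CPUs 0-7 and 64-71, the same set numactl reports for node 0.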
I’m quite happy to see this enhancement to numactl(8). I’ll try to blog soon on why you should care about this topic.
Linux Thinks It’s a CPU, But What Is It Really – Part III. How Do Intel Xeon 7500 (Nehalem EX) Processors Map To Linux OS Processors?