source: http://isnwiki.jot.com/WikiHome/Articles/500086
Detecting Multi-Core Processors
Version 3, changed by dmsmith1 06/19/2006.
Created by: admin
Category: Multi-Core

In this entry we show how to detect the topological relationships between physical packages, processor cores, and the logical processors sharing the same core in a multi-processing platform with IA-32 processors. The algorithm described in this paper applies across many hardware multi-processing configurations, including single-socket and multi-socket platforms and IA-32 processors supporting Hyper-Threading Technology, dual-core, and multi-core designs.

Background

In 2002, Hyper-Threading Technology (HT Technology) was introduced on the IA-32 platform. With it, each physical package provides functionality logically equivalent to two discrete, single-thread-execution processors that share the same processor core in the physical package. You can find more details in another Wiki entry, Counting Physical and Logical 32-bit Processor. In 2005, Intel introduced IA-32 platforms with multi-core technology with the Intel Pentium® processor Extreme Edition, which provides two processor cores, each supporting HT Technology. Thus, hardware support for multi-processing has evolved from multiple discrete sockets, to HT Technology, to multiple cores with HT Technology.

Enumerating processor topology correctly is essential for implementing licensing policy requirements. Understanding processor and cache topology also allows multithreading software to use hardware multithreading resources more efficiently and deliver better performance. Software must recognize hardware multi-processing support in all of these combinations. For licensing purposes, Intel recommends a policy based on discrete physical packages.
For performance optimization purposes, software may need to manage physical resources depending on the details of the sharing topology implemented in these various forms of hardware multiprocessing.

Hyper-Threading Technology, Multi-Core and Initial APIC ID

In a multithreading environment using IA-32 processors with hardware multithreading support, each logical processor in the platform must have a unique identifier. This identifier is established during platform power-up and is referred to as the “initial APIC ID.” (APIC stands for Advanced Programmable Interrupt Controller.) The initial APIC ID values in a multiprocessor (MP) platform are assigned in an orderly manner, so that software can extract the topological relationship between an initial APIC ID and the physical package and processor core. Software can also extract the topological relationships of sibling logical processors sharing the same core or sharing a particular level of the cache hierarchy.
In general, each physical package can provide one or more processor cores. The number of logical processors sharing the same core must be derived from information provided by the CPUID instruction. Within a single microarchitecture, the number of logical processors sharing a cache may differ from one cache level to the next; between microarchitectures, the cache-sharing topology may also vary. With HT Technology enabled processors, each cache level is shared by all the logical processors sharing a processor core. Software must use the CPUID instruction to query the relevant data for each cache level at runtime. (Details of using the CPUID instruction to query initial APIC IDs and logical processor/core/cache configurations can be found in Chapter 3 of the IA-32 Intel® Architecture Software Developer’s Manual Vol. 2A.)

CPUID Instruction

Aside from the initial APIC IDs reported by the CPUID instruction, there are three other essential parameters reported by CPUID. These parameters are used to determine the multithreading resource topology of an IA-32 platform:
• Logical Processors per Package (CPUID.1.EBX[23:16]): Indicates the maximum number of logical processors in a physical package. This represents the hardware capability of the processor as manufactured, and does not necessarily equal the number of logical processors enabled by the platform BIOS or operating system.
• Cores per Package [2] (CPUID.4.EAX[31:26] + 1): The maximum number of cores in a physical package is indicated by one plus the decimal value represented by CPUID.4.EAX[31:26].
• Logical Processors Sharing a Cache (CPUID.4.EAX[25:14] + 1): The maximum number of logical processors in a physical package sharing the target level cache is indicated by one plus the decimal value represented by CPUID.4.EAX[25:14].

Intel supports only homogeneous MP systems. This means that in an MP system, any logical processor from any physical package must report the same values for the three parameters described above.

CPUID cannot be called directly from high-level languages such as C or C++. This must be done using assembly language. In this paper, we show sample code that executes the CPUID instruction from C/C++ source code using inline assembly.
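The three bit fields listed above can be decoded with plain shifts and masks once the register values are in hand. The following is a minimal sketch, not part of the original listing: the function names are hypothetical, and the raw EBX/EAX values are passed in as arguments so that nothing here executes CPUID.

```c
#include <stdint.h>

/* Hypothetical helpers: decode topology fields from raw CPUID register
   values supplied by the caller. */

/* CPUID.1.EBX[23:16]: maximum logical processors per physical package. */
unsigned logical_per_package(uint32_t leaf1_ebx)
{
    return (leaf1_ebx >> 16) & 0xFFu;
}

/* CPUID.4.EAX[31:26] + 1: maximum cores per physical package. */
unsigned cores_per_package(uint32_t leaf4_eax)
{
    return ((leaf4_eax >> 26) & 0x3Fu) + 1u;
}

/* CPUID.4.EAX[25:14] + 1: maximum logical processors sharing this cache level. */
unsigned logical_sharing_cache(uint32_t leaf4_eax)
{
    return ((leaf4_eax >> 14) & 0xFFFu) + 1u;
}
```

For instance, an illustrative leaf 1 EBX value of 0x00040800 decodes to 4 logical processors per package, and an illustrative leaf 4 EAX value of 0x04004121 decodes to 2 cores per package with 2 logical processors sharing that cache level.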
Three Level Topology and Initial APIC ID

For a single-clustered MP system [3], the 8-bit value of an initial APIC ID decomposes into three bit fields. The order in which initial APIC IDs are assigned in an IA-32 MP system ensures that the rightmost bit field represents the maximum number of logical processors sharing the same core. This sub-field is referred to as SMT_ID, and its width is related to the maximum number of unique IDs required to identify each logical processor sharing the same core. The width of this field can be zero; for example, the Pentium D processor does not support Hyper-Threading Technology. Software must use the algorithm described in this paper to determine the width dynamically.
Adjacent to the SMT_ID bit field is a bit field that represents the maximum number of processor cores in a package. This sub-field is referred to as CORE_ID. For a single-core processor, the width of the CORE_ID field is zero.
The remaining bit field can be used to identify a physical package in a non-clustered MP platform. The sub-field is referred to as PACKAGE_ID. Figure 1 shows the layout of the initial APIC ID (content of CPUID.1.EBX[31:24]). Note that the width of each bit field depends on the multithreading hardware configuration. Figures 2 through 4 depict the dependence of each bit field position on three different hardware configurations. These three examples illustrate the important point that software must not assume each bit field has a constant width, or exists at all. This paper does not discuss clustered systems.

[2] Software must check CPUID for its support of leaf 4 when implementing support for multi-core. If CPUID leaf 4 is not available at runtime, software can handle the situation as if there is only one core per package.

[3] Typically, a clustered system is made up of many nodes of multi-processor systems. This paper applies to multi-processor systems within the same node in a clustered system. A multi-node clustered system will have a four-level topology and is beyond the scope of this paper.
Figure 1 Generalized bit position layout of initial APIC ID for a non-clustered system

With a dual-core system that supports Hyper-Threading Technology, x (the width of the SMT_ID field) will be equal to 1 and y (the width of the CORE_ID field) will be equal to 1. This is shown in Figure 2.
Figure 2 Bit position layout for a dual-core system configuration supporting Hyper-Threading Technology.
With a dual-core system that does not support Hyper-Threading Technology, x will be equal to 0 and y will be equal to 1. This is shown in Figure 3.
Figure 3 Bit Position Layout for a dual-core system that does not support Hyper-Threading Technology
With a single-core system supporting Hyper-Threading Technology, x will be equal to 1 and y will be equal to 0. This is shown in Figure 4.

Figure 4 Bit position layout for a single-core system supporting Hyper-Threading Technology

An algorithm to map initial APIC ID to three-level topological IDs

The following algorithm demonstrates the method by which the initial APIC ID is decomposed into its three topological identifiers. Several support routines used in this algorithm are described in the following text.
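Before turning to the support routines, the three configurations in Figures 2 through 4 can be checked with a small sketch that takes the field widths as given. This is an illustration only: the struct and function names are hypothetical, and the sub-IDs are returned shifted down to zero-based values, which differs from the in-place values that the article's GetNzbSubID() routine returns.

```c
#include <stdint.h>

/* Zero-based sub-IDs of one logical processor (hypothetical type). */
struct topo_ids {
    unsigned smt;      /* logical processor within its core */
    unsigned core;     /* core within its package */
    unsigned package;  /* physical package */
};

/* Split an 8-bit initial APIC ID, given the SMT_ID width x and the
   CORE_ID width y in bits. Either width may be 0. */
struct topo_ids split_apic_id(uint8_t apic_id, unsigned x, unsigned y)
{
    struct topo_ids ids;
    ids.smt     = apic_id & ((1u << x) - 1u);
    ids.core    = ((unsigned)apic_id >> x) & ((1u << y) - 1u);
    ids.package = (unsigned)apic_id >> (x + y);
    return ids;
}
```

With x = 1 and y = 1 (dual-core with HT Technology), an initial APIC ID of 0x07 splits into SMT 1, core 1, package 1. With x = 0 and y = 1 (dual-core without HT Technology), 0x03 splits into SMT 0, core 1, package 1.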
Detecting Hardware Support for Multi-Threading

The first support routine determines if the current logical processor is contained in a physical package with hardware multi-threading support. This is accomplished by executing the CPUID instruction with register EAX set equal to one. If bit 28 of the returned value in register EDX is set, the associated physical package supports hardware multithreading. Note that this only asserts that a physical package is capable of hardware multithreading. It does not imply that hardware multithreading is enabled, or indicate the number of logical processors in the package enabled by system software.

    #include <windows.h>   // EXCEPTION_EXECUTE_HANDLER for __try/__except

    unsigned int CpuIDSupported( void );
    unsigned int GenuineIntel( void );

    //
    // The function returns 0 when the hardware multi-threaded bit is
    // not set.
    //
    #define HWD_MT_BIT (1 << 28)   // more descriptive

    unsigned int is_HWMT_supported( void )
    {
        unsigned int Regedx = 0;

        if ((CpuIDSupported() >= 1) && GenuineIntel())
        {
            __asm
            {
                mov eax, 1
                cpuid
                mov Regedx, edx
            }
        }
        return (Regedx & HWD_MT_BIT);
    }

    //
    // CpuIDSupported will return 0 if CPUID instruction is
    // unavailable.
    // Otherwise, it will return
    // the maximum supported standard function.
    //
    unsigned int CpuIDSupported( void )
    {
        unsigned int MaxInputValue = 0;

        __try   // If CPUID instruction is supported
        {
            __asm
            {
                xor eax, eax   // call cpuid with eax = 0
                cpuid
                mov MaxInputValue, eax
            }
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
            return (0);   // cpuid instruction is unavailable
        }
        return MaxInputValue;
    }

    //
    // GenuineIntel will return 0 if the processor is not a Genuine
    // Intel Processor
    //
    unsigned int GenuineIntel( void )
    {
        unsigned int VendorID[3] = {0, 0, 0};

        __try   // If CPUID instruction is supported
        {
            __asm
            {
                xor eax, eax   // call cpuid with eax = 0
                cpuid          // Get vendor id string
                mov VendorID, ebx
                mov VendorID + 4, edx
                mov VendorID + 8, ecx
            }
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
            return (0);   // cpuid instruction is unavailable
        }
        return ( (VendorID[0] == 'uneG') &&
                 (VendorID[1] == 'Ieni') &&
                 (VendorID[2] == 'letn'));
    }

Determining Maximum Logical Processors per Physical Package (Socket)

The second support routine determines the maximum number of logical processors in a physical package. This parameter is obtained by executing CPUID with the register EAX set equal to one and storing bits 16 to 23 of register EBX on return. If an IA-32 processor does not have hardware multithreading support, the value in CPUID.1.EBX[23:16] is reserved. However, in this case, one can treat this as a discrete processor containing a maximum of one logical processor per package. Note that the number of logical processors obtained here is the maximum number of logical processors supported by this physical package. The actual number of logical processors enabled by system software and made available to applications may be less.
    #define NUM_LOGICAL_BITS 0xFF0000   // (or (0xFF << 16))

    unsigned GetMaxNumLPperPackage(void)
    {
        unsigned int reg_ebx = 0;

        if (!is_HWMT_supported())
            return (unsigned char) 1;
        __asm
        {
            mov eax, 1
            cpuid
            mov reg_ebx, ebx
        }
        return (unsigned) ((reg_ebx & NUM_LOGICAL_BITS) >> 16);
    }
Determining Maximum Number of Cores in a Package

The third support routine determines the maximum number of cores per physical package. This parameter is obtained by executing the CPUID instruction with the input value of 4 in EAX and 0 in ECX, followed by storing the returned decimal value of EAX[31:26] and incrementing it by one. If a processor does not support the CPUID.4 leaf, then software must assume this is a single-core processor [4]. Note that if the maximum number of cores in a physical package is greater than one, it only indicates that the physical package provides more than one core. The actual number of cores enabled by system software and made available to applications may be less.
[4] Software must check for the availability of CPUID leaf 4 before querying for information using CPUID leaf 4. If a BIOS allows the end user to configure a multi-core processor to work with an older operating system that is not compatible with the presence of CPUID leaf 4, the user must ensure that the proper configuration of the platform is enforced by restoring the BIOS setting to default such that CPUID leaf 4 is available. The proper configuration and optimal performance of a modern OS supporting multi-core processors rely on CPUID leaf 4 being available.

    #define NUM_CORE_BITS 0xfc000000   // (or (0xFC << 26))

    unsigned GetMaxNumCoresPerPackage(void)
    {
        unsigned int reg_eax = 0;

        if (!is_HWMT_supported())
        {
            // must be single-core
            return (unsigned) 1;
        }
        __asm
        {
            mov eax, 0        // how many leaves does cpuid support?
            cpuid
            cmp eax, 4        // does cpuid support leaf 4
            jl single_core    // if not, must be single core
            mov eax, 4        // call cpuid with eax = 4
            mov ecx, 0        // start with 1st level using index = 0
            cpuid
            mov reg_eax, eax  // Has info on number of cores
            jmp multi_core
    single_core:
            mov reg_eax, 0    // must be single core
    multi_core:
        }
        return (unsigned) ((reg_eax & NUM_CORE_BITS) >> 26) + 1;
    }
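For compilers without MSVC-style inline assembly, the three support routines so far might be sketched with the <cpuid.h> helpers shipped with GCC and Clang. This is a sketch under stated assumptions rather than the article's code: the function names are hypothetical, the GenuineIntel() check is omitted, a zero value in EBX[23:16] is clamped to 1 as a defensive choice, and on non-x86 targets every leaf is reported as unavailable.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang: __get_cpuid_max, __cpuid_count */
#endif

/* Read one CPUID leaf and sub-leaf into regs[] = {EAX, EBX, ECX, EDX}.
   Returns 0 if the leaf is not available. */
int read_cpuid(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
#if defined(__i386__) || defined(__x86_64__)
    if (__get_cpuid_max(0, 0) < leaf)
        return 0;
    __cpuid_count(leaf, subleaf, regs[0], regs[1], regs[2], regs[3]);
    return 1;
#else
    (void)leaf;
    (void)subleaf;
    return 0;
#endif
}

/* 1 if CPUID.1.EDX[28] is set (package capable of hardware multithreading). */
int hw_mt_capable(void)
{
    uint32_t r[4];
    return read_cpuid(1, 0, r) && ((r[3] >> 28) & 1u);
}

/* Maximum logical processors per package, from CPUID.1.EBX[23:16]. */
unsigned max_lp_per_package(void)
{
    uint32_t r[4];
    unsigned n;

    if (!hw_mt_capable())
        return 1;            /* field is reserved: treat as one */
    read_cpuid(1, 0, r);
    n = (r[1] >> 16) & 0xFFu;
    return n ? n : 1;        /* assumption: clamp a zero field to 1 */
}

/* Maximum cores per package, from CPUID.4.EAX[31:26] + 1. */
unsigned max_cores_per_package(void)
{
    uint32_t r[4];

    if (!hw_mt_capable() || !read_cpuid(4, 0, r))
        return 1;            /* no leaf 4: assume a single core */
    return ((r[0] >> 26) & 0x3Fu) + 1u;
}
```

As with the routines above, these values describe hardware capability only; the number of logical processors or cores actually enabled by system software may be smaller.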
Determining the Width of a Bit Field An important part of the algorithm is to determine the width of each bit field. This can be accomplished using a general purpose support routine that determines the width of a bit field based on the maximum number of unique identifiers that bit field can represent. The input value to determine the width of the SMT_ID bit field is the “maximum number of logical processors sharing the same core,” and the input value to determine the width of the CORE_ID bit field is the “maximum number of cores per physical package.” One should not assume that the number of available threads or cores will be a power of two.
    unsigned find_maskwidth(unsigned count_item)
    {
        unsigned int mask_width, cnt = count_item;

        __asm
        {
            mov eax, cnt
            mov ecx, 0
            mov mask_width, ecx
            dec eax
            bsr cx, ax
            jz next
            inc cx
            mov mask_width, ecx
    next:
            mov eax, mask_width
        }
        return mask_width;
    }
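The same computation can be written without assembly, which makes the non-power-of-two behavior easy to verify. The sketch below is a portable approximation of find_maskwidth(), with the hypothetical name mask_width(): it returns the number of bits needed to give `count` items distinct IDs from 0 to count - 1. One assumed difference is that it returns 0 for an input of 0, a case the routine above is not meant to receive.

```c
/* Number of bits needed to number `count` items 0 .. count-1.
   A single item needs no bits at all. */
unsigned mask_width(unsigned count)
{
    unsigned width = 0;

    if (count == 0)
        return 0;          /* assumption: treat "no items" like one item */
    count -= 1;            /* largest ID that must be representable */
    while (count != 0) {   /* position of the highest set bit, plus one */
        width++;
        count >>= 1;
    }
    return width;
}
```

A count of 2 needs 1 bit, counts of 3 and 4 both need 2 bits, and counts of 5 through 8 need 3 bits. This is why a package with, say, three cores still reserves a 2-bit CORE_ID field.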
Collecting the Initial APIC IDs of Each Logical Processor Before putting the algorithm together, we must retrieve the initial APIC ID of each logical processor in the platform. This requires using an OS-specific processor affinity service and using an affinity mask to bind the current execution thread to a specific logical processor. Once the current thread has successfully affinitized to the specific logical processor, the following code will retrieve the initial APIC ID of the logical processor that this code is currently running on:
    #define INITIAL_APIC_ID_BITS 0xff000000   // (or (0xFF << 24))

    unsigned char GetInitialApicId (void)
    {
        unsigned int reg_ebx = 0;

        __asm
        {
            mov eax, 1         // call cpuid with eax = 1
            cpuid
            mov reg_ebx, ebx   // Has APIC ID info
        }
        return (unsigned char) ((reg_ebx & INITIAL_APIC_ID_BITS) >> 24);
    }
Extracting a Bit Field from an 8-bit ID

The next routine provides the finishing touch to complete the support routines needed to decompose an 8-bit initial APIC ID into three topological identifiers. A subset of bits in an 8-bit initial APIC ID can be extracted using the appropriate bit mask and shift value. The code below allows one to extract from an 8-bit “Full_ID” a subset of bits using two other input parameters. The input parameter “MaxSubIDvalue” determines the width of the bit field to extract from the “Full_ID.” The input parameter “Shift_Count” specifies an offset from the rightmost bit of the 8-bit “Full_ID.”
    //
    // This routine extracts a subset of bit fields from the 8-bit Full_ID
    // The return value, or subID, is a non-zero-based, 8-bit value
    //
    unsigned char GetNzbSubID(unsigned char Full_ID,
                              unsigned char MaxSubIDvalue,
                              unsigned char Shift_Count)
    {
        unsigned MaskWidth;
        unsigned char SubID, MaskBits;

        MaskWidth = find_maskwidth((unsigned) MaxSubIDvalue);
        MaskBits = ((unsigned char) (0xff << Shift_Count)) ^
                   ((unsigned char) (0xff << (Shift_Count + MaskWidth)));
        SubID = Full_ID & MaskBits;
        return SubID;
    }
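Note that "non-zero-based" means the extracted bits stay at their original position in the ID and are not shifted down to bit 0. The following assembly-free sketch illustrates this. The names sub_id_in_place() and mask_width() are hypothetical stand-ins for GetNzbSubID() and find_maskwidth(), written in portable C so the mask arithmetic can be checked by hand.

```c
/* Portable stand-in for find_maskwidth(): bits needed for `count` IDs. */
unsigned mask_width(unsigned count)
{
    unsigned width = 0;

    if (count == 0)
        return 0;
    for (count -= 1; count != 0; count >>= 1)
        width++;
    return width;
}

/* Keep only the bit field of full_id that starts at bit shift_count and is
   wide enough for max_sub_id_value distinct IDs. The field is NOT shifted
   down, so the result is comparable only with results taken at the same
   position. */
unsigned char sub_id_in_place(unsigned char full_id,
                              unsigned char max_sub_id_value,
                              unsigned char shift_count)
{
    unsigned width = mask_width(max_sub_id_value);
    unsigned char mask =
        (unsigned char)((0xFFu << shift_count) ^
                        (0xFFu << (shift_count + width)));

    return (unsigned char)(full_id & mask);
}
```

For an initial APIC ID of 0x07 on a dual-core package with two logical processors per core, the SMT field (shift 0) yields 0x01 and the core field (shift 1) yields 0x02 rather than 1. Equality comparisons between such values still identify siblings correctly, which is all the counting routines later in this article require.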
Putting Everything Together to Extract IDs for the Three-Level Topology Under an MP-aware OS, an application can assemble a list of all the initial APIC IDs of logical processors that are enabled by OS and made available to applications. Although there is a one-to-one mapping between each bit in an affinity mask to each unique APIC ID value, the ordering of affinity mask bits and the numerical values of initial APIC IDs are platform-specific. Because the OS allows applications to manage logical processors via affinity masks, the algorithm we describe here will construct two tables:
These two tables are built by first extracting the three-level identifiers from the initial APIC IDs of each logical processor. The code below illustrates how to use the support routines described previously to assemble individual tables that store the sub-field ID of each level for each logical processor.
To enumerate the processor and cache topology of all logical processors visible to an application, application software must rely on OS-specific services, such as affinity APIs, to bind the current execution context to each logical processor. The code example below uses Win32 APIs to manage processor affinity. The tables below demonstrate the relationships between processor/cache topology information with OS-specific affinity constructs.
    DWORD_PTR dwProcessAffinity;
    DWORD_PTR dwSystemAffinity;
    DWORD_PTR dwAffinityMask;
    int j, numLP_enabled, MaxLPPerCore;
    unsigned char apicId;
    unsigned char PackageIDMask;
    unsigned char tblPkg_ID[256];
    unsigned char tblCore_ID[256];
    unsigned char tblSMT_ID[256];

    GetProcessAffinityMask(GetCurrentProcess(),
                           &dwProcessAffinity,
                           &dwSystemAffinity);
    if (dwProcessAffinity != dwSystemAffinity)
    {
        printf("Not all logical processors in the platform are enabled for this process.\n");
    }
    j = 0;
    dwAffinityMask = 1;
    numLP_enabled = 0;
    //
    // This algorithm assumes that each core within a package has the
    // same number of logical processors.
    // It does not assume that the value returned by
    // GetMaxNumLPperPackage() or GetMaxNumCoresPerPackage() is a power of 2.
    //
    MaxLPPerCore = GetMaxNumLPperPackage() / GetMaxNumCoresPerPackage();
    while (dwAffinityMask && dwAffinityMask <= dwSystemAffinity)
    {
        if (SetThreadAffinityMask(GetCurrentThread(), dwAffinityMask))
        {
            Sleep(0);   // Ensure this thread is on the affinitized CPU
            apicId = GetInitialApicId();
            //
            // store SMT_ID and Core_ID of each logical processor
            // Shift value for SMT_ID is 0
            // Shift value for Core_ID is the mask width for maximum
            // logical processors per core
            //
            tblSMT_ID[j] = GetNzbSubID(apicId, MaxLPPerCore, 0);
            tblCore_ID[j] = GetNzbSubID(apicId,
                                        GetMaxNumCoresPerPackage(),
                                        find_maskwidth(MaxLPPerCore));
            //
            // Extract PACKAGE_ID:
            // Assume single cluster.
            // Shift value is mask width for maximum logical processors
            // per package
            //
            PackageIDMask = ((unsigned char)
                (0xff << find_maskwidth(GetMaxNumLPperPackage())));
            tblPkg_ID[j] = apicId & PackageIDMask;
            numLP_enabled++;
        }
        j++;
        dwAffinityMask = (DWORD_PTR)1 << j;
    }
    //
    // numLP_enabled contains the number of logical processors in the platform
    // that are enabled by system software and available for applications
    //
Counting Physical Packages Enabled in the Platform

Once we have the three-level topological IDs of each logical processor enabled in the platform, we can create an affinity mask to represent the sibling logical processors residing in the same physical package. The code below sorts out the initial APIC IDs in the platform, puts those initial APIC IDs with identical PACKAGE_ID into the same group, and updates the affinity mask associated with each distinct PACKAGE_ID.

    //
    // pPkgMask points to a buffer allocated by the caller
    // NumStartedLPs is an integer supplied by the caller after the
    // caller had assembled three tables of PKG_ID, CORE_ID and SMT_ID
    //
    DWORD pPkgMask[256];
    unsigned char PackageIDBucket[256];
    unsigned ProcessorMask;
    int ProcessorNum;
    int i, PkgNum = 1;

    PackageIDBucket[0] = tblPkg_ID[0];
    ProcessorMask = 1;
    pPkgMask[0] = ProcessorMask;
    for (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++)
    {
        ProcessorMask <<= 1;
        for (i = 0; i < PkgNum; i++)
        {
            //
            // we may be comparing bit-fields of logical processors
            // residing in different packages, the code below assumes
            // that the bit-masks are the same on all processors in
            // the system
            //
            if (tblPkg_ID[ProcessorNum] == PackageIDBucket[i])
            {
                pPkgMask[i] |= ProcessorMask;
                break;
            }
        }
        if (i == PkgNum)
        {
            //
            // Did not match any bucket, start new bucket
            //
            PackageIDBucket[i] = tblPkg_ID[ProcessorNum];
            pPkgMask[i] = ProcessorMask;
            PkgNum++;
        }
    }
    //
    // PkgNum has the actual number of physical packages enabled in
    // the platform
    // pPkgMask[i] has the affinity mask of the sibling logical processors
    // for the i'th package
    //
Counting Processor Cores Enabled in the Platform

The procedure to count OS-enabled processor cores in a platform is similar to the previous routine. Here, we compare both PACKAGE_ID and CORE_ID to determine whether two or more logical processors are siblings in the same core. The code below sorts out the initial APIC IDs in the platform, puts those initial APIC IDs with identical values for PACKAGE_ID and CORE_ID into the same group, and updates the affinity mask associated with each distinct core.

    //
    // pCoreProcessorMask points to a buffer allocated by the caller
    // NumStartedLPs is an integer supplied by the caller after the
    // caller had assembled three tables of PKG_ID, CORE_ID and SMT_ID
    //
    DWORD pCoreProcessorMask[256];
    unsigned char CoreIDBucket[256];
    unsigned ProcessorMask;
    int ProcessorNum;
    int i, CoreNum = 1;

    CoreIDBucket[0] = tblPkg_ID[0] | tblCore_ID[0];
    ProcessorMask = 1;
    pCoreProcessorMask[0] = ProcessorMask;
    for (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++)
    {
        ProcessorMask <<= 1;
        for (i = 0; i < CoreNum; i++)
        {
            //
            // we may be comparing bit-fields of logical processors
            // residing in different packages, the code below assumes
            // that the bit-masks are the same on all processors in
            // the system
            //
            if ((tblPkg_ID[ProcessorNum] | tblCore_ID[ProcessorNum])
                == CoreIDBucket[i])
            {
                pCoreProcessorMask[i] |= ProcessorMask;
                break;
            }
        }
        if (i == CoreNum)
        {
            //
            // Did not match any bucket, start new bucket
            //
            CoreIDBucket[i] = tblPkg_ID[ProcessorNum] | tblCore_ID[ProcessorNum];
            pCoreProcessorMask[i] = ProcessorMask;
            CoreNum++;
        }
    }
    //
    // CoreNum has the actual number of cores enabled in the platform
    // pCoreProcessorMask[i] has the affinity mask of the sibling
    // logical processors for the i'th core
    //
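The package and core counting routines share one grouping step and differ only in the key being compared. A possible way to factor that step out is sketched below. The function name group_by_key() and its parameters are hypothetical, and the sketch assumes, as the fragments above do, that table index p corresponds to bit p of the affinity mask and that there are no more logical processors than bits in an unsigned long.

```c
#include <stddef.h>

/* Group n logical processors by an 8-bit key.
   keys[p]   : key of logical processor p (PACKAGE_ID, or PACKAGE_ID | CORE_ID)
   bucket[g] : out, the key that defines group g
   masks[g]  : out, bit p is set if logical processor p belongs to group g
   Both output arrays must have room for n entries.
   Returns the number of distinct keys found. */
size_t group_by_key(const unsigned char *keys, size_t n,
                    unsigned char *bucket, unsigned long *masks)
{
    size_t groups = 0;

    for (size_t p = 0; p < n; p++) {
        size_t g;

        for (g = 0; g < groups; g++)     /* look for a matching bucket */
            if (bucket[g] == keys[p])
                break;
        if (g == groups) {               /* no match: start a new bucket */
            bucket[g] = keys[p];
            masks[g] = 0;
            groups++;
        }
        masks[g] |= 1UL << p;
    }
    return groups;
}
```

Passing the tblPkg_ID table as keys counts packages. Passing a table of tblPkg_ID[p] | tblCore_ID[p] values counts cores. For example, the package keys {0x00, 0x00, 0x04, 0x04} produce two groups with masks 0x3 and 0xC.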
The following example shows how to tie all of the above code snippets together. The example examines every logical processor visible to the running process and enumerates the processor topology and cache topology of these logical processors. The status of multi-core and Hyper-Threading capability in the platform is also summarized. The source code of the example is listed below and can be compiled in a Win32 environment as well as a Linux* environment [5]. Some compilers do not support inline assembly; it is left as an exercise for the reader to complete compiler-specific customizations.

[5] The source code listing can be compiled using Linux* kernel version 2.6 or higher (e.g. RH 4AS-2.8 using GCC 3.4.4). Due to syntax variances of Linux affinity APIs with earlier kernel versions and dependence on glibc library versions, compilation in a Linux environment with older kernels and compilers may require kernel patches or compiler upgrades.
Note: if the reader wishes to create a cpp file by copying the source listing below, please ensure that only plain ASCII text is pasted into a standard ASCII file.

    //
    // Copyright (c) 2005 Intel Corporation
    // All rights reserved
    //
    //
    // CpuCount.cpp : Detects hardware multi-threading topology on IA-32 platforms:
    // Multi-processor, Multi-core, and HyperThreading Technology.
    // This application enumerates the HW topology of the logical processors
    // enabled by OS and BIOS by using information provided by CPUID instruction.
    // The relevant topology can be identified using a three level decomposition of
    // "initial APIC ID" into Package_id, core_id, and SMT_id. Such decomposition
    // provides a three-level map of the topology of hardware resources. This
    // allows multi-threaded software to manage shared hardware resources in the
    // platform to reduce resource contention
    //
    // Multicore detection algorithm for processor and cache topology requires
    // all leaf functions of CPUID instructions be available. System administrator
    // must ensure BIOS settings is not configured to restrict CPUID functionalities.
    //-----------------------------------------------------------------------------
    //
    // cpuid(EAX=1)EDX[28] set if HT or multi-core capable
    //
    #define HWD_MT_BIT 0x10000000
    //
    // cpuid(EAX=1)EBX[23:16] contains the maximum number of logical processors
    // in physical processor
    //
    #define NUM_LOGICAL_BITS 0x00FF0000
    //
    // cpuid(EAX=4, ECX=0)EAX[31:26] contains the maximum number of cores
    // in physical processor
    //
    #define NUM_CORE_BITS 0xFC000000   // eax set to 4.
    //
    // cpuid(EAX=1)EBX[31:24] is the 8 bit initial APIC ID of the logical processor
    //
    #define INITIAL_APIC_ID_BITS 0xFF000000
    //
    // Status Flags
    //
    #define SINGLE_CORE_AND_HT_ENABLED 1
    #define SINGLE_CORE_AND_HT_DISABLED 2
    #define SINGLE_CORE_AND_HT_NOT_CAPABLE 4
    #define MULTI_CORE_AND_HT_NOT_CAPABLE 5
    #define MULTI_CORE_AND_HT_ENABLED 6
    #define MULTI_CORE_AND_HT_DISABLED 7
    #define USER_CONFIG_ISSUE 8

    unsigned int CpuIDSupported(void);
    unsigned int GenuineIntel(void);
    unsigned int HWD_MTSupported(void);
    unsigned int MaxLogicalProcPerPhysicalProc(void);
    unsigned int MaxCorePerPhysicalProc(void);
    unsigned int find_maskwidth(unsigned int);
    unsigned char GetAPIC_ID(void);
    unsigned char GetNzbSubID(unsigned char, unsigned char, unsigned char);
    unsigned char CPUCount(unsigned int *, unsigned int *, unsigned int *);

    // Define constant "LINUX" to compile under Linux
    #ifdef LINUX
    // The Linux source code listing can be compiled using Linux kernel version 2.6
    // or higher (e.g. RH 4AS-2.8 using GCC 3.4.4).
    // Due to syntax variances of Linux affinity APIs with earlier kernel versions
    // and dependence on glibc library versions, compilation on Linux environment
    // with older kernels and compilers may require kernel patches or compiler upgrades.
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <sched.h>
    #define DWORD unsigned long
    #else
    #include <windows.h>
    #endif
    #include <stdio.h>
    #include <assert.h>

    char g_s3Levels[2048];

    int main(void)
    {
        unsigned int TotAvailLogical = 0,  // # of available logical CPU per CORE
                     TotAvailCore = 0,     // # of available cores per physical
                                           // processor
                     PhysicalNum = 0;      // # of available physical processors
        unsigned char StatusFlag = 0;
        int MaxLPPerCore;

        if (CpuIDSupported() < 4)
        {
            // CPUID does not report leaf 4 information
            printf("User Warning: CPUID Leaf 4 is not supported or disabled. "
                   "Please check BIOS and correct system configuration error "
                   "if leaf 4 is disabled. \n");
        }
        StatusFlag = CPUCount(&TotAvailLogical, &TotAvailCore, &PhysicalNum);
        if (USER_CONFIG_ISSUE == StatusFlag)
        {
            printf("User Configuration Error: Not all logical processors"
                   " in the system are enabled \n"
                   "while running this process. Please rerun this application"
                   " after making corrections. \n");
            exit(1);
        }
        printf("\n----Counting Hardware MultiThreading Capabilities and"
               " Availability ---------- \n\n");
        printf("This application displays information on three forms of hardware "
               "multithreading\n");
        printf("capability and their availability to apps. The three forms of"
               " capabilities are:\n");
        printf("multi-processor (MP), Multi-core (core), and HyperThreading "
               "Technology (HT).\n");
        printf("\nHardware capability results represents the maximum number"
               " provided in hardware.\n");
        printf("Note, Bios/OS or experienced user can make configuration changes"
               " resulting in \n");
        printf("less-than-full HW capabilities are available to applications.\n");
        printf("For best result, the operator is responsible to configure the"
               " BIOS/OS such that\n");
        printf("full hardware multi-threading capabilities are enabled.\n");
        printf("\n---------------------------------------------------------- \n");
        printf("\nCapabilities:\n\n");
        switch (StatusFlag)
        {
        case MULTI_CORE_AND_HT_NOT_CAPABLE:
            printf("\tHyper-Threading Technology: not capable \n\t"
                   "Multi-core: Yes \n\tMulti-processor: ");
            if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
            break;
        case SINGLE_CORE_AND_HT_NOT_CAPABLE:
            printf("\tHyper-Threading Technology: Not capable \n\t"
                   "Multi-core: No \n\tMulti-processor: ");
            if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
            break;
        case SINGLE_CORE_AND_HT_DISABLED:
            printf("\tHyper-Threading Technology: Disabled \n\t"
                   "Multi-core: No \n\tMulti-processor: ");
            if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
            break;
        case SINGLE_CORE_AND_HT_ENABLED:
            printf("\tHyper-Threading Technology: Enabled \n\t"
                   "Multi-core: No \n\tMulti-processor: ");
            if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
            break;
        case MULTI_CORE_AND_HT_DISABLED:
            printf("\tHyper-Threading Technology: Disabled \n\t"
                   "Multi-core: Yes \n\tMulti-processor: ");
            if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
            break;
        case MULTI_CORE_AND_HT_ENABLED:
            printf("\tHyper-Threading Technology: Enabled \n\t"
                   "Multi-core: Yes \n\tMulti-processor: ");
            if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
            break;
        }
        printf("\n\nHardware capability and its availability to applications: \n");
        printf("\n System wide availability: %d physical processors, %d cores, "
               "%d logical processors\n",
               PhysicalNum, TotAvailCore, TotAvailLogical);
        MaxLPPerCore = MaxLogicalProcPerPhysicalProc() / MaxCorePerPhysicalProc();
        printf(" Multi-core capability : Maximum %d cores per package \n",
               MaxCorePerPhysicalProc());
        printf(" HT capability: Maximum %d logical processors per core \n",
               MaxLPPerCore);
        assert(PhysicalNum * MaxCorePerPhysicalProc() >= TotAvailCore);
        assert(PhysicalNum * MaxLogicalProcPerPhysicalProc() >= TotAvailLogical);
        if (PhysicalNum * MaxCorePerPhysicalProc() > TotAvailCore)
        {
            printf("\n Not all cores in the system are enabled for this "
                   "application.\n");
        }
        else
        {
            printf("\n All cores in the system are enabled for this "
                   "application.\n");
        }
        printf("\n\nRelationships between OS affinity mask, Initial APIC ID,"
               " and 3-level sub-IDs: \n");
        printf("\n%s", g_s3Levels);
        printf("\n\nPress Enter To Continue\n");
        getchar();
        return 0;
    }

    //
    // CpuIDSupported() will return 0 if CPUID instruction is unavailable.
    // Otherwise, it will return the maximum supported standard function.
    //
    unsigned int CpuIDSupported(void)
    {
        unsigned int MaxInputValue = 0;
        // If CPUID instruction is supported
    #ifdef LINUX
        try
        {
            MaxInputValue = 0;
            // call cpuid with eax = 0
            asm
            (
                "xorl %%eax,%%eax\n\t"
                "cpuid\n\t"
                : "=a" (MaxInputValue)
                :
                : "%ebx", "%ecx", "%edx"
            );
        }
        catch (...)
        {
            return (0);   // cpuid instruction is unavailable
        }
    #else // Win32
        try
        {
            __asm
            {
                xor eax, eax   // call cpuid with eax = 0
                cpuid
                mov MaxInputValue, eax
            }
        }
        catch (...)
        {
            return (0);   // cpuid instruction is unavailable
        }
    #endif
        return MaxInputValue;
    }

    //
    // GenuineIntel will return 0 if the processor is not a Genuine Intel Processor
    //
    unsigned int GenuineIntel(void)
    {
    #ifdef LINUX
        unsigned int VendorIDb = 0, VendorIDd = 0, VendorIDc = 0;

        try   // If CPUID instruction is supported
        {
            // Get vendor id string
            asm
            (
                // get the vendor string
                // call cpuid with eax = 0
                "xorl %%eax, %%eax\n\t"
                "cpuid\n\t"
                : "=b" (VendorIDb),
                  "=d" (VendorIDd),
                  "=c" (VendorIDc)
                :
                : "%eax"
            );
        }
        catch (...)
        {
            return (0);   // cpuid instruction is unavailable
        }
        return ( (VendorIDb == 'uneG') &&
                 (VendorIDd == 'Ieni') &&
                 (VendorIDc == 'letn'));
    #else
        unsigned int VendorID[3] = {0, 0, 0};

        try   // If CPUID instruction is supported
        {
            __asm
            {
                xor eax, eax   // call cpuid with eax = 0
                cpuid          // Get vendor id string
                mov VendorID, ebx
                mov VendorID + 4, edx
                mov VendorID + 8, ecx
            }
        }
        catch (...)
        {
            return (0);   // cpuid instruction is unavailable
        }
        return ( (VendorID[0] == 'uneG') &&
                 (VendorID[1] == 'Ieni') &&
                 (VendorID[2] == 'letn'));
    #endif
    }

    //
    // MaxCorePerPhysicalProc() returns the maximum cores per physical package.
    // Note that the number of AVAILABLE cores per physical to be used by an
    // application might be less than this maximum value.
//
unsigned int MaxCorePerPhysicalProc(void)
{
    unsigned int Regeax = 0;

    if (!HWD_MTSupported()) return (unsigned int) 1;    // Single core

#ifdef LINUX
    {
        asm
        (
            "xorl %eax, %eax\n\t"
            "cpuid\n\t"
            "cmpl $4, %eax\n\t"         // check if cpuid supports leaf 4
            "jl .single_core\n\t"       // Single core
            "movl $4, %eax\n\t"
            "movl $0, %ecx\n\t"         // start with index = 0; Leaf 4 reports
        );                              // at least one valid cache level
        asm
        (
            "cpuid"
            : "=a" (Regeax)
            :
            : "%ebx", "%ecx", "%edx"
        );
        asm
        (
            "jmp .multi_core\n"
            ".single_core:\n\t"
            "xor %eax, %eax\n"
            ".multi_core:"
        );
    }
#else
    __asm
    {
        xor eax, eax
        cpuid
        cmp eax, 4              // check if cpuid supports leaf 4
        jl single_core          // Single core if no leaf 4
        mov eax, 4
        mov ecx, 0              // start with index = 0; Leaf 4 reports
        cpuid                   // at least one valid cache level
        mov Regeax, eax
        jmp multi_core
single_core:
        xor eax, eax
multi_core:
    }
#endif

    return (unsigned int)((Regeax & NUM_CORE_BITS) >> 26) + 1;
}

//
// HWD_MTSupported() returns 0 when the hardware multi-threaded bit is not set.
//
unsigned int HWD_MTSupported(void)
{
    unsigned int Regedx = 0;

    if ((CpuIDSupported() >= 1) && GenuineIntel())
    {
#ifdef LINUX
        asm
        (
            "movl $1,%%eax\n\t"
            "cpuid"
            : "=d" (Regedx)
            :
            : "%eax", "%ebx", "%ecx"
        );
#else
        __asm
        {
            mov eax, 1
            cpuid
            mov Regedx, edx
        }
#endif
    }

    return (Regedx & HWD_MT_BIT);
}

//
// MaxLogicalProcPerPhysicalProc() returns the maximum logical processors per
// physical package. Note that the number of AVAILABLE logical processors per
// physical to be used by an application might be less than this
// maximum value.
//
unsigned int MaxLogicalProcPerPhysicalProc(void)
{
    unsigned int Regebx = 0;

    if (!HWD_MTSupported()) return (unsigned int) 1;

#ifdef LINUX
    asm
    (
        "movl $1,%%eax\n\t"
        "cpuid"
        : "=b" (Regebx)
        :
        : "%eax", "%ecx", "%edx"
    );
#else
    __asm
    {
        mov eax, 1
        cpuid
        mov Regebx, ebx
    }
#endif

    return (unsigned int) ((Regebx & NUM_LOGICAL_BITS) >> 16);
}

//
// GetAPIC_ID() returns the initial APIC ID of the processor it executes on
//
unsigned char GetAPIC_ID(void)
{
    unsigned int Regebx = 0;

#ifdef LINUX
    asm
    (
        "movl $1, %%eax\n\t"
        "cpuid"
        : "=b" (Regebx)
        :
        : "%eax", "%ecx", "%edx"
    );
#else
    __asm
    {
        mov eax, 1
        cpuid
        mov Regebx, ebx
    }
#endif

    return (unsigned char) ((Regebx & INITIAL_APIC_ID_BITS) >> 24);
}

//
// find_maskwidth() returns the number of bits required to represent the
// argument CountItem
//
unsigned int find_maskwidth(unsigned int CountItem)
{
    unsigned int MaskWidth, count = CountItem;

#ifdef LINUX
    asm
    (
#ifdef __x86_64__                   // define constant to compile
        "push %%rcx\n\t"            // under 64-bit Linux
        "push %%rax\n\t"
#else
        "pushl %%ecx\n\t"
        "pushl %%eax\n\t"
#endif
        // "movl $count, %%eax\n\t"     // done by Assembler below
        "xorl %%ecx, %%ecx"
        // "movl %%ecx, MaskWidth\n\t"  // done by Assembler below
        : "=c" (MaskWidth)
        : "a" (count)
        // : "%ecx", "%eax" We don't list these as clobbered because we don't
        // want the assembler to put them back when we are done
    );
    asm
    (
        "decl %%eax\n\t"
        "bsrw %%ax,%%cx\n\t"
        "jz next\n\t"
        "incw %%cx\n\t"
        // "movl %%ecx, MaskWidth\n"    // done by Assembler below
        : "=c" (MaskWidth)
        :
    );
    asm
    (
        "next:\n\t"
#ifdef __x86_64__
        "pop %rax\n\t"
        "pop %rcx"
#else
        "popl %eax\n\t"
        "popl %ecx"
#endif
    );
#else
    __asm
    {
        mov eax, count
        mov ecx, 0
        mov MaskWidth, ecx
        dec eax
        bsr cx, ax
        jz next
        inc cx
        mov MaskWidth, ecx
next:
    }
#endif

    return MaskWidth;
}

//
// GetNzbSubID() extracts the sub bit field of maximum value MaxSubIDValue
// at bit position ShiftCount from the 8-bit value FullID.
// It returns the 8-bit sub ID value
//
unsigned char GetNzbSubID(unsigned char FullID,
                          unsigned char MaxSubIDValue,
                          unsigned char ShiftCount)
{
    unsigned int MaskWidth;
    unsigned char MaskBits;

    MaskWidth = find_maskwidth((unsigned int) MaxSubIDValue);
    MaskBits = (0xff << ShiftCount) ^
               ((unsigned char) (0xff << (ShiftCount + MaskWidth)));

    return (FullID & MaskBits);
}

//
// CPUCount() returns the total number of available logical processors,
// cores, and physical processors in the platform
//
unsigned char CPUCount(unsigned int *TotAvailLogical,
                       unsigned int *TotAvailCore,
                       unsigned int *PhysicalNum)
{
    unsigned char StatusFlag = 0;
    unsigned int numLPEnabled = 0;
    DWORD dwAffinityMask;
    int j = 0, MaxLPPerCore;
    unsigned char apicID, PackageIDMask;
    unsigned char tblPkgID[256], tblCoreID[256], tblSMTID[256];
    char tmp[256];
    unsigned int i, ProcessorNum;

    g_s3Levels[0] = 0;
    *TotAvailCore = 1;
    *PhysicalNum = 1;

#ifdef LINUX
    // We need to make sure that this process is allowed to run on
    // all of the logical processors that the OS itself can run on.
    // A process could acquire/inherit affinity settings that restrict the
    // current process to run on a subset of all logical processors visible
    // to the OS.

    // Linux doesn't easily allow us to look at the affinity bitmask directly,
    // but it does provide an API to test affinity mask bits of the current
    // process against each logical processor visible under the OS.
    int sysNumProcs = sysconf(_SC_NPROCESSORS_CONF);    // This will tell us how many
                                                        // CPUs are currently enabled.

    // This will tell us which processors this process can run on.
    cpu_set_t allowedCPUs;
    sched_getaffinity(0, sizeof (allowedCPUs), &allowedCPUs);

    for ( int i = 0; i < sysNumProcs; i++ )
    {
        if ( CPU_ISSET(i, &allowedCPUs) == 0 )
        {
            StatusFlag = USER_CONFIG_ISSUE;
            return StatusFlag;
        }
    }
#else
    DWORD dwProcessAffinity, dwSystemAffinity;
    GetProcessAffinityMask(GetCurrentProcess(),
                           &dwProcessAffinity,
                           &dwSystemAffinity);
    if (dwProcessAffinity != dwSystemAffinity)  // not all CPUs are enabled
    {
        StatusFlag = USER_CONFIG_ISSUE;
        return StatusFlag;
    }
#endif

    //
    // Assume that cores within a package have the SAME number of
    // maximum logical processors. Values returned by
    // MaxLogicalProcPerPhysicalProc and MaxCorePerPhysicalProc do not have
    // to be a power of 2.
    //
    MaxLPPerCore = MaxLogicalProcPerPhysicalProc() / MaxCorePerPhysicalProc();
    dwAffinityMask = 1;

#ifdef LINUX
    cpu_set_t currentCPU;
    while ( j < sysNumProcs )
    {
        CPU_ZERO(&currentCPU);
        CPU_SET(j, &currentCPU);
        if ( sched_setaffinity (0, sizeof (currentCPU), &currentCPU) == 0 )
        {
            sleep(0);   // Ensure the system switches to the right CPU
#else
    while (dwAffinityMask && dwAffinityMask <= dwSystemAffinity)
    {
        if (SetThreadAffinityMask(GetCurrentThread(), dwAffinityMask))
        {
            Sleep(0);   // Ensure the system switches to the right CPU
#endif
            apicID = GetAPIC_ID();

            //
            // Store SMT ID and core ID of each logical processor
            // Shift value for SMT ID is 0
            // Shift value for core ID is the mask width for maximum logical
            // processors per core
            //
            tblSMTID[j] = GetNzbSubID(apicID, (unsigned char) MaxLPPerCore, 0);
            tblCoreID[j] = GetNzbSubID(apicID,
                               (unsigned char) MaxCorePerPhysicalProc(),
                               (unsigned char) find_maskwidth(MaxLPPerCore));

            //
            // Extract package ID, assume single cluster.
            // Shift value is the mask width for max Logical per package
            //
            PackageIDMask = (unsigned char)
                (0xff << find_maskwidth(MaxLogicalProcPerPhysicalProc()));
            tblPkgID[j] = apicID & PackageIDMask;

            numLPEnabled ++;    // increment count of available logical
                                // processors in the system.
            sprintf(tmp, " AffinityMask = %d; Initial APIC = %d;"
                         " Physical ID = %d, Core ID = %d, SMT ID = %d\n",
                    dwAffinityMask, apicID, tblPkgID[j], tblCoreID[j], tblSMTID[j]);
            strcat(g_s3Levels, tmp);
        } // if

        j++;
        dwAffinityMask = 1 << j;
    } // while

    // restore the affinity setting to its original state
#ifdef LINUX
    sched_setaffinity (0, sizeof (allowedCPUs), &allowedCPUs);
#else
    SetThreadAffinityMask(GetCurrentThread(), dwProcessAffinity);
#endif

    *TotAvailLogical = numLPEnabled;

    {
        //
        // Count available cores (TotAvailCore) in the system
        //
        unsigned char CoreIDBucket[256];
        DWORD ProcessorMask, pCoreMask[256];

        CoreIDBucket[0] = tblPkgID[0] | tblCoreID[0];
        ProcessorMask = 1;
        pCoreMask[0] = ProcessorMask;

        for (ProcessorNum = 1; ProcessorNum < numLPEnabled; ProcessorNum++)
        {
            ProcessorMask <<= 1;
            for (i = 0; i < *TotAvailCore; i++)
            {
                //
                // Comparing bit-fields of logical processors residing in
                // different packages
                // Assumes that the bit-masks are the same on all processors
                // in the system.
                //
                if ((tblPkgID[ProcessorNum] | tblCoreID[ProcessorNum]) ==
                    CoreIDBucket[i])
                {
                    pCoreMask[i] |= ProcessorMask;
                    break;
                }
            } // for i

            //
            // Did not match any bucket. Start a new one.
            //
            if (i == *TotAvailCore)
            {
                CoreIDBucket[i] = tblPkgID[ProcessorNum] | tblCoreID[ProcessorNum];
                pCoreMask[i] = ProcessorMask;
                (*TotAvailCore)++;  // Increment count of available cores
                                    // in the system
            }
        } // for ProcessorNum
    }

    {
        //
        // Count physical processor (PhysicalNum) in the system
        //
        unsigned char PackageIDBucket[256];
        DWORD pPackageMask[256], ProcessorMask;

        PackageIDBucket[0] = tblPkgID[0];
        ProcessorMask = 1;
        pPackageMask[0] = ProcessorMask;

        for (ProcessorNum = 1; ProcessorNum < numLPEnabled; ProcessorNum++)
        {
            ProcessorMask <<= 1;
            for (i = 0; i < *PhysicalNum; i++)
            {
                //
                // Comparing bit-fields of logical processors residing in
                // different packages
                // Assuming the bit-masks are the same on all processors
                // in the system.
                if (tblPkgID[ProcessorNum] == PackageIDBucket[i])
                {
                    pPackageMask[i] |= ProcessorMask;
                    break;
                }
            } // for i

            //
            // Did not match any bucket. Start a new one.
            //
            if (i == *PhysicalNum)
            {
                PackageIDBucket[i] = tblPkgID[ProcessorNum];
                pPackageMask[i] = ProcessorMask;
                (*PhysicalNum)++;   // Increment count of available physical
                                    // processors in the system
            }
        } // for ProcessorNum
    }

    //
    // Check to see if the system is multi-core
    // Check if the system is hyper-threading
    //
    if (*TotAvailCore > *PhysicalNum)
    {
        // Multi-core
        if (MaxLPPerCore == 1)
            StatusFlag = MULTI_CORE_AND_HT_NOT_CAPABLE;
        else if (numLPEnabled > *TotAvailCore)
            StatusFlag = MULTI_CORE_AND_HT_ENABLED;
        else
            StatusFlag = MULTI_CORE_AND_HT_DISABLED;
    }
    else
    {
        // Single-core
        if (MaxLPPerCore == 1)
            StatusFlag = SINGLE_CORE_AND_HT_NOT_CAPABLE;
        else if (numLPEnabled > *TotAvailCore)
            StatusFlag = SINGLE_CORE_AND_HT_ENABLED;
        else
            StatusFlag = SINGLE_CORE_AND_HT_DISABLED;
    }

    return StatusFlag;
}
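The bit-field extraction performed by find_maskwidth() and GetNzbSubID() can be hard to follow through the inline assembly. The following is a minimal sketch in portable C of the same idea, so the arithmetic can be checked without running CPUID. The names mask_width, sub_id, split_apic_id and struct topo_ids are hypothetical and do not appear in the listing above; the sketch also assumes that a count of 0 or 1 needs zero bits, and that the caller supplies the per-package maximums instead of reading them from CPUID.

```c
#include <assert.h>

/* Hypothetical container for the three sub-IDs (not part of the listing). */
struct topo_ids {
    unsigned char smt_id;   /* logical processor within a core, unshifted */
    unsigned char core_id;  /* core within a package, unshifted           */
    unsigned char pkg_id;   /* physical package, unshifted                */
};

/* Number of bits needed to tell apart `count` items, i.e. ceil(log2(count)).
 * Assumption: a count of 0 or 1 needs no bits. */
static unsigned int mask_width(unsigned int count)
{
    unsigned int width = 0;

    if (count <= 1)
        return 0;
    count--;
    while (count != 0) {
        width++;
        count >>= 1;
    }
    return width;
}

/* Keep `width` bits of an 8-bit ID starting at bit `shift`.
 * As in GetNzbSubID(), the result is left in place (not shifted down). */
static unsigned char sub_id(unsigned char full_id,
                            unsigned int width, unsigned int shift)
{
    unsigned int mask = (0xFFu << shift) ^ (0xFFu << (shift + width));

    return (unsigned char)(full_id & mask & 0xFFu);
}

/* Split an 8-bit initial APIC ID into SMT, core and package fields,
 * given the maximum logical processors and cores per package. */
static struct topo_ids split_apic_id(unsigned char apic_id,
                                     unsigned int lp_per_pkg,
                                     unsigned int cores_per_pkg)
{
    struct topo_ids ids;
    unsigned int lp_per_core;

    assert(cores_per_pkg >= 1 && lp_per_pkg >= cores_per_pkg);
    lp_per_core = lp_per_pkg / cores_per_pkg;

    ids.smt_id  = sub_id(apic_id, mask_width(lp_per_core), 0);
    ids.core_id = sub_id(apic_id, mask_width(cores_per_pkg),
                         mask_width(lp_per_core));
    ids.pkg_id  = (unsigned char)
                  (apic_id & (0xFFu << mask_width(lp_per_pkg)) & 0xFFu);
    return ids;
}
```

For example, with a maximum of 4 logical processors and 2 cores per package, split_apic_id(0x07, 4, 2) gives an SMT ID of 1, a core ID of 2, and a package ID of 4. The values stay unshifted, which is sufficient for the bucket comparisons in CPUCount() because only equality between IDs matters there.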
About the Authors
Khang Nguyen is an Applications Engineer working with Intel's Software and Solutions Group. He can be reached at khang.t.nguyen@intel.com.
Shihjong Kuo is a Senior Technical Marketing Engineer in Intel's Digital Enterprise Group. He can be reached at shihjong.kuo@intel.com.