intel_pstate locks the CPUs of an Intel Xeon E5-2650 v4 at 400 MHz on CoreOS

Hardware:

  • 4 Intel HNS2600TPR nodes installed in one chassis with 2 power cords,
  • each equipped with 1 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz,
  • 128 GB RAM.

Software:

  • Running different versions of CoreOS:

2512.2.0 Release Date: May 19, 2020 kernel: 4.19.123 rkt: 1.30.0 docker: 18.06.3 etcd: 3.3.20 systemd: 241 ignition: 0.34.0

2345.3.0 Release Date: March 2, 2020 kernel: 4.19.106 rkt: 1.30.0 docker: 18.06.3 etcd: 3.3.18 systemd: 241 ignition: 0.33.0

Sometimes, some nodes drop to about 400 MHz on all CPU cores, as shown below:

sigma01 sigma # cat /proc/cpuinfo
processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb000038
cpu MHz         : 412.535
cache size      : 30720 KB
physical id     : 0
siblings        : 24
core id         : 13
cpu cores       : 12
apicid          : 27
initial apicid  : 27
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips        : 4389.81
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Every 2.0s: cat /proc/cpuinfo | grep MHz                                                                                  sigma01: Fri May 22 13:44:33 2020

cpu MHz         : 422.084
cpu MHz         : 413.291
cpu MHz         : 420.521
cpu MHz         : 421.059
cpu MHz         : 417.286
cpu MHz         : 417.869
cpu MHz         : 419.568
cpu MHz         : 413.913
cpu MHz         : 416.606
cpu MHz         : 416.767
cpu MHz         : 418.188
cpu MHz         : 422.938
cpu MHz         : 413.258
cpu MHz         : 414.553
cpu MHz         : 409.921
cpu MHz         : 407.358
cpu MHz         : 410.833
cpu MHz         : 413.726
cpu MHz         : 417.325
cpu MHz         : 414.957
cpu MHz         : 411.737
cpu MHz         : 415.100
cpu MHz         : 413.458
cpu MHz         : 411.024

sigma03 sigma # ls /sys/devices/system/cpu/cpufreq/policy0/
affected_cpus  cpuinfo_max_freq  cpuinfo_min_freq  cpuinfo_transition_latency  related_cpus  scaling_available_governors  scaling_cur_freq  scaling_driver  scaling_governor  scaling_max_freq  scaling_min_freq  scaling_setspeed

sigma03 sigma # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver
intel_pstate

sigma03 sigma # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
<unsupported>

[root@sigma01 ~]# cpupower frequency-info
sh: modprobe: command not found
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.20 GHz - 2.90 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1.20 GHz and 2.90 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 426 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
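
The cpupower output above reports hardware limits of 1.20 GHz - 2.90 GHz while the kernel asserts about 426 MHz, so the cores are running below the advertised minimum. A minimal sketch for flagging that condition per policy, using only the sysfs files listed above (`scaling_cur_freq`, `cpuinfo_min_freq`), could look like this. The function name `check_stuck_cpus` and the `CPUFREQ_ROOT` override are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: report cpufreq policies whose current frequency is below the
# hardware minimum advertised by the driver. Values in sysfs are in kHz.
# CPUFREQ_ROOT is a hypothetical override, e.g. for a copied sysfs tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

check_stuck_cpus() {
    root="$1"
    for policy in "$root"/policy*; do
        # Skip if the glob matched nothing or the file is unreadable.
        [ -r "$policy/scaling_cur_freq" ] || continue
        cur=$(cat "$policy/scaling_cur_freq")
        min=$(cat "$policy/cpuinfo_min_freq")
        if [ "$cur" -lt "$min" ]; then
            echo "${policy##*/}: cur=${cur} kHz below hardware min=${min} kHz"
        fi
    done
}

check_stuck_cpus "$CPUFREQ_ROOT"
```

On a healthy node this should print nothing; on a node in the 400 MHz state it would presumably print one line per policy.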

In the BIOS, the HNS2600TPR is set to "Performance" mode for power management, and the fans are set to "Performance" as well.

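I am not sure whether this information is enough for anyone to help solve the problem, but hints on how to drill down systematically and look for clues would be useful.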

Currently, 3 of the 4 nodes run at a normal CPU clock of around 2.5 GHz, while 1 node is stuck at 400 MHz. On other days, a different node is stuck at 400 MHz.
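
Because the affected node changes from day to day, it may help to record when each node enters the slow state. The following is only a sketch: it condenses the `cpu MHz` lines of `/proc/cpuinfo` into one timestamped line that could be appended to a log on every node. The function name `summarize_mhz` is hypothetical.

```shell
#!/bin/sh
# Sketch: summarise the "cpu MHz" lines of a cpuinfo-style file as
# count, minimum, maximum and mean (rounded to whole MHz).
summarize_mhz() {
    awk -F: '/^cpu MHz/ {
        v = $2 + 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        sum += v; n++
    }
    END {
        if (n > 0) printf "cpus=%d min=%.0f max=%.0f avg=%.0f\n", n, min, max, sum / n
    }' "$1"
}

# One line per call: UTC timestamp, node name, frequency summary.
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(uname -n) $(summarize_mhz /proc/cpuinfo)"
```

Run periodically (for example from a loop with `sleep`), the resulting log shows the time at which a node's frequencies collapse, which can then be compared against other events on that node.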

While the CPUs are at 400 MHz, their temperature is about 25-30 °C.

Any help is much appreciated!