Project: Lustre
Issue: LU-17353

cfs_cpu_dead() Lustre: can't support CPU plug-out well now


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.14.0
    • Severity: 3

      The logs occasionally show CPU cores being taken offline on the system:

      [ 200.214057] LNet: 5394:0:(libcfs_cpu.c:1133:cfs_cpu_dead()) Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU 32]
      [ 200.231635] LNet: 5394:0:(libcfs_cpu.c:1133:cfs_cpu_dead()) Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU 33]
      [ 200.249606] LNet: 5394:0:(libcfs_cpu.c:1133:cfs_cpu_dead()) Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU 34]

      I suspect this mainly affects clients, but it could eventually be hit on servers running in a cloud environment.

      This situation should be handled better than by just printing a console message. In particular, if every CPU in a CPT (CPU partition) is removed, stop the ptlrpcd threads bound to that CPT so they do not keep burning cycles. The CPT count should also be recomputed.

            Assignee: WC Triage (wc-triage)
            Reporter: Andreas Dilger (adilger)
            Votes: 0
            Watchers: 1
