Lustre / LU-17353

cfs_cpu_dead() Lustre: can't support CPU plug-out well now

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Severity: 3

    Description

      The logs occasionally show CPU cores being taken offline on the system:

      [ 200.214057] LNet: 5394:0:(libcfs_cpu.c:1133:cfs_cpu_dead()) Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU 32]
      [ 200.231635] LNet: 5394:0:(libcfs_cpu.c:1133:cfs_cpu_dead()) Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU 33]
      [ 200.249606] LNet: 5394:0:(libcfs_cpu.c:1133:cfs_cpu_dead()) Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU 34]

      I suspect this is mainly a client issue, but it could eventually be hit on servers in a cloud environment.

      It would be good to handle this situation better than just printing an error message. In particular, the ptlrpcd threads bound to those cores should be stopped when an entire CPT (CPU partition) is removed, so they don't continue to burn cycles. The CPT count should also be recomputed.
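
      The handling proposed above could look roughly like the userspace sketch below. It is not libcfs code: struct sk_cpt_table, sk_cpu_dead() and sk_active_cpt_count() are hypothetical names invented for illustration, partitions are modeled as 64-bit masks, and "stopping ptlrpcd threads" is reduced to clearing a flag. How a real fix would hook into the kernel hotplug callback is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MAX_CPUS 64 /* sketch limit: one 64-bit mask per partition */
#define SK_MAX_CPTS 8

/* Hypothetical stand-in for a CPU partition table; not the libcfs struct. */
struct sk_cpt_table {
	int      ncpt;                         /* partitions originally configured */
	uint64_t online_mask[SK_MAX_CPTS];     /* CPUs still online, per partition */
	bool     threads_running[SK_MAX_CPTS]; /* stand-in for per-CPT ptlrpcd state */
};

/* Recompute the CPT count: partitions with at least one online CPU. */
int sk_active_cpt_count(const struct sk_cpt_table *tab)
{
	int i, n = 0;

	for (i = 0; i < tab->ncpt; i++)
		if (tab->online_mask[i] != 0)
			n++;
	return n;
}

/*
 * Handle one CPU going offline. Returns the index of the partition that
 * just lost its last CPU, or -1 if no partition became empty.
 */
int sk_cpu_dead(struct sk_cpt_table *tab, int cpu)
{
	uint64_t bit;
	int i;

	assert(cpu >= 0 && cpu < SK_MAX_CPUS);
	bit = (uint64_t)1 << cpu;

	for (i = 0; i < tab->ncpt; i++) {
		if (!(tab->online_mask[i] & bit))
			continue;

		tab->online_mask[i] &= ~bit;
		if (tab->online_mask[i] != 0)
			return -1; /* partition shrank but is still usable */

		/* Whole partition gone: a real handler would stop its threads here. */
		tab->threads_running[i] = false;
		return i;
	}
	return -1; /* CPU did not belong to any partition */
}
```

      In this model, if a partition held only CPUs 32 to 34, the first two removals would return -1 and the third would return that partition's index, after which sk_active_cpt_count() reports one partition fewer.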

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Andreas Dilger (adilger)