Lustre / LU-16316

ZFS OSS locks

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Environment: Lustre 2.15.0_RC3, ZFS 2.0.7 (both self-compiled)
      OS: CentOS 8.5, kernel 4.18.0-348.7.1.el8_5.x86_64
    • Severity: 3

    Description

      Over the past few weeks we have experienced lock-ups on OSS nodes running ZFS 2.0.7. The affected node becomes unresponsive to Lustre (the OSS is reported as unhealthy) and its load rises above 400. In some cases the load on the MDS also increases immediately afterwards, but this appears to be a consequence of lost communication between the MDS and the affected OSS. We have not been able to associate the problem with a specific I/O pattern or type of operation. We are reporting it here first, but we cannot rule out that it should be addressed to the ZFS developers; if you think so, please let us know. We attach two sets of logs: the first from 16 October, when both the MDS and an OSS were affected, and the second from 13 November, when only an OSS was stuck. If you need more information, please don't hesitate to ask.

       

      Regards,
      Dominika Wanat

      Attachments

        1. mds01_20221016.log (40 kB)
        2. oss03_20221113.log (154 kB)
        3. oss06_20221016.log (129 kB)

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Dominika Wanat (wanat)
            Votes: 0
            Watchers: 3
