  1. Lustre
  2. LU-15742

lockup in LNetMDUnlink during filesystem migration

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.12.8
    • TOSS 3.7-19 based
      RHEL kernel 3.10.0-1160.59.1
      lustre 2.12.8_6.llnl

    Description

      We are having significant LNet issues that have caused us to disable Lustre on one of our compute clusters (catalyst). We've had to turn off all of the router nodes in that cluster.

      When the routers for catalyst are on, we see many errors and have connectivity problems on multiple clusters.

      This earlier ticket may help explain our LNet setup: https://jira.whamcloud.com/browse/LU-15234

      UPDATE: The initial issue has been resolved. Our clusters and file systems are working, and we no longer have to turn off clusters or routers. This ticket now focuses on the stack trace containing LNetMDUnlink() as a possible root cause. The OS update and the underlying network issues we had appear to have been confounders.

      Related to https://jira.whamcloud.com/browse/LU-11895

      Attachments

        1. call-2022-4-19.tar.gz
          943 kB
          Gian-Carlo Defazio
        2. console.catalyst153
          1.96 MB
          Gian-Carlo Defazio
        3. console.orelic.tar.gz
          368 kB
          Gian-Carlo Defazio
        4. console.orelic2
          430 kB
          Gian-Carlo Defazio
        5. console.pascal128
          1.73 MB
          Gian-Carlo Defazio
        6. console.zrelic.tar.gz
          304 kB
          Gian-Carlo Defazio
        7. console.zrelic2
          485 kB
          Gian-Carlo Defazio
        8. lustre_network_updated.jpg
          202 kB
          Gian-Carlo Defazio
        9. opensm.orelic.log.gz
          132 kB
          Gian-Carlo Defazio
        10. opensm.zrelic.log.gz
          194 kB
          Gian-Carlo Defazio
        11. pascal128-vmcore-dmesg.txt
          772 kB
          Gian-Carlo Defazio
        12. pfstest-nodes.tar.gz
          7.51 MB
          Gian-Carlo Defazio

          People

            ssmirnov Serguei Smirnov
            defazio Gian-Carlo Defazio