  1. Lustre
  2. LU-15742

lockup in LNetMDUnlink during filesystem migration


Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.12.8
    • TOSS 3.7-19 based
      RHEL kernel 3.10.0-1160.59.1
      lustre 2.12.8_6.llnl
    • 3

    Description

      We are having significant LNet issues that have caused us to disable Lustre on one of our compute clusters (catalyst). We've had to turn off all of the router nodes in that cluster.

      When the routers for catalyst are on we see lots of errors and have connectivity problems on multiple clusters.

      The following ticket may help explain our LNet setup: https://jira.whamcloud.com/browse/LU-15234

      UPDATE: The initial issue has been resolved: our clusters and file systems are working, and we no longer have to turn off clusters or routers. This ticket is now focused on the stack trace containing LNetMDUnlink() as a possible root cause. The OS update and the underlying network issues we had seem to have been confounders.

      Related to https://jira.whamcloud.com/browse/LU-11895

      Attachments

        1. console.orelic2
          430 kB
        2. console.catalyst153
          1.96 MB
        3. console.zrelic2
          485 kB
        4. lustre_network_updated.jpg
          202 kB
        5. console.orelic.tar.gz
          368 kB
        6. console.zrelic.tar.gz
          304 kB
        7. opensm.orelic.log.gz
          132 kB
        8. opensm.zrelic.log.gz
          194 kB
        9. call-2022-4-19.tar.gz
          943 kB
        10. pfstest-nodes.tar.gz
          7.51 MB
        11. console.pascal128
          1.73 MB
        12. pascal128-vmcore-dmesg.txt
          772 kB

          People

            ssmirnov Serguei Smirnov
            defazio Gian-Carlo Defazio
