  1. Lustre
  2. LU-9725

Mount commands don't return for targets in LFS with DNE and 3 MDTs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      kernel version: 3.10.0-514.21.1.el7_lustre.x86_64
      lustre version: 2.10.0_RC1-1.el7
      OS: CentOS Linux release 7.3.1611 (Core)

      The failure occurs consistently in test_md0_undeleteable() in test_filesystem_dne.py during IML SSI automated test runs against Lustre b2.10.

      This is the only test we have that creates a filesystem with 3 MDTs.

      When the Lustre filesystem (LFS) is recreated outside the test infrastructure through IML in a similar configuration (an MGS, 3 MDTs and 1 OST), the mount commands for all other targets return successfully, but the OST mount command never returns.

      While the MDT mount commands are being issued, there is a lot of activity in the kernel messages log, including multiple LustreErrors and stack traces, warnings of high CPU usage, and then:

      kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [lwp_notify_fs1-:13630]
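      When triaging a messages log for this failure, it can help to pull out every soft-lockup report together with the stuck CPU and task. The following is a minimal sketch, not part of IML or Lustre; the pattern is written against the single watchdog line quoted above, and the names `SoftLockup` and `find_soft_lockups` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple


class SoftLockup(NamedTuple):
    cpu: int       # CPU number reported by the watchdog
    seconds: int   # how long the CPU was reported stuck
    task: str      # kernel thread or process name
    pid: int       # PID of that task


# Assumption: the pattern mirrors the one watchdog line quoted in this ticket;
# other kernels may format the message differently.
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<task>[^\]]*):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines: Iterable[str]) -> List[SoftLockup]:
    """Return one SoftLockup per log line that contains a soft-lockup report."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            events.append(
                SoftLockup(
                    cpu=int(match.group("cpu")),
                    seconds=int(match.group("seconds")),
                    task=match.group("task"),
                    pid=int(match.group("pid")),
                )
            )
    return events
```

      For example, `find_soft_lockups(open("/var/log/messages", errors="replace"))` would list the reports in the live log, assuming the file is readable by the caller.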

      This is on an ldiskfs-only LFS with DNE enabled. The OST mount command used is shown below; the MDT mount commands have a similar format:

      mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk5 /mnt/fs1-OST0000
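      Because the command above never returns, a reproduction script that calls it directly will block as well. The following is a minimal sketch of running a command with a deadline so that the hang is reported instead; it is not how IML issues mounts, and the helper name `run_with_deadline` and the timeout value are assumptions.

```python
import subprocess
import sys


def run_with_deadline(argv, timeout_s):
    """Run argv; return its exit code, or None if it has not exited in time."""
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Deliberately no kill() followed by wait(): a mount blocked inside the
        # kernel may not respond to signals, so waiting for it to die could
        # hang this script too. The child is left running and only reported.
        print(
            f"still running after {timeout_s}s: {' '.join(argv)}",
            file=sys.stderr,
        )
        return None


# Hypothetical usage with the device and mount point from this ticket
# (requires root on a Lustre server; the 300 s deadline is an arbitrary choice):
#
#   rc = run_with_deadline(
#       ["mount", "-t", "lustre",
#        "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk5",
#        "/mnt/fs1-OST0000"],
#       timeout_s=300,
#   )
#   if rc is None:
#       print("OST mount did not return")
```

      Returning `None` rather than raising keeps the caller free to go on and collect logs or a sysrq-t dump while the mount is still stuck.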

      The following gists show excerpts from the /var/log/messages log during instances of this type of failure (MDT mounting in DNE):

      https://gist.github.com/tanabarr/1adb35a7e7da2581be79df8f45417411
      https://gist.github.com/tanabarr/70d3bfa66c4fc474b82c7c02adcda511
      https://gist.github.com/tanabarr/9f54584621aacfdeb3899f59687cb918

      The last gist link is an extended excerpt giving more contextual log information regarding the attempted mounting of the MDTs and the subsequent CPU load warnings. The entire logfile for that failure instance (in addition to other IML related log files) is attached to this ticket.

      Original IML ticket: https://github.com/intel-hpdd/intel-manager-for-lustre/issues/108

      Attachments

        1. yum.log.txt (147 kB)
        2. sysrq-t (375 kB)
        3. messages.txt (2.13 MB)
        4. job_scheduler.log.txt (8.02 MB)
        5. chroma-agent-console.log.txt (1.22 MB)
        6. chroma-agent.log.txt (1.57 MB)

            People

              Assignee: laisiyao Lai Siyao
              Reporter: tanabarr Tom Nabarro (Inactive)