Lustre / LU-10887

2 MDTs stuck in WAITING


    Description

      Hi,

      finds were hanging on the main filesystem from one client. These processes looked to be unkillable. I rebooted the client running the finds and restarted the find sweep, but they hung again.

      I then failed over all the MDTs to one MDS (we have 2), and that went ok. I then failed all the MDTs back to the other MDS and it LBUG'd.

       kernel: LustreError: 49321:0:(lu_object.c:1177:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
      

      Since then 2 of the MDTs won't connect. They are stuck in WAITING state and never get to RECOVERING or COMPLETE.

      [warble1]root: cat /proc/fs/lustre/mdt/dagg-MDT0001/recovery_status
      status: WAITING
      non-ready MDTs:  0000
      recovery_start: 1523093864
      time_waited: 388
      
      [warble1]root: cat /proc/fs/lustre/mdt/dagg-MDT0002/recovery_status
      status: WAITING
      non-ready MDTs:  0000
      recovery_start: 1523093864
      time_waited: 391
      

      The other MDT is ok.

      [warble2]root: cat /proc/fs/lustre/mdt/dagg-MDT0000/recovery_status
      status: COMPLETE
      recovery_start: 1523093168
      recovery_duration: 30
      completed_clients: 122/122
      replayed_requests: 0
      last_transno: 214748364800
      VBR: DISABLED
      IR: DISABLED
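
A minimal sketch of how the recovery_status dumps above could be checked automatically. It only assumes the `key: value` layout visible in the output. The function names and the 900 s threshold are illustrative assumptions (the threshold is taken from the point at which the message gets logged), not part of any Lustre tool.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines from a recovery_status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "122/122" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def looks_stuck(fields, max_wait_s=900):
    """Heuristic (assumed): still WAITING after max_wait_s seconds."""
    return (fields.get("status") == "WAITING"
            and int(fields.get("time_waited", 0)) >= max_wait_s)


# Sample copied from the dagg-MDT0001 output above.
SAMPLE = """status: WAITING
non-ready MDTs:  0000
recovery_start: 1523093864
time_waited: 388
"""

if __name__ == "__main__":
    status = parse_recovery_status(SAMPLE)
    print(status["status"], status["time_waited"], looks_stuck(status))
```

On a live MDS the same text would come from reading each /proc/fs/lustre/mdt/*/recovery_status file rather than from a pasted sample.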
      

      I've tried unmounting and remounting a few times, but time_waited just keeps incrementing. It gets to 900s, spits out a message, and then looks like it keeps going forever.

      Any ideas?

      cheers,
      robin

      Attachments

        1. conman-warble1-traces.txt (1.63 MB)
        2. warble1.log-20180408.gz (115 kB)
        3. warble1-traces.txt (1.54 MB)
        4. warbles.txt (456 kB)
        5. warbles-messages-20180408.txt (1.22 MB)
        6. zfs-list.warble1.txt (1 kB)
        7. zpool-status.warble1.txt (3 kB)

            People

              yong.fan nasf (Inactive)
              scadmin SC Admin