  LU-16973

Busy device after successful umount


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.16.0
    • Labels: Upstream
    • Severity: 3

    Description

      We have a script that checks the failover process and crashes the node if failover is delayed. Usually 30 seconds after umount completes is enough to finish all outstanding work and reach a zero reference count on the RAID device. But high IO combined with two MDTs on one node makes things worse: the delayed work takes much longer, and we have seen it continue for 20+ minutes after umount completed.

      The umount work is divided into:

      1) the umount syscall
      2) mntput(), which is delayed and processed by a kworker
      3) iputs; LU-15404 moved them to a separate per-superblock workqueue, which mntput() waits for
      4) fputs(), which are delayed onto a global kernel list shared with the other MDTs; that list is processed by a single kworker (see the sketch after this list)
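
      For reference, the shared delayed-fput path looks roughly like this (a simplified sketch of the mainline fs/file_table.c mechanism; the names match upstream, but details vary by kernel version):

      /* Simplified from fs/file_table.c: files released from contexts that
       * cannot run __fput() directly are queued on one global llist, and a
       * single work item drains the whole backlog. */
      static LLIST_HEAD(delayed_fput_list);

      static void delayed_fput(struct work_struct *unused)
      {
              struct llist_node *node = llist_del_all(&delayed_fput_list);
              struct file *f, *t;

              /* One kworker walks every queued file, no matter which
               * mount (or MDT) it belongs to. */
              llist_for_each_entry_safe(f, t, node, f_u.fu_llist)
                      __fput(f);
      }

      static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);

      void fput(struct file *file)
      {
              if (atomic_long_dec_and_test(&file->f_count)) {
                      struct task_struct *task = current;

                      /* Normal user tasks run __fput() via task_work; kernel
                       * threads (e.g. Lustre service threads) fall through to
                       * the shared delayed list instead. */
                      if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
                              init_task_work(&file->f_u.fu_rcuhead, ____fput);
                              if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
                                      return;
                      }

                      if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
                              schedule_delayed_work(&delayed_fput_work, 1);
              }
      }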

      The md65 mount has 1747359 references and 2387523 files on the fput list.

      [353410.384294] Lustre: server umount kjcf04-MDT0000 complete
      [353454.532032] sysrq: SysRq : Trigger a crash
      [0]: ffffd193c1a1f584
      struct mnt_pcp {
        mnt_count = 436042, 
        mnt_writers = 0
      }
      crash> mnt_pcp 0x3a1fd1e1f584:1
      [1]: ffffd193c1a5f584
      struct mnt_pcp {
        mnt_count = 436156, 
        mnt_writers = 0
      }
      crash> mnt_pcp 0x3a1fd1e1f584:a
      [0]: ffffd193c1a1f584
      struct mnt_pcp {
        mnt_count = 436042, 
        mnt_writers = 0
      }
      cat mount_count.pcp | tr -d , | grep mnt_count | awk '{ sum += $3 } END { print sum }'
      1747359
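
      The per-CPU sum above matches what the kernel itself computes; this is simplified from mnt_get_count() in fs/namespace.c, on kernels that still shard the count into struct mnt_pcp:

      /* The mount reference count is sharded into per-CPU mnt_pcp
       * counters on SMP, so the true total is only visible after
       * summing over all possible CPUs -- exactly what the awk
       * pipeline above reproduces from the crash dump. */
      static int mnt_get_count(struct mount *mnt)
      {
      #ifdef CONFIG_SMP
              int count = 0;
              int cpu;

              for_each_possible_cpu(cpu)
                      count += per_cpu_ptr(mnt->mnt_pcp, cpu)->mnt_count;

              return count;
      #else
              return mnt->mnt_count;
      #endif
      }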
      
      crash> file ffff978de3176300
      struct file {
        f_u = {
          fu_llist = {
            next = 0xffff9794e4529e00
          }, 
          fu_rcuhead = {
            next = 0xffff9794e4529e00, 
            func = 0x0
          }
        }, 
        f_path = {
          mnt = 0xffff9793b5200da0, 
          dentry = 0xffff97912a254a80
        }, 
        f_inode = 0xffff979ddd8351c0, 
        f_op = 0xffffffffc1c2b640 <ldiskfs_dir_operations>, 
      
      crash> list 0xffff9794e4529e00 | wc -l
      
      2387523

      So in such a situation the total umount time is unpredictable. Let's fix it.
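
      A minimal sketch of one possible mitigation (an illustration only, not the patch that actually landed): have the server umount path drain the delayed work itself before printing "server umount ... complete", so the backlog is bounded by the umount call rather than left to a lone kworker. The helper name server_umount_drain() is hypothetical, and this assumes flush_delayed_fput() is callable from module context:

      #include <linux/file.h>
      #include <linux/rcupdate.h>

      /* Hypothetical helper, for illustration only. */
      static void server_umount_drain(void)
      {
              /* Run every fput queued on the global delayed list now,
               * instead of waiting for the shared kworker to get to it. */
              flush_delayed_fput();

              /* Wait for RCU callbacks that may still pin file or mount
               * references before declaring the device free. */
              rcu_barrier();
      }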
       


            People

              Assignee: Alexander Boyko (aboyko)
              Reporter: Alexander Boyko (aboyko)
              Votes: 0
              Watchers: 13
