LU-8355: VFS: Busy inodes after unmount of md0 ... causes kernel panic or at least memory leak


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      If we try to start an MDT on a read-only device, or on a device with the journal disabled, we get the following message:

      VFS: Busy inodes after unmount of md0. Self-destruct in 5 seconds.  Have a nice day... 

      In this case one of the inodes' reference counters has an extra increment, which can later cause a kernel panic or at least a memory leak.
      According to my investigation, this inode is s_buddy_cache (created in ldiskfs_mb_init_backend).
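      For context, s_buddy_cache is an anonymous in-memory inode that mballoc allocates at mount time to cache buddy/bitmap pages; it is only released via iput() in ldiskfs_mb_release() at unmount. A paraphrased sketch of the allocation (modeled on the upstream ext4_mb_init_backend(); details vary between kernel versions):

      /* Paraphrased from ldiskfs_mb_init_backend() (ext4_mb_init_backend()
       * upstream); details vary by kernel version. The buddy cache inode
       * never exists on disk -- it only holds cached buddy/bitmap pages --
       * so a leaked reference on it leaves the superblock with a busy
       * inode at unmount. */
      sbi->s_buddy_cache = new_inode(sb);
      if (sbi->s_buddy_cache == NULL) {
              printk(KERN_ERR "LDISKFS-fs: can't get new inode\n");
              goto err_freesgi;
      }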
      I used a kprobe (adding a handler_pre to __iget) and found that the extra increment comes from fsnotify_unmount_inodes:

      kprobe handler_pre __iget: inode ffff880079a09528 i_ino 21328 i_count 2 i_state 8
      Pid: 12211, comm: mount.lustre Tainted: G        W  ---------------    2.6.32-431.17.1.x1.6.39.x86_64 #1
      Call Trace:
       <#DB>  [<ffffffffa0552071>] ? handler_pre+0x41/0x44 [kprobe_example]
       [<ffffffff8152b455>] ? kprobe_exceptions_notify+0x3d5/0x430
       [<ffffffff8152b6c5>] ? notifier_call_chain+0x55/0x80
       [<ffffffff8152b72a>] ? atomic_notifier_call_chain+0x1a/0x20
       [<ffffffff810a12be>] ? notify_die+0x2e/0x30
       [<ffffffff81528ff5>] ? do_int3+0x35/0xb0
       [<ffffffff815288c3>] ? int3+0x33/0x40
       [<ffffffff811a4f01>] ? __iget+0x1/0x70
       <<EOE>>  [<ffffffff811ccc7f>] ? fsnotify_unmount_inodes+0x10f/0x120
       [<ffffffff811a620b>] ? invalidate_inodes+0x5b/0x190
       [<ffffffff811c5a14>] ? __sync_blockdev+0x24/0x50
       [<ffffffff8118b34c>] ? generic_shutdown_super+0x4c/0xe0
       [<ffffffff8118b411>] ? kill_block_super+0x31/0x50
       [<ffffffffa058f686>] ? ldiskfs_kill_block_super+0x16/0x60 [ldiskfs]
       [<ffffffff8118bbe7>] ? deactivate_super+0x57/0x80
       [<ffffffff811aabef>] ? mntput_no_expire+0xbf/0x110
       [<ffffffffa09f6515>] ? osd_mount+0x6f5/0xcb0 [osd_ldiskfs]
       [<ffffffffa09f93ff>] ? osd_device_alloc+0x4cf/0x970 [osd_ldiskfs]
       [<ffffffffa11fb33f>] ? obd_setup+0x1bf/0x290 [obdclass]
       [<ffffffffa11fb618>] ? class_setup+0x208/0x870 [obdclass]
       [<ffffffffa1203edc>] ? class_process_config+0xc6c/0x1ad0 [obdclass]
       [<ffffffffa10d54d8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa1208dc2>] ? lustre_cfg_new+0x312/0x690 [obdclass]
       [<ffffffffa1209298>] ? do_lcfg+0x158/0x440 [obdclass]
       [<ffffffffa1209614>] ? lustre_start_simple+0x94/0x200 [obdclass]
       [<ffffffffa124503d>] ? server_fill_super+0x97d/0x1a7c [obdclass]
       [<ffffffffa10d54d8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa120f0b0>] ? lustre_fill_super+0x1d0/0x5b0 [obdclass]
       [<ffffffffa120eee0>] ? lustre_fill_super+0x0/0x5b0 [obdclass]
       [<ffffffff8118c2af>] ? get_sb_nodev+0x5f/0xa0
       [<ffffffffa1206dd5>] ? lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8118b90b>] ? vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8118bab2>] ? do_kern_mount+0x52/0x130
       [<ffffffff811aca8b>] ? do_mount+0x2fb/0x930
       [<ffffffff811ad150>] ? sys_mount+0x90/0xe0
       [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
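
      For reference, a minimal sketch of the probe module used above, modeled on the kernel's samples/kprobes/kprobe_example.c (the exact module may differ); the regs->di decoding assumes the x86_64 calling convention, where the first argument of __iget(), the struct inode *, arrives in %rdi:

      #include <linux/kernel.h>
      #include <linux/module.h>
      #include <linux/kprobes.h>
      #include <linux/fs.h>

      /* Sketch of the probe used for this investigation, modeled on
       * samples/kprobes/kprobe_example.c; built against the 2.6.32-era
       * kprobes API. */
      static int handler_pre(struct kprobe *p, struct pt_regs *regs)
      {
              /* x86_64: first argument of __iget() is in %rdi. */
              struct inode *inode = (struct inode *)regs->di;

              printk(KERN_INFO "kprobe handler_pre __iget: inode %p i_ino %lu "
                     "i_count %d i_state %lu\n", inode, inode->i_ino,
                     atomic_read(&inode->i_count), inode->i_state);
              dump_stack();
              return 0;
      }

      static struct kprobe kp = {
              .symbol_name = "__iget",
              .pre_handler = handler_pre,
      };

      static int __init kprobe_init(void)
      {
              return register_kprobe(&kp);
      }

      static void __exit kprobe_exit(void)
      {
              unregister_kprobe(&kp);
      }

      module_init(kprobe_init);
      module_exit(kprobe_exit);
      MODULE_LICENSE("GPL");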

      Reading /proc/slabinfo after unloading ldiskfs may then cause a kernel panic, because ldiskfs_inode_cache was not cleaned up correctly (it still contains busy inodes).
      fsnotify_unmount_inodes() and inotify_unmount_inodes() have a bug in their handling of inodes in the I_NEW state:

                /* In case the dropping of a reference would nuke next_i. */
                if ((&next_i->i_sb_list != list) &&
                    atomic_read(&next_i->i_count) &&
                    !(next_i->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))) {
                        __iget(next_i);
                        need_iput = next_i;
                }

      We do __iget() on next_i, but on the next loop iteration the matching iput() is skipped, because the inode is in the I_NEW state:

                /*
                 * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
                 * I_WILL_FREE, or I_NEW which is fine because by that point
                 * the inode cannot have any associated watches.
                 */
                if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
                        continue;

      The problem was observed on 2.6.32-431.17.1, but as far as I can see, other kernels have this bug as well.

    People

        • Assignee: WC Triage (wc-triage)
        • Reporter: Sergey Cheremencev (scherementsev)
