LU-8355: VFS: Busy inodes after unmount of md0 ... causes kernel panic or at least memory leak


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      If we try to start an MDT on a read-only device, or on a device with the journal disabled, we get the following message:

      VFS: Busy inodes after unmount of md0. Self-destruct in 5 seconds.  Have a nice day... 

      In this case one of the inodes' reference counters has an extra increment, which can later cause a kernel panic or at least a memory leak.
      According to my investigation, this inode is s_buddy_cache (created in ldiskfs_mb_init_backend).
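      For context, s_buddy_cache is an anonymous in-memory inode that mballoc allocates at mount time to cache buddy/bitmap pages; it is only released via iput() in ldiskfs_mb_release() at unmount. A paraphrased sketch of the allocation (modeled on the upstream ext4_mb_init_backend(); details vary between kernel versions):

      /* Paraphrased from ldiskfs_mb_init_backend() (ext4_mb_init_backend()
       * upstream); details vary by kernel version. The buddy cache inode
       * never exists on disk -- it only holds cached buddy/bitmap pages --
       * so a leaked reference on it leaves the superblock with a busy
       * inode at unmount. */
      sbi->s_buddy_cache = new_inode(sb);
      if (sbi->s_buddy_cache == NULL) {
              printk(KERN_ERR "LDISKFS-fs: can't get new inode\n");
              goto err_freesgi;
      }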
      I used a kprobe (adding a handler_pre to __iget) and found that the extra increment comes from fsnotify_unmount_inodes:

      kprobe handler_pre __iget: inode ffff880079a09528 i_ino 21328 i_count 2 i_state 8
      Pid: 12211, comm: mount.lustre Tainted: G        W  ---------------    2.6.32-431.17.1.x1.6.39.x86_64 #1
      Call Trace:
       <#DB>  [<ffffffffa0552071>] ? handler_pre+0x41/0x44 [kprobe_example]
       [<ffffffff8152b455>] ? kprobe_exceptions_notify+0x3d5/0x430
       [<ffffffff8152b6c5>] ? notifier_call_chain+0x55/0x80
       [<ffffffff8152b72a>] ? atomic_notifier_call_chain+0x1a/0x20
       [<ffffffff810a12be>] ? notify_die+0x2e/0x30
       [<ffffffff81528ff5>] ? do_int3+0x35/0xb0
       [<ffffffff815288c3>] ? int3+0x33/0x40
       [<ffffffff811a4f01>] ? __iget+0x1/0x70
       <<EOE>>  [<ffffffff811ccc7f>] ? fsnotify_unmount_inodes+0x10f/0x120
       [<ffffffff811a620b>] ? invalidate_inodes+0x5b/0x190
       [<ffffffff811c5a14>] ? __sync_blockdev+0x24/0x50
       [<ffffffff8118b34c>] ? generic_shutdown_super+0x4c/0xe0
       [<ffffffff8118b411>] ? kill_block_super+0x31/0x50
       [<ffffffffa058f686>] ? ldiskfs_kill_block_super+0x16/0x60 [ldiskfs]
       [<ffffffff8118bbe7>] ? deactivate_super+0x57/0x80
       [<ffffffff811aabef>] ? mntput_no_expire+0xbf/0x110
       [<ffffffffa09f6515>] ? osd_mount+0x6f5/0xcb0 [osd_ldiskfs]
       [<ffffffffa09f93ff>] ? osd_device_alloc+0x4cf/0x970 [osd_ldiskfs]
       [<ffffffffa11fb33f>] ? obd_setup+0x1bf/0x290 [obdclass]
       [<ffffffffa11fb618>] ? class_setup+0x208/0x870 [obdclass]
       [<ffffffffa1203edc>] ? class_process_config+0xc6c/0x1ad0 [obdclass]
       [<ffffffffa10d54d8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa1208dc2>] ? lustre_cfg_new+0x312/0x690 [obdclass]
       [<ffffffffa1209298>] ? do_lcfg+0x158/0x440 [obdclass]
       [<ffffffffa1209614>] ? lustre_start_simple+0x94/0x200 [obdclass]
       [<ffffffffa124503d>] ? server_fill_super+0x97d/0x1a7c [obdclass]
       [<ffffffffa10d54d8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa120f0b0>] ? lustre_fill_super+0x1d0/0x5b0 [obdclass]
       [<ffffffffa120eee0>] ? lustre_fill_super+0x0/0x5b0 [obdclass]
       [<ffffffff8118c2af>] ? get_sb_nodev+0x5f/0xa0
       [<ffffffffa1206dd5>] ? lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8118b90b>] ? vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8118bab2>] ? do_kern_mount+0x52/0x130
       [<ffffffff811aca8b>] ? do_mount+0x2fb/0x930
       [<ffffffff811ad150>] ? sys_mount+0x90/0xe0
       [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
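
      For reference, a minimal sketch of the probe module used above, modeled on the kernel's samples/kprobes/kprobe_example.c (the exact module may differ); the regs->di decoding assumes the x86_64 calling convention, where the first argument of __iget(), the struct inode *, arrives in %rdi:

      #include <linux/kernel.h>
      #include <linux/module.h>
      #include <linux/kprobes.h>
      #include <linux/fs.h>

      /* Sketch of the probe used for this investigation, modeled on
       * samples/kprobes/kprobe_example.c; built against the 2.6.32-era
       * kprobes API. */
      static int handler_pre(struct kprobe *p, struct pt_regs *regs)
      {
              /* x86_64: first argument of __iget() is in %rdi. */
              struct inode *inode = (struct inode *)regs->di;

              printk(KERN_INFO "kprobe handler_pre __iget: inode %p i_ino %lu "
                     "i_count %d i_state %lu\n", inode, inode->i_ino,
                     atomic_read(&inode->i_count), inode->i_state);
              dump_stack();
              return 0;
      }

      static struct kprobe kp = {
              .symbol_name = "__iget",
              .pre_handler = handler_pre,
      };

      static int __init kprobe_init(void)
      {
              return register_kprobe(&kp);
      }

      static void __exit kprobe_exit(void)
      {
              unregister_kprobe(&kp);
      }

      module_init(kprobe_init);
      module_exit(kprobe_exit);
      MODULE_LICENSE("GPL");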

      Reading /proc/slabinfo after unloading ldiskfs may then cause a kernel panic, because ldiskfs_inode_cache was not cleaned up correctly (it still contains busy inodes).
      fsnotify_unmount_inodes() and inotify_unmount_inodes() have a bug in their handling of inodes in the I_NEW state:

                /* In case the dropping of a reference would nuke next_i. */
                if ((&next_i->i_sb_list != list) &&
                    atomic_read(&next_i->i_count) &&
                    !(next_i->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))) {
                        __iget(next_i);
                        need_iput = next_i;
                }

      We do __iget() on next_i, but on the next loop iteration the matching iput() is skipped, because the inode is in the I_NEW state:

                /*
                 * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
                 * I_WILL_FREE, or I_NEW which is fine because by that point
                 * the inode cannot have any associated watches.
                 */
                if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
                        continue;

      The problem was observed on 2.6.32-431.17.1, but as far as I can see, other kernels have this bug as well.

    People

        • Assignee: WC Triage (wc-triage)
        • Reporter: Sergey Cheremencev (scherementsev)
