Details
Bug
Resolution: Fixed
Major
Upstream
3
Description
We have a script that checks the failover process and crashes the node if failover is delayed. Usually 30 seconds after umount completes is enough to finish all outstanding work and drop the reference count on the RAID device to zero. But high IO combined with two MDTs on one node makes things worse: the delayed work takes a long time, and we have seen it run for 20+ minutes after umount completed.
The umount work is divided into:
1) the umount syscall
2) mntput(), which is delayed and processed by a kworker
3) iputs; LU-15404 moved these to a separate super block work queue, and mntput waits for it
4) fputs, which are delayed and share a global kernel list with the other MDTs; that list is processed by a single kworker
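The effect of step 4 can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: the shared list is modelled as a FIFO drained by one worker at a fixed cost per entry, and the names `drain_shared_list` and `other_mdt` are hypothetical. It shows that the time at which one mount's last deferred fput is processed depends on how much work the other mount has queued on the same list.

```python
from collections import deque


def drain_shared_list(pending, cost_per_fput=1):
    """Drain one shared list with a single worker (toy model).

    pending: mount names, one entry per deferred fput, in queue order.
    Returns {mount: tick at which its last deferred fput was processed}.
    """
    queue = deque(pending)
    finished_at = {}
    tick = 0
    while queue:
        mount = queue.popleft()
        tick += cost_per_fput
        # Overwritten on every entry, so the final value is the tick
        # of the mount's last deferred fput.
        finished_at[mount] = tick
    return finished_at


# Hypothetical workload: 3 deferred fputs for md65 interleaved with
# 5 deferred fputs for another MDT on the same node.
pending = ["md65", "other_mdt", "other_mdt", "md65",
           "other_mdt", "other_mdt", "md65", "other_mdt"]

shared = drain_shared_list(pending)
alone = drain_shared_list([m for m in pending if m == "md65"])

print(shared["md65"], alone["md65"])  # prints "7 3"
```

In this model md65 is finished after 3 ticks when it has the list to itself, but only after 7 ticks when it shares the list, even though its own amount of work is unchanged.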
The md65 mount has 1747359 references and 2387523 files on the fput list.
[353410.384294] Lustre: server umount kjcf04-MDT0000 complete
[353454.532032] sysrq: SysRq : Trigger a crash

[0]: ffffd193c1a1f584
struct mnt_pcp {
  mnt_count = 436042,
  mnt_writers = 0
}
crash> mnt_pcp 0x3a1fd1e1f584:1
[1]: ffffd193c1a5f584
struct mnt_pcp {
  mnt_count = 436156,
  mnt_writers = 0
}
crash> mnt_pcp 0x3a1fd1e1f584:a
[0]: ffffd193c1a1f584
struct mnt_pcp {
  mnt_count = 436042,
  mnt_writers = 0
}

cat mount_count.pcp | tr -d ,|grep mnt_count|awk '{ sum += $3 } END { print sum }'
1747359

crash> file ffff978de3176300
struct file {
  f_u = {
    fu_llist = {
      next = 0xffff9794e4529e00
    },
    fu_rcuhead = {
      next = 0xffff9794e4529e00,
      func = 0x0
    }
  },
  f_path = {
    mnt = 0xffff9793b5200da0,
    dentry = 0xffff97912a254a80
  },
  f_inode = 0xffff979ddd8351c0,
  f_op = 0xffffffffc1c2b640 <ldiskfs_dir_operations>,

list 0xffff9794e4529e00 | wc -l
2387523
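The per-CPU summation done with the awk pipeline above can also be sketched in Python. This is a minimal illustration, not part of the original analysis: the helper name `sum_mnt_count` is hypothetical, and the sample text holds only the two per-CPU entries quoted in this ticket, so its sum is smaller than the 1747359 obtained from the full dump.

```python
import re


def sum_mnt_count(crash_output: str) -> int:
    """Sum every per-CPU mnt_count value found in crash 'mnt_pcp' output.

    Individual per-CPU counters may be negative, so a leading minus
    sign is accepted; only the sum across CPUs is meaningful.
    """
    values = re.findall(r"mnt_count\s*=\s*(-?\d+)", crash_output)
    return sum(int(v) for v in values)


# Only the two per-CPU entries shown in the ticket, not the full dump.
sample = """
[0]: ffffd193c1a1f584
struct mnt_pcp { mnt_count = 436042, mnt_writers = 0 }
[1]: ffffd193c1a5f584
struct mnt_pcp { mnt_count = 436156, mnt_writers = 0 }
"""

print(sum_mnt_count(sample))  # prints 872198
```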
So in this situation the total umount time is unpredictable. Let's fix it.