Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
RHEL 5.6 (2.6.18-238.19.1.el5) with one SCSI device handler patch from RHEL 5.7 kernels
-
3
-
7261
Description
I've had my mkfs.lustre commands hang from time to time while formatting all of our OSTs on an OSS simultaneously (29-30 OSTs). The problem shows up with all of the mke2fs completed but the mkfs.lustre being stuck in a TASK_UNINTERRUPTIBLE state. The system starts reporting hung tasks for mkfs.lustre, kmmpd kernel threads, and a few other system resources that are stuck waiting on mutexs from the MMP issue.
I see the following message in the dmesg/syslog.
Aug 4 10:56:12 s01ns030 kernel: LDISKFS-fs warning (device dm-78): kmmpd: kmmpd being stopped since MMP feature has been disabled.
Aug 4 10:56:16 s01ns030 kernel: LDISKFS-fs warning (device dm-70): kmmpd: kmmpd being stopped since MMP feature has been disabled.
Aug 4 10:56:17 s01ns030 kernel: LDISKFS-fs warning (device dm-66): kmmpd: kmmpd being stopped since MMP feature has been disabled.
After adding some printks to kmmpd and forcing a panic, it looks like the issue is that the buffer_head being used by the kmmpd kthread is zeroed. The problem seems to be in ldiskfs_put_super that the buffer_head for the super block is being released prior to the kmmpd kthread being stopped.
Moving the release of the super block buffer head to after the MMP thread has stopped appears to have fixed the issue for me.
— ext4-mmp-rhel5.patch.orig 2011-08-11 12:01:59.000000000 +0000
+++ ext4-mmp-rhel5.patch 2011-08-11 12:06:42.000000000 +0000
@@ -522,12 +522,21 @@
#include "ext4.h"
#include "ext4_jbd2.h"
-@@ -698,6 +700,8 @@ static void ext4_put_super(struct super_
+@@ -682,7 +682,6 @@
+ percpu_counter_destroy(&sbi->s_freeinodes_counter);
+ percpu_counter_destroy(&sbi->s_dirs_counter);
+ percpu_counter_destroy(&sbi->s_dirtyblocks_counter);
+- brelse(sbi->s_sbh);
+ #ifdef CONFIG_QUOTA
+ for (i = 0; i < MAXQUOTAS; i++)
+ kfree(sbi->s_qf_names[i]);
+@@ -698,6 +700,9 @@ static void ext4_put_super(struct super_
invalidate_bdev(sbi->journal_bdev, 0);
ext4_blkdev_remove(sbi);
}
+ if (sbi->s_mmp_tsk)
+ kthread_stop(sbi->s_mmp_tsk);
++ brelse(sbi->s_sbh);
sb->s_fs_info = NULL;
/*
- Now that we are completely done shutting down the
Attachments
- ldiskfs_mmp.patch
- 0.8 kB
Issue Links
- Trackbacks
-
Lustre 1.8.x known issues tracker While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA
Activity
Oleg, could you please cherry-pick the patch of http://review.whamcloud.com/3172 to Lustre b2_1 branch?
Since we can not cherry pick the patch from Lustre b1_8 branch to b2_1 branch, I ported the patch for b2_1 in http://review.whamcloud.com/5745.
Does an equivalent patch need to land on b2_1?
Yes, Peter.
Oleg, could you please cherry-pick the patch of http://review.whamcloud.com/3172 to Lustre b2_1 branch? Thanks.
Its been a while since I've had a chance to revisit this but I figured I'd provide an update.
From the looks of things all versions ext4/ldiskfs will need this to be fixed to prevent the possible buffer_head reuse although RHEL5 is the only one I think affected by the hang caused from the kthread_stop issue.
The upstream kernel has commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5 which I believe prevents the issue from calling kthread_stop on a kthread that has already returned. RHEL 6.2+ has that commit but I'm not sure about SLES.
Submitted a corrected patch for b1_8 under:
http://review.whamcloud.com/3172
Here is the stack traces from the hung processes on this.
[1229969.884772] LDISKFS-fs warning (device dm-6): kmmpd: kmmpd being stopped since MMP feature has been disabled.
[1229971.500258] JBD: barrier-based sync failed on dm-14-8 - disabling barriers
[1229974.181427] JBD: barrier-based sync failed on dm-18-8 - disabling barriers
[1229975.035900] LDISKFS-fs warning (device dm-33): kmmpd: kmmpd being stopped since MMP feature has been disabled.
[1229981.135807] LDISKFS-fs warning (device dm-14): kmmpd: kmmpd being stopped since MMP feature has been disabled.
[1230108.857001] INFO: task mkfs.lustre:459 blocked for more than 120 seconds.
[1230108.864089] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230108.872246] mkfs.lustre D ffffffff80155449 0 459 457 467 (NOTLB)
[1230108.881067] ffff810b956abd98 0000000000000082 ffff81182946cac0 0000000000000009
[1230108.888897] ffff810c3fae6790 0000000000000009 ffff810c3fae6790 ffff81183fd7a100
[1230108.896708] 00045ea8370c6f40 0000000000000ce5 ffff810c3fae6978 000000138008e133
[1230108.904580] Call Trace:
[1230108.907533] [<ffffffff80063171>] wait_for_completion+0x79/0xa2
[1230108.913833] [<ffffffff8008e984>] default_wake_function+0x0/0xe
[1230108.920118] [<ffffffff800a2ca3>] kthread_stop+0x4a/0x80
[1230108.925779] [<ffffffff88559207>] :ldiskfs:ldiskfs_put_super+0x2a7/0x330
[1230108.932878] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230108.939517] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230108.945503] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230108.951532] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230108.957141] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230108.963570] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230108.968813]
[1230108.970541] INFO: task mkfs.lustre:467 blocked for more than 120 seconds.
[1230108.977778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230108.985979] mkfs.lustre D ffffffff80155449 0 467 457 471 459 (NOTLB)
[1230108.994880] ffff810c38fd3d88 0000000000000086 ffff810113313fb8 000000000000000e
[1230109.002690] ffff8118292d5800 0000000000000009 ffff810c21ea0040 ffff810c6a24d080
[1230109.010580] 00045ea85c233342 000000000019d9fd ffff810c21ea0228 0000000138fd3d58
[1230109.018219] Call Trace:
[1230109.021123] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230109.027518] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230109.033371] [<ffffffff800a2c6e>] kthread_stop+0x15/0x80
[1230109.039078] [<ffffffff88559207>] :ldiskfs:ldiskfs_put_super+0x2a7/0x330
[1230109.046143] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230109.052645] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230109.058641] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230109.064702] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230109.070390] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230109.076814] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230109.082158]
[1230109.083886] INFO: task mkfs.lustre:471 blocked for more than 120 seconds.
[1230109.090993] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230109.099178] mkfs.lustre D ffffffff80155449 0 471 457 479 467 (NOTLB)
[1230109.108035] ffff810c02a49d68 0000000000000082 ffff810c02a49d28 ffff8112766450a0
[1230109.115923] ffff8113f0ae8180 0000000000000009 ffff810c1e9fc040 ffff81183fddd040
[1230109.123822] 00045ea8370d7659 0000000000007fd0 ffff810c1e9fc228 0000001402a49d48
[1230109.131487] Call Trace:
[1230109.134521] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230109.141044] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230109.146940] [<ffffffff80155449>] __next_cpu+0x19/0x28
[1230109.152451] [<ffffffff800a2c6e>] kthread_stop+0x15/0x80
[1230109.158100] [<ffffffff800a00f2>] destroy_workqueue+0x4a/0x8a
[1230109.164266] [<ffffffff88558f92>] :ldiskfs:ldiskfs_put_super+0x32/0x330
[1230109.171244] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230109.177798] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230109.183784] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230109.189683] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230109.195282] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230109.201694] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230109.206956]
[1230109.208651] INFO: task mkfs.lustre:479 blocked for more than 120 seconds.
[1230109.215801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230109.223881] mkfs.lustre D ffffffff80155449 0 479 457 1337 471 (NOTLB)
[1230109.232687] ffff810c29e7fd68 0000000000000086 ffff810c29e7fcf8 ffff810bf9043940
[1230109.240471] 00000000ffffffff 0000000000000009 ffff810c1bd90810 ffff810c6a2ed100
[1230109.248351] 00045ea8370db3bb 00000000000007bc ffff810c1bd909f8 0000000700000001
[1230109.255985] Call Trace:
[1230109.258940] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230109.265358] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230109.271219] [<ffffffff80063af9>] mutex_lock+0xd/0x1d
[1230109.276687] [<ffffffff800a0015>] flush_workqueue+0x3f/0x87
[1230109.282602] [<ffffffff8855840e>] :ldiskfs:ldiskfs_sync_fs+0x3e/0x70
[1230109.289220] [<ffffffff80063af9>] mutex_lock+0xd/0x1d
[1230109.294569] [<ffffffff800e417e>] __fsync_super+0x81/0x9e
[1230109.300242] [<ffffffff800e42e5>] fsync_super+0x9/0x16
[1230109.305630] [<ffffffff800e6d0d>] generic_shutdown_super+0x23/0xfb
[1230109.312093] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230109.318080] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230109.324117] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230109.329848] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230109.336318] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230109.341520]
[1230109.343198] INFO: task mkfs.lustre:1485 blocked for more than 120 seconds.
[1230109.350626] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230109.358766] mkfs.lustre D ffffffff80155449 0 1485 457 1488 1408 (NOTLB)
[1230109.367651] ffff810717147d88 0000000000000086 ffff810c5a9adda0 000000000000000e
[1230109.375520] ffff811833b60c00 0000000000000007 ffff81055123e040 ffff810c4017f040
[1230109.383453] 00045ea762a45f73 00000000001394d9 ffff81055123e228 0000000417147d58
[1230109.391240] Call Trace:
[1230109.394210] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230109.400713] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230109.406565] [<ffffffff800a2c6e>] kthread_stop+0x15/0x80
[1230109.412220] [<ffffffff88559207>] :ldiskfs:ldiskfs_put_super+0x2a7/0x330
[1230109.419329] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230109.425872] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230109.431824] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230109.437722] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230109.443365] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230109.449802] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230109.455084]
[1230229.224627] INFO: task pdflush:31135 blocked for more than 120 seconds.
[1230229.231650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230229.240099] pdflush D ffffffff80155449 0 31135 875 941 28677 (L-TLB)
[1230229.249156] ffff8100b1867da0 0000000000000046 ffff810c6a1ffe80 0000000000000000
[1230229.257129] ffffffff804db6f0 000000000000000a ffff810dfd5197d0 ffff810c400b2080
[1230229.265208] 00045ebce0d890b9 00000000000014ea ffff810dfd5199b8 0000000500000000
[1230229.273168] Call Trace:
[1230229.276152] [<ffffffff800a2d3a>] keventd_create_kthread+0x0/0xc4
[1230229.282723] [<ffffffff8006468c>] __down_read+0x7a/0x92
[1230229.288457] [<ffffffff800e7164>] sync_supers+0x42/0xe1
[1230229.294131] [<ffffffff800562b7>] pdflush+0x0/0x1fb
[1230229.299503] [<ffffffff800cc415>] wb_kupdate+0x3e/0x169
[1230229.305329] [<ffffffff800562b7>] pdflush+0x0/0x1fb
[1230229.310661] [<ffffffff80056408>] pdflush+0x151/0x1fb
[1230229.316173] [<ffffffff800cc3d7>] wb_kupdate+0x0/0x169
[1230229.321743] [<ffffffff8003275c>] kthread+0xfe/0x132
[1230229.327152] [<ffffffff8005dfb1>] child_rip+0xa/0x11
[1230229.332531] [<ffffffff800a2d3a>] keventd_create_kthread+0x0/0xc4
[1230229.339132] [<ffffffff8003265e>] kthread+0x0/0x132
[1230229.344416] [<ffffffff8005dfa7>] child_rip+0x0/0x11
[1230229.349937]
[1230229.351698] INFO: task mkfs.lustre:459 blocked for more than 120 seconds.
[1230229.358998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230229.367370] mkfs.lustre D ffffffff80155449 0 459 457 467 (NOTLB)
[1230229.376443] ffff810b956abd98 0000000000000082 ffff81182946cac0 0000000000000009
[1230229.384348] ffff810c3fae6790 0000000000000009 ffff810c3fae6790 ffff81183fd7a100
[1230229.392334] 00045ea8370c6f40 0000000000000ce5 ffff810c3fae6978 000000138008e133
[1230229.399999] Call Trace:
[1230229.403134] [<ffffffff80063171>] wait_for_completion+0x79/0xa2
[1230229.409476] [<ffffffff8008e984>] default_wake_function+0x0/0xe
[1230229.415889] [<ffffffff800a2ca3>] kthread_stop+0x4a/0x80
[1230229.421718] [<ffffffff88559207>] :ldiskfs:ldiskfs_put_super+0x2a7/0x330
[1230229.428954] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230229.435671] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230229.441839] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230229.448032] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230229.453814] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230229.460456] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230229.465876]
[1230229.467691] INFO: task mkfs.lustre:467 blocked for more than 120 seconds.
[1230229.474971] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230229.483399] mkfs.lustre D ffffffff80155449 0 467 457 471 459 (NOTLB)
[1230229.492383] ffff810c38fd3d88 0000000000000086 ffff810113313fb8 000000000000000e
[1230229.500300] ffff8118292d5800 0000000000000009 ffff810c21ea0040 ffff810c6a24d080
[1230229.508175] 00045ea85c233342 000000000019d9fd ffff810c21ea0228 0000000138fd3d58
[1230229.514516] Call Trace:
[1230229.519002] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230229.519013] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230229.531571] [<ffffffff800a2c6e>] kthread_stop+0x15/0x80
[1230229.531593] [<ffffffff88559207>] :ldiskfs:ldiskfs_put_super+0x2a7/0x330
[1230229.544548] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230229.544556] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230229.557262] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230229.557269] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230229.569158] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230229.569177] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230229.581039]
[1230229.581042] INFO: task mkfs.lustre:471 blocked for more than 120 seconds.
[1230229.590015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230229.590018] mkfs.lustre D ffffffff80155449 0 471 457 479 467 (NOTLB)
[1230229.607341] ffff810c02a49d68 0000000000000082 ffff810c02a49d28 ffff8112766450a0
[1230229.615049] ffff8113f0ae8180 0000000000000009 ffff810c1e9fc040 ffff81183fddd040
[1230229.623215] 00045ea8370d7659 0000000000007fd0 ffff810c1e9fc228 0000001402a49d48
[1230229.630931] Call Trace:
[1230229.630957] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230229.640396] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230229.640402] [<ffffffff80155449>] __next_cpu+0x19/0x28
[1230229.651614] [<ffffffff800a2c6e>] kthread_stop+0x15/0x80
[1230229.651622] [<ffffffff800a00f2>] destroy_workqueue+0x4a/0x8a
[1230229.663150] [<ffffffff88558f92>] :ldiskfs:ldiskfs_put_super+0x32/0x330
[1230229.663161] [<ffffffff800e6d63>] generic_shutdown_super+0x79/0xfb
[1230229.676760] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230229.682927] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230229.689000] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230229.694727] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230229.701251] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230229.706704]
[1230229.708422] INFO: task mkfs.lustre:479 blocked for more than 120 seconds.
[1230229.715677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1230229.724001] mkfs.lustre D ffffffff80155449 0 479 457 1337 471 (NOTLB)
[1230229.732961] ffff810c29e7fd68 0000000000000086 ffff810c29e7fcf8 ffff810bf9043940
[1230229.740909] 00000000ffffffff 0000000000000009 ffff810c1bd90810 ffff810c6a2ed100
[1230229.748805] 00045ea8370db3bb 00000000000007bc ffff810c1bd909f8 0000000700000001
[1230229.756371] Call Trace:
[1230229.759252] [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
[1230229.759264] [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
[1230229.771357] [<ffffffff80063af9>] mutex_lock+0xd/0x1d
[1230229.771363] [<ffffffff800a0015>] flush_workqueue+0x3f/0x87
[1230229.771389] [<ffffffff8855840e>] :ldiskfs:ldiskfs_sync_fs+0x3e/0x70
[1230229.771394] [<ffffffff80063af9>] mutex_lock+0xd/0x1d
[1230229.771401] [<ffffffff800e417e>] __fsync_super+0x81/0x9e
[1230229.771408] [<ffffffff800e42e5>] fsync_super+0x9/0x16
[1230229.771413] [<ffffffff800e6d0d>] generic_shutdown_super+0x23/0xfb
[1230229.771421] [<ffffffff800e6e16>] kill_block_super+0x31/0x45
[1230229.771428] [<ffffffff800e6ee4>] deactivate_super+0x6a/0x82
[1230229.771434] [<ffffffff800f1061>] sys_umount+0x245/0x27b
[1230229.771445] [<ffffffff800b9db2>] audit_syscall_entry+0x1a8/0x1d3
[1230229.771463] [<ffffffff8005d28d>] tracesys+0xd5/0xe0
[1230229.771478]
[1284338.209572] audit(1340251325.870:2521): audit_pid=0 old=13012 by auid=0
[1284338.437997] type=1305 audit(1340251326.099:2522): audit_pid=9042 old=0 by auid=0
Its been a few months and I can't remember the exact situation of events but this definitely can cause incorrect operation. The issue wasn't the use of the buffer head it had to do with the timing of one thread exiting and the other calling kthread_stop. When I forced a panic I found that the kthread_stop_lock mutex was held and in wait_for_completion. As a result none of the kthreads used by mmp (or anything else for that matter) could exit since the mutex was held. Another process was holding another mutex as well but that's the one I don't remember and need to track down. Basically nothing could finish because they were deadlocking. I'll see if I can track down the logs or the kernel panic to see what exactly was happening and post it here.
Note that the fix as proposed here is not quite correct. By moving brelse(sb->s_sbh) after invalidate_bdev(sb->s_bdev) it will cause the buffer head to be leaked as well as the page on which it is attached. It should be enough to also move invalidate_bdev(sb->s_bdev) after brelse(sb->s_sbh) to avoid this problem.
This also needs to be fixed in the rhel6, master, and upstream kernel versions of the MMP code.
Note that "kmmpd being stopped" is only cosmetic, and should not cause any incorrect operation or crashes, since at worst the thread is accessing some random memory (as in this case) and it immediately exits without accessing anything else in the on-disk superblock. In the non-race case the thread is explicitly stopped at filesystem unmount time before any of the (few) structures that it is using are freed.
Patches were landed to Lustre b1_8 and b2_1 branches.