[LU-588] IO hangs from MMP Created: 11/Aug/11 Updated: 18/Mar/13 Resolved: 18/Mar/13 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.5, Lustre 1.8.9 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jeremy Filizetti | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 5.6 (2.6.18-238.19.1.el5) with one SCSI device handler patch from RHEL 5.7 kernels |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 7261 |
| Description |
|
I've had my mkfs.lustre commands hang from time to time while formatting all of the OSTs on an OSS simultaneously (29-30 OSTs). When the problem occurs, every mke2fs has completed but mkfs.lustre is stuck in the TASK_UNINTERRUPTIBLE state. The system starts reporting hung tasks for mkfs.lustre, the kmmpd kernel threads, and a few other system tasks that are stuck waiting on mutexes because of the MMP issue. I see the following message in the dmesg/syslog.

After adding some printks to kmmpd and forcing a panic, it looks like the buffer_head being used by the kmmpd kthread is zeroed. The problem seems to be that ldiskfs_put_super releases the buffer_head for the superblock before the kmmpd kthread is stopped. Moving the release of the superblock buffer_head to after the MMP thread has stopped appears to have fixed the issue for me.

--- ext4-mmp-rhel5.patch.orig 2011-08-11 12:01:59.000000000 +0000
#include "ext4.h"
|
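The ordering described in the report (stop the MMP thread first, release the superblock buffer second) can be illustrated with a small userspace analogy. This is only a sketch using pthreads, not kernel or ldiskfs code: `fake_sb`, `fake_buffer`, `fake_kmmpd`, `fake_mount`, `fake_put_super`, and `sleep_ms` are all hypothetical names invented for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the superblock buffer that the worker reads. */
struct fake_buffer {
    char data[64];
};

/* Hypothetical stand-in for the state shared between unmount and worker. */
struct fake_sb {
    struct fake_buffer *sbh;   /* plays the role of sb->s_sbh */
    atomic_bool should_stop;   /* plays the role of a stop request */
    atomic_long reads;         /* times the worker touched the buffer */
    pthread_t worker;
};

/* Sleep helper for the demo. */
void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Worker loop: periodically reads the shared buffer until asked to stop. */
static void *fake_kmmpd(void *arg)
{
    struct fake_sb *sb = arg;

    while (!atomic_load(&sb->should_stop)) {
        /* Only safe while sb->sbh is still allocated. */
        if (sb->sbh->data[0] == 'M')
            atomic_fetch_add(&sb->reads, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Allocate the buffer and start the worker. Returns 0 on success. */
int fake_mount(struct fake_sb *sb)
{
    int rc;

    sb->sbh = calloc(1, sizeof(*sb->sbh));
    if (sb->sbh == NULL)
        return -1;
    sb->sbh->data[0] = 'M';
    atomic_init(&sb->should_stop, false);
    atomic_init(&sb->reads, 0);

    rc = pthread_create(&sb->worker, NULL, fake_kmmpd, sb);
    if (rc != 0) {
        free(sb->sbh);
        sb->sbh = NULL;
    }
    return rc;
}

/* Teardown in the safe order: stop and join the worker, then free. */
long fake_put_super(struct fake_sb *sb)
{
    assert(sb->sbh != NULL);

    atomic_store(&sb->should_stop, true);
    pthread_join(sb->worker, NULL);   /* worker can no longer touch sbh */

    free(sb->sbh);                    /* now safe to release the buffer */
    sb->sbh = NULL;
    return atomic_load(&sb->reads);
}
```

Swapping the two halves of `fake_put_super` (free first, join second) would let the worker dereference freed memory, which is the userspace counterpart of the zeroed buffer_head seen in the report.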
| Comments |
| Comment by Andreas Dilger [ 03/Nov/11 ] |
|
Note that the fix as proposed here is not quite correct. By moving brelse(sb->s_sbh) after invalidate_bdev(sb->s_bdev) it will cause the buffer head to be leaked as well as the page on which it is attached. It should be enough to also move invalidate_bdev(sb->s_bdev) after brelse(sb->s_sbh) to avoid this problem. This also needs to be fixed in the rhel6, master, and upstream kernel versions of the MMP code. Note that "kmmpd being stopped" is only cosmetic, and should not cause any incorrect operation or crashes, since at worst the thread is accessing some random memory (as in this case) and it immediately exits without accessing anything else in the on-disk superblock. In the non-race case the thread is explicitly stopped at filesystem unmount time before any of the (few) structures that it is using are freed. |
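The leak concern raised in this comment can be shown with a toy reference-count model. It is a deliberately simplified sketch that assumes invalidation evicts only buffers nobody references; `toy_bh`, `toy_brelse`, `toy_invalidate`, and `toy_teardown` are hypothetical names, and the real brelse() and invalidate_bdev() do considerably more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy buffer: 'refcount' models the reference count and
 * 'cached' models whether the buffer and its page are still held by
 * the block device cache. */
struct toy_bh {
    int refcount;
    bool cached;
};

/* Models brelse(): drop one reference. */
void toy_brelse(struct toy_bh *bh)
{
    assert(bh->refcount > 0);
    bh->refcount--;
}

/* Models invalidate_bdev() under the stated assumption:
 * evict only buffers that nobody references. */
void toy_invalidate(struct toy_bh *bh)
{
    if (bh->refcount == 0)
        bh->cached = false;
}

/* Returns true if the buffer is left behind in the cache after teardown. */
bool toy_teardown(bool release_first)
{
    struct toy_bh bh = { .refcount = 1, .cached = true };

    if (release_first) {
        toy_brelse(&bh);       /* reference dropped ...              */
        toy_invalidate(&bh);   /* ... so the buffer can be evicted   */
    } else {
        toy_invalidate(&bh);   /* skipped: buffer is still referenced */
        toy_brelse(&bh);       /* too late, nothing evicts it now    */
    }
    return bh.cached;
}
```

In this model, `toy_teardown(true)` leaves nothing cached, while `toy_teardown(false)` leaves the buffer behind, which is why both calls need to move together rather than brelse() alone.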
| Comment by Jeremy Filizetti [ 06/Nov/11 ] |
|
It's been a few months and I can't remember the exact sequence of events, but this definitely can cause incorrect operation. The issue wasn't the use of the buffer head; it had to do with the timing of one thread exiting while the other called kthread_stop. When I forced a panic, I found that the kthread_stop_lock mutex was held by a task sitting in wait_for_completion. As a result, none of the kthreads used by MMP (or anything else, for that matter) could exit, since the mutex was held. Another process was holding a second mutex as well, but that's the one I don't remember and need to track down. Basically, nothing could finish because the tasks were deadlocked. I'll see if I can track down the logs or the kernel panic to see exactly what was happening and post it here. |
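The hang described in this comment comes from stopping a thread that may already have exited on its own. One way to make such a stop request safe, sketched here in userspace with pthreads, is for the worker to record its exit on every path and for the stopper to wait on that exit state rather than on an acknowledgement of the stop request. This is an analogy under stated assumptions, not the kernel's kthread implementation; `worker`, `worker_start`, `worker_stop`, and `quit_early` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker handle; none of these names come from the kernel. */
struct worker {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool should_stop;   /* set by the stopper */
    bool exited;        /* set by the worker on every exit path */
    bool quit_early;    /* simulates the worker exiting by itself */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    /* Sleep until asked to stop, unless the worker gives up on its own. */
    while (!w->quit_early && !w->should_stop)
        pthread_cond_wait(&w->changed, &w->lock);
    w->exited = true;                  /* recorded whatever the reason */
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Start the worker. Returns 0 on success. */
int worker_start(struct worker *w, bool quit_early)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->changed, NULL);
    w->should_stop = false;
    w->exited = false;
    w->quit_early = quit_early;
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Stop the worker. Returns true if it had already exited beforehand. */
bool worker_stop(struct worker *w)
{
    bool already_exited;

    pthread_mutex_lock(&w->lock);
    already_exited = w->exited;
    w->should_stop = true;
    pthread_cond_broadcast(&w->changed);
    while (!w->exited)                 /* wait on exit state, not an ack */
        pthread_cond_wait(&w->changed, &w->lock);
    pthread_mutex_unlock(&w->lock);

    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->changed);
    pthread_mutex_destroy(&w->lock);
    return already_exited;
}
```

Because `exited` is set on every exit path, `worker_stop` returns whether the worker was stopped by request or had already quit, so the stopper never blocks forever while holding a lock that other threads need.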
| Comment by Jeremy Filizetti [ 22/Jun/12 ] |
|
Here are the stack traces from the hung processes for this issue. [1229969.884772] LDISKFS-fs warning (device dm-6): kmmpd: kmmpd being stopped since MMP feature has been disabled. |
| Comment by Jeremy Filizetti [ 22/Jun/12 ] |
|
Submitted a corrected patch for b1_8 under: |
| Comment by Jeremy Filizetti [ 30/Jul/12 ] |
|
It's been a while since I've had a chance to revisit this, but I figured I'd provide an update. From the looks of things, all versions of ext4/ldiskfs will need this fix to prevent the possible buffer_head reuse, although I think RHEL5 is the only one affected by the hang caused by the kthread_stop issue. The upstream kernel has commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5, which I believe prevents the problem caused by calling kthread_stop on a kthread that has already returned. RHEL 6.2+ has that commit, but I'm not sure about SLES. |
| Comment by James A Simmons [ 30/Jul/12 ] |
|
For SLES 2.6.32+ kernels, yes, that upstream commit is there. |
| Comment by Peter Jones [ 22/Feb/13 ] |
|
Does an equivalent patch need to land on b2_1? |
| Comment by Jian Yu [ 12/Mar/13 ] |
Yes, Peter. Oleg, could you please cherry-pick the patch from http://review.whamcloud.com/3172 to the Lustre b2_1 branch? Thanks. |
| Comment by Jian Yu [ 17/Mar/13 ] |
Since we cannot cherry-pick the patch from the Lustre b1_8 branch to the b2_1 branch, I ported the patch to b2_1 in http://review.whamcloud.com/5745. |
| Comment by Jian Yu [ 18/Mar/13 ] |
|
Patches have landed on the Lustre b1_8 and b2_1 branches. |