[LU-4474] deadlock of ldiskfs_quota_off() Created: 11/Jan/14  Updated: 29/Sep/16  Resolved: 29/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Li Xi (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: ldiskfs

Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 12256

 Description   

When I ran quotacheck directly on ldiskfs, the process got stuck. The stack dump is below. The ext4_quota_off() function added by ext4-quota-force-block-alloc-quotaoff.patch tried to take the read lock of sb->s_umount, but it blocked forever because the same thread was already holding the write lock of sb->s_umount.
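
For reference, here is a minimal sketch of the deadlocking pattern (the function body matches the one removed by the upstream fix quoted later in this ticket):

static int ldiskfs_quota_off(struct super_block *sb, int type)
{
	/* Force all delayed allocation blocks to be allocated */
	if (test_opt(sb, DELALLOC)) {
		/* The remount path arrives here with sb->s_umount already
		 * held for write; rwsems are not recursive, so this
		 * down_read() blocks forever. */
		down_read(&sb->s_umount);
		sync_filesystem(sb);
		up_read(&sb->s_umount);
	}

	return dquot_quota_off(sb, type);
}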

COMMAND: "quotacheck"
TASK: ffff88010286aae0 [THREAD_INFO: ffff880102610000]
CPU: 0
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 21560 TASK: ffff88010286aae0 CPU: 0 COMMAND: "quotacheck"
#0 [ffff880102611b60] schedule at ffffffff8150c432
#1 [ffff880102611c28] rwsem_down_failed_common at ffffffff8150e9e5
#2 [ffff880102611c88] rwsem_down_read_failed at ffffffff8150eb76
#3 [ffff880102611cc8] call_rwsem_down_read_failed at ffffffff81281b94
#4 [ffff880102611d30] ldiskfs_quota_off at ffffffffa02fe115 [ldiskfs]
#5 [ffff880102611d90] ldiskfs_remount at ffffffffa02fd41d [ldiskfs]
#6 [ffff880102611e60] do_remount_sb at ffffffff811825bc
#7 [ffff880102611ea0] do_mount at ffffffff811a3915
#8 [ffff880102611f20] sys_mount at ffffffff811a3c10
#9 [ffff880102611f80] system_call_fastpath at ffffffff8100b072
RIP: 00007f97e757ee0a RSP: 00007fffe66e9388 RFLAGS: 00010297
RAX: 00000000000000a5 RBX: ffffffff8100b072 RCX: 0000000000000001
RDX: 00007f97e9f02940 RSI: 00007f97e9f01740 RDI: 0000000000000000
RBP: 00007f97e822b4a0 R8: 0000000000000000 R9: 0000000000000000
R10: 00000000c0ed0021 R11: 0000000000000206 R12: 00007f97e7ea7047
R13: 00007f97e7ea5c00 R14: 0000000000000001 R15: 00007f97e822b4a0
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
crash>

Following is how to reproduce it. Please note that the version of quota-tools is 4.0.1; for some reason, quota-tools version 3.17 refuses to run quotacheck on an ldiskfs mount point.

[root@vm1 quota-tools]# mount -t ldiskfs /dev/sdb3 -o usrquota /mnt/
[root@vm1 quota-tools]# ./quotacheck /mnt/ -f
.



 Comments   
Comment by Li Xi (Inactive) [ 11/Jan/14 ]

I am not sure, but maybe this problem could be fixed by removing the down_read(&sb->s_umount)? It seems that the write lock of sb->s_umount is always held when ext4_quota_off() is called. Please correct me if I am wrong.

Comment by Peter Jones [ 11/Jan/14 ]

Niu,

Could you please comment?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 13/Jan/14 ]

Hi, LiXi

What kernel version did you use? I don't see where ldiskfs_remount() calls ldiskfs_quota_off() in my kernel (2.6.32-279).

BTW: Why do you need a quotacheck on ldiskfs? Isn't it obsolete?

Comment by Li Xi (Inactive) [ 13/Jan/14 ]

Hi Yawei,

I am using linux-2.6.32-358.18.1.el6.x86_64. vfs_dq_off() is called from ext4_remount():

static int ext4_remount(struct super_block *sb, int *flags, char *data)
{
	...
	if ((*flags & MS_RDONLY) != (sb->s_flags & MS_RDONLY) ||
	    n_blocks_count > ext4_blocks_count(es)) {
		if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED) {
			err = -EROFS;
			goto restore_opts;
		}

		if (*flags & MS_RDONLY) {
			err = vfs_dq_off(sb, 1);
			if (err < 0 && err != -ENOSYS) {
				err = -EBUSY;
				goto restore_opts;
			}

			/*
			 * First of all, the unconditional stuff we have to do
			 * to disable replay of the journal when we next remount
			 */
			sb->s_flags |= MS_RDONLY;
			...

Yeah, I understand this is not a common operation for Lustre. I thought we could use ldiskfs as a normal file system, so I ran the quota utility directly on ldiskfs. If that is not appropriate, do you have any suggestions on how to exercise the quota mechanism of ldiskfs? As you know, I am working on quota-related changes, and the quota code of ldiskfs has to be updated to support them. Thanks!

Comment by Niu Yawei (Inactive) [ 13/Jan/14 ]

Quota accounting is always enabled on ldiskfs (see ext4-quota-first-class.patch), so quotacheck is no longer needed. (If the filesystem was upgraded from an old ldiskfs that lacks the EXT4_FEATURE_RO_COMPAT_QUOTA feature, the quota accounting will be updated automatically when you enable the feature with 'tune2fs -O quota'.)

Comment by Li Xi (Inactive) [ 13/Jan/14 ]

Ah, thank you! Do you think this problem could be triggered by any Lustre operation? I think it would be better to fix this problem anyway, if it is not too difficult.

Comment by Niu Yawei (Inactive) [ 13/Jan/14 ]

I think it could probably be triggered in Lustre, but then I don't see why it was never reported before.

It looks like it can't be fixed simply by removing the down_read(&sb->s_umount), because I don't see s_umount being held in the quotactl path (do_quotactl()). Could you search to see whether it has already been fixed upstream?
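
To spell out the two call paths (a sketch, based on the analysis above):

/*
 * Two callers of ldiskfs_quota_off() in this kernel:
 *
 *   do_remount_sb()                     do_quotactl() (Q_QUOTAOFF)
 *     s_umount held for write             s_umount NOT held
 *     -> ldiskfs_remount()                -> sb->s_qcop->quota_off()
 *        -> ldiskfs_quota_off()              -> ldiskfs_quota_off()
 *           down_read() deadlocks             down_read() still needed
 *
 * Removing the down_read() unconditionally would therefore leave the
 * quotactl path calling sync_filesystem() without s_umount held.
 */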

Comment by Niu Yawei (Inactive) [ 14/Jan/14 ]

It looks like it has been fixed upstream:

It must be already acquired by caller. Fix lock_dep complain.

Signed-off-by: Dmitry Monakhov <dmon...@gmail.com>
Reviewed-by: Jan Kara <ja...@suse.cz>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e59eb37..062d1bc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4107,12 +4107,10 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,

 static int ext4_quota_off(struct super_block *sb, int type)
 {
-	/* Force all delayed allocation blocks to be allocated */
-	if (test_opt(sb, DELALLOC)) {
-		down_read(&sb->s_umount);
+	/* Force all delayed allocation blocks to be allocated.
+	 * Caller already holds s_umount sem */
+	if (test_opt(sb, DELALLOC))
 		sync_filesystem(sb);
-		up_read(&sb->s_umount);
-	}

 	return dquot_quota_off(sb, type);
 }

I'll backport it to Lustre, thank you, LiXi.

Comment by Li Xi (Inactive) [ 14/Jan/14 ]

Ah, Yawei, thank you so much for investigating this!

Comment by Niu Yawei (Inactive) [ 14/Jan/14 ]

http://review.whamcloud.com/8828

Comment by Li Xi (Inactive) [ 17/Jan/14 ]

Hi Yawei,

I found the following messages when running mkfs.lustre on a server with your patch applied. It seems sync_filesystem() complained: WARN_ON(!rwsem_is_locked(&sb->s_umount));

Do you think this is related to the patch?

Thanks!

Jan 16 13:36:53 vm1 kernel: LDISKFS-fs (sdb3): mounted filesystem with ordered data mode. quota=on. Opts:
Jan 16 13:36:53 vm1 kernel: -----------[ cut here ]-----------
Jan 16 13:36:53 vm1 kernel: WARNING: at fs/sync.c:64 sync_filesystem+0x63/0x70() (Tainted: G W --------------- )
Jan 16 13:36:53 vm1 kernel: Hardware name: KVM
Jan 16 13:36:53 vm1 kernel: Modules linked in: ldiskfs sha512_generic sha256_generic bridge stp llc nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core sg ext4 jbd2 mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Jan 16 13:36:53 vm1 kernel: Pid: 19717, comm: mkfs.lustre Tainted: G W --------------- 2.6.32 #4
Jan 16 13:36:53 vm1 kernel: Call Trace:
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e43a>] ? warn_slowpath_null+0x1a/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b1893>] ? sync_filesystem+0x63/0x70
Jan 16 13:36:53 vm1 kernel: [<ffffffffa02e7048>] ? ldiskfs_quota_off+0x218/0x220 [ldiskfs]
Jan 16 13:36:53 vm1 kernel: [<ffffffff81183496>] ? deactivate_super+0x46/0x80
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a17ef>] ? mntput_no_expire+0xbf/0x110
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a225b>] ? sys_umount+0x7b/0x3a0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan 16 13:36:53 vm1 kernel: --[ end trace 7cd192156324e646 ]--
Jan 16 13:36:53 vm1 kernel: -----------[ cut here ]-----------
Jan 16 13:36:53 vm1 kernel: WARNING: at fs/fs-writeback.c:1051 writeback_inodes_sb_nr+0x94/0xa0() (Tainted: G W --------------- )
Jan 16 13:36:53 vm1 kernel: Hardware name: KVM
Jan 16 13:36:53 vm1 kernel: Modules linked in: ldiskfs sha512_generic sha256_generic bridge stp llc nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core sg ext4 jbd2 mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Jan 16 13:36:53 vm1 kernel: Pid: 19717, comm: mkfs.lustre Tainted: G W --------------- 2.6.32 #4
Jan 16 13:36:53 vm1 kernel: Call Trace:
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e43a>] ? warn_slowpath_null+0x1a/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff811ab704>] ? writeback_inodes_sb_nr+0x94/0xa0
Jan 16 13:36:53 vm1 kernel: [<ffffffff811ab753>] ? writeback_inodes_sb+0x43/0x50
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b162e>] ? __sync_filesystem+0x3e/0x90
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b186a>] ? sync_filesystem+0x3a/0x70
Jan 16 13:36:53 vm1 kernel: [<ffffffffa02e7048>] ? ldiskfs_quota_off+0x218/0x220 [ldiskfs]
Jan 16 13:36:53 vm1 kernel: [<ffffffff81183496>] ? deactivate_super+0x46/0x80
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a17ef>] ? mntput_no_expire+0xbf/0x110
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a225b>] ? sys_umount+0x7b/0x3a0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan 16 13:36:53 vm1 kernel: --[ end trace 7cd192156324e647 ]--
Jan 16 13:36:53 vm1 kernel: -----------[ cut here ]-----------
Jan 16 13:36:53 vm1 kernel: WARNING: at fs/fs-writeback.c:548 wb_writeback+0x339/0x3f0() (Tainted: G W --------------- )
Jan 16 13:36:53 vm1 kernel: Hardware name: KVM
Jan 16 13:36:53 vm1 kernel: Modules linked in: ldiskfs sha512_generic sha256_generic bridge stp llc nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core sg ext4 jbd2 mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Jan 16 13:36:53 vm1 kernel: Pid: 18643, comm: flush-8:16 Tainted: G W --------------- 2.6.32 #4
Jan 16 13:36:53 vm1 kernel: Call Trace:
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e43a>] ? warn_slowpath_null+0x1a/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff811aca49>] ? wb_writeback+0x339/0x3f0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8150c480>] ? thread_return+0x4e/0x76e
Jan 16 13:36:53 vm1 kernel: [<ffffffff81081be2>] ? del_timer_sync+0x22/0x30
Jan 16 13:36:53 vm1 kernel: [<ffffffff811acbbb>] ? wb_do_writeback+0xbb/0x240
Jan 16 13:36:53 vm1 kernel: [<ffffffff811acda3>] ? bdi_writeback_task+0x63/0x1b0
Jan 16 13:36:53 vm1 kernel: [<ffffffff81096c67>] ? bit_waitqueue+0x17/0xd0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8113c1d0>] ? bdi_start_fn+0x0/0x100
Jan 16 13:36:53 vm1 kernel: [<ffffffff8113c256>] ? bdi_start_fn+0x86/0x100
Jan 16 13:36:53 vm1 kernel: [<ffffffff8113c1d0>] ? bdi_start_fn+0x0/0x100
Jan 16 13:36:53 vm1 kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Jan 16 13:36:53 vm1 kernel: --[ end trace 7cd192156324e648 ]--
Jan 16 13:36:53 vm1 kernel: -----------[ cut here ]-----------
Jan 16 13:36:53 vm1 kernel: WARNING: at fs/fs-writeback.c:1131 sync_inodes_sb+0x169/0x190() (Tainted: G W --------------- )
Jan 16 13:36:53 vm1 kernel: Hardware name: KVM
Jan 16 13:36:53 vm1 kernel: Modules linked in: ldiskfs sha512_generic sha256_generic bridge stp llc nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core sg ext4 jbd2 mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Jan 16 13:36:53 vm1 kernel: Pid: 19717, comm: mkfs.lustre Tainted: G W --------------- 2.6.32 #4
Jan 16 13:36:53 vm1 kernel: Call Trace:
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e43a>] ? warn_slowpath_null+0x1a/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff811ab8c9>] ? sync_inodes_sb+0x169/0x190
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b1672>] ? __sync_filesystem+0x82/0x90
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b187b>] ? sync_filesystem+0x4b/0x70
Jan 16 13:36:53 vm1 kernel: [<ffffffffa02e7048>] ? ldiskfs_quota_off+0x218/0x220 [ldiskfs]
Jan 16 13:36:53 vm1 kernel: [<ffffffff81183496>] ? deactivate_super+0x46/0x80
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a17ef>] ? mntput_no_expire+0xbf/0x110
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a225b>] ? sys_umount+0x7b/0x3a0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan 16 13:36:53 vm1 kernel: --[ end trace 7cd192156324e649 ]--
Jan 16 13:36:53 vm1 kernel: -----------[ cut here ]-----------
Jan 16 13:36:53 vm1 kernel: WARNING: at fs/fs-writeback.c:548 wb_writeback+0x339/0x3f0() (Tainted: G W --------------- )
Jan 16 13:36:53 vm1 kernel: Hardware name: KVM
Jan 16 13:36:53 vm1 kernel: Modules linked in: ldiskfs sha512_generic sha256_generic bridge stp llc nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core sg ext4 jbd2 mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Jan 16 13:36:53 vm1 kernel: Pid: 18643, comm: flush-8:16 Tainted: G W --------------- 2.6.32 #4
Jan 16 13:36:53 vm1 kernel: Call Trace:
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e43a>] ? warn_slowpath_null+0x1a/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff811aca49>] ? wb_writeback+0x339/0x3f0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8150c480>] ? thread_return+0x4e/0x76e
Jan 16 13:36:53 vm1 kernel: [<ffffffff81081be2>] ? del_timer_sync+0x22/0x30
Jan 16 13:36:53 vm1 kernel: [<ffffffff811acbbb>] ? wb_do_writeback+0xbb/0x240
Jan 16 13:36:53 vm1 kernel: [<ffffffff811acda3>] ? bdi_writeback_task+0x63/0x1b0
Jan 16 13:36:53 vm1 kernel: [<ffffffff81096c67>] ? bit_waitqueue+0x17/0xd0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8113c1d0>] ? bdi_start_fn+0x0/0x100
Jan 16 13:36:53 vm1 kernel: [<ffffffff8113c256>] ? bdi_start_fn+0x86/0x100
Jan 16 13:36:53 vm1 kernel: [<ffffffff8113c1d0>] ? bdi_start_fn+0x0/0x100
Jan 16 13:36:53 vm1 kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Jan 16 13:36:53 vm1 kernel: --[ end trace 7cd192156324e64a ]--
Jan 16 13:36:53 vm1 kernel: -----------[ cut here ]-----------
Jan 16 13:36:53 vm1 kernel: WARNING: at fs/fs-writeback.c:990 sync_inodes_sb+0x17f/0x190() (Tainted: G W --------------- )
Jan 16 13:36:53 vm1 kernel: Hardware name: KVM
Jan 16 13:36:53 vm1 kernel: Modules linked in: ldiskfs sha512_generic sha256_generic bridge stp llc nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core sg ext4 jbd2 mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Jan 16 13:36:53 vm1 kernel: Pid: 19717, comm: mkfs.lustre Tainted: G W --------------- 2.6.32 #4
Jan 16 13:36:53 vm1 kernel: Call Trace:
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8106e43a>] ? warn_slowpath_null+0x1a/0x20
Jan 16 13:36:53 vm1 kernel: [<ffffffff811ab8df>] ? sync_inodes_sb+0x17f/0x190
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b1672>] ? __sync_filesystem+0x82/0x90
Jan 16 13:36:53 vm1 kernel: [<ffffffff811b187b>] ? sync_filesystem+0x4b/0x70
Jan 16 13:36:53 vm1 kernel: [<ffffffffa02e7048>] ? ldiskfs_quota_off+0x218/0x220 [ldiskfs]
Jan 16 13:36:53 vm1 kernel: [<ffffffff81183496>] ? deactivate_super+0x46/0x80
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a17ef>] ? mntput_no_expire+0xbf/0x110
Jan 16 13:36:53 vm1 kernel: [<ffffffff811a225b>] ? sys_umount+0x7b/0x3a0
Jan 16 13:36:53 vm1 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

Comment by Niu Yawei (Inactive) [ 17/Jan/14 ]

Yes, indeed. It looks like there were heavy changes around the s_umount lock in the upstream kernel, so we can't simply backport the patch.

Comment by Niu Yawei (Inactive) [ 20/Jan/14 ]

Hi, LiXi
Given that a remount operation will never be performed on a Lustre device in normal usage, I think this deadlock should never happen on Lustre. I'm wondering if we can just leave it as Not Fixed. What do you think? (Is there any operation in normal Lustre usage that could trigger this deadlock?)

Comment by Li Xi (Inactive) [ 19/May/14 ]

Hi Yawei,

Sorry for my late reply. I hit this problem while changing some ldiskfs code, and I don't know of any way to trigger it from Lustre operations. So I agree with closing this ticket as Won't Fix.

Thanks!

Comment by Niu Yawei (Inactive) [ 19/May/14 ]

Thank you, LiXi.

Comment by Gerrit Updater [ 07/Jul/16 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/21181
Subject: LU-4474 osd: Add nodelalloc to ldisk mount options
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4130fa92a09b80abf57ba7d539214948ddbf2869

Comment by Artem Blagodarenko (Inactive) [ 07/Jul/16 ]

This hang happened on my system because SELinux is enabled (permissive mode) on my VM, but it can happen in other situations where the osd_mount() cleanup path runs.

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre D 0000000000000000 0 10566 10564 0x00000080
ffff88004e689710 0000000000000082 0000000000000000 ffff88004e689718
0000000000000000 ffff88004e6896e8 0000000000000286 ffff88004e6896e8
ffff88004d5d7af8 ffff88004e689fd8 000000000000fbc8 ffff88004d5d7af8
Call Trace:
[<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
[<ffffffff8128f174>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff81529d64>] ? down_read+0x24/0x30
[<ffffffffa060217c>] ldiskfs_quota_off+0x1bc/0x1f0 [ldiskfs]
[<ffffffff812325aa>] ? selinux_sb_copy_data+0x14a/0x1e0
[<ffffffff8118b866>] deactivate_locked_super+0x46/0x90
[<ffffffff8118b9bd>] vfs_kern_mount+0x10d/0x1b0
[<ffffffffa0654cc4>] osd_mount+0x544/0xf90 [osd_ldiskfs]
[<ffffffffa0655c1f>] osd_device_alloc+0x50f/0x9d0 [osd_ldiskfs]
[<ffffffffa1010d7f>] obd_setup+0x1bf/0x290 [obdclass]
[<ffffffffa0ef9820>] ? cfs_hash_create+0x2e0/0x940 [libcfs]
[<ffffffffa1010fff>] class_setup+0x1af/0x7d0 [obdclass]
[<ffffffffa0ff550f>] ? class_name2dev+0x2f/0x40 [obdclass]
[<ffffffffa1018fac>] class_process_config+0xc6c/0x1ad0 [obdclass]
[<ffffffffa0eeb888>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa101dfab>] ? lustre_cfg_new+0x40b/0x6f0 [obdclass]
[<ffffffffa101e3e8>] do_lcfg+0x158/0x450 [obdclass]
[<ffffffffa101e774>] lustre_start_simple+0x94/0x200 [obdclass]
[<ffffffffa1059411>] server_fill_super+0xf61/0x1abc [obdclass]
[<ffffffffa0eeb888>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa1024388>] lustre_fill_super+0x1d8/0x550 [obdclass]
[<ffffffffa10241b0>] ? lustre_fill_super+0x0/0x550 [obdclass]
[<ffffffff8118c2cf>] get_sb_nodev+0x5f/0xa0
[<ffffffffa101bec5>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8118b92b>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff8118bad2>] do_kern_mount+0x52/0x130
[<ffffffff811acb0b>] do_mount+0x2fb/0x930
[<ffffffff811ad1d0>] sys_mount+0x90/0xe0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
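
The shape of the hang appears to be the same self-deadlock, this time reached from the mount-failure cleanup path (a sketch based on the trace above; deactivate_locked_super() expects the caller to hold sb->s_umount):

/*
 * osd_mount()
 *   vfs_kern_mount()                  ldiskfs mount fails (here: SELinux
 *     deactivate_locked_super()       rejects the options); s_umount is
 *       ...                           held for write during cleanup
 *         ldiskfs_quota_off()
 *           down_read(&sb->s_umount)  blocks forever
 */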

I just tested this in my local environment.
Without the patch:

[root@devvm-sl6-2]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          permissive
Policy version:                 24
Policy from config file:        mls

[root@devvm-sl6-2]# LOAD=yes lustre/tests/llmount.sh
Loading modules from /lustre/mnt/work/MRP-3473/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config               ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported

[root@devvm-sl6-2]# mount -t lustre -o loop -o context=fake_context /tmp/lustre-mdt1 /mnt/mds1

SELinux:  duplicate or incompatible mount options
INFO: task mount.lustre:10567 blocked for more than 120 seconds.
      Not tainted 2.6.32master #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre  D 0000000000000007     0 10567  10565 0x00000080
 ffff880064bd7710 0000000000000086 00000000ffffffea ffff880064bd7718
 0000000000000000 ffff880064bd76e8 0000000000000282 ffff880064bd76e8
 ffff880037335098 ffff880064bd7fd8 000000000000fbc8 ffff880037335098
Call Trace:
 [<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff8128f174>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff81529d64>] ? down_read+0x24/0x30
 [<ffffffffa0cd326c>] ldiskfs_quota_off+0x1bc/0x1f0 [ldiskfs]
 [<ffffffff812325aa>] ? selinux_sb_copy_data+0x14a/0x1e0
 [<ffffffff8118b866>] deactivate_locked_super+0x46/0x90
 [<ffffffff8118b9bd>] vfs_kern_mount+0x10d/0x1b0
 [<ffffffffa0d25cd4>] osd_mount+0x544/0xf90 [osd_ldiskfs]
 [<ffffffffa0d26c2f>] osd_device_alloc+0x50f/0x9d0 [osd_ldiskfs]
 [<ffffffffa061b3ff>] obd_setup+0x1bf/0x290 [obdclass]
 [<ffffffffa052f840>] ? cfs_hash_create+0x2e0/0x940 [libcfs]
 [<ffffffffa061b67f>] class_setup+0x1af/0x7d0 [obdclass]
 [<ffffffffa05ffb8f>] ? class_name2dev+0x2f/0x40 [obdclass]
 [<ffffffffa062362c>] class_process_config+0xc6c/0x1ad0 [obdclass]
 [<ffffffffa0628715>] ? lustre_cfg_new+0x4f5/0x6f0 [obdclass]
 [<ffffffffa0628a68>] do_lcfg+0x158/0x450 [obdclass]
 [<ffffffffa0628df4>] lustre_start_simple+0x94/0x200 [obdclass]
 [<ffffffffa0663c51>] server_fill_super+0x1021/0x1b76 [obdclass]
 [<ffffffffa062ea58>] lustre_fill_super+0x218/0x5f0 [obdclass]
 [<ffffffffa062e840>] ? lustre_fill_super+0x0/0x5f0 [obdclass]
 [<ffffffff8118c2cf>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa0626545>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118b92b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff8118bad2>] do_kern_mount+0x52/0x130
 [<ffffffff811acb0b>] do_mount+0x2fb/0x930
 [<ffffffff811ad1d0>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task mount.lustre:10567 blocked for more than 120 seconds.
      Not tainted 2.6.32master #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

With the patch:

SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          permissive
Policy version:                 24
Policy from config file:        mls
[root@devvm-sl6-2]# mount -t lustre -o loop -o context=fake_context /tmp/lustre-mdt1 /mnt/mds1
mount.lustre: mount /dev/loop0 at /mnt/mds1 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

[root@devvm-sl6-2]# dmesg | tail -n 50
LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts:
SELinux:  duplicate or incompatible mount options
LustreError: 7914:0:(osd_handler.c:5900:osd_mount()) lustre-MDT0000-osd: can't mount /dev/loop0: -22
LustreError: 7914:0:(obd_config.c:517:class_setup()) setup lustre-MDT0000-osd failed (-22)
LustreError: 7914:0:(obd_mount.c:207:lustre_start_simple()) lustre-MDT0000-osd setup error -22
LustreError: 7914:0:(obd_mount_server.c:1713:server_fill_super()) Unable to start osd on /dev/loop0: -22
LustreError: 7914:0:(obd_mount.c:1327:lustre_fill_super()) Unable to mount  (-22)
[root@devvm-sl6-2]#

With the patch applied, mount exits with an error, as expected.

Comment by Bruno Faccini (Inactive) [ 12/Sep/16 ]

Artem,
I wonder whether your change #21181 might introduce a performance decrease for some workloads.
Have you run it through any filesystem benchmark suites?

Comment by Bruno Faccini (Inactive) [ 22/Sep/16 ]

Well, after some internal discussions, it has been established that the "delayed allocation" feature of ext4 is not used by ldiskfs, so your patch (Gerrit change #21181) looks like an easy way to fix this problem after all!
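
That matches the shape of the function: the problematic lock is only taken under test_opt(sb, DELALLOC), so a filesystem mounted with nodelalloc never reaches it (sketch, per the function body quoted earlier in this ticket):

/*
 * ldiskfs_quota_off(sb, type)
 *   if (test_opt(sb, DELALLOC))        always false with nodelalloc
 *     down_read(&sb->s_umount);        -> never reached, no deadlock
 *   return dquot_quota_off(sb, type);  -> quota is still switched off
 */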

Comment by Gerrit Updater [ 29/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21181/
Subject: LU-4474 osd: Add nodelalloc to ldiskfs mount options
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7597c672850e7e9232f7400170455fd12aa08b8a

Comment by Peter Jones [ 29/Sep/16 ]

Landed for 2.9. Please speak up if any further work is still needed.
