[LU-4474] deadlock of ldiskfs_quota_off() Created: 11/Jan/14 Updated: 29/Sep/16 Resolved: 29/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Li Xi (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ldiskfs | ||
| Issue Links: |
|
| Severity: | 3 | ||||
| Rank (Obsolete): | 12256 | ||||
| Description |
|
When I ran quotacheck directly on ldiskfs, the process got stuck. The stack dump follows. The ext4_quota_off() function added by ext4-quota-force-block-alloc-quotaoff.patch tried to take the read lock of sb->s_umount, but blocked there because the write lock of sb->s_umount was already held. COMMAND: "quotacheck"

Here is how to reproduce it. Please note that the quota-tools version is 4.0.1; quota-tools 3.17 somehow refuses to run quotacheck on an ldiskfs mount point.

[root@vm1 quota-tools]# mount -t ldiskfs /dev/sdb3 -o usrquota /mnt/
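To illustrate why this hangs: an rwsem cannot be read-locked by a task that already holds it for write. A minimal userspace analogy (my own sketch, not code from this ticket; pthread_rwlock_tryrdlock() stands in for the kernel's blocking down_read() so the demo reports the problem instead of hanging):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int main(void)
{
	pthread_rwlock_t lk = PTHREAD_RWLOCK_INITIALIZER;
	int rc;

	/* The caller's side: s_umount taken for write. */
	pthread_rwlock_wrlock(&lk);

	/* ext4_quota_off()'s side: a read lock on the same rwlock.
	 * The kernel's down_read() would block here forever; the
	 * trylock fails with EBUSY instead. */
	rc = pthread_rwlock_tryrdlock(&lk);
	if (rc == EBUSY)
		printf("read lock unavailable: a blocking down_read() here would self-deadlock\n");

	pthread_rwlock_unlock(&lk);
	return 0;
}

(Build with gcc -pthread.) |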
| Comments |
| Comment by Li Xi (Inactive) [ 11/Jan/14 ] |
|
I am not sure, but maybe this problem could be fixed by removing the down_read(&sb->s_umount)? It seems that the write lock of sb->s_umount is always held when ext4_quota_off() is called. Please correct me if I am wrong. |
| Comment by Peter Jones [ 11/Jan/14 ] |
|
Niu, could you please comment? Thanks. Peter |
| Comment by Niu Yawei (Inactive) [ 13/Jan/14 ] |
|
Hi, LiXi. What kernel version did you use? I don't see how ldiskfs_remount() calls ldiskfs_quota_off() in my kernel (2.6.32-279). BTW, why do you need quotacheck on ldiskfs? Isn't it obsolete? |
| Comment by Li Xi (Inactive) [ 13/Jan/14 ] |
|
Hi Yawei, I am using linux-2.6.32-358.18.1.el6.x86_64. vfs_dq_off() is called in ext4_remount():

static int ext4_remount(struct super_block *sb, int *flags, char *data)
...
	if (*flags & MS_RDONLY) {
		/*
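A sketch of the full call chain as I read the RHEL6 (2.6.32-based) sources; the exact intermediate callers may differ between kernel versions:

/*
 * sys_mount()
 *   do_mount()
 *     do_remount()                       // down_write(&sb->s_umount)
 *       do_remount_sb()
 *         ext4_remount()                 // *flags & MS_RDONLY
 *           vfs_dq_off()
 *             ext4_quota_off()
 *               down_read(&sb->s_umount) // already write-held -> hang
 */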
Yeah, I understand this is not a common operation for Lustre. I thought we could use ldiskfs as a normal file system, so I ran the quota utility directly on ldiskfs. If it is not proper to do so, do you have any suggestions on how to check the quota mechanism of ldiskfs? As you know, I am working on some quota changes, and the quota code of ldiskfs has to be updated to support them. Thanks! |
| Comment by Niu Yawei (Inactive) [ 13/Jan/14 ] |
|
Quota accounting is always enabled on ldiskfs (see ext4-quota-first-class.patch), so quotacheck is no longer needed. (If the filesystem is upgraded from an old ldiskfs that doesn't have the EXT4_FEATURE_RO_COMPAT_QUOTA feature, the quota accounting will be updated automatically when you enable the feature with 'tune2fs -O quota'.) |
| Comment by Li Xi (Inactive) [ 13/Jan/14 ] |
|
Ah, thank you! Do you think this problem could be triggered by any operation from Lustre? I think it would be better to fix it anyway, if it is not too difficult. |
| Comment by Niu Yawei (Inactive) [ 13/Jan/14 ] |
|
I think it could probably be triggered in Lustre, though I don't see why it was never reported before. It looks like it can't be fixed simply by removing the down_read(s_umount), because I don't see s_umount being held in the quotactl path (do_quotactl()). Could you search to see whether it has already been fixed upstream? |
| Comment by Niu Yawei (Inactive) [ 14/Jan/14 ] |
|
Looks like it has been fixed upstream:

It must be already acquired by caller. Fix lock_dep complain.

Signed-off-by: Dmitry Monakhov <dmon...@gmail.com>
Reviewed-by: Jan Kara <ja...@suse.cz>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e59eb37..062d1bc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4107,12 +4107,10 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 static int ext4_quota_off(struct super_block *sb, int type)
 {
-	/* Force all delayed allocation blocks to be allocated */
-	if (test_opt(sb, DELALLOC)) {
-		down_read(&sb->s_umount);
+	/* Force all delayed allocation blocks to be allocated.
+	 * Caller already holds s_umount sem */
+	if (test_opt(sb, DELALLOC))
 		sync_filesystem(sb);
-		up_read(&sb->s_umount);
-	}
 	return dquot_quota_off(sb, type);
 }

I'll backport it to Lustre. Thank you, LiXi.
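For clarity, applying the hunk above leaves the function as:

static int ext4_quota_off(struct super_block *sb, int type)
{
	/* Force all delayed allocation blocks to be allocated.
	 * Caller already holds s_umount sem */
	if (test_opt(sb, DELALLOC))
		sync_filesystem(sb);
	return dquot_quota_off(sb, type);
}

That is, the delalloc sync is kept, but the read-lock/unlock pair is dropped on the assumption that every caller of ->quota_off() already holds sb->s_umount. |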
| Comment by Li Xi (Inactive) [ 14/Jan/14 ] |
|
Ah, Yawei, thank you so much for investigating this! |
| Comment by Niu Yawei (Inactive) [ 14/Jan/14 ] |
| Comment by Li Xi (Inactive) [ 17/Jan/14 ] |
|
Hi Yawei, I found the following messages when running mkfs.lustre on a server with your patch applied. It seems sync_filesystem() complained:

WARN_ON(!rwsem_is_locked(&sb->s_umount));

Do you think this is related to the patch? Thanks!

Jan 16 13:36:53 vm1 kernel: LDISKFS-fs (sdb3): mounted filesystem with ordered data mode. quota=on. Opts:
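For reference, sync_filesystem() in upstream fs/sync.c carries exactly that assertion (quoted from a recent upstream tree; I have not checked which kernel version introduced it):

int sync_filesystem(struct super_block *sb)
{
	int ret;

	/*
	 * We need to be protected against the filesystem going from
	 * r/o to r/w or vice versa.
	 */
	WARN_ON(!rwsem_is_locked(&sb->s_umount));
	...
}

So after the backport, any ldiskfs_quota_off() caller that does not hold s_umount at all will trip this warning. |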
| Comment by Niu Yawei (Inactive) [ 17/Jan/14 ] |
|
Yes, indeed. It looks like there were heavy changes around the s_umount lock in the upstream kernel, so we can't simply backport the patch. |
| Comment by Niu Yawei (Inactive) [ 20/Jan/14 ] |
|
Hi, LiXi |
| Comment by Li Xi (Inactive) [ 19/May/14 ] |
|
Hi Yawei, sorry for my late reply. I hit this problem when changing some ldiskfs code, and I don't know of any way to trigger it from a Lustre operation. So I agree to close this ticket as Won't Fix. Thanks! |
| Comment by Niu Yawei (Inactive) [ 19/May/14 ] |
|
Thank you, LiXi. |
| Comment by Gerrit Updater [ 07/Jul/16 ] |
|
Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/21181 |
| Comment by Artem Blagodarenko (Inactive) [ 07/Jul/16 ] |
|
This hang happened on my system because SELinux is enabled (permissive mode) on my VM. But it can happen in other situations where the osd_mount() cleanup path runs.

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre  D 0000000000000000     0 10566  10564 0x00000080
 ffff88004e689710 0000000000000082 0000000000000000 ffff88004e689718
 0000000000000000 ffff88004e6896e8 0000000000000286 ffff88004e6896e8
 ffff88004d5d7af8 ffff88004e689fd8 000000000000fbc8 ffff88004d5d7af8
Call Trace:
 [<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff8128f174>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff81529d64>] ? down_read+0x24/0x30
 [<ffffffffa060217c>] ldiskfs_quota_off+0x1bc/0x1f0 [ldiskfs]
 [<ffffffff812325aa>] ? selinux_sb_copy_data+0x14a/0x1e0
 [<ffffffff8118b866>] deactivate_locked_super+0x46/0x90
 [<ffffffff8118b9bd>] vfs_kern_mount+0x10d/0x1b0
 [<ffffffffa0654cc4>] osd_mount+0x544/0xf90 [osd_ldiskfs]
 [<ffffffffa0655c1f>] osd_device_alloc+0x50f/0x9d0 [osd_ldiskfs]
 [<ffffffffa1010d7f>] obd_setup+0x1bf/0x290 [obdclass]
 [<ffffffffa0ef9820>] ? cfs_hash_create+0x2e0/0x940 [libcfs]
 [<ffffffffa1010fff>] class_setup+0x1af/0x7d0 [obdclass]
 [<ffffffffa0ff550f>] ? class_name2dev+0x2f/0x40 [obdclass]
 [<ffffffffa1018fac>] class_process_config+0xc6c/0x1ad0 [obdclass]
 [<ffffffffa0eeb888>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa101dfab>] ? lustre_cfg_new+0x40b/0x6f0 [obdclass]
 [<ffffffffa101e3e8>] do_lcfg+0x158/0x450 [obdclass]
 [<ffffffffa101e774>] lustre_start_simple+0x94/0x200 [obdclass]
 [<ffffffffa1059411>] server_fill_super+0xf61/0x1abc [obdclass]
 [<ffffffffa0eeb888>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa1024388>] lustre_fill_super+0x1d8/0x550 [obdclass]
 [<ffffffffa10241b0>] ? lustre_fill_super+0x0/0x550 [obdclass]
 [<ffffffff8118c2cf>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa101bec5>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118b92b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff8118bad2>] do_kern_mount+0x52/0x130
 [<ffffffff811acb0b>] do_mount+0x2fb/0x930
 [<ffffffff811ad1d0>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Just tested on my local environment.

[root@devvm-sl6-2]# sestatus
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: permissive
Mode from config file: permissive
Policy version: 24
Policy from config file: mls
[root@devvm-sl6-2]# LOAD=yes lustre/tests/llmount.sh
Loading modules from /lustre/mnt/work/MRP-3473/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
[root@devvm-sl6-2]# mount -t lustre -o loop -o context=fake_context /tmp/lustre-mdt1 /mnt/mds1
SELinux: duplicate or incompatible mount options
INFO: task mount.lustre:10567 blocked for more than 120 seconds.
Not tainted 2.6.32master #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre  D 0000000000000007     0 10567  10565 0x00000080
 ffff880064bd7710 0000000000000086 00000000ffffffea ffff880064bd7718
 0000000000000000 ffff880064bd76e8 0000000000000282 ffff880064bd76e8
 ffff880037335098 ffff880064bd7fd8 000000000000fbc8 ffff880037335098
Call Trace:
 [<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff8128f174>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff81529d64>] ? down_read+0x24/0x30
 [<ffffffffa0cd326c>] ldiskfs_quota_off+0x1bc/0x1f0 [ldiskfs]
 [<ffffffff812325aa>] ? selinux_sb_copy_data+0x14a/0x1e0
 [<ffffffff8118b866>] deactivate_locked_super+0x46/0x90
 [<ffffffff8118b9bd>] vfs_kern_mount+0x10d/0x1b0
 [<ffffffffa0d25cd4>] osd_mount+0x544/0xf90 [osd_ldiskfs]
 [<ffffffffa0d26c2f>] osd_device_alloc+0x50f/0x9d0 [osd_ldiskfs]
 [<ffffffffa061b3ff>] obd_setup+0x1bf/0x290 [obdclass]
 [<ffffffffa052f840>] ? cfs_hash_create+0x2e0/0x940 [libcfs]
 [<ffffffffa061b67f>] class_setup+0x1af/0x7d0 [obdclass]
 [<ffffffffa05ffb8f>] ? class_name2dev+0x2f/0x40 [obdclass]
 [<ffffffffa062362c>] class_process_config+0xc6c/0x1ad0 [obdclass]
 [<ffffffffa0628715>] ? lustre_cfg_new+0x4f5/0x6f0 [obdclass]
 [<ffffffffa0628a68>] do_lcfg+0x158/0x450 [obdclass]
 [<ffffffffa0628df4>] lustre_start_simple+0x94/0x200 [obdclass]
 [<ffffffffa0663c51>] server_fill_super+0x1021/0x1b76 [obdclass]
 [<ffffffffa062ea58>] lustre_fill_super+0x218/0x5f0 [obdclass]
 [<ffffffffa062e840>] ? lustre_fill_super+0x0/0x5f0 [obdclass]
 [<ffffffff8118c2cf>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa0626545>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118b92b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff8118bad2>] do_kern_mount+0x52/0x130
 [<ffffffff811acb0b>] do_mount+0x2fb/0x930
 [<ffffffff811ad1d0>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task mount.lustre:10567 blocked for more than 120 seconds.
Not tainted 2.6.32master #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

With patch:

SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: permissive
Mode from config file: permissive
Policy version: 24
Policy from config file: mls
[root@devvm-sl6-2]# mount -t lustre -o loop -o context=fake_context /tmp/lustre-mdt1 /mnt/mds1
mount.lustre: mount /dev/loop0 at /mnt/mds1 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
[root@devvm-sl6-2]# dmesg | tail -n 50
LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts:
SELinux: duplicate or incompatible mount options
LustreError: 7914:0:(osd_handler.c:5900:osd_mount()) lustre-MDT0000-osd: can't mount /dev/loop0: -22
LustreError: 7914:0:(obd_config.c:517:class_setup()) setup lustre-MDT0000-osd failed (-22)
LustreError: 7914:0:(obd_mount.c:207:lustre_start_simple()) lustre-MDT0000-osd setup error -22
LustreError: 7914:0:(obd_mount_server.c:1713:server_fill_super()) Unable to start osd on /dev/loop0: -22
LustreError: 7914:0:(obd_mount.c:1327:lustre_fill_super()) Unable to mount (-22)
[root@devvm-sl6-2]#
After the patch, mount exits with an error as expected instead of hanging.
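For the record, my reading of the traces above, as a sketch (note that deactivate_locked_super() is entered with sb->s_umount already held for write by the mount path):

/*
 * vfs_kern_mount()                  // ldiskfs backend mount fails
 *   deactivate_locked_super()       // s_umount held for write
 *     vfs_dq_off()
 *       ldiskfs_quota_off()
 *         down_read(&sb->s_umount)  // same rwsem -> hang
 */

So any failed mount of the ldiskfs backend, not only the SELinux case, can hit this path. |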
| Comment by Bruno Faccini (Inactive) [ 12/Sep/16 ] |
|
Artem, |
| Comment by Bruno Faccini (Inactive) [ 22/Sep/16 ] |
|
Well, after some internal discussion, it has been established that the "delayed allocation" feature of ext4 is not used in ldiskfs, so your patch (Gerrit change #21181) now looks like an easy way to fix this problem!
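Given that rationale, the patched function presumably reduces to something like the following sketch (I have not restated the exact patch text; see the Gerrit change for the real code), with the delalloc sync and its s_umount read lock dropped entirely:

/* Hypothetical post-patch ldiskfs_quota_off() -- illustration only;
 * see http://review.whamcloud.com/21181 for the authoritative change. */
static int ldiskfs_quota_off(struct super_block *sb, int type)
{
	/* ldiskfs does not use delayed allocation, so no filesystem
	 * sync (and hence no sb->s_umount locking) is needed here. */
	return dquot_quota_off(sb, type);
}

Either way the s_umount read lock is gone, which removes the deadlock. |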
| Comment by Gerrit Updater [ 29/Sep/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21181/ |
| Comment by Peter Jones [ 29/Sep/16 ] |
|
| Landed for 2.9. Please speak up if any further work is still needed. |