[LU-15380] Hang in osp_precreate_reserve() while setstriping (-C 200) on single OST Created: 17/Dec/21 Updated: 18/Dec/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Arshad Hussain | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hang in osp_precreate_reserve() while creating a file with 2000+ overstripes. There is only one OST.

Steps to recreate:
1. Prepare an environment with just a single OST (a single node will also do).
2. lfs setstripe -C 2000 /mnt/lustre/file0   <Hang>
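A minimal repro sketch of step 2 (assumes the default llmount.sh client mount point /mnt/lustre and the default obd_timeout of 100 seconds; running setstripe in the background keeps the shell usable while it hangs):

lfs setstripe -C 2000 /mnt/lustre/file0 &
SETSTRIPE_PID=$!
sleep 120    # wait longer than the default 100 s obd_timeout
# if the process is still around here, the create has not completed
kill -0 "$SETSTRIPE_PID" 2>/dev/null && echo "setstripe (pid $SETSTRIPE_PID) appears hung"
# look for the MDS service-thread watchdog message quoted below
dmesg | grep -i "was inactive"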
Dmesg:
06:23:00 centos79 kernel: Lustre: mdt00_001: service thread pid 2247 was inactive for 40.028 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
06:23:00 centos79 kernel: Pid: 2247, comm: mdt00_001 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021
06:23:00 centos79 kernel: Call Trace:
06:23:00 centos79 kernel: [<0>] osp_precreate_reserve+0x490/0x9b0 [osp]
06:23:00 centos79 kernel: [<0>] osp_declare_create+0x1ac/0x660 [osp]
06:23:00 centos79 kernel: [<0>] lod_sub_declare_create+0xdf/0x210 [lod]
06:23:00 centos79 kernel: [<0>] lod_qos_declare_object_on+0xf3/0x420 [lod]
06:23:00 centos79 kernel: [<0>] lod_ost_alloc_rr.constprop.23+0xaad/0x1150 [lod]
06:23:00 centos79 kernel: [<0>] lod_qos_prep_create+0x1245/0x1a20 [lod]
06:23:00 centos79 kernel: [<0>] lod_prepare_create+0x230/0x320 [lod]
06:23:00 centos79 kernel: [<0>] lod_declare_striped_create+0x1fe/0x9e0 [lod]
06:23:00 centos79 kernel: [<0>] lod_declare_xattr_set+0x139/0x2020 [lod]
06:23:00 centos79 kernel: [<0>] mdd_create_data+0x532/0x760 [mdd]
06:23:00 centos79 kernel: [<0>] mdt_mfd_open+0xcb7/0xed0 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_finish_open+0x79b/0x930 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_open_by_fid_lock+0x704/0xb10 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_reint_open+0x783/0x2a80 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_reint_rec+0x8a/0x240 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_reint_internal+0x76c/0xb50 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_intent_open+0x93/0x480 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_intent_opc+0x1e0/0xc10 [mdt]
06:23:00 centos79 kernel: [<0>] mdt_intent_policy+0x1a1/0x360 [mdt]
06:23:00 centos79 kernel: [<0>] ldlm_lock_enqueue+0x3c5/0xb50 [ptlrpc]
06:23:00 centos79 kernel: [<0>] ldlm_handle_enqueue0+0xab6/0x16d0 [ptlrpc]
06:23:00 centos79 kernel: [<0>] tgt_enqueue+0x64/0x240 [ptlrpc]
06:23:00 centos79 kernel: [<0>] tgt_request_handle+0x823/0x1850 [ptlrpc]
06:23:00 centos79 kernel: [<0>] ptlrpc_server_handle_request+0x253/0xb10 [ptlrpc]
06:23:00 centos79 kernel: [<0>] ptlrpc_main+0xbf4/0x15e0 [ptlrpc]
06:23:00 centos79 kernel: [<0>] kthread+0xd1/0xe0
06:23:00 centos79 kernel: [<0>] ret_from_fork_nospec_begin+0xe/0x21
06:23:00 centos79 kernel: [<0>] 0xfffffffffffffffe
|
| Comments |
| Comment by Arshad Hussain [ 17/Dec/21 ] |
(gdb) list *(osp_precreate_reserve+0x490)
0x13fc0 is in osp_precreate_reserve (/root/lustre-dev/lustre-release/lustre/osp/osp_precreate.c:1509).
1504 LASSERT(d->opd_pre);
1505 rc = -ENOBUFS;
1506 break;
1507 }
1508
1509 if (wait_event_idle_timeout(
1510 d->opd_pre_user_waitq,
1511 osp_precreate_ready_condition(env, d),
1512 cfs_time_seconds(obd_timeout)) == 0) {
1513 CDEBUG(D_HA,
(gdb)
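For reference, the timeout in the wait above comes from the global obd_timeout tunable. A quick check of its value on the node (a sketch; "timeout" is the lctl parameter name for obd_timeout as far as I know, default 100 seconds):

# print the global obd/RPC timeout that feeds cfs_time_seconds(obd_timeout)
lctl get_param timeout
# typical default output: timeout=100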
|
| Comment by Patrick Farrell [ 17/Dec/21 ] |
|
Arshad, did the hang resolve, or did it stay stuck indefinitely? Also, 2000 stripes on one OST is not going to work well at all in a practical sense. |
| Comment by Arshad Hussain [ 17/Dec/21 ] |
|
Patrick, I hard-reset the VM after ~10 minutes. I can confirm that for those ~10 minutes the hang was not resolved and the thread stayed stuck. Let me rerun the reproduction and confirm whether it is stuck indefinitely; I will wait ~30 minutes instead of 10, which I understand should be good enough.

> Also, 2000 stripes on one OST is not going to work well at all, ...

Thanks |
| Comment by Arshad Hussain [ 17/Dec/21 ] |
|
Patrick, now that I am looking at the osp_precreate code: as I understand it, the 100-second "obd_timeout" should already have triggered? It is a complete hang IMO.

if (wait_event_idle_timeout(
        d->opd_pre_user_waitq,
        osp_precreate_ready_condition(env, d),
        cfs_time_seconds(obd_timeout)) == 0) {   <<<<<<<<< 100 seconds (however, after 10 mins it was still stuck)
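One way to confirm from the node that the mdt00_001 thread is still parked in this wait (rather than making slow progress) is to re-dump its stack while the command is hung. A sketch, using the pid 2247 from the trace above (it will differ per run):

# re-dump the stack of the stuck MDT service thread
cat /proc/2247/stack
# or dump all tasks in uninterruptible sleep to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 100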
|
| Comment by Andreas Dilger [ 17/Dec/21 ] |
|
You should try Alex's patch for reducing the credits for overstriped files: https://review.whamcloud.com/45765 |
| Comment by Andreas Dilger [ 17/Dec/21 ] |
|
Oh, never mind, this is stuck on the MDS before it even gets to the OST. Probably on a newly-booted MDS there are no precreated objects, and it is unable to get the objects it needs. It would probably work with "-C 32". Of course, it shouldn't hang either. We have some tests doing "-C 2000" with only 7 OSTs, which is about 285 objects per OST, so I am not sure why this wouldn't work.
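To check whether the MDS really has no precreated objects available for the OST, the OSP precreate state can be inspected on the MDS. A sketch, assuming the usual osp.*.prealloc_* parameter names (adjust the wildcard to the fsname/OST index):

# on the MDS: per-OST precreate status as seen by the OSP layer
lctl get_param osp.*.prealloc_status
lctl get_param osp.*.prealloc_last_id osp.*.prealloc_next_id osp.*.prealloc_reserved
|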
| Comment by Patrick Farrell [ 17/Dec/21 ] |
|
Or even 2 OSTs - I've certainly tried it locally. But it's not a new file system in those cases, is it? |
| Comment by Arshad Hussain [ 18/Dec/21 ] |
|
> Or even 2 OSTs - I've certainly tried it locally. But it's not a new file system in those cases, is it?

Yes, it hangs with 2 OSTs also. Mine is a relatively new FS (mostly brought up with llmount.sh). This is my local setup (edited to remove FS options, making it more readable):

# mount | grep lus
/dev/sda2 on /usr/sbin/mount.lustre type ext4 (rw,relatime,data=ordered)
1 MDS/MGS:
/dev/mapper/mds1_flakey on /mnt/lustre-mds1 type lustre
3 OSTs:
/dev/mapper/ost1_flakey on /mnt/lustre-ost1 type lustre
/dev/mapper/ost2_flakey on /mnt/lustre-ost2 type lustre
/dev/mapper/ost3_flakey on /mnt/lustre-ost3 type lustre
192.168.50.95@tcp:/lustre on /mnt/lustre type lustre
|
| Comment by Arshad Hussain [ 18/Dec/21 ] |
|
Andreas,

> ...It would probably work with "-C 32". Of course it shouldn't hang either. ...

With 1 OST it does work with overstripe (-C) 32. However, I am observing that the 'rm' after the setstripe hangs, with no backtrace in dmesg etc. llmount.sh brings up the environment like below:

# mount | grep lus
/dev/sda2 on /usr/sbin/mount.lustre type ext4 (rw,relatime,data=ordered)
/dev/mapper/mds1_flakey on /mnt/lustre-mds1 type lustre
/dev/mapper/ost1_flakey on /mnt/lustre-ost1 type lustre
/dev/mapper/ost2_flakey on /mnt/lustre-ost2 type lustre
/dev/mapper/ost3_flakey on /mnt/lustre-ost3 type lustre
192.168.50.95@tcp:/lustre on /mnt/lustre type lustre

Unmount 2 OSTs (to simulate a single OST):

# umount /mnt/lustre-ost3
# umount /mnt/lustre-ost2

# mount | grep lus
/dev/sda2 on /usr/sbin/mount.lustre type ext4 (rw,relatime,data=ordered)
/dev/mapper/mds1_flakey on /mnt/lustre-mds1 type lustre (ro,svname=lustre-MDT0000,mgs,osd=osd-ldiskfs,user_xattr,errors=remount-ro)
/dev/mapper/ost1_flakey on /mnt/lustre-ost1 type lustre (ro,svname=lustre-OST0000,mgsnode=192.168.50.95@tcp,osd=osd-ldiskfs,errors=remount-ro)
192.168.50.95@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt)

Run "setstripe" - this succeeds:

# lfs setstripe -C 32 /mnt/lustre/file0

Try to remove the newly created file - "rm" hangs forever:

# rm /mnt/lustre/file0

Meanwhile dmesg has:

Dec 17 23:54:30 centos79 kernel: LustreError: 11-0: lustre-OST0001-osc-ffff917e5a017800: operation ost_connect to node 0@lo failed: rc = -19
Dec 17 23:55:00 centos79 kernel: LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Dec 17 23:55:00 centos79 kernel: LustreError: Skipped 51 previous similar messages
Dec 17 23:56:46 centos79 kernel: LustreError: 11-0: lustre-OST0002-osc-ffff917e5a017800: operation ost_connect to node 0@lo failed: rc = -19
Dec 17 23:57:01 centos79 kernel: LustreError: 11-0: lustre-OST0001-osc-ffff917e5a017800: operation ost_connect to node 0@lo failed: rc = -19
Dec 17 23:57:31 centos79 kernel: LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.

Now bring up the previously unmounted OSTs:

# mount.lustre /dev/mapper/ost2_flakey /mnt/lustre-ost2/
# mount.lustre /dev/mapper/ost3_flakey /mnt/lustre-ost3/

"rm" comes alive:

# rm /mnt/lustre/file0
rm: remove regular empty file ‘/mnt/lustre/file0’? y
# ls -ali /mnt/lustre/
total 8
144115188193296385 drwxr-xr-x 3 root root 4096 Dec 18 00:02 .
           1700609 drwxr-xr-x. 9 root root 4096 Sep 13 06:17 ..
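The pattern above suggests the unlink was blocked on objects belonging to the unmounted targets. While the 'rm' is hung, a quick cross-check of which OSTs the client and MDS consider available can be done with standard tools (a sketch; output depends on the setup):

# from the client: unmounted OSTs should show up as unavailable/inactive
lfs df /mnt/lustre
# on the MDS: list configured devices and their state
lctl dl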
|
| Comment by Arshad Hussain [ 18/Dec/21 ] |
|
...continued. The same applies to "setstripe -C 2000" on a single OST:

# umount /mnt/lustre-ost3/
# umount /mnt/lustre-ost2/
# lfs setstripe -C 2000 /mnt/lustre/file0
<hangs>

Wait for ~15 mins... Then mount back the unmounted OSTs:

# mount.lustre /dev/mapper/ost2_flakey /mnt/lustre-ost2/
# mount.lustre /dev/mapper/ost3_flakey /mnt/lustre-ost3/

Wait some more... the setstripe thread comes alive. Of course, the object has no stripe info though:

# lfs setstripe -C 2000 /mnt/lustre/file0
...lots of waiting...
# lfs getstripe /mnt/lustre/file0
/mnt/lustre/file0 has no stripe info

dmesg:

Dec 18 00:46:01 centos79 kernel: Lustre: lustre-OST0002-osc-MDT0000: Connection restored to 192.168.50.95@tcp (at 0@lo)
Dec 18 00:54:16 centos79 systemd-logind: New session 4 of user root.
Dec 18 00:54:16 centos79 systemd: Started Session 4 of user root.
Dec 18 00:54:50 centos79 kernel: Lustre: 6473:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1639806289/real 1639806289] req@ffff929b61754900 x1719460205452416/t0(0) o101->lustre-MDT0000-mdc-ffff929b58e76000@0@lo:12/10 lens 648/66264 e 24 to 1 dl 1639806890 ref 2 fl Rpc:IXPQr/2/ffffffff rc 0/-1 job:'lfs.0'
Dec 18 00:54:50 centos79 kernel: Lustre: lustre-MDT0000-mdc-ffff929b58e76000: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Dec 18 00:54:50 centos79 kernel: Lustre: lustre-MDT0000: Client a48ea68c-ea49-4863-8b15-e54b432375a0 (at 0@lo) reconnecting
Dec 18 00:54:50 centos79 kernel: Lustre: lustre-MDT0000-mdc-ffff929b58e76000: Connection restored to 192.168.50.95@tcp (at 0@lo)

Summary: IMO, the way the OSTs' precreated objects are initially set up is causing the setstripe hang. If Lustre is set up with 3 OSTs, then -C 2000 works flawlessly on this setup. As soon as we unmount an OST it starts to fail, either in setstripe or in any operation like rm (shown above). Even in Maloo I think we can simulate it by letting the 7 OSTs (presently set up) come up and then unmounting a single OST. setstripe will also hang there.
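If the goal is just to run with fewer OSTs than were originally configured, one possible workaround sketch is to mark the unmounted OSTs inactive on the MDS so it stops waiting on them for allocations and destroys. This uses the osp.*.active tunable as I understand it; the OST indices match the dmesg above, adjust for your setup:

# on the MDS: deactivate the OSTs that were unmounted (OST0001/OST0002 above)
lctl set_param osp.lustre-OST0001-osc-MDT0000.active=0
lctl set_param osp.lustre-OST0002-osc-MDT0000.active=0
# ... run setstripe/rm against the remaining OST ...
# re-activate once the OSTs are mounted back
lctl set_param osp.lustre-OST0001-osc-MDT0000.active=1
lctl set_param osp.lustre-OST0002-osc-MDT0000.active=1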
|