[LU-16966] ofd_object_fallocate deadlock? Created: 15/Jul/23  Updated: 17/Nov/23  Resolved: 16/Oct/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.2
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre version:
lustre-iokit-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
kmod-lustre-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
pcp-lustre-0.4.16-2.noarch
lustre-devel-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
lustre-osd-ldiskfs-mount-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
lustre-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
lustre-tests-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
kmod-lustre-osd-ldiskfs-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64
kmod-lustre-tests-2.15.2-1nas_mofed496el8_lustre_20230111v1.x86_64

kernel: 4.18.0-425.3.1.el8_lustre.x86_64
mofed: mlnx-ofa_kernel-4.9-mofed496.x86_64


Attachments: brw_stats, brw_stats.save.1693236421, dmesg.out, fallocate-range-locking.patch, nbp15.hang, stack.out, stack1-1.out, stack1.out
Issue Links:
Duplicate: duplicates LU-15800 "Fallocate causes transaction deadlock" (Resolved)
Severity: 2

 Description   

We have had multiple servers deadlock with this stack trace.

(attached longer console output)

Jul 15 05:46:28 nbp11-srv3 kernel: INFO: task ll_ost07_000:9230 blocked for more than 120 seconds.
Jul 15 05:46:28 nbp11-srv3 kernel:      Tainted: G           OE    --------- -  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
Jul 15 05:46:28 nbp11-srv3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 15 05:46:28 nbp11-srv3 kernel: task:ll_ost07_000    state:D stack:    0 pid: 9230 ppid:     2 flags:0x80004080
Jul 15 05:46:28 nbp11-srv3 kernel: Call Trace:
Jul 15 05:46:28 nbp11-srv3 kernel: __schedule+0x2d1/0x860
Jul 15 05:46:28 nbp11-srv3 kernel: schedule+0x35/0xa0
Jul 15 05:46:28 nbp11-srv3 kernel: wait_transaction_locked+0x89/0xd0 [jbd2]
Jul 15 05:46:28 nbp11-srv3 kernel: ? finish_wait+0x80/0x80
Jul 15 05:46:28 nbp11-srv3 kernel: add_transaction_credits+0xd4/0x290 [jbd2]
Jul 15 05:46:28 nbp11-srv3 kernel: ? ldiskfs_do_update_inode+0x604/0x800 [ldiskfs]
Jul 15 05:46:28 nbp11-srv3 kernel: start_this_handle+0x10a/0x520 [jbd2]
Jul 15 05:46:28 nbp11-srv3 kernel: ? osd_fallocate_preallocate.isra.38+0x275/0x760 [osd_ldiskfs]
Jul 15 05:46:28 nbp11-srv3 kernel: ? ldiskfs_mark_iloc_dirty+0x32/0x90 [ldiskfs]
Jul 15 05:46:28 nbp11-srv3 kernel: jbd2__journal_restart+0xb4/0x160 [jbd2]
Jul 15 05:46:28 nbp11-srv3 kernel: osd_fallocate_preallocate.isra.38+0x5a6/0x760 [osd_ldiskfs]
Jul 15 05:46:28 nbp11-srv3 kernel: osd_fallocate+0xfd/0x370 [osd_ldiskfs]
Jul 15 05:46:28 nbp11-srv3 kernel: ofd_object_fallocate+0x5dd/0xa30 [ofd]
Jul 15 05:46:28 nbp11-srv3 kernel: ofd_fallocate_hdl+0x467/0x730 [ofd]
Jul 15 05:46:28 nbp11-srv3 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc]
Jul 15 05:46:28 nbp11-srv3 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
Jul 15 05:46:28 nbp11-srv3 kernel: ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
Jul 15 05:46:28 nbp11-srv3 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
Jul 15 05:46:28 nbp11-srv3 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
Jul 15 05:46:28 nbp11-srv3 kernel: kthread+0x10a/0x120
Jul 15 05:46:28 nbp11-srv3 kernel: ? set_kthread_struct+0x50/0x50
Jul 15 05:46:28 nbp11-srv3 kernel: ret_from_fork+0x1f/0x40
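
For context, the thread above is not waiting on a Lustre lock: it is mid-fallocate, restarting its jbd2 handle. osd_fallocate_preallocate() allocates the requested range in chunks and restarts the journal handle when it runs out of credits; the restart path (start_this_handle -> add_transaction_credits -> wait_transaction_locked) sleeps until the running transaction can grant credits, and stalls indefinitely if that transaction's commit is itself blocked. A minimal sketch of the pattern, assuming a 4.18-era jbd2 API; allocate_one_chunk() and CREDITS_PER_CHUNK are hypothetical stand-ins, not the actual osd code:

/*
 * Minimal sketch of the chunked-preallocation/restart loop implicated
 * in the trace above (hypothetical names, 4.18-era jbd2 fields).
 */
static int prealloc_sketch(handle_t *handle, struct inode *inode,
			   long chunks_left)
{
	int rc = 0;

	while (chunks_left > 0 && rc == 0) {
		if (handle->h_buffer_credits < CREDITS_PER_CHUNK)
			/* stops this handle and starts a new one; may sleep
			 * in wait_transaction_locked(), as seen above */
			rc = jbd2_journal_restart(handle, CREDITS_PER_CHUNK);
		if (rc == 0) {
			rc = allocate_one_chunk(inode);	/* hypothetical */
			chunks_left--;
		}
	}
	return rc;
}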


 Comments   
Comment by Alex Zhuravlev [ 17/Jul/23 ]

looks like a duplicate of LU-15800

Comment by Peter Jones [ 17/Jul/23 ]

Thanks Alex! 

Xing

Can you please port the fix from LU-15800 to b2_15?

Thanks

Peter

Comment by Xing Huang [ 18/Jul/23 ]

Yes, Peter, I'll do that today.

Comment by Mahmoud Hanafi [ 03/Aug/23 ]

We applied LU-15800 but we saw a filesystem hang. Longer console logs are attached to the case: nbp15.hang

Aug  2 23:09:33 nbp15-srv1 kernel: INFO: task ll_ost02_011:14622 blocked for more than 120 seconds.
Aug  2 23:09:33 nbp15-srv1 kernel:      Tainted: G           OE    --------- -  - 4.18.0-477.10.1.el8_lustre.x86_64 #1
Aug  2 23:09:33 nbp15-srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  2 23:09:33 nbp15-srv1 kernel: task:ll_ost02_011    state:D stack:    0 pid:14622 ppid:     2 flags:0x80004080
Aug  2 23:09:33 nbp15-srv1 kernel: Call Trace:
Aug  2 23:09:33 nbp15-srv1 kernel: __schedule+0x2d1/0x870
Aug  2 23:09:33 nbp15-srv1 kernel: schedule+0x55/0xf0
Aug  2 23:09:33 nbp15-srv1 kernel: wait_transaction_locked+0x89/0xd0 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? finish_wait+0x80/0x80
Aug  2 23:09:33 nbp15-srv1 kernel: add_transaction_credits+0xd4/0x290 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? osd_declare_qid+0x398/0x4c0 [osd_ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: start_this_handle+0x10a/0x520 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? jbd2__journal_start+0x8f/0x1f0 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? kmem_cache_alloc+0x13f/0x280
Aug  2 23:09:33 nbp15-srv1 kernel: jbd2__journal_start+0xee/0x1f0 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? osd_trans_start+0x13b/0x500 [osd_ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: osd_trans_start+0x13b/0x500 [osd_ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: ofd_attr_set+0x546/0x1090 [ofd]
Aug  2 23:09:33 nbp15-srv1 kernel: ofd_setattr_hdl+0x458/0x8e0 [ofd]
Aug  2 23:09:33 nbp15-srv1 kernel: tgt_request_handle+0xccd/0x1b10 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: kthread+0x134/0x150
Aug  2 23:09:33 nbp15-srv1 kernel: ? set_kthread_struct+0x50/0x50
Aug  2 23:09:33 nbp15-srv1 kernel: ret_from_fork+0x1f/0x40
Aug  2 23:09:33 nbp15-srv1 kernel: INFO: task ll_ost03_006:14631 blocked for more than 120 seconds.
Aug  2 23:09:33 nbp15-srv1 kernel:      Tainted: G           OE    --------- -  - 4.18.0-477.10.1.el8_lustre.x86_64 #1
Aug  2 23:09:33 nbp15-srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  2 23:09:33 nbp15-srv1 kernel: task:ll_ost03_006    state:D stack:    0 pid:14631 ppid:     2 flags:0x80004080
Aug  2 23:09:33 nbp15-srv1 kernel: Call Trace:
Aug  2 23:09:33 nbp15-srv1 kernel: __schedule+0x2d1/0x870
Aug  2 23:09:33 nbp15-srv1 kernel: schedule+0x55/0xf0
Aug  2 23:09:33 nbp15-srv1 kernel: wait_transaction_locked+0x89/0xd0 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? finish_wait+0x80/0x80
Aug  2 23:09:33 nbp15-srv1 kernel: add_transaction_credits+0xd4/0x290 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? ldiskfs_do_update_inode+0x604/0x800 [ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: start_this_handle+0x10a/0x520 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: ? osd_fallocate_preallocate.isra.37+0x275/0x760 [osd_ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: ? ldiskfs_mark_iloc_dirty+0x32/0x90 [ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: jbd2__journal_restart+0xb4/0x160 [jbd2]
Aug  2 23:09:33 nbp15-srv1 kernel: osd_fallocate_preallocate.isra.37+0x5a6/0x760 [osd_ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: osd_fallocate+0xfd/0x370 [osd_ldiskfs]
Aug  2 23:09:33 nbp15-srv1 kernel: ofd_object_fallocate+0x5dc/0xa30 [ofd]
Aug  2 23:09:33 nbp15-srv1 kernel: ofd_fallocate_hdl+0x467/0x730 [ofd]
Aug  2 23:09:33 nbp15-srv1 kernel: tgt_request_handle+0xccd/0x1b10 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
Aug  2 23:09:33 nbp15-srv1 kernel: kthread+0x134/0x150
Aug  2 23:09:33 nbp15-srv1 kernel: ? set_kthread_struct+0x50/0x50
Aug  2 23:09:33 nbp15-srv1 kernel: ret_from_fork+0x1f/0x40 
Comment by Mahmoud Hanafi [ 04/Aug/23 ]

This looks more like LU-16691

 

Comment by Mahmoud Hanafi [ 04/Aug/23 ]

Filesystem hung again. Attaching stack output.

stack.out

 

 

Comment by Alex Zhuravlev [ 04/Aug/23 ]

Filesystem hung again. Attaching stack output.

thanks, need some time to analyze the traces.

Comment by Alex Zhuravlev [ 07/Aug/23 ]

We applied LU-15800 but we saw a filesystem hang. Longer console logs are attached to the case: nbp15.hang

I noticed OOM:

Aug  2 20:55:33 nbp15-srv1 kernel: mdt_rdpg02_001: page allocation failure: order:0, mode:0x40(__GFP_IO), nodemask=(null),cpuset=/,mems_allowed=0
Comment by Alex Zhuravlev [ 07/Aug/23 ]

can you please show output for lctl get_param osd*.OST.brw_stats from that OST?

Comment by Alex Zhuravlev [ 07/Aug/23 ]

could you please clarify what exact version you're running? The ticket says 2.15.2, but you also said LU-15800 was added? Anything else? I need to know the exact state of the source.

Comment by Peter Jones [ 07/Aug/23 ]

Alex

I would expect that the answer will be the b2_15 branch from https://github.com/champios/lustre-nas

Peter

Comment by Alex Zhuravlev [ 07/Aug/23 ]

thanks Peter

Comment by Alex Zhuravlev [ 07/Aug/23 ]

b2_15-nas is the only "15" one, right?

Comment by Mahmoud Hanafi [ 07/Aug/23 ]

Initially we were running WC (Whamcloud) 2.15.2. After we got the patch we were running 2.15.3 + LU-15800 (2.15.3-2nas)

(https://github.com/champios/lustre-nas)

brw_stats attached.

brw_stats

Comment by Alex Zhuravlev [ 07/Aug/23 ]

looking at the code in that tree I noticed a couple of things:

  • LU-15117 is missing (not sure it's really related, but I saw a few RPC timeouts and those could cause similar symptoms)
  • LU-15894 is not reverted (and I see one trace stuck at the range lock)
Comment by Alex Zhuravlev [ 07/Aug/23 ]

brw_stats attached

unfortunately, LU-15564 has not landed in that tree yet; it could help us understand whether this is block-allocation related or not.

Comment by Mahmoud Hanafi [ 08/Aug/23 ]

Is there some additional debugging we can do to help with this issue?

Comment by Alex Zhuravlev [ 09/Aug/23 ]

Is there some additional debugging we can do to help with this issue?

would it be possible to apply LU-15564 so we can track allocation time?

Comment by Mahmoud Hanafi [ 09/Aug/23 ]

We are going to do a build with LU-15564. Should we also pick up LU-15117 and revert LU-15894?

We would need a backport for LU-15117.

Comment by Mahmoud Hanafi [ 16/Aug/23 ]

Still waiting for an answer to the question above.

Comment by Colin Faber [ 17/Aug/23 ]

Hi mhanafi,
Sorry for the delay. You can try LU-15117 (there is already a port for b2_15). LU-15894 may also help, but no port is available for it. The patch bzzz is asking you to try will provide additional statistics to help us better understand the problem, so I would suggest starting there: provide the requested statistics and let us better understand the problem before attempting other patches.

Comment by Mahmoud Hanafi [ 28/Aug/23 ]

We were able to get some of our servers patched with LU-15564 and got brw_stats after a hang, see below. By the way, setting debug=+trace reduces the chance of hitting this bug.

brw_stats.save.1693236421

stack1.out

 

Comment by Mahmoud Hanafi [ 03/Sep/23 ]

Can we get an update, please? We have filesystems hanging multiple times a day.

Comment by Alex Zhuravlev [ 03/Sep/23 ]

Can we get an update, please? We have filesystems hanging multiple times a day.

mhanafi, sorry for the delay; I'm trying to reconstruct the problem using the brw_stats you provided.

Comment by Alex Zhuravlev [ 03/Sep/23 ]

mhanafi, could you please apply the patch just added? The patch reverts range locking.

Comment by Mahmoud Hanafi [ 03/Sep/23 ]

Thanks, we'll get the patch applied this week and let you know the results.

Comment by Alex Zhuravlev [ 04/Sep/23 ]

mhanafi I'm very very sorry, but ... could you please instead apply the patch I'm attaching? I've gone through a few code paths and now think there is another problem with fallocate; we'd be better off reusing the range locking you still have in your tree to fix the problem.
The patch is fallocate-range-locking.patch
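
For readers of the ticket: the fix that eventually landed (the "LU-16966 osd: take trunc_lock for fallocate" patches below) serializes fallocate against concurrent truncate/setattr on the same object by holding a range ("trunc") lock across the chunked preallocation. A rough sketch of the idea only; the tree field and do_preallocate_in_chunks() below are illustrative stand-ins, not the exact osd API:

/*
 * Illustrative sketch: hold a range lock over the fallocated extent so
 * truncate/punch on the same object cannot interleave with the chunked,
 * journal-restarting preallocation. oo_ext_tree and
 * do_preallocate_in_chunks() are hypothetical names.
 */
static int osd_fallocate_sketch(struct osd_object *obj, __u64 start, __u64 end)
{
	struct range_lock lock;
	int rc;

	range_lock_init(&lock, start, end);
	rc = range_lock(&obj->oo_ext_tree, &lock);	/* may block on conflicts */
	if (rc < 0)
		return rc;

	rc = do_preallocate_in_chunks(obj, start, end);	/* restarts the handle */

	range_unlock(&obj->oo_ext_tree, &lock);
	return rc;
}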

Comment by Gerrit Updater [ 04/Sep/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52264
Subject: LU-16966 osd: take trunc_lock for fallocate
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 48a6d52640fe716760d1a369f4bc53ebdba25e6d

Comment by Mahmoud Hanafi [ 25/Sep/23 ]

We applied the patch provided and have not seen the issue since.

Comment by Gerrit Updater [ 16/Oct/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52264/
Subject: LU-16966 osd: take trunc_lock for fallocate
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 51529fb57f85210e292a15c882cf25a4689ea77d

Comment by Peter Jones [ 16/Oct/23 ]

Landed for 2.16

Comment by Alex Zhuravlev [ 16/Oct/23 ]

Does this issue also affect b_es6_0 and b2_15?

yes, b_es6_0 needs that for sure, will check b2_15

Comment by Gerrit Updater [ 16/Oct/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52710
Subject: LU-16966 osd: take trunc_lock for fallocate
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: d6e549e9ea2eb6e7b141203dad0130cc8da5f1db

Comment by Gerrit Updater [ 13/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52710/
Subject: LU-16966 osd: take trunc_lock for fallocate
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 9c97d1969e2298fdfe5daa616e36cbe17a9b3d5e

Comment by Peter Jones [ 17/Nov/23 ]

Will be included in 2.15.4
