[LU-15695] dt_ladvise(): ASSERTION( dt->do_body_ops->dbo_ladvise ) Created: 25/Mar/22 Updated: 19/Dec/23 Resolved: 19/Dec/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.8 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Stephane Thiell | Assignee: | Peter Jones |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.9 |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
We saw this OSS LBUG yesterday following the use of lfs ladvise. Both clients and servers are 2.12.8-based (we have a few patches on top). The command that triggered this from a client as root was:

touch testladvise
lfs ladvise -a dontneed -s 0 -e 1048576000 testladvise

Result on the OSS:

Mar 24 18:53:55 oak-io7-s1 kernel: LustreError: 3782:0:(dt_object.h:2507:dt_ladvise()) ASSERTION( dt->do_body_ops->dbo_ladvise ) failed:
Mar 24 18:53:55 oak-io7-s1 kernel: LustreError: 3782:0:(dt_object.h:2507:dt_ladvise()) LBUG
Mar 24 18:53:55 oak-io7-s1 kernel: Pid: 3782, comm: ll_ost_io00_111 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
Mar 24 18:53:55 oak-io7-s1 kernel: Call Trace:
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffc085c7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffc085c87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffc15e12ce>] ofd_ladvise_hdl+0xdee/0xf10 [ofd]
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffc1235f1a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffc11dabfb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffc11de564>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffac0c5c21>] kthread+0xd1/0xe0
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffac794ddd>] ret_from_fork_nospec_begin+0x7/0x21
Mar 24 18:53:55 oak-io7-s1 kernel: [<ffffffffffffffff>] 0xffffffffffffffff

However, I can't seem to reproduce this on a test system. It is not time sensitive for us. I just wanted to open a ticket as I didn't find this LBUG in Jira. |
| Comments |
| Comment by Andreas Dilger [ 25/Mar/22 ] |
|
Looking at the code in 2.12.8 (and master) I see that ".dbo_ladvise = osd_ladvise" is set in the object methods, so this assertion is very strange. I was likewise unable to reproduce this issue on my test system, even with a directory or symlink, or with a PFL file that had an uninstantiated layout in the "advice" range. There is

PS: you should fork your Lustre GitHub repo from lustre/lustre-release, which is the clone that I maintain from the whamcloud.com repo. |
| Comment by Stephane Thiell [ 25/Mar/22 ] |
|
Interesting, ok I'm now watching And very cool, I didn't know about the clone repo on Github, this is going to be useful, thanks!
|
| Comment by Alexander Zarochentsev [ 21/Mar/23 ] |
|
Wouldn't it be possible to ignore ladvise requests if the object is not fully initialised? It is an optional hint, not a real read request as in |
| Comment by Andreas Dilger [ 23/Mar/23 ] |
|
Zam, I agree. The ladvise request could be dropped in this case, rather than taking the system down. |
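The idea discussed above (drop the hint instead of asserting) can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the actual Lustre code or the patch that resolved this ticket: the `demo_*` types and functions are hypothetical stand-ins for the real `dt_object` structures, and returning `-EOPNOTSUPP` for a dropped hint is an assumed choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre dt_object types. */
struct demo_object;

struct demo_body_ops {
	/* Optional advisory hook; may be NULL on a not fully initialised object. */
	int (*ladvise)(struct demo_object *obj, long long start,
		       long long end, int advice);
};

struct demo_object {
	const struct demo_body_ops *body_ops;
	int advise_calls;	/* counts hints that reached the backend */
};

/* Example backend hook: only records that it was called. */
int demo_backend_ladvise(struct demo_object *obj, long long start,
			 long long end, int advice)
{
	(void)start;
	(void)end;
	(void)advice;
	obj->advise_calls++;
	return 0;
}

const struct demo_body_ops demo_full_ops = {
	.ladvise = demo_backend_ladvise,
};

/*
 * Forward the hint when the hook exists. When it is missing, drop the
 * hint and return an error code (assumed here: -EOPNOTSUPP) instead of
 * asserting, since advice is optional and must not take the server down.
 */
int demo_ladvise(struct demo_object *obj, long long start,
		 long long end, int advice)
{
	assert(obj != NULL);	/* a NULL object is a caller bug */

	if (obj->body_ops == NULL || obj->body_ops->ladvise == NULL)
		return -EOPNOTSUPP;	/* hint dropped, nothing crashes */

	return obj->body_ops->ladvise(obj, start, end, advice);
}
```

With an object whose `body_ops` points at `demo_full_ops`, `demo_ladvise()` calls the backend and returns 0. With `body_ops` left NULL, it returns `-EOPNOTSUPP` and the backend is never touched. Whether a real server should report an error or silently succeed in that case is a separate design question that this sketch does not settle.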
| Comment by Dongyang Li [ 16/Nov/23 ] |
|
This should be fixed with https://review.whamcloud.com/c/fs/lustre-release/+/48080
|
| Comment by Gerrit Updater [ 17/Nov/23 ] |
|
|
| Comment by Andreas Dilger [ 19/Dec/23 ] |
|
The |