[LU-15695] dt_ladvise(): ASSERTION( dt->do_body_ops->dbo_ladvise ) Created: 25/Mar/22  Updated: 19/Dec/23  Resolved: 19/Dec/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.8
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Stephane Thiell Assignee: Peter Jones
Resolution: Duplicate Votes: 0
Labels: None
Environment:

CentOS 7.9


Issue Links:
Cloners
Related
is related to LU-15139 sanity test_160h: dt_record_write() A... Resolved
is related to LU-16057 OBD_MD_FLGROUP not set for ladvise rpc Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

We saw this OSS LBUG yesterday following the use of lfs ladvise. Both clients and servers are based on 2.12.8 (with a few patches on top).

The command that triggered this from a client as root was:

touch testladvise
lfs ladvise -a dontneed -s 0 -e 1048576000 testladvise

Result on the OSS:

Mar 24 18:53:55 oak-io7-s1 kernel: LustreError: 3782:0:(dt_object.h:2507:dt_ladvise()) ASSERTION( dt->do_body_ops->dbo_ladvise ) failed: 
Mar 24 18:53:55 oak-io7-s1 kernel: LustreError: 3782:0:(dt_object.h:2507:dt_ladvise()) LBUG
Mar 24 18:53:55 oak-io7-s1 kernel: Pid: 3782, comm: ll_ost_io00_111 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
Mar 24 18:53:55 oak-io7-s1 kernel: Call Trace:
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc085c7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc085c87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc15e12ce>] ofd_ladvise_hdl+0xdee/0xf10 [ofd]
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc1235f1a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc11dabfb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc11de564>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffac0c5c21>] kthread+0xd1/0xe0
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffac794ddd>] ret_from_fork_nospec_begin+0x7/0x21
Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff 

However, I can't seem to reproduce it on a test system. This is not time-sensitive for us; I just wanted to open a ticket because I couldn't find this LBUG in Jira.



Comments
Comment by Andreas Dilger [ 25/Mar/22 ]

Looking at the code in 2.12.8 (and master) I see that ".dbo_ladvise = osd_ladvise" is set in the object methods, so this assertion is very strange. I was likewise unable to reproduce this issue on my test system, even with a directory or symlink, or with a PFL file that had an uninstantiated layout in the "advice" range.

There is LU-15139, which is a similar LASSERT() for missing object operations that is unexplained so far.

PS: you should fork your lustre github repo from lustre/lustre-release, which is the clone that I maintain from the whamcloud.com repo.

Comment by Stephane Thiell [ 25/Mar/22 ]

Interesting, ok I'm now watching LU-15139.

And very cool, I didn't know about the clone repo on Github, this is going to be useful, thanks!

 

Comment by Alexander Zarochentsev [ 21/Mar/23 ]

Wouldn't it be possible to ignore ladvise requests if the object is not fully initialised? It is an optional hint, not a real read request as in LU-15139.

Comment by Andreas Dilger [ 23/Mar/23 ]

Zam, I agree. The ladvise request could be dropped in this case, rather than taking the system down.
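
A minimal sketch of the idea discussed above: check for the method and drop the hint rather than asserting. The structures are simplified, hypothetical stand-ins for the ones named in the assertion, and dt_ladvise_checked(), handle_ladvise_hint() and demo_ladvise() are illustrative names only. This is not the change made by any patch referenced in this ticket, and returning success for a dropped hint is an assumed policy.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for the Lustre structures
 * named in the assertion. The real definitions live in dt_object.h and
 * carry many more fields and arguments.
 */
struct dt_object;

struct dt_body_operations {
	int (*dbo_ladvise)(struct dt_object *dt, unsigned long long start,
			   unsigned long long end, int advice);
};

struct dt_object {
	const struct dt_body_operations *do_body_ops;
};

/* Example method for a fully initialised object: accepts every hint. */
int demo_ladvise(struct dt_object *dt, unsigned long long start,
		 unsigned long long end, int advice)
{
	(void)dt;
	(void)start;
	(void)end;
	(void)advice;
	return 0;
}

/*
 * Checked variant of the call: report -EOPNOTSUPP when the method (or the
 * whole operations table) is missing, instead of asserting.
 */
int dt_ladvise_checked(struct dt_object *dt, unsigned long long start,
		       unsigned long long end, int advice)
{
	if (dt == NULL || dt->do_body_ops == NULL ||
	    dt->do_body_ops->dbo_ladvise == NULL)
		return -EOPNOTSUPP;

	return dt->do_body_ops->dbo_ladvise(dt, start, end, advice);
}

/*
 * Assumed handler policy: ladvise is only a hint, so a missing method
 * means the hint is dropped and the request still succeeds. Any other
 * error is passed through unchanged.
 */
int handle_ladvise_hint(struct dt_object *dt, unsigned long long start,
			unsigned long long end, int advice)
{
	int rc = dt_ladvise_checked(dt, start, end, advice);

	return rc == -EOPNOTSUPP ? 0 : rc;
}
```

With this shape, an object whose do_body_ops is NULL or lacks dbo_ladvise makes handle_ladvise_hint() return 0 without calling anything, while a fully initialised object still reaches its method.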

Comment by Dongyang Li [ 16/Nov/23 ]

This should be fixed with https://review.whamcloud.com/c/fs/lustre-release/+/48080

LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC

Comment by Gerrit Updater [ 17/Nov/23 ]

"Andriy Skulysh <andriy.skulysh@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53161
Subject: LU-15695 osd: dt_ladvise(): ASSERTION( dt->do_body_ops->dbo_ladvise)
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e2c5a0faaa6e2405e27575935fc8a96b8510bf8c

Comment by Andreas Dilger [ 19/Dec/23 ]

The LU-16057 patch was included in 2.15.53.
