Lustre / LU-15695

dt_ladvise(): ASSERTION( dt->do_body_ops->dbo_ladvise )

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Affects Version/s: Lustre 2.12.8
    • Environment: CentOS 7.9
    • Severity: 3

    Description

      We saw this OSS LBUG yesterday following the use of lfs ladvise. Both clients and servers are based on 2.12.8 (we have a few patches on top).

      The command that triggered this from a client as root was:

      touch testladvise
      lfs ladvise -a dontneed -s 0 -e 1048576000 testladvise
      

      Result on the OSS:

      Mar 24 18:53:55 oak-io7-s1 kernel: LustreError: 3782:0:(dt_object.h:2507:dt_ladvise()) ASSERTION( dt->do_body_ops->dbo_ladvise ) failed: 
      Mar 24 18:53:55 oak-io7-s1 kernel: LustreError: 3782:0:(dt_object.h:2507:dt_ladvise()) LBUG
      Mar 24 18:53:55 oak-io7-s1 kernel: Pid: 3782, comm: ll_ost_io00_111 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
      Mar 24 18:53:55 oak-io7-s1 kernel: Call Trace:
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc085c7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc085c87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc15e12ce>] ofd_ladvise_hdl+0xdee/0xf10 [ofd]
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc1235f1a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc11dabfb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffc11de564>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffac0c5c21>] kthread+0xd1/0xe0
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffac794ddd>] ret_from_fork_nospec_begin+0x7/0x21
      Mar 24 18:53:55 oak-io7-s1 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff 

      However, I can't seem to reproduce it on a test system. It is not time-sensitive for us; I just wanted to open a ticket because I didn't find this LBUG in Jira.

          Activity

            adilger Andreas Dilger added a comment:

            The LU-16057 patch was included in 2.15.53.
            gerrit Gerrit Updater added a comment (edited):

            "Andriy Skulysh <andriy.skulysh@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53161
            Subject: LU-15695 osd: dt_ladvise(): ASSERTION( dt->do_body_ops->dbo_ladvise )
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e2c5a0faaa6e2405e27575935fc8a96b8510bf8c

            dongyang Dongyang Li added a comment:

            This should be fixed with https://review.whamcloud.com/c/fs/lustre-release/+/48080

            LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC

            adilger Andreas Dilger added a comment:

            Zam, I agree. The ladvise request could be dropped in this case rather than taking the system down.

            zam Alexander Zarochentsev added a comment:

            Wouldn't it be possible to ignore ladvise requests if the object is not fully initialised? It is an optional hint, not a real read request as in LU-15139.

            sthiell Stephane Thiell added a comment:

            Interesting, OK, I'm now watching LU-15139.

            And very cool, I didn't know about the clone repo on GitHub; this is going to be useful, thanks!

            adilger Andreas Dilger added a comment:

            Looking at the code in 2.12.8 (and master), I see that ".dbo_ladvise = osd_ladvise" is set in the object methods, so this assertion is very strange. I was likewise unable to reproduce this issue on my test system, even with a directory or symlink, or with a PFL file that had an uninstantiated layout in the "advice" range.

            There is LU-15139, which is a similar LASSERT() for missing object operations that is unexplained so far.

            PS: you should fork your Lustre GitHub repo from lustre/lustre-release, which is the clone that I maintain from the whamcloud.com repo.

            People

              Assignee: Peter Jones (pjones)
              Reporter: Stephane Thiell (sthiell)
              Votes: 0
              Watchers: 9
