Lustre LU-15644: failed llog cancel should not generate an error

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Severity: 3

    Description

      When llog cancel tries to cancel a record that does not exist (either because the record was already cancelled or because the log has been removed), it generates a large number of console messages and, potentially, errors on the other servers:

      lfs02-n05:
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 llog-records: rc = -116
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:790:llog_cat_cancel_records()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 of 1 llog-records: rc = -116
      
      lfs02-n06:
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 llog-records: rc = -116
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:790:llog_cat_cancel_records()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 of 1 llog-records: rc = -116
      
      lfs02-n07:
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 llog-records: rc = -116
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:790:llog_cat_cancel_records()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 of 1 llog-records: rc = -116
      [repeats for all MDS servers]
      

      The -116 (-ESTALE) error occurs because the OUT recovery llog on the MDT was deleted, but its FID->inode mapping is still present in the OI file: the lookup finds the inode, but that inode has i_nlink=0 on disk.

      Regardless of the cause, a failure to cancel an llog record that doesn't exist (e.g. -ENOENT or -ESTALE) should not be treated as an error that is retried. In this case the local record should be cancelled and not retried.

          Activity

            Gerrit Updater added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55335/
            Subject: LU-15644 llog: don't report warning in no error case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 383fedf853e46edec18b7d2bb3699fb0b0a37438
            Peter Jones added a comment:

            Another patch to track

            Gerrit Updater added a comment:
            "Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55335
            Subject: LU-15644 llog: don't report warning in no error case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f5a2530f3d9558965e3e88ce500c68d882c86436
            Peter Jones added a comment:

            Merged for 2.16

            Gerrit Updater added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55151/
            Subject: LU-15644 llog: don't replace llog error with -ENOTDIR
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bd9839f7dbdf59751e7cdc234602eb338c518104

            Gerrit Updater added a comment:
            "Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55151
            Subject: LU-15644 llog: don't replace llog error with -ENOTDIR
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 936ab1a4f5fbd16afe7ed202e25c3e8cc8c620d1

            People

              Assignee: Mikhail Pershin
              Reporter: Andreas Dilger
              Votes: 0
              Watchers: 5
