failed llog cancel should not generate an error

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.14.0
    • None
    • 3

    Description

      If llog cancel tries to cancel a record that does not exist (either because the record is already cancelled or the log has been removed), it generates a lot of console messages and (potentially) errors on the other servers:

      lfs02-n05:
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 llog-records: rc = -116
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:790:llog_cat_cancel_records()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 of 1 llog-records: rc = -116
      
      lfs02-n06:
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 llog-records: rc = -116
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:790:llog_cat_cancel_records()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 of 1 llog-records: rc = -116
      
      lfs02-n07:
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 llog-records: rc = -116
      Mar 12 14:06:15 lfs02-n30 kernel: LustreError: 28071:0:(llog_cat.c:790:llog_cat_cancel_records()) lfs02-MDT0004-osp-MDT001d: fail to cancel 1 of 1 llog-records: rc = -116
      [repeats for all MDS servers]
      

      The -116 = -ESTALE error occurs because the OUT recovery llog on the MDT was deleted but its FID->inode record is still in the OI file, so the lookup finds the inode even though the inode has i_nlink=0 on disk.
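      For reading the "rc = ..." values in these messages, a small decoder like the one below may help. It is only a sketch: `rc_name` is a hypothetical helper, not a Lustre function, and it covers just the codes mentioned in this ticket. On Linux builds that use the generic errno table, ESTALE is 116, which matches the rc = -116 in the logs above.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of Lustre): map a negative return code
 * from a console message to its symbolic errno name. Only the codes
 * discussed in this ticket are covered.
 */
static const char *rc_name(int rc)
{
	switch (rc) {
	case 0:
		return "OK";
	case -ENOENT:
		return "ENOENT";	/* record or llog file not found */
	case -EIO:
		return "EIO";		/* I/O error reading the llog */
	case -ESTALE:
		return "ESTALE";	/* stale handle, e.g. an inode with i_nlink=0 */
	default:
		return "other";
	}
}
```

      Calling `rc_name(-ESTALE)` returns "ESTALE"; anything outside the listed codes falls through to "other".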

      Regardless of that, failure to cancel an llog record that doesn't exist (e.g. -ENOENT or -ESTALE) should not be a cause for an error that is retried. The local record should be cancelled in this case and not retried.
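      The requested behaviour can be sketched roughly as follows. This is an illustration of the policy only, not the code from the merged patches: `enum cancel_action`, `cancel_rc_means_gone` and `classify_cancel_rc` are hypothetical names, and the assumption is that the caller gets back a single negative errno from the remote cancel attempt.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result of one remote cancel attempt; not a Lustre type. */
enum cancel_action {
	CANCEL_DONE,		/* remote cancel succeeded, drop the local record */
	CANCEL_DROP_LOCAL,	/* remote record already gone, drop local record, no retry */
	CANCEL_RETRY,		/* genuine failure, report it and retry later */
};

/* True when rc says the record (or the whole llog) no longer exists. */
static bool cancel_rc_means_gone(int rc)
{
	return rc == -ENOENT || rc == -ESTALE;
}

/* Decide what to do with the local record after a cancel attempt. */
static enum cancel_action classify_cancel_rc(int rc)
{
	if (rc == 0)
		return CANCEL_DONE;
	if (cancel_rc_means_gone(rc))
		return CANCEL_DROP_LOCAL;	/* not an error, nothing to log at error level */
	return CANCEL_RETRY;
}
```

      With this split, rc = -116 from the logs above would map to CANCEL_DROP_LOCAL, so the local record is cancelled and no error is printed or retried. Other codes such as -EIO still go down the error path.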

          Activity

            [LU-15644] failed llog cancel should not generate an error

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55335/
            Subject: LU-15644 llog: don't report warning in no error case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 383fedf853e46edec18b7d2bb3699fb0b0a37438

            pjones Peter Jones added a comment -

            Another patch to track


            gerrit Gerrit Updater added a comment -

            "Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55335
            Subject: LU-15644 llog: don't report warning in no error case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f5a2530f3d9558965e3e88ce500c68d882c86436

            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55151/
            Subject: LU-15644 llog: don't replace llog error with -ENOTDIR
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bd9839f7dbdf59751e7cdc234602eb338c518104

            gerrit Gerrit Updater added a comment -

            "Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55151
            Subject: LU-15644 llog: don't replace llog error with -ENOTDIR
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 936ab1a4f5fbd16afe7ed202e25c3e8cc8c620d1

            adilger Andreas Dilger added a comment -

            It looks like this same problem was also hit in LU-12985 and LU-13469, with -ENOENT, -EIO, and -ESTALE.

            It would be useful if the error messages also included the FID of the llog file itself, so that the problematic llog file can be tracked more easily.
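            As a rough illustration of the suggestion to include the llog FID, the message could carry the FID in the usual [seq:oid:ver] hex form. Everything below is an assumption made for the sketch: `struct example_fid` and `format_cancel_failure` are hypothetical stand-ins, not Lustre's own FID type or logging macros, and the wording simply extends the existing "fail to cancel" message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a FID: sequence, object id, version. */
struct example_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/*
 * Build a "fail to cancel" message that also names the llog file by FID,
 * so the problematic llog can be located directly from the console log.
 * Returns the snprintf() result.
 */
static int format_cancel_failure(char *buf, size_t len, const char *dev,
				 const struct example_fid *fid,
				 int failed, int total, int rc)
{
	return snprintf(buf, len,
			"%s: fail to cancel %d of %d llog-records in llog "
			"[0x%" PRIx64 ":0x%" PRIx32 ":0x%" PRIx32 "]: rc = %d",
			dev, failed, total, fid->seq, fid->oid, fid->ver, rc);
}
```

            Having the FID in the message means the llog object could be looked up directly instead of being inferred from the device name alone.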

            People

              tappro Mikhail Pershin
              adilger Andreas Dilger