Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7986

HSM: Duplicated cookies in "actions"

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      Some entries can end up being duplicated in "actions". For instance:

      lrh=[type=10680000 len=136 idx=1/14] fid=[0x200000401:0x10:0x0] dfid=[0x200000401:0x10:0x0] compound/cookie=0x5703ea60/0x5703ea6c action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
      lrh=[type=10680000 len=136 idx=1/51633] fid=[0x200000401:0x10:0x0] dfid=[0x200000401:0x10:0x0] compound/cookie=0x5703ea60/0x5703ea6c action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      a
      

      As cdt_llog_lock is taken in mdt_hsm_actions_proc_show(), I think that could be a race between the coordinator/client and the proc_show() function.

      Attachments

        Issue Links

          Activity

            [LU-7986] HSM: Duplicated cookies in "actions"

            Landed to master for 2.9.0

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed to master for 2.9.0

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19447/
            Subject: LU-7986 hsm: update actions llog in place
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 91144acb3dc1120c00797269afa621c94cb64e1e

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19447/ Subject: LU-7986 hsm: update actions llog in place Project: fs/lustre-release Branch: master Current Patch Set: Commit: 91144acb3dc1120c00797269afa621c94cb64e1e

            Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19577
            Subject: LU-7986 hsm: update actions llog in place
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e4d96fb6b56a0d40aa721a67351bea6a62db97b9

            gerrit Gerrit Updater added a comment - Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19577 Subject: LU-7986 hsm: update actions llog in place Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: e4d96fb6b56a0d40aa721a67351bea6a62db97b9

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/19447
            Subject: LU-7986 hsm: update actions llog in place
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 54a07389e965a0cba7d246f8a6179591fb4b8798

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/19447 Subject: LU-7986 hsm: update actions llog in place Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 54a07389e965a0cba7d246f8a6179591fb4b8798

            Thanks.

            fzago Frank Zago (Inactive) added a comment - Thanks.
            jhammond John Hammond added a comment -

            Hi Frank,

            I'll push a patch that should address this. I'll push it after returning from LUG.

            jhammond John Hammond added a comment - Hi Frank, I'll push a patch that should address this. I'll push it after returning from LUG.

            May be that could explain why we're sometimes seeing replay of cookies. From our copytool traces, just after the hsm_recv:

            Apr  5 19:20:23 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0)
            Apr  5 20:34:33 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0)
            Apr  5 20:37:31 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0)
            Apr  5 20:37:31 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0)
            
            fzago Frank Zago (Inactive) added a comment - May be that could explain why we're sometimes seeing replay of cookies. From our copytool traces, just after the hsm_recv: Apr 5 19:20:23 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0) Apr 5 20:34:33 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0) Apr 5 20:37:31 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0) Apr 5 20:37:31 [lustre] new request (cookie=57ac3d15, FID=0x2000d0ebb:0x4cd:0x0, DFID=0x2000d0ebb:0x4cd:0x0, len=72, action=20, extent=-1/0, gid=0, item=0)

            This comes from the way the llog record is updated. We could not update in place so we add a new one and remove the old one. This is done in mdt_agent_record_update_cb(): add a new record and ask the callback caller to invalid the old one.
            So if llog cleanning is not fast enough we can have multiple records for the same request.
            At HSM dev time, I did not find a way to update in place.

            jcl jacques-charles lafoucriere added a comment - This comes from the way the llog record is updated. We could not update in place so we add a new one and remove the old one. This is done in mdt_agent_record_update_cb(): add a new record and ask the callback caller to invalid the old one. So if llog cleanning is not fast enough we can have multiple records for the same request. At HSM dev time, I did not find a way to update in place.

            actions.1459874574 is "actions" before starting the copytool
            actions.1459874610 is "actions" while the copytool is running

            The list of duplicated FIDs can be seen with the following command:

            sed -e "s/.*fid=\[\(.*\)\] dfid=.*/\1/" actions.1459874610 | sort | uniq -c | sort -r | less
            
            fzago Frank Zago (Inactive) added a comment - actions.1459874574 is "actions" before starting the copytool actions.1459874610 is "actions" while the copytool is running The list of duplicated FIDs can be seen with the following command: sed -e "s/.*fid=\[\(.*\)\] dfid=.*/\1/" actions.1459874610 | sort | uniq -c | sort -r | less

            Some FIDs/cookie even appear 3 times:

            lrh=[type=10680000 len=136 idx=1/1415] fid=[0x200000401:0x605:0x0] dfid=[0x200000401:0x605:0x0] compound/cookie=0x5703eaea/0x5703efce action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
            lrh=[type=10680000 len=136 idx=1/48820] fid=[0x200000401:0x605:0x0] dfid=[0x200000401:0x605:0x0] compound/cookie=0x5703eaea/0x5703efce action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
            lrh=[type=10680000 len=136 idx=1/51221] fid=[0x200000401:0x605:0x0] dfid=[0x200000401:0x605:0x0] compound/cookie=0x5703eaea/0x5703efce action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=SUCCEED data=[]
            
            fzago Frank Zago (Inactive) added a comment - Some FIDs/cookie even appear 3 times: lrh=[type=10680000 len=136 idx=1/1415] fid=[0x200000401:0x605:0x0] dfid=[0x200000401:0x605:0x0] compound/cookie=0x5703eaea/0x5703efce action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[] lrh=[type=10680000 len=136 idx=1/48820] fid=[0x200000401:0x605:0x0] dfid=[0x200000401:0x605:0x0] compound/cookie=0x5703eaea/0x5703efce action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[] lrh=[type=10680000 len=136 idx=1/51221] fid=[0x200000401:0x605:0x0] dfid=[0x200000401:0x605:0x0] compound/cookie=0x5703eaea/0x5703efce action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=SUCCEED data=[]

            People

              jhammond John Hammond
              fzago Frank Zago (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: