
mdt_recovery.c:611:mdt_steal_ack_locks()) Resent req xid XXX has mismatched opc: new 101 old 0

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.3.0
    • LLNL Hyperion, CHAOS 5 servers/clients, Lustre 2.2.92
    • 3
    • 6355

    Description

      Running SWL tests with a mix of workloads (IOR, mdtest, simul, mir, fdtree).
      The following sequence appears repeatedly. A Lustre dump was sent to the FTP site; file name: lu-1442.dump.gz

      Aug 7 09:51:29 ehyperion-rst6 kernel: LustreError: 29701:0:(mdt_recovery.c:611:mdt_steal_ack_locks()) Resent req xid 1409327760945603 has mismatched opc: new 101 old 0
      Aug 7 09:51:29 ehyperion-rst6 kernel: LustreError: 29701:0:(mdt_recovery.c:611:mdt_steal_ack_locks()) Skipped 5 previous similar messages
      Aug 7 09:51:29 ehyperion-rst6 kernel: Lustre: 29701:0:(mdt_recovery.c:622:mdt_steal_ack_locks()) Stealing 1 locks from rs ffff8802d1a96000 x1409327760945603.t537972417766 o0 NID 192.168.117.9@o2ib1
      Aug 7 09:51:29 ehyperion-rst6 kernel: Lustre: 29701:0:(mdt_recovery.c:622:mdt_steal_ack_locks()) Skipped 5 previous similar messages
      Aug 7 09:51:29 ehyperion-rst6 kernel: Lustre: 4710:0:(service.c:2095:ptlrpc_handle_rs()) All locks stolen from rs ffff8802d1a96000 x1409327760945603.t537972417766 o0 NID 192.168.117.9@o2ib1
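When triaging logs like the ones above, it can help to count how often the mismatch is reported per xid, including the rate-limited "Skipped N previous similar messages" lines. The following is a minimal sketch, assuming the console line format quoted in this ticket; the function name `summarize` and the regular expressions are illustrative and not part of Lustre.

```python
import re
from collections import Counter

# Patterns derived only from the console lines quoted in this ticket (assumption:
# other Lustre versions may format these messages differently).
MISMATCH_RE = re.compile(
    r"mdt_steal_ack_locks\(\)\) Resent req xid (?P<xid>\d+) "
    r"has mismatched opc: new (?P<new>\d+) old (?P<old>\d+)"
)
# Only count "Skipped" lines emitted from the same source line (611) as the
# mismatch message, not the ones attached to the "Stealing" message (622).
SKIPPED_RE = re.compile(
    r"mdt_recovery\.c:611:mdt_steal_ack_locks\(\)\) "
    r"Skipped (?P<n>\d+) previous similar messages"
)


def summarize(lines):
    """Return (mismatches, skipped).

    mismatches: Counter keyed by (xid, new_opc, old_opc) for explicit mismatch lines.
    skipped: total number of rate-limited repeats reported for the mismatch message.
    """
    mismatches = Counter()
    skipped = 0
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            mismatches[(int(m["xid"]), int(m["new"]), int(m["old"]))] += 1
            continue
        s = SKIPPED_RE.search(line)
        if s:
            skipped += int(s["n"])
    return mismatches, skipped


SAMPLE = [
    "Aug 7 09:51:29 ehyperion-rst6 kernel: LustreError: 29701:0:(mdt_recovery.c:611:mdt_steal_ack_locks()) Resent req xid 1409327760945603 has mismatched opc: new 101 old 0",
    "Aug 7 09:51:29 ehyperion-rst6 kernel: LustreError: 29701:0:(mdt_recovery.c:611:mdt_steal_ack_locks()) Skipped 5 previous similar messages",
    "Aug 7 09:51:29 ehyperion-rst6 kernel: Lustre: 29701:0:(mdt_recovery.c:622:mdt_steal_ack_locks()) Skipped 5 previous similar messages",
]

if __name__ == "__main__":
    mismatches, skipped = summarize(SAMPLE)
    for (xid, new_opc, old_opc), count in mismatches.items():
        print(f"xid={xid} new={new_opc} old={old_opc} seen={count}")
    print(f"rate-limited repeats: {skipped}")
```

Feeding the function the lines of a saved console log (for example `summarize(open(path))`, with `path` chosen by the reader) gives a rough per-xid picture of how frequently resends hit this code path.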

          Activity

            [LU-1717] mdt_recovery.c:611:mdt_steal_ack_locks()) Resent req xid XXX has mismatched opc: new 101 old 0
            bobijam Zhenyu Xu made changes -
            Link New: This issue is related to DDN-47 [ DDN-47 ]
            jlevi Jodi Levi (Inactive) made changes -
            Fix Version/s New: Lustre 2.4.0 [ 10154 ]
            morrone Christopher Morrone (Inactive) made changes -
            Link New: This issue is related to LU-2187 [ LU-2187 ]
            ian Ian Colle (Inactive) made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            ian Ian Colle (Inactive) added a comment -

            Patch landed

            liwei Li Wei (Inactive) added a comment -

            http://review.whamcloud.com/4271

            This does what Oleg suggested.

            green Oleg Drokin added a comment -

            I don't really know how you are losing the messages.

            The resent xid could only occur if a reply was not seen by a client and it decided to resend the message (there probably should be a client-side message about that too).
            The specific message you see could only happen when that lost reply happened to be one for a so-called "difficult" reply, where a lock is being returned to the client.
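The sequence described in this comment (a reply carrying a lock is lost, the client resends the same xid, and the server takes the locks back from the old reply state) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Lustre implementation: `ReplyState`, `ToyServer`, and the `record_opc` flag are hypothetical names, and treating "old 0" as "the opcode was never recorded on the reply state" is an assumption made for illustration, suggested by the `o0` field in the log lines.

```python
from dataclasses import dataclass, field


@dataclass
class ReplyState:
    """Toy stand-in for a server-side reply that holds locks until the client acks it."""
    xid: int
    opc: int  # assumption: 0 models an opcode that was never recorded
    locks: list = field(default_factory=list)


class ToyServer:
    def __init__(self):
        self.pending = {}  # xid -> ReplyState still waiting for the client's ack
        self.log = []

    def send_difficult_reply(self, xid, opc, locks, record_opc=True):
        # A "difficult" reply in this model is one that returns locks to the client.
        self.pending[xid] = ReplyState(xid, opc if record_opc else 0, list(locks))

    def handle_resend(self, xid, opc):
        """The client never saw the reply and resent the same xid: take over its locks."""
        rs = self.pending.pop(xid, None)
        if rs is None:
            return []
        if rs.opc != opc:
            # Harmless in this model: the locks are handed over either way.
            self.log.append(
                f"Resent req xid {xid} has mismatched opc: new {opc} old {rs.opc}"
            )
        stolen, rs.locks = rs.locks, []
        self.log.append(f"Stealing {len(stolen)} locks from rs x{xid} o{rs.opc}")
        return stolen


if __name__ == "__main__":
    server = ToyServer()
    server.send_difficult_reply(1409327760945603, 101, ["lock-A"], record_opc=False)
    print(server.handle_resend(1409327760945603, 101))
    print("\n".join(server.log))
```

In this model the warning appears only because the stored opcode is 0; the lock handover itself does not depend on the comparison, which is consistent with the message being described in this ticket as harmless.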

            morrone Christopher Morrone (Inactive) added a comment -

            Oleg, if your assumption about lost replies is correct, then I think we have a bigger problem here. We do not have lnet routers on Sequoia so we should have a reliable communication fabric.

            How are we losing messages so often??

            green Oleg Drokin added a comment -

            You would see this particular message when a reply from the server to the client was lost and the client did a resend.
            The message is harmless (and wrong, and will be fixed).

            morrone Christopher Morrone (Inactive) made changes -
            Link New: This issue is duplicated by ORI-669 [ ORI-669 ]

            People

              liwei Li Wei (Inactive)
              cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 6
