[LU-12441] Response tracker is not detached on router ping reply

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.2
    • Labels: None
    • Severity: 3

    Description

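      When a router ping reply arrives, the response tracker is not detached from the MD, which leads to false-positive "Response timed out" messages. In the excerpt below, the response tracker is attached to the MD of the ping message and the reply is received, but the MD is never unlinked (thr -1, unlink false), so the response tracker is never detached and later expires.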

      00000400:00000200:26.0:1560633919.659270:0:2482:0:(router.c:1045:lnet_ping_router_locked()) Check: 12345-93@gni4
      00000400:00000200:26.0:1560633919.659290:0:2482:0:(lib-move.c:4844:LNetGet()) LNetGet msg ffff8807832cdc00 -> 12345-93@gni4
      00000400:00000200:26.0:1560633919.659291:0:2482:0:(lib-msg.c:364:lnet_msg_attach_md()) attached md ffff88078a861f68 to msg ffff8807832cdc00
      00000400:00000200:26.0:1560633919.659293:0:2482:0:(lib-move.c:4505:lnet_attach_rsp_tracker()) Add rspt ffff88078b5ef000 to md ffff88078a861f68 dl 1560633969s ne false
      00000400:00000200:6.0:1560633919.659345:0:2479:0:(lib-msg.c:775:lnet_msg_detach_md()) ffff88078a861f68 ref 0 fl 2 thr -1 opt 10 off 0 size 0 len 272 msg ffff8807832cdc00 unlink false
      00000400:00000200:6.0:1560633919.659466:0:2479:0:(lib-move.c:3890:lnet_parse_reply()) 60@gni4: Reply msg ffff88078c803800 from 12345-93@gni4 of length 80/80 into md 0x65931
      00000400:00000200:6.0:1560633919.659467:0:2479:0:(lib-msg.c:364:lnet_msg_attach_md()) attached md ffff88078a861f68 to msg ffff88078c803800
      00000400:00000200:6.0:1560633919.659474:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
      00000400:00000200:6.0:1560633919.659475:0:2479:0:(lib-msg.c:775:lnet_msg_detach_md()) ffff88078a861f68 ref 0 fl 2 thr -1 opt 10 off 0 size 0 len 272 msg ffff88078c803800 unlink false
      00000400:00000200:6.0:1560633919.659507:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
      00000400:00000200:6.0:1560633919.659578:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
      00000400:00000200:6.0:1560633919.659619:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
      00000400:00000100:26.0:1560633977.003290:0:2482:0:(lib-move.c:2781:lnet_finalize_expired_responses()) Response timed out: md = ffff88078a861f68: nid = 93@gni4
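
The sequence in the excerpt can be reproduced with a small toy model. This is only a sketch of the reasoning above, not LNet code: every toy_* name and TOY_THRESH_INF are hypothetical, and the model assumes the tracker is dropped only when the MD is unlinked, as the description states. The deadline (1560633969) and the expiry check time (1560633977) are taken from the log.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an infinite MD threshold ("thr -1" in the log). */
#define TOY_THRESH_INF (-1)

/* Toy response tracker: only the deadline matters here. */
struct toy_rspt {
	long deadline;
};

/* Toy memory descriptor: remaining operations plus an optional tracker. */
struct toy_md {
	int threshold;
	struct toy_rspt *rspt;	/* NULL when no tracker is attached */
};

/*
 * Consume one reply. The tracker is dropped only as a side effect of the
 * MD being unlinked, which never happens with an infinite threshold.
 * Returns true if the MD was unlinked.
 */
static bool toy_md_handle_reply(struct toy_md *md)
{
	bool unlink;

	if (md->threshold != TOY_THRESH_INF && md->threshold > 0)
		md->threshold--;

	unlink = (md->threshold == 0);
	if (unlink)
		md->rspt = NULL;
	return unlink;
}

/* A timeout is reported for an MD that still has a tracker past its deadline. */
static bool toy_response_timed_out(const struct toy_md *md, long now)
{
	return md->rspt != NULL && now >= md->rspt->deadline;
}

/*
 * Replay the excerpt's timeline: tracker attached with deadline 1560633969,
 * reply handled right away, expiry check at 1560633977.
 * Returns true if a timeout would be reported.
 */
static bool toy_replay_excerpt(int threshold)
{
	struct toy_rspt rspt = { .deadline = 1560633969L };
	struct toy_md md = { .threshold = threshold, .rspt = &rspt };

	toy_md_handle_reply(&md);
	return toy_response_timed_out(&md, 1560633977L);
}
```

In this model, toy_replay_excerpt(1) reports no timeout because the MD is unlinked on the reply, while toy_replay_excerpt(TOY_THRESH_INF) reports one even though the reply was received, which matches the false positive in the last log line.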
      

          Activity

            Gerrit Updater (gerrit) added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36634/
            Subject: LU-12441 lnet: Detach rspt when md_threshold is infinite
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: c095fbda55ca632cff2696550f22a13a19ee4514

            Gerrit Updater (gerrit) added a comment -

            Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36634
            Subject: LU-12441 lnet: Detach rspt when md_threshold is infinite
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: bf8bfe8338ac6d3a5715f66f1f845b9618d270dc

            Peter Jones (pjones) added a comment -

            Landed for 2.13

            Gerrit Updater (gerrit) added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35452/
            Subject: LU-12441 lnet: Detach rspt when md_threshold is infinite
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ebbf909a1c2d0f5400da2d98e1bb274a9e82e0a5

            Gerrit Updater (gerrit) added a comment -

            Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/35452
            Subject: LU-12441 lnet: response tracker cleanup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5812fe14104d090b65be0c353e9088b079e0ce42

            Gerrit Updater (gerrit) added a comment - edited

            Edit - Removed reference to abandoned patch

            Chris Horn (hornc) added a comment -

            I think the solution here is to detach the response tracker in lnet_router_checker_event() for the reply case.
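
The idea of detaching the tracker for an MD whose threshold is infinite can be sketched with a toy model. This is not the merged patch and not LNet code: all toy_* names and TOY_THRESH_INF are hypothetical, and the sketch places the detach in a generic reply handler, whereas the real change may live in lnet_router_checker_event() or elsewhere. It only illustrates the condition named in the patch subject ("Detach rspt when md_threshold is infinite").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an infinite MD threshold. */
#define TOY_THRESH_INF (-1)

/* Toy response tracker: only the deadline matters here. */
struct toy_rspt {
	long deadline;
};

/* Toy memory descriptor: remaining operations plus an optional tracker. */
struct toy_md {
	int threshold;
	struct toy_rspt *rspt;	/* NULL when no tracker is attached */
};

/*
 * Consume one reply. The tracker is detached when the MD is unlinked and,
 * in addition, when the threshold is infinite, because such an MD is never
 * unlinked by running out of operations.
 * Returns true if the MD was unlinked.
 */
static bool toy_md_handle_reply_fixed(struct toy_md *md)
{
	bool unlink;

	if (md->threshold != TOY_THRESH_INF && md->threshold > 0)
		md->threshold--;

	unlink = (md->threshold == 0);
	if (unlink || md->threshold == TOY_THRESH_INF)
		md->rspt = NULL;
	return unlink;
}

/* A timeout is reported for an MD that still has a tracker past its deadline. */
static bool toy_response_timed_out(const struct toy_md *md, long now)
{
	return md->rspt != NULL && now >= md->rspt->deadline;
}

/*
 * Attach a tracker with the given deadline, handle one reply, then run the
 * expiry check at time now. Returns true if a timeout would be reported.
 */
static bool toy_replay_fixed(int threshold, long deadline, long now)
{
	struct toy_rspt rspt = { .deadline = deadline };
	struct toy_md md = { .threshold = threshold, .rspt = &rspt };

	toy_md_handle_reply_fixed(&md);
	return toy_response_timed_out(&md, now);
}
```

With this change, toy_replay_fixed(TOY_THRESH_INF, 1560633969L, 1560633977L) no longer reports a timeout once the reply has been handled, and the MD itself stays linked so it can be reused for later pings.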

            People

              Assignee: Chris Horn (hornc)
              Reporter: Chris Horn (hornc)