lnet: only report mismatched nid in ME if bits match

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.5
    • Affects Version/s: Lustre 2.15.5
    • Labels: None
    • Severity: 3

    Description

      In rare cases, a client-to-server AST reply is dropped by the server, producing messages similar to the following, with o104, o105, or o106 as the RPC type:

      Lustre: 3678513:0:(client.c:2318:ptlrpc_expire_one_request())
           @@@ Request sent has timed out for slow reply: [sent 1706140870/real 1706140870]
            req@00000000a8fbe768 x1788044801687552/t0(0) o104->lfs00-MDT0001@10.31.3.109@tcp:15/16
           lens 328/224 e 0 to 1 dl 1706140908 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:''
      

      The kernel debug logs show that LNet drops the RPC because no matching request is found:

      lnet_parse_put()) Dropping PUT from 12345-10.31.3.108@tcp portal 16 match 1788044801687552 offset 224 length 224: 4
      :
      request_out_callback()) @@@ type 5, status 0  req@00000000a8fbe768 x1788044801687552/t0(0) o104->lfs02-MDT0001@10.31.3.109@tcp:15/16 lens 328/224 e 0 to 0 dl 1706140946 ref 2 fl Rpc:r/2/ffffffff rc 0/-1 job:''
      lnet_parse_put()) Dropping PUT from 12345-10.31.3.108@tcp portal 16 match 1788044801687552 offset 224 length 224: 4
      lnet_is_health_check()) Msg 00000000a906b193 is in inconsistent state, don't perform health checking (-2, 0)
      lnet_is_health_check()) health check = 0, status = -2, hstatus = 0
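
In the excerpts above, the "match" value of the dropped PUT (1788044801687552) is the same number as the XID of the request (x1788044801687552), which is what ties the drop to the timed-out RPC. A minimal sketch of checking this when scanning logs follows. The helper names `parse_u64_after` and `drop_matches_request` are hypothetical and not part of Lustre, and the search keys simply assume the line layout shown above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Lustre): find the first occurrence of
 * `key` in `line` and parse the unsigned decimal number that follows it.
 */
bool parse_u64_after(const char *line, const char *key, uint64_t *out)
{
	const char *p = strstr(line, key);

	if (p == NULL)
		return false;
	p += strlen(key);
	if (!isdigit((unsigned char)*p))
		return false;	/* no number directly after the key */
	*out = strtoull(p, NULL, 10);
	return true;
}

/*
 * True if the match bits in a "Dropping PUT" line equal the XID in a
 * request line. Assumes the layouts shown in the logs above: the match
 * bits follow " match " and the XID follows the first " x".
 */
bool drop_matches_request(const char *drop_line, const char *req_line)
{
	uint64_t mbits, xid;

	return parse_u64_after(drop_line, " match ", &mbits) &&
	       parse_u64_after(req_line, " x", &xid) &&
	       mbits == xid;
}
```

Passing the `lnet_parse_put()` line and the `request_out_callback()` line from the log above to `drop_matches_request()` should return true, since both carry 1788044801687552.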
      

      As part of MD matching for an incoming GET or PUT from a peer with multiple NIDs, use "matchbits" only if they are available, and in that case only report an error on NID/PID mismatch instead of failing the match. If "matchbits" cannot be used for matching, fail on NID/PID mismatch as before.
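
The matching rule described above can be sketched as follows. This is an illustration of one reading of the description, not the merged patch: every name (`sk_me`, `sk_try_match`, the `bits_usable` flag, the `SK_*_ANY` wildcards) is hypothetical, and how LNet actually decides that match bits are "available" is not stated in this ticket.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* All names below are hypothetical; they are not the LNet structures. */
#define SK_NID_ANY UINT64_MAX
#define SK_PID_ANY UINT32_MAX

struct sk_id {
	uint64_t nid;
	uint32_t pid;
};

struct sk_me {
	struct sk_id match_id;	/* NID/PID the entry expects */
	uint64_t match_bits;
	uint64_t ignore_bits;
	bool bits_usable;	/* assumption: bits alone identify this entry */
};

static bool sk_id_ok(const struct sk_me *me, struct sk_id src)
{
	return (me->match_id.nid == SK_NID_ANY || me->match_id.nid == src.nid) &&
	       (me->match_id.pid == SK_PID_ANY || me->match_id.pid == src.pid);
}

static bool sk_bits_ok(const struct sk_me *me, uint64_t mbits)
{
	return ((me->match_bits ^ mbits) & ~me->ignore_bits) == 0;
}

/*
 * Returns true if the entry matches the incoming message. *reported is set
 * when a NID/PID mismatch was logged but did not prevent the match.
 */
bool sk_try_match(const struct sk_me *me, struct sk_id src, uint64_t mbits,
		  bool *reported)
{
	*reported = false;

	if (me->bits_usable) {
		if (!sk_bits_ok(me, mbits))
			return false;	/* bits differ: no match, nothing to report */
		if (!sk_id_ok(me, src)) {
			fprintf(stderr, "NID/PID mismatch, matched on bits %llu\n",
				(unsigned long long)mbits);
			*reported = true;
		}
		return true;
	}

	/* Bits cannot be used on their own: fail on NID/PID mismatch as before. */
	return sk_id_ok(me, src) && sk_bits_ok(me, mbits);
}
```

With `bits_usable` set, a message arriving from a different NID of the same peer still matches as long as its bits match, and the mismatch is only logged. With the flag cleared, the same message is rejected, which corresponds to the dropped PUT seen in the logs.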

          Activity

            [LU-17476] lnet: only report mismatched nid in ME if bits match
            maloo Maloo made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 44149 ]
            cfaber Colin Faber made changes -
            Link New: This issue is related to DDN-5549 [ DDN-5549 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-5292 [ DDN-5292 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.15.5 [ 16491 ]

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55489/
            Subject: LU-17476 lnet: use bits only to match ME in all cases
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: a34b3596ad29fc4fd9e7d1f007e4f6ee514dfcaa

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55488/
            Subject: LU-17476 lnet: prefer to use bits only to match ME
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: eb35ce5538512b67fd82955c54a148eb707a10ee

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17958 [ LU-17958 ]
            adilger Andreas Dilger made changes -
            Affects Version/s New: Lustre 2.15.5 [ 16491 ]

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55489
            Subject: LU-17476 lnet: use bits only to match ME in all cases
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: b342b92b923938892df81a207a0271473c16060e

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55488
            Subject: LU-17476 lnet: prefer to use bits only to match ME
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: e10fa830fe3313768df31eee657fe1b02792c1ab

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: ssmirnov Serguei Smirnov
              Votes: 0
              Watchers: 11
