Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0
    • Lustre 2.12.1
    • None
    • configuration with router, MDS_FS_MKFS_OPTS="-O large_xattr", sanity-dom/sanityn
    • 3

    Description

      During sanity-dom testing, the following issue appeared:

       [ 1644.726837] LNetError: 3137:0:(lib-move.c:4143:lnet_parse()) 192.168.8.1@tcp, src 192.168.8.1@tcp: bad PUT payload 1051832 (1048576 max expected)

      I added some debugging to capture a vmcore from the sender.

      Here is the analysis from crash:

       
      md = {
           start = 0xffff880098200100,
           length = 1051832,
           threshold = 0,
           max_size = -1742733056,
           options = 0,
           user_ptr = 0xffff880098200000,
           eq_handle = {
             cookie = 23
           },
           bulk_handle = {
             cookie = 0
           }
         },
      msg_niov = 1,
       msg_iov = 0xffff88009995aba0,
       
      msg_kiov = 0x0,
       
      ffff880135be9800
      rc_fmt = 0xffffffffc095d080 <RQF_LDLM_INTENT_OPEN>,
       
      static const struct req_msg_field *ldlm_intent_open_server[] = {
             &RMF_PTLRPC_BODY,
             &RMF_DLM_REP,
             &RMF_MDT_BODY,
             &RMF_MDT_MD,
             &RMF_ACL,
             &RMF_CAPA1,
             &RMF_CAPA2,
             &RMF_NIOBUF_INLINE,
      };
      rc_area = {{4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295}, {184, 112, 216, 2432, 260, 0, 0, 1048592, 4294967295, 4294967295}}
       }
       
      crash> p 184+112+216+2432+260+1048592
      $15 = 1051796
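The arithmetic above can be reproduced outside of crash. This is only a minimal sketch: the buffer lengths are copied from the second row of rc_area, the limit is the "1048576 max expected" value from the LNetError line, and the names PUT_PAYLOAD_MAX, reply_buflens, sum_buflens and payload_excess are made up for illustration and are not Lustre symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the limit is the "max expected" value printed in the log. */
#define PUT_PAYLOAD_MAX 1048576u

/* Reply buffer lengths from the second row of rc_area in the dump;
 * the two unset 4294967295 slots are left out. */
static const uint32_t reply_buflens[] = {
	184, 112, 216, 2432, 260, 0, 0, 1048592
};

/* Sum an array of buffer lengths into a 64-bit total. */
static uint64_t sum_buflens(const uint32_t *lens, size_t count)
{
	uint64_t total = 0;

	for (size_t i = 0; i < count; i++)
		total += lens[i];
	return total;
}

/* Return by how many bytes the total exceeds the limit, or 0 if it fits. */
static uint64_t payload_excess(uint64_t total, uint64_t limit)
{
	return total > limit ? total - limit : 0;
}
```

With these values, sum_buflens(reply_buflens, 8) gives 1051796, which is 3220 bytes over the limit. The 1051832 reported in the LNetError line (and in md.length) is 3256 bytes over.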
       

      The DOM size during open was 1 MiB, and the total length computed from the buffer sizes was 1051796 bytes, which does not fit within the LNET_MTU limit (1048576 bytes), so the router reports the error.
      This means we cannot handle a 1 MiB DOM stripe size at the LNet layer. It is probably also a problem for PFL when the first stripe is located on the MDS.
      The workaround for sanity-dom testing is to decrease DOM_SIZE in sanity-dom.sh.
      Also, the MDS should limit this size to prevent such misbehavior.
      I've assigned this to Mikhail, though I'm not sure that is the right choice.
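To illustrate what "MDS should limit this size" could mean, here is a rough sketch of clamping the inline data size so that data plus the other message bytes stays within the limit. It is not the landed patch and not existing Lustre code: max_inline_data and clamp_inline_data are hypothetical helpers, and the overhead value has to come from the actual message layout (in this dump, 1051832 - 1048592 = 3240 bytes).

```c
#include <stdint.h>

/* Assumption: the limit is the "max expected" value printed in the log. */
#define PUT_PAYLOAD_MAX 1048576u

/* Hypothetical helper: largest inline data size such that
 * overhead + data still fits within limit; 0 if nothing fits. */
static uint32_t max_inline_data(uint32_t limit, uint32_t overhead)
{
	return overhead < limit ? limit - overhead : 0;
}

/* Hypothetical helper: reduce a requested inline data size to the
 * largest value that fits; smaller requests pass through unchanged. */
static uint32_t clamp_inline_data(uint32_t requested, uint32_t limit,
				  uint32_t overhead)
{
	uint32_t max = max_inline_data(limit, overhead);

	return requested < max ? requested : max;
}
```

For the numbers in this ticket, clamp_inline_data(1048592, PUT_PAYLOAD_MAX, 3240) returns 1045336, so the 1 MiB data buffer seen in the dump would have to shrink by a bit more than 3 KiB to pass the router.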


        Activity

          [LU-12140] DOM: limitation of data size
          pjones Peter Jones added a comment -

          Landed for 2.13


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34975/
          Subject: LU-12140 lnet: adds checking msg len
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 4d43a6c3b182485ffaf7d94c726653b1a36d1b9b


          Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/34975
          Subject: LU-12140 lnet: adds checking msg len
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: aeb963fc19a6f78777e91ee3fd98ad93c7d9acca


          People

            tappro Mikhail Pershin
            aboyko Alexander Boyko