Lustre / LU-12140

DOM: limitation of data size

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0
    • Lustre 2.12.1
    • None
    • configuration with router, MDS_FS_MKFS_OPTS="-O large_xattr", sanity-dom/sanityn
    • 3

    Description

      During sanity-dom testing, the following issue appeared:

       [ 1644.726837] LNetError: 3137:0:(lib-move.c:4143:lnet_parse()) 192.168.8.1@tcp, src 192.168.8.1@tcp: bad PUT payload 1051832 (1048576 max expected)

      I added some debugging to capture vmcores from the sender.

      Here is the analysis from crash:

       
      md = {
        start = 0xffff880098200100,
        length = 1051832,
        threshold = 0,
        max_size = -1742733056,
        options = 0,
        user_ptr = 0xffff880098200000,
        eq_handle = {
          cookie = 23
        },
        bulk_handle = {
          cookie = 0
        }
      },
      msg_niov = 1,
      msg_iov = 0xffff88009995aba0,
      msg_kiov = 0x0,
       
      ffff880135be9800
      rc_fmt = 0xffffffffc095d080 <RQF_LDLM_INTENT_OPEN>,
       
      static const struct req_msg_field *ldlm_intent_open_server[] = {
             &RMF_PTLRPC_BODY,
             &RMF_DLM_REP,
             &RMF_MDT_BODY,
             &RMF_MDT_MD,
             &RMF_ACL,
             &RMF_CAPA1,
             &RMF_CAPA2,
             &RMF_NIOBUF_INLINE,
      };
      rc_area = {{4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295}, {184, 112, 216, 2432, 260, 0, 0, 1048592, 4294967295, 4294967295}}
       }
       
      crash> p 184+112+216+2432+260+1048592
      $15 = 1051796
       

      The DOM size during open was 1 MiB. The reply buffers total 1051796 bytes (LNet reported a PUT payload of 1051832 bytes), which does not fit within the LNET_MTU limit of 1048576 bytes, so the router reports the error.
      This means we cannot handle a DOM with a 1 MiB stripe size at the LNET layer. It is probably also a problem for PFL when the first stripe is located on the MDS.
      The workaround for sanity-dom testing is to decrease DOM_SIZE in sanity-dom.sh.
      The MDS should also limit this size to prevent such misbehavior.
      I've assigned this to Mikhail, but I'm not sure that is correct.
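      The size arithmetic above can be re-checked with a short script. This is only a sketch: the buffer lengths are copied from the second rc_area row in the crash output, the limit is the "max expected" value from the LNetError line, and the helper names (reply_size, excess_over_mtu) are made up for this illustration and are not Lustre functions.

```python
# Reply buffer lengths copied from the second rc_area row in the crash output.
# Slots shown as 4294967295 are unused and are left out here.
REPLY_BUFLENS = [184, 112, 216, 2432, 260, 0, 0, 1048592]

# "1048576 max expected" from the LNetError message.
LNET_MTU = 1 << 20

# md.length from the crash dump, same as the "bad PUT payload" value in the log.
PUT_PAYLOAD = 1051832


def reply_size(buflens):
    """Sum the per-field reply buffer lengths (hypothetical helper)."""
    return sum(buflens)


def excess_over_mtu(nbytes, mtu=LNET_MTU):
    """Return how many bytes nbytes exceeds the limit by, or 0 if it fits."""
    return max(0, nbytes - mtu)


total = reply_size(REPLY_BUFLENS)
print("reply buffers:", total)
print("buffers over limit by:", excess_over_mtu(total))
print("PUT payload over limit by:", excess_over_mtu(PUT_PAYLOAD))
```

      Under these numbers, the inline data buffer alone (1048592 bytes) is already larger than the limit, so any reply that carries a full 1 MiB of DOM data inline cannot fit.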

          People

            tappro Mikhail Pershin
            aboyko Alexander Boyko