Lustre / LU-19725

sanity test_119f: packet is too big, leading to timeout


Details

    • Bug
    • Resolution: Unresolved
    • Medium

    Description

      This issue was created by maloo for Cyril Bordage <cbordage@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/88f16456-951c-4fc1-8e2e-84431c160b72

      test_119f failed with the following error:

      Timeout occurred after 169 minutes, last suite running was sanity
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_16/6 - 5.14.0-427.31.1.el9_4.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/119887 - 4.18.0-553.89.1.el8_lustre.x86_64

      LNet repeatedly raises an error about a packet that is too big:

        [ 4694.397511] LNetError: 3503:0:(lib-ptl.c:196:lnet_try_match_md()) Matching packet from 12345-10.240.46.175@tcp, match 1851863842237312 length 1048576 too big: 987136 left, 987136 allowed
        [ 4811.427160] Lustre: 3508:0:(client.c:2357:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1766079264/real 1766079264]  req@ff2c7f5b481356c0 x1851863842237312/t0(0) o4->lustre-OST0002-osc-ff2c7f5c73fc8800@10.240.46.175@tcp:6/4 lens 488/448 e 4 to 1 dl 1766079381 ref 2 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'dd.0' uid:0 gid:0
        [ 4811.427203] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection to lustre-OST0002 (at 10.240.46.175@tcp) was lost; in progress operations using this service will wait for recovery to complete
        [ 4811.433550] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection restored to  (at 10.240.46.175@tcp)
        [ 4811.434719] LNetError: 3503:0:(lib-ptl.c:196:lnet_try_match_md()) Matching packet from 12345-10.240.46.175@tcp, match 1851863842278528 length 1048576 too big: 987136 left, 987136 allowed
        [ 4859.774492] Autotest: Test running for 80 minutes (lustre-reviews_custom_119887.1002)
        [ 4928.162677] Lustre: 3508:0:(client.c:2357:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1766079381/real 1766079381]  req@ff2c7f5b481356c0 x1851863842237312/t0(0) o4->lustre-OST0002-osc-ff2c7f5c73fc8800@10.240.46.175@tcp:6/4 lens 488/448 e 4 to 1 dl 1766079498 ref 2 fl Rpc:XQr/202/ffffffff rc 0/-1 job:'dd.0' uid:0 gid:0
        [ 4928.162721] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection to lustre-OST0002 (at 10.240.46.175@tcp) was lost; in progress operations using this service will wait for recovery to complete
        [ 4928.169252] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection restored to  (at 10.240.46.175@tcp)
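
The size mismatch in the LNetError lines above can be pulled out with a small script. This is only a sketch for reading the log, not part of the Lustre test framework: the regular expression, the helper name `parse_too_big`, and the field names are illustrative assumptions based on the message text quoted in this ticket.

```python
import re

# Pattern for the "too big" LNetError lines quoted in this ticket.
# The group names are illustrative, not LNet identifiers.
TOO_BIG = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)


def parse_too_big(line):
    """Return the numeric fields of a 'too big' LNetError line, or None."""
    m = TOO_BIG.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


# First LNetError line from the log above, copied verbatim.
line = (
    "[ 4694.397511] LNetError: 3503:0:(lib-ptl.c:196:lnet_try_match_md()) "
    "Matching packet from 12345-10.240.46.175@tcp, match 1851863842237312 "
    "length 1048576 too big: 987136 left, 987136 allowed"
)

fields = parse_too_big(line)
overrun = fields["length"] - fields["allowed"]
print(fields["length"], fields["allowed"], overrun)
# prints: 1048576 987136 61440
```

The incoming packet is 1048576 bytes (1 MiB) while only 987136 bytes are allowed, so it exceeds the limit by 61440 bytes (60 KiB). The match value 1851863842237312 in the first error is the same number as the xid x1851863842237312 of the request that later times out.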
      

      This happens when an older client version is used: "clientversion=2.16".

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_119f - Timeout occurred after 169 minutes, last suite running was sanity

            People

              wc-triage WC Triage
              maloo Maloo