Details
Bug
Resolution: Unresolved
Medium
Description
This issue was created by maloo for Cyril Bordage <cbordage@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/88f16456-951c-4fc1-8e2e-84431c160b72
test_119f failed with the following error:
Timeout occurred after 169 minutes, last suite running was sanity
Test session details:
clients: https://build.whamcloud.com/job/lustre-b2_16/6 - 5.14.0-427.31.1.el9_4.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/119887 - 4.18.0-553.89.1.el8_lustre.x86_64
LNet repeatedly reports an error about an incoming packet that is too big for the matched buffer:
[ 4694.397511] LNetError: 3503:0:(lib-ptl.c:196:lnet_try_match_md()) Matching packet from 12345-10.240.46.175@tcp, match 1851863842237312 length 1048576 too big: 987136 left, 987136 allowed
[ 4811.427160] Lustre: 3508:0:(client.c:2357:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1766079264/real 1766079264] req@ff2c7f5b481356c0 x1851863842237312/t0(0) o4->lustre-OST0002-osc-ff2c7f5c73fc8800@10.240.46.175@tcp:6/4 lens 488/448 e 4 to 1 dl 1766079381 ref 2 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'dd.0' uid:0 gid:0
[ 4811.427203] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection to lustre-OST0002 (at 10.240.46.175@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 4811.433550] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection restored to (at 10.240.46.175@tcp)
[ 4811.434719] LNetError: 3503:0:(lib-ptl.c:196:lnet_try_match_md()) Matching packet from 12345-10.240.46.175@tcp, match 1851863842278528 length 1048576 too big: 987136 left, 987136 allowed
[ 4859.774492] Autotest: Test running for 80 minutes (lustre-reviews_custom_119887.1002)
[ 4928.162677] Lustre: 3508:0:(client.c:2357:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1766079381/real 1766079381] req@ff2c7f5b481356c0 x1851863842237312/t0(0) o4->lustre-OST0002-osc-ff2c7f5c73fc8800@10.240.46.175@tcp:6/4 lens 488/448 e 4 to 1 dl 1766079498 ref 2 fl Rpc:XQr/202/ffffffff rc 0/-1 job:'dd.0' uid:0 gid:0
[ 4928.162721] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection to lustre-OST0002 (at 10.240.46.175@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 4928.169252] Lustre: lustre-OST0002-osc-ff2c7f5c73fc8800: Connection restored to (at 10.240.46.175@tcp)
This happens when using an older client version: "clientversion=2.16".
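For triage, the size mismatch in the LNetError lines can be pulled out mechanically. The following is a minimal sketch, not a Lustre tool: the regular expression and the helper name parse_too_big are assumptions, modelled only on the message text quoted in this ticket. It reports how far the incoming length exceeds the allowed size.

```python
import re

# Illustrative pattern, modelled on the LNetError text quoted in this ticket.
# It is an assumption, not an official Lustre log format definition.
TOO_BIG = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)


def parse_too_big(line):
    """Return the numeric fields of a 'too big' message, or None if absent."""
    m = TOO_BIG.search(line)
    if m is None:
        return None
    fields = {name: int(value) for name, value in m.groupdict().items()}
    # Bytes by which the incoming packet exceeds the allowed size.
    fields["excess"] = fields["length"] - fields["allowed"]
    return fields


sample = (
    "LNetError: 3503:0:(lib-ptl.c:196:lnet_try_match_md()) Matching packet "
    "from 12345-10.240.46.175@tcp, match 1851863842237312 length 1048576 "
    "too big: 987136 left, 987136 allowed"
)
info = parse_too_big(sample)
print(info["excess"])  # 61440
```

With the values from the log, the 1048576-byte (1 MiB) packet exceeds the 987136 bytes allowed by 61440 bytes (60 KiB).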
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_119f - Timeout occurred after 169 minutes, last suite running was sanity