Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.16.0, Lustre 2.15.6
Severity: 3
Rank: 9223372036854775807
Description
This issue was created by maloo for S Buisson <sbuisson@ddn.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/37495c46-a593-48c0-a5d8-28bc0e342fc3
test_6 failed with the following error:
dd not finished in 240 secs
This happened with a ZFS backend.
In the OSS dmesg, we can see repeated complaints about a lost connection to the MDS:
[ 3066.737564] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591889766/real 1591889766] req@ffff8c0c6e17f180 x1669214324881792/t0(0) o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 e 0 to 1 dl 1591889773 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'ll_ost_io00_005.0'
[ 3066.742583] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.6.237@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 3075.071151] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591889774/real 1591889774] req@ffff8c0c6e17df80 x1669214324882240/t0(0) o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 e 0 to 1 dl 1591889781 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'ll_ost_io00_005.0'
[ 3075.076488] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.6.237@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 3083.079772] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591889782/real 1591889782] req@ffff8c0c6e17c480 x1669214324882752/t0(0) o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 e 0 to 1 dl 1591889789 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'ll_ost_io00_005.0'
[ 3083.084896] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.6.237@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 3091.610366] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591889791/real 1591889791] req@ffff8c0c7be6b180 x1669214324883264/t0(0) o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 e 0 to 1 dl 1591889798 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'ll_ost_io00_005.0'
[ 3091.615435] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.6.237@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 3098.919547] Lustre: DEBUG MARKER: lctl set_param at_max=600
[ 3099.698478] Lustre: DEBUG MARKER: dmesg
[ 3101.618918] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591889801/real 1591889801] req@ffff8c0c5d30fa80 x1669214324883776/t0(0) o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 e 0 to 1 dl 1591889808 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'ll_ost_io00_005.0'
[ 3101.623866] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.6.237@tcp) was lost; in progress operations using this service will wait for recovery to complete
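For triage, the timed-out requests in the dmesg output above can be summarized with a small script. The following is only a sketch, not part of any Lustre tooling: the function name `parse_timeouts`, the regular expression, and the field names are assumptions made for illustration, and the `o601` token is read here as the RPC opcode. The two sample lines are copied from the log above.

```python
import re

# Hypothetical pattern for the "timed out for slow reply" entries shown above.
# The timestamp is tied directly to the ptlrpc_expire_one_request() prefix so
# that "Connection ... was lost" entries are never picked up by mistake.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] Lustre: \d+:\d+:"
    r"\(client\.c:\d+:ptlrpc_expire_one_request\(\)\)"
    r".*?\[sent (?P<sent>\d+)/real \d+\]"
    r".*?\bo(?P<opcode>\d+)->(?P<target>[^@\s]+)@"
    r".*?\bdl (?P<deadline>\d+)"
)


def parse_timeouts(dmesg_text):
    """Return one dict per timed-out request found in dmesg_text."""
    events = []
    for m in TIMEOUT_RE.finditer(dmesg_text):
        sent = int(m["sent"])
        deadline = int(m["deadline"])
        events.append({
            "uptime": float(m["uptime"]),      # seconds since boot
            "sent": sent,                      # epoch seconds when the RPC was sent
            "deadline": deadline,              # epoch seconds of the "dl" field
            "timeout_secs": deadline - sent,   # window the request was given
            "opcode": int(m["opcode"]),
            "target": m["target"],
        })
    return events


# Two entries copied verbatim from the OSS dmesg in this ticket.
SAMPLE = "\n".join([
    "[ 3066.737564] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) "
    "@@@ Request sent has timed out for slow reply: "
    "[sent 1591889766/real 1591889766] req@ffff8c0c6e17f180 "
    "x1669214324881792/t0(0) "
    "o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 "
    "e 0 to 1 dl 1591889773 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 "
    "job:'ll_ost_io00_005.0'",
    "[ 3075.071151] Lustre: 17933:0:(client.c:2245:ptlrpc_expire_one_request()) "
    "@@@ Request sent has timed out for slow reply: "
    "[sent 1591889774/real 1591889774] req@ffff8c0c6e17df80 "
    "x1669214324882240/t0(0) "
    "o601->lustre-MDT0000-lwp-OST0001@10.9.6.237@tcp:23/10 lens 336/336 "
    "e 0 to 1 dl 1591889781 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 "
    "job:'ll_ost_io00_005.0'",
])

for event in parse_timeouts(SAMPLE):
    print(event["uptime"], event["target"], event["opcode"], event["timeout_secs"])
```

On these two entries the `dl` value is 7 seconds after `sent`, and the second request is sent 8 seconds after the first, which matches the retry cadence visible in the rest of the log.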
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-quota test_6 - dd not finished in 240 secs