[LU-2894] 2.1.4<->2.4.0 interop: recovery-small test 11: write: Cannot send after transport endpoint shutdown
Created: 01/Mar/13  Updated: 09/Jan/20  Resolved: 09/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Jian Yu
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment:

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/180
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1278
Distro/Arch: RHEL6.3/x86_64


Severity: 3
Rank (Obsolete): 6978

Description

The recovery-small test 11 failed as follows:

== recovery-small test 11: wake up a thread waiting for completion after eviction (b=2460) == 05:16:17 (1362057377)
CMD: client-14vm2.lab.whamcloud.com multiop /mnt/lustre/f11 Ow
write: Cannot send after transport endpoint shutdown
 recovery-small test_11: @@@@@@ FAIL: test_11 failed with 1
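
The message printed by multiop is the standard Linux error string for ESHUTDOWN (errno 108), and the same value appears negated as -108 in the client log, alongside -4 and -19. As a reading aid, here is a minimal sketch that decodes those return codes. The table and the helper name decode_rc are illustrative assumptions, and the numbers and strings assume Linux/glibc conventions.

```python
# Linux errno values seen in this report. Assumption: Linux/glibc numbering
# and message strings; other platforms number ESHUTDOWN differently.
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    19: ("ENODEV", "No such device"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}


def decode_rc(rc):
    """Map a negated kernel return code such as -108 to (name, message)."""
    return LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in table"))


# The three return codes that appear in the client dmesg of this report.
for rc in (-4, -19, -108):
    name, message = decode_rc(rc)
    print(f"rc {rc}: {name} ({message})")
```

The table is hard-coded rather than taken from the host's errno definitions so that the result does not depend on the platform where the sketch is run.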

Dmesg on the client (client-14vm2) showed:

Lustre: DEBUG MARKER: == recovery-small test 10: finish request on server after client eviction (bug 1521) == 05:16:05 (1362057365)
Lustre: DEBUG MARKER: mcreate /mnt/lustre/f10
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x305
Lustre: DEBUG MARKER: chmod 0777 /mnt/lustre/f10
LustreError: 14535:0:(libcfs_fail.h:84:cfs_fail_check_set()) *** cfs_fail_loc=305 ***
LustreError: 8333:0:(libcfs_fail.h:84:cfs_fail_check_set()) *** cfs_fail_loc=305 ***
LustreError: 8333:0:(libcfs_fail.h:84:cfs_fail_check_set()) *** cfs_fail_loc=305 ***
Lustre: DEBUG MARKER: lctl set_param fail_loc=0
Lustre: DEBUG MARKER: touch /mnt/lustre/f10
LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
LustreError: 14550:0:(mdc_locks.c:736:mdc_enqueue()) ldlm_cli_enqueue: -4
LustreError: 14550:0:(lmv_obd.c:1036:lmv_fid_alloc()) Can't alloc new fid, rc -19
LustreError: 14550:0:(client.c:1060:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88007b033400 x1428215442821761/t0(0) o101->lustre-MDT0000-mdc-ffff88007cd0f400@10.10.4.126@tcp:12/10 lens 544/1136 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 14550:0:(client.c:1060:ptlrpc_import_delay_req()) Skipped 1 previous similar message
LustreError: 14550:0:(mdc_locks.c:736:mdc_enqueue()) ldlm_cli_enqueue: -108
Lustre: DEBUG MARKER: checkstat -v -p 0777 /mnt/lustre/f10
Lustre: DEBUG MARKER: munlink /mnt/lustre/f10
Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 11: wake up a thread waiting for completion after eviction \(b=2460\) == 05:16:17 \(1362057377\)
LustreError: 167-0: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
Lustre: DEBUG MARKER: == recovery-small test 11: wake up a thread waiting for completion after eviction (b=2460) == 05:16:17 (1362057377)
Lustre: DEBUG MARKER: multiop /mnt/lustre/f11 Ow
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_11: @@@@@@ FAIL: test_11 failed with 1 
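
The client log above contains two eviction messages. The sketch below pairs each one with the most recent recovery-small test marker that precedes it in the log. The regular expressions and the function name evictions_by_test are illustrative assumptions, and the sample lines are copied from the dmesg output above. The pairing reflects log ordering only; it does not establish which test caused an eviction.

```python
import re

# Relevant lines copied from the client dmesg in this report (raw strings keep
# the backslash-escaped parentheses of the "lctl mark" line intact).
LOG_LINES = [
    r"Lustre: DEBUG MARKER: == recovery-small test 10: finish request on server after client eviction (bug 1521) == 05:16:05 (1362057365)",
    r"LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.",
    r"Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 11: wake up a thread waiting for completion after eviction \(b=2460\) == 05:16:17 \(1362057377\)",
    r"LustreError: 167-0: This client was evicted by lustre-OST0000; in progress operations using this service will fail.",
    r"Lustre: DEBUG MARKER: multiop /mnt/lustre/f11 Ow",
]

# Hypothetical patterns written for this log excerpt only.
TEST_RE = re.compile(r"DEBUG MARKER:.*== recovery-small test (\d+):")
EVICT_RE = re.compile(r"This client was evicted by (\S+);")


def evictions_by_test(lines):
    """Return (test_number, target) pairs in the order they appear in the log."""
    current_test = None
    found = []
    for line in lines:
        marker = TEST_RE.search(line)
        if marker:
            current_test = int(marker.group(1))
            continue
        evicted = EVICT_RE.search(line)
        if evicted:
            found.append((current_test, evicted.group(1)))
    return found


print(evictions_by_test(LOG_LINES))
```

In this excerpt the lustre-OST0000 eviction is logged right after the "lctl mark" line for test 11 and before multiop runs, so the write in test 11 starts against an import that has already been evicted.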

Maloo report: https://maloo.whamcloud.com/test_sets/d07106e4-81ae-11e2-9f6b-52540035b04c



Comments
Comment by Jian Yu [ 13/Mar/13 ]

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/186
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1302
Distro/Arch: RHEL6.3/x86_64

The recovery-small test 11 passed: https://maloo.whamcloud.com/test_sets/dd3c5df8-8b57-11e2-965f-52540035b04c

Comment by Andreas Dilger [ 09/Jan/20 ]

Closing old bug.
