  Lustre / LU-15915

/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Affects Version/s: Lustre 2.12.8
    • Severity: 3

    Description

      I am running a large number of deletes on clients, and after a while they get evicted. The error on the client is:

      /bin/rm: fts_read failed: Cannot send after transport endpoint shutdown
      

      On the MDS, the error is:

      Jun  6 19:28:59 fmds1 kernel: LustreError: 9744:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.21.22.31@tcp  ns: mdt-foxtrot-MDT0000_UUID lock: ffff94f72a408480/0xb4442ee3e798319c lrc: 3/0,0 mode: PR/PR res: [0x20009b3c6:0x29eb:0x0].0x0 bits 0x20/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.21.22.31@tcp remote: 0x40ff70b2e6a5419f expref: 147862 pid: 61992 timeout: 6578337 lvb_type: 0
      

      I'm running maybe 10-15 recursive rm processes on each of 3 clients, so 30-45 in total at once.
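
      A minimal sketch of the kind of delete workload in question (the mount point, directory names, and process count below are illustrative, not taken from this ticket):

      # Launch roughly a dozen recursive deletes in parallel on one client;
      # the same was done on each of the 3 clients.
      for i in $(seq 1 12); do
          rm -rf /mnt/foxtrot/scratch/tree_$i &
      done
      wait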

      I've set debugging params as follows:

      lctl set_param debug_mb=1024
      lctl set_param debug="+dlmtrace +info +rpctrace"
      lctl set_param dump_on_eviction=1
      

      on clients and the MDS.
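
      In case it helps with collecting logs: with dump_on_eviction=1 the client writes a binary debug dump when it is evicted, and the in-memory buffer can also be dumped by hand. A hedged sketch (the dump file name below is hypothetical; the default dump location is /tmp/lustre-log.<timestamp> unless debug_path was changed):

      # Dump the in-memory debug buffer manually on a client or the MDS.
      lctl dk /tmp/lustre-debug.$(hostname).txt
      # Convert an automatically written binary eviction dump to text.
      lctl debug_file /tmp/lustre-log.1654595933.3716 /tmp/lustre-log.txt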

      Lustre version is 2.12.8_6_g5457c37

      Attachments

        Issue Links

          Activity

            [LU-15915] /bin/rm: fts_read failed: Cannot send after transport endpoint shutdown
            pjones Peter Jones added a comment -

            This fix was in 2.15.0 and will be in 2.12.10 (if we do one)


            dneg Dneg (Inactive) added a comment -

            Thanks Peter, can you tell me which releases (in particular, 2.15.x and 2.12.x) have this genops.c patch?
            pjones Peter Jones added a comment -

            Great! Then let's mark this as a duplicate of LU-14741. It would be better to track the soft lockup issue under a new ticket.


            dneg Dneg (Inactive) added a comment -

            Looking good, still no evictions after a week.
            dneg Dneg (Inactive) added a comment - - edited

            I saw https://jira.whamcloud.com/browse/LU-15742; we already have lru_size at 10000 and ldlm.namespaces.*.lru_max_age=60000.
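
            For reference, a sketch of how those LDLM LRU settings can be checked and applied on a client (values mirror the ones quoted above; the set of namespaces present depends on the filesystem):

            # Current per-namespace lock LRU settings on this client.
            lctl get_param ldlm.namespaces.*.lru_size ldlm.namespaces.*.lru_max_age
            # Values referenced in this ticket.
            lctl set_param ldlm.namespaces.*.lru_size=10000
            lctl set_param ldlm.namespaces.*.lru_max_age=60000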

            dneg Dneg (Inactive) added a comment - - edited

            Hi Peter, I was just about to post an update. No evictions since the patch was applied earlier in the week (Tuesday), so good news on that front. I will keep an eye on it over the weekend. We do get the odd soft lockup (e.g., Nov 9 03:11:25 foxtrot3 kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [ptlrpcd_01_10:3531]). I can open a separate ticket for that issue if you like.

            pjones Peter Jones added a comment -

            Hey Campbell

            Just checking in to see how things are progressing

            Peter

            dneg Dneg (Inactive) added a comment - - edited

            Hi Oleg, I have applied the patch to lustre/obdclass/genops.c (there was just the one at https://review.whamcloud.com/changes/fs%2Flustre-release~45850/revisions/1/patch?zip&path=lustre%2Fobdclass%2Fgenops.c, correct?) and have built new client rpms. I'll install them on the clients over the next few days, then bump up the lru_size across the cluster and let you know the result.
            Thanks,
            Campbell
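
            For anyone following along, a rough sketch of the client rebuild workflow described above (the patch file name and configure options are assumptions about this site's build, not details confirmed in the ticket):

            # Apply the single-file genops.c patch from review 45850 to a
            # 2.12.8 client source tree, then build client-only RPMs.
            cd lustre-release
            git apply ~/genops-45850.patch
            sh autogen.sh
            ./configure --disable-server
            make rpms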

            green Oleg Drokin added a comment -

            So these (June) logs look very familiar; I think there was a similar report some time ago where a lock cancel was triggering other lock cancels? Can't quite find the report readily to refresh my memory, though...

            On the surface it sounds like LU-15821, but I think there was another path where, as we do a cancel, we also want to pack up some more locks that are ready to be cancelled (I cannot verify what your current client version is to see whether it really got the fix, and I don't have more recent debug logs to confirm whether the behavior is the same or different now).

            In the old logs this is how it unfolds:

            00010000:00010000:9.0:1654595833.299427:0:3716:0:(ldlm_lockd.c:1775:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: foxtrot-MDT0000-mdc-ffff8acff2c13800 lock: ffff8af441314d80/0x7960025dca1d30a8 lrc: 2/0,0 mode: PR/PR res: [0x20009b4d8:0x14677:0x0].0x0 bits 0x20/0x0 rrc: 3 type: IBT flags: 0x400000000000 nid: local remote: 0xb4442ee9c835c045 expref: -99 pid: 35329 timeout: 0 lvb_type: 0
            00010000:00010000:9.0:1654595833.299485:0:3716:0:(ldlm_request.c:1150:ldlm_cli_cancel_local()) ### client-side cancel ns: foxtrot-MDT0000-mdc-ffff8acff2c13800 lock: ffff8af441314d80/0x7960025dca1d30a8 lrc: 3/0,0 mode: PR/PR res: [0x20009b4d8:0x14677:0x0].0x0 bits 0x20/0x20 rrc: 3 type: IBT flags: 0x408400000000 nid: local remote: 0xb4442ee9c835c045 expref: -99 pid: 35329 timeout: 0 lvb_type: 0
            00010000:00000040:9.0:1654595833.299489:0:3716:0:(ldlm_resource.c:1601:ldlm_resource_putref()) putref res: ffff8b0921172e40 count: 2
            00010000:00000040:24.0:1654595833.299492:0:35567:0:(ldlm_resource.c:1601:ldlm_resource_putref()) putref res: ffff8afaf44dac00 count: 1
            00000020:00000040:9.0:1654595833.299493:0:3716:0:(lustre_handles.c:113:class_handle_unhash_nolock()) removing object ffff8af441314d80 with handle 0x7960025dca1d30a8 from hash 
            ...
            lots and lots of other locks are being collected to be sent in the cancel RPC
            ...
            00010000:00010000:4.0:1654595933.168282:0:3716:0:(ldlm_lockd.c:1800:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: foxtrot-MDT0000-mdc-ffff8acff2c13800 lock: ffff8af441314d80/0x7960025dca1d30a8 lrc: 1/0,0 mode: --/PR res: [0x20009b4d8:0x14677:0x0].0x0 bits 0x20/0x20 rrc: 3 type: IBT flags: 0x4c09400000000 nid: local remote: 0xb4442ee9c835c045 expref: -99 pid: 35329 timeout: 0 lvb_type: 0

            So from this we see we already traversed ldlm_cli_cancel_local->ldlm_lock_cancel->ldlm_lock_destroy_nolock->class_handle_unhash_nolock - i.e., the lock was totally destroyed, and it's not a matter of the cancel thread never even getting to it as in LU-15821.

            Anyway, looking closer at this thread (which is named ldlm_bl_126), we actually see this:

            00000020:00100000:11.0:1654595833.301719:0:3716:0:(genops.c:2379:obd_get_mod_rpc_slot()) foxtrot-MDT0000-mdc-ffff8acff2c13800: sleeping for a modify RPC slot opc 35, max 7
            00000100:00100000:4.0:1654595933.167133:0:3716:0:(client.c:2096:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ldlm_bl_126:22bdcfdc-5f27-ca71-fcf9-840efc5add00:3716:1733599145210368:10.21.22.10@tcp:35 

            So somewhere along the way of canceling multiple locks, it suddenly met one that required a modify RPC slot that we did not have. opc 35 is MDS_CLOSE, so we were canceling an open lock and could not send the close, stopping this whole canceling thread in its tracks.
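
            A hedged way to inspect the modify-RPC-slot limit being hit here on a client (these are the standard mdc tunables; the "max 7" in the obd_get_mod_rpc_slot line above corresponds to max_mod_rpcs_in_flight):

            # Maximum number of modifying RPCs (e.g., opc 35 / MDS_CLOSE) the
            # client may have in flight per MDC, and the overall RPC limit.
            lctl get_param mdc.*.max_mod_rpcs_in_flight
            lctl get_param mdc.*.max_rpcs_in_flight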

            So this is actually LU-14741, which was never included in b2_12; the backported patch is here: https://review.whamcloud.com/c/fs/lustre-release/+/45850


            dneg Dneg (Inactive) added a comment -

            Dropped the lru_size to 128 as we were still getting evictions and backups were failing.

            People

              Assignee: laisiyao Lai Siyao
              Reporter: dneg Dneg (Inactive)
              Votes: 0
              Watchers: 6