Details
Type: Bug
Resolution: Duplicate
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.8
Labels: None
Severity: 3
Rank (Obsolete): 9223372036854775807
Description
I am running a large number of deletes on clients, and after a while the clients get evicted. The error on the client is:
/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown
On the MDS, the error is:
un 6 19:28:59 fmds1 kernel: LustreError: 9744:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.21.22.31@tcp ns: mdt-foxtrot-MDT0000_UUID lock: ffff94f72a408480/0xb4442ee3e798319c lrc: 3/0,0 mode: PR/PR res: [0x20009b3c6:0x29eb:0x0].0x0 bits 0x20/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.21.22.31@tcp remote: 0x40ff70b2e6a5419f expref: 147862 pid: 61992 timeout: 6578337 lvb_type: 0
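For triage across many such messages, the fields of interest in the MDS eviction line can be pulled out with a small script. The following is a minimal sketch, not part of the original report: the function name `parse_eviction` and the regular expression are illustrative assumptions based only on the single message quoted above, and other Lustre versions may format the message differently.

```python
import re

# Assumed layout, derived only from the single expired_lock_main() message
# quoted in this report; not an official Lustre log grammar.
EVICTION_RE = re.compile(
    r"lock callback timer expired after (?P<seconds>\d+)s: "
    r"evicting client at (?P<nid>\S+)\s+ns: (?P<namespace>\S+)"
    r".*?res: (?P<resource>\[[^\]]+\])"
    r".*?expref: (?P<expref>\d+)"
)


def parse_eviction(line):
    """Return a dict of eviction fields, or None if the line does not match."""
    match = EVICTION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or aggregated.
    fields["seconds"] = int(fields["seconds"])
    fields["expref"] = int(fields["expref"])
    return fields


# The message body from the MDS log above (syslog prefix omitted).
sample = (
    "LustreError: 9744:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock "
    "callback timer expired after 100s: evicting client at 10.21.22.31@tcp "
    "ns: mdt-foxtrot-MDT0000_UUID lock: ffff94f72a408480/0xb4442ee3e798319c "
    "lrc: 3/0,0 mode: PR/PR res: [0x20009b3c6:0x29eb:0x0].0x0 bits 0x20/0x0 "
    "rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.21.22.31@tcp "
    "remote: 0x40ff70b2e6a5419f expref: 147862 pid: 61992 timeout: 6578337 "
    "lvb_type: 0"
)

if __name__ == "__main__":
    print(parse_eviction(sample))
```

Running the parser over the full MDS log would show whether the evictions concentrate on one client NID or one resource, which may help when comparing against the duplicate ticket.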
I'm running roughly 10-15 recursive rm commands on each of 3 clients, so 30-45 in total at once.
I've set the following debugging parameters on the clients and the MDS:
lctl set_param debug_mb=1024
lctl set_param debug="+dlmtrace +info +rpctrace"
lctl set_param dump_on_eviction=1
Lustre version is 2.12.8_6_g5457c37