[LU-11084] Lustre client got evicted and cannot recover Created: 12/Jun/18 Updated: 01/Feb/21 Resolved: 01/Feb/21 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | anhua | Assignee: | Yang Sheng |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | CentOS 7.2; MDS * 2; OSS * 2 |
| Epic/Theme: | Lustre-2.8.0 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
As can be seen from the log, the client was evicted by OST0014. After the eviction, this client can no longer read from or write to that OST; reads fail with errors such as -108. The corresponding obd is shown with status "IN" in the output of "lctl dl". Unmounting fails unless the -l option is used. /var/log/messages also shows periodic "sluggish" warnings roughly every 10 minutes. All other clients are normal. This client did not recover even after several hours, and I believe it would not recover after days either (I did not try waiting that long).
OSS02:
Jun 10 13:12:21 oss02 kernel: LustreError: 10566:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 10.3.28.26@o2ib ns: filter-stjfs-OST0014_UUID lock: ffff8831290af000/0xc76c97daaf26c922 lrc: 3/0,0 mode: PW/PW res: [0x1fe2965:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->32767) flags: 0x60000400010020 nid: 10.3.28.26@o2ib remote: 0x9b6b25718fa9bc69 expref: 250765 pid: 9550 timeout: 5051103759 lvb_type: 0
Jun 10 13:13:48 oss02 kernel: LustreError: 12072:0:(ldlm_lockd.c:2368:ldlm_cancel_handler()) ldlm_cancel from 10.3.28.26@o2ib arrived at 1528603939 with bad export cookie 14370027470008155739
client:
|
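For anyone triaging similar reports, the OSS02 eviction message quoted in the description can be picked apart mechanically to see which client was evicted, by which target, and after what lock callback timeout. The following is a minimal sketch, not a Lustre tool: the function name parse_eviction and the regular expressions are illustrative assumptions, written only against the message format shown in this ticket, and may not match messages from other Lustre versions.

```python
import re

# Assumption: this pattern is derived only from the eviction message quoted
# in this ticket (Lustre 2.8.0); other versions may word the message differently.
EVICTION_RE = re.compile(
    r"lock callback timer expired after (?P<timeout>\d+)s: "
    r"evicting client at (?P<nid>\S+)\s+"
    r"ns: (?P<namespace>\S+)"
)

# Target names such as OST0014 embedded in the namespace string.
TARGET_RE = re.compile(r"(?:OST|MDT)[0-9a-fA-F]{4}")


def parse_eviction(line):
    """Return eviction details from a server log line, or None if no match."""
    m = EVICTION_RE.search(line)
    if m is None:
        return None
    target = TARGET_RE.search(m.group("namespace"))
    return {
        "timeout_s": int(m.group("timeout")),
        "client_nid": m.group("nid"),
        "namespace": m.group("namespace"),
        "target": target.group(0) if target else None,
    }


# Leading part of the OSS02 message from this ticket.
SAMPLE = (
    "Jun 10 13:12:21 oss02 kernel: LustreError: "
    "10566:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback "
    "timer expired after 150s: evicting client at 10.3.28.26@o2ib "
    "ns: filter-stjfs-OST0014_UUID lock: ffff8831290af000/0xc76c97daaf26c922 "
    "lrc: 3/0,0 mode: PW/PW"
)

if __name__ == "__main__":
    print(parse_eviction(SAMPLE))
```

Run over the server's /var/log/messages, a helper like this would make it easier to line up eviction times on the OSS with the errors seen on the affected client.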