Details
- Bug
- Resolution: Cannot Reproduce
- Critical
- None
- None
- 3
- 12530
Description
Some users have reported that the "rm" command is taking a long time. Investigation revealed that at least the first "rm" in a directory takes just over 100 seconds, which of course sounds like OBD_TIMEOUT_DEFAULT.
This isn't necessarily the simplest reproducer, but the following one is completely consistent:
- set directory striping default count to 48
- touch a file on client A
- rm file on client B
The clients are running 2.4.0-19chaos, servers are at 2.4.0-21chaos. The servers are using zfs as the backend.
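To confirm the delay on client B, it can help to time the unlink directly rather than eyeball it. The following is a minimal sketch, not part of the original report: `timed_unlink`, `looks_like_timeout`, and the 10 second slack are hypothetical names and values chosen for illustration, and the 100 second figure is simply the delay described above. It assumes the directory's default stripe count has already been set and the file was created on client A.

```python
import os
import sys
import time

# Assumption: 100 s is the delay the report associates with OBD_TIMEOUT_DEFAULT.
ASSUMED_TIMEOUT_S = 100.0


def timed_unlink(path):
    """Remove `path` and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    os.unlink(path)
    return time.monotonic() - start


def looks_like_timeout(elapsed, timeout=ASSUMED_TIMEOUT_S, slack=10.0):
    """Return True when `elapsed` is at or just over the assumed timeout.

    `slack` is an arbitrary illustrative margin for "just over".
    """
    return timeout <= elapsed <= timeout + slack


# Run on client B as: python3 time_rm.py /path/to/file/created/on/client/A
if __name__ == "__main__" and len(sys.argv) > 1:
    elapsed = timed_unlink(sys.argv[1])
    print(f"unlink took {elapsed:.1f} s")
    if looks_like_timeout(elapsed):
        print("delay is consistent with a ~100 s timeout")
```

Repeating the run for a second file in the same directory would show whether only the first "rm" is affected.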
I have some Lustre logs that I will share and discuss in additional posts to this ticket. Essentially, it looks like the server always times out on an AST to client A (explaining the 100 second delay). It is not yet clear to me why that happens, because client A appears to be completely responsive. My current suspicion is that the MDT is to blame.
Issue Links
- duplicates
  - LU-4963 client eviction during IOR test - lock callback timer expired (Closed)
- is related to
  - LU-5525 ASSERTION( new_lock->l_readers + new_lock->l_writers == 0 ) failed (Resolved)
  - LU-5632 ldlm_lock_addref()) ASSERTION( lock != ((void *)0) ) (Resolved)
  - LU-5686 (mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed (Resolved)
- is related to
  - LU-2827 mdt_intent_fixup_resent() cannot find the proper lock in hash (Resolved)