Details
- Bug
- Resolution: Fixed
- Minor
Description
I do not have much to share except the attached reproducer.
The key elements of the reproducer seem to be:
1. set up Lustre with two mountpoints (A and B);
2. create a file;
3. launch a copytool on mountpoint A;
4. suspend the copytool;
5. archive the file created at step 2 from mountpoint A*;
6. delete the file on mountpoint B;
7. sync;
8. resume the copytool (the output of the copytool should indicate that llapi_hsm_action_begin() failed with EIO, not ENOENT);
9. umount => the process hangs in an unkillable state.
*You can use mountpoint B at step 5, but only if you created the file from mountpoint A.
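For readers without access to the attachment, the sequence above can be sketched roughly as follows. This is not the attached reproducer, only an illustration of the ordering. It assumes both mountpoints are already mounted and HSM is enabled on the MDT. The function name `reproduce`, its parameters, the file name, and the sleep durations are all hypothetical. The report does not say which mountpoint is unmounted at the end; the sketch assumes B. By default it only records the commands (dry run) and executes nothing.

```python
import os
import signal
import subprocess
import time


def reproduce(mnt_a, mnt_b, hsm_root, name="repro_file", dry_run=True):
    """Walk through the steps in order and return the commands as strings.

    With dry_run=True nothing is executed; the plan is only recorded.
    """
    plan = []

    def run(*argv):
        # Record the command, and execute it unless this is a dry run.
        plan.append(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

    path_a = os.path.join(mnt_a, name)
    path_b = os.path.join(mnt_b, name)

    # Create the file through mountpoint A.
    run("touch", path_a)

    # Launch a POSIX copytool on mountpoint A (archive id 1 is an assumption).
    copytool_argv = ["lhsmtool_posix", "--hsm-root", hsm_root,
                     "--archive", "1", mnt_a]
    plan.append(" ".join(copytool_argv) + " &")
    copytool = None
    if not dry_run:
        copytool = subprocess.Popen(copytool_argv)
        time.sleep(2)  # assumed delay so the copytool can register

    # Suspend the copytool so the archive request stays pending.
    plan.append("kill -STOP <copytool pid>")
    if copytool is not None:
        os.kill(copytool.pid, signal.SIGSTOP)

    # Request the archive from A, then delete the file through B and sync.
    run("lfs", "hsm_archive", "--archive", "1", path_a)
    run("rm", path_b)
    run("sync")

    # Resume the copytool; it should now report the failed action.
    plan.append("kill -CONT <copytool pid>")
    if copytool is not None:
        os.kill(copytool.pid, signal.SIGCONT)
        time.sleep(2)  # assumed delay so the copytool can process the request

    # Unmount (assumed: mountpoint B); this is where the hang was observed.
    run("umount", mnt_b)
    return plan
```

Calling `reproduce("/mnt/lustre", "/mnt/lustre2", "/tmp/arc")` returns the planned command list without touching the filesystem; pass `dry_run=False` only on a disposable test setup.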
I added some debug output to the reproducer; it should be logged in /tmp.
I suspect these two lines from dmesg are related to this issue (they are logged at umount time):
[ 143.575078] LustreError: 3703:0:(ldlm_resource.c:1094:ldlm_resource_complain()) filter-lustre-OST0000_UUID: namespace resource [0x2:0x0:0x0].0x0 (ffff8806ab7b6900) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 143.578233] LustreError: 3703:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2:0x0:0x0].0x0 (ffff8806ab7b6900) refcount = 2
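When re-running the reproducer, it can help to pull such messages out of a longer dmesg capture. The following is a minimal sketch that assumes the line layout seen in the two messages above (timestamp, `LustreError:`, pid, source file, line number, function). The names `LUSTRE_ERROR` and `parse_lustre_errors` are hypothetical, and the pattern is not guaranteed to match every Lustre log format.

```python
import re

# Pattern inferred from the two messages quoted in this report only.
LUSTRE_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+"
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<msg>.*)"
)


def parse_lustre_errors(dmesg_text):
    """Return one dict per LustreError line found in dmesg_text."""
    records = []
    for line in dmesg_text.splitlines():
        match = LUSTRE_ERROR.search(line)
        if match:
            records.append(match.groupdict())
    return records
```

Filtering the result on `func == "ldlm_resource_complain"` isolates the refcount complaints logged at umount time.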
Note: the title should probably be updated once we figure out exactly what the issue is.
Activity
Reporter | Original: Quentin Bouget [ bougetq ] | New: CEA [ cealustre ] |
Link | Original: This issue is related to JFC-19 [ JFC-19 ] |
Link | Original: This issue is related to JFC-10 [ JFC-10 ] |
Link | New: This issue is related to JFC-20 [ JFC-20 ] |
Fix Version/s | New: Lustre 2.11.0 [ 13091 ] |
Assignee | Original: Bruno Faccini [ bfaccini ] | New: John Hammond [ jhammond ] |
Resolution | New: Fixed [ 1 ] |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Link | New: This issue is related to JFC-19 [ JFC-19 ] |
Link | New: This issue is related to JFC-10 [ JFC-10 ] |
Landed for 2.11