Details
Technical task
Resolution: Fixed
Major
Lustre 2.5.0
9517
Description
Running racer with HSM operations, I see messages of the form:
LustreError: 4158:0:(ldlm_resource.c:1188:ldlm_resource_get()) lustre-OST0001: lvbo_init failed for resource 0x1936:0x0: rc = -2
LustreError: 11-0: lustre-OST0001-osc-ffff8801f01ff000: Communicating with 0@lo, operation ost_getattr failed with -12.
LustreError: 4140:0:(mdt_coordinator.c:1500:mdt_hsm_update_request_state()) lustre-MDT0000: Progress on [0x200000401:0x972f:0x0] for cookie 0x51faf3c6 action=ARCHIVE is not coherent (err=12 and not completed (flags=2))
after which the coordinator just stops sending actions to the copytool.
The coordinator seems to just drop these incoherent progress kernels (HPKs). Is there a use case for an HPK with hpk_errval != 0 that is not complete?
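To make the question concrete, here is a minimal standalone sketch of the kind of check that would produce the "not coherent" message. It is not the code from mdt_coordinator.c: the struct, the sketch_* names, and the flag values are assumptions for illustration (chosen so that flags=2 from the log has the completed bit clear), and the -EINVAL return is likewise assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define SKETCH_FLAG_COMPLETED 0x01
#define SKETCH_FLAG_RETRY     0x02

/* Hypothetical, trimmed-down stand-in for a progress kernel (HPK). */
struct sketch_progress {
	uint64_t cookie;  /* request cookie, as printed in the log */
	int      errval;  /* stands in for hpk_errval */
	int      flags;   /* stands in for the HPK flags field */
};

/* True when an error is reported but the completed flag is not set. */
static bool sketch_progress_is_incoherent(const struct sketch_progress *pgs)
{
	assert(pgs != NULL);
	return pgs->errval != 0 && !(pgs->flags & SKETCH_FLAG_COMPLETED);
}

/*
 * Returns 0 if the progress report is accepted, or -EINVAL (assumed)
 * if it is rejected. A rejected report is simply dropped here, which
 * mirrors the behaviour described in this ticket.
 */
static int sketch_update_request_state(const struct sketch_progress *pgs)
{
	if (sketch_progress_is_incoherent(pgs))
		return -EINVAL;
	return 0;
}
```

With the values from the log (err=12, flags=2), this sketch rejects the report; the same error with the completed bit set would be accepted.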
Do not be distracted by the specific errno here. The node is not really OOM; somewhere in the OST code a NULL value is misinterpreted as meaning -ENOMEM, whereas really it means -ENOENT or something similar (note that lvbo_init itself failed with rc = -2).
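The errno confusion can be illustrated with a small hypothetical sketch. This is not the actual OST code path, which this ticket does not identify; all sketch_* names are invented. It shows how a NULL return that really means "not found" turns into -ENOMEM when the caller guesses, and how returning an explicit error code avoids the guess.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical object table standing in for whatever the OST looks up. */
struct sketch_obj {
	int id;
};

static struct sketch_obj sketch_table[] = { { 1 }, { 2 } };

/* Returns NULL when the object does not exist; the reason is lost. */
static struct sketch_obj *sketch_lookup(int id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++)
		if (sketch_table[i].id == id)
			return &sketch_table[i];
	return NULL;
}

/* Misleading pattern: every NULL is assumed to be an allocation failure. */
static int sketch_getattr_ambiguous(int id)
{
	struct sketch_obj *obj = sketch_lookup(id);

	if (obj == NULL)
		return -ENOMEM; /* wrong guess: the object is just missing */
	return 0;
}

/* Alternative: report the real reason at the point where it is known. */
static int sketch_getattr_explicit(int id)
{
	struct sketch_obj *obj = sketch_lookup(id);

	if (obj == NULL)
		return -ENOENT;
	return 0;
}
```

Looking up a missing id yields -ENOMEM from the first variant and -ENOENT from the second, which matches the -12 versus -2 mismatch seen in the messages.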