HSM _not only_ small fixes and to do list goes here
(LU-3647)
|
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Technical task | Priority: | Major |
| Reporter: | John Hammond | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM | ||
| Rank (Obsolete): | 9517 |
| Description |
|
Running racer with HSM operations, I see messages of the form:
    LustreError: 4158:0:(ldlm_resource.c:1188:ldlm_resource_get()) lustre-OST0001: lvbo_init failed for resource 0x1936:0x0: rc = -2
    LustreError: 11-0: lustre-OST0001-osc-ffff8801f01ff000: Communicating with 0@lo, operation ost_getattr failed with -12.
    LustreError: 4140:0:(mdt_coordinator.c:1500:mdt_hsm_update_request_state()) lustre-MDT0000: Progress on [0x200000401:0x972f:0x0] for cookie 0x51faf3c6 action=ARCHIVE is not coherent (err=12 and not completed (flags=2))
after which the coordinator just stops sending actions to the copytool. The coordinator seems to simply drop these incoherent progress kernels. Is there a use case for an HPK with hpk_errval != 0 that is not complete?
Do not be distracted by the specific errno here. The node is not really OOM; it's just that somewhere in the OST code a NULL result is misinterpreted as -ENOMEM, whereas it really means -ENOENT or something similar. |
| Comments |
| Comment by Aurelien Degremont (Inactive) [ 05/Aug/13 ] |
|
Hi John, I've looked at this. Indeed, HP_FLAG_COMPLETED is not set in the error paths of copy_start(), but copy_end() seems fine. Could you confirm? |
| Comment by Aurelien Degremont (Inactive) [ 05/Aug/13 ] |
|
You could assign this ticket to me. |
| Comment by John Hammond [ 05/Aug/13 ] |
|
Hi Aurelien, you are correct about ll_ioc_copy_end(). My mistake. I reproduced this by adding an HSM archive/release/restore loop to racer, but it can be triggered more directly by racing unlink against archive. It seems that I cannot assign this issue to you, since JIRA does not consider you to be a "developer." My condolences. I will see about adding you to that group. |
| Comment by jacques-charles lafoucriere [ 05/Aug/13 ] |
|
"It seems that I cannot assign this issue to you since JIRA does not consider you to be a "developer." My condolences. I will and see about adding you to that group" |
| Comment by Aurelien Degremont (Inactive) [ 07/Aug/13 ] |
|
Patch for this: http://review.whamcloud.com/7265 |
| Comment by John Hammond [ 19/Aug/13 ] |
|
Patch landed to master. |