LU-3685 (Lustre; sub-task of LU-3647 "HSM _not only_ small fixes and to do list goes here")

some paths in ll_ioc_copy_{start,end} set hpk_errval non-zero but don't set HP_FLAG_COMPLETED

Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 9517

    Description

      Running racer with HSM operations I see messages of the form:

      LustreError: 4158:0:(ldlm_resource.c:1188:ldlm_resource_get()) lustre-OST0001: lvbo_init failed for resource 0x1936:0x0: rc = -2
      LustreError: 11-0: lustre-OST0001-osc-ffff8801f01ff000: Communicating with 0@lo, operation ost_getattr failed with -12.
      LustreError: 4140:0:(mdt_coordinator.c:1500:mdt_hsm_update_request_state()) lustre-MDT0000: Progress on [0x200000401:0x972f:0x0] for cookie 0x51faf3c6 action=ARCHIVE is not coherent (err=12 and not completed (flags=2))
      

      after which the coordinator just stops sending actions to the copytool.

      The coordinator seems to just drop these incoherent progress kernels (HPKs). Is there a use case for an HPK with hpk_errval != 0 that is not marked complete?
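The coherence rule implied by the log message above can be sketched in C as follows. This is only an illustration under stated assumptions: `struct progress_sketch` and `progress_is_coherent()` are hypothetical stand-ins, not the real Lustre structure or coordinator code, and the numeric flag values are assumed (chosen so that `flags=2` in the log corresponds to a retry flag without the completed flag).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; only the names come from this ticket. */
#define HP_FLAG_COMPLETED 0x01
#define HP_FLAG_RETRY     0x02

/* Hypothetical, simplified stand-in for a progress report (HPK). */
struct progress_sketch {
	int32_t  hpk_errval; /* 0 while healthy, errno value on failure */
	uint32_t hpk_flags;  /* bitmask of HP_FLAG_* */
};

/*
 * A report that carries an error but is not marked completed is treated
 * as incoherent, matching the "err=12 and not completed (flags=2)" case.
 */
bool progress_is_coherent(const struct progress_sketch *p)
{
	if (p->hpk_errval != 0 && !(p->hpk_flags & HP_FLAG_COMPLETED))
		return false;
	return true;
}
```

Under these assumptions, a report with `hpk_errval = 12` and `hpk_flags = HP_FLAG_RETRY` is rejected, while the same error with `HP_FLAG_COMPLETED` also set is accepted.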

      Do not be distracted by the specific errno here. The node is not really OOM, it's just that somewhere in the OST code a NULL something is misinterpreted as meaning -ENOMEM, whereas really it means -ENOENT or something.

        Activity

          jhammond John Hammond added a comment -

          Patch landed to master.

          adegremont Aurelien Degremont (Inactive) added a comment -

          Patch for this: http://review.whamcloud.com/7265

          jcl jacques-charles lafoucriere added a comment -

          "It seems that I cannot assign this issue to you since JIRA does not consider you to be a "developer." My condolences. I will see about adding you to that group."

          In the past it was restricted to Whamcloud/Intel employees (because this group may see things that non-corporate guys like us should not see). If possible, add Thomas, Henri, and myself.

          jhammond John Hammond added a comment -

          Hi Aurelien,

          You are correct about ll_ioc_copy_end(). My mistake.

          I reproduced this by adding an HSM archive, release, restore loop to racer. But it can be done more specifically by racing unlink versus archive.

          It seems that I cannot assign this issue to you since JIRA does not consider you to be a "developer." My condolences. I will see about adding you to that group.


          adegremont Aurelien Degremont (Inactive) added a comment -

          You could assign this ticket to me.

          adegremont Aurelien Degremont (Inactive) added a comment -

          Hi John,

          I've looked at this. Indeed, HP_FLAG_COMPLETED is missing on error cases for copy_start(), but everything seems fine for copy_end(). Could you confirm?
          By the way, do you have a way to reproduce this?
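The kind of fix discussed in this comment, pairing every error with the completed flag, might look like the sketch below. It is not the patch from http://review.whamcloud.com/7265; `struct progress_sketch` and `progress_set_error()` are hypothetical names introduced for illustration, and the flag values are assumptions.

```c
#include <stdint.h>

/* Assumed flag values; only the names come from this ticket. */
#define HP_FLAG_COMPLETED 0x01
#define HP_FLAG_RETRY     0x02

/* Hypothetical, simplified stand-in for a progress report (HPK). */
struct progress_sketch {
	int32_t  hpk_errval; /* 0 while healthy, errno value on failure */
	uint32_t hpk_flags;  /* bitmask of HP_FLAG_* */
};

/*
 * Record a failure in one place so that an error value is never
 * reported without HP_FLAG_COMPLETED. The retry flag is optional.
 */
void progress_set_error(struct progress_sketch *p, int32_t errval,
			int retryable)
{
	p->hpk_errval = errval;
	p->hpk_flags |= HP_FLAG_COMPLETED;
	if (retryable)
		p->hpk_flags |= HP_FLAG_RETRY;
}
```

Routing every error path through a single helper like this is one way to keep the error value and the completed flag from drifting apart; the actual change may be structured differently.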

          People

            Assignee: jay Jinshan Xiong (Inactive)
            Reporter: jhammond John Hammond
            Votes: 0
            Watchers: 5
