Lustre / LU-10302

hsm: obscure bug with multi-mountpoints and ldlm

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.11.0

    Description

      I do not have much to share except the attached reproducer.

      The key elements of the reproducer seem to be:

      1. set up Lustre with two mountpoints;
      2. create a file;
      3. launch a copytool on mountpoint A;
      4. suspend the copytool;
      5. archive the file created at step 2 from mountpoint A*;
      6. delete the file on mountpoint B;
      7. sync;
      8. resume the copytool (its output should indicate that llapi_hsm_action_begin() failed with EIO rather than ENOENT);
      9. umount: the process hangs in an unkillable state.

      *You can use mountpoint B at step 5, but only if you created the file from mountpoint A.
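
      The steps above can be sketched as a small driver script. This is not the attached reproducer, only a rough outline of the same sequence. It assumes both mountpoints already exist (step 1), that the copytool is lhsmtool_posix with archive ID 1, that "suspend" and "resume" mean SIGSTOP and SIGCONT, and that step 9 unmounts mountpoint A. The paths, the file name, and the helper names reproducer_plan() and execute() are hypothetical.

```python
import os
import signal
import subprocess


def reproducer_plan(mnt_a, mnt_b, hsm_root, name="testfile"):
    """Return the reproducer as an ordered list of (kind, payload) steps.

    kind is "run" (argv to run and wait for), "spawn" (argv started in
    the background as the copytool) or "signal" (signal sent to the
    copytool). Step 1, mounting Lustre twice, is assumed to be done.
    """
    file_a = os.path.join(mnt_a, name)
    file_b = os.path.join(mnt_b, name)
    return [
        ("run", ["touch", file_a]),                 # step 2: create a file
        ("spawn", ["lhsmtool_posix", "--hsm-root", hsm_root,
                   "--archive=1", mnt_a]),          # step 3: copytool on A
        ("signal", signal.SIGSTOP),                 # step 4: suspend it
        ("run", ["lfs", "hsm_archive", file_a]),    # step 5: archive from A
        ("run", ["rm", file_b]),                    # step 6: delete via B
        ("run", ["sync"]),                          # step 7
        ("signal", signal.SIGCONT),                 # step 8: resume copytool
        ("run", ["umount", mnt_a]),                 # step 9: assumed to be A
    ]


def execute(plan):
    """Run the plan for real. Requires root and a Lustre test setup.

    No waiting is done here; a real run would likely need to give the
    copytool time to register before it is stopped.
    """
    copytool = None
    for kind, payload in plan:
        if kind == "run":
            subprocess.run(payload, check=False)
        elif kind == "spawn":
            copytool = subprocess.Popen(payload)
        elif kind == "signal":
            copytool.send_signal(payload)


if __name__ == "__main__":
    # Dry run: only print the steps, nothing is executed.
    for kind, payload in reproducer_plan("/mnt/lustre", "/mnt/lustre2",
                                         "/tmp/hsm_root"):
        print(kind, payload)
```

      Keeping the plan as data makes it easy to print or reorder the steps before running them, for example to try the variant from the footnote by passing mountpoint B's path to step 5.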

      I added some debug output to the reproducer; it should be logged under /tmp.

      I suspect these two lines in dmesg are related to this issue (they are logged at umount time):

      [  143.575078] LustreError: 3703:0:(ldlm_resource.c:1094:ldlm_resource_complain()) filter-lustre-OST0000_UUID: namespace resource [0x2:0x0:0x0].0x0 (ffff8806ab7b6900) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [  143.578233] LustreError: 3703:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2:0x0:0x0].0x0 (ffff8806ab7b6900) refcount = 2
      

      Note: the title should probably be updated once we figure out exactly what the issue is.

            People

              jhammond John Hammond
              cealustre CEA