Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • 3
    • 15181

    Description

      Several users run the copytool and a Lustre client on the same node and use the same mount point for both. In LU-4727, the following comment was made:

       I strongly recommend using a dedicated mount point for the copytool. This should be somewhere in the HSM documentation.
      

      In the documentation (Lustre manual) the closest thing to this recommendation is in section 22.2.1 "Requirements" :

      a minimum of 2 clients, 1 used for your chosen computation task that generates useful data, and 1 used as an agent.
      

      We need to add an explicit recommendation to run the copytool on a dedicated mount point, in section 22.3 "Agents and copytool".
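
      To make the recommendation concrete, here is a minimal dry-run sketch of a node that keeps one mount point for users and a second, dedicated one for the copytool. It only prints the commands. The mount points /mnt/l1 and /mnt/l2 and the archive root /vsm/tasfs1 are taken from the comments on this ticket; MGSNID, FSNAME and ARCHIVE_ID are hypothetical placeholders that must be adapted to the site.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for a node with a dedicated copytool mount.
# Assumptions (not from the ticket): MGSNID, FSNAME and ARCHIVE_ID are
# hypothetical placeholders; replace them with site-specific values.
MGSNID=${MGSNID:-mgs@tcp}
FSNAME=${FSNAME:-lustre}
USER_MNT=${USER_MNT:-/mnt/l1}      # mount point used by regular client I/O
AGENT_MNT=${AGENT_MNT:-/mnt/l2}    # mount point reserved for the copytool
HSM_ROOT=${HSM_ROOT:-/vsm/tasfs1}  # POSIX archive root seen in the log output
ARCHIVE_ID=${ARCHIVE_ID:-1}

print_agent_setup() {
    # The same filesystem is mounted twice on the same node.
    echo "mount -t lustre $MGSNID:/$FSNAME $USER_MNT"
    echo "mount -t lustre $MGSNID:/$FSNAME $AGENT_MNT"
    # Only the copytool is pointed at the dedicated mount point.
    echo "lhsmtool_posix --daemon --hsm-root $HSM_ROOT --archive=$ARCHIVE_ID $AGENT_MNT"
}
```

      Calling print_agent_setup prints the three commands for review before anything is run as root.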

          Activity

            [LUDOC-252] Copytool Recommendations - Add/Clarify

            It should work. You hit LU-5683.

            hdoreau Henri Doreau (Inactive) added a comment

            I've tried that, and it leads to errors.

            For instance, I have the same Lustre filesystem mounted on /mnt/l1 and /mnt/l2. I run the copytool on /mnt/l2 and issue the following "lfs hsm_*" commands on /mnt/l1, spaced out in time so that each command completes before the next one is issued:

            rm -f /mnt/l1/share/ls
            cp /bin/ls /mnt/l1/share/ls
            lfs hsm_archive /mnt/l1/share/ls
            lfs hsm_release /mnt/l1/share/ls
            lfs hsm_restore /mnt/l1/share/ls
            lfs hsm_remove /mnt/l1/share/ls
            lfs hsm_archive /mnt/l1/share/ls
            

            The last archive command fails. The copytool logs:

            lhsmtool_posix[19897]: '[0x200002b10:0x98:0x0]' action ARCHIVE reclen 72, cookie=0x54185972
            lhsmtool_posix[19897]: processing file 'share/ls'
            lhsmtool_posix[19897]: archiving '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' to '/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp'
            lhsmtool_posix[19897]: saving stripe info of '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' in /vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp.lov
            lhsmtool_posix[19897]: going to copy data from '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' to '/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp'
            lhsmtool_posix[19897]: progress ioctl for copy '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0'->'/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp' failed: No such file or directory (2)
            lhsmtool_posix[19897]: data copy failed from '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' to '/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp': No such file or directory (2)
            lhsmtool_posix[19897]: Action completed, notifying coordinator cookie=0x54185972, FID=[0x200002b10:0x98:0x0], hp_flags=0 err=2
            lhsmtool_posix[19897]: llapi_hsm_action_end() on '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' failed: No such file or directory (2)
            

            If I wait a bit and restart the archive command, it succeeds.
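
            The "staged in time" part of the sequence above can be sketched as a small helper that polls the file until no HSM action is pending before issuing the next command. This is an illustration only, not a fix for the failure: it assumes that "lfs hsm_action FILE" prints a line containing NOOP when nothing is pending, and the names wait_hsm_idle, hsm_step, POLL_INTERVAL and MAX_POLLS are hypothetical.

```shell
#!/bin/sh
# Sketch: issue one HSM request, then wait until the file has no pending action.
# Assumption: `lfs hsm_action FILE` reports NOOP when no action is in progress.
LFS=${LFS:-lfs}                     # overridable, e.g. with a stub for testing
POLL_INTERVAL=${POLL_INTERVAL:-1}   # seconds between polls
MAX_POLLS=${MAX_POLLS:-60}          # give up after this many polls

wait_hsm_idle() {
    file=$1
    polls=0
    while [ "$polls" -lt "$MAX_POLLS" ]; do
        # Idle as soon as the reported current action is NOOP.
        if $LFS hsm_action "$file" | grep -q 'NOOP'; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$POLL_INTERVAL"
    done
    echo "timed out waiting for HSM action on $file" >&2
    return 1
}

hsm_step() {
    # Usage: hsm_step archive|release|restore|remove FILE
    action=$1
    file=$2
    $LFS "hsm_$action" "$file" || return 1
    wait_hsm_idle "$file"
}
```

            With these helpers, the reproducer becomes a chain such as "hsm_step archive /mnt/l1/share/ls && hsm_step release /mnt/l1/share/ls", continuing with restore, remove and the final archive.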

            fzago Frank Zago (Inactive) added a comment

            People

              LM-Triage Lustre Manual Triage
              jamesanunez James Nunez (Inactive)