[LUDOC-252] Copytool Recommendations - Add/Clarify
Created: 05/Aug/14  Updated: 15/Sep/16

Status: Open
Project: Lustre Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Lustre Manual Triage
Resolution: Unresolved Votes: 0
Labels: hsm

Issue Links:
Related
is related to LU-4727 Lhsmtool_posix process stuck in ll_la... Resolved
Severity: 3
Rank (Obsolete): 15181

 Description   

Several users run the copytool and a Lustre client on the same node and use the same mount point for both. LU-4727 contains the following comment:

 I strongly recommend using a dedicated mount point for the copytool. This should be somewhere in the HSM documentation.

In the Lustre manual, the closest thing to this recommendation is in section 22.2.1 "Requirements":

a minimum of 2 clients, 1 used for your chosen computation task that generates useful data, and 1 used as an agent.

We need to add an additional recommendation to run the copytool on a dedicated mount point in either section 22.3 "Agents and copytool"
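A minimal sketch of what such a recommendation could look like on an agent node, assuming the file system is simply mounted a second time for the copytool's exclusive use. The MGS NID, file system name, mount points, and archive path below are hypothetical placeholders, and the lhsmtool_posix options follow the usage shown in the Lustre manual. The function only prints the commands so that they can be reviewed before being run as root.

```shell
# All values below are hypothetical; adjust them to the site.
MGS_NID="mgs@tcp"                 # NID of the MGS
FSNAME="lustre"                   # Lustre file system name
CLIENT_MNT="/mnt/lustre"          # mount point used for regular client I/O
COPYTOOL_MNT="/mnt/lustre-hsm"    # second mount point, used only by the copytool
HSM_ROOT="/archive/hsm"           # POSIX archive directory for lhsmtool_posix

# Print (do not execute) the commands that give the copytool its own mount point.
dedicated_copytool_setup() {
    echo "mkdir -p ${COPYTOOL_MNT}"
    echo "mount -t lustre ${MGS_NID}:/${FSNAME} ${COPYTOOL_MNT}"
    echo "lhsmtool_posix --daemon --hsm-root ${HSM_ROOT} --archive=1 ${COPYTOOL_MNT}"
}

dedicated_copytool_setup
```

Regular jobs would keep using the mount point in CLIENT_MNT, so the copytool's I/O never goes through it.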



 Comments   
Comment by Frank Zago (Inactive) [ 17/Sep/14 ]

I've tried that, and it leads to errors.

For instance, I have the same Lustre filesystem mounted on /mnt/l1 and /mnt/l2. I run the copytool on /mnt/l2 and issue the following "lfs hsm*" commands on /mnt/l1 (spaced out in time so that each command completes before the next one is issued):

rm -f /mnt/l1/share/ls
cp /bin/ls /mnt/l1/share/ls
lfs hsm_archive /mnt/l1/share/ls
lfs hsm_release /mnt/l1/share/ls
lfs hsm_restore /mnt/l1/share/ls
lfs hsm_remove /mnt/l1/share/ls
lfs hsm_archive /mnt/l1/share/ls

The last archive command will fail.

lhsmtool_posix[19897]: '[0x200002b10:0x98:0x0]' action ARCHIVE reclen 72, cookie=0x54185972
lhsmtool_posix[19897]: processing file 'share/ls'
lhsmtool_posix[19897]: archiving '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' to '/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp'
lhsmtool_posix[19897]: saving stripe info of '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' in /vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp.lov
lhsmtool_posix[19897]: going to copy data from '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' to '/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp'
lhsmtool_posix[19897]: progress ioctl for copy '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0'->'/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp' failed: No such file or directory (2)
lhsmtool_posix[19897]: data copy failed from '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' to '/vsm/tasfs1/0098/0000/2b10/0000/0002/0000/0x200002b10:0x98:0x0_tmp': No such file or directory (2)
lhsmtool_posix[19897]: Action completed, notifying coordinator cookie=0x54185972, FID=[0x200002b10:0x98:0x0], hp_flags=0 err=2
lhsmtool_posix[19897]: llapi_hsm_action_end() on '/mnt/l2/.lustre/fid/0x200002b10:0x98:0x0' failed: No such file or directory (2)
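The staging of the "lfs hsm*" requests used in this reproduction could be scripted roughly as follows. This is a sketch, not the script that was actually used: the helper names wait_hsm_idle and hsm_step are hypothetical, and it assumes that "lfs hsm_action" reports NOOP once no request is in flight for the file.

```shell
# Command used to talk to Lustre; overridable so the helpers can be tried without a file system.
LFS="${LFS:-lfs}"

# Poll until no HSM action is pending for the file.
# Assumption: "lfs hsm_action <file>" prints NOOP when the file is idle.
wait_hsm_idle() {
    file="$1"
    tries="${2:-30}"              # number of one-second polls before giving up
    while [ "$tries" -gt 0 ]; do
        if "$LFS" hsm_action "$file" | grep -q "NOOP"; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Issue one "lfs hsm_<op>" request, then wait until it is no longer in flight.
hsm_step() {
    "$LFS" "hsm_$1" "$2" && wait_hsm_idle "$2"
}

# Example (not executed here):
#   for op in archive release restore remove archive; do
#       hsm_step "$op" /mnt/l1/share/ls || break
#   done
```

Note that an idle state only means the request is no longer running; whether it succeeded still has to be checked in the copytool log or with "lfs hsm_state".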

If I wait a bit and restart the archive command, it succeeds.
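The wait-and-retry workaround can be sketched as a small helper. The function names, the delay, and the number of attempts are hypothetical, and the check assumes that "lfs hsm_state" lists the "archived" flag once the copy has been made. Because "lfs hsm_archive" only submits the request, the helper looks at the resulting state rather than at the exit status of the request.

```shell
# Command used to talk to Lustre; overridable so the helper can be tried without a file system.
LFS="${LFS:-lfs}"
RETRY_DELAY="${RETRY_DELAY:-5}"   # seconds to wait after each request (hypothetical value)

# Assumption: "lfs hsm_state <file>" includes the word "archived" for an archived file.
is_archived() {
    "$LFS" hsm_state "$1" | grep -q "archived"
}

# Request an archive, wait, and re-issue the request if the file is still not archived.
archive_with_retry() {
    file="$1"
    attempts="${2:-3}"
    while [ "$attempts" -gt 0 ]; do
        "$LFS" hsm_archive "$file"
        sleep "$RETRY_DELAY"
        if is_archived "$file"; then
            return 0
        fi
        attempts=$((attempts - 1))
    done
    return 1
}

# Example (not executed here):
#   archive_with_retry /mnt/l1/share/ls 3
```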

Comment by Henri Doreau (Inactive) [ 02/Apr/15 ]

It should work.
You hit LU-5683.
