[LU-5655] lhsmtool_posix (copytool agent) does not provide a facility to unregister Created: 24/Sep/14 Updated: 25/Feb/15 Resolved: 25/Feb/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.6.0, Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Critical |
| Reporter: | Vinayak Hariharmath (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | HSM | ||
| Environment: |
CentOS 6.5 |
||
| Attachments: |
|
| Epic: | client |
| Project: | HSM |
| Rank (Obsolete): | 15853 |
| Description |
|
Steps:
# lctl set_param mdt.lustre-MDT0000.hsm_control=enabled
3. Started the copytool daemon (only one copytool agent can run on a client):
# lhsmtool_posix --daemon --hsm-root /tmp/HSM --archive=1 /mnt/lustre/
4. According to the Lustre manual, the only way to stop the agent is to send it a TERM signal, so I killed it (I wanted to run a modified copytool agent):
# ps -ef | grep lhs
root 4017 1 0 16:54 ? 00:00:00 lhsmtool_posix --daemon --hsm-root /tmp/HSM --archive=1 /mnt/lustre/
root 4045 2110 0 16:55 pts/1 00:00:00 grep lhs
5. I then tried to start a new copytool agent:
# lhsmtool_posix --daemon --hsm-root /tmp/HSM --archive=1 /mnt/lustre/
But I got the following message from the kernel:
Sep 11 16:55:34 localhost kernel: Lustre: HSM agent b150c068-22f2-83cd-21b0-2b4e76a3082a already registered. |
| Comments |
| Comment by Vinayak Hariharmath (Inactive) [ 24/Sep/14 ] |
|
I feel that once the daemon is killed, it should get unregistered. |
| Comment by Andreas Dilger [ 26/Sep/14 ] |
|
Is this a new problem in 2.6.0 or does the same problem exist in 2.5? |
| Comment by Robert Read (Inactive) [ 29/Sep/14 ] |
|
I'm not able to reproduce this on 2.5.3. Whenever I kill the daemon, it stops running and is no longer registered. It's not clear from the description whether the ps in step 4 was run before or after you killed the copytool. If it was after, then clearly the daemon is still running for some reason, and so it is still registered. |
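The question above (has the daemon really exited before the restart?) can be checked without the `grep lhs` self-match that shows up in the `ps` output of the description. A minimal sketch, assuming a Linux `/proc` filesystem; the helper name `is_running` is hypothetical and not part of Lustre:

```shell
# Hypothetical helper (not part of Lustre): succeed if any process
# currently has the given command name.
# It reads /proc/<pid>/comm, which the kernel truncates to 15 characters
# ("lhsmtool_posix" is 14 characters, so it fits).
is_running() {
    local want=$1 f name
    for f in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read; skip it.
        { read -r name < "$f"; } 2>/dev/null || continue
        [ "$name" = "$want" ] && return 0
    done
    return 1
}
```

For example, `is_running lhsmtool_posix && echo "copytool still running"` run after the kill would show whether the old agent is still alive. This only checks the client-side process; it says nothing about the registration state held by the MDT.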
| Comment by Vinayak Hariharmath (Inactive) [ 14/Oct/14 ] |
|
The ps in step 4 was run before I killed the copytool; I killed it after that. Later I tried to start the copytool again, but I got the message below. I am using the b2_6 branch.
Oct 14 17:53:59 localhost kernel: Lustre: HSM agent 0f4d5578-4a43-df67-a8b1-ce31a2c2cd3a already registered |
| Comment by Robert Read (Inactive) [ 30/Oct/14 ] |
|
I'm seeing this on b2_6 as well, so this appears to be new in 2.6.0. |
| Comment by Jodi Levi (Inactive) [ 02/Dec/14 ] |
|
Bruno, |
| Comment by Bruno Faccini (Inactive) [ 03/Dec/14 ] |
|
Ok. I just wonder whether copytool death and unregister could take longer than before in b2_6, and whether this ticket could then be related to |
| Comment by Bruno Faccini (Inactive) [ 08/Dec/14 ] |
|
BTW I am unable to reproduce it with master/b2_6 builds. |
| Comment by Robert Read (Inactive) [ 08/Dec/14 ] |
|
I just tried again on a recent master build and wasn't able to reproduce it, either. I don't recall the exact version of 2.6 I had been using, but the configuration would have been a single-node setup using llmount.sh plus an additional mountpoint for the copytool. |
| Comment by Bruno Faccini (Inactive) [ 11/Dec/14 ] |
|
Vinayak, are you still able to reproduce the problem at your site? |
| Comment by Vinayak Hariharmath (Inactive) [ 11/Dec/14 ] |
|
Sure. I will verify it on my side and update the bug. Sorry, I was stuck with other work and was not able to update it. |
| Comment by Bruno Faccini (Inactive) [ 20/Jan/15 ] |
|
Vinayak, any update? Did you spend some time on more testing of this issue? Could it be, as I suspected above, that the copytool unregister takes longer? |
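If the unregister does lag behind the kill, a test script can wait with a timeout before restarting the copytool instead of restarting immediately. The following is only a sketch under that assumption; `wait_until` and `pid_gone` are hypothetical helper names, not Lustre utilities:

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) expires.
wait_until() {
    local timeout=$1 elapsed=0
    shift
    while ! "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Succeeds when no process with the given PID exists any more.
pid_gone() {
    ! kill -0 "$1" 2>/dev/null
}
```

A possible use is `kill -TERM "$pid"; wait_until 30 pid_gone "$pid"` before starting the new `lhsmtool_posix` instance. Process exit alone does not prove that the MDT has dropped the registration; on the MDS, `lctl get_param mdt.*.hsm.agents` should list the agents that are still registered, and the same `wait_until` loop could poll for the old UUID to disappear.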
| Comment by Vinayak Hariharmath (Inactive) [ 21/Jan/15 ] |
|
Hi, yes, I have spent a bit of time verifying it.
Lustre: HSM agent 277f9613-7a7d-1caa-5849-0b5ffe11fdbb already registered
but there is no problem with the copytool, which works fine (earlier, with the above error message, the copytool was failing to start after being killed). I guess it only needs a grammar correction. Dmesg logs attached. |
| Comment by Bruno Faccini (Inactive) [ 24/Feb/15 ] |
|
Hello Vinayak, |
| Comment by Vinayak Hariharmath (Inactive) [ 25/Feb/15 ] |
|
Hi Bruno, we can close this as "Cannot Reproduce", but there is one more ticket, LU-5216, that looks a bit similar to this issue. I am trying to reproduce it and work out how it relates to this one. Thanks |
| Comment by Bruno Faccini (Inactive) [ 25/Feb/15 ] |
|
According to the description in LU-5216, if both tickets are related, this one would most likely be only a consequence of LU-5216, caused by an orphan copytool thread still being present or by deferred or missing cleanup of a related data structure. |