[LU-13332] Can't start lhsmtool Created: 05/Mar/20  Updated: 05/Mar/20  Resolved: 05/Mar/20

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Ben Evans (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None


 Description   

I'm not sure what I am doing wrong; I followed the documentation.
I have hsm_control enabled on the MDT:

nbptest3-srv1 ~ # lctl get_param mdt.*.hsm_control
mdt.nbptest3-MDT0000.hsm_control=enabled

When I try to start the HSM copytool, I get an error:

service198 ~ # lhsmtool_posix --hsm-root /mnt/pcc -v /nobackuptest3 
lhsmtool_posix: 1583393927.046985 lhsmtool_posix[10762]: action=0 src=(null) dst=(null) mount_point=/nobackuptest3
lhsmtool_posix: cannot start copytool on '/nobackuptest3': No such device or address (6)
lhsmtool_posix: 1583393927.080040 lhsmtool_posix[10762]: cannot start copytool interface: No such device or address (6)
lhsmtool_posix: 1583393927.080134 lhsmtool_posix[10762]: process finished, errs: 0 major, 0 minor, rc=-6 (No such device or address)
service198 ~ # df
Filesystem                     1K-blocks      Used    Available Use% Mounted on
devtmpfs                        16300832         0     16300832   0% /dev
tmpfs                           16315784         4     16315780   1% /dev/shm
tmpfs                           16315784     14768     16301016   1% /run
/dev/sda32                     237959828   9766176    216099292   5% /
tmpfs                           16315784         0     16315784   0% /sys/fs/cgroup
/dev/sda11                       1176704     72300      1042964   7% /boot
10.151.27.53@o2ib:/nbptest3 311935815440     14440 296206997160   1% /nobackuptest3
/dev/nvme0n1                  1537235176  46143840   1412934264   4% /mnt/pcc
tmpfs                            3263160         0      3263160   0% /run/user/11312


 Comments   
Comment by Peter Jones [ 05/Mar/20 ]

Ben

Could you please advise?

Thanks

Peter

Comment by Ben Evans (Inactive) [ 05/Mar/20 ]

Mahmoud, could you please run lfs df on the lhsmtool client?

Could you also provide /var/log/messages from the MDS and from the client where this is run? The last few lines from when you try to start the copytool should be sufficient.

Comment by Mahmoud Hanafi [ 05/Mar/20 ]

Nothing on the MDS:

Mar  5 13:19:33 nbptest3-srv1 kernel: [146234.152375] Lustre: nbptest3-OST0000: Connection restored to 964be805-c593-4 (at 10.151.27.56@o2ib)
Mar  5 13:19:33 nbptest3-srv1 kernel: [146234.152378] Lustre: Skipped 1 previous similar message
Mar  5 13:20:01 nbptest3-srv1 systemd[1]: Created slice User Slice of root.
Mar  5 13:20:01 nbptest3-srv1 systemd[1]: Started Session 1443 of user root.
Mar  5 13:20:01 nbptest3-srv1 systemd[1]: Started Session 1444 of user root.
Mar  5 13:20:01 nbptest3-srv1 systemd[1]: Started Session 1445 of user root.
Mar  5 13:20:01 nbptest3-srv1 systemd[1]: Removed slice User Slice of root.
nbptest3-srv1 ~ # 

 
On the client

Mar  5 13:19:14 service198 kernel: [21244.122717] LNet: Using FMR for registration
Mar  5 13:19:16 service198 kernel: [21246.847040] LNet: Added LNI 10.151.27.56@o2ib [32/125536/0/0]
Mar  5 13:19:33 service198 kernel: [21263.646119] Lustre: Mounted nbptest3-client
Mar  5 13:20:02 service198 systemd[1]: Created slice User Slice of root.
Mar  5 13:20:02 service198 systemd[1]: Started Session 216 of user root.
Mar  5 13:20:02 service198 systemd[1]: Started Session 214 of user root.
Mar  5 13:20:02 service198 systemd[1]: Started Session 215 of user root.
Mar  5 13:20:03 service198 systemd[1]: Removed slice User Slice of root.
Mar  5 13:20:28 service198 kernel: [21318.585281] LustreError: 33188:0:(lmv_obd.c:778:lmv_hsm_ct_register()) nbptest3-clilmv-ffffa0b793817000: iocontrol MDC nbptest3-MDT0001_UUID on MDT idx 1 cmd 401866d5: err = -6
Mar  5 13:20:28 service198 kernel: [21318.603000] VFS: Close: file count is 0
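
Note: the LustreError line above names the target that refused the copytool registration: MDT idx 1 (nbptest3-MDT0001), with err = -6. Errno 6 is ENXIO, the same "No such device or address" that lhsmtool_posix prints. The following is a minimal sketch for pulling that information out of a client log, assuming the message format matches the line shown in this ticket; failed_hsm_registrations is a hypothetical helper name, not a Lustre command.

```shell
# Read kernel log lines on stdin and print one summary line per failed
# copytool registration: "<MDT target> idx=<n> err=<code>".
# Assumes the lmv_hsm_ct_register() message format seen in this ticket.
failed_hsm_registrations() {
    sed -n 's/.*lmv_hsm_ct_register().*iocontrol MDC \([^ ]*\)_UUID on MDT idx \([0-9]*\) .*err = \(-\{0,1\}[0-9]*\).*/\1 idx=\2 err=\3/p'
}
```

It could be fed from the client's log, for example with "failed_hsm_registrations < /var/log/messages". Lines that do not match the pattern are ignored.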

On the command line:

service198 ~ # lhsmtool_posix --daemon --hsm-root /mnt/pcc /nobackuptest3
lhsmtool_posix: 1583443391.886380 lhsmtool_posix[33322]: action=0 src=(null) dst=(null) mount_point=/nobackuptest3
service198 ~ # lhsmtool_posix: cannot start copytool on '/nobackuptest3': No such device or address (6)
lhsmtool_posix: 1583443391.920160 lhsmtool_posix[33323]: cannot start copytool interface: No such device or address (6)
lhsmtool_posix: 1583443391.920277 lhsmtool_posix[33323]: process finished, errs: 0 major, 0 minor, rc=-6 (No such device or address)

Mounted Filesystem

Filesystem                     1K-blocks     Used    Available Use% Mounted on
devtmpfs                        16300832        0     16300832   0% /dev
tmpfs                           16315784        4     16315780   1% /dev/shm
tmpfs                           16315784    14672     16301112   1% /run
/dev/sda32                     237959828 13620136    212245332   7% /
tmpfs                           16315784        0     16315784   0% /sys/fs/cgroup
/dev/sda11                       1176704    72300      1042964   7% /boot
/dev/nvme0n1                  1537235176 46143840   1412934264   4% /mnt/pcc
10.151.27.53@o2ib:/nbptest3 311935815440    14440 296206997160   1% /nobackuptest3
service198 /nobackuptest3 # lctl dl
  0 UP mgc MGC10.151.27.53@o2ib eddb5d50-3b93-4 4
  1 UP lov nbptest3-clilov-ffffa0b793817000 a30aa14f-9ea3-4 3
  2 UP lmv nbptest3-clilmv-ffffa0b793817000 a30aa14f-9ea3-4 4
  3 UP mdc nbptest3-MDT0001-mdc-ffffa0b793817000 a30aa14f-9ea3-4 4
  4 UP mdc nbptest3-MDT0000-mdc-ffffa0b793817000 a30aa14f-9ea3-4 4
  5 UP osc nbptest3-OST0005-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
  6 UP osc nbptest3-OST0003-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
  7 UP osc nbptest3-OST0007-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
  8 UP osc nbptest3-OST0004-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
  9 UP osc nbptest3-OST0009-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
 10 UP osc nbptest3-OST0001-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
 11 UP osc nbptest3-OST0008-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
 12 UP osc nbptest3-OST0000-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
 13 UP osc nbptest3-OST0006-osc-ffffa0b793817000 a30aa14f-9ea3-4 4
 14 UP osc nbptest3-OST0002-osc-ffffa0b793817000 a30aa14f-9ea3-4 4

Comment by Mahmoud Hanafi [ 05/Mar/20 ]

We can close this; I figured out the issue. I forgot about the second MDT and didn't enable hsm_control on it. It's working now.
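
Note: the resolution above implies that the copytool can only register once hsm_control is enabled on every MDT, not just MDT0000. The following is a minimal sketch for spotting an MDT that was missed, assuming the "mdt.<target>.hsm_control=<state>" output format shown in the description; hsm_not_enabled is a hypothetical helper name, not a Lustre command.

```shell
# Read "mdt.<target>.hsm_control=<state>" lines on stdin and print every
# line whose state is anything other than "enabled".
# Exit status: 0 if all listed MDTs are enabled, 1 otherwise.
hsm_not_enabled() {
    awk -F= '
        /\.hsm_control=/ {
            if ($2 != "enabled") { print $1 "=" $2; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

A likely way to use it is "lctl get_param mdt.*.hsm_control | hsm_not_enabled", run on each MDS, since the get_param output in the description lists only nbptest3-MDT0000 on nbptest3-srv1. A target it reports can then be enabled on the server hosting it with the same parameter, for example "lctl set_param mdt.nbptest3-MDT0001.hsm_control=enabled".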

Comment by Ben Evans (Inactive) [ 05/Mar/20 ]

Resolved by user
