[LU-9751] ZFS snapshot doesn't work when using RSH Created: 07/Jul/17 Updated: 14/Jun/18 Resolved: 14/Jun/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
build: b2_10 #2 tag-RC1
When testing ZFS Snapshot with Subdir Mount, I found that the lctl snapshot_* commands only work with ssh; with rsh they return an error. I tried both snapshot_create and snapshot_list, and both return the same error.
With ssh:
[root@onyx-69 ~]# lctl snapshot_list -d -F lustre --name test-snap
Password:
filesystem_name: lustre
snapshot_name: test-snap
snapshot_role: MDT0000
Password:
modify_time: Fri Jul 7 21:26:41 2017
create_time: Fri Jul 7 21:26:41 2017
snapshot_fsname: 55284e9
Password:
status: not mount
snapshot_role: OST0000
modify_time: Fri Jul 7 21:26:41 2017
create_time: Fri Jul 7 21:26:41 2017
snapshot_fsname: 55284e9
status: not mount
The same environment, but with the rsh option:
[root@onyx-69 ~]# lctl snapshot_list -d -F lustre --name test-snap -r rsh
bash: zfs: command not found
Can't list the snapshot test-snap
[root@onyx-69 ~]#
rsh itself did work between the nodes:
[root@onyx-69 ~]# rsh onyx-70
Last login: Fri Jul 7 21:26:07 from onyx-69.onyx.hpdd.intel.com
[root@onyx-70 ~]# |
| Comments |
| Comment by Peter Jones [ 07/Jul/17 ] |
|
Fan Yong, can you please advise on this one? Thanks, Peter |
| Comment by nasf (Inactive) [ 07/Jul/17 ] |
|
Sarah, would you please attach the log file /var/log/lsnapshot.log? Thanks! |
| Comment by nasf (Inactive) [ 10/Jul/17 ] |
Sat Jul 8 01:33:28 2017 (33860:jt_snapshot_list:2161:lustre:ssh): Can't list snapshot test with detail <no>: -22
Sat Jul 8 01:37:17 2017 (33964:jt_snapshot_list:2161:lustre:rsh): Can't list snapshot test with detail <no>: -22
The log shows that snapshot_list had already failed with "ssh" before the "rsh" failure, right? Would you please try the following:
lctl snapshot_list -F lustre --name test-snap -r ssh
lctl snapshot_list -F lustre --name test-snap -r rsh
lctl snapshot_list -F lustre --name test-snap
Thanks! |
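For anyone triaging similar entries in /var/log/lsnapshot.log, the two lines quoted above can be split into fields with a small parser. This is only a sketch: the field names (pid, function, source line, filesystem name, remote shell) are inferred from the two sample lines in this ticket, not taken from the Lustre source, and `parse_log_line` is a hypothetical helper. The trailing number is treated as a negated errno value, so -22 maps to EINVAL.

```python
import errno
import re

# Layout inferred from the two sample lines in this ticket (an assumption):
#   <timestamp> (<pid>:<function>:<line>:<fsname>:<remote shell>): <message>: <rc>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\((?P<pid>\d+):(?P<func>\w+):(?P<line>\d+):(?P<fsname>[^:]+):(?P<rsh>[^)]+)\): "
    r"(?P<message>.*?)(?:: (?P<rc>-?\d+))?$"
)


def parse_log_line(text):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["errno_name"] = None
    if entry["rc"] is not None:
        entry["rc"] = int(entry["rc"])
        # Return codes are negated errno values, e.g. -22 is -EINVAL.
        entry["errno_name"] = errno.errorcode.get(-entry["rc"])
    return entry


SAMPLE = (
    "Sat Jul 8 01:37:17 2017 (33964:jt_snapshot_list:2161:lustre:rsh): "
    "Can't list snapshot test with detail <no>: -22"
)

if __name__ == "__main__":
    print(parse_log_line(SAMPLE))
```

Grouping parsed entries by the `rsh` field makes it easy to see whether failures are specific to one remote shell or, as here, occurred with both.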
| Comment by Sarah Liu [ 10/Jul/17 ] |
|
Yes, before the rsh attempt it failed with ssh because I hadn't set up keyless login. Then I tried the easier way (rsh), and that failed as well. |
| Comment by nasf (Inactive) [ 11/Jul/17 ] |
|
Thanks, Sarah. If you still have the environment, I can log in and try it myself. |
| Comment by Sarah Liu [ 11/Jul/17 ] |
|
env is
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r ssh
filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 7cf6c78
create_time: Tue Jul 11 18:55:53 2017
modify_time: Tue Jul 11 18:55:53 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
bash: zfs: command not found
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test
filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 7cf6c78
create_time: Tue Jul 11 18:55:53 2017
modify_time: Tue Jul 11 18:55:53 2017
status: not mount |
| Comment by Gerrit Updater [ 12/Jul/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/27999 |
| Comment by nasf (Inactive) [ 12/Jul/17 ] |
|
Sarah, would you please try the above patch? Thanks! |
| Comment by Sarah Liu [ 13/Jul/17 ] |
|
The patch doesn't work; even ssh fails, with a syntax error.
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
ss [ OPTIONS ] [ FILTER ]
-h, --help this message
-V, --version output version information
-n, --numeric don't resolve service names
-r, --resolve resolve host names
-a, --all display all sockets
-l, --listening display listening sockets
-o, --options show timer information
-e, --extended show detailed socket information
-m, --memory show socket memory usage
-p, --processes show process using socket
-i, --info show internal TCP information
-s, --summary show socket usage summary
-b, --bpf show bpf filter socket information
-Z, --context display process SELinux security contexts
-z, --contexts display process and socket SELinux security contexts
-N, --net switch to the specified network namespace name
-4, --ipv4 display only IP version 4 sockets
-6, --ipv6 display only IP version 6 sockets
-0, --packet display PACKET sockets
-t, --tcp display only TCP sockets
-u, --udp display only UDP sockets
-d, --dccp display only DCCP sockets
-w, --raw display only RAW sockets
-x, --unix display only Unix domain sockets
-f, --family=FAMILY display sockets of type FAMILY
-A, --query=QUERY, --socket=QUERY
QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink}[,QUERY]
-D, --diag=FILE Dump raw information about TCP sockets to FILE
-F, --filter=FILE read filter information from FILE
FILTER := [ state STATE-FILTER ] [ EXPRESSION ]
STATE-FILTER := {all|connected|synchronized|bucket|big|TCP-STATES}
TCP-STATES := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|closed|close-wait|last-ack|listen|closing}
connected := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
synchronized := {established|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
bucket := {syn-recv|time-wait}
big := {established|syn-sent|fin-wait-{1,2}|closed|close-wait|last-ack|listen|closing}
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
ss [ OPTIONS ] [ FILTER ]
-h, --help this message
-V, --version output version information
-n, --numeric don't resolve service names
-r, --resolve resolve host names
-a, --all display all sockets
-l, --listening display listening sockets
-o, --options show timer information
-e, --extended show detailed socket information
-m, --memory show socket memory usage
-p, --processes show process using socket
-i, --info show internal TCP information
-s, --summary show socket usage summary
-b, --bpf show bpf filter socket information
-Z, --context display process SELinux security contexts
-z, --contexts display process and socket SELinux security contexts
-N, --net switch to the specified network namespace name
-4, --ipv4 display only IP version 4 sockets
-6, --ipv6 display only IP version 6 sockets
-0, --packet display PACKET sockets
-t, --tcp display only TCP sockets
-u, --udp display only UDP sockets
-d, --dccp display only DCCP sockets
-w, --raw display only RAW sockets
-x, --unix display only Unix domain sockets
-f, --family=FAMILY display sockets of type FAMILY
-A, --query=QUERY, --socket=QUERY
QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink}[,QUERY]
-D, --diag=FILE Dump raw information about TCP sockets to FILE
-F, --filter=FILE read filter information from FILE
FILTER := [ state STATE-FILTER ] [ EXPRESSION ]
STATE-FILTER := {all|connected|synchronized|bucket|big|TCP-STATES}
TCP-STATES := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|closed|close-wait|last-ack|listen|closing}
connected := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
synchronized := {established|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
bucket := {syn-recv|time-wait}
big := {established|syn-sent|fin-wait-{1,2}|closed|close-wait|last-ack|listen|closing}
Can't create the snapshot test
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
sh: rs: command not found
sh: rs: command not found
Can't create the snapshot test
[root@onyx-69 ~]#
|
| Comment by nasf (Inactive) [ 14/Jul/17 ] |
|
The patch has been updated, please try again. Thanks! |
| Comment by Sarah Liu [ 17/Jul/17 ] |
|
doesn't work
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r ssh
Miss MDT0 in the config file /etc/ldev.conf
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test
Miss MDT0 in the config file /etc/ldev.conf
[root@onyx-69 ~]# cat /etc/ldev.conf
# example /etc/ldev.conf
#
#local foreign/- label [md|zfs:]device-path [journal-path]/- [raidtab]
#
#zeno-mds1 - zeno-MDT0000 zfs:lustre-zeno-mds1/mdt1
onyx-69 - lustre-MDT0000 zfs:lustre-mdt1/mdt1
#
#zeno1 zeno5 zeno-OST0000 zfs:lustre-zeno1/ost1
onyx-70 - lustre-OST0000 zfs:lustre-ost1/ost1
#zeno2 zeno6 zeno-OST0001 zfs:lustre-zeno2/ost1
#zeno3 zeno7 zeno-OST0002 zfs:lustre-zeno3/ost1
#zeno4 zeno8 zeno-OST0003 zfs:lustre-zeno4/ost1
#zeno5 zeno1 zeno-OST0004 zfs:lustre-zeno5/ost1
#zeno6 zeno2 zeno-OST0005 zfs:lustre-zeno6/ost1
#zeno7 zeno3 zeno-OST0006 zfs:lustre-zeno7/ost1
#zeno8 zeno4 zeno-OST0007 zfs:lustre-zeno8/ost1
[root@onyx-69 ~]# ls /proc/fs/lustre/osd-zfs/
lustre-MDT0000
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
Miss MDT0 in the config file /etc/ldev.conf
[root@onyx-69 ~]# |
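The "Miss MDT0 in the config file" message is about looking up the MDT0000 entry in /etc/ldev.conf. As a rough illustration of that lookup, the sketch below reads the column layout shown in the file's own header comment (local, foreign, label, device-path) and searches for a label. It is not the code lctl uses; `parse_ldev_conf` and `find_target` are hypothetical helpers, and the sample text is the two active lines from the file above.

```python
def parse_ldev_conf(text):
    """Parse ldev.conf text into a list of dicts, skipping comments and blanks."""
    entries = []
    for raw in text.splitlines():
        # Drop anything after '#', so commented-out example lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # need at least: local foreign label device-path
        local, foreign, label, device = fields[:4]
        entries.append(
            {"local": local, "foreign": foreign, "label": label, "device": device}
        )
    return entries


def find_target(entries, fsname, role):
    """Return the entry whose label is '<fsname>-<role>', or None if absent."""
    wanted = f"{fsname}-{role}"
    for entry in entries:
        if entry["label"] == wanted:
            return entry
    return None


SAMPLE_LDEV = """\
# example /etc/ldev.conf
#zeno-mds1 - zeno-MDT0000 zfs:lustre-zeno-mds1/mdt1
onyx-69 - lustre-MDT0000 zfs:lustre-mdt1/mdt1
onyx-70 - lustre-OST0000 zfs:lustre-ost1/ost1
"""

if __name__ == "__main__":
    mdt0 = find_target(parse_ldev_conf(SAMPLE_LDEV), "lustre", "MDT0000")
    print(mdt0)
```

With this file the MDT0000 entry is found, which is consistent with the failure in this comment coming from the patch itself rather than from the configuration.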
| Comment by nasf (Inactive) [ 18/Jul/17 ] |
|
Sorry, Sarah, that was my typo. I have updated the patch (set 5). |
| Comment by Sarah Liu [ 18/Jul/17 ] |
|
rsh still doesn't work; it gives the same error as before:
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r ssh
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
bash: zfs: command not found
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test
filesystem_name: lustre
snapshot_name: test
modify_time: Tue Jul 18 17:38:29 2017
snapshot_fsname: 19d3a51
create_time: Tue Jul 18 17:38:29 2017
status: not mount |
| Comment by nasf (Inactive) [ 19/Jul/17 ] |
|
Sorry, the "PATH" should be set for "zfs/zpool", not for "rsh/ssh". I updated the patch (set 6). |
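The distinction can be sketched as follows: the launcher (rsh or ssh) is found through the local PATH, while zfs/zpool are resolved by the non-interactive shell on the remote node, whose PATH may not include the directory where they are installed. One way to handle this is to prefix the remote command string with a PATH assignment. This is only an illustration, not the change made in https://review.whamcloud.com/27999; `build_remote_argv` is a hypothetical helper and the directory list is an assumption.

```python
# Directories where zfs/zpool are commonly installed (an assumption).
ZFS_PATH_DIRS = ("/sbin", "/usr/sbin", "/usr/local/sbin")


def build_remote_argv(remote_shell, host, command, extra_dirs=ZFS_PATH_DIRS):
    """Build an argv list that runs `command` on `host` with an extended PATH.

    The PATH assignment is part of the string sent to the remote side, so
    $PATH is expanded by the remote shell, not locally. The launcher itself
    (argv[0]) is still looked up through the caller's own PATH.
    """
    prefix = ":".join(extra_dirs)
    remote_command = f"PATH={prefix}:$PATH {command}"
    return [remote_shell, host, remote_command]


if __name__ == "__main__":
    # Printed only; pass the list to subprocess.run() to actually execute it.
    print(build_remote_argv("rsh", "onyx-70", "zfs list -t snapshot"))
```

Passing the list directly to `subprocess.run` (without `shell=True`) keeps `$PATH` unexpanded locally, which is what lets the remote shell fill it in.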
| Comment by Sarah Liu [ 19/Jul/17 ] |
|
Can the PATH be set to include both? I got the following error:
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
sh: rsh: command not found
sh: rsh: command not found
Can't create the snapshot test
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r ssh
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r ssh
filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 5aa1a00
modify_time: Wed Jul 19 23:05:29 2017
create_time: Wed Jul 19 23:05:29 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
sh: rsh: command not found
Can't list the snapshot test
[root@onyx-69 ~]# |
| Comment by nasf (Inactive) [ 20/Jul/17 ] |
|
Updated as Sarah suggested (set 7). |
| Comment by Sarah Liu [ 20/Jul/17 ] |
|
Here is the problem with patch #7:
1. With rsh, snapshot_create returns an error message, but it did create the snapshot:
[root@onyx-69 ~]# lctl snapshot_destroy -F lustre -n test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test
cannot open 'lustre-mdt1/mdt1@test': dataset does not exist
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
bash: zfs: command not found
bash: zfs: command not found
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test
filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 0cb2593
modify_time: Thu Jul 20 22:14:52 2017
create_time: Thu Jul 20 22:14:52 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
bash: zfs: command not found
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_destroy -F lustre -n test -r rsh
bash: zfs: command not found
Can't destroy the snapshot test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r ssh
filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 0cb2593
modify_time: Thu Jul 20 22:14:52 2017
create_time: Thu Jul 20 22:14:52 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_destroy -F lustre -n test -r ssh
[root@onyx-69 ~]# |
| Comment by nasf (Inactive) [ 21/Jul/17 ] |
|
Updated the patch as set 8 and ran a simple test of it on onyx-69/70. |
| Comment by Sarah Liu [ 21/Jul/17 ] |
|
#8 works! Verified with rsh, ssh and default with no problem. |
| Comment by Gerrit Updater [ 14/Jun/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27999/ |
| Comment by nasf (Inactive) [ 14/Jun/18 ] |
|
The patch has landed on master. |