[LU-9751] ZFS snapshot doesn't work when using RSH Created: 07/Jul/17  Updated: 14/Jun/18  Resolved: 14/Jun/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File lsnapshot.log    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

build: b2_10 #2 tag-RC1

While testing ZFS snapshots with subdirectory mount, I found that the lctl snapshot_* commands work only with ssh; with rsh they return an error. I tried both snapshot_create and snapshot_list, and both return the same error.

With ssh:

[root@onyx-69 ~]# lctl snapshot_list -d -F lustre --name test-snap
Password: 

filesystem_name: lustre
snapshot_name: test-snap

snapshot_role: MDT0000
Password: 
modify_time: Fri Jul  7 21:26:41 2017
create_time: Fri Jul  7 21:26:41 2017
snapshot_fsname: 55284e9 
Password: 
status: not mount

snapshot_role: OST0000
modify_time: Fri Jul  7 21:26:41 2017
create_time: Fri Jul  7 21:26:41 2017
snapshot_fsname: 55284e9 
status: not mount

The same environment, but with the rsh option:

[root@onyx-69 ~]# lctl snapshot_list -d -F lustre --name test-snap -r rsh
bash: zfs: command not found
Can't list the snapshot test-snap
[root@onyx-69 ~]# 

rsh itself does work between the nodes:

[root@onyx-69 ~]# rsh onyx-70
Last login: Fri Jul  7 21:26:07 from onyx-69.onyx.hpdd.intel.com
[root@onyx-70 ~]#
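
An interactive login like the one above does not show what a command run through rsh actually sees. rsh and ssh execute a remote command in a non-interactive shell, and its PATH may differ from the login shell's, which would be consistent with the "zfs: command not found" message. A minimal sketch for checking this; the helper name check_remote_zfs is hypothetical and not part of lctl:

```shell
# Hypothetical helper: print the PATH that a non-interactive remote shell
# gets, and whether zfs can be found in it.
#   $1 = remote shell command (ssh or rsh)
#   $2 = target host
check_remote_zfs() {
    rsh_cmd=$1
    host=$2
    # Single quotes keep $PATH from expanding locally; it expands on the target.
    "$rsh_cmd" "$host" 'echo "PATH=$PATH"; command -v zfs || echo "zfs: not in PATH"'
}

# Usage (not run here):
#   check_remote_zfs rsh onyx-70
#   check_remote_zfs ssh onyx-70
```

If the two invocations report different PATH values, the failure is likely on the PATH side rather than in rsh connectivity.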


 Comments   
Comment by Peter Jones [ 07/Jul/17 ]

Fan Yong

Can you please advise on this one?

Thanks

Peter

Comment by nasf (Inactive) [ 07/Jul/17 ]

Sarah,

Would you please attach the log file /var/log/lsnapshot.log? Thanks!

Comment by nasf (Inactive) [ 10/Jul/17 ]
Sat Jul  8 01:33:28 2017 (33860:jt_snapshot_list:2161:lustre:ssh): Can't list snapshot test with detail <no>: -22
Sat Jul  8 01:37:17 2017 (33964:jt_snapshot_list:2161:lustre:rsh): Can't list snapshot test with detail <no>: -22

The log shows that snapshot_list had already failed with "ssh" before the "rsh" failure, right? Would you please try the following:

lctl snapshot_list -F lustre --name test-snap -r ssh
lctl snapshot_list -F lustre --name test-snap -r rsh
lctl snapshot_list -F lustre --name test-snap

Thanks!

Comment by Sarah Liu [ 10/Jul/17 ]

Yes, before rsh it failed with ssh because I had not set up passwordless (keyless) login. Then I tried the easier way (rsh), and that failed too.
I will try the commands and update you.

Comment by nasf (Inactive) [ 11/Jul/17 ]

Thanks Sarah. If you still have the environment, I can log in and try it myself.

Comment by Sarah Liu [ 11/Jul/17 ]

The environment is:
MDS/MDT onyx-69
OST onyx-70
client onyx-23vm1

[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r ssh

filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 7cf6c78 
create_time: Tue Jul 11 18:55:53 2017
modify_time: Tue Jul 11 18:55:53 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
bash: zfs: command not found
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test

filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 7cf6c78 
create_time: Tue Jul 11 18:55:53 2017
modify_time: Tue Jul 11 18:55:53 2017
status: not mount

Comment by Gerrit Updater [ 12/Jul/17 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/27999
Subject: LU-9751 snapshot: set PATH for remote zfs commands
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aaadad709778cba02bdb22aaac17613df63356e2

Comment by nasf (Inactive) [ 12/Jul/17 ]

Sarah,

Would you please try the above patch?

Thanks!

Comment by Sarah Liu [ 13/Jul/17 ]

The patch doesn't work; even ssh now fails, with a syntax error.

[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]
   -h, --help          this message
   -V, --version       output version information
   -n, --numeric       don't resolve service names
   -r, --resolve       resolve host names
   -a, --all           display all sockets
   -l, --listening     display listening sockets
   -o, --options       show timer information
   -e, --extended      show detailed socket information
   -m, --memory        show socket memory usage
   -p, --processes     show process using socket
   -i, --info          show internal TCP information
   -s, --summary       show socket usage summary
   -b, --bpf           show bpf filter socket information
   -Z, --context       display process SELinux security contexts
   -z, --contexts      display process and socket SELinux security contexts
   -N, --net           switch to the specified network namespace name

   -4, --ipv4          display only IP version 4 sockets
   -6, --ipv6          display only IP version 6 sockets
   -0, --packet        display PACKET sockets
   -t, --tcp           display only TCP sockets
   -u, --udp           display only UDP sockets
   -d, --dccp          display only DCCP sockets
   -w, --raw           display only RAW sockets
   -x, --unix          display only Unix domain sockets
   -f, --family=FAMILY display sockets of type FAMILY

   -A, --query=QUERY, --socket=QUERY
       QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink}[,QUERY]

   -D, --diag=FILE     Dump raw information about TCP sockets to FILE
   -F, --filter=FILE   read filter information from FILE
       FILTER := [ state STATE-FILTER ] [ EXPRESSION ]
       STATE-FILTER := {all|connected|synchronized|bucket|big|TCP-STATES}
         TCP-STATES := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|closed|close-wait|last-ack|listen|closing}
          connected := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
       synchronized := {established|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
             bucket := {syn-recv|time-wait}
                big := {established|syn-sent|fin-wait-{1,2}|closed|close-wait|last-ack|listen|closing}
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]
   -h, --help          this message
   -V, --version       output version information
   -n, --numeric       don't resolve service names
   -r, --resolve       resolve host names
   -a, --all           display all sockets
   -l, --listening     display listening sockets
   -o, --options       show timer information
   -e, --extended      show detailed socket information
   -m, --memory        show socket memory usage
   -p, --processes     show process using socket
   -i, --info          show internal TCP information
   -s, --summary       show socket usage summary
   -b, --bpf           show bpf filter socket information
   -Z, --context       display process SELinux security contexts
   -z, --contexts      display process and socket SELinux security contexts
   -N, --net           switch to the specified network namespace name

   -4, --ipv4          display only IP version 4 sockets
   -6, --ipv6          display only IP version 6 sockets
   -0, --packet        display PACKET sockets
   -t, --tcp           display only TCP sockets
   -u, --udp           display only UDP sockets
   -d, --dccp          display only DCCP sockets
   -w, --raw           display only RAW sockets
   -x, --unix          display only Unix domain sockets
   -f, --family=FAMILY display sockets of type FAMILY

   -A, --query=QUERY, --socket=QUERY
       QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink}[,QUERY]

   -D, --diag=FILE     Dump raw information about TCP sockets to FILE
   -F, --filter=FILE   read filter information from FILE
       FILTER := [ state STATE-FILTER ] [ EXPRESSION ]
       STATE-FILTER := {all|connected|synchronized|bucket|big|TCP-STATES}
         TCP-STATES := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|closed|close-wait|last-ack|listen|closing}
          connected := {established|syn-sent|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
       synchronized := {established|syn-recv|fin-wait-{1,2}|time-wait|close-wait|last-ack|closing}
             bucket := {syn-recv|time-wait}
                big := {established|syn-sent|fin-wait-{1,2}|closed|close-wait|last-ack|listen|closing}
Can't create the snapshot test


[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
sh: rs: command not found
sh: rs: command not found
Can't create the snapshot test
[root@onyx-69 ~]#

Comment by nasf (Inactive) [ 14/Jul/17 ]

The patch has been updated, please try again. Thanks!

Comment by Sarah Liu [ 17/Jul/17 ]

It still doesn't work:

[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r ssh
Miss MDT0 in the config file /etc/ldev.conf
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test
Miss MDT0 in the config file /etc/ldev.conf
[root@onyx-69 ~]# cat /etc/ldev.conf 
# example /etc/ldev.conf
#
#local  foreign/-  label       [md|zfs:]device-path   [journal-path]/- [raidtab]
#
#zeno-mds1 - zeno-MDT0000 zfs:lustre-zeno-mds1/mdt1
onyx-69 - lustre-MDT0000 zfs:lustre-mdt1/mdt1
#
#zeno1 zeno5 zeno-OST0000 zfs:lustre-zeno1/ost1
onyx-70 - lustre-OST0000 zfs:lustre-ost1/ost1
#zeno2 zeno6 zeno-OST0001 zfs:lustre-zeno2/ost1
#zeno3 zeno7 zeno-OST0002 zfs:lustre-zeno3/ost1
#zeno4 zeno8 zeno-OST0003 zfs:lustre-zeno4/ost1
#zeno5 zeno1 zeno-OST0004 zfs:lustre-zeno5/ost1
#zeno6 zeno2 zeno-OST0005 zfs:lustre-zeno6/ost1
#zeno7 zeno3 zeno-OST0006 zfs:lustre-zeno7/ost1
#zeno8 zeno4 zeno-OST0007 zfs:lustre-zeno8/ost1
[root@onyx-69 ~]# ls /proc/fs/lustre/osd-zfs/
lustre-MDT0000
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
Miss MDT0 in the config file /etc/ldev.conf
[root@onyx-69 ~]#

Comment by nasf (Inactive) [ 18/Jul/17 ]

Sorry Sarah, that was my typo. I have updated the patch (set 5).
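
For reference, the /etc/ldev.conf posted earlier does contain an MDT0000 entry, so the "Miss MDT0" message came from the patch rather than from the config. A small sketch for checking this by hand, assuming the column layout given in the file's own header comment (local, foreign, label, device-path); the function name ldev_lookup is hypothetical:

```shell
# Hypothetical helper: print "host device" for a given label in an
# ldev.conf-style file.
#   $1 = label, e.g. lustre-MDT0000
#   $2 = config file (defaults to /etc/ldev.conf)
ldev_lookup() {
    awk -v label="$1" '
        /^[[:space:]]*#/ { next }       # skip comment lines
        $3 == label { print $1, $4 }    # columns: local foreign label device
    ' "${2:-/etc/ldev.conf}"
}

# Usage (not run here):
#   ldev_lookup lustre-MDT0000
```

An empty result would mean the label really is missing or commented out.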

Comment by Sarah Liu [ 18/Jul/17 ]

rsh still doesn't work; it gives the same error as before:

[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r ssh
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
bash: zfs: command not found
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test

filesystem_name: lustre
snapshot_name: test
modify_time: Tue Jul 18 17:38:29 2017
snapshot_fsname: 19d3a51 
create_time: Tue Jul 18 17:38:29 2017
status: not mount

Comment by nasf (Inactive) [ 19/Jul/17 ]

Sorry, the "PATH" should be set for "zfs/zpool", not for "rsh/ssh". I updated the patch (set 6).
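
To illustrate the distinction, here is a rough shell sketch of the idea, not the actual patch code: the PATH assignment goes inside the command string that runs on the target, so it affects zfs/zpool lookup there, while rsh/ssh is still resolved with the local, unmodified PATH. The directory list and both function names are assumptions made for this example.

```shell
# Directories assumed to hold zfs/zpool on the target nodes; adjust as needed.
ZFS_DIRS=/sbin:/usr/sbin

# Build the command string handed to the remote shell.
# The format string is single-quoted so that $PATH stays literal here
# and is expanded on the target, not locally.
remote_zfs_cmd() {
    printf 'PATH=%s:$PATH %s' "$ZFS_DIRS" "$*"
}

# Run a zfs/zpool command on a host through ssh or rsh.
#   $1 = remote shell command, $2 = host, rest = command to run remotely
run_remote_zfs() {
    rsh_cmd=$1
    host=$2
    shift 2
    "$rsh_cmd" "$host" "$(remote_zfs_cmd "$@")"
}

# Usage (not run here):
#   run_remote_zfs rsh onyx-70 zfs list -t snapshot
```

Note that "$*" flattens the arguments into one string, so arguments containing spaces would need extra quoting in a real implementation.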

Comment by Sarah Liu [ 19/Jul/17 ]

Can the PATH be set to include both? I got the following error:

[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
sh: rsh: command not found
sh: rsh: command not found
Can't create the snapshot test
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r ssh
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r ssh

filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 5aa1a00 
modify_time: Wed Jul 19 23:05:29 2017
create_time: Wed Jul 19 23:05:29 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
sh: rsh: command not found
Can't list the snapshot test
[root@onyx-69 ~]# 

Comment by nasf (Inactive) [ 20/Jul/17 ]

Updated as Sarah suggested (set 7).

Comment by Sarah Liu [ 20/Jul/17 ]

Here are the problems with patch set 7:

1. With rsh, snapshot_create prints an error message, but it did create the snapshot.
2. snapshot_list and snapshot_destroy with rsh still don't work.

[root@onyx-69 ~]# lctl snapshot_destroy -F lustre -n test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test
cannot open 'lustre-mdt1/mdt1@test': dataset does not exist
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_create -F lustre -n test -r rsh
bash: zfs: command not found
bash: zfs: command not found
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test

filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 0cb2593 
modify_time: Thu Jul 20 22:14:52 2017
create_time: Thu Jul 20 22:14:52 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r rsh
bash: zfs: command not found
Can't list the snapshot test
[root@onyx-69 ~]# lctl snapshot_destroy -F lustre -n test -r rsh
bash: zfs: command not found
Can't destroy the snapshot test
[root@onyx-69 ~]# lctl snapshot_list -F lustre -n test -r ssh

filesystem_name: lustre
snapshot_name: test
snapshot_fsname: 0cb2593 
modify_time: Thu Jul 20 22:14:52 2017
create_time: Thu Jul 20 22:14:52 2017
status: not mount
[root@onyx-69 ~]# lctl snapshot_destroy -F lustre -n test -r ssh
[root@onyx-69 ~]#

Comment by nasf (Inactive) [ 21/Jul/17 ]

Updated the patch (set 8) and ran a quick test on onyx-69/70.

Comment by Sarah Liu [ 21/Jul/17 ]

Set 8 works! Verified with rsh, ssh, and the default, with no problems.

Comment by Gerrit Updater [ 14/Jun/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27999/
Subject: LU-9751 snapshot: set PATH for remote zfs commands
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 13fa5f46bff31d6836409383733f817989cca59d

Comment by nasf (Inactive) [ 14/Jun/18 ]

The patch has landed on master.
