[LU-8021] interop: 2.1.x server <-> clients version > 2.3: t-f debugsave() debugrestore() defect Created: 14/Apr/16  Updated: 27/Oct/16  Resolved: 21/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.5
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: parinay v kondekar (Inactive) Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

test-framework.sh :

debugsave() {
    DEBUGSAVE="$(lctl get_param -n debug)"
}

– DEBUGSAVE holds only the debug value read on the client.

debugrestore() {
    [ -n "$DEBUGSAVE" ] && \
        do_nodes $(comma_list $(nodes_list)) "$LCTL set_param debug=\\\"${DEBUGSAVE}\\\";"
    DEBUGSAVE=""
}

– sets debug=$DEBUGSAVE on all nodes, including the server nodes.
That is, debugrestore() does not restore the initial debug values set on the servers; it overwrites them with the initial debug value saved on the client.

Intel clients starting from 2.3 (D_LFSCK was added by LU-957) have some debug masks that are missing on 2.1.x servers.

libcfs_debug.h :
#define D_SEC           0x08000000
#define D_LFSCK         0x10000000 /* For both OI scrub and LFSCK */
#define D_HSM           0x20000000
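
As a rough illustration of the mismatch, the bit values above can be compared with shell arithmetic. This is only a sketch: it assumes each side knows every bit up to its highest named flag (sec on the 2.1.x server, hsm on the 2.5.1 client), which matches the get_param output shown further down but is not taken from the server source.

```shell
# Bit values copied from the libcfs_debug.h excerpt.
D_SEC=$((0x08000000))
D_LFSCK=$((0x10000000))
D_HSM=$((0x20000000))

# Assumption: each side knows all bits up to its highest named flag.
server_known=$(( (D_SEC << 1) - 1 ))
client_known=$(( (D_HSM << 1) - 1 ))

# Bits the client can report that the server has no name for.
unknown=$(( client_known & ~server_known ))
expected=$(( D_LFSCK | D_HSM ))

unknown_hex=$(printf '0x%08x' "$unknown")
echo "bits unknown to the server: $unknown_hex"
```

The leftover bits are exactly D_LFSCK and D_HSM, the two names the server rejects.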

This debugsave()/debugrestore() defect causes tests to fail when run with PTLDEBUG=-1, because lctl returns EINVAL when executed on the servers:

+ debugrestore
...
+ /usr/bin/pdsh -R ssh -S -w fre0205,fre0206,fre0207,fre0208 '(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre"  FSTYPE=ldiskfs sh -c "/usr/sbin/lctl set_param debug=\"trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck\";")'
fre0206: error: set_param: writing to file /proc/sys/lnet/debug: Invalid argument
pdsh@fre0207: fre0206: ssh exited with exit code 1
  • Reproducible on 2.1 Server <-> 2.5.1 client.
== sanity test 63b: async write errors should be returned to fsync ===== 21:43:48 (1457502228)
debug=-1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00600947 s, 682 kB/s
fail_loc=0x80000406
fsync: Input/output error
192.18.177.138: error: set_param: setting /proc/sys/lnet/debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm: Invalid argument
pdsh@osh-1: 192.18.177.138: ssh exited with exit code 1
debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm
debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm
Resetting fail_loc and fail_val on all nodes...done.
PASS 63b (13s)
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
== sanity test complete, duration 235 sec == 21:44:02 (1457502242)
Stopping clients: osh-1.xyus.xyratex.com /mnt/lustre (opts:-f)
Stopping client osh-1.xyus.xyratex.com /mnt/lustre opts:-f
Stopping clients: osh-1.xyus.xyratex.com /mnt/lustre2 (opts:-f)
Stopping /mnt/mds1 (opts:-f) on 192.18.177.138
Stopping /mnt/ost1 (opts:-f) on 192.18.177.138
Stopping /mnt/ost2 (opts:-f) on 192.18.177.138
modules unloaded.
[root@osh-1 tests]# 



-----------

2.1.x server

[root@osh-1 ~]# lctl set_param -n debug=-1
You have new mail in /var/spool/mail/root
[root@osh-1 ~]# lctl get_param -n debug
trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec
[root@osh-1 ~]#

2.5.1 client
[root@osh-1 tests]# lctl set_param -n debug=-1
[root@osh-1 tests]# lctl get_param -n debug
trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec *lfsck hsm*
[root@osh-1 tests]#
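
The two get_param outputs above can be compared mechanically to list the flag names the server does not accept. This is a minimal sketch that uses the strings pasted above as literals; the variable names server_debug, client_debug and missing are made up for the illustration.

```shell
# Debug flag lists as reported by "lctl get_param -n debug" above.
server_debug="trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec"
client_debug="$server_debug lfsck hsm"

# Collect every client flag that does not appear in the server list.
missing=""
for flag in $client_debug; do
    case " $server_debug " in
        *" $flag "*) ;;                                  # known to the server
        *) missing="${missing:+$missing }$flag" ;;       # unknown to the server
    esac
done

echo "flags unknown to the server: $missing"
```

Any name in that result makes set_param fail with EINVAL when the client's saved value is written on the server.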

-------------


 Comments   
Comment by Andreas Dilger [ 14/Apr/16 ]

Probably the right solution is to have a separate DEBUGSAVE_OSS and DEBUGSAVE_MDS to hold the server values (assume they are the same on all servers) between debugsave() and debugrestore().
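
A minimal sketch of that idea follows. It is not the patch that landed: get_debug and set_debug are hypothetical stand-ins for the do_facet/do_nodes calls in test-framework.sh, and CLIENT_DEBUG, MDS_DEBUG and OSS_DEBUG are made-up variables that simulate the per-node debug setting so the example runs on its own.

```shell
# Simulated per-role debug settings (hypothetical values).
CLIENT_DEBUG="trace inode sec lfsck hsm"
MDS_DEBUG="trace inode sec"
OSS_DEBUG="trace inode"

# Stand-in for running "$LCTL get_param -n debug" on a node of the given role.
get_debug() {
    case "$1" in
        client) echo "$CLIENT_DEBUG" ;;
        mds)    echo "$MDS_DEBUG" ;;
        oss)    echo "$OSS_DEBUG" ;;
    esac
}

# Stand-in for running "$LCTL set_param debug=..." on nodes of the given role.
set_debug() {
    case "$1" in
        client) CLIENT_DEBUG="$2" ;;
        mds)    MDS_DEBUG="$2" ;;
        oss)    OSS_DEBUG="$2" ;;
    esac
}

debugsave() {
    DEBUGSAVE="$(get_debug client)"
    DEBUGSAVE_MDS="$(get_debug mds)"
    DEBUGSAVE_OSS="$(get_debug oss)"
}

debugrestore() {
    # Restore each role from its own saved value, skipping empty ones.
    [ -n "$DEBUGSAVE" ] && set_debug client "$DEBUGSAVE"
    [ -n "$DEBUGSAVE_MDS" ] && set_debug mds "$DEBUGSAVE_MDS"
    [ -n "$DEBUGSAVE_OSS" ] && set_debug oss "$DEBUGSAVE_OSS"
    DEBUGSAVE=""
    DEBUGSAVE_MDS=""
    DEBUGSAVE_OSS=""
}

# Usage: save, let a test change everything, then restore.
debugsave
set_debug client "-1"; set_debug mds "-1"; set_debug oss "-1"
debugrestore
echo "mds: $MDS_DEBUG"
echo "oss: $OSS_DEBUG"
```

Each role gets back the value it started with, so the client's extra flags are never written to a server.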

Comment by parinay v kondekar (Inactive) [ 15/Apr/16 ]

Andreas,

  • I created and tested the following patch. I used a single variable, DEBUGSAVE_SERVER, instead of what you suggested; client debug flags are still saved in DEBUGSAVE.
  • Also, this issue is seen in interop testing when PTLDEBUG=-1 is set.
diff --git a/lustre/tests/test-framework.sh b/lustre/tests/test-framework.sh
index ac11c8f..6149963 100755
--- a/lustre/tests/test-framework.sh
+++ b/lustre/tests/test-framework.sh
@@ -4570,13 +4570,24 @@ pgcache_empty() {
 }
 
 debugsave() {
-    DEBUGSAVE="$(lctl get_param -n debug)"
+       DEBUGSAVE="$($LCTL get_param -n debug)"
+       DEBUGSAVE_SERVER=$(do_facet $SINGLEMDS "$LCTL get_param -n debug")
 }
 
 debugrestore() {
-    [ -n "$DEBUGSAVE" ] && 
-        do_nodes $(comma_list $(nodes_list)) "$LCTL set_param debug=\\\"${DEBUGSAVE}\\\";"
-    DEBUGSAVE=""
+       local SERVERS=$(comma_list $(mdts_nodes) $(osts_nodes))
+
+       [ -n "$DEBUGSAVE" ] && 
+               do_nodes $CLIENTS \
+               "$LCTL set_param debug=\\\"${DEBUGSAVE}\\\";"
+       DEBUGSAVE=""
+
+       [ -n "DEBUGSAVE_SERVER" ] && 
+               do_nodes $SERVERS \
+               "$LCTL set_param debug=\\\"${DEBUGSAVE_SERVER}\\\";"
+
+       DEBUGSAVE_SERVER=""
+
 }

Do you think I should go ahead and change DEBUGSAVE_SERVER into DEBUGSAVE_MDS and DEBUGSAVE_OSS, or does the present patch look good?

Appreciate your time.
Thanks

Comment by Gerrit Updater [ 15/Apr/16 ]

Parinay Kondekar (parinay.kondekar@seagate.com) uploaded a new patch: http://review.whamcloud.com/19604
Subject: LU-8021 tests: In interop, ensure to save/restore correct debug flags
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d8fb09f453e8c419611d12a6e655d3047d40d968

Comment by Gerrit Updater [ 11/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19604/
Subject: LU-8021 tests: In interop, ensure to save/restore correct debug flags
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 77ae71b56e1ed44a0a80a95c465dafd1f41d2e86

Comment by Joseph Gmitter (Inactive) [ 13/Jul/16 ]

Patch has landed to master for 2.9.0

Comment by Andreas Dilger [ 13/Sep/16 ]

This patch opened a defect in debugrestore():

CMD: trevis-27vm3,trevis-27vm4,trevis-27vm8 /usr/sbin/lctl set_param debug=\"\"
trevis-27vm4: error: set_param: setting debug: no value
trevis-27vm3: error: set_param: setting debug: no value
trevis-27vm8: error: set_param: setting debug: no value

The problem is in the newly added code:

 debugrestore() {
+       [ -n "DEBUGSAVE_SERVER" ] &&
+               do_nodes $(comma_list $(all_server_nodes)) \
+                        "$LCTL set_param debug=\\\"${DEBUGSAVE_SERVER}\\\""

This should actually be [ -n "$DEBUGSAVE_SERVER" ] && (note the leading $ there). Otherwise the check is always true, the code tries to restore an empty string, and this produces a lot of spurious error messages in the test logs.

Parinay, can you please submit a patch to fix this? Note that it can use the Test-Parameters: trivial tag to bypass most of the testing.
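
The always-true check can be reproduced in isolation. The snippet below is a small stand-alone sketch; buggy_check and fixed_check are names invented for the demonstration, and the empty DEBUGSAVE_SERVER simulates the state in which nothing was saved.

```shell
# Simulate the case where no server value has been saved.
DEBUGSAVE_SERVER=""

# Buggy form: the quoted word is a literal, non-empty string, so -n always succeeds.
if [ -n "DEBUGSAVE_SERVER" ]; then
    buggy_check="true"
else
    buggy_check="false"
fi

# Fixed form: the leading $ expands the variable, so an empty value fails the test.
if [ -n "$DEBUGSAVE_SERVER" ]; then
    fixed_check="true"
else
    fixed_check="false"
fi

echo "buggy check passes: $buggy_check"
echo "fixed check passes: $fixed_check"
```

With the buggy form the set_param call runs even when nothing was saved, which is where the debug=\"\" command in the log comes from.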

Comment by Peter Jones [ 19/Sep/16 ]

Emoly is looking into this

Comment by Emoly Liu [ 19/Sep/16 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/22586
Subject: LU-8021 tests: Add leading $ to "DEBUGSAVE_SERVER"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 90910df9cf50f963868c0bd65af1ac0e9d6c141a

Comment by parinay v kondekar (Inactive) [ 21/Sep/16 ]

Andreas,
Sorry, I missed this notification; somehow it did not reach my mail.

Comment by Gerrit Updater [ 21/Sep/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/22586/
Subject: LU-8021 tests: Add leading $ to "DEBUGSAVE_SERVER"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 81f309421dafa9a7519634bb27f54faabd5ef852

Comment by Peter Jones [ 21/Sep/16 ]

Landed to 2.9
