[LU-8021] interop: 2.1.x server <-> clients version > 2.3: t-f debugsave() debugrestore() defect Created: 14/Apr/16 Updated: 27/Oct/16 Resolved: 21/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.5 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | parinay v kondekar (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
In test-framework.sh:
debugsave() {
DEBUGSAVE="$(lctl get_param -n debug)"
}
– DEBUGSAVE is set to the debug value read on the client.
debugrestore() {
[ -n "$DEBUGSAVE" ] && \
do_nodes $(comma_list $(nodes_list)) "$LCTL set_param debug=\\\"${DEBUGSAVE}\\\";"
DEBUGSAVE=""
}
– This sets debug=$DEBUGSAVE on all nodes, including the server nodes.
Intel clients (starting from 2.3) have D_LFSCK added in libcfs_debug.h:
#define D_SEC 0x08000000
#define D_LFSCK 0x10000000 /* For both OI scrub and LFSCK */
#define D_HSM 0x20000000
The debugsave() and debugrestore() defect described above causes tests to fail when run with PTLDEBUG=-1, because lctl returns EINVAL when executed on the servers:
+ debugrestore
...
+ /usr/bin/pdsh -R ssh -S -w fre0205,fre0206,fre0207,fre0208 '(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" FSTYPE=ldiskfs sh -c "/usr/sbin/lctl set_param debug=\"trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck\";")'
fre0206: error: set_param: writing to file /proc/sys/lnet/debug: Invalid argument
pdsh@fre0207: fre0206: ssh exited with exit code 1
== sanity test 63b: async write errors should be returned to fsync ===== 21:43:48 (1457502228)
debug=-1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00600947 s, 682 kB/s
fail_loc=0x80000406
fsync: Input/output error
192.18.177.138: error: set_param: setting /proc/sys/lnet/debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm: Invalid argument
pdsh@osh-1: 192.18.177.138: ssh exited with exit code 1
debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm
debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm
Resetting fail_loc and fail_val on all nodes...done.
PASS 63b (13s)
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
== sanity test complete, duration 235 sec == 21:44:02 (1457502242)
Stopping clients: osh-1.xyus.xyratex.com /mnt/lustre (opts:-f)
Stopping client osh-1.xyus.xyratex.com /mnt/lustre opts:-f
Stopping clients: osh-1.xyus.xyratex.com /mnt/lustre2 (opts:-f)
Stopping /mnt/mds1 (opts:-f) on 192.18.177.138
Stopping /mnt/ost1 (opts:-f) on 192.18.177.138
Stopping /mnt/ost2 (opts:-f) on 192.18.177.138
modules unloaded.
[root@osh-1 tests]#
-----------
2.1.x server
[root@osh-1 ~]# lctl set_param -n debug=-1
You have new mail in /var/spool/mail/root
[root@osh-1 ~]# lctl get_param -n debug
trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec
[root@osh-1 ~]#
2.5.1 client
[root@osh-1 tests]# lctl set_param -n debug=-1
[root@osh-1 tests]# lctl get_param -n debug
trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec *lfsck hsm*
[root@osh-1 tests]#
------------- |
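The mismatch between the two masks can be made explicit with a small sketch. It uses only the two flag lists printed by lctl get_param above; the helper name unknown_flags is hypothetical and is not part of test-framework.sh.

```shell
# Flag lists copied from the "lctl get_param -n debug" output above.
server_flags="trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec"
client_flags="$server_flags lfsck hsm"

# unknown_flags CLIENT SERVER (hypothetical helper, illustration only):
# print the flags in CLIENT that do not appear in SERVER, space separated.
# The body runs in a subshell so its variables do not leak to the caller.
unknown_flags() (
    out=""
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                      # the server knows this flag
            *) out="${out:+$out }$flag" ;;       # the server would reject it
        esac
    done
    echo "$out"
)

unknown_flags "$client_flags" "$server_flags"    # prints "lfsck hsm"
```

These two flags are exactly what makes set_param fail with "Invalid argument" when the client's saved mask is written on the 2.1.x server.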
| Comments |
| Comment by Andreas Dilger [ 14/Apr/16 ] |
|
Probably the right solution is to have a separate DEBUGSAVE_OSS and DEBUGSAVE_MDS to hold the server values (assume they are the same on all servers) between debugsave() and debugrestore(). |
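A minimal sketch of what the per-role approach suggested above could look like. It is not the patch that landed: the real framework talks to nodes through do_nodes and lctl, whereas here get_debug, set_debug and the node_debug_* variables are hypothetical stand-ins that simulate each node class so the logic can run anywhere.

```shell
# Simulated debug masks per node class (assumed values, illustration only).
node_debug_client="trace inode lfsck hsm"
node_debug_mds="trace inode"
node_debug_oss="trace inode"

# Hypothetical stand-ins for "lctl get_param -n debug" / "lctl set_param debug=..."
get_debug() { eval "echo \"\$node_debug_$1\""; }
set_debug() { eval "node_debug_$1=\"\$2\""; }

debugsave() {
    DEBUGSAVE="$(get_debug client)"
    DEBUGSAVE_MDS="$(get_debug mds)"
    DEBUGSAVE_OSS="$(get_debug oss)"
}

debugrestore() {
    # Each class gets back only the mask that was read from that class.
    [ -n "$DEBUGSAVE" ] && set_debug client "$DEBUGSAVE"
    [ -n "$DEBUGSAVE_MDS" ] && set_debug mds "$DEBUGSAVE_MDS"
    [ -n "$DEBUGSAVE_OSS" ] && set_debug oss "$DEBUGSAVE_OSS"
    DEBUGSAVE=""
    DEBUGSAVE_MDS=""
    DEBUGSAVE_OSS=""
}

debugsave
set_debug mds "trace"      # a test changes the MDS mask
debugrestore
get_debug mds              # prints "trace inode"
```

Because the client mask is never written to a server, flags the server does not know (lfsck, hsm) are never sent to it.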
| Comment by parinay v kondekar (Inactive) [ 15/Apr/16 ] |
|
Andreas,
diff --git a/lustre/tests/test-framework.sh b/lustre/tests/test-framework.sh
index ac11c8f..6149963 100755
--- a/lustre/tests/test-framework.sh
+++ b/lustre/tests/test-framework.sh
@@ -4570,13 +4570,24 @@ pgcache_empty() {
}
debugsave() {
- DEBUGSAVE="$(lctl get_param -n debug)"
+ DEBUGSAVE="$($LCTL get_param -n debug)"
+ DEBUGSAVE_SERVER=$(do_facet $SINGLEMDS "$LCTL get_param -n debug")
}
debugrestore() {
- [ -n "$DEBUGSAVE" ] &&
- do_nodes $(comma_list $(nodes_list)) "$LCTL set_param debug=\\\"${DEBUGSAVE}\\\";"
- DEBUGSAVE=""
+ local SERVERS=$(comma_list $(mdts_nodes) $(osts_nodes))
+
+ [ -n "$DEBUGSAVE" ] &&
+ do_nodes $CLIENTS \
+ "$LCTL set_param debug=\\\"${DEBUGSAVE}\\\";"
+ DEBUGSAVE=""
+
+ [ -n "DEBUGSAVE_SERVER" ] &&
+ do_nodes $SERVERS \
+ "$LCTL set_param debug=\\\"${DEBUGSAVE_SERVER}\\\";"
+
+ DEBUGSAVE_SERVER=""
+
}
Do you think I should go ahead and split DEBUGSAVE_SERVER into DEBUGSAVE_MDS and DEBUGSAVE_OSS, or does the present patch look good? I appreciate your time. |
| Comment by Gerrit Updater [ 15/Apr/16 ] |
|
Parinay Kondekar (parinay.kondekar@seagate.com) uploaded a new patch: http://review.whamcloud.com/19604 |
| Comment by Gerrit Updater [ 11/Jul/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19604/ |
| Comment by Joseph Gmitter (Inactive) [ 13/Jul/16 ] |
|
Patch has landed to master for 2.9.0 |
| Comment by Andreas Dilger [ 13/Sep/16 ] |
|
This patch opened a defect in debugrestore():
CMD: trevis-27vm3,trevis-27vm4,trevis-27vm8 /usr/sbin/lctl set_param debug=\"\"
trevis-27vm4: error: set_param: setting debug: no value
trevis-27vm3: error: set_param: setting debug: no value
trevis-27vm8: error: set_param: setting debug: no value
The problem is in the newly added code:
debugrestore() {
+ [ -n "DEBUGSAVE_SERVER" ] &&
+ do_nodes $(comma_list $(all_server_nodes)) \
+ "$LCTL set_param debug=\\\"${DEBUGSAVE_SERVER}\\\""
This should actually be [ -n "$DEBUGSAVE_SERVER" ] && (note the leading $ there). Otherwise the check is always true, so the code tries to restore an empty string, which produces a lot of spurious error messages in the test logs. Parinay, can you please submit a patch to fix this? Note that it can use the Test-Parameters: trivial tag to bypass most of the testing. |
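The difference between the two checks can be reproduced in isolation. This is a standalone sketch that needs no Lustre tools; the variable names literal_check and expanded_check are introduced only for illustration.

```shell
DEBUGSAVE_SERVER=""   # nothing was saved

# Without the $, the operand is the literal string "DEBUGSAVE_SERVER",
# which is never empty, so the test always succeeds.
if [ -n "DEBUGSAVE_SERVER" ]; then literal_check=true; else literal_check=false; fi

# With the $, the operand is the variable's (empty) value, so the test fails.
if [ -n "$DEBUGSAVE_SERVER" ]; then expanded_check=true; else expanded_check=false; fi

echo "literal: $literal_check, expanded: $expanded_check"
# prints "literal: true, expanded: false"
```

With the literal form, the restore branch runs even when nothing was saved, which is how set_param ends up being called with debug="" on every server.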
| Comment by Peter Jones [ 19/Sep/16 ] |
|
Emoly is looking into this |
| Comment by Emoly Liu [ 19/Sep/16 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/22586 |
| Comment by parinay v kondekar (Inactive) [ 21/Sep/16 ] |
|
Andreas, |
| Comment by Gerrit Updater [ 21/Sep/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/22586/ |
| Comment by Peter Jones [ 21/Sep/16 ] |
|
Landed to 2.9 |