[LU-15022] sanity test_104c returned 2 Created: 21/Sep/21  Updated: 22/Sep/21  Resolved: 22/Sep/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Arshad Hussain
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14997 Register "stack_trap" for sanity/104c... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4824ed24-8a35-468e-b1f1-3e1efcc5321f

test_104c failed with the following error:

test_104c returned 2

The test seems broken.
It ends with:

After recordsize change
lfs output : filesystem_summary: 73.0G 150.8M 72.9G 1% /mnt/lustre
df  output : 10.9.4.19@tcp:/lustre 74G 150M 73G 1% /mnt/lustre
CMD: trevis-71vm4 /usr/sbin/lctl get_param -n osd-zfs.lustre-MDT0000..mntdev
CMD: trevis-71vm4 zfs set recordsize=131072 lustre-mdt1/mdt1
CMD: trevis-71vm5 /usr/sbin/lctl get_param -n osd-zfs.lustre-MDT0000..mntdev
trevis-71vm5: error: get_param: param_path 'osd-zfs/lustre-MDT0000//mntdev': No such file or directory
pdsh@trevis-71vm1: trevis-71vm5: ssh exited with exit code 2

When it succeeds, for instance in https://testing.whamcloud.com/sub_tests/272da434-33ef-44cc-9710-77a6372826e6, the same get_param error appears, but the script continues and runs a final "zfs set recordsize" command:

After recordsize change
lfs output : filesystem_summary: 73.0G 146.8M 72.9G 1% /mnt/lustre
df  output : 10.9.10.22@tcp:/lustre 74G 147M 73G 1% /mnt/lustre
CMD: trevis-209vm14 lctl get_param -n osd-zfs.lustre-MDT0000..mntdev
CMD: trevis-209vm14 zfs set recordsize=131072 lustre-mdt1/mdt1
CMD: trevis-209vm15 lctl get_param -n osd-zfs.lustre-MDT0000..mntdev
trevis-209vm15: error: get_param: param_path 'osd-zfs/lustre-MDT0000//mntdev': No such file or directory
pdsh@trevis-209vm11: trevis-209vm15: ssh exited with exit code 2
CMD: trevis-209vm15 zfs set recordsize=131072

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_104c - test_104c returned 2



 Comments   
Comment by Andreas Dilger [ 21/Sep/21 ]

This started on 2021-09-14. The test was added in patch https://review.whamcloud.com/43154 "LU-14565 ofd: Do not rely on tgd_blockbit", but that landed on 2021-05-19, so it isn't clear why the test started failing.

The reported error occurs because "ostparam" and "mdtparam" already end with a trailing ".", and the test adds its own "." when it uses them.
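
To illustrate how the doubled separator in the log arises, here is a minimal sketch. The variable name "mdtparam" comes from the comment above; the helper "join_param" is hypothetical and is not part of the Lustre test framework.

```shell
# Hypothetical helper: join a parameter prefix and a leaf name with exactly
# one "." between them, whether or not the prefix already ends in ".".
join_param() {
	local prefix="${1%.}"	# strip a single trailing "." if present
	local leaf="$2"
	echo "$prefix.$leaf"
}

# Value with a trailing ".", matching the parameter path seen in the log.
mdtparam="osd-zfs.lustre-MDT0000."

# Naive concatenation adds a second "." and yields "..mntdev".
echo "naive:  ${mdtparam}.mntdev"

# The helper yields a single separator.
echo "joined: $(join_param "$mdtparam" mntdev)"
```

The naive form reproduces the "osd-zfs.lustre-MDT0000..mntdev" string from the failing run, while the helper produces "osd-zfs.lustre-MDT0000.mntdev".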

Comment by Arshad Hussain [ 21/Sep/21 ]

Andreas, I am looking into this.

Comment by Andreas Dilger [ 21/Sep/21 ]

Actually, it may not be caused by the double ".." in the parameter name. Since the ".." is converted to "//" in the pathname, and path lookups collapse multiple consecutive "/" characters, that may be just a cosmetic but harmless issue.

As Sebastien points out, in the case where the test passes, the "zfs set recordsize" command is run at the end, and that hides the error returned from "lctl get_param osd-zfs.*.mntdev". If the first error is fixed, then the test should pass consistently.
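
The masking effect described here can be sketched in plain shell. The functions below are hypothetical stand-ins for the remote commands, not the actual test code; they only mimic the exit codes seen in the logs.

```shell
# Hypothetical stand-ins for the two remote commands.
failing_lookup() { return 2; }	# mimics "lctl get_param" exiting with code 2
trailing_set() { return 0; }	# mimics a final "zfs set recordsize" that succeeds

# A function's status is that of its LAST command, so the earlier
# failure is hidden when a successful command follows it.
masked() {
	failing_lookup
	trailing_set
}

# With no trailing command, the lookup failure becomes the return status.
unmasked() {
	failing_lookup
}

rc=0; masked || rc=$?
echo "masked rc=$rc"	# prints "masked rc=0"

rc=0; unmasked || rc=$?
echo "unmasked rc=$rc"	# prints "unmasked rc=2"
```

This matches the two logs in the description: the passing run ends with one more command after the failed lookup, and the failing run does not.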

It probably also makes sense to move the "restore recordsize" to stack_trap, so that it is always restored even if the test calls "error" earlier. Note, however, that the stack_trap commands are still run in the context of the test, so they cannot return an error or the test will be considered a failure.
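
As a rough illustration of that constraint, the sketch below models a runner in which the cleanup step executes in the same context as the test body. It is a simplified, hypothetical model and not the implementation of stack_trap in the Lustre test framework; all function names are assumptions.

```shell
# Hypothetical model: run a test body, then a cleanup step in the same
# context. A nonzero status from cleanup fails the run even if the body passed.
run_with_cleanup() {
	local body="$1"
	local cleanup="$2"
	local rc=0
	"$body" || rc=$?
	"$cleanup" || rc=$?
	return $rc
}

body_ok() { return 0; }

# Stand-in for a restore command that hits an error, e.g. a failed lookup.
cleanup_fails() { return 1; }

# Same cleanup, but the error is deliberately swallowed so that it
# cannot turn a passing test into a failure.
cleanup_guarded() { cleanup_fails || true; }

rc=0; run_with_cleanup body_ok cleanup_fails || rc=$?
echo "unguarded cleanup rc=$rc"	# prints "unguarded cleanup rc=1"

rc=0; run_with_cleanup body_ok cleanup_guarded || rc=$?
echo "guarded cleanup rc=$rc"	# prints "guarded cleanup rc=0"
```

Swallowing the error is only appropriate when a failed restore is acceptable; otherwise the underlying command needs to be fixed so that it succeeds.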

Comment by Andreas Dilger [ 21/Sep/21 ]

I just looked at the patch list from "git log --oneline --after 2021-09-16 master" and I think this is pretty clearly caused by patch https://review.whamcloud.com/44882 "LU-14997 tests: Register stack_trap for sanity/104c". Unfortunately, that patch was tested with "Test-Parameters: trivial" but test_104c is skipped for ldiskfs, and the patch causes the failure exclusively on ZFS backends.

Comment by Gerrit Updater [ 21/Sep/21 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45008
Subject: LU-15022 revert: "LU-14997 tests: Register stack_trap for sanity/104c"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cc7bf497aeb4845637946ea0c519d632f292e06b

Comment by Gerrit Updater [ 22/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45008/
Subject: LU-15022 revert: "LU-14997 tests: Register stack_trap for sanity/104c"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: df2c0c19b1b8075a21b583b06aaf11f215f59c22
