[LU-16462] conf-sanity sles12.5 test_43a: lctl: attr.c:201: validate_nla: Assertion `0' failed. Created: 11/Jan/23 Updated: 21/Jul/23 Resolved: 01/May/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>.
This issue relates to the following test suite run on sles12.5 clients:
test_43a failed with the following error:
lctl dl
BUG at file position attr.c:201:validate_nla
lctl: attr.c:201: validate_nla: Assertion `0' failed.
The same failure exists in a number of other subtests that also use "lctl dl":
It looks like the validate_nla() function is part of libnl (netlink), so this very likely relates to the new usage of netlink in "lctl dl" to get the device list. |
| Comments |
| Comment by Andreas Dilger [ 12/Jan/23 ] |
|
James, can you please take a look? I don't think SLES12 clients are in such heavy usage that they need to get the latest netlink functionality, but at one point at least Cray was heavily based on SLES for their client distro, so at least we shouldn't break it gratuitously. If there isn't a straightforward way to fix it, I'd be fine with just configuring out the netlink functionality and always using ioctl/debugfs in this case (which isn't worse than what was available before). |
| Comment by James A Simmons [ 12/Jan/23 ] |
|
What version of libnl is installed? Do we have a special Test-parameter tag for SUSE12? I suspect that the libnl library is older, so it's lacking proper support for NLA_S64. |
| Comment by Gerrit Updater [ 12/Jan/23 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49608 |
| Comment by Andreas Dilger [ 27/Jan/23 ] |
|
This is preventing patches on e2fsprogs from passing testing, since they run with sles12sp5 clients. https://testing.whamcloud.com/test_sessions/96f8909f-bbaa-40f2-b969-f261d4b0398f |
| Comment by Andreas Dilger [ 06/Mar/23 ] |
|
Comment from Dongyang in the 49608 patch that explains the issue:
|
| Comment by Andreas Dilger [ 06/Apr/23 ] |
|
Hi James, any thoughts on how to make progress on this issue? We have e2fsck fixes that have been blocked from landing for a couple of months because the netlink patch has broken "lctl dl" on SLES12 clients. I don't think we need to retroactively add support for SLES12 clients to allow non-root users to run "lctl dl", so it would be fine if the netlink code was completely disabled for older clients that don't have NLA_S32 or NLA_NUL_STRING and only the ioctl fallback was used. It just needs to not break the old code. |
| Comment by Andreas Dilger [ 06/Apr/23 ] |
|
I'm going to push a patch that disables yaml netlink usage if NLA_NUL_STRING is not defined. This works fine for "lctl dl" in my local testing, but I still need to fix "lctl ping" (sanity test_217). |
| Comment by Gerrit Updater [ 12/Apr/23 ] |
|
|
| Comment by Andreas Dilger [ 24/Apr/23 ] |
|
patch https://review.whamcloud.com/49608 " |
| Comment by James A Simmons [ 25/Apr/23 ] |
|
Thank you Andreas for your help |
| Comment by Andreas Dilger [ 25/Apr/23 ] |
|
I just ran across an old patch from Amir that is replacing usage of "lctl ping" and "lctl list_nids" with the equivalent "lnetctl" commands. The output is clunky and needs some awk to parse it into just a NID:
$ lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.10.99@tcp
          status: up
          interfaces:
              0: enp0s3
$ lnetctl net show | awk '/nid:/ && $3 != "0@lo" { print $3 }'
192.168.10.99@tcp
Alexey suggested in that patch to put this into a helper function in test-framework.sh instead of having it inline in multiple places. However, users would probably also want to print some of these fields outside of the testing, instead of the full YAML. Having a command-line argument like "lnetctl net show - |
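A helper along the lines Alexey suggested might look like the sketch below. The function names filter_local_nids and local_nids are hypothetical and are not taken from test-framework.sh or from Amir's patch; the awk filter is the one quoted in this comment. Splitting the filter from the lnetctl call is only an illustrative choice so that the parsing can be exercised on saved output.

```shell
# Hypothetical helper names; a sketch, not the code from the actual patch.

# Read "lnetctl net show" YAML on stdin and print every NID except
# the loopback NID 0@lo, one per line.
filter_local_nids() {
	awk '/nid:/ && $3 != "0@lo" { print $3 }'
}

# Print the non-loopback NIDs configured on the local node.
# Assumes lnetctl is in PATH and LNet is configured.
local_nids() {
	lnetctl net show | filter_local_nids
}
```

With the sample output shown in this comment, "local_nids" would print 192.168.10.99@tcp. A node with several network interfaces would get one NID per line, so callers that need a single NID would still have to pick one.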
| Comment by Gerrit Updater [ 01/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49608/ |
| Comment by Peter Jones [ 01/May/23 ] |
|
Landed for 2.16 |