[LU-16462] conf-sanity sles12.5 test_43a: lctl: attr.c:201: validate_nla: Assertion `0' failed.

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run on sles12.5 clients:
      https://testing.whamcloud.com/test_sets/258bf667-e863-4adb-af68-213f7877b909
      https://testing.whamcloud.com/test_sets/e3b187ad-af69-4bd1-b43c-583da240aef3

      test_43a failed with the following error:

      lctl dl
      BUG at file position attr.c:201:validate_nla
      lctl: attr.c:201: validate_nla: Assertion `0' failed.
      

      The same failure exists in a number of other subtests that also use "lctl dl":

      • sanity: test_33i, test_104d, test_154d
      • conf-sanity: test_43b, test_70c, test_91

      It looks like the validate_nla() function is part of libnl (netlink), so very likely relates to the new usage of netlink in "lctl dl" to get the device list.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      conf-sanity test_70c - set deactivate failed

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49608/
            Subject: LU-16462 utils: handle lack of newer nla_attrs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ae1ee11cea0a90631e14d670883528d6ac6e86b7

            gerrit Gerrit Updater added the comment above.

            I just ran across an old patch from Amir that replaces usage of "lctl ping" and "lctl list_nids" with the equivalent "lnetctl" commands.

            The output is clunky and needs some awk to parse it into just a NID:

            $ lnetctl net show
            net:
                - net type: lo
                  local NI(s):
                    - nid: 0@lo
                      status: up
                - net type: tcp
                  local NI(s):
                    - nid: 192.168.10.99@tcp
                      status: up
                      interfaces:
                          0: enp0s3
            
            $ lnetctl net show | awk '/nid:/ && $3 != "0@lo" { print $3 }'
            192.168.10.99@tcp
            

            Alexey suggested in that patch to put this into a helper function in test-framework.sh instead of having it inline in multiple places. However, users would probably also want to print some of these fields outside of testing, instead of the full YAML.

            Having a command-line argument like "lnetctl net show -nid" in this case (and also being able to print other fields like "... --status", "nettype", "-interfaces") would be more convenient than users having to use "awk" or "yq" to extract the fields manually.

            adilger Andreas Dilger added the comment above.
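
The awk filter in the comment above could be wrapped in small helpers so it is not repeated inline. This is a minimal sketch, assuming the YAML layout shown in the sample "lnetctl net show" output; the function names extract_nids and lnet_field are hypothetical and are not part of test-framework.sh or lnetctl. Both read the YAML on stdin so they can be tried without a running LNet.

```shell
# Hypothetical helper: print every non-loopback NID found in
# "lnetctl net show" YAML read from stdin.
extract_nids() {
	awk '/nid:/ && $3 != "0@lo" { print $3 }'
}

# Hypothetical helper: print the value of a single-word key (for example
# "nid" or "status") for every line where that key appears.
# Multi-word keys such as "net type" are not handled by this sketch.
lnet_field() {
	awk -v key="$1:" '{
		for (i = 1; i < NF; i++)
			if ($i == key)
				print $(i + 1)
	}'
}
```

Typical use would be "lnetctl net show | extract_nids" or "lnetctl net show | lnet_field status". Reading from stdin keeps the parsing separate from the command that produces the YAML, which also makes the helpers easy to exercise with canned output.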

            Thank you Andreas for your help

            simmonsja James A Simmons added the comment above.

            patch https://review.whamcloud.com/49608 "LU-16462 utils: handle lack of NLA_S64" has been updated to handle the sles12sp5 libnl incompatibility, along with test and tool fixes for the netlink-unavailable fallback case so that "lctl ping" and "lctl list_nids" continue to work.

            adilger Andreas Dilger added the comment above.
            gerrit Gerrit Updater added a comment - - edited

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50610
            Subject: LU-16462 utils: skip netlink for old libnl3
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5ff26d3c3205baacefb3c0e48c2a03ff0713db39


            I'm going to push a patch that disables yaml netlink usage if NLA_NUL_STRING is not defined. This works fine for "lctl dl" in my local testing, but I still need to fix "lctl ping" (sanity test_217).

            adilger Andreas Dilger added the comment above.
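
One rough way to see whether an installed libnl3 is new enough is to look for the attribute type name in its attr.h header. This is only a sketch: header_has_nla_type is a hypothetical helper, the header path is supplied by the caller rather than assumed, and a plain text search can be fooled by a name that appears only in a comment. It is not claimed to be how the 49608 or 50610 patches perform the check.

```shell
# Hypothetical helper: succeed if the given header file mentions the given
# NLA_* attribute type name, fail otherwise.
#   $1 = path to a libnl3 attr.h header
#   $2 = attribute type name, e.g. NLA_NUL_STRING
header_has_nla_type() {
	# -w matches the name as a whole word, so a longer identifier that
	# merely contains the name does not count as a match.
	grep -qw -- "$2" "$1"
}
```

For example, with a header location that is an assumption about the local install, 'header_has_nla_type /usr/include/libnl3/netlink/attr.h NLA_NUL_STRING || echo "old libnl3"' would print the message on a system whose header looks like the older enum quoted in this issue.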

            Hi James, any thoughts on how to make progress on this issue?

            We have e2fsck fixes blocked from landing for a couple of months because the netlink patch has broken "lctl dl" on SLES12 clients. I don't think we need to retroactively add support for SLES12 clients to allow non-root users to run "lctl dl", so it would be fine if the netlink code was completely disabled for older clients that don't have NLA_S32 or NLA_NUL_STRING and only the ioctl fallback was used. It just needs to not break the old code.

            adilger Andreas Dilger added the comment above.

            Comment from Dongyang in the 49608 patch that explains the issue:

            lctl dl is triggering an assert inside libnl3 because we use NLA_NUL_STRING.
            The old libnl3 has neither NLA_NUL_STRING nor NLA_S8|16|32|64:

            enum {
            	NLA_UNSPEC,	/**< Unspecified type, binary data chunk */
            	NLA_U8,		/**< 8 bit integer */
            	NLA_U16,	/**< 16 bit integer */
            	NLA_U32,	/**< 32 bit integer */
            	NLA_U64,	/**< 64 bit integer */
            	NLA_STRING,	/**< NUL terminated character string */
            	NLA_FLAG,	/**< Flag */
            	NLA_MSECS,	/**< Micro seconds (64bit) */
            	NLA_NESTED,	/**< Nested attributes */
            	__NLA_TYPE_MAX,
            };
            
            #define NLA_TYPE_MAX (__NLA_TYPE_MAX - 1)
            

            If we try to use any NLA type greater than NLA_TYPE_MAX, it triggers the assert in validate_nla().
            Do we have to use NLA_NUL_STRING instead of NLA_STRING, and the signed NLA types?

            adilger Andreas Dilger added the comment above.
            adilger Andreas Dilger added a comment -

            This is preventing patches on e2fsprogs from passing testing, since they run with sles12sp5 clients:
            https://testing.whamcloud.com/test_sessions/96f8909f-bbaa-40f2-b969-f261d4b0398f
            https://testing.whamcloud.com/test_sessions/1f946f9d-86de-401d-9c45-b3b445eb10b4

            People

              simmonsja James A Simmons
              maloo Maloo
              Votes: 0
              Watchers: 7
