LNetError: 16885:0:(peer.c:1786:lnet_peer_push_event()) Push Put from unknown 0@<0:0> (source 0@<0:0>)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.11.0
    • Severity: 3

    Description

      New error seen during test on el6.9. Not seen before. Seen during lustre_rmmod every time.

      Suspect due to recent landings on master.

          Activity

            [LU-9904] LNetError: 16885:0:(peer.c:1786:lnet_peer_push_event()) Push Put from unknown 0@<0:0> (source 0@<0:0>)
            pjones Peter Jones added a comment -

            Landed for 2.11.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29026/
            Subject: LU-9904 lnet: reduce logging severity
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7fc8037d61b29fa2ac93ab5fb30fccd9b1c0066c

            adilger Andreas Dilger added a comment -

            I'd be happy if you fixed the test script to not call list_nids before the modules are loaded.

            Even so, the error message itself doesn't explain anything to the user who runs "lctl list_nids". I'm not suggesting the error should be removed, but rather that it should be improved to be actually useful to the user:

            • include "lctl list_nids:" at the start, so the reader knows which command is failing
            • expand the message for EINVAL to suggest that the lnet module is not loaded
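The two suggestions above concern the message printed by lctl itself. As a rough illustration only, the intended shape of the message can be sketched with a shell wrapper. The function name list_nids_verbose and the wording of the hint are hypothetical, and the sketch assumes that lctl list_nids exits non-zero and writes its error to stderr when it fails; neither assumption is confirmed in this ticket.

```shell
# Hypothetical wrapper, not part of lctl or the Lustre test framework.
# Assumes "lctl list_nids" exits non-zero and reports its error on stderr.
list_nids_verbose() {
	local out rc
	out=$(lctl list_nids 2>&1)
	rc=$?
	if [ "$rc" -eq 0 ]; then
		# Success: pass the NID list through unchanged.
		printf '%s\n' "$out"
		return 0
	fi
	case $out in
	*"Invalid argument"*)
		# EINVAL: name the failing command and add a hint.
		printf 'lctl list_nids: %s (is the lnet module loaded?)\n' "$out" >&2 ;;
	*)
		# Any other failure: just name the failing command.
		printf 'lctl list_nids: %s\n' "$out" >&2 ;;
	esac
	return "$rc"
}
```

A real fix would change the message where lctl prints it; the wrapper only shows what a user might see instead of the bare "IOC_LIBCFS_GET_NI error 22: Invalid argument".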

            ashehata Amir Shehata (Inactive) added a comment -

            yaml.sh:yml_node_info() gets called before the modules are loaded. That function contains:

            for nw in $(lctl list_nids | grep -v @lo | cut -f 2 -d '@' | uniq); do

            This doesn't look like something new. I'm not sure exactly how the test framework works, so I don't know how best to make a change there.

            I would rather not remove the error when list_nids fails, because it's useful for debugging: it prints out why it failed. I think it makes more sense not to call lctl list_nids from the test unless the modules are loaded first.
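One way to avoid calling lctl list_nids before the modules are loaded is to guard the pipeline with a module check. The sketch below is an assumption about how that could look, not the change that was made to yaml.sh: lnet_loaded and list_networks are hypothetical helper names, and the check relies on the Linux /proc/modules format, where each line starts with the module name followed by a space.

```shell
# Hypothetical helpers, not part of the Lustre test framework.

# Return 0 when a module named "lnet" appears in the given module list
# (defaults to /proc/modules on Linux).
lnet_loaded() {
	local modlist=${1:-/proc/modules}
	[ -r "$modlist" ] && grep -q '^lnet ' "$modlist"
}

# Print the distinct network names of the local non-loopback NIDs,
# or nothing at all when lnet is not loaded.
list_networks() {
	lnet_loaded "$@" || return 0
	lctl list_nids | grep -v @lo | cut -f 2 -d '@' | uniq
}
```

With these in place, the loop could read "for nw in $(list_networks); do", which simply iterates zero times when lnet is absent instead of printing an error.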

            adilger Andreas Dilger added a comment -

            I would suggest that "IOC_LIBCFS_GET_NI error 22: Invalid argument" is not a useful error to print if lctl list_nids failed, so that should be fixed up somehow.

            Secondly, I don't see where lctl list_nids is being called, or I'd suggest to avoid calling it if that doesn't make sense. The only places I see it are in sk_nodemap_setup(), which is only called if $SK_S2S is set, and in host_nids_address(), which is only used by sanity.sh::test_217 and not during mount that I can see.

            ashehata Amir Shehata (Inactive) added a comment -

            Andreas, the reason for
            IOC_LIBCFS_GET_NI error 22: Invalid argument
            is that lctl list_nids is being called before the modules are loaded:

            testvm.centos7: executing check_logdir /tmp/test_logs/1505518717
            Logging to shared log directory: /tmp/test_logs/1505518717
            testvm.centos7: executing yml_node
            IOC_LIBCFS_GET_NI error 22: Invalid argument <----- call to lctl list_nids
            Client: Lustre version: 2.10.52_98_g8e75219_dirty
            MDS: Lustre version: 2.10.52_98_g8e75219_dirty
            OSS: Lustre version: 2.10.52_98_g8e75219_dirty
            Stopping clients: testvm.centos7 /mnt/lustre (opts:)
            Stopping clients: testvm.centos7 /mnt/lustre2 (opts:)
            Loading modules from /usr/lib64/lustre <--- modules (including lnet) are loaded

            gerrit Gerrit Updater added a comment -

            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/29026
            Subject: LU-9904 lnet: reduce logging severity
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0732d936aa07e8ee129829e738b82f10189cd7ce

            adilger Andreas Dilger added a comment -

            I've started hitting this after a recent update as well. We definitely shouldn't be printing "Error" on the console for something that happens during normal operation. I'm also seeing a new error during mount:

            # sh sanity.sh
            client: executing check_logdir /tmp/test_logs/1505452045
            client: ../libcfs/libcfs/libcfs options: 'libcfs_panic_on_lbug=0'
            Logging to shared log directory: /tmp/test_logs/1505452045
            client: executing yml_node
            IOC_LIBCFS_GET_NI error 22: Invalid argument

            The mount itself continues fine, but as with the other error, we shouldn't be printing errors to the console/terminal during normal operations, as that tends to confuse users.

            ashehata Amir Shehata (Inactive) added a comment -

            Found a set of steps to reproduce without shutdown:

            peer2:
            modprobe lnet
            lnetctl lnet configure
            lnetctl net add --net tcp --if eth0,eth1

            peer1:
            modprobe lnet
            lnetctl lnet configure
            lnetctl net add --net tcp --if eth0,eth1
            lnetctl discover 192.168.122.30@tcp # discover peer2
            # in /var/log/messages
            # Aug 24 17:49:54 MRtest01 kernel: LNetError: 3447:0:(peer.c:1786:lnet_peer_push_event()) Push Put from unknown 0@<0:0> (source 0@<0:0>)
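To check whether the reproduction steps triggered the message, the log can be searched for it. This is a minimal sketch; count_push_put_errors is a hypothetical helper name, and the default path is the /var/log/messages location mentioned in the steps above, which may differ on other systems.

```shell
# Hypothetical helper: count "Push Put from unknown" LNetError lines in a
# log file (defaults to /var/log/messages, as in the steps above).
count_push_put_errors() {
	local log=${1:-/var/log/messages}
	# -F matches the parentheses literally; -c prints the match count.
	grep -cF 'lnet_peer_push_event()) Push Put from unknown' "$log"
}
```

Running it on peer1 before and after "lnetctl discover" shows whether the discover call added a new line. Note that grep exits non-zero when the count is 0, so callers running under "set -e" should allow for that.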

            ashehata Amir Shehata (Inactive) added a comment -

            Sure. I'll push a patch.

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: bogl Bob Glossman (Inactive)
              Votes: 0
              Watchers: 8