[LU-3617] configure incorrectly finds no for RDMA events 14 and 15 on latest RHEL5

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • Lustre 1.8.9
    • Lustre 1.8.9
    • 3
    • 9298

    Description

      We ran into an issue at Yale recently where the Lustre servers all got RDMA_CM_EVENT_TIMEWAIT_EXIT, which the servers didn't recognize and LBUGed on. There are two problems here. The first is that configure clearly doesn't do the right thing anymore, and the second is that Lustre shouldn't LBUG if it gets those events, even if configure doesn't find them.

      I double checked on your build systems to make sure it wasn't just ours:
      http://build.whamcloud.com/job/lustre-b1_8/arch=x86_64,build_type=server,distro=el5,ib_stack=inkernel/258/consoleFull

      That's the official 1.8.9 build console if I'm not mistaken, and you can see:
      checking if OFED has RDMA_CM_EVENT_ADDR_CHANGE... no
      checking if OFED has RDMA_CM_EVENT_TIMEWAIT_EXIT... no
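
      The second problem described above, that an unrecognized CM event should not be fatal, can be illustrated with a minimal userspace sketch. It is not the Lustre handler: the names `sketch_cm_callback`, `sketch_action`, and the `SKETCH_EVENT_*` constants are hypothetical, and the choice to ignore events 14 and 15 is an assumption made for illustration. Only the numeric values 14 and 15 come from this ticket's title. The point is the `default` branch, which logs and carries on instead of asserting.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the CM event numbers discussed in this ticket.
 * 14 and 15 are the values named in the ticket title; 10 is included only
 * to show one event that is handled actively. */
enum sketch_cm_event {
    SKETCH_EVENT_DISCONNECTED  = 10,
    SKETCH_EVENT_ADDR_CHANGE   = 14,
    SKETCH_EVENT_TIMEWAIT_EXIT = 15
};

/* What the caller should do after the callback returns. */
enum sketch_action {
    SKETCH_TEARDOWN,  /* close the connection */
    SKETCH_IGNORE     /* nothing to do, keep running */
};

/* Illustrative event callback. Events the code knows about get an explicit
 * case; anything else is reported and ignored rather than treated as a
 * fatal assertion failure. */
enum sketch_action sketch_cm_callback(int event)
{
    switch (event) {
    case SKETCH_EVENT_DISCONNECTED:
        return SKETCH_TEARDOWN;

    case SKETCH_EVENT_ADDR_CHANGE:
    case SKETCH_EVENT_TIMEWAIT_EXIT:
        /* Assumption: these two events need no action here. */
        return SKETCH_IGNORE;

    default:
        /* Unknown event: log it so it is visible, but do not abort. */
        fprintf(stderr, "ignoring unexpected CM event %d\n", event);
        return SKETCH_IGNORE;
    }
}
```

      With a handler shaped like this, a build whose configure checks wrongly report "no" would still leave the explicit cases compiled out, but events 14 and 15 would fall through to the logging branch rather than bring the server down.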

        Activity

          kitwestneat Kit Westneat (Inactive) added a comment:
          While the symptoms are the same, this is not the same issue. LU-3166 covers configure problems with OFED-3.5, whereas this ticket concerns RHEL's in-kernel OFED. I think it should be reopened, as the solution is going to be different.

          ashehata Amir Shehata (Inactive) added a comment:
          This is a duplicate of LU-3166, which tracks the same issue for release 2.5.

          simmonsja James A Simmons added a comment:
          Both tests fail for the same reason:

          In file included from /data/buildsystem/jsimmons-widow/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18-348.3.1.el5.x86_64/include/rdma/rdma_cm.h:39,
          from /data/buildsystem/jsimmons-widow/rpmbuild/usr/src/lustre-1.8.9/build/conftest.c:43:
          /data/buildsystem/jsimmons-widow/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18-348.3.1.el5.x86_64/include/rdma/ib_addr.h: In function 'rdma_vlan_dev_vlan_id':
          /data/buildsystem/jsimmons-widow/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18-348.3.1.el5.x86_64/include/rdma/ib_addr.h:154: error: implicit declaration of function 'vlan_dev_vlan_id'

          [jsimmons@testbox linux-2.6.18-348.3.1.el5.widow.x86_64]# grep -rl vlan_dev_vlan_id .
          ./include/rdma/ib_addr.h
          ./include/scsi/fc_compat.h

          So yes, apart from its use in ib_addr.h, vlan_dev_vlan_id appears only in fc_compat.h.

          isaac Isaac Huang (Inactive) added a comment:
          James, can you please look at the questions I just posted on Gerrit?

          simmonsja James A Simmons added a comment:
          I'm seeing Lustre initialization errors in Maloo. The logs are not very informative, so does anyone know what is going wrong?

          simmonsja James A Simmons added a comment:
          We just ran into this problem on our production systems. Not detecting certain features can cause an Oops. I have a patch that fixes this problem at

          http://review.whamcloud.com/#/c/7488

          Sorry, the patch points to the original ticket I filed.

          kitwestneat Kit Westneat (Inactive) added a comment:
          That is correct. It is RHEL 5.9, with the in-kernel RDMA.

          adilger Andreas Dilger added a comment:
          It looks like this is RHEL 5. What version of RHEL, and what version of OFED?

          People

            ashehata Amir Shehata (Inactive)
            kitwestneat Kit Westneat (Inactive)