tests of sles11sp3 in lustre-review always fail

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0
    • Lustre 2.6.0
    • None
    • sles11sp3 server and client
    • 3
    • 1851

    Description

      When I use Test-Parameters in a commit header to request a test run on sles11sp3 instead of the default el6, it never works. It fails in node-provisioning or lustre-initialization.

      An example is https://testing.hpdd.intel.com/test_sessions/95efaf60-1c54-11e4-b7cc-5254006e85c2. The mod under test is http://review.whamcloud.com/#/c/11133. It has "Test-Parameters: mdsdistro=sles11sp3 ossdistro=sles11sp3 clientdistro=sles11sp3 mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs" in the commit header. The test run fails in lustre-initialization. All the el6 test runs are fine; only the sles11sp3 one fails.
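To make the shape of that header concrete, here is a minimal sketch of how a Test-Parameters line breaks down into key=value pairs. It assumes whitespace-separated tokens and uses shlex so that quoted values would survive; the subject line in the sample message is hypothetical, and this is not the parser the test system itself uses.

```python
import shlex


def parse_test_parameters(commit_message):
    """Return one dict of key=value pairs per Test-Parameters line."""
    sessions = []
    for line in commit_message.splitlines():
        key, sep, rest = line.strip().partition(":")
        if not sep or key.strip().lower() != "test-parameters":
            continue
        params = {}
        # shlex.split keeps a quoted value together as a single token.
        for token in shlex.split(rest):
            name, eq, value = token.partition("=")
            if eq:
                params[name] = value
        sessions.append(params)
    return sessions


# Hypothetical commit message; only the Test-Parameters line is from the ticket.
COMMIT_MESSAGE = """\
LU-0000 example: hypothetical subject line

Test-Parameters: mdsdistro=sles11sp3 ossdistro=sles11sp3 clientdistro=sles11sp3 mdsfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs
"""

sessions = parse_test_parameters(COMMIT_MESSAGE)
for name, value in sessions[0].items():
    print(f"{name} = {value}")
```

Each Test-Parameters line yields one dict, so a message carrying several such lines would produce several entries.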

      I strongly suspect there's something wrong with the test node recipe for sles11sp3. As far as I can tell there's nothing wrong with the build; it looks complete and OK.

        Activity

          [LU-5543] tests of sles11sp3 in lustre-review always fail

          jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master.

          bogl Bob Glossman (Inactive) added a comment - As far as I know, this bug can be closed. Future failures in sles11sp3 will have different causes and should get fresh tickets.

          adilger Andreas Dilger added a comment - Can this bug be closed?

          bogl Bob Glossman (Inactive) added a comment - I take it back. All the ldiskfs patch sets for newer kernels already have EXPORT_SYMBOL(ext4_map_blocks) in them. It looks like the sles11sp3 series was the only place where it was needed and missing.

          bogl Bob Glossman (Inactive) added a comment - In-flight ldiskfs patches for el7, fc20, and 3.12 kernels will need similar mods.

          bogl Bob Glossman (Inactive) added a comment (edited) -

          Joshua: I now think there is really a Lustre bug here, not a TEI issue. I strongly suspect this is due to http://review.whamcloud.com/#/c/8116, which only recently landed in master.

          Among other things, that mod added a direct call to ldiskfs_map_blocks() at line 879 of osd-ldiskfs/osd_io.c. The call is in conditional code that only exists and is called when HAVE_LDISKFS_MAP_BLOCKS is #define'd. Of the currently supported distros, that autoconf setting is only true on sles11sp3.

          The problem is that nothing in the ldiskfs patches was changed to make that routine globally callable with EXPORT_SYMBOL(). osd-ldiskfs.ko needs it, but ldiskfs.ko doesn't supply it.

          This would also be a problem on any newer kernel where HAVE_LDISKFS_MAP_BLOCKS is #define'd, but other than build testing, most functional tests there have been pretty much confined to client-only builds. This is only an issue in server builds.

          I will need to push a small mod to ldiskfs patches for sles11sp3 asap. That should take care of it.
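One way to confirm a missing export like this before loading anything is to look for the symbol in the Module.symvers file produced by the ldiskfs build. The sketch below assumes the four-column layout (CRC, symbol, module, export type) and runs against a hypothetical sample rather than real build output; the CRCs and the symbol names other than ldiskfs_map_blocks are placeholders.

```python
def exported_symbols(symvers_text):
    """Map symbol name -> (module, export type) from Module.symvers-style text."""
    symbols = {}
    for line in symvers_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _crc, name, module, export_type = fields[:4]
        symbols[name] = (module, export_type)
    return symbols


def missing_exports(needed, symvers_text):
    """Return the needed symbols that the text does not list as exported."""
    exported = exported_symbols(symvers_text)
    return [name for name in needed if name not in exported]


# Hypothetical sample: placeholder symbols, not taken from a real build.
SAMPLE_SYMVERS = """\
0x00000000\tsome_exported_helper\tldiskfs/ldiskfs\tEXPORT_SYMBOL
0x00000000\tanother_exported_helper\tldiskfs/ldiskfs\tEXPORT_SYMBOL
"""

for name in missing_exports(["ldiskfs_map_blocks"], SAMPLE_SYMVERS):
    print(f"{name} is not exported; modules that call it will fail to load")
```

With a build whose ldiskfs patches include the export, the symbol would appear in the file and the list would come back empty.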

          joshua Joshua Kugler (Inactive) added a comment - Bob: could you download a build from our Jenkins and see if you can run the tests? It would make sense that you can do local builds and installs, since that is all on your machine. The fact that the tests fail may point to something wrong in our system (either lbuild or our test nodes).

          bogl Bob Glossman (Inactive) added a comment - This doesn't make sense to me. I can do local builds, loads, & installs of current master lustre on sles11sp3 without seeing this problem. ldiskfs_map_blocks() does exist but is local to ldiskfs.ko, and as far as I can see it doesn't need to be EXPORTed.
          mdiep Minh Diep added a comment -

          I found that the MDT cannot be mounted due to:

          [233447.724063] osd_ldiskfs: Unknown symbol ldiskfs_map_blocks (err 0)
          [233447.724998] LustreError: 158-c: Can't load module 'osd-ldiskfs'
          [233447.725006] LustreError: 26468:0:(genops.c:338:class_newdev()) OBD: unknown type: osd-ldiskfs
          [233447.725014] LustreError: 26468:0:(obd_config.c:376:class_attach()) Cannot create device lustre-MDT0000-osd of type osd-ldiskfs : -19
          [233447.725023] LustreError: 26468:0:(obd_mount.c:199:lustre_start_simple()) lustre-MDT0000-osd attach error -19
          [233447.725031] LustreError: 26468:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start osd on /dev/mapper/lvm--Role_MDS-P1: -19
          [233447.725046] LustreError: 26468:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-19)

          This is a Lustre issue, not an autotest issue, afaik.
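The first line of that log is the one that identifies the root cause; the rest are consequences of the module failing to load. As a rough sketch, a console log can be scanned for such lines as follows. The regular expression is an assumption based only on the message format shown above.

```python
import re

# Optional "[seconds.micros]" timestamp, then "<module>: Unknown symbol <name>".
UNKNOWN_SYMBOL = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?(\w+): Unknown symbol (\w+)"
)


def unknown_symbols(log_text):
    """Return (module, symbol) pairs for every 'Unknown symbol' line."""
    found = []
    for line in log_text.splitlines():
        match = UNKNOWN_SYMBOL.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# The first two lines of the log quoted in this comment.
LOG = """\
[233447.724063] osd_ldiskfs: Unknown symbol ldiskfs_map_blocks (err 0)
[233447.724998] LustreError: 158-c: Can't load module 'osd-ldiskfs'
"""

for module, symbol in unknown_symbols(LOG):
    print(f"{module} could not resolve {symbol}")
```

The -19 in the later lines is -ENODEV, which fits a device type that was never registered because its module did not load.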

          People

            bogl Bob Glossman (Inactive)
            bogl Bob Glossman (Inactive)