[LU-2406] Interop 2.3<->2.4 Failure: unable to handle kernel NULL pointer dereference at (null)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Environment: server: 2.3 RHEL6
      client: lustre-master build#1065 RHEL6
    • 3
    • 5701

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/72277ac4-397d-11e2-9fda-52540035b04c.

      The sub-test test_26a failed with the following error:

      test_26a returned 1

      From OST console log:

      15:34:24:Lustre: DEBUG MARKER: == sanity test 26a: multiple component symlink ========================= 15:34:02 (1353972842)
      15:34:24:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
      15:34:24:BUG: unable to handle kernel NULL pointer dereference at (null)
      15:34:24:IP: [<(null)>] (null)
      15:34:24:PGD 7bc31067 PUD 7bc38067 PMD 0 
      15:34:24:Oops: 0010 [#1] SMP 
      15:34:24:last sysfs file: /sys/devices/system/cpu/possible
      15:34:24:CPU 0 
      15:34:24:Modules linked in: osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) lustre(U) ofd(U) ost(U) cmm(U) mdt(U) mdd(U) mds(U) mgs(U) jbd2 obdecho(U) mgc(U) lquota(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      15:34:24:
      15:34:24:Pid: 17619, comm: ll_ost00_007 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1 Red Hat KVM
      15:34:24:RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
      15:34:24:RSP: 0018:ffff88007b21bdf8  EFLAGS: 00010093
      15:34:24:RAX: ffff88007b2a1e30 RBX: ffffffffffffffe8 RCX: 0000000000000000
      15:34:24:RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007b2a1e30
      15:34:24:RBP: ffff88007b21be40 R08: 0000000000000000 R09: 0000000000000000
      15:34:24:R10: 000000000000000f R11: 000000000000000f R12: 0000000000000000
      15:34:24:R13: ffff880078a8a280 R14: 0000000000000000 R15: 0000000000000000
      15:34:24:FS:  00007f0021cd2700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      15:34:24:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      15:34:24:CR2: 0000000000000000000fffbc000 - 0000000100000000 (reserved)
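
The register dump above shows RIP at 0 together with an Oops code of 0010, which is consistent with a jump or call through a NULL function pointer rather than a NULL data access. As a rough aid for reading such dumps, the sketch below decodes the x86 page-fault error code bits. The helper name `decode_oops_code` is hypothetical; it is not part of Lustre or of any kernel tooling.

```shell
# Hypothetical helper: decode the hex page-fault error code that
# follows "Oops:" in an x86 kernel oops message.
decode_oops_code() {
    code=$((0x$1))

    # Bit 0: 0 = page not present, 1 = protection violation.
    if [ $((code & 1)) -ne 0 ]; then
        cause="protection violation"
    else
        cause="page not present"
    fi

    # Bit 1: 0 = read access, 1 = write access.
    if [ $((code & 2)) -ne 0 ]; then
        access="write"
    else
        access="read"
    fi

    # Bit 2: 0 = kernel mode, 1 = user mode.
    if [ $((code & 4)) -ne 0 ]; then
        mode="user mode"
    else
        mode="kernel mode"
    fi

    # Bit 4: the fault happened while fetching an instruction.
    if [ $((code & 16)) -ne 0 ]; then
        access="instruction fetch"
    fi

    echo "$cause, $access, $mode"
}

decode_oops_code 0010   # prints: page not present, instruction fetch, kernel mode
```

An instruction fetch from a not-present page at address 0 says that some function pointer was NULL, but not which one, so the call site still has to come from a stack trace.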
      

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.4

            sarah Sarah Liu added a comment -

            In the latest testing between tag-2.3.57 and b2_3, this test passed:
            https://maloo.whamcloud.com/sub_tests/3ab9ea90-4413-11e2-8b5c-52540035b04c

            jlevi Jodi Levi (Inactive) added a comment -

            Per Andreas and Oleg this can be removed as an NF Blocker, but will remain a top blocker for 2.4.

            adilger Andreas Dilger added a comment -

            This is caused by b2_3 interop tests failing due to LU-1883.

            adilger Andreas Dilger added a comment -

            http://review.whamcloud.com/4727

            adilger Andreas Dilger added a comment -

            Unfortunately, there is not enough information in the stack dump to know anything about what failed, or where.

            Looking more closely, it appears that the root problem is that the 2.4 test-framework.sh defaults to "USE_OFD=yes", which causes the 2.3 code to run with the ofd and osd-ldiskfs modules on the OST.

            I'm just working on a patch to remove "USE_OFD" from the 2.4 t-f entirely.
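
To illustrate how a default like the one described in the comment above can silently pick the OST module stack, here is a minimal sketch. It assumes the framework uses a shell default expansion for `USE_OFD`; the function name `select_ost_stack` and the "pre-OFD stack" placeholder are hypothetical, and the real test-framework.sh logic may differ.

```shell
# Hypothetical sketch, not the actual test-framework.sh code.
select_ost_stack() {
    # The default applies only when USE_OFD is unset or empty,
    # so a run that never sets it ends up on the OFD branch.
    use_ofd=${USE_OFD:-yes}

    if [ "$use_ofd" = "yes" ]; then
        # Module names taken from the comment above.
        echo "ofd osd-ldiskfs"
    else
        # Placeholder for whatever the older server code expects.
        echo "pre-OFD stack"
    fi
}
```

In this sketch, a run against 2.3 servers that leaves `USE_OFD` unset takes the first branch, while setting `USE_OFD=no` before the call selects the other one.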

            People

              Assignee: adilger Andreas Dilger
              Reporter: maloo Maloo