[LU-2406] Interop 2.3<->2.4 Failure: unable to handle kernel NULL pointer dereference at (null)
Created: 29/Nov/12  Updated: 26/Dec/12  Resolved: 13/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Andreas Dilger
Resolution: Fixed
Votes: 0
Labels: None
Environment:

server: 2.3 RHEL6
client: lustre-master build#1065 RHEL6


Issue Links:
Related
is related to LU-1883 osd-ldiskfs fills file offsets into l... Resolved
Severity: 3
Rank (Obsolete): 5701

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/72277ac4-397d-11e2-9fda-52540035b04c.

The sub-test test_26a failed with the following error:

test_26a returned 1

From OST console log:

15:34:24:Lustre: DEBUG MARKER: == sanity test 26a: multiple component symlink ========================= 15:34:02 (1353972842)
15:34:24:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
15:34:24:BUG: unable to handle kernel NULL pointer dereference at (null)
15:34:24:IP: [<(null)>] (null)
15:34:24:PGD 7bc31067 PUD 7bc38067 PMD 0 
15:34:24:Oops: 0010 [#1] SMP 
15:34:24:last sysfs file: /sys/devices/system/cpu/possible
15:34:24:CPU 0 
15:34:24:Modules linked in: osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) lustre(U) ofd(U) ost(U) cmm(U) mdt(U) mdd(U) mds(U) mgs(U) jbd2 obdecho(U) mgc(U) lquota(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
15:34:24:
15:34:24:Pid: 17619, comm: ll_ost00_007 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1 Red Hat KVM
15:34:24:RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
15:34:24:RSP: 0018:ffff88007b21bdf8  EFLAGS: 00010093
15:34:24:RAX: ffff88007b2a1e30 RBX: ffffffffffffffe8 RCX: 0000000000000000
15:34:24:RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007b2a1e30
15:34:24:RBP: ffff88007b21be40 R08: 0000000000000000 R09: 0000000000000000
15:34:24:R10: 000000000000000f R11: 000000000000000f R12: 0000000000000000
15:34:24:R13: ffff880078a8a280 R14: 0000000000000000 R15: 0000000000000000
15:34:24:FS:  00007f0021cd2700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
15:34:24:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
15:34:24:CR2: 0000000000000000000fffbc000 - 0000000100000000 (reserved)
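The "Oops: 0010" value in the log above is an x86 page-fault error code, and it can be decoded bit by bit. The following is a minimal sketch for reading it; the function name decode_oops_code is made up for illustration and is not part of Lustre or the kernel tooling. Together with RIP being 0, the result is consistent with the kernel trying to execute code at address 0, which is what a call through a NULL function pointer typically looks like. The ticket itself does not confirm that interpretation.

```shell
#!/bin/sh
# Decode the hex error code printed after "Oops:" for an x86 page fault.
# decode_oops_code is a hypothetical helper name used only in this sketch.
decode_oops_code() {
    code=$((0x$1))    # e.g. "0010" -> 16

    # Bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then
        echo "protection violation"
    else
        echo "page not present"
    fi

    # Bit 1: 0 = read or fetch, 1 = write
    if [ $((code & 2)) -ne 0 ]; then
        echo "write access"
    else
        echo "not a write"
    fi

    # Bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then
        echo "user mode"
    else
        echo "kernel mode"
    fi

    # Bit 4: set when the fault happened on an instruction fetch
    if [ $((code & 16)) -ne 0 ]; then
        echo "instruction fetch"
    fi
}

decode_oops_code 0010
```

For the code in this report, the sketch reports a kernel-mode instruction fetch from a page that is not present.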


 Comments   
Comment by Andreas Dilger [ 30/Nov/12 ]

Unfortunately, there is not enough information in the stack dump to know anything about what failed, or where.

Looking more closely, it appears that the root problem is that the 2.4 test-framework.sh defaults to "USE_OFD=yes", which causes the 2.3 code to run with the ofd and osd-ldiskfs modules on the OST.

I am working on a patch to remove "USE_OFD" from the 2.4 test-framework.sh entirely.
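The effect of a "USE_OFD=yes" default can be illustrated with a small sketch. This is not the actual test-framework.sh code: it assumes the default is applied with the usual shell ${VAR:-default} idiom, which the ticket does not confirm, and the function ost_stack and the label "legacy" are hypothetical names used only here.

```shell
#!/bin/sh
# Hypothetical sketch, not the real test-framework.sh logic.
# Assumption: the default is applied with the ${VAR:-default} idiom.
ost_stack() {
    use_ofd=${USE_OFD:-yes}    # unset or empty falls back to "yes"
    if [ "$use_ofd" = "yes" ]; then
        # Modules named in the comment above
        echo "ofd osd-ldiskfs"
    else
        # Placeholder label for whatever the older server expects
        echo "legacy"
    fi
}

unset USE_OFD
ost_stack                  # default path: prints "ofd osd-ldiskfs"
(USE_OFD=no; ost_stack)    # explicit override: prints "legacy"
```

With a default like this, a 2.4 test framework driving a 2.3 server picks the ofd path unless the caller overrides the variable, which matches the failure mode described in the comment.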

Comment by Andreas Dilger [ 30/Nov/12 ]

http://review.whamcloud.com/4727

Comment by Andreas Dilger [ 30/Nov/12 ]

This is caused by b2_3 interop tests failing due to LU-1883.

Comment by Jodi Levi (Inactive) [ 05/Dec/12 ]

Per Andreas and Oleg, this can be removed as an NF Blocker, but it will remain a top blocker for 2.4.

Comment by Sarah Liu [ 12/Dec/12 ]

In the latest testing between tag-2.3.57 and b2_3, this test passed:
https://maloo.whamcloud.com/sub_tests/3ab9ea90-4413-11e2-8b5c-52540035b04c

Comment by Peter Jones [ 13/Dec/12 ]

Landed for 2.4
