[LU-3051] Failure on sanityn test 2f: unable to handle kernel NULL pointer dereference at 0000000000000010 Created: 28/Mar/13  Updated: 28/Mar/13  Resolved: 28/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Minor
Reporter: Sarah Liu
Assignee: Di Wang
Resolution: Duplicate
Votes: 0
Labels: dne, zfs
Environment:

client and server: lustre-master build #1340
1 MDS with 2 MDTs running ZFS


Rank (Obsolete): 7445

 Description   

MDS console:

Lustre: DEBUG MARKER: == sanityn test 2f: check attr/owner updates on DNE with 2 mtpt's == 17:54:45 (1364432085)
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x00000003c0000400-0x0000000400000400):1:mdt
Lustre: cli-ctl-lustre-MDT0000-osp-MDT0001: Allocated super-sequence [0x00000003c0000400-0x0000000400000400):1:mdt]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
PGD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/possible
CPU 0 
Modules linked in: osp(U) lod(U) mdt(U) mdd(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb microcode sg serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 2967, comm: mdt_out00_000 Tainted: P           ---------------    2.6.32-279.19.1.el6_lustre.x86_64 #1 Supermicro X8DTT/X8DTT
RIP: 0010:[<ffffffffa06f69f7>]  [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
RSP: 0018:ffff880318485a10  EFLAGS: 00010283
RAX: 0000000000000000 RBX: ffff8803188f3600 RCX: 0000000000000000
RDX: 0000000380000400 RSI: ffffffffa0d3d9fb RDI: ffff88030a1c9b88
RBP: ffff880318485a60 R08: ffff88030a1c9b08 R09: 00000000000034e1
R10: 0000000000000001 R11: a000000000000000 R12: ffffc90020095e80
R13: ffffc90020095e80 R14: ffff88030a1c9b08 R15: 0000000000000000
FS:  00007f7b15178700(0000) GS:ffff880032e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000331b4f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdt_out00_000 (pid: 2967, threadinfo ffff880318484000, task ffff880318483540)
Stack:
 ffff880318485a60 ffffffffa06f6dd9 ffff8803267cb800 ffffc9002417a000
<d> ffff8803170fca80 ffff8803170fca80 ffff8803267cb800 ffffc90020095e80
<d> 0000000000000000 0000000000000000 ffff880318485b20 ffffffffa06f75c5
Call Trace:
 [<ffffffffa06f6dd9>] ? htable_lookup+0x119/0x1c0 [obdclass]
 [<ffffffffa06f75c5>] lu_object_find_at+0x205/0x360 [obdclass]
 [<ffffffffa0d39715>] osd_object_find+0x35/0x340 [osd_zfs]
 [<ffffffffa02e2afe>] ? sa_update+0x4e/0x60 [zfs]
 [<ffffffffa0d39be9>] osd_dir_insert+0x1c9/0x4f8 [osd_zfs]
 [<ffffffffa0d259ca>] ? osd_object_sa_update+0x4a/0xd0 [osd_zfs]
 [<ffffffffa0e9fa89>] out_tx_index_insert_exec+0x2b9/0x470 [mdt]
 [<ffffffffa0e9c7e1>] ? out_obj_ref_add+0x91/0x200 [mdt]
 [<ffffffffa0e996cd>] out_tx_end+0xbd/0x550 [mdt]
 [<ffffffffa0e9d8e0>] out_handle+0x720/0xad0 [mdt]
 [<ffffffffa0e5d078>] mdt_handle_common+0x648/0x1660 [mdt]
 [<ffffffffa0e991f5>] mdt_out_handle+0x15/0x20 [mdt]
 [<ffffffffa08861ac>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
 [<ffffffffa087d7e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffffa059b2c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffff81052223>] ? __wake_up+0x53/0x70
 [<ffffffffa08876f5>] ptlrpc_main+0xb75/0x1870 [ptlrpc]
 [<ffffffffa0886b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffffa0886b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
 [<ffffffffa0886b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: c0 31 f6 48 89 df 48 8b 42 10 ff 10 48 85 c0 49 89 c6 0f 84 d3 01 00 00 48 3d 00 f0 ff ff 0f 87 56 02 00 00 48 8b 00 49 8b 14 24 <48> 89 50 10 49 8b 54 24 08 48 89 50 18 49 8b 06 49 89 c5 48 89 
RIP  [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
 RSP <ffff880318485a10>
CR2: 0000000000000010
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu

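For triage, the key facts can be read off the oops text itself. The error code 0002 indicates a kernel-mode write to a not-present page, and the marked bytes in the Code line (`<48> 89 50 10`) appear to decode to a store of RDX at offset 0x10 from RAX. With RAX = 0, that is consistent with CR2 = 0x10. The following is a minimal Python sketch for pulling those fields out of a console log. The helper name `summarize_oops` and the shortened `OOPS` excerpt are illustrative assumptions and are not part of any Lustre tooling.

```python
import re

# Shortened excerpt of the MDS console log above (assumption: in practice
# the full console text would be passed in instead).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
Oops: 0002 [#1] SMP
RAX: 0000000000000000 RBX: ffff8803188f3600 RCX: 0000000000000000
Call Trace:
 [<ffffffffa06f6dd9>] ? htable_lookup+0x119/0x1c0 [obdclass]
 [<ffffffffa06f75c5>] lu_object_find_at+0x205/0x360 [obdclass]
 [<ffffffffa0d39715>] osd_object_find+0x35/0x340 [osd_zfs]
 [<ffffffffa0d39be9>] osd_dir_insert+0x1c9/0x4f8 [osd_zfs]
 [<ffffffffa0e9fa89>] out_tx_index_insert_exec+0x2b9/0x470 [mdt]
Code: 48 8b 00 49 8b 14 24 <48> 89 50 10 49 8b 54 24 08
"""

# One symbolized frame: "[<addr>] [? ]symbol+0xoff/0xsize [module]".
# The module suffix is optional because core-kernel frames have none.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_oops(text):
    """Extract fault address, access type, faulting symbol and reliable frames."""
    fault_addr = int(re.search(r"dereference at ([0-9a-f]+)", text).group(1), 16)
    error_code = int(re.search(r"Oops: ([0-9a-f]+)", text).group(1), 16)
    ip = FRAME.search(text.split("IP:", 1)[1])
    trace = text.split("Call Trace:", 1)[1].split("Code:", 1)[0]
    # Frames marked "?" are stack leftovers the unwinder could not confirm.
    frames = [(m.group(2), m.group(3)) for m in FRAME.finditer(trace)
              if not m.group(1)]
    return {
        "fault_addr": fault_addr,
        "is_write": bool(error_code & 0x2),      # x86 page-fault bit 1: write
        "page_present": bool(error_code & 0x1),  # bit 0: protection fault
        "user_mode": bool(error_code & 0x4),     # bit 2: fault from user mode
        "function": ip.group(2),
        "module": ip.group(3),
        "frames": frames,
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(key, value)
```

Run against either trace in this ticket, the sketch should report the same faulting symbol (lu_object_alloc in obdclass) reached through osd_dir_insert and out_tx_index_insert_exec, which makes it easier to confirm that two reports are the same crash.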

 Comments   
Comment by Sarah Liu [ 28/Mar/13 ]

sanity test 1a hit the same error:

Lustre: DEBUG MARKER: == sanity test 1a: mkdir .../d1; mkdir .../d1/d2 ======================= 18:08:57 (1364432937)
Lustre: cli-ctl-lustre-MDT0000-osp-MDT0001: Allocated super-sequence [0x00000003c0000400-0x0000000400000400):1:mdt]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
PGD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/possible
CPU 0 
Modules linked in: osp(U) lod(U) mdt(U) mdd(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb microcode serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma dca shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 3052, comm: mdt_out00_001 Tainted: P           ---------------    2.6.32-279.19.1.el6_lustre.x86_64 #1 Supermicro X8DTT/X8DTT
RIP: 0010:[<ffffffffa06f69f7>]  [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
RSP: 0018:ffff88031b3a7a10  EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff8803285c9540 RCX: 0000000000000000
RDX: 0000000380000400 RSI: ffffffffa0d4c9fb RDI: ffff880319544dc8
RBP: ffff88031b3a7a60 R08: ffff880319544d48 R09: 00000000000016a9
R10: 0000000000000001 R11: 5000000000000000 R12: ffffc900207ded10
R13: ffffc900207ded10 R14: ffff880319544d48 R15: 0000000000000000
FS:  00007f779d386700(0000) GS:ffff880032e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 00000003358f5000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdt_out00_001 (pid: 3052, threadinfo ffff88031b3a6000, task ffff88031a07cae0)
Stack:
 ffff88031b3a7a60 ffffffffa06f6dd9 ffff88031652f800 ffffc90024833000
<d> ffff88031a1d9540 ffff88031a1d9540 ffff88031652f800 ffffc900207ded10
<d> 0000000000000000 0000000000000000 ffff88031b3a7b20 ffffffffa06f75c5
Call Trace:
 [<ffffffffa06f6dd9>] ? htable_lookup+0x119/0x1c0 [obdclass]
 [<ffffffffa06f75c5>] lu_object_find_at+0x205/0x360 [obdclass]
 [<ffffffffa0d48715>] osd_object_find+0x35/0x340 [osd_zfs]
 [<ffffffffa02d8afe>] ? sa_update+0x4e/0x60 [zfs]
 [<ffffffffa0d48be9>] osd_dir_insert+0x1c9/0x4f8 [osd_zfs]
 [<ffffffffa0d349ca>] ? osd_object_sa_update+0x4a/0xd0 [osd_zfs]
 [<ffffffffa0eaea89>] out_tx_index_insert_exec+0x2b9/0x470 [mdt]
 [<ffffffffa0eab7e1>] ? out_obj_ref_add+0x91/0x200 [mdt]
 [<ffffffffa0ea86cd>] out_tx_end+0xbd/0x550 [mdt]
 [<ffffffffa0eac8e0>] out_handle+0x720/0xad0 [mdt]
 [<ffffffffa0e6c078>] mdt_handle_common+0x648/0x1660 [mdt]
 [<ffffffffa0ea81f5>] mdt_out_handle+0x15/0x20 [mdt]
 [<ffffffffa08861ac>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
 [<ffffffffa087d7e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffffa059b2c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffff81052223>] ? __wake_up+0x53/0x70
 [<ffffffffa08876f5>] ptlrpc_main+0xb75/0x1870 [ptlrpc]
 [<ffffffffa0886b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffffa0886b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
 [<ffffffffa0886b80>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: c0 31 f6 48 89 df 48 8b 42 10 ff 10 48 85 c0 49 89 c6 0f 84 d3 01 00 00 48 3d 00 f0 ff ff 0f 87 56 02 00 00 48 8b 00 49 8b 14 24 <48> 89 50 10 49 8b 54 24 08 48 89 50 18 49 8b 06 49 89 c5 48 89 
RIP  [<ffffffffa06f69f7>] lu_object_alloc+0x67/0x300 [obdclass]
 RSP <ffff88031b3a7a10>
CR2: 0000000000000010
Comment by Di Wang [ 28/Mar/13 ]

Ah, you need http://review.whamcloud.com/#change,4933 to land first; then run the DNE tests.

Comment by Peter Jones [ 28/Mar/13 ]

Believed to be fixed by the landing for LU-1187. Please reopen if this is not the case.

Generated at Sat Feb 10 01:30:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.