Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
None
-
3
Description
A few recent runs of Oleg's tests showed this trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000026
IP: [<ffffffffa033793d>] lu_object_find+0xd/0x20 [obdclass]
PGD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE) dm_flakey dm_mod crc_t10dif crct10dif_generic crct10dif_common rpcsec_gss_krb5 pcspkr squashfs i2c_piix4 i2c_core binfmt_misc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi serio_raw ata_piix libata
CPU: 0 PID: 8810 Comm: mdt_out00_001 Kdump: loaded Tainted: P OE ------------ 3.10.0-7.7-debug #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800a5f94780 ti: ffff8800aa2f0000 task.ti: ffff8800aa2f0000
RIP: 0010:[<ffffffffa033793d>] [<ffffffffa033793d>] lu_object_find+0xd/0x20 [obdclass]
RSP: 0018:ffff8800aa2f3b58 EFLAGS: 00010246
RAX: 0000000000000006 RBX: ffff8800a5f86448 RCX: 0000000000000000
RDX: ffff8800a5f86448 RSI: ffff8800c05cf000 RDI: ffff88009c38a400
RBP: ffff8800aa2f3b58 R08: ffff880106431000 R09: ffff8800b5ee8080
R10: ffff8800a5f86000 R11: ffff8800aa2f3876 R12: ffff88009c38a400
R13: ffff8800c05cf000 R14: ffff8800ac23fe70 R15: ffff88009c38a400
FS: 0000000000000000(0000) GS:ffff88011e200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000026 CR3: 00000000a2856000 CR4: 00000000000006f0
Call Trace:
[<ffffffffa0cd7e9b>] mdt_object_find+0x4b/0x170 [mdt]
[<ffffffffa0d10dc0>] mdt_lvbo_fill+0x530/0xa80 [mdt]
[<ffffffffa05f1f5d>] ldlm_handle_enqueue0+0x5cd/0x15f0 [ptlrpc]
[<ffffffffa061ba50>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[<ffffffffa067a292>] tgt_enqueue+0x62/0x210 [ptlrpc]
[<ffffffffa0682f55>] tgt_request_handle+0x965/0x1620 [ptlrpc]
[<ffffffffa020bdde>] ? libcfs_nid2str_r+0xfe/0x130 [lnet]
[<ffffffffa0625f60>] ptlrpc_server_handle_request+0x250/0xb10 [ptlrpc]
[<ffffffff810c6941>] ? __wake_up_common_lock+0x91/0xc0
[<ffffffff810c6250>] ? sched_feat_set+0xf0/0xf0
[<ffffffffa062a1c0>] ptlrpc_main+0xcb0/0x1cb0 [ptlrpc]
[<ffffffff810c665d>] ? finish_task_switch+0x5d/0x1b0
[<ffffffffa0629510>] ? ptlrpc_register_service+0xff0/0xff0 [ptlrpc]
[<ffffffff810b8254>] kthread+0xe4/0xf0
[<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
[<ffffffff817e5ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
The problem is related to wrongly initialized mdt_thread_info values, particularly mti_mdt. Interestingly, none of these values are actually needed in mdt_lvbo_fill; only a couple of fields are used there, as temporary storage for a FID and a data buffer.
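The pattern described above can be illustrated with a small, self-contained C sketch. It is not the Lustre code and not necessarily how the issue was fixed: every `demo_*` name below is a hypothetical stand-in (`dti_dev` plays the role of mti_mdt), and the struct layouts are invented for illustration. It shows a lookup that depends on a per-thread device pointer that may be unset, next to a fill routine that takes the device explicitly and uses the per-thread struct only as scratch storage for a FID and a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real Lustre definitions. */
struct demo_fid {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

struct demo_device {
    int dev_id;
};

struct demo_thread_info {
    struct demo_device *dti_dev;     /* plays the role of mti_mdt */
    struct demo_fid     dti_fid;     /* scratch FID */
    char                dti_buf[64]; /* scratch data buffer */
};

/*
 * Lookup that relies on the per-thread device pointer. If the thread
 * info was never set up for this code path, the pointer is NULL; this
 * sketch returns -1 instead of dereferencing it.
 */
int demo_lookup(const struct demo_thread_info *info)
{
    if (info->dti_dev == NULL)
        return -1;
    return info->dti_dev->dev_id;
}

/*
 * Fill routine that receives the device from its caller and touches
 * only the scratch FID and buffer, so it does not depend on the rest
 * of the thread info having been initialized.
 */
int demo_fill(struct demo_device *dev, struct demo_fid *fid,
              char *buf, size_t len, uint64_t seq)
{
    assert(dev != NULL && fid != NULL && buf != NULL);
    fid->seq = seq;
    fid->oid = 1;
    fid->ver = 0;
    memset(buf, 0, len);
    return dev->dev_id;
}
```

With a zeroed `demo_thread_info`, `demo_lookup` reports the missing device, while `demo_fill` still works because the caller supplies the device and only the scratch fields are borrowed.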