Details
-
Bug
-
Resolution: Duplicate
-
Major
-
Lustre 2.11.0
Description
Seems to be introduced by LU-9727 patch https://review.whamcloud.com/28114
[75525.109249] Lustre: DEBUG MARKER: == sanity test 232a: failed lock should not block umount ============================================= 21:15:03 (1514081703)
[75525.200932] Lustre: *** cfs_fail_loc=31c, val=0***
[75525.201581] LustreError: 11-0: lustre-OST0000-osc-ffff88029158c800: operation ldlm_enqueue to node 0@lo failed: rc = -12
[75526.044646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[75526.046450] IP: [<ffffffffa124d9ba>] mdd_changelog_data_store_by_fid+0xfa/0x1c0 [mdd]
[75526.047346] PGD 0
[75526.047757] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[75526.048211] Modules linked in: brd lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea ata_generic sysfillrect pata_acpi sysimgblt ttm drm_kms_helper ata_piix i2c_piix4 drm virtio_balloon virtio_console pcspkr i2c_core libata serio_raw virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[75526.053084] CPU: 0 PID: 19445 Comm: mdt00_001 Tainted: P OE ------------ 3.10.0-debug #2
[75526.053959] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[75526.054557] task: ffff8800a0182600 ti: ffff8802d4fc8000 task.ti: ffff8802d4fc8000
[75526.055830] RIP: 0010:[<ffffffffa124d9ba>] [<ffffffffa124d9ba>] mdd_changelog_data_store_by_fid+0xfa/0x1c0 [mdd]
[75526.057220] RSP: 0018:ffff8802d4fcbaa0 EFLAGS: 00010246
[75526.057688] RAX: 0000000000000040 RBX: ffff8802d4fcbbf0 RCX: 0000000000000060
[75526.058159] RDX: 0000000000000042 RSI: 0000000000000001 RDI: ffff8802ac9b9f90
[75526.058727] RBP: ffff8802d4fcbae8 R08: ffff88024fe15dc8 R09: ffff8800bb225480
[75526.059197] R10: 0000000000009042 R11: 0000000000000000 R12: ffff8802ac9b9f80
[75526.061825] R13: 0000000000000000 R14: ffff8802ac9b9f90 R15: 0000000000000000
[75526.062445] FS: 0000000000000000(0000) GS:ffff88033e400000(0000) knlGS:0000000000000000
[75526.063454] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75526.063987] CR2: 0000000000000018 CR3: 00000002a55f6000 CR4: 00000000000006f0
[75526.064553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75526.065395] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75526.066085] Stack:
[75526.066502] ffff8800bb225480 0000000b00000000 ffff88025241fc00 0000007000009042
[75526.067576] ffff880304158fa0 ffff8802d4fcbbf0 ffff8800bb225480 0000000000000000
[75526.068786] 0000000000000000 ffff8802d4fcbb08 ffffffffa124ea80 ffff880304158fa0
[75526.069698] Call Trace:
[75526.070168] [<ffffffffa124ea80>] mdd_changelog_data_store+0xf0/0x220 [mdd]
[75526.070687] [<ffffffffa124f95b>] mdd_close+0x25b/0xcf0 [mdd]
[75526.071207] [<ffffffffa12c1b58>] mdt_mfd_close+0x478/0x730 [mdt]
[75526.071709] [<ffffffffa12904a1>] mdt_obd_disconnect+0x371/0x680 [mdt]
[75526.072335] [<ffffffffa05c024f>] target_handle_disconnect+0x13f/0x4c0 [ptlrpc]
[75526.073287] [<ffffffffa065c817>] tgt_disconnect+0x37/0x140 [ptlrpc]
[75526.073869] [<ffffffffa06651ab>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
[75526.074395] [<ffffffffa060a141>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
[75526.075333] [<ffffffffa060def8>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
[75526.075895] [<ffffffffa060d4a0>] ? ptlrpc_register_service+0xeb0/0xeb0 [ptlrpc]
[75526.076806] [<ffffffff810a2eba>] kthread+0xea/0xf0
[75526.077250] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[75526.077772] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[75526.078239] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[75526.078744] Code: 56 08 4d 8d 74 24 10 49 89 44 24 30 31 c0 45 85 ff 49 89 54 24 38 66 41 89 44 24 10 75 53 be 01 00 00 00 4c 89 f7 e8 76 23 ff ff <41> 8b 55 18 41 8b 75 14 4c 89 f7 e8 a6 23 ff ff 48 8b 0c 24 48
[75526.080697] RIP [<ffffffffa124d9ba>] mdd_changelog_data_store_by_fid+0xfa/0x1c0 [mdd]
[75526.081632] RSP <ffff8802d4fcbaa0>
[75526.082083] CR2: 0000000000000018
(gdb) l *(mdd_changelog_data_store_by_fid+0xfa)
0x219ea is in mdd_changelog_data_store_by_fid (/home/green/git/lustre-release/lustre/mdd/mdd_object.c:675).
670             mdd_changelog_rec_ext_jobid(&rec->cr, uc->uc_jobid);
671
672     if (flags & CLF_EXTRA_FLAGS) {
673             mdd_changelog_rec_ext_extra_flags(&rec->cr, xflags);
674             if (xflags & CLFE_UIDGID)
675                     mdd_changelog_rec_extra_uidgid(&rec->cr,
676                                                    uc->uc_uid, uc->uc_gid);
677     }
678
679     rc = mdd_changelog_store(env, mdd, rec, handle);
(gdb) quit
If we look at this function, we can see this bit of code at the start:
        int xflags = CLFE_INVALID;
        ...
        flags = (flags & CLF_FLAGMASK) | CLF_VERSION | CLF_EXTRA_FLAGS;
        if (uc != NULL && uc->uc_jobid[0] != '\0')
                flags |= CLF_JOBID;
        xflags |= CLFE_UIDGID;
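The faulting instruction reads from offset 0x18 off R13, and R13 is 0 (hence CR2 = 0x18), so uc appears to be NULL on this disconnect path while CLFE_UIDGID is set unconditionally. It looks like we really need to move that xflags assignment under the if case?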
Also, should we really be ORing the new flags onto the invalid bit, or should that just become a proper assignment?
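
To make the two questions above concrete, here is a minimal standalone sketch of one possible shape for the change. It is not the actual patch: struct uc_stub, struct rec_stub, choose_xflags() and fill_record() are hypothetical stand-ins, and the numeric values given to CLFE_INVALID and CLFE_UIDGID are placeholders rather than the real Lustre definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration only; the real definitions live in
 * the Lustre headers and may differ. */
#define CLFE_INVALID 0x80000000u
#define CLFE_UIDGID  0x00000001u

/* Hypothetical stand-in for the user credentials (uc) in the report. */
struct uc_stub {
	uint32_t uc_uid;
	uint32_t uc_gid;
};

/* Hypothetical stand-in for the uid/gid part of a changelog record. */
struct rec_stub {
	uint32_t xflags;
	uint32_t uid;
	uint32_t gid;
};

/* Start from a plain assignment (no CLFE_INVALID bit) and request the
 * uid/gid extension only when credentials are actually available. */
static uint32_t choose_xflags(const struct uc_stub *uc)
{
	uint32_t xflags = 0;

	if (uc != NULL)
		xflags |= CLFE_UIDGID;
	return xflags;
}

/* Mirrors the shape of the crashing code: uc is dereferenced only when
 * CLFE_UIDGID is set, so a NULL uc is never touched. */
static void fill_record(struct rec_stub *rec, const struct uc_stub *uc)
{
	rec->xflags = choose_xflags(uc);
	rec->uid = 0;
	rec->gid = 0;
	if (rec->xflags & CLFE_UIDGID) {
		assert(uc != NULL);
		rec->uid = uc->uc_uid;
		rec->gid = uc->uc_gid;
	}
}
```

With this shape, calling fill_record() with a NULL uc leaves xflags at 0 and skips the uid/gid copy, while a non-NULL uc sets CLFE_UIDGID and copies both fields. Whether a record without uid/gid is acceptable on the disconnect path is a separate question that the sketch does not answer.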