Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.13.0, Lustre 2.12.3
-
None
-
3
Description
It looks like something broke in master/b2_12 relatively recently.
Typical crash:
[14473.601088] Lustre: DEBUG MARKER: == sanity test complete, duration 5433 sec =========================================================== 03:14:06 (1567926846)
[14492.366277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000c80
[14492.370164] IP: [<ffffffffa0b8e854>] osd_object_delete+0x1f4/0x2a0 [osd_ldiskfs]
[14492.372007] PGD 0
[14492.372787] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[14492.373666] Modules linked in: dm_flakey dm_mod lustre(OE) mdt(OE) mdd(OE) mdc(OE) obdecho(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common virtio_console pcspkr i2c_piix4 virtio_balloon binfmt_misc ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix drm_panel_orientation_quirks libata floppy virtio_blk serio_raw i2c_core [last unloaded: mdt]
[14492.388104] CPU: 6 PID: 8302 Comm: ldlm_cn03_003 Kdump: loaded Tainted: P W OE ------------ 3.10.0-7.6-debug #2
[14492.389885] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[14492.390901] task: ffff88003c6209c0 ti: ffff88004139c000 task.ti: ffff88004139c000
[14492.392652] RIP: 0010:[<ffffffffa0b8e854>] [<ffffffffa0b8e854>] osd_object_delete+0x1f4/0x2a0 [osd_ldiskfs]
[14492.394473] RSP: 0018:ffff88004139fa30 EFLAGS: 00010246
[14492.395341] RAX: 0000000000000000 RBX: 0000000000000c80 RCX: 0000000000000000
[14492.396537] RDX: ffff88006430ae00 RSI: ffffffffa0be0160 RDI: ffff880082322b40
[14492.397291] RBP: ffff88004139fa60 R08: 0000000000000000 R09: d8c8000000000000
[14492.398095] R10: ffff8800bb75e000 R11: ffff8800bb75e7c8 R12: 0000000000000000
[14492.398993] R13: ffff880115978e00 R14: ffff880082322b40 R15: 0000000000000000
[14492.399675] FS: 0000000000000000(0000) GS:ffff880139780000(0000) knlGS:0000000000000000
[14492.401123] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14492.402167] CR2: 0000000000000c80 CR3: 0000000130b34000 CR4: 00000000000006e0
[14492.422237] Call Trace:
[14492.423115] [<ffffffffa036d0a5>] lu_object_free.isra.31+0x65/0x170 [obdclass]
[14492.424986] [<ffffffffa0370e42>] lu_object_put+0xc2/0x3c0 [obdclass]
[14492.426016] [<ffffffffa0d7fea0>] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt]
[14492.427067] [<ffffffffa0d7ff51>] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt]
[14492.428115] [<ffffffffa05f25c6>] ldlm_work_cp_ast_lock+0xa6/0x1d0 [ptlrpc]
[14492.429196] [<ffffffffa06393b0>] ptlrpc_set_wait+0x70/0x790 [ptlrpc]
[14492.430253] [<ffffffffa062fe6d>] ? ptlrpc_prep_set+0x5d/0x290 [ptlrpc]
[14492.431457] [<ffffffffa0350279>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[14492.432481] [<ffffffff810b5a70>] ? __init_waitqueue_head+0x20/0x30
[14492.433523] [<ffffffffa062ff07>] ? ptlrpc_prep_set+0xf7/0x290 [ptlrpc]
[14492.434526] [<ffffffffa05f7e15>] ldlm_run_ast_work+0xd5/0x380 [ptlrpc]
[14492.435592] [<ffffffffa05f927f>] __ldlm_reprocess_all+0xff/0x340 [ptlrpc]
[14492.436653] [<ffffffffa05f94d0>] ldlm_reprocess_all+0x10/0x20 [ptlrpc]
[14492.437708] [<ffffffffa06219b4>] ldlm_handle_convert0+0x2f4/0x450 [ptlrpc]
[14492.438727] [<ffffffffa0621ffb>] ldlm_cancel_handler+0x29b/0x590 [ptlrpc]
[14492.439812] [<ffffffffa06524b6>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
[14492.441715] [<ffffffffa06564a1>] ptlrpc_main+0xb91/0x2110 [ptlrpc]
[14492.442734] [<ffffffff810c32ed>] ? finish_task_switch+0x5d/0x1b0
[14492.443744] [<ffffffff817b6cd0>] ? __schedule+0x410/0xa00
[14492.445222] [<ffffffffa0655910>] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc]
[14492.447031] [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[14492.447974] [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[14492.448960] [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
[14492.449957] [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[14492.450934] Code: e0 03 00 0f 1f 40 00 4d 85 ed 74 c0 4c 89 f7 48 c7 c6 60 01 be a0 e8 cc e7 7d ff 49 89 c4 44 89 f8 31 c9 49 8d 9c 24 80 0c 00 00 <49> 89 84 24 80 0c 00 00 4c 89 ee 4c 89 f7 48 89 da e8 86 16 f0
The crash is always in the DoM discard path (ldlm_dom_discard_cp_ast).
First master-next occurrence is in githash f8c100f.
List of patches:
f8c100f LU-12575 build: add ibutils2 for MOFED build
0ecb29f LU-12560 tests: Use full path for test-groups
4f4f90b LU-12400 ptlrpc: Sun RPC changes for RCU locking
5a19817 LU-12527 utils: Make lustre_user.h c++-legal
f0ba5de LU-12472 tests: update sanity-krb5.sh
adfa543 LU-4315 doc: split lctl get_param and set_param man pages
a57ede6 LU-12355 ldiskfs: Remove old map blocks support
87f6b68 LU-12405 lnet: Oracle OFED extensions default to on
04cdd15 LU-12343 osc: Fix dom handling in weight_ast
3d1920a LU-8066 mdt: migrate procfs files to sysfs
bc02a4e LU-12075 mdt: commit migrate transaction with locks held
27cd9fd LU-10070 test: llapi_layout_test enhancements
2490ed4 LU-11617 mdc: fix possible deadlock in chlg_open()
860dbcb LU-12559 ptlrpc: Hold imp lock for idle reconnect
ce3ccbd LU-6142 tests: Fix style issues for write_disjoint.c
6012e3e LU-6142 tests: Fix style issues for write_append_truncate.c
16792c9 LU-6142 tests: Fix style issues for lp_utils.c
ac153a9 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
b598d82 LU-12523 ptlrpc: Don't get jobid in body_v2
b2f2bfc LU-6202 utils: remove obsolete l_ioctl2() wrapper
42fdd2f LU-12440 lnet: Misleading error from lnet_is_health_check
aec7b1a LU-12439 lnet: Convert noisy timeout error to cdebug
93419c4 LU-11023 quota: remove quota pool ID
First b2_12-next occurrence githash: 8ec5896
List of patches:
3674d393d5 LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7]
fe03ca414f LU-11761 fld: let's caller to retry FLD_QUERY
316310cbb4 LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning
0edb0a6951 LU-8130 libcfs: don't include rhashtable if unavailable
9ac11632fb LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1]
3a35d97aee LU-12539 build: pass --with-o2ib when building deb packages
e2ac8c3269 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
cbb6d8c8ef LU-12586 lov: Correct write_intent end for trunc
a1e888dcbc LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions
bbf40d8c71 LU-11537 osp: avoid nested transaction
61c93c46c4 LU-12343 osc: Fix dom handling in weight_ast
I guess LU-12343 is the common factor here?
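The guess can be checked mechanically by intersecting the ticket numbers in the two patch lists. A minimal sketch in Python, assuming the lists are pasted in as plain text (the variable and function names here are illustrative, not part of any Lustre tooling):

```python
import re

# Patch lists copied from the master-next and b2_12-next landings above.
MASTER_NEXT = """
f8c100f LU-12575 build: add ibutils2 for MOFED build
0ecb29f LU-12560 tests: Use full path for test-groups
4f4f90b LU-12400 ptlrpc: Sun RPC changes for RCU locking
5a19817 LU-12527 utils: Make lustre_user.h c++-legal
f0ba5de LU-12472 tests: update sanity-krb5.sh
adfa543 LU-4315 doc: split lctl get_param and set_param man pages
a57ede6 LU-12355 ldiskfs: Remove old map blocks support
87f6b68 LU-12405 lnet: Oracle OFED extensions default to on
04cdd15 LU-12343 osc: Fix dom handling in weight_ast
3d1920a LU-8066 mdt: migrate procfs files to sysfs
bc02a4e LU-12075 mdt: commit migrate transaction with locks held
27cd9fd LU-10070 test: llapi_layout_test enhancements
2490ed4 LU-11617 mdc: fix possible deadlock in chlg_open()
860dbcb LU-12559 ptlrpc: Hold imp lock for idle reconnect
ce3ccbd LU-6142 tests: Fix style issues for write_disjoint.c
6012e3e LU-6142 tests: Fix style issues for write_append_truncate.c
16792c9 LU-6142 tests: Fix style issues for lp_utils.c
ac153a9 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
b598d82 LU-12523 ptlrpc: Don't get jobid in body_v2
b2f2bfc LU-6202 utils: remove obsolete l_ioctl2() wrapper
42fdd2f LU-12440 lnet: Misleading error from lnet_is_health_check
aec7b1a LU-12439 lnet: Convert noisy timeout error to cdebug
93419c4 LU-11023 quota: remove quota pool ID
"""

B2_12_NEXT = """
3674d393d5 LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7]
fe03ca414f LU-11761 fld: let's caller to retry FLD_QUERY
316310cbb4 LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning
0edb0a6951 LU-8130 libcfs: don't include rhashtable if unavailable
9ac11632fb LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1]
3a35d97aee LU-12539 build: pass --with-o2ib when building deb packages
e2ac8c3269 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
cbb6d8c8ef LU-12586 lov: Correct write_intent end for trunc
a1e888dcbc LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions
bbf40d8c71 LU-11537 osp: avoid nested transaction
61c93c46c4 LU-12343 osc: Fix dom handling in weight_ast
"""


def tickets(patch_list):
    """Return the set of LU-nnnn ticket IDs mentioned in a patch list."""
    return set(re.findall(r"LU-\d+", patch_list))


# Tickets that landed in both branches just before the first crash.
common = sorted(tickets(MASTER_NEXT) & tickets(B2_12_NEXT))
print(common)  # ['LU-10094', 'LU-12343']
```

Two tickets are shared, LU-10094 and LU-12343. Going only by the patch subjects, LU-12343 is the one that touches DoM handling, which would fit a crash in the DoM discard path, but this is a correlation between landings and not a confirmed root cause.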