Details
Type: Bug
Resolution: Fixed
Priority: Critical
Lustre 2.16.0
None
3
Description
After racer gained additional migration and mirroring tests in patches https://review.whamcloud.com/13669 and https://review.whamcloud.com/41368 (it is not clear which one exposes the problem), a frequent crash started to appear:
[ 6876.856207] BUG: unable to handle kernel NULL pointer dereference at 0000000000000006
[ 6876.857009] IP: [<ffffffffa0f50aa6>] lod_declare_update_extents.isra.57+0x86/0x1910 [lod]
[ 6876.857009] PGD 2aab7d067 PUD 322652067 PMD 0
[ 6876.857009] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 6876.857009] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc32_generic crc_t10dif crct10dif_generic crct10dif_common virtio_console virtio_balloon pcspkr i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix drm_panel_orientation_quirks serio_raw virtio_blk libata i2c_core floppy [last unloaded: libcfs]
[ 6876.866880] CPU: 11 PID: 5226 Comm: mdt05_000 Kdump: loaded Tainted: P OE ------------ 3.10.0-7.9-debug #2
[ 6876.866880] Hardware name: Red Hat KVM, BIOS 1.16.0-3.module_el8.7.0+1218+f626c2ff 04/01/2014
[ 6876.866880] task: ffff8800a5780010 ti: ffff88028bde8000 task.ti: ffff88028bde8000
[ 6876.866880] RIP: 0010:[<ffffffffa0f50aa6>] [<ffffffffa0f50aa6>] lod_declare_update_extents.isra.57+0x86/0x1910 [lod]
[ 6876.866880] RSP: 0018:ffff88028bdeb738 EFLAGS: 00010246
[ 6876.866880] RAX: 0000000000000000 RBX: ffff8802a92d9b40 RCX: ffff8802a28d6378
[ 6876.866880] RDX: ffff880079bc3ef8 RSI: ffffffffa0f812e0 RDI: ffff880323d607d8
[ 6876.866880] RBP: ffff88028bdeb830 R08: 0000000000000000 R09: 0000000000000000
[ 6876.866880] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88026c882e98
[ 6876.866880] R13: ffff8800a29c0ff8 R14: ffff880323d607d8 R15: ffff8802a92d9b40
[ 6876.866880] FS: 0000000000000000(0000) GS:ffff880331cc0000(0000) knlGS:0000000000000000
[ 6876.866880] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6876.866880] CR2: 0000000000000006 CR3: 00000002ad280000 CR4: 00000000000007e0
[ 6876.866880] Call Trace:
[ 6876.882318] [<ffffffffa0c0a1d0>] ? osd_trans_create+0xa0/0x690 [osd_ldiskfs]
[ 6876.882318] [<ffffffffa03ecf88>] ? lu_buf_alloc+0x58/0x320 [obdclass]
[ 6876.882318] [<ffffffffa0c0a1d0>] ? osd_trans_create+0xa0/0x690 [osd_ldiskfs]
[ 6876.882318] [<ffffffffa0f34098>] ? lod_striping_load+0x98/0x6c0 [lod]
[ 6876.884562] [<ffffffffa0f592aa>] lod_declare_update_plain+0x56a/0x980 [lod]
[ 6876.884562] [<ffffffff8122054f>] ? __kmalloc+0x1ef/0x370
[ 6876.884562] [<ffffffffa0f5b83e>] lod_declare_layout_change+0x69e/0xba0 [lod]
[ 6876.884562] [<ffffffffa03ecfa1>] ? lu_buf_alloc+0x71/0x320 [obdclass]
[ 6876.884562] [<ffffffffa0dd339b>] mdd_declare_layout_change+0x4b/0x100 [mdd]
[ 6876.884562] [<ffffffffa0dde491>] mdd_layout_change+0xd91/0x1bc0 [mdd]
[ 6876.884562] [<ffffffffa0e3db3f>] mdt_layout_change+0x2bf/0x450 [mdt]
[ 6876.884562] [<ffffffffa0e444f0>] mdt_intent_layout+0x910/0xeb0 [mdt]
[ 6876.884562] [<ffffffffa0e3b4fc>] mdt_intent_opc+0x1dc/0xc40 [mdt]
[ 6876.884562] [<ffffffffa0e43be0>] ? mdt_intent_open+0x480/0x480 [mdt]
[ 6876.884562] [<ffffffffa0e4134a>] mdt_intent_policy+0xfa/0x460 [mdt]
[ 6876.884562] [<ffffffffa06941b1>] ldlm_lock_enqueue+0x3b1/0xbb0 [ptlrpc]
[ 6876.884562] [<ffffffffa01c0f05>] ? cfs_hash_rw_unlock+0x15/0x20 [libcfs]
[ 6876.884562] [<ffffffffa01c4186>] ? cfs_hash_add+0xa6/0x180 [libcfs]
[ 6876.884562] [<ffffffffa06bc4e5>] ldlm_handle_enqueue+0x375/0x17d0 [ptlrpc]
[ 6876.884562] [<ffffffffa063ca00>] ? lustre_msg_buf_v2+0x1e0/0x1f0 [ptlrpc]
[ 6876.884562] [<ffffffffa06fe958>] tgt_enqueue+0x68/0x240 [ptlrpc]
[ 6876.894516] [<ffffffffa0708f8e>] tgt_request_handle+0x88e/0x19b0 [ptlrpc]
[ 6876.894784] [<ffffffffa064ebb1>] ptlrpc_server_handle_request+0x251/0xc00 [ptlrpc]
[ 6876.894784] [<ffffffffa06508c6>] ptlrpc_main+0xc66/0x1670 [ptlrpc]
[ 6876.894784] [<ffffffff810dbb51>] ? put_prev_entity+0x31/0x400
[ 6876.894784] [<ffffffff814119f9>] ? do_raw_spin_unlock+0x49/0x90
[ 6876.894784] [<ffffffffa064fc60>] ? ptlrpc_wait_event+0x620/0x620 [ptlrpc]
[ 6876.894784] [<ffffffff810ba114>] kthread+0xe4/0xf0
[ 6876.894784] [<ffffffff810ba030>] ? kthread_create_on_node+0x140/0x140
[ 6876.894784] [<ffffffff817ede5d>] ret_from_fork_nospec_begin+0x7/0x21
[ 6876.894784] [<ffffffff810ba030>] ? kthread_create_on_node+0x140/0x140
[ 6876.894784] Code: 28 ff 01 74 0d f6 05 c5 2a 28 ff 04 0f 85 92 07 00 00 48 63 45 b8 48 8d 04 80 48 01 c0 48 89 85 68 ff ff ff 49 03 87 98 00 00 00 <0f> b7 58 06 0f b7 40 08 83 c0 01 41 f6 87 a0 00 00 00 1e 44 0f
[ 6876.901347] RIP [<ffffffffa0f50aa6>] lod_declare_update_extents.isra.57+0x86/0x1910 [lod]
[ 6876.901347] RSP <ffff88028bdeb738>
[ 6876.901347] CR2: 0000000000000006
Crash reports, including crash dumps, are available here:
http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-50-2023-08-22-10:12:59/
Issue Links
- is duplicated by
  - LU-17066 lod_xattr_set()) ASSERTION( (!!(!strcmp(name, "lustre.""lov") || !strcmp(name, "trusted.lov")) == !!(!lod_dt_obj(dt)->ldo_comp_cached)) ) failed (Resolved)
- is related to
  - LU-14988 crash in ll_migrate in racer (Open)
  - LU-18466 code glitch in MDD transaction retry (Open)
- is related to
  - LU-14621 Broken lock-transaction ordering in MDS code (Resolved)