Details
Bug
Resolution: Duplicate
Critical
None
Lustre 2.15.0
None
3
9223372036854775807
Description
It looks like something we included in master-next in June (around the 8th) went largely unnoticed but has led to periodic kernel memory corruption in conf-sanity test 111. Since this was a while ago, the exact list of patches is not available for the hash d043fd32f4.
Either way, the corruption manifests as kernel crashes, most typically in netlink (this is the most frequent signature, with 10 hits since June):
[18817.912385] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
[18849.582422] LustreError: 15f-b: lustre-OST0000: cannot register this server with the MGS: rc = -110. Is the MGS running?
[18849.583805] LustreError: 22088:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start targets: -110
[18849.584966] LustreError: 22088:0:(obd_mount_server.c:1644:server_put_super()) no obd lustre-OST0000
[18849.586014] LustreError: 22088:0:(obd_mount_server.c:131:server_deregister_mount()) lustre-OST0000 not registered
[18850.106773] Lustre: server umount lustre-OST0000 complete
[18850.107377] LustreError: 22088:0:(super25.c:183:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -110
[18850.484936] BUG: unable to handle kernel paging request at fffffffffffffec0
[18850.485009] IP: [<ffffffff816b1930>] netlink_compare+0x10/0x30
[18850.485009] PGD 1c14067 PUD 1c16067 PMD 0
[18850.485009] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[18850.485009] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc32_generic crc_t10dif crct10dif_generic crct10dif_common pcspkr virtio_balloon virtio_console i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix drm_panel_orientation_quirks serio_raw virtio_blk floppy libata i2c_core [last unloaded: libcfs]
[18850.485009] CPU: 15 PID: 671 Comm: systemd-udevd Kdump: loaded Tainted: P B OE ------------ 3.10.0-7.9-debug #2
[18850.485009] Hardware name: Red Hat KVM, BIOS 1.15.0-1.module_el8.6.0+1087+b42c8331 04/01/2014
[18850.485009] task: ffff88031a525c40 ti: ffff880319910000 task.ti: ffff880319910000
[18850.485009] RIP: 0010:[<ffffffff816b1930>] [<ffffffff816b1930>] netlink_compare+0x10/0x30
[18850.485009] RSP: 0018:ffff880319913d80 EFLAGS: 00010283
[18850.485009] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffaca7
[18850.485009] RDX: ffff880319913d90 RSI: fffffffffffffb10 RDI: ffff880319913da0
[18850.485009] RBP: ffff880319913d80 R08: 0000000000000010 R09: 0000000000000000
[18850.485009] R10: 000055d8d713a010 R11: 0000000000000246 R12: ffff8803293c4ff0
[18850.485009] R13: fffffffffffffb10 R14: ffff8802a30d2b18 R15: 00000000000004f0
[18850.485009] FS: 00007f3c467018c0(0000) GS:ffff880331dc0000(0000) knlGS:0000000000000000
[18850.485009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18850.485009] CR2: fffffffffffffec0 CR3: 000000031a0d2000 CR4: 00000000000007e0
[18850.485009] Call Trace:
[18850.485009] [<ffffffff816b3cf7>] __netlink_lookup+0xe7/0x160
[18850.485009] [<ffffffff816b4097>] netlink_autobind.isra.34+0xc7/0x100
[18850.485009] [<ffffffff816b4af1>] netlink_bind+0x1b1/0x260
[18850.485009] [<ffffffff81656890>] SYSC_bind+0xe0/0x120
[18850.485009] [<ffffffff81654330>] ? sock_alloc_file+0xa0/0x140
[18850.485009] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
[18850.485009] [<ffffffff817edf49>] ? system_call_after_swapgs+0x96/0x13a
[18850.485009] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
[18850.485009] [<ffffffff817edf49>] ? system_call_after_swapgs+0x96/0x13a
[18850.485009] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
[18850.485009] [<ffffffff817edf49>] ? system_call_after_swapgs+0x96/0x13a
[18850.485009] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
[18850.485009] [<ffffffff817edf49>] ? system_call_after_swapgs+0x96/0x13a
[18850.485009] [<ffffffff8165863e>] SyS_bind+0xe/0x10
[18850.485009] [<ffffffff817ee00c>] system_call_fastpath+0x1f/0x24
[18850.485009] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
[18850.485009] Code: 00 69 00 00 00 ff 93 58 03 00 00 eb e3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 8b 57 08 b8 01 00 00 00 48 89 e5 8b 4a 08 <39> 8e b0 03 00 00 74 08 5d c3 66 0f 1f 44 00 00 48 8b 46 30 48
[18850.485009] RIP [<ffffffff816b1930>] netlink_compare+0x10/0x30
[18850.485009] RSP <ffff880319913d80>
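As a sanity check on the netlink oops above, the faulting address can be reproduced from the register dump and the instruction bytes at RIP. This is a minimal sketch assuming standard x86-64 encoding (opcode 0x39 is CMP r/m32, r32; ModRM 0x8e selects a 32-bit displacement off RSI with ECX as the other operand). The variable and helper names are mine, not anything from the Lustre or kernel sources.

```python
MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value

# Values copied from the oops: RSI, CR2, and the bytes starting at <39>.
rsi = 0xfffffffffffffb10
cr2 = 0xfffffffffffffec0
code_at_rip = bytes.fromhex("398eb0030000")

# 0x39 = CMP r/m32, r32; ModRM 0x8e = mod 10 (disp32), reg ecx, rm rsi.
assert code_at_rip[0] == 0x39 and code_at_rip[1] == 0x8e

# The displacement is a little-endian 32-bit value following the ModRM byte.
disp = int.from_bytes(code_at_rip[2:6], "little")

# Effective address of the memory operand, wrapped to 64 bits.
fault = (rsi + disp) & MASK64

print(hex(disp), hex(fault), fault == cr2, to_signed64(rsi))
```

The computed address matches CR2, and RSI read as a signed value is -0x4f0, the same magnitude that appears in R15. That would be consistent with a member offset subtracted from a NULL pointer rather than a random wild pointer, though this is my reading of the dump and not something the report confirms.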
Or in a deferred work handler:
[322679.293627] Lustre: Unmounted lustre-client
[322679.307406] LustreError: 16865:0:(super25.c:181:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -5
[322686.752896] LustreError: 16926:0:(mgc_request.c:253:do_config_log_add()) MGC192.168.123.21@tcp: failed processing log, type 1: rc = -5
[322694.649884] LustreError: 16970:0:(mgc_request.c:611:do_requeue()) failed processing log: -5
[322697.762917] LustreError: 15c-8: MGC192.168.123.21@tcp: Confguration from log lustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[322697.764854] Lustre: Unmounted lustre-client
[322697.780053] LustreError: 16926:0:(super25.c:181:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -5
[322704.234134] LustreError: 16988:0:(mgc_request.c:253:do_config_log_add()) MGC192.168.123.21@tcp: failed processing log, type 1: rc = -5
[322710.691969] LustreError: 17033:0:(mgc_request.c:611:do_requeue()) failed processing log: -5
[322715.247941] LustreError: 15c-8: MGC192.168.123.21@tcp: Confguration from log lustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[322715.250428] Lustre: Unmounted lustre-client
[322715.300754] LustreError: 16988:0:(super25.c:181:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -5
[322715.778963] Lustre: DEBUG MARKER: conf-sanity test_111: @@@@@@ FAIL: cannot create /mnt/lustre/d111.conf-sanity
[322719.941431] Lustre: DEBUG MARKER: centos-19.localnet: executing set_hostid
[322720.746449] BUG: unable to handle kernel NULL pointer dereference at (null)
[322720.747055] IP: [<ffffffff8140ee21>] rht_deferred_worker+0x201/0x430
[322720.747055] PGD 0
[322720.747055] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[322720.747055] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc32_generic crc_t10dif crct10dif_generic crct10dif_common virtio_balloon virtio_console pcspkr i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix drm_panel_orientation_quirks libata serio_raw virtio_blk i2c_core floppy [last unloaded: libcfs]
[322720.747055] CPU: 4 PID: 557 Comm: kworker/4:1 Kdump: loaded Tainted: P B OE ------------ 3.10.0-7.9-debug #2
[322720.747055] Hardware name: Red Hat KVM, BIOS 1.15.0-1.module_el8.6.0+1087+b42c8331 04/01/2014
[322720.747055] Workqueue: events rht_deferred_worker
[322720.747055] task: ffff8800b95b0010 ti: ffff8802f5890000 task.ti: ffff8802f5890000
[322720.747055] RIP: 0010:[<ffffffff8140ee21>] [<ffffffff8140ee21>] rht_deferred_worker+0x201/0x430
[322720.747055] RSP: 0018:ffff8802f5893d80 EFLAGS: 00010246
[322720.747055] RAX: 0000000000000000 RBX: ffff8800ad12d2d8 RCX: 0000000000000000
[322720.747055] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800ad12ab60
[322720.747055] RBP: ffff8802f5893df8 R08: ffffffff8140e330 R09: 0000000000000118
[322720.747055] R10: ffff8802f5893cb0 R11: 00000000121284be R12: 0000000000000023
[322720.747055] R13: ffff8800a147d508 R14: ffff8800ad12d3f0 R15: ffff8803293c5038
[322720.747055] FS: 0000000000000000(0000) GS:ffff880331b00000(0000) knlGS:0000000000000000
[322720.747055] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[322720.747055] CR2: 0000000000000000 CR3: 0000000319af2000 CR4: 00000000000007e0
[322720.747055] Call Trace:
[322720.747055] [<ffffffff810b243d>] process_one_work+0x18d/0x4a0
[322720.790019] [<ffffffff810b3176>] worker_thread+0x126/0x3b0
[322720.790019] [<ffffffff810b3050>] ? manage_workers.isra.23+0x2a0/0x2a0
[322720.790019] [<ffffffff810ba114>] kthread+0xe4/0xf0
[322720.790019] [<ffffffff810ba030>] ? kthread_create_on_node+0x140/0x140
[322720.790019] [<ffffffff817ede5d>] ret_from_fork_nospec_begin+0x7/0x21
[322720.790019] [<ffffffff810ba030>] ? kthread_create_on_node+0x140/0x140
[322720.790019] Code: 00 4e 8d 34 0b 41 8b 4d 04 85 c9 0f 85 eb 00 00 00 8b 53 04 85 d2 0f 85 23 02 00 00 4a 8d 04 0b 48 8b 00 a8 01 0f 85 18 01 00 00 <48> 8b 18 f6 c3 01 0f 85 01 01 00 00 49 89 c6 eb 0c 66 0f 1f 44
[322720.790019] RIP [<ffffffff8140ee21>] rht_deferred_worker+0x201/0x430
[322720.790019] RSP <ffff8802f5893d80>
[322720.790019] CR2: 0000000000000000