Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
Lustre 2.12.8, Lustre 2.15.0
-
None
-
3
-
9223372036854775807
Description
This is a frequent fixture in boilpot testing:
[561965.306321] Lustre: DEBUG MARKER: == conf-sanity test 90c: check max_mod_rpcs_in_flight update limits ================================== 13:46:20 (1643395580) [561968.515428] Lustre: Lustre: Build Version: 2.12.8_17_ge72a30b [561969.230746] LNet: Added LNI 192.168.123.156@tcp [8/512/0/180] [561969.231606] LNet: Accept secure, port 988 [561971.496242] Lustre: 23709:0:(gss_svc_upcall.c:1149:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active [561971.497876] Key type lgssc registered [561973.046482] Lustre: Echo OBD driver; http://www.lustre.org/ [561977.656992] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc [561977.995492] Lustre: MGS: Connection restored to MGC192.168.123.156@tcp_0 (at 0@lo) [561978.674547] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 [561980.637597] Lustre: DEBUG MARKER: centos-154.localnet: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid [561981.944367] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5 [561981.948361] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc [561983.596774] Lustre: lustre-MDT0000: Connection restored to 192.168.123.156@tcp (at 0@lo) [561983.607091] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 60-180 [561984.606708] Lustre: lustre-OST0000: Connection restored to 192.168.123.156@tcp (at 0@lo) [561984.607637] Lustre: Skipped 1 previous similar message [561984.613312] Lustre: lustre-OST0000: deleting orphan objects from 0x0:91 to 0x0:129 [561985.851321] Lustre: DEBUG MARKER: centos-154.localnet: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid [561986.256018] Lustre: Mounted lustre-client [561988.711619] Lustre: Unmounted lustre-client [561989.633975] Lustre: lustre-MDT0000: Connection restored to 192.168.123.156@tcp (at 0@lo) [561989.634963] Lustre: Skipped 2 previous similar messages [561989.751355] Lustre: Mounted lustre-client [561990.034221] LustreError: 25021:0:(genops.c:2230:obd_set_max_mod_rpcs_in_flight()) lustre-MDT0000-mdc-ffff8802936d4a88: can't set max_mod_rpcs_in_flight=9 higher than ocd_maxmodrpcs=8 returned by the server at connection [561990.131862] Lustre: Modifying parameter lustre-M-MDT0000.mdc.lustre-MDT0000-mdc-*.max_rpcs_in_flight in log params [561992.304651] Lustre: Unmounted lustre-client [561993.514193] Lustre: Mounted lustre-client [561994.530413] Lustre: Modifying parameter lustre-M-MDT0000.mdc.lustre-MDT0000-mdc-*.max_rpcs_in_flight in log params [561994.531511] Lustre: Skipped 1 previous similar message [561996.421067] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [561996.422013] IP: [<ffffffffa03cb915>] obd_set_max_mod_rpcs_in_flight+0x1a5/0x370 [obdclass] [561996.422013] PGD aadfb067 PUD aadfa067 PMD 0 [561996.422013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [561996.422013] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common i2c_piix4 virtio_balloon pcspkr virtio_console ip_tables rpcsec_gss_krb5 drm_kms_helper ttm ata_generic drm pata_acpi drm_panel_orientation_quirks ata_piix floppy serio_raw virtio_blk libata i2c_core [last unloaded: libcfs] [561996.422013] CPU: 3 PID: 25154 Comm: lctl Kdump: loaded Tainted: P OE ------------ 3.10.0-7.9-debug #1 [561996.422013] Hardware name: Red Hat KVM, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014 [561996.422013] task: ffff8802dc2a5c40 ti: ffff88008ccb8000 task.ti: ffff88008ccb8000 [561996.422013] RIP: 0010:[<ffffffffa03cb915>] [<ffffffffa03cb915>] obd_set_max_mod_rpcs_in_flight+0x1a5/0x370 [obdclass] [561996.422013] RSP: 0018:ffff88008ccbbe10 EFLAGS: 00010202 [561996.422013] RAX: 0000000000000006 RBX: ffff8802b4cfc2b8 RCX: 0000000000000007 [561996.422013] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8802b4cfc2b8 [561996.422013] RBP: ffff88008ccbbe38 R08: 0000000000000007 R09: 0000000000000001 [561996.422013] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000007 [561996.422013] R13: 0000000000000007 R14: 0000000000000000 R15: ffff88008ccbbf18 [561996.422013] FS: 00007f3ed9911740(0000) GS:ffff880331ac0000(0000) knlGS:0000000000000000 [561996.422013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [561996.422013] CR2: 00000000000000b0 CR3: 00000000ab370000 CR4: 00000000000007e0 [561996.422013] Call Trace: [561996.422013] [<ffffffffa092bce9>] max_mod_rpcs_in_flight_store+0x69/0x80 [mdc] [561996.422013] [<ffffffffa03d5a08>] lustre_attr_store+0x18/0x20 [obdclass] [561996.422013] [<ffffffff812d6e0d>] sysfs_kf_write+0x3d/0x50 [561996.422013] [<ffffffff812d68f0>] kernfs_fop_write+0x100/0x180 [561996.422013] [<ffffffff81245a16>] vfs_write+0xd6/0x230 [561996.422013] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a [561996.422013] [<ffffffff8124685f>] SyS_write+0x7f/0xf0 [561996.422013] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a [561996.422013] [<ffffffff817ee00c>] system_call_fastpath+0x1f/0x24 [561996.422013] [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a [561996.422013] Code: 00 00 c7 05 e2 51 0a 00 40 00 00 00 48 83 c2 40 e8 a1 59 e9 ff e9 00 ff ff ff 0f 1f 40 00 f6 05 2d 87 eb ff 20 0f 84 96 fe ff ff <49> 8b 86 b0 00 00 00 45 0f b7 8e 58 03 00 00 0f b7 ce 4d 8b 86 [561996.422013] RIP [<ffffffffa03cb915>] obd_set_max_mod_rpcs_in_flight+0x1a5/0x370 [obdclass]
in the debug log the last activity we see is osc being torn down, so I think it's a race where cli->import is set to NULL as we are trying to call into the max_mod_rpcs_in_flight_store, similar issues is in obd_set_max_rpcs_in_flight
Attachments
Issue Links
- is related to
-
LU-14441 Oops in obd_set_max_rpcs_in_flight()
-
- Resolved
-
probably dup of
LU-14441