[LU-12896] recovery-small test_110k: (gss_keyring.c:152:ctx_upcall_timeout_kr()) ASSERTION( key ) failed Created: 22/Oct/19 Updated: 19/Oct/23 Resolved: 16/Oct/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Chris Horn <hornc@cray.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f10303a6-f4c7-11e9-add9-52540065bddc

test_110k failed and hit an assertion:

[ 5669.278804] Lustre: DEBUG MARKER: == rpc test complete, duration -o sec ================================================================ 10:29:44 (1571740184)
[ 5669.612623] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-35vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4
[ 5669.809330] Lustre: DEBUG MARKER: onyx-35vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4
[ 5704.525897] Lustre: 0:0:(gss_keyring.c:150:ctx_upcall_timeout_kr()) ctx ffff94b33fc03da0, key (null)
[ 5704.527744] LustreError: 0:0:(gss_keyring.c:152:ctx_upcall_timeout_kr()) ASSERTION( key ) failed:
[ 5704.529239] LustreError: 0:0:(gss_keyring.c:152:ctx_upcall_timeout_kr()) LBUG
[ 5704.530425] Kernel panic - not syncing: LBUG in interrupt.
[ 5704.531587] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
[ 5704.533438] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 5704.534384] Call Trace:
[ 5704.534832] [] dump_stack+0x19/0x1b
[ 5704.535867] [] panic+0xe8/0x21f
[ 5704.536703] [] ? ctx_unlist_kr+0xc0/0xc0 [ptlrpc_gss]
[ 5704.537877] [] lbug_with_loc+0x8d/0xa0 [libcfs]
[ 5704.538922] [] ? ctx_unlist_kr+0xc0/0xc0 [ptlrpc_gss]
[ 5704.540029] [] ctx_upcall_timeout_kr+0xc3/0xd0 [ptlrpc_gss]
[ 5704.541244] [] call_timer_fn+0x38/0x110
[ 5704.542162] [] ? ctx_unlist_kr+0xc0/0xc0 [ptlrpc_gss]
[ 5704.543272] [] run_timer_softirq+0x24d/0x300
[ 5704.544254] [] __do_softirq+0xf5/0x280
[ 5704.545180] [] call_softirq+0x1c/0x30
[ 5704.546095] [] do_softirq+0x65/0xa0
[ 5704.546972] [] irq_exit+0x105/0x110
[ 5704.547823] [] smp_apic_timer_interrupt+0x48/0x60
[ 5704.548871] [] apic_timer_interrupt+0x162/0x170
[ 5704.549893] [] ? __cpuidle_text_start+0x8/0x8
[ 5704.551024] [] ? native_safe_halt+0xb/0x20
[ 5704.551976] [] default_idle+0x1e/0xc0
[ 5704.552881] [] arch_cpu_idle+0x20/0xc0
[ 5704.553807] [] cpu_startup_entry+0x14a/0x1e0
[ 5704.554795] [] rest_init+0x77/0x80
[ 5704.555665] [] start_kernel+0x44b/0x46c
[ 5704.556587] [] ? repair_env_string+0x5c/0x5c
[ 5704.557586] [] ? early_idt_handler_array+0x120/0x120
[ 5704.558683] [] x86_64_start_reservations+0x24/0x26
[ 5704.559753] [] x86_64_start_kernel+0x154/0x177
[ 5704.560776] [] start_cpu+0x5/0x14 |
| Comments |
| Comment by Chris Horn [ 22/Oct/19 ] |
|
Looks like same issue with different signature: https://testing.whamcloud.com/test_sets/1a32dd98-f4d5-11e9-a197-52540065bddc

[ 7024.609940] BUG: unable to handle kernel paging request at ffffffff9d4aab37
[ 7024.611169] IP: [] ctx_upcall_timeout_kr+0x85/0xd0 [ptlrpc_gss]
[ 7024.612513] PGD 28814067 PUD 28815063 PMD 27c000e1
[ 7024.613458] Oops: 0003 [#1] SMP
[ 7024.614093] Modules linked in: ptlrpc_gss(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_umad ib_iser rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi ib_cm mlx4_ib ib_core sunrpc iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev i2c_piix4 pcspkr virtio_balloon parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi mlx4_en ptp pps_core virtio_blk ata_piix mlx4_core libata 8139too crct10dif_pclmul crct10dif_common crc32c_intel
[ 7024.628253] serio_raw virtio_pci devlink virtio_ring virtio 8139cp mii floppy
[ 7024.629523] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
[ 7024.631366] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 7024.632322] task: ffffffff9e018480 ti: ffffffff9e000000 task.ti: ffffffff9e000000
[ 7024.633557] RIP: 0010:[] [] ctx_upcall_timeout_kr+0x85/0xd0 [ptlrpc_gss]
[ 7024.635229] RSP: 0018:ffff8df03fc03e50 EFLAGS: 00010292
[ 7024.636120] RAX: 0000000000000000 RBX: ffffffff9d4aaac7 RCX: 000000000000083f
[ 7024.637298] RDX: 00000000ffffffff RSI: 0000000000000200 RDI: ffff8df03fc03da0
[ 7024.638464] RBP: ffff8df03fc03e60 R08: 0000000000000000 R09: ffff8df03d160f00
[ 7024.639638] R10: 000000000000082c R11: ffff8df03fc039ce R12: ffff8df0254611e0
[ 7024.640825] R13: 0000000000000100 R14: ffffffffc0fe38d0 R15: ffff8df0254611e0
[ 7024.642127] FS: 0000000000000000(0000) GS:ffff8df03fc00000(0000) knlGS:0000000000000000
[ 7024.643909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7024.644970] CR2: ffffffff9d4aab37 CR3: 000000007b1bc000 CR4: 00000000000606f0
[ 7024.646171] Call Trace:
[ 7024.646610]
[ 7024.646999] [] call_timer_fn+0x38/0x110
[ 7024.647991] [] ? ctx_unlist_kr+0xc0/0xc0 [ptlrpc_gss]
[ 7024.649106] [] run_timer_softirq+0x24d/0x300
[ 7024.650107] [] __do_softirq+0xf5/0x280
[ 7024.651070] [] call_softirq+0x1c/0x30
[ 7024.651999] [] do_softirq+0x65/0xa0
[ 7024.652889] [] irq_exit+0x105/0x110
[ 7024.653745] [] smp_apic_timer_interrupt+0x48/0x60
[ 7024.654815] [] apic_timer_interrupt+0x162/0x170
[ 7024.655843]
[ 7024.656186] [] ? __cpuidle_text_start+0x8/0x8
[ 7024.657272] [] ? native_safe_halt+0xb/0x20
[ 7024.658235] [] default_idle+0x1e/0xc0
[ 7024.659135] [] arch_cpu_idle+0x20/0xc0
[ 7024.660066] [] cpu_startup_entry+0x14a/0x1e0
[ 7024.661069] [] rest_init+0x77/0x80
[ 7024.661946] [] start_kernel+0x44b/0x46c
[ 7024.662860] [] ? repair_env_string+0x5c/0x5c
[ 7024.663840] [] ? early_idt_handler_array+0x120/0x120
[ 7024.664942] [] x86_64_start_reservations+0x24/0x26
[ 7024.666009] [] x86_64_start_kernel+0x154/0x177
[ 7024.667030] [] start_cpu+0x5/0x14
[ 7024.667864] Code: c7 05 84 f2 01 00 00 04 00 00 48 c7 05 81 f2 01 00 90 2b 00 c1 e8 fc ad 8d ff 48 85 db 74 18 48 8d bd 40 ff ff ff e8 ab 50 fe ff 80 4b 70 04 48 83 c4 08 5b 5d c3 48 c7 c7 00 89 ff c0 48 c7
[ 7024.673175] RIP [] ctx_upcall_timeout_kr+0x85/0xd0 [ptlrpc_gss]
[ 7024.674479] RSP
[ 7024.675076] CR2: ffffffff9d4aab37 |
| Comment by Alex Zhuravlev [ 31/Oct/19 ] |
|
https://testing.whamcloud.com/test_sets/9fa3cc34-fb6b-11e9-a197-52540065bddc |
| Comment by Andreas Dilger [ 20/Oct/20 ] |
|
The patch https://review.whamcloud.com/40161 " |
| Comment by Gerrit Updater [ 22/Sep/23 ] |
|
"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52473 |
| Comment by Gerrit Updater [ 16/Oct/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52473/ |
| Comment by Peter Jones [ 16/Oct/23 ] |
|
Landed for 2.16 |