[LU-10451] sptlrpc_ctxs_lprocfs_seq_show crash in recovery-small test 57 Created: 03/Jan/18 Updated: 04/Jan/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
After Well, I just had a very similar crash in a different place:
[128715.132365] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 10:05:20 (1514387120)
[128717.357108] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[128717.358256] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm floppy i2c_piix4 virtio_console pcspkr virtio_balloon serio_raw virtio_blk ata_piix i2c_core libata nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[128717.371008] CPU: 3 PID: 20280 Comm: lctl Tainted: P OE ------------ 3.10.0-debug #2
[128717.372286] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[128717.372932] task: ffff8802cd528a80 ti: ffff8800a6838000 task.ti: ffff8800a6838000
[128717.382186] RIP: 0010:[<ffffffffa05b8a47>] [<ffffffffa05b8a47>] sptlrpc_ctxs_lprocfs_seq_show+0x27/0x100 [ptlrpc]
[128717.383566] RSP: 0018:ffff8800a683be78 EFLAGS: 00010203
[128717.384213] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8802a1586700 RCX: 0000000000000004
[128717.385411] RDX: fffffffffffffff4 RSI: 0000000000000001 RDI: ffffffffa0636425
[128717.389241] RBP: ffff8800a683be90 R08: 0000000000000001 R09: ffff8802f092f000
[128717.390434] R10: 0000000000000000 R11: 0000000000000246 R12: ffff8800a284ef00
[128717.391627] R13: 0000000000000001 R14: ffff8800a683bf48 R15: ffff8800a284ef00
[128717.394109] FS: 00007f93a5430740(0000) GS:ffff88033e460000(0000) knlGS:0000000000000000
[128717.395374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[128717.396187] CR2: 00007f93a4aa7000 CR3: 0000000095e38000 CR4: 00000000000006e0
[128717.403348] Lustre: Unmounted lustre-client
[128717.413399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[128717.414405] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[128717.415645] Stack:
[128717.416340] 0000000000000000 ffff880096912e00 0000000000000001 ffff8800a683bf00
[128717.417877] ffffffff81212c85 0000000000001000 0000000001e19af0 ffff8800a284ef38
[128717.419405] 0000000000001000 0000000000000000 ffff880096912e00 0000000001ff0c13
[128717.420901] Call Trace:
[128717.421503] [<ffffffff81212c85>] seq_read+0x105/0x3e0
[128717.422165] [<ffffffff811ed1dc>] vfs_read+0x9c/0x170
[128717.422623] [<ffffffff811edd44>] SyS_read+0x84/0xf0
[128717.423149] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[128717.423877] Code: 1f 44 00 00 0f 1f 44 00 00 55 b9 04 00 00 00 48 89 e5 41 55 41 54 49 89 fc 53 48 8b 9f d8 00 00 00 48 c7 c7 25 64 63 a0 48 8b 03 <4c> 8b 68 40 4c 89 ee f3 a6 75 42 48 8b bb 58 08 00 00 48 85 ff
[128717.425876] RIP [<ffffffffa05b8a47>] sptlrpc_ctxs_lprocfs_seq_show+0x27/0x100 [ptlrpc]
(gdb) l *(sptlrpc_ctxs_lprocfs_seq_show+0x27)
0x8ca47 is in sptlrpc_ctxs_lprocfs_seq_show (/home/green/git/lustre-release/lustre/ptlrpc/sec_lproc.c:122).
117 {
118 struct obd_device *dev = seq->private;
119 struct client_obd *cli = &dev->u.cli;
120 struct ptlrpc_sec *sec = NULL;
121
122 LASSERT(strcmp(dev->obd_type->typ_name, LUSTRE_OSC_NAME) == 0 ||
123 strcmp(dev->obd_type->typ_name, LUSTRE_MDC_NAME) == 0 ||
124 strcmp(dev->obd_type->typ_name, LUSTRE_MGC_NAME) == 0 ||
125 strcmp(dev->obd_type->typ_name, LUSTRE_LWP_NAME) == 0 ||
126 strcmp(dev->obd_type->typ_name, LUSTRE_OSP_NAME) == 0);
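One possible reading of the register dump, offered as an assumption rather than a confirmed diagnosis: RAX holds 6b6b6b6b6b6b6b6b, which matches the byte the Linux kernel writes into freed slab objects when slab poisoning is enabled (POISON_FREE, 0x6b, in include/linux/poison.h). That would be consistent with dev->obd_type being read from an obd_device that had already been freed while the client was unmounting. The sketch below is a hypothetical user-space helper (the name is_slab_free_poison is not part of Lustre or the kernel) for checking whether a register value from an oops is entirely that pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of POISON_FREE in the Linux kernel's include/linux/poison.h. */
#define SLAB_POISON_FREE 0x6bU

/*
 * Hypothetical helper: return true if every byte of a 64-bit register
 * value equals the slab free-poison byte, which suggests the value was
 * loaded from an object that had already been freed.
 */
static bool is_slab_free_poison(uint64_t value)
{
	for (int i = 0; i < 8; i++) {
		if (((value >> (8 * i)) & 0xffU) != SLAB_POISON_FREE)
			return false;
	}
	return true;
}
```

Applied to this oops, the RAX value matches the pattern while RBX (ffff8802a1586700, the dev pointer itself) does not, which is what one would expect if the pointer was still valid-looking but the object behind it had been released.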
It's not as frequent as all those other failures, but it still needs to be looked at, I guess. |
| Comments |
| Comment by Oleg Drokin [ 04/Jan/18 ] |
|
Hm, fresh on the heels of this crash I just had another one in the same area:
[635642.285323] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 18:39:59 (1515022799)
[635644.361797] BUG: unable to handle kernel paging request at ffff8802d00b1fd0
[635644.363434] IP: [<ffffffffa07fa405>] osc_stats_seq_show+0x65/0xb0 [osc]
[635644.364258] PGD 2e75067 PUD 33e9f9067 PMD 33e978067 PTE 80000002d00b1060
[635644.365053] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[635644.365787] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper serio_raw virtio_blk ata_generic pcspkr floppy virtio_balloon virtio_console pata_acpi drm ata_piix i2c_piix4 i2c_core libata nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[635644.372330] CPU: 10 PID: 7201 Comm: lctl Tainted: P OE ------------ 3.10.0-debug #2
[635644.373754] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[635644.375666] task: ffff88008cf2e580 ti: ffff88006f360000 task.ti: ffff88006f360000
[635644.377090] RIP: 0010:[<ffffffffa07fa405>] [<ffffffffa07fa405>] osc_stats_seq_show+0x65/0xb0 [osc]
[635644.378526] RSP: 0018:ffff88006f363e68 EFLAGS: 00010246
[635644.380324] Lustre: Unmounted lustre-client
[635644.382857] RAX: 0000000000000000 RBX: ffff8802a5459f00 RCX: 0000000000000000
[635644.383690] RDX: 0000000000001000 RSI: ffffffffa082113c RDI: 0000000000000000
[635644.384543] RBP: ffff88006f363e90 R08: 000000000000000a R09: 000000000000fffe
[635644.385353] R10: 0000000000000000 R11: ffff88006f363cfe R12: ffff8802d00b1f80
[635644.386173] R13: 0000000000000001 R14: ffff88006f363f48 R15: ffff8802a5459f00
[635644.387021] FS: 00007f35a45bf740(0000) GS:ffff88033e540000(0000) knlGS:0000000000000000
[635644.387869] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[635644.388306] CR2: ffff8802d00b1fd0 CR3: 000000009d8e9000 CR4: 00000000000006e0
[635644.389378] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[635644.390328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[635644.391156] Stack:
[635644.391554] 000000005a4d69d1 000000002a30b8e8 00000000399a6e25 0000000000000000
[635644.392407] ffff88008fa0ce00 ffff88006f363f00 ffffffff81212c85 0000000000001000
[635644.393615] 00000000009f6010 ffff8802a5459f38 0000000000001000 0000000000000000
[635644.395054] Call Trace:
[635644.395743] [<ffffffff81212c85>] seq_read+0x105/0x3e0
[635644.396544] [<ffffffff811ed1dc>] vfs_read+0x9c/0x170
[635644.397203] [<ffffffff811edd44>] SyS_read+0x84/0xf0
[635644.397850] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[635644.398509] Code: 48 8b 55 d8 48 c7 c6 c8 32 82 a0 48 89 df 31 c0 e8 c1 8d a1 e0 49 8b 54 24 48 48 c7 c6 21 11 82 a0 48 89 df 31 c0 e8 ab 8d a1 e0 <49> 8b 54 24 50 48 c7 c6 3d 11 82 a0 48 89 df 31 c0 e8 95 8d a1
[635644.401054] RIP [<ffffffffa07fa405>] osc_stats_seq_show+0x65/0xb0 [osc]
[635644.401766] RSP <ffff88006f363e68>
[635644.402351] CR2: ffff8802d00b1fd0
(gdb) l *(osc_stats_seq_show+0x65)
0xe405 is in osc_stats_seq_show (/home/green/git/lustre-release/lustre/osc/lproc_osc.c:798).
793
794 seq_printf(seq, "snapshot_time: %lld.%09lu (secs.nsecs)\n",
795 (s64)now.tv_sec, now.tv_nsec);
796 seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
797 stats->os_lockless_writes);
798 seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
799 stats->os_lockless_reads);
800 seq_printf(seq, "lockless_truncate\t\t%llu\n",
801 stats->os_lockless_truncates);
802 return 0;
So it looks like the problem became less severe, but is still there. |