Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.4.0
-
3
-
5624
Description
On reboot of the MDS this morning, we hit the following crash:
Lustre: lstest-MDT0000: Recovery over after 5:00, of 449 clients 194 recovered and 255 were evicted. Lustre: lstest-MDT0000: Client a449214e-338a-17fc-d926-5a3a273629af (at 172.20.17.191@o2ib500) reconnecting Lustre: lstest-MDT0000: Client cbe0af34-1d34-affd-7602-97be89da80f8 (at 172.20.17.89@o2ib500) reconnecting Lustre: Skipped 23 previous similar messages Lustre: lstest-MDT0000: Client 3f7011c2-9552-065c-9d01-46c12930b0a3 (at 172.20.17.63@o2ib500) reconnecting Lustre: Skipped 100 previous similar messages LustreError: 11-0: lstest-OST0197-osc-MDT0000: Communicating with 172.20.3.8@o2ib500, operation ost_connect failed with -19 LustreError: Skipped 33 previous similar messages LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11 LustreError: Skipped 63 previous similar messages Lustre: lstest-OST01fd-osc-MDT0000: Connection restored to lstest-OST01fd (at 172.20.3.109@o2ib500) Lustre: lstest-OST01a3-osc-MDT0000: Connection restored to lstest-OST01a3 (at 172.20.3.19@o2ib500) Lustre: lstest-OST01ca-osc-MDT0000: Connection restored to lstest-OST01ca (at 172.20.3.58@o2ib500) Lustre: Skipped 1 previous similar message Lustre: lstest-OST01cf-osc-MDT0000: Connection restored to lstest-OST01cf (at 172.20.3.63@o2ib500) Lustre: Skipped 2 previous similar messages Lustre: lstest-OST01ec-osc-MDT0000: Connection restored to lstest-OST01ec (at 172.20.3.92@o2ib500) Lustre: Skipped 11 previous similar messages Lustre: 32886:0:(llog.c:92:llog_free_handle()) Still busy: 2: 0x8d58:0x1:0: 64767 43517 1801984 1 Pid: 32886, comm: osp-syn-461 Call Trace: [<ffffffffa05be965>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [<ffffffffa0716e81>] llog_free_handle+0xb1/0x3c0 [obdclass] [<ffffffffa07171ed>] llog_close+0x5d/0x190 [obdclass] [<ffffffffa071cdf7>] llog_cat_cancel_records+0x137/0x360 [obdclass] [<ffffffffa0d30408>] osp_sync_process_committed+0x238/0x760 [osp] [<ffffffffa09259a7>] ? ptlrpcd_add_req+0x187/0x2e0 [ptlrpc] [<ffffffffa0d309c4>] osp_sync_process_queues+0x94/0x11c0 [osp] [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20 [<ffffffffa0718ecb>] llog_process_thread+0x8fb/0xe10 [obdclass] [<ffffffffa0d30930>] ? osp_sync_process_queues+0x0/0x11c0 [osp] [<ffffffffa071ac0d>] llog_process_or_fork+0x12d/0x660 [obdclass] [<ffffffffa071cc13>] llog_cat_process_cb+0x2d3/0x380 [obdclass] [<ffffffffa0718ecb>] llog_process_thread+0x8fb/0xe10 [obdclass] [<ffffffffa071c940>] ? llog_cat_process_cb+0x0/0x380 [obdclass] [<ffffffffa071ac0d>] llog_process_or_fork+0x12d/0x660 [obdclass] [<ffffffffa071b749>] llog_cat_process_or_fork+0x89/0x290 [obdclass] [<ffffffff8104cab9>] ? __wake_up_common+0x59/0x90 [<ffffffffa0d30930>] ? osp_sync_process_queues+0x0/0x11c0 [osp] [<ffffffffa071b969>] llog_cat_process+0x19/0x20 [obdclass] [<ffffffffa05bf83a>] ? cfs_waitq_signal+0x1a/0x20 [libcfs] [<ffffffffa0d32aaa>] osp_sync_thread+0x22a/0x700 [osp] [<ffffffffa0d32880>] ? osp_sync_thread+0x0/0x700 [osp] [<ffffffff8100c14a>] child_rip+0xa/0x20 [<ffffffffa0d32880>] ? osp_sync_thread+0x0/0x700 [osp] [<ffffffffa0d32880>] ? osp_sync_thread+0x0/0x700 [osp] [<ffffffff8100c140>] ? child_rip+0x0/0x20 general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:83:00.0/host7/port-7:1/expander-7:1/port-7:1:10/end_device-7:1:10/target7:0:30/7:0:30:0/state CPU 18 Modules linked in: osp(U) mdt(U) mdd(U) fid(U) fld(U) lod(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) acpi_cpufreq freq_table mperf ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath dm_mod vhost_net macvtap macvlan tun kvm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate sg ses enclosure sd_mod crc_t10dif isci libsas wmi mpt2sas scsi_transport_sas raid_class sb_edac edac_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ahci ioatdma shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core igb dca [last unloaded: cpufreq_ondemand] Pid: 32886, comm: osp-syn-461 Tainted: P W ---------------- 2.6.32-220.23.1.2chaos.ch5.x86_64 #1 appro 2620x-in/S2600GZ RIP: 0010:[<ffffffffa071889c>] [<ffffffffa071889c>] llog_process_thread+0x2cc/0xe10 [obdclass] RSP: 0018:ffff881fa1b6db50 EFLAGS: 00010206 RAX: 5a5a5a5a5a5a5a5a RBX: 000000000000aa01 RCX: 0000000000000000 RDX: ffff881017306940 RSI: ffff881fa1c28000 RDI: ffff880f81308000 RBP: ffff881fa1b6dbf0 R08: ffff880fffeed800 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880f7f1bc058 R13: 000000000000fcff R14: ffff880f81306000 R15: ffff881fa1b6de80 FS: 00007ffff7fdc700(0000) GS:ffff880060740000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaafa4450 CR3: 0000000001a85000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process osp-syn-461 (pid: 32886, threadinfo ffff881fa1b6c000, task ffff881fa1988ae0) Stack: ffff880f00002000 0000000101e5b370 ffff880f00000000 ffff880f81306001 <0> 0000a9fe00000000 0000000000000000 0000000000000000 00000000002aa000 <0> 0000fd00a1b6dbd0 ffff880f81308000 ffff881017306940 ffff880fffeed800 Call Trace: [<ffffffffa0d30930>] ? osp_sync_process_queues+0x0/0x11c0 [osp] [<ffffffffa071ac0d>] llog_process_or_fork+0x12d/0x660 [obdclass] [<ffffffffa071cc13>] llog_cat_process_cb+0x2d3/0x380 [obdclass] [<ffffffffa0718ecb>] llog_process_thread+0x8fb/0xe10 [obdclass] [<ffffffffa071c940>] ? llog_cat_process_cb+0x0/0x380 [obdclass] [<ffffffffa071ac0d>] llog_process_or_fork+0x12d/0x660 [obdclass] [<ffffffffa071b749>] llog_cat_process_or_fork+0x89/0x290 [obdclass] [<ffffffff8104cab9>] ? __wake_up_common+0x59/0x90 [<ffffffffa0d30930>] ? osp_sync_process_queues+0x0/0x11c0 [osp] [<ffffffffa071b969>] llog_cat_process+0x19/0x20 [obdclass] [<ffffffffa05bf83a>] ? cfs_waitq_signal+0x1a/0x20 [libcfs] [<ffffffffa0d32aaa>] osp_sync_thread+0x22a/0x700 [osp] [<ffffffffa0d32880>] ? osp_sync_thread+0x0/0x700 [osp] [<ffffffff8100c14a>] child_rip+0xa/0x20 [<ffffffffa0d32880>] ? osp_sync_thread+0x0/0x700 [osp] [<ffffffffa0d32880>] ? osp_sync_thread+0x0/0x700 [osp] [<ffffffff8100c140>] ? child_rip+0x0/0x20 Code: 78 0c 00 01 00 00 00 e8 33 6d eb ff 48 83 7d b8 00 0f 84 f8 03 00 00 4c 8b 45 b8 49 8b 80 b0 00 00 00 48 85 c0 0f 84 e4 03 00 00 <48> 8b 40 08 48 85 c0 0f 84 67 02 00 00 4d 89 f1 c7 04 24 00 20 RIP [<ffffffffa071889c>] llog_process_thread+0x2cc/0xe10 [obdclass] RSP <ffff881fa1b6db50>
crash> bt PID: 32886 TASK: ffff881fa1988ae0 CPU: 18 COMMAND: "osp-syn-461 " #0 [ffff881fa1b6d8e0] machine_kexec at ffffffff8103216b #1 [ffff881fa1b6d940] crash_kexec at ffffffff810b8d12 #2 [ffff881fa1b6da10] oops_end at ffffffff814f2c00 #3 [ffff881fa1b6da40] die at ffffffff8100f26b #4 [ffff881fa1b6da70] do_general_protection at ffffffff814f2792 #5 [ffff881fa1b6daa0] general_protection at ffffffff814f1f65 [exception RIP: llog_process_thread+716] RIP: ffffffffa071889c RSP: ffff881fa1b6db50 RFLAGS: 00010206 RAX: 5a5a5a5a5a5a5a5a RBX: 000000000000aa01 RCX: 0000000000000000 RDX: ffff881017306940 RSI: ffff881fa1c28000 RDI: ffff880f81308000 RBP: ffff881fa1b6dbf0 R8: ffff880fffeed800 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880f7f1bc058 R13: 000000000000fcff R14: ffff880f81306000 R15: ffff881fa1b6de80 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #6 [ffff881fa1b6dbf8] llog_process_or_fork at ffffffffa071ac0d [obdclass] #7 [ffff881fa1b6dc48] llog_cat_process_cb at ffffffffa071cc13 [obdclass] #8 [ffff881fa1b6dc88] llog_process_thread at ffffffffa0718ecb [obdclass] #9 [ffff881fa1b6dd38] llog_process_or_fork at ffffffffa071ac0d [obdclass] #10 [ffff881fa1b6dd88] llog_cat_process_or_fork at ffffffffa071b749 [obdclass] #11 [ffff881fa1b6de18] llog_cat_process at ffffffffa071b969 [obdclass] #12 [ffff881fa1b6de38] osp_sync_thread at ffffffffa0d32aaa [osp] #13 [ffff881fa1b6df48] kernel_thread at ffffffff8100c14a
(gdb) l *llog_process_thread+716 0x58cc is in llog_process_thread (/builddir/build/BUILD/lustre-2.3.56/lustre/include/lustre_log.h:662). 657 /builddir/build/BUILD/lustre-2.3.56/lustre/include/lustre_log.h: No such file or directory. in /builddir/build/BUILD/lustre-2.3.56/lustre/include/lustre_log.h
lustre/include/lustre_log.h: 649 static inline int llog_next_block(const struct lu_env *env, 650 struct llog_handle *loghandle, int *cur_idx, 651 int next_idx, __u64 *cur_offset, void *buf, 652 int len) 653 { 654 struct llog_operations *lop; 655 int rc; 656 657 ENTRY; 658 659 rc = llog_handle2ops(loghandle, &lop); 660 if (rc) 661 RETURN(rc); 662 if (lop->lop_next_block == NULL) 663 RETURN(-EOPNOTSUPP); 664 665 rc = lop->lop_next_block(env, loghandle, cur_idx, next_idx, 666 cur_offset, buf, len); 667 RETURN(rc); 668 }
This is probably related to LU-2109.
Attachments
Issue Links
- is related to
-
LU-2109 __llog_process_thread() GPF
- Resolved