Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
None
-
None
-
kernel: 2.6.32-220.23.1.1chaos.ch5.x86_64.debug
lustre: orion-2_3_49_54_1-55chaos + http://review.whamcloud.com/3355
-
3
-
3071
Description
I'm trying to run Lustre-Orion against a debug kernel on the MDS and hit this BUG twice yesterday. There are a couple of "[...] used greatest stack depth" messages in the console log, so I'm wondering whether the stack was overrun and that caused the crash.
zpool used greatest stack depth: 1552 bytes left
Lustre: Lustre: Build Version: 2.0.59-llnl3-base-DEBUG--CHANGED-2.6.32-220.23.1.1chaos.ch5.x86_64.debug
Lustre: MGS: Mounted grove-mds2/mgs
mount.lustre used greatest stack depth: 1280 bytes left
LustreError: 11-0: MGC172.20.5.2@o2ib500: Communicating with 0@lo, operation llog_origin_handle_create failed with -2
LustreError: 20904:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
Lustre: 20909:0:(fld_index.c:354:fld_index_init()) srv-lstest-MDT0000: File "fld" doesn't support range lookup, using stub. DNE and FIDs on OST will not work with this backend
ib0: no IPv6 routers present
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.154@o2ib500
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.191@o2ib500
Lustre: lstest-MDT0000: Mounted grove-mds2/mdt0
Lustre: lstest-MDT0000: Will be in recovery for at least 5:00, or until 256 clients reconnect.
------------[ cut here ]------------
kernel BUG at /usr/src/kernels/2.6.32-220.23.1.1chaos.ch5.x86_64.debug/include/linux/scatterlist.h:65!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ptlrpc/initstate
CPU 20
Modules linked in: osp(U) mdt(U) mdd(U) lod(U) mgs(U) mgc(U) osd_zfs(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) acpi_cpufreq freq_table mperf ko2iblnd(U) lnet(U) libcfs(U) ib_ipoib ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath dm_mod vhost_net macvtap macvlan tun kvm_intel kvm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ses enclosure sg sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core igb dca [last unloaded: cpufreq_ondemand]
Pid: 20891, comm: ll_mgs_02 Tainted: P W ---------------- 2.6.32-220.23.1.1chaos.ch5.x86_64.debug #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
RIP: 0010:[<ffffffffa0693ca6>] [<ffffffffa0693ca6>] kiblnd_setup_rd_iov+0x1f6/0x2f0 [ko2iblnd]
RSP: 0018:ffff88178f87d960 EFLAGS: 00010293
RAX: ffffea00a792e280 RBX: ffff882fdddbe408 RCX: 0000000000000000
RDX: 00000000000020c0 RSI: 0000000087654321 RDI: ffff882fe0d30148
RBP: ffff88178f87d9b0 R08: ffff882fdddbe408 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff882fe0d30148
R13: ffffc9004ed47000 R14: 00000000000020c0 R15: 0000000000000000
FS: 00007ffff7fdc700(0000) GS:ffff881895800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006d3e70 CR3: 0000002ffa068000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ll_mgs_02 (pid: 20891, threadinfo ffff88178f87c000, task ffff88178f878ac0)
Stack:
ffffc9002b567748 ffff8817b7ee9818 ffff8817b5004000 00000001dddbc058
<0> 00000000000020c0 ffff882fdddbc058 00000000000020c0 ffffc9002b567748
<0> 0000000000000001 000501f4ac14043d ffff88178f87da50 ffffffffa069892a
Call Trace:
[<ffffffffa069892a>] kiblnd_send+0x59a/0x870 [ko2iblnd]
[<ffffffffa062e359>] ? lnet_send+0x59/0x9f0 [lnet]
[<ffffffffa062a14b>] lnet_ni_send+0x4b/0x110 [lnet]
[<ffffffffa062e55b>] lnet_send+0x25b/0x9f0 [lnet]
[<ffffffffa062f5bb>] LNetPut+0x2ab/0x670 [lnet]
[<ffffffffa086a71e>] ptl_send_buf+0x18e/0x440 [ptlrpc]
[<ffffffffa08875f0>] ? at_measured+0x1e0/0x320 [ptlrpc]
[<ffffffffa08a2285>] ? null_authorize+0x75/0x110 [ptlrpc]
[<ffffffffa086ac2f>] ptlrpc_send_reply+0x25f/0x770 [ptlrpc]
[<ffffffffa08425e4>] target_send_reply_msg+0x54/0x160 [ptlrpc]
[<ffffffffa0842a3e>] target_send_reply+0x34e/0x680 [ptlrpc]
[<ffffffffa08868d3>] ? llog_origin_handle_read_header+0x193/0x520 [ptlrpc]
[<ffffffffa0c8cd16>] mgs_handle+0xd6/0x1020 [mgs]
[<ffffffffa0706a0f>] ? keys_fill+0x6f/0x1a0 [obdclass]
[<ffffffffa08717f4>] ? lustre_msg_get_transno+0x54/0x90 [ptlrpc]
[<ffffffffa087bc6c>] ptlrpc_server_handle_request+0x3fc/0xce0 [ptlrpc]
[<ffffffffa059256e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa059ff09>] ? lc_watchdog_touch+0x79/0x110 [libcfs]
[<ffffffffa0876e20>] ? ptlrpc_wait_event+0xb0/0x2b0 [ptlrpc]
[<ffffffff810aeb6d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81055043>] ? __wake_up+0x53/0x70
[<ffffffffa087df00>] ptlrpc_main+0x710/0x1190 [ptlrpc]
[<ffffffff810aeb6d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffa087d7f0>] ? ptlrpc_main+0x0/0x1190 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff815231f0>] ? _spin_unlock_irq+0x30/0x40
[<ffffffff8100bb50>] ? restore_args+0x0/0x30
[<ffffffffa087d7f0>] ? ptlrpc_main+0x0/0x1190 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 35 f2 01 00 00 00 04 00 e8 28 a8 f0 ff 48 c7 c7 60 2e 6b a0 c7 05 df f1 01 00 00 00 04 00 e8 02 e2 ef ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 84 00 00 00 00 00 eb f6 48 c7 c7 20 2e 6b a0 48 c7
RIP [<ffffffffa0693ca6>] kiblnd_setup_rd_iov+0x1f6/0x2f0 [ko2iblnd]
RSP <ffff88178f87d960>
crash> bt
PID: 20891 TASK: ffff88178f878ac0 CPU: 20 COMMAND: "ll_mgs_02"
#0 [ffff88178f87d620] machine_kexec at ffffffff81032ad0
#1 [ffff88178f87d680] crash_kexec at ffffffff810cab52
#2 [ffff88178f87d750] oops_end at ffffffff81524c20
#3 [ffff88178f87d780] die at ffffffff8100f3bb
#4 [ffff88178f87d7b0] do_trap at ffffffff81524334
#5 [ffff88178f87d810] do_invalid_op at ffffffff8100cff5
#6 [ffff88178f87d8b0] invalid_op at ffffffff8100bf9b
[exception RIP: kiblnd_setup_rd_iov+502]
RIP: ffffffffa0693ca6 RSP: ffff88178f87d960 RFLAGS: 00010293
RAX: ffffea00a792e280 RBX: ffff882fdddbe408 RCX: 0000000000000000
RDX: 00000000000020c0 RSI: 0000000087654321 RDI: ffff882fe0d30148
RBP: ffff88178f87d9b0 R8: ffff882fdddbe408 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff882fe0d30148
R13: ffffc9004ed47000 R14: 00000000000020c0 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff88178f87d9b8] kiblnd_send at ffffffffa069892a [ko2iblnd]
#8 [ffff88178f87da58] lnet_ni_send at ffffffffa062a14b [lnet]
#9 [ffff88178f87da78] lnet_send at ffffffffa062e55b [lnet]
#10 [ffff88178f87dae8] LNetPut at ffffffffa062f5bb [lnet]
#11 [ffff88178f87db48] ptl_send_buf at ffffffffa086a71e [ptlrpc]
#12 [ffff88178f87dbf8] ptlrpc_send_reply at ffffffffa086ac2f [ptlrpc]
#13 [ffff88178f87dc78] target_send_reply_msg at ffffffffa08425e4 [ptlrpc]
#14 [ffff88178f87dca8] target_send_reply at ffffffffa0842a3e [ptlrpc]
#15 [ffff88178f87dd18] mgs_handle at ffffffffa0c8cd16 [mgs]
#16 [ffff88178f87dda8] ptlrpc_server_handle_request at ffffffffa087bc6c [ptlrpc]
#17 [ffff88178f87de98] ptlrpc_main at ffffffffa087df00 [ptlrpc]
#18 [ffff88178f87df48] kernel_thread at ffffffff8100c20a
From scatterlist.h:
55 static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
56
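The excerpt above is cut off before the line that triggers the BUG, so the following is only a rough userspace sketch of the kind of sanity checks sg_assign_page() is understood to perform in kernels of this era: an alignment check on the page pointer and, when CONFIG_DEBUG_SG is enabled, a magic-number check and a chain-bit check on the scatterlist entry. Which of these sits on scatterlist.h:65 should be confirmed against the actual header. The names struct sg_model, sg_model_init() and sg_assign_page_check() are hypothetical and exist only for illustration; they are not kernel APIs.

```c
#include <stdint.h>

/* Magic value as defined in the kernel's scatterlist.h (assumed from memory). */
#define SG_MAGIC 0x87654321UL

/* Hypothetical, reduced stand-in for struct scatterlist. */
struct sg_model {
    unsigned long sg_magic;   /* set by table init when CONFIG_DEBUG_SG is on */
    unsigned long page_link;  /* page pointer with the low 2 bits used as flags */
};

enum sg_check {
    SG_OK = 0,
    SG_BAD_ALIGN,   /* page pointer not 4-byte aligned */
    SG_BAD_MAGIC,   /* entry was never initialized, or was overwritten */
    SG_IS_CHAIN     /* entry is a chain link, not a data entry */
};

/* Models what sg_init_table() does for one entry in a debug build. */
static void sg_model_init(struct sg_model *sg)
{
    sg->sg_magic = SG_MAGIC;
    sg->page_link = 0;
}

/*
 * Returns which check would fail instead of calling BUG_ON(),
 * so the conditions can be exercised safely in userspace.
 */
static enum sg_check sg_assign_page_check(const struct sg_model *sg,
                                          const void *page)
{
    if ((uintptr_t)page & 0x3)
        return SG_BAD_ALIGN;
    if (sg->sg_magic != SG_MAGIC)
        return SG_BAD_MAGIC;
    if (sg->page_link & 0x1)
        return SG_IS_CHAIN;
    return SG_OK;
}
```

In this model, an entry that was never passed through the init helper (for example, zero-filled memory) fails the magic check. It may be relevant that RSI in the register dump holds 0000000087654321, the same value as the magic constant assumed here, though that alone does not prove which check fired.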
Issue Links
- is duplicated by: LU-1714 crash upon loading libcfs module (Resolved)