[LU-8256] BUG: unable to handle kernel paging request Created: 09/Jun/16 Updated: 14/Jun/18 Resolved: 04/Jan/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Olaf Faaland | Assignee: | Zhenyu Xu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | llnl, llnlfixready |
| Environment: |
RHEL 7.2 derivative, TOSS 3 |
||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
On one cluster, copying a file stored on the lustre filesystem causes the node to crash with BUG: unable to handle kernel paging request at 00007fffffffa650 reported in the console log:

2016-06-08 17:01:54 [ 604.544195] BUG: unable to handle kernel paging request at 00007fffffffa650
2016-06-08 17:01:54 [ 604.552365] IP: [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
2016-06-08 17:01:54 [ 604.559682] PGD 201eb17067 PUD 200b0af067 PMD 2023ba0067 PTE 8000000feafd7067
2016-06-08 17:01:54 [ 604.567958] Oops: 0001 1 SMP
2016-06-08 17:01:54 [ 604.571838] Modules linked in: lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables nfsv3 nf_log_ipv4 nf_log_common xt_LOG xt_multiport xfs libcrc32c intel_powerclamp coretemp intel_rapl kvm iTCO_wdt iTCO_vendor_support ipmi_devintf hfi1(OE) sb_edac mei_me lpc_ich edac_core sg pcspkr mei mfd_core i2c_i801 shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi binfmt_misc ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sd_mod crc_t10dif crct10dif_generic mxm_wmi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt drm_kms_helper aesni_intel lrw igb ttm gf128mul ahci glue_helper dca ablk_helper libahci ptp drm cryptd pps_core libata i2c_algo_bit i2c_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate [last unloaded: ip_tables]
2016-06-08 17:01:55 [ 604.708455] CPU: 38 PID: 7679 Comm: cp Tainted: P OE ------------ 3.10.0-327.13.1.3chaos.ch6.x86_64 #1
2016-06-08 17:01:55 [ 604.720645] Hardware name: Penguin Computing Relion 2900e/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2016-06-08 17:01:55 [ 604.733342] task: ffff88101bb78b80 ti: ffff881019fbc000 task.ti: ffff881019fbc000
2016-06-08 17:01:55 [ 604.742269] RIP: 0010:[<ffffffffa113b3d7>] [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
2016-06-08 17:01:55 [ 604.752594] RSP: 0018:ffff881019fbfe78 EFLAGS: 00010206
2016-06-08 17:01:55 [ 604.759123] RAX: 00007fffffffa650 RBX: 0000000000000fe0 RCX: ffff881020fe1880
2016-06-08 17:01:55 [ 604.767703] RDX: 0000000000000002 RSI: ffff88101e6d2000 RDI: ffff8810235b1b48
2016-06-08 17:01:55 [ 604.776293] RBP: ffff881019fbfeb0 R08: 0000000000000000 R09: ffff88101e6d2000
2016-06-08 17:01:55 [ 604.784890] R10: ffffffffa113b27b R11: 0000000000000000 R12: ffff88101e6d2020
2016-06-08 17:01:55 [ 604.793493] R13: 7fffffffffffffff R14: ffff881019fbfec8 R15: ffff88101e6d2000
2016-06-08 17:01:55 [ 604.802102] FS: 00002aaaaab0a6c0(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
2016-06-08 17:01:55 [ 604.811797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2016-06-08 17:01:55 [ 604.818875] CR2: 00007fffffffa650 CR3: 0000002022ce0000 CR4: 00000000003407e0
2016-06-08 17:01:55 [ 604.827521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2016-06-08 17:01:55 [ 604.836175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2016-06-08 17:01:55 [ 604.844830] Stack:
2016-06-08 17:01:55 [ 604.847761] ffff8810235b1b48 0000000000000000 0000000000000000 ffff8810235b1b48
2016-06-08 17:01:55 [ 604.856779] 00007fffffffa630 0000000000000003 0000000000000000 ffff881019fbff28
2016-06-08 17:01:55 [ 604.865801] ffffffff811fb409 7fffffffffffffff 0000000000000001 0000000000000048
2016-06-08 17:01:55 [ 604.874833] Call Trace:
2016-06-08 17:01:55 [ 604.878300] [<ffffffff811fb409>] do_vfs_ioctl+0x169/0x510
2016-06-08 17:01:55 [ 604.885174] [<ffffffff811fb851>] SyS_ioctl+0xa1/0xc0
2016-06-08 17:01:55 [ 604.891572] [<ffffffff8165cd49>] system_call_fastpath+0x16/0x1b
2016-06-08 17:01:55 [ 604.899040] Code: a0 d1 08 e0 44 89 e8 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 46 10 48 8b 7d c8 4c 89 fe 4d 8d 67 20 <48> 8b 10 49 89 57 20 48 8b 50 08 49 89 57 28 48 8b 50 10 49 89
2016-06-08 17:01:55 [ 604.922361] RIP [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
2016-06-08 17:01:55 [ 604.930295] RSP <ffff881019fbfe78>
2016-06-08 17:01:55 [ 604.935013] CR2: 00007fffffffa650
2016-06-08 17:01:55 [ 605.513479] --[ end trace d79d98174ba667ee ]--
2016-06-08 17:01:55 [ 605.521246] Kernel panic - not syncing: Fatal exception

Another cluster running Lustre 2.5 client on TOSS3, mounting a TOSS2/Lustre2.5 server does not show the same problem. |
| Comments |
| Comment by Olaf Faaland [ 09/Jun/16 ] |
|
This lustre tag is available where Chris normally pushes our branch. Peter knows where that is. Code indicated by RIP in the error output is:

(gdb) l *(ll_fiemap+0x1a7)
0x173d7 is in ll_fiemap (/usr/src/debug/lustre-2.5.5/lustre/llite/file.c:3319).
3314         fiemap->fm_flags = fieinfo->fi_flags;
3315         fiemap->fm_extent_count = fieinfo->fi_extents_max;
3316         fiemap->fm_start = start;
3317         fiemap->fm_length = len;
3318         if (extent_count > 0)
3319                 memcpy(&fiemap->fm_extents[0], fieinfo->fi_extents_start,
3320                        sizeof(struct ll_fiemap_extent));
3321
3322         rc = ll_do_fiemap(inode, fiemap, num_bytes); |
| Comment by Olaf Faaland [ 09/Jun/16 ] |
|
I'm not at all certain, but it looks to me like line 3319 should be using copy_from_user(), not memcpy(). ll_fiemap() is called from the kernel by ioctl_fiemap(). From what I can see, fieinfo->fi_extents_start points to user memory (ufiemap->fm_extents).

static int ioctl_fiemap(struct file *filp, unsigned long arg)
{
struct fiemap fiemap;
struct fiemap __user *ufiemap = (struct fiemap __user *) arg;
struct fiemap_extent_info fieinfo = { 0, };
struct inode *inode = file_inode(filp);
struct super_block *sb = inode->i_sb;
u64 len;
int error;
if (!inode->i_op->fiemap)
return -EOPNOTSUPP;
if (copy_from_user(&fiemap, ufiemap, sizeof(fiemap)))
return -EFAULT;
if (fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS)
return -EINVAL;
error = fiemap_check_ranges(sb, fiemap.fm_start, fiemap.fm_length,
&len);
if (error)
return error;
fieinfo.fi_flags = fiemap.fm_flags;
fieinfo.fi_extents_max = fiemap.fm_extent_count;
fieinfo.fi_extents_start = ufiemap->fm_extents;
if (fiemap.fm_extent_count != 0 &&
!access_ok(VERIFY_WRITE, fieinfo.fi_extents_start,
fieinfo.fi_extents_max * sizeof(struct fiemap_extent)))
return -EFAULT;
if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
filemap_write_and_wait(inode->i_mapping);
error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);
|
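[Editor's note] For illustration only, this is the kind of change being discussed above, not the actual patch from the review links in the following comments. Since fieinfo->fi_extents_start is the user-space pointer that ioctl_fiemap() set to ufiemap->fm_extents, the copy at llite/file.c:3319 would have to go through copy_from_user(); the 'out' cleanup label and the surrounding variable names are assumptions based on the snippet quoted earlier.

/* Sketch only: replace the memcpy() at lustre/llite/file.c:3319.
 * fieinfo->fi_extents_start is a __user pointer (ioctl_fiemap() sets it to
 * ufiemap->fm_extents), so dereferencing it with memcpy() in kernel context
 * faults exactly as seen in the oops above. */
if (extent_count > 0 &&
    copy_from_user(&fiemap->fm_extents[0], fieinfo->fi_extents_start,
                   sizeof(struct ll_fiemap_extent)) != 0)
        GOTO(out, rc = -EFAULT);   /* assumes an 'out' cleanup label exists */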
| Comment by Andreas Dilger [ 10/Jun/16 ] |
|
It looks like this was fixed in patch http://review.whamcloud.com/13608. There is a patch for b2_7_fe at http://review.whamcloud.com/16815. The patch also needs to be backported to b2_5_fe and b_ieel2_0. |
| Comment by Zhenyu Xu [ 11/Jun/16 ] |
|
b2_5_fe backport at http://review.whamcloud.com/20731 |
| Comment by Olaf Faaland [ 14/Jun/16 ] |
|
Thanks for working on the backport. However, it's a big patch. Is all of the change appropriate for a stable branch? It looks to me like most of it is pruning unnecessary code, as opposed to fixing code that is wrong. |
| Comment by Olaf Faaland [ 14/Jun/16 ] |
|
I made a smaller change and tested it locally on the cluster that was hitting the issue. It worked there. I assume I won't be able to push to the b2_5_fe branch, so I've attached it as 0001- |
| Comment by Olaf Faaland [ 15/Jun/16 ] |
|
Actually, I'm not sure about this patch. I ran my cp command under strace, and I see the FIEMAP ioctl returned EOPNOTSUPP. This does indeed allow the copy to succeed without crashing the node, but I don't understand why my patch had that effect. The patch just changes the memcpy calls to copy_to/from_user. |
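[Editor's note] For reference, that behaviour matches the normal user-space contract for FIEMAP: when the ioctl fails cleanly with EOPNOTSUPP, a sparse-aware copier falls back to an ordinary read/write copy instead of relying on extent data, which is why the cp now completes. A minimal sketch of that probe-and-fall-back pattern; probe_extents() is an illustrative helper, not coreutils code.

/* probe_extents(): illustrative only -- shows the fall-back path that lets
 * "cp --sparse=always" succeed once FS_IOC_FIEMAP returns EOPNOTSUPP. */
#include <errno.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

static int probe_extents(int fd)
{
        struct fiemap fm;

        memset(&fm, 0, sizeof(fm));
        fm.fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
        fm.fm_extent_count = 0;             /* only ask how many extents exist */

        if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
                if (errno == EOPNOTSUPP)
                        return -1;          /* caller falls back to a plain copy */
                perror("FS_IOC_FIEMAP");
                return -1;
        }
        printf("%u extents mapped\n", fm.fm_mapped_extents);
        return 0;
}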
| Comment by Zhenyu Xu [ 15/Jun/16 ] |
|
Yes, your patch would solve this issue as well. As for the EOPNOTSUPP on the fiemap path, I can see one place that returns it:

/* If the stripe_count > 1 and the application does not understand
* DEVICE_ORDER flag, then it cannot interpret the extents correctly.
*/
if (lsm->lsm_stripe_count > 1 &&
!(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER))
GOTO(out, rc = -EOPNOTSUPP);
When the file data crosses multiple OST stripes, fiemap returns the OST index of each extent in fiemap_extent::fe_reserved[0], and an application that is not Lustre-aware cannot interpret and use those extents correctly. |
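[Editor's note] A minimal sketch of what a Lustre-aware caller would do to get past the check quoted above: pass the DEVICE_ORDER flag so fiemap on a striped file is not rejected with EOPNOTSUPP, then read the per-extent OST/device index from fe_reserved[0] as described in the comment. The 0x40000000 value for FIEMAP_FLAG_DEVICE_ORDER is assumed from Lustre's user-space headers, and the extent buffer size is arbitrary.

#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FIEMAP_FLAG_DEVICE_ORDER
#define FIEMAP_FLAG_DEVICE_ORDER 0x40000000    /* assumed Lustre extension */
#endif

int main(int argc, char **argv)
{
        unsigned int max = 32, i;              /* arbitrary extent buffer size */
        struct fiemap *fm;
        int fd;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
                fprintf(stderr, "usage: %s FILE\n", argv[0]);
                return 1;
        }
        fm = calloc(1, sizeof(*fm) + max * sizeof(struct fiemap_extent));
        if (fm == NULL)
                return 1;
        fm->fm_length = FIEMAP_MAX_OFFSET;     /* whole file */
        fm->fm_flags = FIEMAP_FLAG_DEVICE_ORDER;
        fm->fm_extent_count = max;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                perror("FS_IOC_FIEMAP");       /* EOPNOTSUPP without the flag on striped files */
                return 1;
        }
        for (i = 0; i < fm->fm_mapped_extents; i++)
                printf("extent %u: logical %llu len %llu ost/device %u\n", i,
                       (unsigned long long)fm->fm_extents[i].fe_logical,
                       (unsigned long long)fm->fm_extents[i].fe_length,
                       fm->fm_extents[i].fe_reserved[0]);
        free(fm);
        close(fd);
        return 0;
}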
| Comment by Olaf Faaland [ 15/Jun/16 ] |
|
I see. Thank you for the explanation. |
| Comment by Olaf Faaland [ 15/Jun/16 ] |
|
Zhenyu,

LustreError: 11-0: fsrzb-OST0009-osc-ffff88200c9f7000: Communicating with 172.21.1.110@o2ib200, operation ost_get_info failed with -95.
LustreError: 170050:0:(file.c:1900:ll_do_fiemap()) obd_get_info failed: rc = -95 |
| Comment by Olaf Faaland [ 15/Jun/16 ] |
|
dbo_fiemap_get is not implemented in the ZFS OSD, because ZFS does not support fiemap (not that it could not; it is simply not implemented). We suspect that ofd_get_info() is calling dt_fiemap_get(), which returns -EOPNOTSUPP, and that error code is passed back to the client. |
| Comment by Olaf Faaland [ 15/Jun/16 ] |
|
I think I've confirmed that is what is happening.

On the client:
[root@rzgenie48:fiemap]# lfs getstripe test.1k.sparse.1extent
test.1k.sparse.1extent
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 9
obdidx objid objid group
9 259446897 0xf76d871 0
[root@rzgenie48:fiemap]# strace cp --sparse=always test.1k.sparse.1extent test.1k.sparse.1extent.copy2 2>&1 | grep ioctl
ioctl(3, FS_IOC_FIEMAP, 0x7fffffffc760) = -1 EOPNOTSUPP (Operation not supported)
[root@rzgenie48:fiemap]# lctl list_nids
192.168.128.48@o2ib21
Logs on the OSS:
00000100:00100000:9.0:1466033830.426303:0:10320:0:(service.c:2086:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost02_006:68119517-017d-eeee-db17-abd63a7e2e89+5:125727:x1537240460206584:12345-192.168.128.48@o2ib21:7
00002000:00000001:9.0:1466033830.426308:0:10320:0:(ofd_obd.c:536:ofd_get_info()) Process entered
00002000:00000001:9.0:1466033830.426309:0:10320:0:(ofd_obd.c:599:ofd_get_info()) Process leaving via out (rc=0 : 0 : 0x0)
00002000:00000001:9.0:1466033830.426310:0:10320:0:(ofd_obd.c:669:ofd_get_info()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:9.0:1466033830.426311:0:10320:0:(pack_generic.c:336:lustre_pack_reply_v2()) Process entered
00000100:00000001:9.0:1466033830.426313:0:10320:0:(pack_generic.c:370:lustre_pack_reply_v2()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:9.0:1466033830.426314:0:10320:0:(ofd_obd.c:536:ofd_get_info()) Process entered
00002000:00000001:9.0:1466033830.426315:0:10320:0:(ofd_objects.c:81:ofd_object_find()) Process entered
00002000:00000001:9.0:1466033830.426317:0:10320:0:(ofd_objects.c:89:ofd_object_find()) Process leaving (rc=18446612192165995760 : -131881543555856 : ffff880def71a4f0)
00002000:00000001:9.0:1466033830.426319:0:10320:0:(ofd_obd.c:669:ofd_get_info()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
00000100:00000001:9.0:1466033830.426321:0:10320:0:(niobuf.c:649:ptlrpc_send_error()) Process entered
00000100:00000001:9.0:1466033830.426323:0:10320:0:(connection.c:136:ptlrpc_connection_addref()) Process entered
00000100:00000001:9.0:1466033830.426323:0:10320:0:(connection.c:143:ptlrpc_connection_addref()) Process leaving (rc=18446612201684338752 : -131872025212864 : ffff881026c80840)
00000100:00000001:9.0:1466033830.426325:0:10320:0:(niobuf.c:60:ptl_send_buf()) Process entered
00000100:00000001:9.0:1466033830.426332:0:10320:0:(niobuf.c:101:ptl_send_buf()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:9.0:1466033830.426333:0:10320:0:(connection.c:99:ptlrpc_connection_put()) Process entered
00000100:00000001:9.0:1466033830.426334:0:10320:0:(connection.c:129:ptlrpc_connection_put()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:9.0:1466033830.426335:0:10320:0:(niobuf.c:666:ptlrpc_send_error()) Process leaving (rc=0 : 0 : 0)
00000100:00100000:9.0:1466033830.426338:0:10320:0:(service.c:2130:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost02_006:68119517-017d-eeee-db17-abd63a7e2e89+5:125727:x1537240460206584:12345-192.168.128.48@o2ib21:7 Request procesed in 38us (104us total) trans 0 rc -95/-95 |
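[Editor's note] An aside on reading these entries: the three values in each "Process leaving (rc=...)" triple are the same return code printed as unsigned decimal, signed decimal, and hex, so the large number in the ofd_get_info() line is just -95 (EOPNOTSUPP) viewed as a 64-bit unsigned value. A quick check (assumes a 64-bit host where long is 8 bytes):

/* 18446744073709551521 == 2^64 - 95 == 0xffffffffffffffa1, i.e. -EOPNOTSUPP */
#include <stdio.h>

int main(void)
{
        long rc = -95;   /* -EOPNOTSUPP, the value returned for the fiemap get_info */

        printf("%lu : %ld : %lx\n", (unsigned long)rc, rc, (unsigned long)rc);
        /* prints: 18446744073709551521 : -95 : ffffffffffffffa1 */
        return 0;
}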
| Comment by Olaf Faaland [ 15/Jun/16 ] |
|
So, I believe your patch is working as expected. |
| Comment by Zhenyu Xu [ 16/Jun/16 ] |
|
Great, thank you for the research info. |
| Comment by Scott Wood [ 09/Sep/16 ] |
|
Not sure if I should open another issue, but I have seen this with 2.7.0. Clients and servers were both installed using the RPMs from the Intel community repos. Servers are running CentOS6 with the 2.6.32-504.8.1.el6_lustre.x86_64 kernel (again from the Intel rpms) and CentOS7 clients running the CentOS-provided 3.10.0-123.20.1.el7.x86_64 kernel. One user can reliably bring down a host, and processing the kdump vmcore produces the following:

[root@hpcnode045 crash]# crash 127.0.0.1-2016-09-09-11\:54\:04/vmcore /usr/lib/debug/lib/modules/3.10.0-123.20.1.el7.x86_64/vmlinux

crash 7.1.2-3.el7_2.1
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/lib/modules/3.10.0-123.20.1.el7.x86_64/vmlinux
DUMPFILE: 127.0.0.1-2016-09-09-11:54:04/vmcore [PARTIAL DUMP]
CPUS: 56
DATE: Fri Sep 9 11:53:48 2016
UPTIME: 00:08:59
LOAD AVERAGE: 10.24, 10.07, 5.45
TASKS: 1137
NODENAME: hpcnode045.adqimr.ad.lan
RELEASE: 3.10.0-123.20.1.el7.x86_64
VERSION: #1 SMP Thu Jan 29 18:05:33 UTC 2015
MACHINE: x86_64 (2594 Mhz)
MEMORY: 255.9 GB
PANIC: "BUG: unable to handle kernel paging request at 00007fff6fe704b0"
PID: 4366
COMMAND: "cp"
TASK: ffff883fcc340000 [THREAD_INFO: ffff883fb92c2000]
CPU: 49
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 4366 TASK: ffff883fcc340000 CPU: 49 COMMAND: "cp"
#0 [ffff883fb92c3aa0] machine_kexec at ffffffff81041191
#1 [ffff883fb92c3af8] crash_kexec at ffffffff810cf2b2
#2 [ffff883fb92c3bc8] oops_end at ffffffff815eb648
#3 [ffff883fb92c3bf0] no_context at ffffffff815db9e4
#4 [ffff883fb92c3c40] __bad_area_nosemaphore at ffffffff815dba7a
#5 [ffff883fb92c3c88] bad_area_nosemaphore at ffffffff815dbbe4
#6 [ffff883fb92c3c98] __do_page_fault at ffffffff815ee46e
#7 [ffff883fb92c3d98] do_page_fault at ffffffff815ee68a
#8 [ffff883fb92c3dc0] page_fault at ffffffff815ea8c8
   [exception RIP: ll_fiemap+591]
   RIP: ffffffffa0ba3bbf RSP: ffff883fb92c3e78 RFLAGS: 00010206
   RAX: 00007fff6fe704b0 RBX: 0000000000000fe0 RCX: ffff883fcf76a580
   RDX: 0000000000000002 RSI: ffff883fd1572000 RDI: ffff883fcf428108
   RBP: ffff883fb92c3eb0 R8: 0000000000000000 R9: ffff883fd1572000
   R10: ffff881ffec07300 R11: ffffffffa0814754 R12: ffff883fd1572020
   R13: 0000000000000000 R14: ffff883fb92c3ec8 R15: ffff883fd1572000
   ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff883fb92c3e70] ll_fiemap at ffffffffa0ba3a06 [lustre]
#10 [ffff883fb92c3eb8] do_vfs_ioctl at ffffffff811c2b04
#11 [ffff883fb92c3f30] sys_ioctl at ffffffff811c2f01
#12 [ffff883fb92c3f80] system_call_fastpath at ffffffff815f3219
   RIP: 00007f9dd8e64d77 RSP: 00007fff6fe71340 RFLAGS: 00010202
   RAX: 0000000000000010 RBX: ffffffff815f3219 RCX: 00007f9dd8e685da
   RDX: 00007fff6fe70490 RSI: 00000000c020660b RDI: 0000000000000003
   RBP: 7fffffffffffffff R8: 00000000ffffffff R9: 0000000000401007
   R10: 00007fff6fe70210 R11: 0000000000000206 R12: 0000000000000000
   R13: 0000000000000000 R14: 00007fff6fe71600 R15: 00007fff6fe70490
   ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b |
| Comment by Peter Jones [ 04/Jan/17 ] |
|
This was fixed on master as part of |