Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.5.5
-
RHEL 7.2 derivative, TOSS 3
kernel 3.10.0-327.13.1.3chaos.ch6.x86_64
lustre-2.5.5-6chaos.4.ch6.x86_64
Servers are Lustre 2.5 on TOSS 2/RHEL 6.7
-
3
-
9223372036854775807
Description
On one cluster, copying a file stored on the Lustre filesystem causes the client node to crash. The console log reports "BUG: unable to handle kernel paging request at 00007fffffffa650":
2016-06-08 17:01:54 [ 604.544195] BUG: unable to handle kernel paging request at 00007fffffffa650
2016-06-08 17:01:54 [ 604.552365] IP: [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
2016-06-08 17:01:54 [ 604.559682] PGD 201eb17067 PUD 200b0af067 PMD 2023ba0067 PTE 8000000feafd7067
2016-06-08 17:01:54 [ 604.567958] Oops: 0001 1 SMP
2016-06-08 17:01:54 [ 604.571838] Modules linked in: lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables nfsv3 nf_log_ipv4 nf_log_common xt_LOG xt_multiport xfs libcrc32c intel_powerclamp coretemp intel_rapl kvm iTCO_wdt iTCO_vendor_support ipmi_devintf hfi1(OE) sb_edac mei_me lpc_ich edac_core sg pcspkr mei mfd_core i2c_i801 shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi binfmt_misc ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sd_mod crc_t10dif crct10dif_generic mxm_wmi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt drm_kms_helper aesni_intel lrw igb ttm gf128mul ahci glue_helper dca ablk_helper libahci ptp drm cryptd pps_core libata i2c_algo_bit i2c_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate [last unloaded: ip_tables]
2016-06-08 17:01:55 [ 604.708455] CPU: 38 PID: 7679 Comm: cp Tainted: P OE ------------ 3.10.0-327.13.1.3chaos.ch6.x86_64 #1
2016-06-08 17:01:55 [ 604.720645] Hardware name: Penguin Computing Relion 2900e/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2016-06-08 17:01:55 [ 604.733342] task: ffff88101bb78b80 ti: ffff881019fbc000 task.ti: ffff881019fbc000
2016-06-08 17:01:55 [ 604.742269] RIP: 0010:[<ffffffffa113b3d7>] [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
2016-06-08 17:01:55 [ 604.752594] RSP: 0018:ffff881019fbfe78 EFLAGS: 00010206
2016-06-08 17:01:55 [ 604.759123] RAX: 00007fffffffa650 RBX: 0000000000000fe0 RCX: ffff881020fe1880
2016-06-08 17:01:55 [ 604.767703] RDX: 0000000000000002 RSI: ffff88101e6d2000 RDI: ffff8810235b1b48
2016-06-08 17:01:55 [ 604.776293] RBP: ffff881019fbfeb0 R08: 0000000000000000 R09: ffff88101e6d2000
2016-06-08 17:01:55 [ 604.784890] R10: ffffffffa113b27b R11: 0000000000000000 R12: ffff88101e6d2020
2016-06-08 17:01:55 [ 604.793493] R13: 7fffffffffffffff R14: ffff881019fbfec8 R15: ffff88101e6d2000
2016-06-08 17:01:55 [ 604.802102] FS: 00002aaaaab0a6c0(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
2016-06-08 17:01:55 [ 604.811797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2016-06-08 17:01:55 [ 604.818875] CR2: 00007fffffffa650 CR3: 0000002022ce0000 CR4: 00000000003407e0
2016-06-08 17:01:55 [ 604.827521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2016-06-08 17:01:55 [ 604.836175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2016-06-08 17:01:55 [ 604.844830] Stack:
2016-06-08 17:01:55 [ 604.847761] ffff8810235b1b48 0000000000000000 0000000000000000 ffff8810235b1b48
2016-06-08 17:01:55 [ 604.856779] 00007fffffffa630 0000000000000003 0000000000000000 ffff881019fbff28
2016-06-08 17:01:55 [ 604.865801] ffffffff811fb409 7fffffffffffffff 0000000000000001 0000000000000048
2016-06-08 17:01:55 [ 604.874833] Call Trace:
2016-06-08 17:01:55 [ 604.878300] [<ffffffff811fb409>] do_vfs_ioctl+0x169/0x510
2016-06-08 17:01:55 [ 604.885174] [<ffffffff811fb851>] SyS_ioctl+0xa1/0xc0
2016-06-08 17:01:55 [ 604.891572] [<ffffffff8165cd49>] system_call_fastpath+0x16/0x1b
2016-06-08 17:01:55 [ 604.899040] Code: a0 d1 08 e0 44 89 e8 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 46 10 48 8b 7d c8 4c 89 fe 4d 8d 67 20 <48> 8b 10 49 89 57 20 48 8b 50 08 49 89 57 28 48 8b 50 10 49 89
2016-06-08 17:01:55 [ 604.922361] RIP [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
2016-06-08 17:01:55 [ 604.930295] RSP <ffff881019fbfe78>
2016-06-08 17:01:55 [ 604.935013] CR2: 00007fffffffa650
2016-06-08 17:01:55 [ 605.513479] --[ end trace d79d98174ba667ee ]--
2016-06-08 17:01:55 [ 605.521246] Kernel panic - not syncing: Fatal exception
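The faulting address 00007fffffffa650 appears in both RAX and CR2, and it lies in the lower half of the x86_64 address range, unlike the ffffffff... kernel addresses in the call trace. The following is a minimal sketch for pulling that address out of a console log and classifying it. The function names are illustrative, and the boundary constant assumes 4-level paging (47-bit user addresses).

```python
import re

# Highest canonical user-space address on x86_64, assuming 4-level paging.
USER_SPACE_TOP = 0x0000_7FFF_FFFF_FFFF

# Matches the first line of the oops shown in the console log.
_FAULT_RE = re.compile(
    r"unable to handle kernel paging request at ([0-9a-fA-F]{16})"
)


def faulting_address(log_text):
    """Return the faulting address as an int, or None if no oops line is found."""
    match = _FAULT_RE.search(log_text)
    return int(match.group(1), 16) if match else None


def is_user_address(addr):
    """True if addr falls in the user-space half of the address range."""
    return addr <= USER_SPACE_TOP


sample = "BUG: unable to handle kernel paging request at 00007fffffffa650"
addr = faulting_address(sample)
print(hex(addr), is_user_address(addr))  # 0x7fffffffa650 True
```

A user-space address faulting inside ll_fiemap is consistent with, but does not by itself prove, the kernel dereferencing a pointer supplied by the caller of the ioctl.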
Another cluster running a Lustre 2.5 client on TOSS 3 and mounting a TOSS 2/Lustre 2.5 server does not show the same problem.
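The call trace shows cp reaching ll_fiemap through an ioctl (SyS_ioctl, do_vfs_ioctl). To compare the two clusters without going through cp, a minimal sketch that issues an FS_IOC_FIEMAP request directly might look like the following. The function name, the extent count of 72, and the use of FIEMAP_FLAG_SYNC are assumptions chosen for illustration, not values confirmed by this report. On an affected client this may panic the node, so run it only on a machine that can be crashed.

```python
import fcntl
import struct

# _IOWR('f', 11, struct fiemap) from <linux/fs.h>; struct fiemap header is 32 bytes.
FS_IOC_FIEMAP = 0xC020660B
FIEMAP_FLAG_SYNC = 0x1

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
FIEMAP_HEADER = struct.Struct("=QQIIII")
FIEMAP_EXTENT_SIZE = 56  # sizeof(struct fiemap_extent)


def fiemap_extent_count(path, extent_count=72):
    """Issue FS_IOC_FIEMAP on path and return fm_mapped_extents.

    Raises OSError if the filesystem does not support FIEMAP.
    extent_count is an illustrative default, not taken from the report.
    """
    buf = bytearray(FIEMAP_HEADER.size + extent_count * FIEMAP_EXTENT_SIZE)
    # Map the whole file: start 0, maximum length, sync before mapping.
    FIEMAP_HEADER.pack_into(
        buf, 0, 0, 0xFFFFFFFFFFFFFFFF, FIEMAP_FLAG_SYNC, 0, extent_count, 0
    )
    with open(path, "rb") as f:
        # The mutable buffer is updated in place by the kernel.
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
    return FIEMAP_HEADER.unpack_from(buf, 0)[3]
```

Calling fiemap_extent_count() on a file in the Lustre mount on each cluster would show whether the FIEMAP path alone triggers the crash, independent of the rest of the copy.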