Lustre / LU-8256

BUG: unable to handle kernel paging request

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.5
    • Environment: RHEL 7.2 derivative, TOSS 3
      kernel 3.10.0-327.13.1.3chaos.ch6.x86_64
      lustre-2.5.5-6chaos.4.ch6.x86_64
      Servers are Lustre 2.5 on TOSS 2/RHEL 6.7
    • Severity: 3

    Description

      On one cluster, copying a file stored on the Lustre filesystem causes the node to crash, with "BUG: unable to handle kernel paging request at 00007fffffffa650" reported in the console log:

      2016-06-08 17:01:54 [ 604.544195] BUG: unable to handle kernel paging request at 00007fffffffa650
      2016-06-08 17:01:54 [ 604.552365] IP: [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
      2016-06-08 17:01:54 [ 604.559682] PGD 201eb17067 PUD 200b0af067 PMD 2023ba0067 PTE 8000000feafd7067
      2016-06-08 17:01:54 [ 604.567958] Oops: 0001 [#1] SMP
      2016-06-08 17:01:54 [ 604.571838] Modules linked in: lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables nfsv3 nf_log_ipv4 nf_log_common xt_LOG xt_multiport xfs libcrc32c intel_powerclamp coretemp intel_rapl kvm iTCO_wdt iTCO_vendor_support ipmi_devintf hfi1(OE) sb_edac mei_me lpc_ich edac_core sg pcspkr mei mfd_core i2c_i801 shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi binfmt_misc ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sd_mod crc_t10dif crct10dif_generic mxm_wmi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt drm_kms_helper aesni_intel lrw igb ttm gf128mul ahci glue_helper dca ablk_helper libahci ptp drm cryptd pps_core libata i2c_algo_bit i2c_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate [last unloaded: ip_tables]
      2016-06-08 17:01:55 [ 604.708455] CPU: 38 PID: 7679 Comm: cp Tainted: P OE ------------ 3.10.0-327.13.1.3chaos.ch6.x86_64 #1
      2016-06-08 17:01:55 [ 604.720645] Hardware name: Penguin Computing Relion 2900e/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      2016-06-08 17:01:55 [ 604.733342] task: ffff88101bb78b80 ti: ffff881019fbc000 task.ti: ffff881019fbc000
      2016-06-08 17:01:55 [ 604.742269] RIP: 0010:[<ffffffffa113b3d7>] [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
      2016-06-08 17:01:55 [ 604.752594] RSP: 0018:ffff881019fbfe78 EFLAGS: 00010206
      2016-06-08 17:01:55 [ 604.759123] RAX: 00007fffffffa650 RBX: 0000000000000fe0 RCX: ffff881020fe1880
      2016-06-08 17:01:55 [ 604.767703] RDX: 0000000000000002 RSI: ffff88101e6d2000 RDI: ffff8810235b1b48
      2016-06-08 17:01:55 [ 604.776293] RBP: ffff881019fbfeb0 R08: 0000000000000000 R09: ffff88101e6d2000
      2016-06-08 17:01:55 [ 604.784890] R10: ffffffffa113b27b R11: 0000000000000000 R12: ffff88101e6d2020
      2016-06-08 17:01:55 [ 604.793493] R13: 7fffffffffffffff R14: ffff881019fbfec8 R15: ffff88101e6d2000
      2016-06-08 17:01:55 [ 604.802102] FS: 00002aaaaab0a6c0(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
      2016-06-08 17:01:55 [ 604.811797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      2016-06-08 17:01:55 [ 604.818875] CR2: 00007fffffffa650 CR3: 0000002022ce0000 CR4: 00000000003407e0
      2016-06-08 17:01:55 [ 604.827521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2016-06-08 17:01:55 [ 604.836175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      2016-06-08 17:01:55 [ 604.844830] Stack:
      2016-06-08 17:01:55 [ 604.847761] ffff8810235b1b48 0000000000000000 0000000000000000 ffff8810235b1b48
      2016-06-08 17:01:55 [ 604.856779] 00007fffffffa630 0000000000000003 0000000000000000 ffff881019fbff28
      2016-06-08 17:01:55 [ 604.865801] ffffffff811fb409 7fffffffffffffff 0000000000000001 0000000000000048
      2016-06-08 17:01:55 [ 604.874833] Call Trace:
      2016-06-08 17:01:55 [ 604.878300] [<ffffffff811fb409>] do_vfs_ioctl+0x169/0x510
      2016-06-08 17:01:55 [ 604.885174] [<ffffffff811fb851>] SyS_ioctl+0xa1/0xc0
      2016-06-08 17:01:55 [ 604.891572] [<ffffffff8165cd49>] system_call_fastpath+0x16/0x1b
      2016-06-08 17:01:55 [ 604.899040] Code: a0 d1 08 e0 44 89 e8 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 46 10 48 8b 7d c8 4c 89 fe 4d 8d 67 20 <48> 8b 10 49 89 57 20 48 8b 50 08 49 89 57 28 48 8b 50 10 49 89
      2016-06-08 17:01:55 [ 604.922361] RIP [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
      2016-06-08 17:01:55 [ 604.930295] RSP <ffff881019fbfe78>
      2016-06-08 17:01:55 [ 604.935013] CR2: 00007fffffffa650
      2016-06-08 17:01:55 [ 605.513479] ---[ end trace d79d98174ba667ee ]---
      2016-06-08 17:01:55 [ 605.521246] Kernel panic - not syncing: Fatal exception
      

      Another cluster, running the Lustre 2.5 client on TOSS 3 and mounting a TOSS 2 / Lustre 2.5 server, does not show the same problem.
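
      For context, the crash path starts with the FS_IOC_FIEMAP ioctl that cp --sparse=always issues on the source file. The following is a minimal user-space sketch of such a call (illustrative only, not the reporter's reproducer); note that the fiemap buffer lives on the caller's stack, and the faulting address 00007fffffffa650 above is a user-space stack address of that kind:

      /* Minimal FS_IOC_FIEMAP caller, similar in spirit to what
       * cp --sparse=always does.  Illustrative sketch only. */
      #include <stdio.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <linux/fs.h>
      #include <linux/fiemap.h>

      int main(int argc, char **argv)
      {
              /* Room for the fiemap header plus 32 extents, on the user stack. */
              union {
                      struct fiemap f;
                      char c[sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent)];
              } fiemap_buf;
              struct fiemap *fm = &fiemap_buf.f;
              unsigned int i;
              int fd;

              if (argc != 2)
                      return 1;
              fd = open(argv[1], O_RDONLY);
              if (fd < 0) {
                      perror("open");
                      return 1;
              }
              memset(&fiemap_buf, 0, sizeof(fiemap_buf));
              fm->fm_start = 0;
              fm->fm_length = ~0ULL;              /* map the whole file */
              fm->fm_flags = FIEMAP_FLAG_SYNC;
              fm->fm_extent_count = 32;

              if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                      perror("FS_IOC_FIEMAP");    /* EOPNOTSUPP shows up here */
                      return 1;
              }
              for (i = 0; i < fm->fm_mapped_extents; i++)
                      printf("extent %u: logical %llu length %llu flags 0x%x\n", i,
                             (unsigned long long)fm->fm_extents[i].fe_logical,
                             (unsigned long long)fm->fm_extents[i].fe_length,
                             fm->fm_extents[i].fe_flags);
              close(fd);
              return 0;
      }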

Attachments

Issue Links

Activity


            ofaaland Olaf Faaland added a comment -

            So, I believe your patch is working as expected.


            ofaaland Olaf Faaland added a comment -

            I think I've confirmed that this is what is happening:

            On the client:

            [root@rzgenie48:fiemap]# lfs getstripe test.1k.sparse.1extent
            test.1k.sparse.1extent
            lmm_stripe_count:   1
            lmm_stripe_size:    1048576
            lmm_pattern:        1
            lmm_layout_gen:     0
            lmm_stripe_offset:  9
                    obdidx           objid           objid           group
                         9       259446897      0xf76d871                0
            
            [root@rzgenie48:fiemap]# strace cp --sparse=always test.1k.sparse.1extent test.1k.sparse.1extent.copy2 2>&1 | grep ioctl
            ioctl(3, FS_IOC_FIEMAP, 0x7fffffffc760) = -1 EOPNOTSUPP (Operation not supported)
            [root@rzgenie48:fiemap]# lctl list_nids
            192.168.128.48@o2ib21
            

            Logs on the OSS:

            00000100:00100000:9.0:1466033830.426303:0:10320:0:(service.c:2086:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost02_006:68119517-017d-eeee-db17-abd63a7e2e89+5:125727:x1537240460206584:12345-192.168.128.48@o2ib21:7
            00002000:00000001:9.0:1466033830.426308:0:10320:0:(ofd_obd.c:536:ofd_get_info()) Process entered
            00002000:00000001:9.0:1466033830.426309:0:10320:0:(ofd_obd.c:599:ofd_get_info()) Process leaving via out (rc=0 : 0 : 0x0)
            00002000:00000001:9.0:1466033830.426310:0:10320:0:(ofd_obd.c:669:ofd_get_info()) Process leaving (rc=0 : 0 : 0)
            00000100:00000001:9.0:1466033830.426311:0:10320:0:(pack_generic.c:336:lustre_pack_reply_v2()) Process entered
            00000100:00000001:9.0:1466033830.426313:0:10320:0:(pack_generic.c:370:lustre_pack_reply_v2()) Process leaving (rc=0 : 0 : 0)
            00002000:00000001:9.0:1466033830.426314:0:10320:0:(ofd_obd.c:536:ofd_get_info()) Process entered
            00002000:00000001:9.0:1466033830.426315:0:10320:0:(ofd_objects.c:81:ofd_object_find()) Process entered
            00002000:00000001:9.0:1466033830.426317:0:10320:0:(ofd_objects.c:89:ofd_object_find()) Process leaving (rc=18446612192165995760 : -131881543555856 : ffff880def71a4f0)
            00002000:00000001:9.0:1466033830.426319:0:10320:0:(ofd_obd.c:669:ofd_get_info()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
            00000100:00000001:9.0:1466033830.426321:0:10320:0:(niobuf.c:649:ptlrpc_send_error()) Process entered
            00000100:00000001:9.0:1466033830.426323:0:10320:0:(connection.c:136:ptlrpc_connection_addref()) Process entered
            00000100:00000001:9.0:1466033830.426323:0:10320:0:(connection.c:143:ptlrpc_connection_addref()) Process leaving (rc=18446612201684338752 : -131872025212864 : ffff881026c80840)
            00000100:00000001:9.0:1466033830.426325:0:10320:0:(niobuf.c:60:ptl_send_buf()) Process entered
            00000100:00000001:9.0:1466033830.426332:0:10320:0:(niobuf.c:101:ptl_send_buf()) Process leaving (rc=0 : 0 : 0)
            00000100:00000001:9.0:1466033830.426333:0:10320:0:(connection.c:99:ptlrpc_connection_put()) Process entered
            00000100:00000001:9.0:1466033830.426334:0:10320:0:(connection.c:129:ptlrpc_connection_put()) Process leaving (rc=0 : 0 : 0)
            00000100:00000001:9.0:1466033830.426335:0:10320:0:(niobuf.c:666:ptlrpc_send_error()) Process leaving (rc=0 : 0 : 0)
            00000100:00100000:9.0:1466033830.426338:0:10320:0:(service.c:2130:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost02_006:68119517-017d-eeee-db17-abd63a7e2e89+5:125727:x1537240460206584:12345-192.168.128.48@o2ib21:7 Request procesed in 38us (104us total) trans 0 rc -95/-95
            

            ofaaland Olaf Faaland added a comment -

            dbo_fiemap_get is not implemented in the ZFS OSD, because ZFS does not support fiemap (not that it could not; it is simply not implemented). We suspect that ofd_get_info() calls dt_fiemap_get(), which returns -EOPNOTSUPP, and that error code is passed back to the client.
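
            (For context, the server-side dispatch looks roughly like the following simplified sketch, which assumes Lustre's struct dt_object / dt_body_operations from dt_object.h and is not copied verbatim from the 2.5 tree; it shows where the -95 comes from when the OSD leaves the method unset:)

            /* Simplified sketch, not the verbatim Lustre 2.5 source.  An OSD
             * such as osd-zfs that leaves dbo_fiemap_get unset makes the
             * wrapper fail with -EOPNOTSUPP (-95), which ofd_get_info()
             * then returns to the client. */
            static inline int dt_fiemap_get(const struct lu_env *env,
                                            struct dt_object *d, struct fiemap *fm)
            {
                    if (d->do_body_ops == NULL ||
                        d->do_body_ops->dbo_fiemap_get == NULL)
                            return -EOPNOTSUPP;
                    return d->do_body_ops->dbo_fiemap_get(env, d, fm);
            }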


            ofaaland Olaf Faaland added a comment -

            Zhenyu,
            I applied your patch to our Lustre 2.5.5 stack and am running it on a couple of nodes. It turns out that in my test case the EOPNOTSUPP comes from obd_get_info, both with my patch from yesterday and with yours from today. I get the following console log messages:

            LustreError: 11-0: fsrzb-OST0009-osc-ffff88200c9f7000: Communicating with 172.21.1.110@o2ib200, operation ost_get_info failed with -95.
            LustreError: 170050:0:(file.c:1900:ll_do_fiemap()) obd_get_info failed: rc = -95
            
            ofaaland Olaf Faaland added a comment -

            I see. Thank you for the explanation.

            bobijam Zhenyu Xu added a comment -

            Yes, your patch can solve this issue as well.

            As for the EOPNOTSUPP in the fiemap path, I can see one place that returns it:

                    /* If the stripe_count > 1 and the application does not understand
                     * DEVICE_ORDER flag, then it cannot interpret the extents correctly.
                     */
                    if (lsm->lsm_stripe_count > 1 &&
                        !(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER))
                            GOTO(out, rc = -EOPNOTSUPP);
            

            When the file data crosses multiple OST stripes, fiemap returns each OST index in fiemap_extent::fe_reserved[0], and an application that is not Lustre-aware cannot interpret the extents and use them correctly.
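
            For illustration, a Lustre-aware caller would request the device-ordered mapping roughly as in the sketch below (it assumes the FIEMAP_FLAG_DEVICE_ORDER value from the 2.5-era Lustre user headers; check lustre_user.h before relying on it):

            /* Sketch of a Lustre-aware FIEMAP call on a striped file.
             * FIEMAP_FLAG_DEVICE_ORDER is Lustre-specific; the value below is
             * taken from the 2.5-era user headers and should be verified. */
            #include <stdio.h>
            #include <string.h>
            #include <fcntl.h>
            #include <unistd.h>
            #include <sys/ioctl.h>
            #include <linux/fs.h>
            #include <linux/fiemap.h>

            #ifndef FIEMAP_FLAG_DEVICE_ORDER
            #define FIEMAP_FLAG_DEVICE_ORDER 0x40000000 /* extents returned in per-OST order */
            #endif

            static int dump_striped_extents(const char *path)
            {
                    union {
                            struct fiemap f;
                            char c[sizeof(struct fiemap) + 64 * sizeof(struct fiemap_extent)];
                    } buf;
                    struct fiemap *fm = &buf.f;
                    unsigned int i;
                    int fd = open(path, O_RDONLY);

                    if (fd < 0)
                            return -1;
                    memset(&buf, 0, sizeof(buf));
                    fm->fm_length = ~0ULL;
                    fm->fm_extent_count = 64;
                    /* Without DEVICE_ORDER, a stripe_count > 1 file gets -EOPNOTSUPP. */
                    fm->fm_flags = FIEMAP_FLAG_SYNC | FIEMAP_FLAG_DEVICE_ORDER;
                    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                            close(fd);
                            return -1;
                    }
                    for (i = 0; i < fm->fm_mapped_extents; i++)
                            printf("OST idx %u: logical %llu len %llu\n",
                                   fm->fm_extents[i].fe_reserved[0],
                                   (unsigned long long)fm->fm_extents[i].fe_logical,
                                   (unsigned long long)fm->fm_extents[i].fe_length);
                    close(fd);
                    return 0;
            }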

            ofaaland Olaf Faaland added a comment - edited

            Actually, I'm not sure about this patch. I ran my cp command under strace, and I see the FIEMAP ioctl returned EOPNOTSUPP. This does indeed allow the copy to succeed without crashing the node, but I don't understand why my patch had that effect. The patch just changes the memcpy calls to copy_to/from_user.
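
            (The distinction matters because the ioctl argument is a user-space pointer: the faulting address 00007fffffffa650 is on the caller's stack, and dereferencing it directly from the kernel with memcpy can fault, while copy_from_user()/copy_to_user() handle the user/kernel boundary. A sketch of the pattern, not the actual b2_5 diff:)

            /* Sketch of the pattern only, not the actual ll_fiemap() change.
             * 'arg' stands for the user-space pointer handed to the ioctl,
             * and fm_head for a kernel-side copy of the fiemap header. */
            struct fiemap __user *ufiemap = (struct fiemap __user *)arg;
            struct fiemap fm_head;
            int rc = 0;

            /* Buggy pattern: treats the user pointer as kernel memory and can
             * oops with "unable to handle kernel paging request":
             *         memcpy(&fm_head, ufiemap, sizeof(fm_head));
             * Fixed pattern: cross the user/kernel boundary explicitly. */
            if (copy_from_user(&fm_head, ufiemap, sizeof(fm_head)))
                    rc = -EFAULT;
            /* ... perform the fiemap, then copy results back with copy_to_user() ... */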

            ofaaland Olaf Faaland added a comment -

            I made a smaller change and tested it locally on the cluster that was hitting the issue. It worked there. I assume I won't be able to push to the b2_5_fe branch, so I've attached it as 0001-LU-8256llite-use-copy_-t-from_user-in-ll_fiemap.patch


            ofaaland Olaf Faaland added a comment -

            Thanks for working on the backport. However, it's a big patch. Is all of the change appropriate for a stable branch? It looks to me that most of it is pruning unnecessary code, as opposed to fixing code that is wrong.

            bobijam Zhenyu Xu added a comment -

            b2_5_fe backport at http://review.whamcloud.com/20731
            b_ieel2_0 backport at http://review.whamcloud.com/20732


            adilger Andreas Dilger added a comment -

            It looks like this was fixed in patch http://review.whamcloud.com/13608 "LU-6201 llite: remove duplicate fiemap defines". That was considered mostly a code cleanup, but included the copy_from_user() fix.

            There is a patch for b2_7_fe at: http://review.whamcloud.com/16815

            The patch also needs to be backported to b2_5_fe and b_ieel2_0.


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: ofaaland Olaf Faaland
              Votes: 0
              Watchers: 8
