Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8256

BUG: unable to handle kernel paging request

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.5.5
    • RHEL 7.2 derivative, TOSS 3
      kernel 3.10.0-327.13.1.3chaos.ch6.x86_64
      lustre-2.5.5-6chaos.4.ch6.x86_64
      Servers are lustre 2.5 on TOSS 2/RHEL6.7
    • 3
    • 9223372036854775807

    Description

      On one cluster, copying a file stored on the lustre filesystem causes the node to crash with BUG: unable to handle kernel paging request at 00007fffffffa650 reported in the console log:

      2016-06-08 17:01:54 [ 604.544195] BUG: unable to handle kernel paging request at 00007fffffffa650
      2016-06-08 17:01:54 [ 604.552365] IP: [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
      2016-06-08 17:01:54 [ 604.559682] PGD 201eb17067 PUD 200b0af067 PMD 2023ba0067 PTE 8000000feafd7067
      2016-06-08 17:01:54 [ 604.567958] Oops: 0001 1 SMP
      2016-06-08 17:01:54 [ 604.571838] Modules linked in: lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables nfsv3 nf_log_ipv4 nf_log_common xt_LOG xt_multiport xfs libcrc32c intel_powerclamp coretemp intel_rapl kvm iTCO_wdt iTCO_vendor_support ipmi_devintf hfi1(OE) sb_edac mei_me lpc_ich edac_core sg pcspkr mei mfd_core i2c_i801 shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi binfmt_misc ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sd_mod crc_t10dif crct10dif_generic mxm_wmi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt drm_kms_helper aesni_intel lrw igb ttm gf128mul ahci glue_helper dca ablk_helper libahci ptp drm cryptd pps_core libata i2c_algo_bit i2c_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate [last unloaded: ip_tables]
      2016-06-08 17:01:55 [ 604.708455] CPU: 38 PID: 7679 Comm: cp Tainted: P OE ------------ 3.10.0-327.13.1.3chaos.ch6.x86_64 #1
      2016-06-08 17:01:55 [ 604.720645] Hardware name: Penguin Computing Relion 2900e/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      2016-06-08 17:01:55 [ 604.733342] task: ffff88101bb78b80 ti: ffff881019fbc000 task.ti: ffff881019fbc000
      2016-06-08 17:01:55 [ 604.742269] RIP: 0010:[<ffffffffa113b3d7>] [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
      2016-06-08 17:01:55 [ 604.752594] RSP: 0018:ffff881019fbfe78 EFLAGS: 00010206
      2016-06-08 17:01:55 [ 604.759123] RAX: 00007fffffffa650 RBX: 0000000000000fe0 RCX: ffff881020fe1880
      2016-06-08 17:01:55 [ 604.767703] RDX: 0000000000000002 RSI: ffff88101e6d2000 RDI: ffff8810235b1b48
      2016-06-08 17:01:55 [ 604.776293] RBP: ffff881019fbfeb0 R08: 0000000000000000 R09: ffff88101e6d2000
      2016-06-08 17:01:55 [ 604.784890] R10: ffffffffa113b27b R11: 0000000000000000 R12: ffff88101e6d2020
      2016-06-08 17:01:55 [ 604.793493] R13: 7fffffffffffffff R14: ffff881019fbfec8 R15: ffff88101e6d2000
      2016-06-08 17:01:55 [ 604.802102] FS: 00002aaaaab0a6c0(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
      2016-06-08 17:01:55 [ 604.811797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      2016-06-08 17:01:55 [ 604.818875] CR2: 00007fffffffa650 CR3: 0000002022ce0000 CR4: 00000000003407e0
      2016-06-08 17:01:55 [ 604.827521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2016-06-08 17:01:55 [ 604.836175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      2016-06-08 17:01:55 [ 604.844830] Stack:
      2016-06-08 17:01:55 [ 604.847761] ffff8810235b1b48 0000000000000000 0000000000000000 ffff8810235b1b48
      2016-06-08 17:01:55 [ 604.856779] 00007fffffffa630 0000000000000003 0000000000000000 ffff881019fbff28
      2016-06-08 17:01:55 [ 604.865801] ffffffff811fb409 7fffffffffffffff 0000000000000001 0000000000000048
      2016-06-08 17:01:55 [ 604.874833] Call Trace:
      2016-06-08 17:01:55 [ 604.878300] [<ffffffff811fb409>] do_vfs_ioctl+0x169/0x510
      2016-06-08 17:01:55 [ 604.885174] [<ffffffff811fb851>] SyS_ioctl+0xa1/0xc0
      2016-06-08 17:01:55 [ 604.891572] [<ffffffff8165cd49>] system_call_fastpath+0x16/0x1b
      2016-06-08 17:01:55 [ 604.899040] Code: a0 d1 08 e0 44 89 e8 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 46 10 48 8b 7d c8 4c 89 fe 4d 8d 67 20 <48> 8b 10 49 89 57 20 48 8b 50 08 49 89 57 28 48 8b 50 10 49 89
      2016-06-08 17:01:55 [ 604.922361] RIP [<ffffffffa113b3d7>] ll_fiemap+0x1a7/0x5c0 [lustre]
      2016-06-08 17:01:55 [ 604.930295] RSP <ffff881019fbfe78>
      2016-06-08 17:01:55 [ 604.935013] CR2: 00007fffffffa650
      2016-06-08 17:01:55 [ 605.513479] --[ end trace d79d98174ba667ee ]--
      2016-06-08 17:01:55 [ 605.521246] Kernel panic - not syncing: Fatal exception
      

      Another cluster running Lustre 2.5 client on TOSS3, mounting a TOSS2/Lustre2.5 server does not show the same problem.

      Attachments

        Issue Links

          Activity

            [LU-8256] BUG: unable to handle kernel paging request
            bobijam Zhenyu Xu added a comment -

            yes, your patch can solve this issue as well.

            As the EOPNOTSUPP of the fiemap path, I can see one place has this

                    /* If the stripe_count > 1 and the application does not understand
                     * DEVICE_ORDER flag, then it cannot interpret the extents correctly.
                     */
                    if (lsm->lsm_stripe_count > 1 &&
                        !(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER))
                            GOTO(out, rc = -EOPNOTSUPP);
            

            the file data crosses multiple OST stripes, and fiemap will return each OST index in the fiemap_extent::fe_reserved[0], and lustre-aware-less application can not interpret the extents and use it correctly.

            bobijam Zhenyu Xu added a comment - yes, your patch can solve this issue as well. As the EOPNOTSUPP of the fiemap path, I can see one place has this /* If the stripe_count > 1 and the application does not understand * DEVICE_ORDER flag, then it cannot interpret the extents correctly. */ if (lsm->lsm_stripe_count > 1 && !(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER)) GOTO(out, rc = -EOPNOTSUPP); the file data crosses multiple OST stripes, and fiemap will return each OST index in the fiemap_extent::fe_reserved [0] , and lustre-aware-less application can not interpret the extents and use it correctly.
            ofaaland Olaf Faaland added a comment - - edited

            Actually, I'm not sure about this patch. I ran my cp command under strace, and I see the FIEMAP ioctl returned EOPNOTSUPP. This does indeed allow the copy to succeed without crashing the node, but I don't understand why my patch had that effect. The patch just changes the memcpy calls to copy_to/from_user.

            ofaaland Olaf Faaland added a comment - - edited Actually, I'm not sure about this patch. I ran my cp command under strace, and I see the FIEMAP ioctl returned EOPNOTSUPP. This does indeed allow the copy to succeed without crashing the node, but I don't understand why my patch had that effect. The patch just changes the memcpy calls to copy_to/from_user.
            ofaaland Olaf Faaland added a comment -

            I made a smaller change and tested it locally on the cluster that was hitting the issue. It worked there. I assume I won't be able to push to the b2_5_fe branch, so I've attached it as 0001-LU-8256llite-use-copy_-t-from_user-in-ll_fiemap.patch

            ofaaland Olaf Faaland added a comment - I made a smaller change and tested it locally on the cluster that was hitting the issue. It worked there. I assume I won't be able to push to the b2_5_fe branch, so I've attached it as 0001- LU-8256 llite-use-copy_-t-from _user-in-ll_fiemap.patch

            Thanks for working on the backport. However it's a big patch. Is the all the change appropriate for a stable branch? It looks to me that most of it is pruning unnecessary code, as opposed to fixing code that is wrong.

            ofaaland Olaf Faaland added a comment - Thanks for working on the backport. However it's a big patch. Is the all the change appropriate for a stable branch? It looks to me that most of it is pruning unnecessary code, as opposed to fixing code that is wrong.
            bobijam Zhenyu Xu added a comment -

            b2_5_fe backport at http://review.whamcloud.com/20731
            b_ieel2_0 backport at http://review.whamcloud.com/20732

            bobijam Zhenyu Xu added a comment - b2_5_fe backport at http://review.whamcloud.com/20731 b_ieel2_0 backport at http://review.whamcloud.com/20732

            It looks like this was fixed in patch http://review.whamcloud.com/13608 "LU-6201 llite: remove duplicate fiemap defines". That was considered mostly a code cleanup, but included the copy_from_user() fix.

            There is a patch for b2_7_fe at: http://review.whamcloud.com/16815

            The patch also needs to be backported to b2_5_fe and b_ieel2_0.

            adilger Andreas Dilger added a comment - It looks like this was fixed in patch http://review.whamcloud.com/13608 " LU-6201 llite: remove duplicate fiemap defines". That was considered mostly a code cleanup, but included the copy_from_user() fix. There is a patch for b2_7_fe at: http://review.whamcloud.com/16815 The patch also needs to be backported to b2_5_fe and b_ieel2_0.

            I'm not at all certain, but it looks to me like line 3319 should be using copy_from_user, not memcpy(). ll_fiemap() is called from the kernel by ioctl_fiemap(). From what I can see, fieinfo->fi_extents_start points to user memory (ufiemap->fm_extents).

            static int ioctl_fiemap(struct file *filp, unsigned long arg)
            {
                    struct fiemap fiemap;
                    struct fiemap __user *ufiemap = (struct fiemap __user *) arg;
                    struct fiemap_extent_info fieinfo = { 0, };
                    struct inode *inode = file_inode(filp);
                    struct super_block *sb = inode->i_sb;
                    u64 len;
                    int error;
            
                    if (!inode->i_op->fiemap)
                            return -EOPNOTSUPP;
            
                    if (copy_from_user(&fiemap, ufiemap, sizeof(fiemap)))
                            return -EFAULT;
            
                    if (fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS)
                            return -EINVAL;
            
                    error = fiemap_check_ranges(sb, fiemap.fm_start, fiemap.fm_length,
                                                &len);
                    if (error)
                            return error;
            
                    fieinfo.fi_flags = fiemap.fm_flags;
                    fieinfo.fi_extents_max = fiemap.fm_extent_count;
                    fieinfo.fi_extents_start = ufiemap->fm_extents;
            
                    if (fiemap.fm_extent_count != 0 &&
                        !access_ok(VERIFY_WRITE, fieinfo.fi_extents_start,
                                   fieinfo.fi_extents_max * sizeof(struct fiemap_extent)))
                            return -EFAULT;
            
                    if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
                            filemap_write_and_wait(inode->i_mapping);
            
                    error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);
            
            ofaaland Olaf Faaland added a comment - I'm not at all certain, but it looks to me like line 3319 should be using copy_from_user, not memcpy(). ll_fiemap() is called from the kernel by ioctl_fiemap(). From what I can see, fieinfo->fi_extents_start points to user memory (ufiemap->fm_extents). static int ioctl_fiemap(struct file *filp, unsigned long arg) { struct fiemap fiemap; struct fiemap __user *ufiemap = (struct fiemap __user *) arg; struct fiemap_extent_info fieinfo = { 0, }; struct inode *inode = file_inode(filp); struct super_block *sb = inode->i_sb; u64 len; int error; if (!inode->i_op->fiemap) return -EOPNOTSUPP; if (copy_from_user(&fiemap, ufiemap, sizeof(fiemap))) return -EFAULT; if (fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS) return -EINVAL; error = fiemap_check_ranges(sb, fiemap.fm_start, fiemap.fm_length, &len); if (error) return error; fieinfo.fi_flags = fiemap.fm_flags; fieinfo.fi_extents_max = fiemap.fm_extent_count; fieinfo.fi_extents_start = ufiemap->fm_extents; if (fiemap.fm_extent_count != 0 && !access_ok(VERIFY_WRITE, fieinfo.fi_extents_start, fieinfo.fi_extents_max * sizeof(struct fiemap_extent))) return -EFAULT; if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC) filemap_write_and_wait(inode->i_mapping); error = inode->i_op->fiemap(inode, &fieinfo, fiemap.fm_start, len);

            This lustre tag is available where Chris normally pushes our branch. Peter knows where that is.

            Code indicated by RIP in the error output is:

            (gdb) l *( ll_fiemap+0x1a7)
            0x173d7 is in ll_fiemap (/usr/src/debug/lustre-2.5.5/lustre/llite/file.c:3319).
            3314            fiemap->fm_flags = fieinfo->fi_flags;
            3315            fiemap->fm_extent_count = fieinfo->fi_extents_max;
            3316            fiemap->fm_start = start;
            3317            fiemap->fm_length = len;
            3318            if (extent_count > 0)
            3319                    memcpy(&fiemap->fm_extents[0], fieinfo->fi_extents_start,
            3320                           sizeof(struct ll_fiemap_extent));
            3321
            3322            rc = ll_do_fiemap(inode, fiemap, num_bytes);
            
            ofaaland Olaf Faaland added a comment - This lustre tag is available where Chris normally pushes our branch. Peter knows where that is. Code indicated by RIP in the error output is: (gdb) l *( ll_fiemap+0x1a7) 0x173d7 is in ll_fiemap (/usr/src/debug/lustre-2.5.5/lustre/llite/file.c:3319). 3314 fiemap->fm_flags = fieinfo->fi_flags; 3315 fiemap->fm_extent_count = fieinfo->fi_extents_max; 3316 fiemap->fm_start = start; 3317 fiemap->fm_length = len; 3318 if (extent_count > 0) 3319 memcpy(&fiemap->fm_extents[0], fieinfo->fi_extents_start, 3320 sizeof(struct ll_fiemap_extent)); 3321 3322 rc = ll_do_fiemap(inode, fiemap, num_bytes);

            People

              bobijam Zhenyu Xu
              ofaaland Olaf Faaland
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: