
Setting page_writeback on a non-dirty page

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.5.3, Lustre 2.8.0

    Description

      A recent change in the upstream kernel uncovered what I think is a bug in our handling of the writeback bit on pages.

      If we go by the "sync io" path: vvp_io_commit_write->vvp_page_sync_io->…..->vvp_page_prep_write

      then the page is never dirty; vvp_page_prep_write asserts that the page is not dirty and then marks it as under writeback:

      static int vvp_page_prep_write(const struct lu_env *env,
                                     const struct cl_page_slice *slice,
                                     struct cl_io *unused)
      {
              struct page *vmpage = cl2vm_page(slice);
      
              LASSERT(PageLocked(vmpage));
              LASSERT(!PageDirty(vmpage));
      
              set_page_writeback(vmpage);
              vvp_write_pending(cl2ccc(slice->cpl_obj), cl2ccc_page(slice));
      
              return 0;
      }
      

      Now, the problem is that, from the kernel's perspective, page writeback means this is a cached page that is currently in flight to the device, so it must have started out dirty in the first place, and we violate that assumption.

      Kernel 4.2.0 adds new cgroup dirty page accounting logic, and as part of that set_page_writeback updates a writeback structure hanging off the inode (i_wb). That structure is only initialized when either the inode or a page is dirtied, and we crash otherwise (that's how I uncovered this).
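      The failure mode described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every toy_* name is hypothetical and none of it is kernel or Lustre code. It assumes nothing beyond what the description says, namely that the per-inode writeback structure is attached on first dirtying and that starting writeback updates it unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical names), NOT the kernel's real structures. */
struct toy_wb {
    long nr_writeback;          /* stands in for the writeback accounting */
};

struct toy_inode {
    struct toy_wb *i_wb;        /* NULL until something is dirtied */
    struct toy_wb  wb_storage;  /* backing storage attached on first dirty */
};

struct toy_page {
    struct toy_inode *host;
    bool dirty;
    bool writeback;
};

/* Dirtying a page attaches the writeback structure on first use. */
static void toy_set_page_dirty(struct toy_page *pg)
{
    assert(pg != NULL && pg->host != NULL);
    if (pg->host->i_wb == NULL)
        pg->host->i_wb = &pg->host->wb_storage;
    pg->dirty = true;
}

/*
 * Starting writeback updates the accounting through i_wb.
 * Returns false in the case where real code would dereference a
 * NULL pointer; the model reports it instead of crashing.
 */
static bool toy_set_page_writeback(struct toy_page *pg)
{
    assert(pg != NULL && pg->host != NULL);
    if (pg->host->i_wb == NULL)
        return false;           /* never dirtied: nothing to account against */
    pg->host->i_wb->nr_writeback++;
    pg->writeback = true;
    return true;
}
```

      In this model, calling toy_set_page_writeback() on a page whose inode was never dirtied returns false, which corresponds to the crash in the report; calling toy_set_page_dirty() first makes the same call succeed.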

      Since the path I am talking about is a sync write, I imagine there are two possible trains of thought here.
      1: This is a sync write, which is why we do not set the dirty bit and do a sync writeout => we probably don't need to set page_writeback either.
      2: The page is in the cache already anyway (that's how we got to it in the first place), so even though we cannot add it to OUR cache, we still need to set it dirty. It will become clean once the write completes, and we can throw it out of the cache at the same time if we want to.
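      The two options can be contrasted in a toy state machine. This is a hedged sketch, not the actual fix: all names are hypothetical, the sync_io parameter is an assumed way of telling the sync path apart, and the wb_ready flag merely stands in for "i_wb has been initialized".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state for illustration only. */
struct toy_page {
    bool wb_ready;   /* stands in for "inode->i_wb is initialized" */
    bool dirty;
    bool writeback;
};

static void toy_mark_dirty(struct toy_page *pg)
{
    pg->wb_ready = true;     /* first dirtying sets up the accounting */
    pg->dirty = true;
}

static void toy_clear_dirty_for_io(struct toy_page *pg)
{
    pg->dirty = false;       /* accounting stays set up */
}

static void toy_start_writeback(struct toy_page *pg)
{
    assert(pg->wb_ready);    /* real code would crash here instead */
    pg->writeback = true;
}

static void toy_end_writeback(struct toy_page *pg)
{
    pg->writeback = false;
}

/* Option 1: a sync write touches neither the dirty nor the writeback bit. */
static void toy_prep_write_opt1(struct toy_page *pg, bool sync_io)
{
    assert(!pg->dirty);      /* mirrors the LASSERT in vvp_page_prep_write */
    if (!sync_io)
        toy_start_writeback(pg);
}

/* Option 2: dirty the page first, then go through normal writeback. */
static void toy_prep_write_opt2(struct toy_page *pg)
{
    toy_mark_dirty(pg);
    toy_clear_dirty_for_io(pg);
    toy_start_writeback(pg);
}
```

      Under option 1 a sync write leaves the page state untouched, so the accounting structure is never needed. Under option 2 the page briefly becomes dirty, which sets up the accounting before writeback starts, and toy_end_writeback() returns it to a clean state.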

          Activity

            [LU-6854] Setting page_writeback on a non-dirty page
            pjones Peter Jones added a comment -

            Landed for 2.10


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/15610/
            Subject: LU-6854 llite: Do not set writeback for sync write pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 610ac5c64d92f95924da839d3a2da28e9909956a

            jpiles Joan Piles (Inactive) added a comment -

            Great! Thank you very much!

            jay Jinshan Xiong (Inactive) added a comment -

            We're in the process of landing this patch.

            jpiles Joan Piles (Inactive) added a comment -

            So far we had also been using 2.8.56, but without this patch. I asked because I saw that the proposed patch had been available for a year, yet had not been picked up by master, and the bug is still open.

            We have now tried applying it on the master branch (a bit after 2.8.60), and it's apparently working well, so we'll deploy it.

            We were also having other issues, but I think some of them could be secondary effects of this problem. We were also likely hitting LU-7927 and/or LU-7981, whose fixes are also included, I think.
            ake_s Åke Sandgren added a comment - - edited

            I've been running with the proposed patch on Ubuntu 16.04, based on the 2.8.56 tag plus the patch for LU-6808, for a while on our cluster and haven't yet seen that specific problem reappear.

            So far it looks safe enough to me, but since I'm having other issues I'm not 100% sure.

            Joan, which version of the Lustre client are you running?

            jpiles Joan Piles (Inactive) added a comment -

            We are hitting this bug using a Ubuntu 16.04 kernel (4.4.0-34) as reported here, or at least the stack trace is pretty much the same:

            [Mon Nov 14 11:44:36 2016] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
            [Mon Nov 14 11:44:36 2016] IP: [<ffffffff8141995f>] __percpu_counter_add+0xf/0x70
            [Mon Nov 14 11:44:36 2016] PGD 3fb7428067 PUD 3de0fb7067 PMD 0 
            [Mon Nov 14 11:44:36 2016] Oops: 0000 [#13] SMP 
            [Mon Nov 14 11:44:36 2016] Modules linked in: nfsv3 nfs_acl nfs lockd grace fscache nvidia_uvm(POE) ipmi_devintf 8021q garp mrp osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) mlx4_en(OE) mlx4_core(OE) mlx_compat(OE) nvidia(POE) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
            [Mon Nov 14 11:44:36 2016]  input_leds joydev cryptd ipmi_ssif sb_edac mei_me mei edac_core lpc_ich ioatdma shpchp mac_hid acpi_power_meter ipmi_si 8250_fintek ipmi_msghandler acpi_pad binfmt_misc knem(OE) parport_pc ppdev lp sunrpc parport autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear ast ttm ixgbe drm_kms_helper syscopyarea sysfillrect vxlan igb sysimgblt ip6_udp_tunnel dca udp_tunnel fb_sys_fops raid1 hid_generic ptp usbhid ahci pps_core mdio drm hid libahci i2c_algo_bit wmi fjes
            [Mon Nov 14 11:44:36 2016] CPU: 19 PID: 30139 Comm: python Tainted: P      D    OE   4.4.0-34-generic #53-Ubuntu
            [Mon Nov 14 11:44:36 2016] Hardware name: Supermicro SYS-2028GR-TRH/X10DRG-H, BIOS 1.0c 05/20/2015
            [Mon Nov 14 11:44:36 2016] task: ffff883fed1ca940 ti: ffff883b2b758000 task.ti: ffff883b2b758000
            [Mon Nov 14 11:44:36 2016] RIP: 0010:[<ffffffff8141995f>]  [<ffffffff8141995f>] __percpu_counter_add+0xf/0x70
            [Mon Nov 14 11:44:36 2016] RSP: 0018:ffff883b2b75b938  EFLAGS: 00010006
            [Mon Nov 14 11:44:36 2016] RAX: 0000000000000005 RBX: ffffea00f51f47c0 RCX: 000000000000001b
            [Mon Nov 14 11:44:36 2016] RDX: 0000000000000030 RSI: 0000000000000001 RDI: 0000000000000088
            [Mon Nov 14 11:44:36 2016] RBP: ffff883b2b75b950 R08: ffff883cc575e600 R09: 0000000000000000
            [Mon Nov 14 11:44:36 2016] R10: 0000000000000000 R11: ffff883fd8b907d0 R12: 0000000000000088
            [Mon Nov 14 11:44:36 2016] R13: 0000000000000001 R14: ffff8839fcbf0090 R15: ffff8839fcbf0210
            [Mon Nov 14 11:44:36 2016] FS:  00002b5017e18a40(0000) GS:ffff883fff240000(0000) knlGS:0000000000000000
            [Mon Nov 14 11:44:36 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [Mon Nov 14 11:44:36 2016] CR2: 00000000000000a8 CR3: 0000003fd5ce0000 CR4: 00000000001406e0
            [Mon Nov 14 11:44:36 2016] Stack:
            [Mon Nov 14 11:44:36 2016]  ffffea00f51f47c0 ffff8839fcbf01f8 ffff881f69099000 ffff883b2b75b9a0
            [Mon Nov 14 11:44:36 2016]  ffffffff8119b4d0 0000000000000287 0000000000000000 ffff883ff23fd118
            [Mon Nov 14 11:44:36 2016]  ffff883cc575e650 0000000000000068 ffff883fd844e128 ffff883fd2768d20
            [Mon Nov 14 11:44:36 2016] Call Trace:
            [Mon Nov 14 11:44:36 2016]  [<ffffffff8119b4d0>] __test_set_page_writeback+0x190/0x1d0
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0ba3d82>] vvp_page_prep_write+0x22/0x90 [lustre]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0847e8a>] cl_page_invoke+0x5a/0x160 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0848175>] cl_page_prep+0x35/0x1e0 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0c24f98>] osc_io_submit+0x138/0x5c0 [osc]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc084ca80>] cl_io_submit_rw+0x60/0x150 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0aca6ae>] lov_io_submit+0x29e/0x480 [lov]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc084ca80>] cl_io_submit_rw+0x60/0x150 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc084eb68>] cl_io_submit_sync+0xb8/0x1a0 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0ba80d3>] vvp_io_write_commit+0x5a3/0x900 [lustre]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0ba890b>] vvp_io_write_start+0x4db/0x610 [lustre]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc084ac76>] ? cl_lock_request+0x66/0x1d0 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc084c82c>] cl_io_start+0x5c/0x110 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc084e8c1>] cl_io_loop+0xa1/0x180 [obdclass]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0b57418>] ll_file_io_generic+0x768/0xad0 [lustre]
            [Mon Nov 14 11:44:36 2016]  [<ffffffffc0b579cd>] ll_file_write_iter+0x7d/0xe0 [lustre]
            [Mon Nov 14 11:44:36 2016]  [<ffffffff8120c97b>] new_sync_write+0x9b/0xe0
            [Mon Nov 14 11:44:36 2016]  [<ffffffff8120c9e6>] __vfs_write+0x26/0x40
            [Mon Nov 14 11:44:36 2016]  [<ffffffff8120d369>] vfs_write+0xa9/0x1a0
            [Mon Nov 14 11:44:36 2016]  [<ffffffff8120e025>] SyS_write+0x55/0xc0
            [Mon Nov 14 11:44:36 2016]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
            [Mon Nov 14 11:44:36 2016] Code: 40 41 00 48 89 d8 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a 
            [Mon Nov 14 11:44:36 2016] RIP  [<ffffffff8141995f>] __percpu_counter_add+0xf/0x70
            [Mon Nov 14 11:44:36 2016]  RSP <ffff883b2b75b938>
            [Mon Nov 14 11:44:36 2016] CR2: 00000000000000a8
            [Mon Nov 14 11:44:36 2016] ---[ end trace f8cd37dfb7aa1008 ]---
            

            We have found that the proposed patch has not been picked up by the current master. Is it safe to apply?
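            The register dump in the trace above can be cross-checked by hand. The byte marked <48> on the Code: line begins the faulting instruction; the bytes 48 8b 47 20 decode as mov 0x20(%rdi),%rax, and with RDI = 0x88 the access lands at 0x88 + 0x20 = 0xa8, which matches CR2. A pointer as small as 0x88 plausibly means a member address computed from a NULL base pointer, which would be consistent with the uninitialized i_wb described in the report; that reading is an inference, not something the trace states. The helper names below are hypothetical, and the decoder handles only this one instruction form.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values copied from the oops above. */
#define OOPS_RDI 0x88ULL
#define OOPS_CR2 0xa8ULL

/*
 * Decode only the form `48 8b 47 disp8`:
 *   0x48  REX.W prefix (64-bit operand size)
 *   0x8b  MOV r64, r/m64
 *   0x47  ModRM: mod=01 (disp8), reg=000 (rax), rm=111 (rdi)
 * i.e. mov disp8(%rdi),%rax. Returns false for anything else.
 */
static bool decode_mov_rdi_disp8(const uint8_t insn[4], int8_t *disp)
{
    if (insn[0] != 0x48 || insn[1] != 0x8b || insn[2] != 0x47)
        return false;
    *disp = (int8_t)insn[3];
    return true;
}

/* Effective address of the memory operand: base register + signed disp8. */
static uint64_t effective_address(uint64_t base, int8_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}
```

            Feeding the four code bytes and OOPS_RDI through these helpers yields OOPS_CR2, so the fault is a read at offset 0x20 from the pointer passed in RDI.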

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15610
            Subject: LU-6854 llite: Do not set writeback for sync write pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7c44f46769de80862d252d90af3a56852b0aef83
            green Oleg Drokin added a comment -

            Jinshan told me a similar (though code is different) problem exists in master too, so that's why 2.8.0 is in the list of affected versions.


            People

              bobijam Zhenyu Xu
              green Oleg Drokin