Lustre / LU-10938 Metadata writeback cache support / LU-15413

WBC: endless loop in balance_dirty_pages

Details

    • Type: Technical task
    • Resolution: Unresolved
    • Priority: Minor

    Description

      When writing a large file into Lustre with WBC enabled (aging_keep flush mode), the writing process gets trapped in an endless loop:

      dd if=/dev/zero of=/mnt/lustre/tdir/tfile bs=1M count=4096
      
      cat /proc/22735/stack
      [<0>] balance_dirty_pages+0x426/0xcd0
      [<0>] balance_dirty_pages_ratelimited+0x2af/0x3b0
      [<0>] generic_perform_write+0x16a/0x1b0
      [<0>] __generic_file_write_iter+0xfa/0x1c0
      [<0>] generic_file_write_iter+0xab/0x150
      [<0>] memfs_file_write_iter+0xd7/0x180 [lustre]
      [<0>] new_sync_write+0x124/0x170
      [<0>] vfs_write+0xa5/0x1a0
      [<0>] ksys_write+0x4f/0xb0
      [<0>] do_syscall_64+0x5b/0x1b0
      [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
      [<0>] 0xffffffffffffffff
      

      The cause is that, due to the dirty page rate limit mechanism in the Linux kernel, the writing process tries to write out some dirty pages in @balance_dirty_pages(), but the pages are pinned in MemFS and are not reclaimable, so the dirty page count never drops.
      We found that on a client with 96G of memory, the process gets trapped in the endless loop when the write size is larger than 8G.
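
      The loop described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (toy_cache, toy_writeback, toy_balance) is hypothetical and exists neither in the kernel nor in Lustre, and the max_passes cap is added purely so the model terminates where the real loop would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel or Lustre. */
struct toy_cache {
	size_t dirty;   /* pages currently counted as dirty */
	size_t thresh;  /* throttle threshold, in pages */
	bool pinned;    /* true: writeback cannot clean any page */
};

/* Pretend writeback pass: returns the number of pages it cleaned. */
static size_t toy_writeback(struct toy_cache *c, size_t batch)
{
	size_t done;

	if (c->pinned)
		return 0;               /* pinned pages are never cleaned */
	done = c->dirty < batch ? c->dirty : batch;
	c->dirty -= done;
	return done;
}

/*
 * Pretend throttle loop: keep requesting writeback until the dirty count
 * is back at or under the threshold. Returns the number of passes used,
 * or -1 once max_passes is exhausted (the model's stand-in for a loop
 * that never returns).
 */
static int toy_balance(struct toy_cache *c, size_t batch, int max_passes)
{
	int passes = 0;

	assert(batch > 0);              /* a zero batch could never progress */
	while (c->dirty > c->thresh) {
		if (passes >= max_passes)
			return -1;
		toy_writeback(c, batch);
		passes++;
	}
	return passes;
}
```

      With pinned set to false the dirty count falls below the threshold after a few passes. With pinned set to true the count never changes and only the artificial cap ends the loop, which mirrors the writer staying in balance_dirty_pages.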

      There are two possible solutions.
      One solution is to disable dirty accounting for the BDI:

      sb->s_bdi->capabilities |= BDI_CAP_NO_ACCT_DIRTY;
      
      void balance_dirty_pages_ratelimited(struct address_space *mapping)
      {
      	struct inode *inode = mapping->host;
      	struct backing_dev_info *bdi = inode_to_bdi(inode);
      	struct bdi_writeback *wb = NULL;
      	int ratelimit;
      	int *p;
      
      	if (!bdi_cap_account_dirty(bdi))
      		return;
            ...
      

      This way, balance_dirty_pages() is never triggered. The client can write as many cache pages as possible before reaching the page cache limits in MemFS.

      The other solution is:
      when writing out the inode in @balance_dirty_pages->wb_start_background_writeback(), the client assimilates the cache pages from MemFS into Lustre. After that, the assimilated pages in Lustre are reclaimable, and the dirty pages can be written out to the Lustre backend.
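
      The effect of the second solution can be sketched with a similar userspace model. Again, all names (toy_inode, toy_assimilate, toy_background_writeback, toy_dirty) are hypothetical, and moving every pinned page in one step is a simplification of the model, not a statement about how WBC assimilation is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel or Lustre. */
struct toy_inode {
	size_t memfs_pinned;  /* dirty pages still pinned in the MemFS model */
	size_t reclaimable;   /* dirty pages already assimilated, writable */
};

/* Hand every pinned page over to the writable pool; returns pages moved. */
static size_t toy_assimilate(struct toy_inode *ino)
{
	size_t moved = ino->memfs_pinned;

	ino->reclaimable += moved;
	ino->memfs_pinned = 0;
	return moved;
}

/*
 * Pretend background writeback pass: assimilate first, then clean up to
 * `batch` pages. Returns the number of pages cleaned in this pass.
 */
static size_t toy_background_writeback(struct toy_inode *ino, size_t batch)
{
	size_t done;

	assert(ino != NULL);
	toy_assimilate(ino);
	done = ino->reclaimable < batch ? ino->reclaimable : batch;
	ino->reclaimable -= done;
	return done;
}

/* Total dirty pages the throttle logic would see for this inode. */
static size_t toy_dirty(const struct toy_inode *ino)
{
	return ino->memfs_pinned + ino->reclaimable;
}
```

      In this model each writeback pass makes progress because the pages stop being pinned before the pass tries to clean them, so a throttle loop waiting on toy_dirty() would eventually exit.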

          People

            qian_wc Qian Yingjin