Lustre / LU-10938 Metadata writeback cache support / LU-15413

WBC: endless loop in balance_dirty_pages

    • Type: Technical task
    • Resolution: Unresolved
    • Priority: Minor

      When writing a large file into Lustre with WBC enabled (aging_keep flush mode), the writing process gets trapped in an endless loop:

      dd if=/dev/zero of=/mnt/lustre/tdir/tfile bs=1M count=4096
      
      cat /proc/22735/stack
      [<0>] balance_dirty_pages+0x426/0xcd0
      [<0>] balance_dirty_pages_ratelimited+0x2af/0x3b0
      [<0>] generic_perform_write+0x16a/0x1b0
      [<0>] __generic_file_write_iter+0xfa/0x1c0
      [<0>] generic_file_write_iter+0xab/0x150
      [<0>] memfs_file_write_iter+0xd7/0x180 [lustre]
      [<0>] new_sync_write+0x124/0x170
      [<0>] vfs_write+0xa5/0x1a0
      [<0>] ksys_write+0x4f/0xb0
      [<0>] do_syscall_64+0x5b/0x1b0
      [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
      [<0>] 0xffffffffffffffff
      

      The reason is that the dirty page rate limit mechanism in the Linux kernel makes the writing process try to write out some dirty pages in @balance_dirty_pages(), but the pages are pinned in MemFS and are not reclaimable, so the dirty page count never drops.
      We found that on a client with 96G of memory, the process gets trapped in the endless loop when the write size is larger than 8G.
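
The loop can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: the struct, the function names (toy_bdi, toy_writeback, toy_balance_dirty_pages) and the single fixed threshold are hypothetical, and the real balance_dirty_pages() computes its limits and pause times in far more detail. The model caps the number of rounds so that the "endless" case can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one backing device. All names are hypothetical. */
struct toy_bdi {
	size_t dirty_pages;   /* pages currently counted as dirty */
	size_t pinned_pages;  /* dirty pages that writeback cannot clean */
	size_t dirty_thresh;  /* writer is throttled while dirty_pages > dirty_thresh */
};

/* One writeback pass: clean up to 'batch' unpinned dirty pages. */
size_t toy_writeback(struct toy_bdi *bdi, size_t batch)
{
	size_t cleanable = bdi->dirty_pages - bdi->pinned_pages;
	size_t cleaned = cleanable < batch ? cleanable : batch;

	bdi->dirty_pages -= cleaned;
	return cleaned;
}

/*
 * Simplified throttle loop: keep kicking writeback until the dirty count
 * is at or below the threshold. Returns true if the writer is released
 * within 'max_rounds', false if it is still stuck (the real kernel loop
 * has no such bound and would keep sleeping and retrying).
 */
bool toy_balance_dirty_pages(struct toy_bdi *bdi, size_t batch,
			     unsigned int max_rounds)
{
	assert(bdi->pinned_pages <= bdi->dirty_pages);

	for (unsigned int round = 0; round < max_rounds; round++) {
		if (bdi->dirty_pages <= bdi->dirty_thresh)
			return true;
		toy_writeback(bdi, batch);
	}
	return bdi->dirty_pages <= bdi->dirty_thresh;
}
```

With no pinned pages the model releases the writer once enough pages are cleaned. When every dirty page is pinned, as with MemFS pages, toy_writeback() cleans nothing and the loop makes no progress regardless of how many rounds it is given.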

      There are two possible solutions.
      One solution is to disable dirty accounting for the BDI:

      sb->s_bdi->capabilities |= BDI_CAP_NO_ACCT_DIRTY;
      
      void balance_dirty_pages_ratelimited(struct address_space *mapping)
      {
      	struct inode *inode = mapping->host;
      	struct backing_dev_info *bdi = inode_to_bdi(inode);
      	struct bdi_writeback *wb = NULL;
      	int ratelimit;
      	int *p;
      
      	if (!bdi_cap_account_dirty(bdi))
      		return;
            ...
      

      This way, balance_dirty_pages is never called. The client can write as many cache pages as possible before reaching the page cache limits in MemFS.
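
The effect of the capability check can be sketched in user space as follows. The names here (acct_bdi, ACCT_CAP_NO_ACCT_DIRTY, acct_write_pages and so on) are hypothetical stand-ins for illustration, not kernel or Lustre symbols, and the model calls the throttle on every page, whereas the real balance_dirty_pages_ratelimited() only does so after a ratelimit is exceeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the BDI_CAP_NO_ACCT_DIRTY capability bit. */
#define ACCT_CAP_NO_ACCT_DIRTY 0x1u

/* Toy backing device: a capability mask plus a counter of throttle calls. */
struct acct_bdi {
	unsigned int capabilities;
	unsigned long throttle_calls;
};

/* Mirrors the shape of bdi_cap_account_dirty(): true if dirty pages are accounted. */
bool acct_cap_account_dirty(const struct acct_bdi *bdi)
{
	return !(bdi->capabilities & ACCT_CAP_NO_ACCT_DIRTY);
}

/* Model of the early return shown in balance_dirty_pages_ratelimited(). */
void acct_balance_dirty_pages_ratelimited(struct acct_bdi *bdi)
{
	if (!acct_cap_account_dirty(bdi))
		return;                 /* no accounting: never throttle */
	bdi->throttle_calls++;          /* stands in for balance_dirty_pages() */
}

/* Write 'npages' pages and report how often the throttle path was entered. */
unsigned long acct_write_pages(struct acct_bdi *bdi, unsigned long npages)
{
	assert(bdi != NULL);

	for (unsigned long i = 0; i < npages; i++)
		acct_balance_dirty_pages_ratelimited(bdi);
	return bdi->throttle_calls;
}
```

In this model a device with the flag set never enters the throttle path, so the only remaining limit on cached pages would be whatever the file system enforces itself, which matches the MemFS page cache limit mentioned above.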

      Another solution:
      when writing out the inode in @balance_dirty_pages->wb_start_background_writeback(), the client assimilates the cache pages from MemFS into Lustre. After that, the assimilated pages in Lustre are reclaimable, and the dirty pages can be written out to the Lustre backend.
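
A rough user-space sketch of this second approach is given below. It is not the WBC implementation: the struct wbc_model and the functions wbc_assimilate, wbc_flush and wbc_background_writeback are hypothetical names, pages are reduced to counters, and the round limit exists only so the model always terminates. It assumes each writeback round first assimilates a batch of pinned MemFS pages into Lustre and then flushes the now reclaimable pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page accounting for one client. All names are hypothetical. */
struct wbc_model {
	size_t memfs_pinned;  /* dirty pages pinned in MemFS (not reclaimable) */
	size_t lustre_dirty;  /* dirty pages assimilated into Lustre (reclaimable) */
	size_t dirty_thresh;  /* writer is throttled while total dirty > thresh */
};

/* Move up to 'batch' pages from MemFS into Lustre; returns pages moved. */
size_t wbc_assimilate(struct wbc_model *m, size_t batch)
{
	size_t moved = m->memfs_pinned < batch ? m->memfs_pinned : batch;

	m->memfs_pinned -= moved;
	m->lustre_dirty += moved;
	return moved;
}

/* Write up to 'batch' assimilated pages to the backend; returns pages cleaned. */
size_t wbc_flush(struct wbc_model *m, size_t batch)
{
	size_t cleaned = m->lustre_dirty < batch ? m->lustre_dirty : batch;

	m->lustre_dirty -= cleaned;
	return cleaned;
}

/*
 * Background writeback with assimilation: each round converts pinned pages
 * into reclaimable ones and then flushes them. Returns true once the total
 * dirty count is at or below the threshold within 'max_rounds'.
 */
bool wbc_background_writeback(struct wbc_model *m, size_t batch,
			      unsigned int max_rounds)
{
	assert(m != NULL);

	for (unsigned int round = 0; round < max_rounds; round++) {
		if (m->memfs_pinned + m->lustre_dirty <= m->dirty_thresh)
			return true;
		wbc_assimilate(m, batch);
		wbc_flush(m, batch);
	}
	return m->memfs_pinned + m->lustre_dirty <= m->dirty_thresh;
}
```

Unlike the stuck case, every round here reduces the total dirty count as long as the batch size is non-zero, so the throttled writer is eventually released.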

            qian_wc Qian Yingjin