  1. Lustre
  2. LU-10938 Metadata writeback cache support
  3. LU-15413

WBC: endless loop in balance_dirty_pages



      When writing a large file into Lustre with WBC enabled (aging_keep flush mode), the writing process gets stuck in an endless loop:

      dd if=/dev/zero of=/mnt/lustre/tdir/tfile bs=1M count=4096
      cat /proc/22735/stack
      [<0>] balance_dirty_pages+0x426/0xcd0
      [<0>] balance_dirty_pages_ratelimited+0x2af/0x3b0
      [<0>] generic_perform_write+0x16a/0x1b0
      [<0>] __generic_file_write_iter+0xfa/0x1c0
      [<0>] generic_file_write_iter+0xab/0x150
      [<0>] memfs_file_write_iter+0xd7/0x180 [lustre]
      [<0>] new_sync_write+0x124/0x170
      [<0>] vfs_write+0xa5/0x1a0
      [<0>] ksys_write+0x4f/0xb0
      [<0>] do_syscall_64+0x5b/0x1b0
      [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
      [<0>] 0xffffffffffffffff

      The cause is the dirty page rate limiting in the Linux kernel: in @balance_dirty_pages(), the kernel makes the writing process wait until some dirty pages have been written out, but those pages are pinned in MemFS and are not reclaimable, so the dirty page count never drops.
      On a client with 96G of memory, we found that the process gets stuck in the endless loop when the write size is larger than 8G.
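
      The stall described above can be illustrated with a small user-space model. This is only a sketch: struct dirty_model, model_writeback() and model_throttle() are hypothetical names invented for this illustration, not kernel or Lustre code, and the max_iters cap exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of dirty page throttling; not kernel code. */
struct dirty_model {
	size_t dirty_pinned;      /* dirty pages pinned in the cache, cannot be written back */
	size_t dirty_reclaimable; /* dirty pages that writeback is able to clean */
	size_t dirty_thresh;      /* throttling threshold, in pages */
};

/* One writeback pass: cleans up to 'batch' reclaimable pages, never pinned ones. */
static size_t model_writeback(struct dirty_model *m, size_t batch)
{
	size_t cleaned = m->dirty_reclaimable < batch ? m->dirty_reclaimable : batch;

	m->dirty_reclaimable -= cleaned;
	return cleaned;
}

/*
 * Throttle loop: keep kicking writeback until the dirty total is at or below
 * the threshold. Returns true if the writer is released, false if it is still
 * over the threshold after max_iters passes.
 */
static bool model_throttle(struct dirty_model *m, size_t batch, size_t max_iters)
{
	assert(batch > 0);

	for (size_t i = 0; i < max_iters; i++) {
		if (m->dirty_pinned + m->dirty_reclaimable <= m->dirty_thresh)
			return true;
		model_writeback(m, batch);
	}
	return m->dirty_pinned + m->dirty_reclaimable <= m->dirty_thresh;
}
```

      In this model, a state with 1000 pinned pages, no reclaimable pages and a threshold of 500 never leaves the loop no matter how large max_iters is, while the same 1000 pages marked reclaimable are released after a few passes.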

      There are two possible solutions.
      One solution is to disable dirty accounting for the BDI:

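      /* Filesystem side: opt the superblock's BDI out of dirty accounting. */
      sb->s_bdi->capabilities |= BDI_CAP_NO_ACCT_DIRTY;

      /* Kernel side (mm/page-writeback.c): the rate limiter then returns early. */
      void balance_dirty_pages_ratelimited(struct address_space *mapping)
      {
      	struct inode *inode = mapping->host;
      	struct backing_dev_info *bdi = inode_to_bdi(inode);
      	struct bdi_writeback *wb = NULL;
      	int ratelimit;
      	int *p;

      	if (!bdi_cap_account_dirty(bdi))
      		return;
      	...
      }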

      This way, balance_dirty_pages() is never called, and the writer can cache as many pages as it wants until it reaches the page cache limit in MemFS.
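
      The effect of the capability flag can be sketched in user space as follows. MODEL_CAP_NO_ACCT_DIRTY, struct model_bdi and both functions are hypothetical stand-ins written for this illustration; they only mirror the shape of the kernel check and are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value and struct; stand-ins for the kernel definitions. */
#define MODEL_CAP_NO_ACCT_DIRTY 0x1u

struct model_bdi {
	unsigned int capabilities;
};

/* Dirty accounting is active unless the "no accounting" flag is set. */
static bool model_cap_account_dirty(const struct model_bdi *bdi)
{
	return !(bdi->capabilities & MODEL_CAP_NO_ACCT_DIRTY);
}

/*
 * Mirrors the early return in the rate limiter: returns true if the writer
 * would go on to the throttling logic, false if it is skipped entirely.
 */
static bool model_ratelimited_enters_throttle(const struct model_bdi *bdi)
{
	assert(bdi != NULL);

	if (!model_cap_account_dirty(bdi))
		return false;
	return true;
}
```

      With the flag set, the model never reaches the throttling path, so the only remaining bound on cached pages is whatever limit the cache itself enforces.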

      The other solution is to make the pages reclaimable at writeback time:
      when an inode is written out via @balance_dirty_pages->wb_start_background_writeback(), the client assimilates its cache pages from MemFS into Lustre. The assimilated pages are then reclaimable, so the dirty pages can be written out to the Lustre backend.
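
      The assimilate-then-write-out order can be sketched with a small user-space model. struct assim_inode, assim_assimilate() and assim_writeback_inode() are hypothetical names for this illustration only; the real assimilation path in the WBC code is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one inode's cached pages; not Lustre code. */
struct assim_inode {
	size_t memfs_pinned; /* dirty pages still pinned in MemFS */
	size_t lustre_dirty; /* dirty pages owned by Lustre, reclaimable */
};

/* Move every pinned MemFS page under Lustre control; returns pages moved. */
static size_t assim_assimilate(struct assim_inode *ino)
{
	size_t moved = ino->memfs_pinned;

	ino->lustre_dirty += moved;
	ino->memfs_pinned = 0;
	return moved;
}

/*
 * Background writeback of one inode: assimilate first, then write out up to
 * 'batch' of the now reclaimable pages. Returns the number of pages written.
 */
static size_t assim_writeback_inode(struct assim_inode *ino, size_t batch)
{
	size_t written;

	assert(batch > 0);

	assim_assimilate(ino);
	written = ino->lustre_dirty < batch ? ino->lustre_dirty : batch;
	ino->lustre_dirty -= written;
	return written;
}
```

      In this model, each writeback pass makes progress once the pages have been assimilated, so a throttled writer eventually sees the dirty count fall.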




