Metadata writeback cache support (LU-10938)

[LU-15413] WBC: endless loop in balance_dirty_pages Created: 06/Jan/22  Updated: 06/Jun/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Technical task Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None

 Description   

When writing a large file into Lustre with WBC enabled (aging_keep flush mode), the writing process gets trapped in an endless loop:

dd if=/dev/zero of=/mnt/lustre/tdir/tfile bs=1M count=4096

cat /proc/22735/stack
[<0>] balance_dirty_pages+0x426/0xcd0
[<0>] balance_dirty_pages_ratelimited+0x2af/0x3b0
[<0>] generic_perform_write+0x16a/0x1b0
[<0>] __generic_file_write_iter+0xfa/0x1c0
[<0>] generic_file_write_iter+0xab/0x150
[<0>] memfs_file_write_iter+0xd7/0x180 [lustre]
[<0>] new_sync_write+0x124/0x170
[<0>] vfs_write+0xa5/0x1a0
[<0>] ksys_write+0x4f/0xb0
[<0>] do_syscall_64+0x5b/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
[<0>] 0xffffffffffffffff

The cause is the Linux kernel's dirty-page rate-limiting mechanism. In @balance_dirty_pages(), the kernel makes the writing process wait until some of its dirty pages are written out. However, those pages are pinned in MemFS and are not reclaimable, so the dirty count never drops and the writer never leaves the loop.
On a client with 96G of memory, the endless loop occurs once the write size exceeds 8G.
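The loop can be illustrated with a toy user-space model in C. This is a sketch under stated assumptions, not the kernel implementation: struct dirty_state, toy_writeback(), toy_balance_dirty_pages() and the max_rounds cap are all hypothetical, and the real balance_dirty_pages() uses per-writeback thresholds and timed pauses rather than a simple threshold check. The model only captures the key point: writeback cannot clean pinned pages, so when the pinned pages alone exceed the threshold, the writer is never released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; not a kernel structure. */
struct dirty_state {
	size_t dirty;   /* pages currently dirty */
	size_t pinned;  /* dirty pages pinned in MemFS: writeback cannot clean them */
	size_t thresh;  /* dirty threshold above which the writer is throttled */
};

/* One modelled writeback pass: only the unpinned dirty pages get cleaned. */
static void toy_writeback(struct dirty_state *s)
{
	assert(s->pinned <= s->dirty);
	s->dirty = s->pinned;
}

/*
 * Modelled throttle: keep kicking writeback while over the threshold.
 * The real loop has no iteration cap; max_rounds exists only so the model
 * can report "would loop forever" instead of hanging.
 * Returns true if the writer is released, false if it stays stuck.
 */
static bool toy_balance_dirty_pages(struct dirty_state *s, int max_rounds)
{
	for (int round = 0; round < max_rounds; round++) {
		if (s->dirty <= s->thresh)
			return true;
		toy_writeback(s);
	}
	return s->dirty <= s->thresh;
}
```

For example, with dirty = 100, pinned = 10 and thresh = 50, the writer is released after one writeback pass; with pinned = 100, the call returns false, which stands in for the endless loop.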

There are two possible solutions.
One solution is to disable dirty page accounting for the BDI:

sb->s_bdi->capabilities |= BDI_CAP_NO_ACCT_DIRTY;

With this flag set, balance_dirty_pages_ratelimited() returns before doing any throttling:

void balance_dirty_pages_ratelimited(struct address_space *mapping)
{
	struct inode *inode = mapping->host;
	struct backing_dev_info *bdi = inode_to_bdi(inode);
	struct bdi_writeback *wb = NULL;
	int ratelimit;
	int *p;

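	/* No dirty accounting for this BDI: skip throttling entirely */
	if (!bdi_cap_account_dirty(bdi))
		return;
	/* ... remainder of the function elided ... */
}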

This way, balance_dirty_pages() is never called, and the writer can dirty as many cache pages as possible until it reaches the page cache limits in MemFS.
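A minimal user-space model of the first solution, assuming only the behaviour shown in the kernel excerpt above. TOY_CAP_NO_ACCT_DIRTY, struct toy_bdi and all three functions are hypothetical stand-ins for BDI_CAP_NO_ACCT_DIRTY, struct backing_dev_info and the kernel helpers, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's BDI_CAP_NO_ACCT_DIRTY flag. */
#define TOY_CAP_NO_ACCT_DIRTY 0x1u

/* Hypothetical stand-in for struct backing_dev_info (capabilities only). */
struct toy_bdi {
	unsigned int capabilities;
};

/* Mirrors bdi_cap_account_dirty(): true unless accounting is disabled. */
static bool toy_bdi_cap_account_dirty(const struct toy_bdi *bdi)
{
	return !(bdi->capabilities & TOY_CAP_NO_ACCT_DIRTY);
}

/*
 * Mirrors the early return in balance_dirty_pages_ratelimited().
 * Returns true if the writer would continue into the rate-limit checks
 * (which may call balance_dirty_pages()), false if it returns immediately.
 */
static bool toy_ratelimited_would_throttle(const struct toy_bdi *bdi)
{
	if (!toy_bdi_cap_account_dirty(bdi))
		return false;
	return true;
}

/* Solution 1: set the flag once, e.g. when the superblock is set up. */
static void toy_disable_dirty_accounting(struct toy_bdi *bdi)
{
	bdi->capabilities |= TOY_CAP_NO_ACCT_DIRTY;
	assert(!toy_bdi_cap_account_dirty(bdi));
}
```

The trade-off is that dirty pages on this BDI are no longer throttled by the kernel at all, so the only remaining limit is the page cache limit enforced by MemFS itself.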

The other solution works as follows:
when @balance_dirty_pages() triggers write-out of the inode via wb_start_background_writeback(), the client assimilates the inode's cache pages from MemFS into Lustre. Once assimilated, the pages in Lustre are reclaimable, and the dirty pages can be written out to the Lustre backend.
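The second solution can be sketched as a toy model too, again with hypothetical names (struct toy_inode, toy_background_writeback(), toy_throttle()). In the real client the hook would run when wb_start_background_writeback() starts write-out for the inode, and assimilation would move pages between page caches rather than adjust counters. What the sketch shows is the ordering: pages are first assimilated from MemFS into Lustre, where they are reclaimable, and only then written out, so the dirty count can fall below the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode page counters; not a Lustre structure. */
struct toy_inode {
	size_t memfs_dirty;   /* dirty pages held in MemFS (pinned) */
	size_t lustre_dirty;  /* dirty pages assimilated into Lustre (reclaimable) */
	size_t flushed;       /* pages written out to the Lustre backend */
};

static size_t toy_dirty(const struct toy_inode *ino)
{
	return ino->memfs_dirty + ino->lustre_dirty;
}

/* Stand-in for the work done when background writeback picks this inode. */
static void toy_background_writeback(struct toy_inode *ino)
{
	/* Step 1: assimilate MemFS pages into Lustre so they become reclaimable. */
	ino->lustre_dirty += ino->memfs_dirty;
	ino->memfs_dirty = 0;
	assert(ino->memfs_dirty == 0);

	/* Step 2: write the now-reclaimable dirty pages out to the backend. */
	ino->flushed += ino->lustre_dirty;
	ino->lustre_dirty = 0;
}

/* Bounded toy throttle loop: true once the dirty count is at or below thresh. */
static bool toy_throttle(struct toy_inode *ino, size_t thresh, int max_rounds)
{
	for (int round = 0; round < max_rounds; round++) {
		if (toy_dirty(ino) <= thresh)
			return true;
		toy_background_writeback(ino);
	}
	return toy_dirty(ino) <= thresh;
}
```

Starting from 100 dirty pages in MemFS and a threshold of 50, a single writeback pass assimilates and flushes all 100 pages, and the writer is released.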



 Comments   
Comment by Gerrit Updater [ 06/Jan/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45988
Subject: LU-15413 wbc: disable accounting for dirty pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a42813460f149769ced6cad4bda715a1e17e58b6

Comment by Gerrit Updater [ 07/Jan/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45997
Subject: LU-15413 wbc: assimilate inode cache pages for large write
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e31ea2a117e7c7078d112cb99d4cc4643f101b44

Comment by Gerrit Updater [ 06/Jun/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47541
Subject: LU-15413 wbc: assimilation for data under @wbci_rw_sem
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e13359e5878781643eae4a4f325681a8860c25bd

Generated at Sat Feb 10 03:18:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.