[LU-13182] MAP_POPULATE hangs with Linux 5.4 Created: 31/Jan/20  Updated: 29/Sep/21  Resolved: 03/Nov/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0, Lustre 2.12.7

Type: Bug Priority: Minor
Reporter: Sebastien Buisson Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14021 NULL pointer dereference in _raw_writ... Resolved
is related to LU-14022 sanity-hsm test 1a hung at verifying ... Resolved
is related to LU-13740 Ubuntu 20.04 LTS release Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

While testing Lustre with Linux 5.4 (kernel installed on top of Ubuntu 18.04), I came across an issue with the mmap_cat test utility.

In this program, the call to mmap is:

mmappedData = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);

When mmap_cat is run on a file, it hangs. However, if the call to mmap is changed as follows, it works:

mmappedData = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);

So the bug seems to be related to the MAP_POPULATE flag, whose semantics are:

MAP_POPULATE (since Linux 2.5.46)
              Populate (prefault) page tables for a mapping.  For a file
              mapping, this causes read-ahead on the file.  This will help
              to reduce blocking on page faults later.  MAP_POPULATE is
              supported for private mappings only since Linux 2.6.23.
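
For anyone who wants to compare the two mmap variants side by side, here is a minimal sketch in C. It is not the actual mmap_cat source: the helper name mmap_copy and its populate parameter are hypothetical, and it only assumes a Linux system where MAP_POPULATE is available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from mmap_cat): map `path` read-only,
 * optionally with MAP_POPULATE, and copy the whole mapping to `out_fd`.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t mmap_copy(const char *path, int out_fd, int populate)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    int flags = MAP_PRIVATE | (populate ? MAP_POPULATE : 0);
    char *data = mmap(NULL, len, PROT_READ, flags, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(out_fd, data + done, len - done);
        if (n <= 0) {           /* treat short-circuit and errors alike */
            munmap(data, len);
            return -1;
        }
        done += (size_t)n;
    }
    munmap(data, len);
    return (ssize_t)done;
}
```

Calling mmap_copy(path, STDOUT_FILENO, 1) corresponds to the MAP_POPULATE variant described above, and passing 0 as the last argument corresponds to the variant without the flag.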


 Comments   
Comment by Oleg Drokin [ 12/Oct/20 ]

This problem comes from the fast read patch, LU-4257.

The hang occurs because there is now a loop: ll_fault0() -> ll_filemap_fault() -> ll_readpage() returns NULL -> filemap_fault() returns FAULT_RETRY -> ll_fault() exits -> kernel retries.

I am not 100% sure yet why ll_readpage returns NULL. Is it because the page is not in the mapping, or something similar?

Either way, disabling the fast read in ll_fault0() works around this for now.
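
The shape of that loop, and why skipping the fast read breaks it, can be illustrated with a small user-space toy model in C. This is only a sketch of the description above, not Lustre or kernel code: every name in it (fault_result, fast_read_fault, slow_read_fault, handle_fault, populate_page) is hypothetical, and the retry cap exists only so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for fault results; not the kernel's real definitions. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

/* Hypothetical fast path: succeeds only if the page is already cached. */
static enum fault_result fast_read_fault(bool page_cached)
{
    return page_cached ? FAULT_DONE : FAULT_RETRY;
}

/* Hypothetical slow path: brings the page in, then succeeds. */
static enum fault_result slow_read_fault(bool *page_cached)
{
    *page_cached = true;
    return FAULT_DONE;
}

/* Toy fault handler; use_fast_read models "fast read enabled". */
static enum fault_result handle_fault(bool use_fast_read, bool *page_cached)
{
    if (use_fast_read)
        return fast_read_fault(*page_cached);
    return slow_read_fault(page_cached);
}

/*
 * Toy caller that retries while the handler asks for a retry.
 * In the reported hang nothing breaks the cycle; max_tries is an
 * artificial cap for this demo only.
 * Returns the number of attempts used, or -1 if the fault never completed.
 */
static int populate_page(bool use_fast_read, int max_tries)
{
    bool page_cached = false;

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (handle_fault(use_fast_read, &page_cached) == FAULT_DONE)
            return attempt;
    }
    return -1;
}
```

In this model, populate_page(true, n) never completes for any n because nothing ever brings the page in, while populate_page(false, n) completes on the first attempt.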

Comment by Gerrit Updater [ 12/Oct/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40221
Subject: LU-13182 llite: Do not use fast read for MAP_POPULATE faults
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 748e88027624040eead5749c2ef374f33915cec9

Comment by Gerrit Updater [ 03/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40221/
Subject: LU-13182 llite: Avoid eternel retry loops with MAP_POPULATE
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bb50c62c6f4cdd7a31145ab81e7c166e0760ed11

Comment by Peter Jones [ 03/Nov/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 09/Jun/21 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43958
Subject: LU-13182 llite: Avoid eternel retry loops with MAP_POPULATE
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: cd2aa3e27caf988d02c721b4be4916c42cbb710d

Comment by Gerrit Updater [ 15/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43958/
Subject: LU-13182 llite: Avoid eternel retry loops with MAP_POPULATE
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 21dc165991f9038aefe679cc58fdfc30d65dbaed
