[LU-4856] osc_lru_reserve()) ASSERTION( atomic_read(cli->cl_lru_left) >= 0 ) failed Created: 03/Apr/14  Updated: 01/Oct/15  Resolved: 30/Sep/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.4.2
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Major
Reporter: Stephen Champion Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
Severity: 3
Rank (Obsolete): 13394

 Description   

The atomic_t used to count LRU entries is overflowing on systems with large memory configurations:

LustreError: 22141:0:(osc_page.c:892:osc_lru_reserve()) ASSERTION(atomic_read(cli->cl_lru_left) >= 0 ) failed:

PID: 54214 TASK: ffff88fdef4e4100 CPU: 40 COMMAND: "cat"
#3 [ffff88fdf0823900] lbug_with_loc at ffffffffa07fedc3 [libcfs]
#4 [ffff88fdf0823920] osc_lru_reserve at ffffffffa0c2a28a [osc]
#5 [ffff88fdf08239a0] cl_page_alloc at ffffffffa09a7122 [obdclass]
#6 [ffff88fdf08239e0] cl_page_find0 at ffffffffa09a742d [obdclass]
#7 [ffff88fdf0823a40] lov_page_init_raid0 at ffffffffa0cc0f21 [lov]
#8 [ffff88fdf0823aa0] cl_page_alloc at ffffffffa09a7122 [obdclass]
#9 [ffff88fdf0823ae0] cl_page_find0 at ffffffffa09a742d [obdclass]
#10 [ffff88fdf0823b40] ll_cl_init at ffffffffa0d74123 [lustre]
#11 [ffff88fdf0823bd0] ll_readpage at ffffffffa0d74485 [lustre]
#12 [ffff88fdf0823c00] do_generic_file_read at ffffffff810fa39e
#13 [ffff88fdf0823c80] generic_file_aio_read at ffffffff810fad4c
#14 [ffff88fdf0823d40] vvp_io_read_start at ffffffffa0da2fb0 [lustre]
#15 [ffff88fdf0823da0] cl_io_start at ffffffffa09af979 [obdclass]
#16 [ffff88fdf0823dd0] cl_io_loop at ffffffffa09b3d33 [obdclass]
#17 [ffff88fdf0823e00] ll_file_io_generic at ffffffffa0d49c32 [lustre]
#18 [ffff88fdf0823e70] ll_file_aio_read at ffffffffa0d4a3b3 [lustre]
#19 [ffff88fdf0823ec0] ll_file_read at ffffffffa0d4aec3 [lustre]
#20 [ffff88fdf0823f10] vfs_read at ffffffff8115b237
#21 [ffff88fdf0823f40] sys_read at ffffffff8115b3a3

In this case, the atomic_t (signed int) held:
crash> pd (int)0xffff943de11780fc
$10 = -1506317746

We've triggered this specific problem with configurations down to 11TB of physmem. A 10.5TB system can cat a small file without crashing.

I noticed several other cases where page counts are handled using a signed int, and suspect anything more than 4TB is problematic. The kernel itself is consistently using unsigned long for page counts on all architectures.
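
The arithmetic behind the overflow can be sketched as follows. This is a minimal userspace illustration, not Lustre code: it assumes 4 KiB pages, and the names PAGE_SIZE_BYTES, TIB, pages_for_bytes() and as_wrapped_int32() are hypothetical helpers introduced only for this example. It shows that a signed 32-bit counter can no longer represent a page count once memory reaches 8 TiB, and what such a counter would hold after wrapping.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL      /* assumption: 4 KiB pages */
#define TIB             (1ULL << 40) /* bytes in one TiB */

/* Number of whole pages that cover `bytes` of memory. */
static uint64_t pages_for_bytes(uint64_t bytes)
{
	return bytes / PAGE_SIZE_BYTES;
}

/*
 * Value a signed 32-bit counter would hold if only the low 32 bits of
 * `pages` were kept (two's-complement wraparound), computed without
 * relying on implementation-defined narrowing conversions.
 */
static int32_t as_wrapped_int32(uint64_t pages)
{
	uint32_t low = (uint32_t)pages; /* unsigned truncation is well-defined */

	if (low <= (uint32_t)INT32_MAX)
		return (int32_t)low;
	return (int32_t)((int64_t)low - 4294967296LL); /* subtract 2^32 */
}
```

For example, pages_for_bytes(8 * TIB) is 2^31, one more than INT32_MAX, so as_wrapped_int32() returns INT32_MIN for it. The value seen in the crash dump, -1506317746, is what this helper returns for 2788649550, so the counter's bit pattern matches an unsigned page count of that size; whether that was the intended count is not established by the ticket.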



 Comments   
Comment by Jinshan Xiong (Inactive) [ 03/Apr/14 ]

we can use atomic64_t instead.

Comment by Stephen Champion [ 10/Apr/14 ]

I've been digging at this, trying to identify the changes required. To support large memory systems, all global accounting of pages needs to be done with 64-bit types. Just tracing usage of cfs_num_physpages (which cl_lru_left is derived from), the problem snowballs quickly, and affects almost every subsystem in Lustre.

Some casting will be required, but it should not be a problem to use 32-bit counters for page vectors. I doubt any networks support 8 TB transactions yet.

Comment by Stephen Champion [ 07/May/14 ]

I have been working on a patch against master to address easily identified overflow hazards. This cascaded into lock management as well.

I am about to give it a whirl on internal systems to make sure I didn't break anything, then allocate time on a system with 16T of memory to make sure it addresses the problem. I won't be able to run acceptance on the large system anytime soon, but will do some basic functionality testing.

I will also need to clean up the patch to meet the coding standards.

Comment by Stephen Champion [ 31/May/14 ]

http://review.whamcloud.com/#/c/10537/

Comment by Stephen Champion [ 03/Jun/14 ]

I set up an i686 build environment and worked through the initial errors.

The kernel does not implement atomic64_add_unless on this arch, so I'll have to find a way around this problem. I will push the updated patch for feedback, but there will certainly be additional revisions, possibly major.
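
For readers unfamiliar with the primitive, the "add unless" semantics can be sketched with a compare-and-exchange loop. This is a userspace stand-in written with C11 atomics, under the assumption that the operation adds `a` to the counter unless it currently equals `u` and reports whether the add happened. The name sketch_add_unless64() is hypothetical, and this is not the workaround used in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for an "add unless" operation on a
 * 64-bit counter: add `a` to *v unless *v == u.
 * Returns true if the addition was performed, false otherwise.
 */
static bool sketch_add_unless64(_Atomic int64_t *v, int64_t a, int64_t u)
{
	int64_t old = atomic_load(v);

	while (old != u) {
		/* On failure, `old` is reloaded with the current value of *v. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}
```

A typical use is taking one unit from a budget only while some remain, for example sketch_add_unless64(&left, -1, 0), which leaves a counter that is already zero untouched.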

Comment by John Fuchs-Chesney (Inactive) [ 25/Jul/14 ]

Hello Stephen,
Do you want us to keep this ticket open?

Many thanks,
~ jfc.

Comment by Stephen Champion [ 25/Jul/14 ]

Yes please. The patch needs to have i686 build problems addressed, and I need to sync up with everyone who offered comments. I expect to get back to it during the week of Aug 5.

Comment by Stephen Champion [ 27/Aug/14 ]

I pushed a new revision of the patch this morning. I expected tests to start automatically - do I need to add Test-Parameters?

I will be testing on my own x86_64 / IB test environment today, but do not have a means to test i686.
This has not yet been tested on a large memory system. We are starting that process.

Comment by Peter Jones [ 27/Aug/14 ]

Hi Steve

It's started testing now. Just higher than usual load on the test system.

Peter

Comment by Stephen Champion [ 04/Sep/14 ]

Revision 4 of http://review.whamcloud.com/#/c/10537/ is tested, working.

# rpm -q lustre-client
lustre-client-2.6.51-3.0.101_0.35_default_gc69b1a0
# grep ^processor /proc/cpuinfo | wc -l
3072
# grep ^MemTotal /proc/meminfo
MemTotal: 32825421388 kB
# mount -t lustre mds1-esa@tcp0:/esa-uv /mnt/esa-uv
# cd /mnt/esa-uv/schamp
# ls -l
total 3145740
-rw-r--r-- 1 schamp sgiemp_00 1073741824 Sep 4 14:36 foo.1
-rw-r--r-- 1 schamp sgiemp_00 1073741824 Sep 4 15:20 foo.2
# cp foo.2 foo.3
# ls -l
total 3145740
-rw-r--r-- 1 schamp sgiemp_00 1073741824 Sep 4 14:36 foo.1
-rw-r--r-- 1 schamp sgiemp_00 1073741824 Sep 4 15:20 foo.2
-rw-r--r-- 1 root root 1073741824 Sep 4 18:24 foo.3

Comment by Stephen Champion [ 10/Sep/14 ]

http://review.whamcloud.com/#/c/10537/5 is confirmed as resolving this problem on a 32TB system.
I also ran sanity and sanityn without serious failure.

Comment by John Fuchs-Chesney (Inactive) [ 16/Sep/14 ]

Stephen,
Can you please review the comments on patch set 5?

Thanks,
~ jfc.

Comment by Stephen Champion [ 16/Sep/14 ]

In the middle of it right now. Had to rebase to master again.

I am hesitant to simply #define the lprocfs_.._long functions to _u64 functions, as sign conversion hazards might catch unsuspecting users. Seems like a great way to introduce very obscure bugs.

I think I can eliminate the introduction of the long function by using the _64 functions in the cases my patch was using them.
This does force 32-bit systems to use 64-bit types unnecessarily, but not in critical paths. This is the approach I have started on.

Comment by Stephen Champion [ 18/Sep/14 ]

http://review.whamcloud.com/#/c/10537/7 eliminates the lprocfs_..long functions entirely.

Comment by John Fuchs-Chesney (Inactive) [ 18/Sep/14 ]

Made it through autotest.
~ jfc.

Comment by Peter Jones [ 30/Sep/14 ]

Landed for 2.7

Comment by Jian Yu [ 10/Oct/14 ]

Here is a patch for the master branch to resolve a remaining issue: if "val" is larger than 2^32 on a 32-bit system, the code in proc_max_dirty_pages_in_mb() may truncate "val" when assigning it to obd_max_dirty_pages: http://review.whamcloud.com/12269/
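
The truncation hazard described above can be illustrated with a checked narrowing helper. This is a sketch under stated assumptions, not the content of the linked patch: it assumes the destination is a 32-bit unsigned quantity, and store_pages_checked() is a hypothetical name introduced for this example.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit page count into a 32-bit
 * destination only if it fits. Returns 0 on success or -ERANGE if
 * `val` would be truncated, in which case *dst is left unchanged.
 */
static int store_pages_checked(uint64_t val, uint32_t *dst)
{
	if (val > UINT32_MAX)
		return -ERANGE;
	*dst = (uint32_t)val;
	return 0;
}
```

Rejecting the out-of-range value is one option; clamping to the maximum is another. The ticket does not say which behavior the patch chose.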

Comment by Manish Patel (Inactive) [ 10/Jul/15 ]

Hi

We are seeing the same issue with SLES 11 SP3, and we are using Lustre version 2.4.3.

The Lustre client is installed on a 2048-core SGI UV1000 running:

 cat /etc/SuSE-release

SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3
hungabee:~ # lsb_release -a
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 11 (x86_64)
Release: 11
Codename: n/a
hungabee:~ # rpm -qa | egrep "(lustre|ofed)"
lustre-client-modules-2.4.3-3.0.101_0.29_default
ofed-doc-1.5.4.1-0.11.5
ofed-1.5.4.1-0.11.5
ofed-kmp-trace-1.5.4.1_3.0.76_0.11-0.11.5
ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5
lustre-client-2.4.3-3.0.101_0.29_default

Can we have a backport patch for Lustre client v2.4.3? Also, is this patch included in the 2.5.x branches? If not, can we have a backport patch for the 2.5 branch too?

Thank You,
Manish

Comment by Gerrit Updater [ 01/Oct/15 ]

Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/16697
Subject: LU-4856 misc: Reduce exposure to overflow on page counters.
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: f14f45c4e52246efe2c478b87c703705a30b3774

Generated at Sat Feb 10 01:46:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.