Details
Type: Bug
Resolution: Fixed
Priority: Major
Lustre 2.5.0, Lustre 2.6.0, Lustre 2.4.2
3
13394
Description
The atomic_t used to count LRU entries is overflowing on systems with large memory configurations:
LustreError: 22141:0:(osc_page.c:892:osc_lru_reserve()) ASSERTION(atomic_read(cli->cl_lru_left) >= 0 ) failed:
PID: 54214 TASK: ffff88fdef4e4100 CPU: 40 COMMAND: "cat"
#3 [ffff88fdf0823900] lbug_with_loc at ffffffffa07fedc3 [libcfs]
#4 [ffff88fdf0823920] osc_lru_reserve at ffffffffa0c2a28a [osc]
#5 [ffff88fdf08239a0] cl_page_alloc at ffffffffa09a7122 [obdclass]
#6 [ffff88fdf08239e0] cl_page_find0 at ffffffffa09a742d [obdclass]
#7 [ffff88fdf0823a40] lov_page_init_raid0 at ffffffffa0cc0f21 [lov]
#8 [ffff88fdf0823aa0] cl_page_alloc at ffffffffa09a7122 [obdclass]
#9 [ffff88fdf0823ae0] cl_page_find0 at ffffffffa09a742d [obdclass]
#10 [ffff88fdf0823b40] ll_cl_init at ffffffffa0d74123 [lustre]
#11 [ffff88fdf0823bd0] ll_readpage at ffffffffa0d74485 [lustre]
#12 [ffff88fdf0823c00] do_generic_file_read at ffffffff810fa39e
#13 [ffff88fdf0823c80] generic_file_aio_read at ffffffff810fad4c
#14 [ffff88fdf0823d40] vvp_io_read_start at ffffffffa0da2fb0 [lustre]
#15 [ffff88fdf0823da0] cl_io_start at ffffffffa09af979 [obdclass]
#16 [ffff88fdf0823dd0] cl_io_loop at ffffffffa09b3d33 [obdclass]
#17 [ffff88fdf0823e00] ll_file_io_generic at ffffffffa0d49c32 [lustre]
#18 [ffff88fdf0823e70] ll_file_aio_read at ffffffffa0d4a3b3 [lustre]
#19 [ffff88fdf0823ec0] ll_file_read at ffffffffa0d4aec3 [lustre]
#20 [ffff88fdf0823f10] vfs_read at ffffffff8115b237
#21 [ffff88fdf0823f40] sys_read at ffffffff8115b3a3
In this case, the atomic_t (signed int) held:
crash> pd (int)0xffff943de11780fc
$10 = -1506317746
We have reproduced this specific problem on configurations with as little as 11TB of physical memory. A 10.5TB system can cat a small file without crashing.
I noticed several other places where page counts are handled using a signed int, and suspect anything more than 4TB is problematic. The kernel itself consistently uses unsigned long for page counts on all architectures.