Details
Bug
Resolution: Fixed
Major
Lustre 2.16.0
Description
For a single-stripe file, after applying https://review.whamcloud.com/c/fs/lustre-release/+/49226
(Subject: LU-16338 readahead: clip readahead with kms),
reads get stuck in a dead loop in ll_readpage():
https://testing.whamcloud.com/test_logs/8d73c4e7-0e6e-482c-b93c-bf7159706890/show_text
00000080:00200000:1.0:1676614736.277040:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
00000008:00000001:1.0:1676614736.277044:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
00000008:00000001:1.0:1676614736.277045:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000080:00000001:1.0:1676614736.277045:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
00000080:00000001:1.0:1676614736.277046:0:21215:0:(rw.c:1873:ll_readpage()) Process entered
00000080:00200000:1.0:1676614736.277046:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
00000008:00000001:1.0:1676614736.277050:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
00000008:00000001:1.0:1676614736.277051:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000080:00000001:1.0:1676614736.277051:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
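The three values in each "Process leaving" entry appear to be the same 64-bit return code shown as unsigned decimal, signed decimal, and hexadecimal. The sketch below reproduces that layout so the entries are easier to read; format_rc() and rc_as_signed() are hypothetical helper names for illustration and are not Lustre functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a 64-bit return code as a signed value (the middle column). */
int64_t rc_as_signed(uint64_t rc)
{
	return (int64_t)rc;
}

/*
 * Format a return code as "rc=<unsigned> : <signed> : <hex>", the layout
 * seen in the trace. Returns the number of characters snprintf() would
 * write, not counting the terminating NUL.
 */
int format_rc(char *buf, size_t len, uint64_t rc)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "rc=%" PRIu64 " : %" PRId64 " : %" PRIx64,
			rc, rc_as_signed(rc), rc);
}
```

Read this way, rc=524289 from ll_readpage() is 0x80001, and the large value returned by osc_obj_dlmlock_at_pgoff() (ffff8f83e3073680) looks like a kernel pointer rather than an error code.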
The code that causes the dead loop:
	if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) {
		result = cl_io_read_ahead(env, io, vmpage->index, &ra);
		if (result < 0 || vmpage->index > ra.cra_end_idx) {
			cl_read_ahead_release(env, &ra);
			unlock_page(vmpage);
			RETURN(AOP_TRUNCATED_PAGE); /* ===> AOP_TRUNCATED_PAGE = 0x80001 */
		}
	}
After investigating, I found that the calculation of the read's ending offset is wrong: each time a page has been read, @iter->count is advanced (iter->count - read_bytes), so iter->count + iocb->ki_pos no longer marks the end of the read.
This wrong ending offset results in the dead loop described above.
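The arithmetic can be checked against the logged values (pgno:260, cnt:1032192, pos:0). The sketch below is a standalone model, not the Lustre code: it assumes 4 KiB pages, uses page_offset() as a stand-in for cl_offset(), and all function names are hypothetical. The 2 MiB request size is also an assumption; the log does not show the original read size, but 2097152 - 260 * 4096 = 1032192 matches the logged cnt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Byte offset of a page index; a stand-in for cl_offset(). */
uint64_t page_offset(uint64_t index)
{
	return index << SKETCH_PAGE_SHIFT;
}

/* Bytes still to read after `done` bytes of a `total`-byte request. */
uint64_t remaining_after(uint64_t total, uint64_t done)
{
	assert(done <= total);
	return total - done;
}

/*
 * The check as quoted in this ticket: the "end" is derived from the
 * remaining count, which shrinks as the read progresses, while pos is
 * still the starting position.
 */
bool beyond_end_remaining(uint64_t index, uint64_t remaining, uint64_t pos)
{
	return page_offset(index) >= remaining + pos;
}

/*
 * Hypothetical alternative for comparison: test against an end offset
 * captured once, before any bytes were consumed. This is an illustration
 * only and not necessarily how the issue was resolved.
 */
bool beyond_end_snapshot(uint64_t index, uint64_t read_end)
{
	return page_offset(index) >= read_end;
}
```

With the logged values, beyond_end_remaining(260, 1032192, 0) is true because page 260 starts at byte 1064960, so the page is treated as lying beyond the read even though, under the 2 MiB assumption, beyond_end_snapshot(260, 2097152) is false. Since neither cnt nor pos changes between the repeated ll_readpage() calls in the log, each retry evaluates the same condition again.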