Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.1.6
-
None
-
RHEL 6 w/ kernel 2.6.32_220.23.1
-
3
-
14689
Description
One Lustre client frequently crashes with an LBUG on ASSERTION( page_idx > ria->ria_stoff ) (13 crashes in the past 3 months).
This Lustre client acts as an NFS server and exports Lustre to a web server through nfs-ganesha.
----8< ----
[24937.600920] Lustre: DEBUG MARKER: Thu Jun 5 20:00:01 2014
[24937.600921]
[24950.667750] LustreError: 4126:0:(rw.c:698:ll_read_ahead_pages()) ASSERTION( page_idx > ria->ria_stoff ) failed: Invalid page_idx 234497rs 234497 re 300287 ro 234751 rl 256 rp 1
[24950.683642] LustreError: 4126:0:(rw.c:698:ll_read_ahead_pages()) LBUG
[24950.690154] Pid: 4126, comm: ganesha.nfsd
[24950.695572]
[24950.695573] Call Trace:
[24950.702337] [<ffffffffa05697f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[24950.710703] [<ffffffffa0569e07>] lbug_with_loc+0x47/0xb0 [libcfs]
[24950.718317] [<ffffffffa0bb511f>] ll_readahead+0x10cf/0x1100 [lustre]
[24950.726131] [<ffffffffa0bdc805>] vvp_io_read_page+0x305/0x360 [lustre]
[24950.734159] [<ffffffffa068eb4d>] cl_io_read_page+0x8d/0x170 [obdclass]
[24950.742108] [<ffffffffa0682c19>] ? cl_page_assume+0xf9/0x2d0 [obdclass]
[24950.750166] [<ffffffffa0bb5746>] ll_readpage+0x96/0x200 [lustre]
[24950.757661] [<ffffffff810ff88c>] generic_file_aio_read+0x1fc/0x700
[24950.765360] [<ffffffff810816ff>] ? up+0x2f/0x50
[24950.771433] [<ffffffffa0bdce1b>] vvp_io_read_start+0x13b/0x3e0 [lustre]
[24950.779575] [<ffffffffa068cb4a>] cl_io_start+0x6a/0x140 [obdclass]
[24950.787244] [<ffffffffa0690e2c>] cl_io_loop+0xcc/0x190 [obdclass]
[24950.794848] [<ffffffffa0b8d097>] ll_file_io_generic+0x3a7/0x560 [lustre]
[24950.803059] [<ffffffffa0b8d389>] ll_file_aio_read+0x139/0x2c0 [lustre]
[24950.811092] [<ffffffffa0b8d849>] ll_file_read+0x169/0x2a0 [lustre]
[24950.818784] [<ffffffff81164525>] vfs_read+0xb5/0x1a0
[24950.825275] [<ffffffff81164852>] sys_pread64+0x82/0xa0
[24950.831917] [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
[24950.839353]
[24950.842659] Kernel panic - not syncing: LBUG
[24950.847986] Pid: 4126, comm: ganesha.nfsd Not tainted 2.6.32-220.23.1.bl6.Bull.28.10.x86_64 #1
[24950.858073] Call Trace:
[24950.861907] [<ffffffff814851a0>] ? panic+0x78/0x143
[24950.868308] [<ffffffffa0569e5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[24950.876096] [<ffffffffa0bb511f>] ? ll_readahead+0x10cf/0x1100 [lustre]
[24950.884141] [<ffffffffa0bdc805>] ? vvp_io_read_page+0x305/0x360 [lustre]
[24950.892371] [<ffffffffa068eb4d>] ? cl_io_read_page+0x8d/0x170 [obdclass]
[24950.900562] [<ffffffffa0682c19>] ? cl_page_assume+0xf9/0x2d0 [obdclass]
[24950.908676] [<ffffffffa0bb5746>] ? ll_readpage+0x96/0x200 [lustre]
[24950.916357] [<ffffffff810ff88c>] ? generic_file_aio_read+0x1fc/0x700
[24950.924221] [<ffffffff810816ff>] ? up+0x2f/0x50
[24950.930283] [<ffffffffa0bdce1b>] ? vvp_io_read_start+0x13b/0x3e0 [lustre]
[24950.938586] [<ffffffffa068cb4a>] ? cl_io_start+0x6a/0x140 [obdclass]
[24950.946448] [<ffffffffa0690e2c>] ? cl_io_loop+0xcc/0x190 [obdclass]
[24950.954218] [<ffffffffa0b8d097>] ? ll_file_io_generic+0x3a7/0x560 [lustre]
[24950.962601] [<ffffffffa0b8d389>] ? ll_file_aio_read+0x139/0x2c0 [lustre]
[24950.970813] [<ffffffffa0b8d849>] ? ll_file_read+0x169/0x2a0 [lustre]
[24950.978649] [<ffffffff81164525>] ? vfs_read+0xb5/0x1a0
[24950.985288] [<ffffffff81164852>] ? sys_pread64+0x82/0xa0
[24950.992092] [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b
----8< ----
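To make the failed assertion in the console log above easier to read, here is a minimal sketch that pulls the numbers out of the message and re-evaluates the condition. The mapping of the short tags (rs, re, ro, rl, rp) to ria_start, ria_end, ria_stoff, ria_length and ria_pages is an assumption based on the assertion text; it is not stated in this ticket. The helper name parse_assertion is hypothetical.

```python
import re

# Assertion line copied from the console log above.
LINE = ("LustreError: 4126:0:(rw.c:698:ll_read_ahead_pages()) "
        "ASSERTION( page_idx > ria->ria_stoff ) failed: Invalid page_idx "
        "234497rs 234497 re 300287 ro 234751 rl 256 rp 1")

# Assumed meaning of the short tags (not confirmed by the ticket).
TAGS = {"rs": "ria_start", "re": "ria_end", "ro": "ria_stoff",
        "rl": "ria_length", "rp": "ria_pages"}


def parse_assertion(line):
    """Return a dict of the numeric fields found in the assertion message."""
    # Only look at the text after the marker, to avoid accidental matches.
    tail = line.split("Invalid page_idx", 1)[1]
    fields = {"page_idx": int(re.match(r"\s*(\d+)", tail).group(1))}
    # No word boundary before the tag: the log prints "234497rs" unspaced.
    for tag, value in re.findall(r"(rs|re|ro|rl|rp)\s+(\d+)", tail):
        fields[TAGS[tag]] = int(value)
    return fields


fields = parse_assertion(LINE)
# The condition the kernel asserted on.
holds = fields["page_idx"] > fields["ria_stoff"]

for name, value in fields.items():
    print(f"{name:10s} = {value}")
print("page_idx > ria_stoff:", holds)
```

Under that assumed mapping, page_idx equals ria_start (234497) and lies below ria_stoff (234751), which is why the condition evaluates to false.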
We asked the customer to add read-ahead to the debug log (lctl set_param debug=+reada).
The debug log is available in the attached support bundle (from crash 2014-06-05-20:01:29).
Read-ahead settings:
----8< ----
# lctl get_param llite.*.max_read_ahead*
llite.store1-ffff88120ee93c00.max_read_ahead_mb=40
llite.store1-ffff88120ee93c00.max_read_ahead_per_file_mb=40
llite.store1-ffff88120ee93c00.max_read_ahead_whole_mb=2
----8< ----
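Since the assertion message reports page indices while the settings above are in MB, a small conversion may help when comparing the two. This is only a sketch: the 4 KiB page size is an assumption (typical for x86_64, not confirmed in this ticket), and mb_to_pages is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not confirmed for this client

# Values copied from the lctl get_param output above.
SETTINGS_MB = {
    "max_read_ahead_mb": 40,
    "max_read_ahead_per_file_mb": 40,
    "max_read_ahead_whole_mb": 2,
}


def mb_to_pages(mb, page_size=PAGE_SIZE):
    """Convert a size in MB (2**20 bytes) to a number of pages."""
    return mb * 1024 * 1024 // page_size


pages = {name: mb_to_pages(mb) for name, mb in SETTINGS_MB.items()}
for name, count in pages.items():
    print(f"{name}: {count} pages")
```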
We asked the customer to check the web server logs to see which files were being accessed at the time of each crash. It is never the same file.
This looks like LU-4192.
Attachments
Issue Links
- duplicates
-
LU-4192 NFS server crash while fsx runs on clients
- Resolved