Details
Bug
Resolution: Fixed
Critical
Lustre 2.4.0
Lustre: 2.3.51-3chaos
3
4363
Description
I see many messages like the following on the console of our OSTs running 2.3.51-3chaos:
2012-10-04 03:49:04 BUG: Bad page state in process ll_ost_io01_039 pfn:63ae41
2012-10-04 03:49:04 page:ffffea0015ce1e38 flags:0040000000000080 count:0 mapcount:0 mapping:(null) index:0 (Tainted: P B ---------------- )
2012-10-04 03:49:04 Pid: 7115, comm: ll_ost_io01_039 Tainted: P B ---------------- 2.6.32-220.23.1.1chaos.ch5.x86_64 #1
2012-10-04 03:49:04 Call Trace:
2012-10-04 03:49:04 [<ffffffff81121507>] ? bad_page+0x107/0x160
2012-10-04 03:49:04 [<ffffffff81124599>] ? free_hot_cold_page+0x1c9/0x220
2012-10-04 03:49:04 [<ffffffff811246af>] ? free_hot_page+0x2f/0x60
2012-10-04 03:49:04 [<ffffffff811275de>] ? __put_single_page+0x1e/0x30
2012-10-04 03:49:04 [<ffffffff81127755>] ? put_page+0x25/0x40
2012-10-04 03:49:04 [<ffffffffa086ff38>] ? ptlrpc_free_bulk+0x98/0x330 [ptlrpc]
2012-10-04 03:49:04 [<ffffffffa0d68e01>] ? ost_brw_write+0x501/0x15e0 [ost]
2012-10-04 03:49:04 [<ffffffffa08443c0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
2012-10-04 03:49:04 [<ffffffffa0d6f4d2>] ? ost_handle+0x32e2/0x4690 [ost]
2012-10-04 03:49:04 [<ffffffffa088b39b>] ? ptlrpc_update_export_timer+0x4b/0x470 [ptlrpc]
2012-10-04 03:49:04 [<ffffffffa08937fc>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
2012-10-04 03:49:04 [<ffffffffa03306be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2012-10-04 03:49:04 [<ffffffffa034213f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
2012-10-04 03:49:04 [<ffffffffa088abb7>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
2012-10-04 03:49:04 [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
2012-10-04 03:49:04 [<ffffffffa0894dd1>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
2012-10-04 03:49:04 [<ffffffffa08941e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
2012-10-04 03:49:04 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
2012-10-04 03:49:04 [<ffffffffa08941e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
2012-10-04 03:49:04 [<ffffffffa08941e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
2012-10-04 03:49:04 [<ffffffff8100c140>] ? child_rip+0x0/0x20
These reports flood the console, making it hard to pick out any other error messages.
It looks very similar to ORI-783, although that bug was triggered by setting 'sync_journal=1', which we are not doing here.
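As a stopgap for reading a saved console log while these reports are still being emitted, they could be filtered out after the fact. The following is only a rough sketch, not part of any fix: it assumes every console line carries a "YYYY-MM-DD HH:MM:SS " prefix as in the excerpt above, and that a report consists of the "BUG: Bad page state" line followed by "page:", "Pid:", "Call Trace:" and "[<address>]" lines. The function name filter_bad_page_reports is hypothetical.

```python
import re

# Assumed console line layout: "YYYY-MM-DD HH:MM:SS <message>".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")
REPORT_START = "BUG: Bad page state"
# Lines treated as part of a report once one has started (an assumption
# based on the excerpt in this ticket, not on a kernel specification).
REPORT_BODY = re.compile(r"^(page:|Pid:|Call Trace:|\[<[0-9a-f]+>\])")


def filter_bad_page_reports(lines):
    """Return (kept_lines, report_count) with bad-page reports removed."""
    kept = []
    reports = 0
    in_report = False
    for line in lines:
        # Strip the timestamp prefix before classifying the line.
        message = TIMESTAMP.sub("", line.rstrip("\n")).lstrip()
        if REPORT_START in message:
            reports += 1
            in_report = True
            continue
        if in_report and REPORT_BODY.match(message):
            continue
        # Any other line ends the current report and is kept.
        in_report = False
        kept.append(line)
    return kept, reports
```

Passing the lines of a console log to filter_bad_page_reports returns the remaining lines together with a count of how many reports were dropped. Stack traces that belong to unrelated problems are kept, because trace lines are only skipped while they directly follow a "Bad page state" line.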