Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.5.3
- None
- 3
- 9223372036854775807
Description
Some IEEL 2.2.x and Lustre 2.5 installations have triggered this unexpected LBUG on Lustre clients consuming ChangeLogs:
<3>LustreError: 31536:0:(llog_cat.c:192:llog_cat_id2handle()) store2-MDT0000-mdc-ffff88206515bc00: error opening log id 0x2def:1:0: rc = -2
<3>LustreError: 31536:0:(llog_cat.c:565:llog_cat_process_cb()) store2-MDT0000-mdc-ffff88206515bc00: cannot find handle for llog 0x2def:1: -2
<0>LustreError: 31536:0:(llog.c:861:llog_write()) ASSERTION( loghandle->lgh_obj != ((void *)0) ) failed:
<0>LustreError: 31536:0:(llog.c:861:llog_write()) LBUG
<4>Pid: 31536, comm: mdc_clg_send_th
<4>
<4>Call Trace:
<4> [<ffffffffa0437895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0437e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0553c52>] llog_write+0x352/0x420 [obdclass]
<4> [<ffffffffa0553e0c>] llog_cancel_rec+0xbc/0x7c0 [obdclass]
<4> [<ffffffffa0558bcc>] llog_cat_cleanup+0x9c/0x260 [obdclass]
<4> [<ffffffffa0559fd8>] llog_cat_process_cb+0x588/0x630 [obdclass]
<4> [<ffffffffa0554e6b>] llog_process_thread+0x95b/0xd70 [obdclass]
<4> [<ffffffffa0559a50>] ? llog_cat_process_cb+0x0/0x630 [obdclass]
<4> [<ffffffffa0556a67>] llog_process_or_fork+0x127/0x550 [obdclass]
<4> [<ffffffffa0557f09>] llog_cat_process_or_fork+0x89/0x350 [obdclass]
<4> [<ffffffff8152aede>] ? mutex_lock+0x1e/0x50
<4> [<ffffffffa094fe10>] ? changelog_kkuc_cb+0x0/0x250 [mdc]
<4> [<ffffffffa05581e9>] llog_cat_process+0x19/0x20 [obdclass]
<4> [<ffffffffa09541eb>] mdc_changelog_send_thread+0x4cb/0x810 [mdc]
<4> [<ffffffffa0953d20>] ? mdc_changelog_send_thread+0x0/0x810 [mdc]
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 31536, comm: mdc_clg_send_th Not tainted 2.6.32-504.12.2.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8152933c>] ? panic+0xa7/0x16f
<4> [<ffffffffa0437eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0553c52>] ? llog_write+0x352/0x420 [obdclass]
<4> [<ffffffffa0553e0c>] ? llog_cancel_rec+0xbc/0x7c0 [obdclass]
<4> [<ffffffffa0558bcc>] ? llog_cat_cleanup+0x9c/0x260 [obdclass]
<4> [<ffffffffa0559fd8>] ? llog_cat_process_cb+0x588/0x630 [obdclass]
<4> [<ffffffffa0554e6b>] ? llog_process_thread+0x95b/0xd70 [obdclass]
<4> [<ffffffffa0559a50>] ? llog_cat_process_cb+0x0/0x630 [obdclass]
<4> [<ffffffffa0556a67>] ? llog_process_or_fork+0x127/0x550 [obdclass]
<4> [<ffffffffa0557f09>] ? llog_cat_process_or_fork+0x89/0x350 [obdclass]
<4> [<ffffffff8152aede>] ? mutex_lock+0x1e/0x50
<4> [<ffffffffa094fe10>] ? changelog_kkuc_cb+0x0/0x250 [mdc]
<4> [<ffffffffa05581e9>] ? llog_cat_process+0x19/0x20 [obdclass]
<4> [<ffffffffa09541eb>] ? mdc_changelog_send_thread+0x4cb/0x810 [mdc]
<4> [<ffffffffa0953d20>] ? mdc_changelog_send_thread+0x0/0x810 [mdc]
<4> [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
The problem seems to have been introduced by the patch for LU-5195, which does not handle the case where the code is running on a Lustre client (i.e., cathandle->lgh_obj == NULL) that is only a ChangeLog/LLOG consumer/replicator.
The problem had not been seen before because the change in question is only executed when a catalog inconsistency is encountered. It looks like master is also impacted; I will push a patch soon.
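As an illustration only, the missing check can be sketched in a small user-space model. Everything here is hypothetical: `mock_llog_handle`, `mock_llog_write()` and `mock_cat_cleanup()` are simplified stand-ins, not the real Lustre structures or functions, and returning -ENOENT for a client-side handle is an assumption rather than the behaviour of the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a llog handle.
 * This is NOT the real struct llog_handle from Lustre. */
struct mock_llog_handle {
	void *lgh_obj;   /* backing object; NULL on a client that only consumes ChangeLogs */
	int   cancelled; /* number of records cancelled, kept for illustration */
};

/* Models a write path that requires a backing object, mirroring the
 * ASSERTION( loghandle->lgh_obj != NULL ) reported in the trace. */
static int mock_llog_write(struct mock_llog_handle *h)
{
	assert(h->lgh_obj != NULL);
	h->cancelled++;
	return 0;
}

/* Models the catalog cleanup step with the guard added: when the catalog
 * has no backing object (client-side consumer), skip the cancel/write
 * instead of reaching the assertion. The -ENOENT return is an assumption. */
static int mock_cat_cleanup(struct mock_llog_handle *cat)
{
	if (cat->lgh_obj == NULL)
		return -ENOENT;

	return mock_llog_write(cat);
}
```

With this guard, a handle whose `lgh_obj` is NULL is rejected before the write path, while a handle with a backing object is processed as before.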
Issue Links
- is related to: LU-5195 HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog (Resolved)