[LU-6471] Unexpected Lustre Client LBUG in llog_write() Created: 17/Apr/15 Updated: 14/Jun/18 Resolved: 25/May/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Bruno Faccini (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Some IEEL2.2.x and Lustre 2.5 installations have triggered this unexpected LBUG on Lustre Clients consuming ChangeLogs : <3>LustreError: 31536:0:(llog_cat.c:192:llog_cat_id2handle()) store2-MDT0000-mdc-ffff88206515bc00: error opening log id 0x2def:1:0: rc = -2 <3>LustreError: 31536:0:(llog_cat.c:565:llog_cat_process_cb()) store2-MDT0000-mdc-ffff88206515bc00: cannot find handle for llog 0x2def:1: -2 <0>LustreError: 31536:0:(llog.c:861:llog_write()) ASSERTION( loghandle->lgh_obj != ((void *)0) ) failed: <0>LustreError: 31536:0:(llog.c:861:llog_write()) LBUG <4>Pid: 31536, comm: mdc_clg_send_th <4> <4>Call Trace: <4> [<ffffffffa0437895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] <4> [<ffffffffa0437e97>] lbug_with_loc+0x47/0xb0 [libcfs] <4> [<ffffffffa0553c52>] llog_write+0x352/0x420 [obdclass] <4> [<ffffffffa0553e0c>] llog_cancel_rec+0xbc/0x7c0 [obdclass] <4> [<ffffffffa0558bcc>] llog_cat_cleanup+0x9c/0x260 [obdclass] <4> [<ffffffffa0559fd8>] llog_cat_process_cb+0x588/0x630 [obdclass] <4> [<ffffffffa0554e6b>] llog_process_thread+0x95b/0xd70 [obdclass] <4> [<ffffffffa0559a50>] ? llog_cat_process_cb+0x0/0x630 [obdclass] <4> [<ffffffffa0556a67>] llog_process_or_fork+0x127/0x550 [obdclass] <4> [<ffffffffa0557f09>] llog_cat_process_or_fork+0x89/0x350 [obdclass] <4> [<ffffffff8152aede>] ? mutex_lock+0x1e/0x50 <4> [<ffffffffa094fe10>] ? changelog_kkuc_cb+0x0/0x250 [mdc] <4> [<ffffffffa05581e9>] llog_cat_process+0x19/0x20 [obdclass] <4> [<ffffffffa09541eb>] mdc_changelog_send_thread+0x4cb/0x810 [mdc] <4> [<ffffffffa0953d20>] ? mdc_changelog_send_thread+0x0/0x810 [mdc] <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0 <4> [<ffffffff8100c20a>] child_rip+0xa/0x20 <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0 <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20 <4> <0>Kernel panic - not syncing: LBUG <4>Pid: 31536, comm: mdc_clg_send_th Not tainted 2.6.32-504.12.2.el6.x86_64 #1 <4>Call Trace: <4> [<ffffffff8152933c>] ? panic+0xa7/0x16f <4> [<ffffffffa0437eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs] <4> [<ffffffffa0553c52>] ? llog_write+0x352/0x420 [obdclass] <4> [<ffffffffa0553e0c>] ? llog_cancel_rec+0xbc/0x7c0 [obdclass] <4> [<ffffffffa0558bcc>] ? llog_cat_cleanup+0x9c/0x260 [obdclass] <4> [<ffffffffa0559fd8>] ? llog_cat_process_cb+0x588/0x630 [obdclass] <4> [<ffffffffa0554e6b>] ? llog_process_thread+0x95b/0xd70 [obdclass] <4> [<ffffffffa0559a50>] ? llog_cat_process_cb+0x0/0x630 [obdclass] <4> [<ffffffffa0556a67>] ? llog_process_or_fork+0x127/0x550 [obdclass] <4> [<ffffffffa0557f09>] ? llog_cat_process_or_fork+0x89/0x350 [obdclass] <4> [<ffffffff8152aede>] ? mutex_lock+0x1e/0x50 <4> [<ffffffffa094fe10>] ? changelog_kkuc_cb+0x0/0x250 [mdc] <4> [<ffffffffa05581e9>] ? llog_cat_process+0x19/0x20 [obdclass] <4> [<ffffffffa09541eb>] ? mdc_changelog_send_thread+0x4cb/0x810 [mdc] <4> [<ffffffffa0953d20>] ? mdc_changelog_send_thread+0x0/0x810 [mdc] <4> [<ffffffff8109e66e>] ? kthread+0x9e/0xc0 <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20 <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0 <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20 The problem seems to have been introduced by patch for This problem had not been seen before since concerned change is only being executed when triggering some catalog inconsistency. Looks like master is also impacted, will push a patch soon. |
| Comments |
| Comment by Gerrit Updater [ 17/Apr/15 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/14489 |
| Comment by Antoine Percher [ 17/Apr/15 ] |
|
We meet the same issue on T100 system with lustre 2.5.3.90 on the robinhood node |
| Comment by Bruno Faccini (Inactive) [ 17/Apr/15 ] |
|
Having a look to old (ie, running with Lustre version not including patch from <3>LustreError: 31536:0:(llog_cat.c:192:llog_cat_id2handle()) store2-MDT0000-mdc-ffff88206515bc00: error opening log id 0x2def:1:0: rc = -2 <3>LustreError: 31536:0:(llog_cat.c:565:llog_cat_process_cb()) store2-MDT0000-mdc-ffff88206515bc00: cannot find handle for llog 0x2def:1: -2 This means that since Servers/MDS also did not run with |
| Comment by Gerrit Updater [ 23/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14489/ |
| Comment by Peter Jones [ 25/May/15 ] |
|
Landed for 2.8 |