[LU-7100] conf-sanity test_84 LBUGS with “(llog_osd.c:811:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index )” Created: 03/Sep/15 Updated: 19/Apr/17 Resolved: 19/Apr/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Tests run in the autotest environment |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
conf-sanity test 84 hangs at mount. We’ve seen this test LBUG with the stack trace below three times in the past month. Logs for an interop occurrence are at https://testing.hpdd.intel.com/test_sets/9145fb1a-51a8-11e5-9249-5254006e85c2

From the MDS log:

00:44:38:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
00:44:38:LustreError: 18100:0:(llog_osd.c:811:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
00:44:38:LustreError: 18100:0:(llog_osd.c:811:llog_osd_next_block()) LBUG
00:44:38:Pid: 18100, comm: llog_process_th
00:44:38:
00:44:38:Call Trace:
00:44:38: [<ffffffffa046c875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
00:44:38: [<ffffffffa046ce77>] lbug_with_loc+0x47/0xb0 [libcfs]
00:44:38: [<ffffffffa058ed25>] llog_osd_next_block+0xb75/0xbf0 [obdclass]
00:44:38: [<ffffffffa0580bae>] llog_process_thread+0x2de/0xfc0 [obdclass]
00:44:38: [<ffffffffa05cc3a5>] ? keys_fill+0xd5/0x1b0 [obdclass]
00:44:38: [<ffffffffa0581ed5>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
00:44:38: [<ffffffffa0581e90>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
00:44:38: [<ffffffff8109e78e>] kthread+0x9e/0xc0
00:44:38: [<ffffffff8100c28a>] child_rip+0xa/0x20
00:44:38: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
00:44:38: [<ffffffff8100c280>] ? child_rip+0x0/0x20
00:44:38:
00:44:38:Kernel panic - not syncing: LBUG
00:44:38:Pid: 18100, comm: llog_process_th Not tainted 2.6.32-504.30.3.el6_lustre.g339e9ad.x86_64 #1

In a different occurrence and in a DNE setup, with logs at https://testing.hpdd.intel.com/test_sets/2eae8eae-4f7d-11e5-bc53-5254006e85c2, the MDS console has a few more errors before the LBUG:

22:49:38:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
22:49:38:LustreError: 14217:0:(llog_osd.c:788:llog_osd_next_block()) lustre-MDT0000-osd: invalid llog tail at log id 0x4:10/0 offset 16384
22:49:38:LustreError: 14198:0:(mgs_llog.c:457:mgs_find_or_make_fsdb()) Can't get db from client log -22
22:49:38:LustreError: 14198:0:(mgs_llog.c:496:mgs_check_index()) Can't get db for lustre
22:49:38:LustreError: 14219:0:(llog_osd.c:778:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
22:49:38:LustreError: 14219:0:(llog_osd.c:778:llog_osd_next_block()) LBUG
22:49:38:Pid: 14219, comm: llog_process_th
22:49:38:
22:49:38:Call Trace:
22:49:38: [<ffffffffa046c875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
22:49:38: [<ffffffffa046ce77>] lbug_with_loc+0x47/0xb0 [libcfs]
22:49:38: [<ffffffffa058ed15>] llog_osd_next_block+0xb75/0xbf0 [obdclass]
22:49:38: [<ffffffffa0580b4e>] llog_process_thread+0x2de/0xfc0 [obdclass]
22:49:38: [<ffffffffa05cc0e5>] ? keys_fill+0xd5/0x1b0 [obdclass]
22:49:38: [<ffffffffa0581e75>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
22:49:38: [<ffffffffa0581e30>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
22:49:38: [<ffffffff8109e78e>] kthread+0x9e/0xc0
22:49:38: [<ffffffff8100c28a>] child_rip+0xa/0x20
22:49:38: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
22:49:38: [<ffffffff8100c280>] ? child_rip+0x0/0x20
22:49:38:
22:49:38:Kernel panic - not syncing: LBUG
22:49:38:Pid: 14219, comm: llog_process_th Not tainted 2.6.32-504.30.3.el6_lustre.gc67434c.x86_64 #1

Another set of logs on review-dne-part-1 are at https://testing.hpdd.intel.com/test_sets/189b85b6-38a5-11e5-9f03-5254006e85c2 |
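For readers unfamiliar with the failing check, here is a minimal Python sketch of the invariant the assertion appears to enforce: the tail at the end of a block should carry the same index as the header of the last record in that block. This is not Lustre code. The field layout (a 16-byte header of `lrh_len`, `lrh_index`, `lrh_type`, `lrh_id` and an 8-byte tail of `lrt_len`, `lrt_index`, all little-endian 32-bit values) is an assumption modelled on the field names in the assertion, and `make_record` and `check_block_tail` are hypothetical helpers. Where the kernel code LBUGs, the sketch returns an error.

```python
import struct

# Assumed record layout (illustrative, not taken from the Lustre source):
#   header: lrh_len, lrh_index, lrh_type, lrh_id  (4 x u32, little-endian)
#   tail:   lrt_len, lrt_index                    (2 x u32, little-endian)
HDR = struct.Struct("<IIII")
TAIL = struct.Struct("<II")


def make_record(index, payload=b"", rec_type=0, tail_index=None):
    """Build one record: header + payload + tail (hypothetical helper)."""
    length = HDR.size + len(payload) + TAIL.size
    if tail_index is None:
        tail_index = index  # a well-formed record repeats its index in the tail
    header = HDR.pack(length, index, rec_type, 0)
    tail = TAIL.pack(length, tail_index)
    return header + payload + tail


def check_block_tail(block):
    """Check that the block's final tail matches the last record's header.

    Returns (ok, reason) instead of asserting.
    """
    if len(block) < HDR.size + TAIL.size:
        return False, "block too short"
    # The tail occupies the last bytes of the block.
    lrt_len, lrt_index = TAIL.unpack_from(block, len(block) - TAIL.size)
    if lrt_len < HDR.size + TAIL.size or lrt_len > len(block):
        return False, "invalid tail length"
    # The tail length tells us where the last record's header starts.
    lrh_len, lrh_index, _type, _id = HDR.unpack_from(block, len(block) - lrt_len)
    if lrh_len != lrt_len:
        return False, "header/tail length mismatch"
    if lrh_index != lrt_index:
        return False, "header/tail index mismatch"
    return True, "ok"


if __name__ == "__main__":
    good = make_record(1, b"a" * 8) + make_record(2, b"b" * 16)
    bad = make_record(1, b"a" * 8) + make_record(2, b"b" * 16, tail_index=3)
    print(check_block_tail(good))
    print(check_block_tail(bad))
```

Returning a reason rather than asserting resembles the "invalid llog tail" error seen in the second occurrence, where the problem is reported before the LBUG fires in another thread.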
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 04/Sep/15 ] |
|
Hi Mike, |
| Comment by Andreas Dilger [ 18/Sep/15 ] |
|
Hit this a few times: |
| Comment by Gerrit Updater [ 08/Oct/15 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/16771 |
| Comment by James Nunez (Inactive) [ 19/Apr/17 ] |
|
conf-sanity test 84 has neither timed out nor failed in the past 3 months. Let's close this ticket; we can reopen it if this test or this issue comes up again. |