[LU-8673] kernel panic when mounting MDS after restore Created: 06/Oct/16  Updated: 22/Sep/22  Resolved: 06/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Martijn Brizee Assignee: WC Triage
Resolution: Incomplete Votes: 0
Labels: None
Environment:

os: CentOS 7.2
kernel: Linux master 3.10.0-327.3.1.el7_lustre.x86_64 #1 SMP Thu Feb 18 10:53:23 PST 2016 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Duplicate
is duplicated by LU-16182 llog_osd_prev_block() ASSERTION( last... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hi,

We are running Lustre 2.8 on CentOS 7.2.

I did a file-level restore of our MDS, but when I try to mount the MDS to start Lustre, I get the following kernel panic:

Message from syslogd@master at Oct 6 19:26:18 ...
kernel:LustreError: 29516:0:(llog_osd.c:1075:llog_osd_prev_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:

Message from syslogd@master at Oct 6 19:26:18 ...
kernel:LustreError: 29516:0:(llog_osd.c:1075:llog_osd_prev_block()) LBUG
Oct 6 19:26:18 master kernel: Pid: 29516, comm: mount.lustre
Oct 6 19:26:18 master kernel: #012Call Trace:
Oct 6 19:26:18 master kernel: [<ffffffffa099b7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Oct 6 19:26:18 master kernel: [<ffffffffa099bd75>] lbug_with_loc+0x45/0xc0 [libcfs]
Oct 6 19:26:18 master kernel: [<ffffffffa0c5e917>] llog_osd_prev_block+0x9f7/0xaf0 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c4fee7>] llog_reverse_process+0x147/0xac0 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c531f6>] ? llog_cat_id2handle+0x336/0x660 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa131e6c0>] ? changelog_init_cb+0x0/0x1f0 [mdd]
Oct 6 19:26:18 master kernel: [<ffffffffa0c54917>] llog_cat_reverse_process_cb+0x157/0x540 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c50009>] llog_reverse_process+0x269/0xac0 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c547c0>] ? llog_cat_reverse_process_cb+0x0/0x540 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c50e39>] llog_cat_reverse_process+0x199/0x2d0 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa131e6c0>] ? changelog_init_cb+0x0/0x1f0 [mdd]
Oct 6 19:26:18 master kernel: [<ffffffffa13250c9>] mdd_prepare+0x1269/0x1a00 [mdd]
Oct 6 19:26:18 master kernel: [<ffffffffa11e1d01>] mdt_prepare+0x51/0x3b0 [mdt]
Oct 6 19:26:18 master kernel: [<ffffffffa0cbf0c4>] server_start_targets+0x2574/0x2e10 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c8b290>] ? class_config_llog_handler+0x0/0x1a80 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0cc09ed>] server_fill_super+0x108d/0x184c [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c94e48>] lustre_fill_super+0x328/0x950 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffffa0c94b20>] ? lustre_fill_super+0x0/0x950 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffff811e1ccd>] mount_nodev+0x4d/0xb0
Oct 6 19:26:18 master kernel: [<ffffffffa0c8d108>] lustre_mount+0x38/0x60 [obdclass]
Oct 6 19:26:18 master kernel: [<ffffffff811e2679>] mount_fs+0x39/0x1b0
Oct 6 19:26:18 master kernel: [<ffffffff811fdf1f>] vfs_kern_mount+0x5f/0xf0
Oct 6 19:26:18 master kernel: [<ffffffff8120046e>] do_mount+0x24e/0xa40
Oct 6 19:26:18 master kernel: [<ffffffff8116defe>] ? __get_free_pages+0xe/0x50
Oct 6 19:26:18 master kernel: [<ffffffff81200cf6>] SyS_mount+0x96/0xf0
Oct 6 19:26:18 master kernel: [<ffffffff81645b09>] system_call_fastpath+0x16/0x1b
Oct 6 19:26:18 master kernel:
Oct 6 19:26:18 master kernel: Kernel panic - not syncing: LBUG

Message from syslogd@master at Oct 6 19:26:18 ...
kernel:Kernel panic - not syncing: LBUG

The messages file has a few more lines logged just before the kernel panic:
Oct 6 19:26:17 master kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
Oct 6 19:26:18 master kernel: Lustre: MGS: Connection restored to 01e7b6f4-8781-be82-634f-62413c0d2a26 (at 0@lo)
Oct 6 19:26:18 master kernel: LustreError: 29706:0:(llog.c:598:llog_process_thread()) Local llog found corrupted
Oct 6 19:26:18 master kernel: LustreError: 29706:0:(llog.c:598:llog_process_thread()) Skipped 1 previous similar message
Oct 6 19:26:18 master kernel: LustreError: 29516:0:(llog_osd.c:1075:llog_osd_prev_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
Oct 6 19:26:18 master kernel: LustreError: 29516:0:(llog_osd.c:1075:llog_osd_prev_block()) LBUG

An e2fsck of the volume showed no errors before the crash.



 Comments   
Comment by Bruno Faccini (Inactive) [ 06/Oct/16 ]

Hello Martijn,
Can you describe in more detail what you mean by "file level restore of our MDS", and why you had to use it?
I am asking because, from the crash/LBUG stack that you have attached, it seems that there is some unexpected inaccurate/corrupted content in one of the ChangeLog LLOGs.
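
For readers unfamiliar with the assertion in the trace: it compares the index stored in a llog record header (lrh_index) with the index stored in a record tail (lrt_index). The following is only a minimal sketch of that kind of consistency check, not the Lustre implementation. It assumes the record layout of struct llog_rec_hdr (four 32-bit fields: lrh_len, lrh_index, lrh_type, lrh_id) and struct llog_rec_tail (two 32-bit fields: lrt_len, lrt_index), assumes little-endian byte order, and runs on synthetic data. The names check_block and make_record are hypothetical helpers introduced for illustration.

```python
import struct

# Assumed layouts (little-endian):
#   header: lrh_len, lrh_index, lrh_type, lrh_id
#   tail:   lrt_len, lrt_index
HDR = struct.Struct("<IIII")
TAIL = struct.Struct("<II")


def check_block(buf):
    """Walk the records in one buffer and return a list of (offset, problem)."""
    problems = []
    off = 0
    while off + HDR.size + TAIL.size <= len(buf):
        lrh_len, lrh_index, _lrh_type, _lrh_id = HDR.unpack_from(buf, off)
        # A record must at least hold its own header and tail and fit the buffer.
        if lrh_len < HDR.size + TAIL.size or off + lrh_len > len(buf):
            problems.append((off, "bad record length %d" % lrh_len))
            break
        lrt_len, lrt_index = TAIL.unpack_from(buf, off + lrh_len - TAIL.size)
        if lrt_len != lrh_len:
            problems.append(
                (off, "length mismatch: header %d, tail %d" % (lrh_len, lrt_len)))
        # Same comparison as the failed assertion: header index vs. tail index.
        if lrt_index != lrh_index:
            problems.append(
                (off, "index mismatch: header %d, tail %d" % (lrh_index, lrt_index)))
        off += lrh_len
    return problems


def make_record(index, payload=b"", tail_index=None):
    """Build a synthetic record for the demo (type and id are left as 0)."""
    length = HDR.size + len(payload) + TAIL.size
    if tail_index is None:
        tail_index = index
    return HDR.pack(length, index, 0, 0) + payload + TAIL.pack(length, tail_index)


if __name__ == "__main__":
    good = make_record(1, b"x" * 8) + make_record(2)
    bad = make_record(1) + make_record(2, tail_index=7)
    print(check_block(good))
    print(check_block(bad))
```

In this sketch a consistent buffer yields an empty list, while a record whose tail index differs from its header index is reported with its byte offset. A mismatch of this kind in a restored llog file would be consistent with the "Local llog found corrupted" message above, although the sketch cannot say how the file came to be in that state.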
