[LU-7039] llog_osd.c:778:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: Created: 25/Aug/15  Updated: 08/Jun/20  Resolved: 27/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Cliff White (Inactive) Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None
Environment:

Hyperion SWL test


Attachments: File LU-7039.llog.txt.gz     File console.log.bz2     File lola-10-lustre-log.1444148492.4548-dm-minus-one.log.bz2     File lustre-log.1443755187.9078     File memory-counter-lola-11.dat.bz2     File messages-lola-11.log.bz2     File slab-details-lola-11.dat.bz2     File slab-details-one-file-per-slab.tar.bz2     File slab-total-lola-11.dat.bz2     File vmcore-dmesg.txt.bz2    
Issue Links:
Duplicate
is duplicated by LU-7391 (osp_md_object.c:1155:osp_md_write())... Resolved
is duplicated by LU-7100 conf-sanity test_84 LBUGS with “(llog... Closed
Related
is related to LU-6831 The ticket for tracking all DNE2 bugs Reopened
is related to LU-7715 out_handle() misuses GOTO() Resolved
is related to LU-6994 MDT recovery timer goes negative, rec... Resolved
is related to LU-7455 Tracking tickets to make DNE pass soa... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running tip of master with SWL and DNE.

2015-08-25 12:19:21 Lustre: lustre-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
2015-08-25 12:19:21 LustreError: 6378:0:(llog_osd.c:778:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
2015-08-25 12:19:21 LustreError: 6384:0:(llog_osd.c:788:llog_osd_next_block()) lustre-MDT000b-osp-MDT0001: invalid llog tail at log id 0x3:2147484674/0 offset 3407872
2015-08-25 12:19:21 LustreError: 6384:0:(lod_dev.c:392:lod_sub_recovery_thread()) lustre-MDT000b-osp-MDT0001 getting update log failed: rc = -22
2015-08-25 12:19:21 LustreError: 6378:0:(llog_osd.c:778:llog_osd_next_block()) LBUG
2015-08-25 12:19:21 Pid: 6378, comm: lod0001_rec0005
2015-08-25 12:19:21 iws12 kernel: Call Trace: (console output interleaved with a duplicate of the assertion message above)
2015-08-25 12:19:21  [<ffffffffa04a2875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2015-08-25 12:19:21  [<ffffffffa04a2e77>] lbug_with_loc+0x47/0xb0 [libcfs]
2015-08-25 12:19:21  [<ffffffffa08dad15>] llog_osd_next_block+0xb75/0xbf0 [obdclass]
2015-08-25 12:19:21  [<ffffffffa08ccb4e>] llog_process_thread+0x... (rest of the trace lost to interleaved "Initializing cgroup subsys" boot messages)

Attempting to recreate and get a dump



 Comments   
Comment by Cliff White (Inactive) [ 25/Aug/15 ]

Repeat, got the full stack this time:

2015-08-25 12:33:20 Lustre: lustre-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
2015-08-25 12:33:20 LustreError: 6140:0:(llog_osd.c:778:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
2015-08-25 12:33:20 LustreError: 6134:0:(llog_osd.c:788:llog_osd_next_block()) lustre-MDT0005-osp-MDT0001: invalid llog tail at log id 0x3:1073742851/0 offset 3407872
2015-08-25 12:33:20 LustreError: 6134:0:(lod_dev.c:392:lod_sub_recovery_thread()) lustre-MDT0005-osp-MDT0001 getting update log failed: rc = -22
2015-08-25 12:33:20 LustreError: 6140:0:(llog_osd.c:778:llog_osd_next_block()) LBUG
2015-08-25 12:33:20 Pid: 6140, comm: lod0001_rec000b
2015-08-25 12:33:20
2015-08-25 12:33:20 Call Trace:
2015-08-25 12:33:20  [<ffffffffa04a2875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2015-08-25 12:33:20  [<ffffffffa04a2e77>] lbug_with_loc+0x47/0xb0 [libcfs]
2015-08-25 12:33:20  [<ffffffffa05f4d15>] llog_osd_next_block+0xb75/0xbf0 [obdclass]
2015-08-25 12:33:20  [<ffffffffa05e6b4e>] llog_process_thread+0x2de/0xfc0 [obdclass]
2015-08-25 12:33:20  [<ffffffffa05e78ed>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
2015-08-25 12:33:20  [<ffffffffa12bc990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
2015-08-25 12:33:20  [<ffffffffa05eadb8>] llog_cat_process_cb+0x458/0x600 [obdclass]
2015-08-25 12:33:20  [<ffffffffa05e71ba>] llog_process_thread+0x94a/0xfc0 [obdclass]
2015-08-25 12:33:20  [<ffffffffa05e78ed>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
2015-08-25 12:33:20  [<ffffffffa05ea960>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
2015-08-25 12:33:20  [<ffffffffa05e961d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
2015-08-25 12:33:20  [<ffffffffa12bc990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
2015-08-25 12:33:20  [<ffffffffa12bc990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
2015-08-25 12:33:20  [<ffffffffa04aebf1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2015-08-25 12:33:20  [<ffffffffa05e9789>] llog_cat_process+0x19/0x20 [obdclass]
2015-08-25 12:33:20  [<ffffffffa12c1e4a>] lod_sub_recovery_thread+0x69a/0xbc0 [lod]
2015-08-25 12:33:20  [<ffffffffa12c17b0>] ? lod_sub_recovery_thread+0x0/0xbc0 [lod]
2015-08-25 12:33:20  [<ffffffff8109e78e>] kthread+0x9e/0xc0
2015-08-25 12:33:20  [<ffffffff8100c28a>] child_rip+0xa/0x20
2015-08-25 12:33:20  [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
2015-08-25 12:33:20  [<ffffffff8100c280>] ? child_rip+0x0/0x20

Lustre log attached

Comment by Cliff White (Inactive) [ 25/Aug/15 ]

Lustre log

Comment by James Nunez (Inactive) [ 25/Aug/15 ]

I saw this on some master failures from last night. I attributed the failures to LU-3263. The assertions and stack traces look alike.

Comment by Gerrit Updater [ 26/Aug/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16084
Subject: LU-7039 llog: debug patches
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: de2d20f45a8e41a6e1510b596097e6c8ab270396

Comment by Di Wang [ 26/Aug/15 ]

Cliff: could you please try this debug patch and get the -1 level debug log for me? Thanks.

Comment by Joseph Gmitter (Inactive) [ 03/Sep/15 ]

Per Cliff: we have tried the debug patch for a couple days and didn't get a debug log. Di is aware of this and will provide a new build.

Comment by Di Wang [ 04/Sep/15 ]

Cliff: could you please try the test with this build http://review.whamcloud.com/#/c/13942/ ? I added some debug information and also a few fixes that have not landed yet.

Comment by Frank Heckes (Inactive) [ 02/Oct/15 ]

Same error happened during soak testing of build http://review.whamcloud.com/#/c/16685/
(see https://wiki.hpdd.intel.com/pages/viewpage.action?spaceKey=Releases&title=Soak+Testing+on+Lola#SoakTestingonLola-20150930)
Here's the stack trace:

Oct  1 20:06:27 lola-10 kernel: LustreError: 9078:0:(llog_osd.c:856:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: 
Oct  1 20:06:27 lola-10 kernel: LustreError: 9078:0:(llog_osd.c:856:llog_osd_next_block()) LBUG
Oct  1 20:06:27 lola-10 kernel: Pid: 9078, comm: lod0004_rec0000
Oct  1 20:06:27 lola-10 kernel: 
Oct  1 20:06:27 lola-10 kernel: Call Trace:
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa05c1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa05c1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06d5b90>] llog_osd_next_block+0xa40/0xc80 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06c6c0e>] llog_process_thread+0x2de/0xfc0 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06c79ad>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa11ac990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06cae78>] llog_cat_process_cb+0x458/0x600 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06c727a>] llog_process_thread+0x94a/0xfc0 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06c79ad>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06caa20>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06c96dd>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa11d8209>] ? lod_sub_prep_llog+0x4f9/0x7a0 [lod]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa11ac990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa06c9849>] llog_cat_process+0x19/0x20 [obdclass]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa11b1e2a>] lod_sub_recovery_thread+0x69a/0xbc0 [lod]
Oct  1 20:06:27 lola-10 kernel: [<ffffffffa11b1790>] ? lod_sub_recovery_thread+0x0/0xbc0 [lod]
Oct  1 20:06:27 lola-10 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Oct  1 20:06:27 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Oct  1 20:06:27 lola-10 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Oct  1 20:06:27 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct  1 20:06:27 lola-10 kernel: 
Oct  1 20:06:27 lola-10 kernel: LustreError: dumping log to /tmp/lustre-log.1443755187.9078

Also attached the debug log. I hope it contains some usable information.

Comment by Frank Heckes (Inactive) [ 02/Oct/15 ]

debug log of MDS node lola-10

Comment by Frank Heckes (Inactive) [ 02/Oct/15 ]

The node continuously panics with the same error at the moment. One thing to add about the test set-up: since we hit LDEV-193,
the MDTs are mounted with the option 'abort_recovery'.

Comment by Gerrit Updater [ 02/Oct/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16714
Subject: LU-7039 llog: remove LASSERT check for record index
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f07a1980bd1e5cb08a00108ef286adc6bad4b4ea

Comment by Di Wang [ 02/Oct/15 ]

Frank, Cliff: I added some debug information in this patch. Could you please run the test with this patch and collect a -1 level debug log for me? Thank you.
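
One common way to collect such a log on the affected MDS is sketched below (the buffer size and output path are assumptions; adjust as needed):

lctl set_param debug=-1
lctl set_param debug_mb=1024
# ... reproduce the failure ...
lctl dk > /tmp/lustre-debug.$(hostname).log

On an LBUG the debug buffer is also dumped automatically, e.g. to /tmp/lustre-log.<seconds>.<pid> as shown in the console log above.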

Comment by Frank Heckes (Inactive) [ 05/Oct/15 ]

Yep, we'll create a build containing your patch and the fix for LDEV-24; otherwise this bug has no chance of occurring.

Comment by Frank Heckes (Inactive) [ 05/Oct/15 ]

lola is updated with your patch, but MDSes crash on first mount.

Comment by Frank Heckes (Inactive) [ 06/Oct/15 ]

A single debug message besides the LBUG was written:

Oct  5 09:51:39 lola-10 kernel: LustreError: 14865:0:(llog_osd.c:866:llog_osd_next_block()) soaked-MDT0000-osp-MDT0004: invalid llog head/tail at log id 0x2:165027/0 offset 163840 last_rec 1515870810/1515870810/1515870810/1515870810tail 0/0
Oct  5 09:51:39 lola-10 kernel: LustreError: 14865:0:(llog_osd.c:868:llog_osd_next_block()) LBUG
Oct  5 09:51:39 lola-10 kernel: Pid: 14865, comm: lod0004_rec0000
Oct  5 09:51:39 lola-10 kernel: 
Oct  5 09:51:39 lola-10 kernel: Call Trace:
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa05c1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa05c1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06d5c23>] llog_osd_next_block+0xad3/0xd70 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06c6c0e>] llog_process_thread+0x2de/0xfc0 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06c79ad>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa11ae990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06cae78>] llog_cat_process_cb+0x458/0x600 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06c727a>] llog_process_thread+0x94a/0xfc0 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06c79ad>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06caa20>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06c96dd>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa11da549>] ? lod_sub_prep_llog+0x4f9/0x7a0 [lod]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa11ae990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa06c9849>] llog_cat_process+0x19/0x20 [obdclass]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa11b3e2a>] lod_sub_recovery_thread+0x69a/0xbc0 [lod]
Oct  5 09:51:39 lola-10 kernel: [<ffffffffa11b3790>] ? lod_sub_recovery_thread+0x0/0xbc0 [lod]
Oct  5 09:51:39 lola-10 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Oct  5 09:51:39 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Oct  5 09:51:39 lola-10 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Oct  5 09:51:39 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct  5 09:51:39 lola-10 kernel:
Comment by Di Wang [ 06/Oct/15 ]

According to the debug log

00000004:00000010:25.0:1444148492.324048:0:4548:0:(osp_trans.c:215:osp_update_request_destroy()) kfreed 'our': 72 at ffff880816a30740.
00000004:00000040:25.0:1444148492.324050:0:4548:0:(osp_md_object.c:1208:osp_md_read()) soaked-MDT0000-osp-MDT0004: total read [0x2000284a3:0x2:0x0] pos 163840 len 32768
00000004:00000001:25.0:1444148492.324059:0:4548:0:(osp_md_object.c:1209:osp_md_read()) Process leaving via out (rc=32768 : 32768 : 0x8000)
00000040:00020000:25.0:1444148492.324062:0:4548:0:(llog_osd.c:866:llog_osd_next_block()) soaked-MDT0000-osp-MDT0004: invalid llog head/tail at log id 0x2:165027/0 offset 163840 last_rec 1515870810/1515870810/1515870810/1515870810tail 0/0

tail->lrt_len = 0 and tail->lrt_index = 0. Hmm, it seems the update log is corrupted.

Frank: could you please mount MDT0 as ldiskfs,

mount -t ldiskfs mdt0-dev  /mnt/mds1

and get me this file /mnt/mds1/update_log_dir/[0x2000284a3:0x2:0x0] and post it here. Thanks.

Comment by Di Wang [ 07/Oct/15 ]

Sigh. According to the debug log and the actual llog update file, it seems the llog file is corrupted, and the end of the file is filled with junk. Unfortunately there is not enough information to tell why it was corrupted.

To fix the filesystem, mount all of the MDTs as ldiskfs and delete these files:

[root@testnode tests]# mount -t ldiskfs -o loop /tmp/lustre-mdtn /mnt/mdsn
[root@testnode tests]# rm -rf /mnt/mdsn/update_log
[root@testnode tests]# rm -rf /mnt/mdsn/update_log_dir
[root@testnode tests]# umount /mnt/mdsn

Then restart Lustre.
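
For a test setup with several MDTs the same cleanup can be looped; a rough sketch, assuming loopback-mounted MDT backing files and mount points as in the example above:

for i in 1 2 3 4; do
    mkdir -p /mnt/mds$i
    mount -t ldiskfs -o loop /tmp/lustre-mdt$i /mnt/mds$i
    rm -rf /mnt/mds$i/update_log /mnt/mds$i/update_log_dir
    umount /mnt/mds$i
done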

Comment by Gerrit Updater [ 07/Oct/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16740
Subject: LU-7039 llog: skip to next chunk for corrupt record
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c612e7c78344b2ff88c0d5803b558c9991ab766f

Comment by Di Wang [ 07/Oct/15 ]

This is not a real fix; it just fixes the llog reader so that it skips the corrupt records.

Comment by Mikhail Pershin [ 08/Oct/15 ]

The same issue as LU-7100, but with a different llog.

Comment by Mikhail Pershin [ 08/Oct/15 ]

Di, I noticed a change you made some time ago:

	lgi->lgi_off = max_t(__u64, lgi->lgi_attr.la_size, lgi->lgi_off);

I'm not sure why that was done, could you explain it a bit? IIRC, it was done because la_size may not yet be updated after the first write, e.g. of the header, right?
If so, then the same can happen with a pad record: consider that a pad was written and the lgi_off for that offset is lost afterwards - it is rewritten with 0 for the header update, but la_size may not yet be updated, so the record itself will overwrite the pad record.

The solution could be to save lgi_off in a local variable after writing the pad and use that.
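
For illustration only, a tiny self-contained toy model of this race and the suggested workaround (all names, sizes and the pad length are made up; this is not the actual llog_osd_write() code):

#include <stdio.h>

/* Toy model: la_size is the (possibly stale) attribute size, lgi_off is the
 * write offset that gets reset to 0 for the header update after the pad
 * record was written. */
int main(void)
{
        unsigned long long la_size = 8192;  /* not yet updated after the pad write */
        unsigned long long lgi_off = 8192;  /* offset where the pad record went */
        unsigned long long pad_len = 512;   /* assumed pad record length */

        /* suggested fix: remember the post-pad offset in a local variable */
        unsigned long long saved_off = lgi_off + pad_len;

        lgi_off = 0;                        /* reused for the header update */

        /* current pattern: max(la_size, lgi_off) falls back to the stale size,
         * so the next record lands at 8192 and overwrites the pad record */
        unsigned long long next_off = la_size > lgi_off ? la_size : lgi_off;

        printf("offset from stale size: %llu (overwrites the pad)\n", next_off);
        printf("saved offset:           %llu (lands after the pad)\n", saved_off);
        return 0;
}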

Comment by Mikhail Pershin [ 08/Oct/15 ]

I've just cooked up a patch, if that makes sense: http://review.whamcloud.com/16771

Comment by Di Wang [ 08/Oct/15 ]

If so, then the same can happen with a pad record: consider that a pad was written and the lgi_off for that offset is lost afterwards - it is rewritten with 0 for the header update, but la_size may not yet be updated, so the record itself will overwrite the pad record.

Hmm, we are supposed to update the size in the OSP cache, see osp_md_write(). Note: this is only for the remote llog object; for a local llog object, adding a pad record updates la_size in time. Since you also see this problem in LU-7100 for a local log, I doubt this is the reason.

Comment by Mikhail Pershin [ 09/Oct/15 ]

If so, then we don't need max_t(__u64, lgi->lgi_attr.la_size, lgi->lgi_off) anymore, right? It can just be la_size always.

Comment by Mikhail Pershin [ 09/Oct/15 ]

As far as I can see, obj->ooa may not exist, and it seems that in most cases that is so, because it is created only upon creation of the file, which is not the case with an llog. Nevertheless, I agree that my suggestion doesn't explain the corruption in a local llog. I've added some debugging to the patch; let's wait for new occurrences.

Comment by Frank Heckes (Inactive) [ 09/Oct/15 ]

Both patches for this bug are installed, plus the LDEV-24 patch that landed after the build was created (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151007), and the cleanup procedure was executed.
The error hasn't happened again since then.
The following error message was printed during failover of MDT0004 from lola-10 --> lola-11, and on other MDSes too:

Oct  8 16:23:23 lola-11 kernel: LDISKFS-fs (dm-6): recovery complete
Oct  8 16:23:23 lola-11 kernel: LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=on. Opts: 
Oct  8 16:23:23 lola-11 kernel: Lustre: soaked-MDT0006: Connection restored to 192.168.1.111@o2ib10 (at 0@lo)
Oct  8 16:23:23 lola-11 kernel: Lustre: Skipped 3 previous similar messages
Oct  8 16:23:23 lola-11 kernel: LustreError: 11-0: soaked-MDT0003-osp-MDT0005: operation mds_connect to node 192.168.1.109@o2ib10 failed: rc = -114
Oct  8 16:23:23 lola-11 kernel: LustreError: Skipped 2 previous similar messages
Oct  8 16:23:23 lola-11 kernel: Lustre: soaked-MDT0005: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
Oct  8 16:23:23 lola-11 sshd[5890]: Received disconnect from 10.4.0.116: 11: disconnected by user
Oct  8 16:23:23 lola-11 sshd[5890]: pam_unix(sshd:session): session closed for user root
Oct  8 16:23:23 lola-11 kernel: LustreError: 5981:0:(llog.c:534:llog_process_thread()) soaked-MDT0006-osp-MDT0005: Invalid record: index 8 but expected 7
Oct  8 16:23:23 lola-11 kernel: LustreError: 5981:0:(lod_dev.c:392:lod_sub_recovery_thread()) soaked-MDT0006-osp-MDT0005 getting update log failed: rc = -34

Not sure whether it is related to this bug.

Comment by Gerrit Updater [ 12/Oct/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16797
Subject: LU-7039 tgt: Delete txn_callback correctly in tgt_init()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 57daeb1473bbf2cf3c3cfde83890818beee55f28

Comment by Gerrit Updater [ 15/Oct/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16839
Subject: LU-7039 osp: set update req error once import invalid
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 946fbe320ab6f093885247ec8234f2e4a28b50dc

Comment by Frank Heckes (Inactive) [ 16/Oct/15 ]

Soak ran with the CentOS-6.6 build associated with change http://review.whamcloud.com/#/c/16838/ .
After a reset of a single MDS and remount, the error occurred again:

Oct 15 16:08:03 lola-10 kernel: LDISKFS-fs warning (device dm-5): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
Oct 15 16:08:03 lola-10 kernel: 
Oct 15 16:09:00 lola-10 kernel: LDISKFS-fs (dm-5): recovery complete
Oct 15 16:09:00 lola-10 kernel: LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. quota=on. Opts: 
Oct 15 16:09:01 lola-10 kernel: LustreError: 137-5: soaked-MDT0004_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 15 16:09:01 lola-10 kernel: Lustre: soaked-MDT0005: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
Oct 15 16:09:01 lola-10 kernel: Lustre: 4390:0:(llog.c:520:llog_process_thread()) invalid length 0 in llog record for index 0/43
Oct 15 16:09:01 lola-10 kernel: LustreError: 4390:0:(lod_dev.c:402:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0005 getting update log failed: rc = -22
Oct 15 16:09:01 lola-10 kernel: LustreError: 4242:0:(mdt_handler.c:5596:mdt_iocontrol()) soaked-MDT0005: Aborting recovery for device
Oct 15 16:09:01 lola-10 kernel: LustreError: 4242:0:(ldlm_lib.c:2473:target_stop_recovery_thread()) soaked-MDT0005: Aborting recovery
Oct 15 16:09:01 lola-10 kernel: Lustre: 4395:0:(ldlm_lib.c:1946:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Oct 15 16:09:01 lola-10 kernel: Lustre: soaked-MDT0005: disconnecting 16 stale clients
Oct 15 16:09:07 lola-10 kernel: Lustre: soaked-MDT0005: Denying connection for new client e20f50f6-1966-798c-abf7-2d3bde20760a(at 192.168.1.126@o2ib100), waiting for 16 known clients (0 recovered, 0 in progress, and 16 evicted) to recover in 21116921:47
Oct 15 16:09:08 lola-10 kernel: LustreError: 4393:0:(llog_osd.c:856:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: 
Oct 15 16:09:08 lola-10 kernel: LustreError: 4393:0:(llog_osd.c:856:llog_osd_next_block()) LBUG
Oct 15 16:09:08 lola-10 kernel: Pid: 4393, comm: lod0005_rec0006
Oct 15 16:09:08 lola-10 kernel: 
Oct 15 16:09:08 lola-10 kernel: Call Trace:
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa05c1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa05c1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06d5bab>] llog_osd_next_block+0xa4b/0xc90 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06c6c2e>] llog_process_thread+0x2de/0xfc0 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06c79cd>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa11a7990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06cae98>] llog_cat_process_cb+0x458/0x600 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06c729a>] llog_process_thread+0x94a/0xfc0 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa07168b4>] ? dt_read+0x14/0x50 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06c79cd>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06caa40>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06c96fd>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa11d3589>] ? lod_sub_prep_llog+0x4f9/0x7a0 [lod]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa11a7990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa06c9869>] llog_cat_process+0x19/0x20 [obdclass]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa11a8c8e>] lod_sub_recovery_thread+0x26e/0xb90 [lod]
Oct 15 16:09:08 lola-10 kernel: [<ffffffffa11a8a20>] ? lod_sub_recovery_thread+0x0/0xb90 [lod]
Oct 15 16:09:08 lola-10 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Oct 15 16:09:08 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Oct 15 16:09:08 lola-10 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Oct 15 16:09:08 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oct 15 16:09:08 lola-10 kernel: 
Comment by Di Wang [ 17/Oct/15 ]

I just figured out a way to reproduce the problem, and it also proves my theory that failed llog record writes cause the corruption. I will update the patch.

Comment by Di Wang [ 18/Oct/15 ]

I updated the fix, along with other DNE patches, in http://review.whamcloud.com/16838 . Once testing confirms that this actually works, I will update 16389.

Comment by Mikhail Pershin [ 19/Oct/15 ]

Di, could you describe what caused the llog corruption? Was it wrong usage of the llog by an external caller, or a problem inside the llog code itself?

Comment by Di Wang [ 19/Oct/15 ]

Please check the test case replay-single 117 in patch 16838. Basically it is because the llog header is not protected by a lock when it is used by OSP for the update log. It happens like this:

1. For example, OSP adds 3 updates to the OSP sending list. All of them will be written to the same remote llog file, so they share the same bitmap, log index, etc. The bitmap in the 2nd will include the 1st's bit, and the 3rd includes both the 1st and 2nd bits.

2. The 1st update then fails for some reason, which can happen especially when we restart 2 MDTs at the same time in the soak test.

3. After recovery, it keeps writing the 2nd and 3rd updates, whose bitmaps include the 1st bit, but there is no 1st record because of the failure in step 2.

So right now, I will let all of the updates in the sending list fail once this happens, then refresh the header and the llog size for the new RPC.
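
A rough, self-contained sketch of that behaviour (not the real OSP code; record indices and error values are made up):

#include <stdio.h>

/* Once one queued update fails, fail every remaining update in the sending
 * list as well, since their bitmaps already assume the failed record exists;
 * the llog header and size are then refreshed before the next RPC is built. */
struct queued_update {
        int index;      /* llog record index the update would write */
        int rc;         /* 0 on success, negative errno otherwise */
};

static void flush_sending_list(struct queued_update *list, int n, int first_rc)
{
        int rc = 0, i;

        for (i = 0; i < n; i++) {
                if (rc != 0) {
                        list[i].rc = rc;   /* do not send it, just fail it */
                        continue;
                }
                /* simulate the first update failing, e.g. during an MDT restart */
                list[i].rc = (i == 0) ? first_rc : 0;
                rc = list[i].rc;
        }
        /* here the real code would re-read the llog header, bitmap and size
         * so the next RPC matches the on-disk state */
}

int main(void)
{
        struct queued_update list[3] = { {10, 0}, {11, 0}, {12, 0} };
        int i;

        flush_sending_list(list, 3, -5 /* stand-in for -EIO */);
        for (i = 0; i < 3; i++)
                printf("record index %d: rc = %d\n", list[i].index, list[i].rc);
        return 0;
}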

That said, this is probably not the reason for the local log corruption. Another thing I do not understand is why this only started to happen recently (in the last 2 weeks). I checked the recent commits for llog and did not find any suspicious patch. Do you have any clue?

Comment by Mikhail Pershin [ 26/Oct/15 ]

I don't see how dt_attr_get() is synchronized with dt_write() over OSP. As far as I can see, there is no cache for attributes if the llog was only opened rather than created. That means dt_attr_get() will cause an RPC to another server, but dt_write() might not even be applied remotely yet, so the returned size might be smaller and the llog will be corrupted as a result. Is that true, or am I missing something?

Comment by Di Wang [ 26/Oct/15 ]

The llog update (write + size update) is serialized by rpc_version; see osp_send_update_thread().

Comment by Mikhail Pershin [ 26/Oct/15 ]

It is not about updates but about attr_get(), which is not serialized with updates and so can return a smaller size.

Comment by Di Wang [ 26/Oct/15 ]

They are serialized by lgh_lock in llog_cat_add_rec().

Comment by Mikhail Pershin [ 27/Oct/15 ]

I don't see how lgh_lock helps here. Consider the following scenario:

  • llog_osd_write() issues a new record; the write RPC is scheduled for another server
  • a new llog_osd_write() is called immediately and issues an attr_get() RPC for the same llog to the remote server
  • since these two RPCs are not serialized and were issued almost together, technically the attr_get() RPC might return the size before the write is applied on the remote server

This would not be a problem if the llog write RPC were synchronous, but it is not, is it?

Comment by Di Wang [ 27/Oct/15 ]

Hmm, Mike, I think you are right. We need to update the size for the OSP object after a write.

Comment by Mikhail Pershin [ 27/Oct/15 ]

Do you mean to make opo_ooa exist all the time? That would solve this problem, I think.

Comment by Di Wang [ 27/Oct/15 ]

Yes, I believe so. I added this fix to 16838 (along with a few other changes), and the soak test will show whether this resolves the corruption issue.
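
A rough sketch of that direction (not the real OSP code; the structure and function names here are invented): keep a locally cached size for the remote llog object, bump it on every buffered write, and let attr_get() prefer the cached value so it cannot race with a write RPC that is still in flight.

#include <stdint.h>
#include <stdio.h>

struct osp_size_cache {         /* stands in for what opo_ooa would hold */
        uint64_t size;
        int      valid;
};

static void osp_write_sketch(struct osp_size_cache *c, uint64_t off, uint64_t len)
{
        if (!c->valid || off + len > c->size)
                c->size = off + len;    /* grow the cached size with each write */
        c->valid = 1;
}

static uint64_t osp_size_sketch(const struct osp_size_cache *c, uint64_t remote_size)
{
        /* the remote size may lag behind writes that were only queued */
        return c->valid ? c->size : remote_size;
}

int main(void)
{
        struct osp_size_cache c = { 0, 0 };

        osp_write_sketch(&c, 8192, 512);        /* record queued for the remote llog */
        printf("size used for the next record: %llu\n",
               (unsigned long long)osp_size_sketch(&c, 8192 /* stale remote size */));
        return 0;
}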

Comment by Gerrit Updater [ 27/Oct/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16969
Subject: LU-7039 llog: update llog header and size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9191d9a7f79c739d71dc7652333b9f07456218ad

Comment by Gerrit Updater [ 28/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16797/
Subject: LU-7039 tgt: Delete txn_callback correctly in tgt_init()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 44e9ec0b46fc46cc72bebbdc35e4a59a0397a81c

Comment by Frank Heckes (Inactive) [ 29/Oct/15 ]

Di: For build '20151027' (https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151027),
which as far as I can see doesn't include change 16797, the problem is still present:

Oct 28 07:10:59 lola-8 kernel: LustreError: 6954:0:(llog_osd.c:874:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: 
Oct 28 07:10:59 lola-8 kernel: LustreError: 6954:0:(llog_osd.c:874:llog_osd_next_block()) LBUG
Oct 28 07:10:59 lola-8 kernel: Pid: 6954, comm: lod0003_rec0007
Oct 28 07:10:59 lola-8 kernel: 
Oct 28 07:10:59 lola-8 kernel: Call Trace:
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa07fc875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa07fce77>] lbug_with_loc+0x47/0xb0 [libcfs]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa0917a7b>] llog_osd_next_block+0xa4b/0xc90 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffff81174450>] ? cache_alloc_refill+0x1c0/0x240
Oct 28 07:10:59 lola-8 kernel: LustreError: 6948:0:(llog.c:534:llog_process_thread()) soaked-MDT0000-osp-MDT0003: Invalid record: index 9421 but expected 9420
Oct 28 07:10:59 lola-8 kernel: LustreError: 6948:0:(lod_dev.c:402:lod_sub_recovery_thread()) soaked-MDT0000-osp-MDT0003 getting update log failed: rc = -34
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa0906d3e>] llog_process_thread+0x2de/0xfc0 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa0904d5c>] ? llog_init_handle+0x11c/0x950 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa0907add>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa1374990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa090b108>] llog_cat_process_cb+0x458/0x600 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa09073aa>] llog_process_thread+0x94a/0xfc0 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa0907add>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa090acb0>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa090996d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa13a0589>] ? lod_sub_prep_llog+0x4f9/0x7a0 [lod]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa1374990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa0909ad9>] llog_cat_process+0x19/0x20 [obdclass]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa1375c8e>] lod_sub_recovery_thread+0x26e/0xb90 [lod]
Oct 28 07:10:59 lola-8 kernel: [<ffffffffa1375a20>] ? lod_sub_recovery_thread+0x0/0xb90 [lod]
Oct 28 07:10:59 lola-8 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Oct 28 07:10:59 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Comment by Di Wang [ 29/Oct/15 ]

Frank: I updated http://review.whamcloud.com/16838 and added more debug information there. Could you please retry with that patch? Thanks.

Comment by Gerrit Updater [ 30/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16740/
Subject: LU-7039 llog: skip to next chunk for corrupt record
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 04f4023cf59b6e5a1634ba492cd813dcb1af0c7c

Comment by Gerrit Updater [ 15/Nov/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/17199
Subject: LU-7039 recovery: abort update recovery once fails
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 70905df1d7ea16d50c927b6af9957bced89a0f3b

Comment by Di Wang [ 20/Nov/15 ]

Just an update: it seems the corruption has disappeared in the 20151120 build, though we need to run more tests to confirm this. Currently the soak test is blocked by LU-7456, and we will continue soak testing to check this problem once LU-7456 is fixed.

Comment by Di Wang [ 24/Nov/15 ]

Just a reminder: http://review.whamcloud.com/16969 and http://review.whamcloud.com/17199 are the key fixes for this problem.

Comment by Frank Heckes (Inactive) [ 04/Dec/15 ]

The error described below happened during soak testing of change 16838 patch set #31 (no wiki entry for the build exists yet) on cluster lola. DNE is enabled and the MDSes are configured in an active-active HA failover configuration.

  • Primary MDT resources of lola-11 were failed back on Dec 3 at 20:18.
    Slab allocation then increased continuously to ~31 GB until the crash.
  • MDS node lola-11 crashed with the oom-killer on Dec 4 at 00:21 (local time) (see also LU-7432).
  • ptlrpc_cache seems to be the biggest consumer.

Attached are lola-11's messages, console log, vmcore-dmesg file, and collectl (version V4.0.2-1) files for the time interval specified above. Also attached are files containing extracted counters for memory, slab totals, and per-slab allocation.

The crash dump has been saved to lola-1:/scratch/crashdumps/lu-7039/127.0.0.1-2015-12-04-00\:22\:36.

Comment by Frank Heckes (Inactive) [ 04/Dec/15 ]

It turned out that the collectl raw files are too big to be uploaded to Jira. I saved them to lola-1:/scratch/crashdumps/lu-7039.

Comment by Di Wang [ 04/Dec/15 ]

This OOM is not caused by llog corruption, so it is a new problem (LU-7517).

Comment by Sarah Liu [ 19/Jan/16 ]

Another instance on master in DNE mode:
https://testing.hpdd.intel.com/test_sets/b57ce146-bbfd-11e5-8506-5254006e85c2
Client and server: lustre-master build #3305

Comment by Gerrit Updater [ 26/Jan/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16969/
Subject: LU-7039 llog: update llog header and size
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e1745ed18d8e28f3cf3d72df3b7ef50d83f36601

Comment by Joseph Gmitter (Inactive) [ 27/Jan/16 ]

Landed for 2.8.0
