[LU-3263] llog_osd_next_block(): ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: Created: 02/May/13  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Li Wei (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: endianness, sparc

Issue Links:
Related
is related to LU-6968 Update the whole header in llog_cance... Resolved
Severity: 3
Rank (Obsolete): 8082

 Description   

On a SPARC machine, "./llmount.sh" hit this assertion failure:

LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) LBUG
Pid: 20452, comm: ll_mgs_0002

Call Trace:

Kernel panic - not syncing: LBUG
Call Trace:
 [00000000103a3194] lbug_with_loc+0x94/0xc0 [libcfs]
 [0000000010535fbc] llog_osd_next_block+0xb5c/0x1000 [obdclass]
 [00000000104f39d0] llog_process_thread+0x2b0/0x13a0 [obdclass]
 [00000000104f4cdc] llog_process_or_fork+0x21c/0x980 [obdclass]
 [000000001090a140] mgs_steal_llog_for_mdt_from_client+0x5e0/0xae0 [mgs]
 [000000001090b120] mgs_write_log_mdt+0xae0/0x3a60 [mgs]
 [00000000109262f8] mgs_write_log_target+0x798/0x20a0 [mgs]
 [00000000108ea624] mgs_handle_target_reg+0xd44/0x17c0 [mgs]
 [00000000108edab8] mgs_handle+0xd18/0x22a0 [mgs]
 [00000000106f5f60] ptlrpc_server_handle_request+0x980/0x16c0 [ptlrpc]
 [00000000106fc130] ptlrpc_main+0xa10/0x1680 [ptlrpc]
 [000000000042ad88] kernel_thread+0x30/0x48
 [00000000103aea44] cfs_create_thread+0x24/0x60 [libcfs]
Press Stop-A (L1-A) to return to the boot prom

The llog_osd_next_block() lines in question are

                tail = (struct llog_rec_tail *)((char *)buf + rc -
                                                sizeof(struct llog_rec_tail));
                /* get the last record in block */
                last_rec = (struct llog_rec_hdr *)((char *)buf + rc -
                                                   le32_to_cpu(tail->lrt_len));

                if (LLOG_REC_HDR_NEEDS_SWABBING(last_rec))
                        lustre_swab_llog_rec(last_rec);
                LASSERT(last_rec->lrh_index == tail->lrt_index);

The le32_to_cpu() call above assumes the data to be little-endian. That is not true, however, because configuration logs (as well as at least OSP logs) are actually written in host-endianness, which is big-endian on sparc Linux.
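To make the failure mode concrete, here is a minimal user-space sketch. It is not Lustre code: get_le32() and get_be32() are hypothetical helpers, and the four bytes stand in for an lrt_len of 64 as a big-endian host would store it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Lustre: decode 4 bytes as little-endian. */
static uint32_t get_le32(const unsigned char *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper, not part of Lustre: decode 4 bytes as big-endian. */
static uint32_t get_be32(const unsigned char *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* A record length of 64 in the byte order a big-endian host writes. */
static const unsigned char lrt_len_be[4] = { 0x00, 0x00, 0x00, 0x40 };
```

Decoding these bytes as little-endian gives 0x40000000 instead of 64, so a last_rec pointer computed from that value would land far before the start of buf.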

It is not clear what the endianness rule should be. The comment above the definition of llog_rec_hdr requires little-endianness, while the LLOG_REC_HDR_NEEDS_SWABBING() calls and the log writing code suggest host-endianness (or adaptive-endianness). Enforcing little-endianness requires more changes, while host-endianness makes it impossible to find the index of the last record in a chunk in O(1) time, since the record header must be read first to determine the endianness.
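The adaptive scheme can be sketched as below. This is a simplified user-space illustration under stated assumptions, not the Lustre source: OP_MAGIC and OP_MASK are assumed to match LLOG_OP_MAGIC and LLOG_OP_MASK, and swab32(), rec_needs_swab() and tail_len() are stand-in names. It shows why the tail cannot be interpreted on its own: the byte order is only detectable from the magic bits in the header's lrh_type.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modeled on LLOG_OP_MAGIC and LLOG_OP_MASK. */
#define OP_MAGIC 0x10600000u
#define OP_MASK  0xfff00000u

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Return 1 if a record type, as read in host order, carries the magic
 * in byte-swapped form, i.e. the record was written by a host of the
 * opposite endianness.
 */
static int rec_needs_swab(uint32_t lrh_type)
{
	return (lrh_type & swab32(OP_MASK)) == swab32(OP_MAGIC);
}

/* Interpret a raw tail length once the header's byte order is known. */
static uint32_t tail_len(uint32_t lrh_type, uint32_t raw_lrt_len)
{
	/* The type must carry the magic in one byte order or the other. */
	assert((lrh_type & OP_MASK) == OP_MAGIC || rec_needs_swab(lrh_type));
	return rec_needs_swab(lrh_type) ? swab32(raw_lrt_len) : raw_lrt_len;
}
```

Note that tail_len() needs lrh_type as an input, which is exactly the dependency that rules out an O(1) lookup from the end of the chunk.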



 Comments   
Comment by Andreas Dilger [ 02/May/13 ]

I think the record header would always have been read first, so the endianness of the llog could be saved at that point if it can't be determined from the tail in isolation. It shouldn't change from one call to the next.

Comment by Li Wei (Inactive) [ 03/May/13 ]

Strictly speaking, each record could have its own endianness, based on the adaptive-endianness scheme. And, whether mixed endianness is possible is beyond llog_osd's knowledge.

Comment by John Hammond [ 06/May/13 ]

For the time being, is it enough just to ensure that future llog records are written in LE?
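One possible reading of "written in LE" is that the writer serializes each field in a fixed byte order instead of storing host-order integers. The sketch below assumes that approach. put_le32() and pack_tail_le() are hypothetical names, and the two-field tail models only the lrt_len and lrt_index fields mentioned in the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store v at p in little-endian byte order. */
static void put_le32(unsigned char *p, uint32_t v)
{
	assert(p != NULL);
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Hypothetical writer for an 8-byte record tail (length, then index).
 * The bytes produced are identical on big- and little-endian hosts.
 */
static void pack_tail_le(unsigned char out[8], uint32_t len, uint32_t index)
{
	put_le32(out, len);
	put_le32(out + 4, index);
}
```

In kernel code the same effect is normally obtained by applying cpu_to_le32() to each field at write time, which is what would make the existing le32_to_cpu() call on the read side correct.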

Comment by Li Wei (Inactive) [ 07/May/13 ]

Yes, I think that would be excellent, although considerable work is required.

Comment by Andreas Dilger [ 07/May/13 ]

I know there were efforts to that end at one time or another, maybe Mike will recall the details. We have never officially supported having big-endian servers, so this wouldn't impact existing systems except those SPARC systems from Fujitsu (AFAIK).

Comment by John Hammond [ 07/May/13 ]

We also see misaligned accesses in the llog OSD code:

Kernel unaligned access at TPC[10325280] llog_osd_write_rec+0xca0/0x1c20 [obdclass]
Kernel unaligned access at TPC[10325298] llog_osd_write_rec+0xcb8/0x1c20 [obdclass]
Kernel unaligned access at TPC[103252b4] llog_osd_write_rec+0xcd4/0x1c20 [obdclass]
Kernel unaligned access at TPC[103252d0] llog_osd_write_rec+0xcf0/0x1c20 [obdclass]
Kernel unaligned access at TPC[103252f8] llog_osd_write_rec+0xd18/0x1c20 [obdclass]

Comment by Li Wei (Inactive) [ 08/May/13 ]

I checked each of these unaligned accesses. All resulted from the last CDEBUG() in llog_osd_write_rec(). Although they need to be fixed, they shouldn't be harmful at the moment.

The root cause is an interesting aspect of the semantics of the "packed" attribute. From the GCC manual:

`-Wpacked'
Warn if a structure is given the packed attribute, but the packed
attribute has no effect on the layout or size of the structure.
Such structures may be mis-aligned for little benefit. For
instance, in this code, the variable `f.x' in `struct bar' will be
misaligned even though `struct bar' does not itself have the
packed attribute:

          struct foo {
            int x;
            char a, b, c, d;
          } __attribute__((packed));
          struct bar {
            char z;
            struct foo f;
          };

This led me to wonder whether we should use "packed" for structure definitions at all. In any case, it seems this could be resolved a bit later.
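The layout effect quoted from the GCC manual can be checked with a small user-space sketch. It assumes GCC or Clang, since it relies on __attribute__((packed)). read_x() is a hypothetical helper showing one alignment-safe way to fetch the field, not something taken from the Lustre tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the GCC manual example; requires GCC or Clang. */
struct foo {
	int x;
	char a, b, c, d;
} __attribute__((packed));

struct bar {
	char z;
	struct foo f;	/* packed, so alignment 1: starts at offset 1 */
};

/* Read f.x without assuming alignment by copying its bytes out. */
static int read_x(const struct bar *b)
{
	int x;

	assert(b != NULL);
	memcpy(&x, (const unsigned char *)b + offsetof(struct bar, f) +
		   offsetof(struct foo, x), sizeof(x));
	return x;
}
```

Because struct foo has alignment 1, offsetof(struct bar, f) is 1, and f.x sits at an odd address even though struct bar itself is not packed. Any code that reaches f.x through a plain int pointer can then trap on strict-alignment machines such as sparc.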

Comment by Oleg Drokin [ 17/May/13 ]

If we have anything like what's described in the gcc manual, that's our bug.

We should NOT mix packed and non-packed structures in the same structure.

Overall, packed structures are for things on the wire/on disk. Even then we can do without them if we carefully lay out the structures ourselves.

For any bad structures like that we have right now (esp. in OSD), we need to fix them by yesterday so 2.4.0 has all of those changes and there is no protocol breakage going forward.
So I am expecting a patch real soon.

Comment by Li Wei (Inactive) [ 17/May/13 ]

The structure that caused the unaligned accesses above is llog_handle. It is not packed, but it contains a packed llog_logid, lgh_id, which was misaligned on sparc. llog_handle instances are not for wire or disk, AFAIK. Hence, this defect doesn't affect the protocol or the disk format.

Comment by Cliff White (Inactive) [ 09/Jul/15 ]

We are seeing this now on lola with 2.7.56

Jul  8 15:18:27 lola-11 kernel: LustreError: 14265:0:(llog_osd.c:784:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
Jul  8 15:18:27 lola-11 kernel: LustreError: 14265:0:(llog_osd.c:784:llog_osd_next_block()) LBUG
Jul  8 15:18:27 lola-11 kernel: Pid: 14265, comm: lod0006_rec0005
Jul  8 15:18:27 lola-11 kernel:
Jul  8 15:18:27 lola-11 kernel: Call Trace:
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa0741875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa0741e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa085d397>] llog_osd_next_block+0xb37/0xbc0 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084b0e6>] llog_process_thread+0x286/0xfd0 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084d9d4>] ? llog_init_handle+0x104/0xbb0 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa12d2f20>] ? lod_process_recovery_updates+0x0/0x420 [lod]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084bf6f>] llog_process_or_fork+0x13f/0x690 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa0850b68>] llog_cat_process_cb+0x458/0x600 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084b9e2>] llog_process_thread+0xb82/0xfd0 [obdclass]

Comment by James Nunez (Inactive) [ 25/Aug/15 ]

We've seen this issue recently in the test results for a patch to master (pre-2.8). The patch is modifying some llog routines. The logs are at:
2015-08-24 22:36:54 - https://testing.hpdd.intel.com/test_sets/fffdc95e-4ad0-11e5-b2ff-5254006e85c2

Comment by James A Simmons [ 25/Aug/15 ]

The patch in question is for LU-6968
