[LU-3263] llog_osd_next_block(): ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.4.0
    • 3
    • 8082

    Description

      On a SPARC machine, "./llmount.sh" hit this assertion failure:

      LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
      LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) LBUG
      Pid: 20452, comm: ll_mgs_0002
      
      Call Trace:
      
      Kernel panic - not syncing: LBUG
      Call Trace:
       [00000000103a3194] lbug_with_loc+0x94/0xc0 [libcfs]
       [0000000010535fbc] llog_osd_next_block+0xb5c/0x1000 [obdclass]
       [00000000104f39d0] llog_process_thread+0x2b0/0x13a0 [obdclass]
       [00000000104f4cdc] llog_process_or_fork+0x21c/0x980 [obdclass]
       [000000001090a140] mgs_steal_llog_for_mdt_from_client+0x5e0/0xae0 [mgs]
       [000000001090b120] mgs_write_log_mdt+0xae0/0x3a60 [mgs]
       [00000000109262f8] mgs_write_log_target+0x798/0x20a0 [mgs]
       [00000000108ea624] mgs_handle_target_reg+0xd44/0x17c0 [mgs]
       [00000000108edab8] mgs_handle+0xd18/0x22a0 [mgs]
       [00000000106f5f60] ptlrpc_server_handle_request+0x980/0x16c0 [ptlrpc]
       [00000000106fc130] ptlrpc_main+0xa10/0x1680 [ptlrpc]
       [000000000042ad88] kernel_thread+0x30/0x48
       [00000000103aea44] cfs_create_thread+0x24/0x60 [libcfs]
      Press Stop-A (L1-A) to return to the boot prom
      

      The llog_osd_next_block() lines in question are:

                      tail = (struct llog_rec_tail *)((char *)buf + rc -
                                                      sizeof(struct llog_rec_tail));
                      /* get the last record in block */
                      last_rec = (struct llog_rec_hdr *)((char *)buf + rc -
                                                         le32_to_cpu(tail->lrt_len));
      
                      if (LLOG_REC_HDR_NEEDS_SWABBING(last_rec))
                              lustre_swab_llog_rec(last_rec);
                      LASSERT(last_rec->lrh_index == tail->lrt_index);
      

      The le32_to_cpu() call above assumes the data is little-endian. That is not the case, however, because configuration logs (and at least OSP logs as well) are actually written in host endianness, which is big-endian on SPARC Linux.

      It is not clear what the endianness rule should be. The comment above the definition of llog_rec_hdr requires little-endian data, while the LLOG_REC_HDR_NEEDS_SWABBING() calls and the log-writing code suggest host endianness (or adaptive endianness). Enforcing little-endianness would require more extensive changes, while host endianness makes it impossible to find the index of the last record in a chunk in O(1) time, since the record header must be read first to determine the endianness.
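
      The mismatch described above can be illustrated with a small user-space sketch. This is not Lustre code: it assumes a simplified 8-byte tail (a 4-byte length followed by a 4-byte index), and every name in it (put_be32, get_le32, get_be32, demo_write_tail_be, demo_last_rec_offset) is hypothetical. It stores a tail the way a big-endian host would and then locates the last record with a little-endian decoder, roughly what le32_to_cpu() amounts to on such data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise helpers, so the result does not depend on the
 * endianness of the machine that runs this sketch. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Simplified tail assumed here: 4-byte record length, then 4-byte index. */
enum { DEMO_TAIL_SIZE = 8 };

typedef uint32_t (*demo_get32_fn)(const unsigned char *);

/* Store the tail at the end of the block as a big-endian host would. */
static void demo_write_tail_be(unsigned char *block, size_t block_len,
			       uint32_t rec_len, uint32_t rec_index)
{
	unsigned char *tail;

	assert(block_len >= DEMO_TAIL_SIZE);
	tail = block + block_len - DEMO_TAIL_SIZE;
	put_be32(tail, rec_len);
	put_be32(tail + 4, rec_index);
}

/* Compute the offset of the last record from the tail length, using the
 * given decoder. Returns 0 on success, or -1 if the decoded length cannot
 * describe a record inside the block. */
static int demo_last_rec_offset(const unsigned char *block, size_t block_len,
				demo_get32_fn get32, size_t *off)
{
	uint32_t len;

	if (block_len < DEMO_TAIL_SIZE)
		return -1;
	len = get32(block + block_len - DEMO_TAIL_SIZE);
	if (len < DEMO_TAIL_SIZE || len > block_len)
		return -1;
	*off = block_len - len;
	return 0;
}
```

      With a 128-byte block and a 64-byte last record, the big-endian decoder finds the record at offset 64, while the little-endian decoder reads the length as 0x40000000 and cannot locate the record at all. In the real code the bogus length moves last_rec to an unrelated address, which is one way the LASSERT can fire.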

          Activity

            simmonsja James A Simmons added a comment - - edited

            The patch in question is for LU-6968


            jamesanunez James Nunez (Inactive) added a comment -

            We've seen this issue recently in the test results for a patch to master (pre-2.8). The patch is modifying some llog routines. The logs are at:
            2015-08-24 22:36:54 - https://testing.hpdd.intel.com/test_sets/fffdc95e-4ad0-11e5-b2ff-5254006e85c2

            cliffw Cliff White (Inactive) added a comment -

            We are seeing this now on lola with 2.7.56

            Jul  8 15:18:27 lola-11 kernel: LustreError: 14265:0:(llog_osd.c:784:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
            Jul  8 15:18:27 lola-11 kernel: LustreError: 14265:0:(llog_osd.c:784:llog_osd_next_block()) LBUG
            Jul  8 15:18:27 lola-11 kernel: Pid: 14265, comm: lod0006_rec0005
            Jul  8 15:18:27 lola-11 kernel:
            Jul  8 15:18:27 lola-11 kernel: Call Trace:
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa0741875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa0741e77>] lbug_with_loc+0x47/0xb0 [libcfs]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa085d397>] llog_osd_next_block+0xb37/0xbc0 [obdclass]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084b0e6>] llog_process_thread+0x286/0xfd0 [obdclass]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084d9d4>] ? llog_init_handle+0x104/0xbb0 [obdclass]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa12d2f20>] ? lod_process_recovery_updates+0x0/0x420 [lod]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084bf6f>] llog_process_or_fork+0x13f/0x690 [obdclass]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa0850b68>] llog_cat_process_cb+0x458/0x600 [obdclass]
            Jul  8 15:18:27 lola-11 kernel: [<ffffffffa084b9e2>] llog_process_thread+0xb82/0xfd0 [obdclass]

            liwei Li Wei (Inactive) added a comment -

            The structure that caused the unaligned accesses reported earlier is llog_handle. It is not packed, but it contains a packed llog_logid, lgh_id, which was misaligned on SPARC. llog_handle instances are not used on the wire or on disk, AFAIK. Hence, this defect doesn't affect the protocol or disk format.

            green Oleg Drokin added a comment -

            If we have anything like what's described in the gcc manual, that's our bug.

            We should NOT mix packed and non-packed structures in the same structure.

            Overall, packed structures are for things on the wire/on disk. Even then we can do without them if we carefully lay out the structures ourselves.

            For any bad structures like that we have right now (esp. in OSD), we need to fix them by yesterday so 2.4.0 has all of those changes and there is no protocol breakage going forward.
            So I am expecting a patch real soon.


            liwei Li Wei (Inactive) added a comment -

            I checked each of these unaligned accesses. All resulted from the last CDEBUG() in llog_osd_write_rec(). Although they need to be fixed, they shouldn't be harmful at the moment.

            The root cause is an interesting semantic of the "packed" attribute. From the GCC manual:

            `-Wpacked'
            Warn if a structure is given the packed attribute, but the packed
            attribute has no effect on the layout or size of the structure.
            Such structures may be mis-aligned for little benefit. For
            instance, in this code, the variable `f.x' in `struct bar' will be
            misaligned even though `struct bar' does not itself have the
            packed attribute:

                struct foo {
                        int x;
                        char a, b, c, d;
                } __attribute__((packed));
                struct bar {
                        char z;
                        struct foo f;
                };

            This led me to wonder whether we should use "packed" for structure definitions at all. In any case, it seems this could be resolved a bit later.
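
            The layout effect quoted from the GCC manual can be checked directly. The following is a minimal sketch, assuming GCC or Clang (for __attribute__((packed))) and a C11 compiler; struct foo and struct bar are the manual's example, while foo_natural, bar_natural and the two helper functions are hypothetical names added for comparison. It is not taken from the Lustre tree, but the same pattern applies to an unpacked structure such as llog_handle embedding a packed member such as llog_logid.

```c
#include <assert.h>
#include <stddef.h>

/* Packed: the alignment requirement of the whole struct drops to 1. */
struct foo {
	int x;
	char a, b, c, d;
} __attribute__((packed));

/* Not packed itself, but it embeds the packed struct. */
struct bar {
	char z;
	struct foo f;
};

/* The same members without the packed attribute, for comparison. */
struct foo_natural {
	int x;
	char a, b, c, d;
};

struct bar_natural {
	char z;
	struct foo_natural f;
};

/* Packing changes neither the size nor the internal layout of foo,
 * which is exactly the situation -Wpacked warns about. */
static_assert(sizeof(struct foo) == sizeof(struct foo_natural),
	      "packed attribute has no effect on the size of struct foo");

/* Offset of f (and therefore of f.x, its first member) inside struct bar. */
static size_t demo_packed_f_offset(void)
{
	return offsetof(struct bar, f);
}

/* Offset of f when the embedded struct keeps its natural alignment. */
static size_t demo_natural_f_offset(void)
{
	return offsetof(struct bar_natural, f);
}
```

            In the packed case f starts at offset 1, right after z, so f.x is not aligned for an int. On a strict-alignment architecture such as SPARC, reading such a member through a plain int pointer is one way to end up with the "Kernel unaligned access" messages reported in this ticket. In the natural case the compiler pads after z so that f.x is properly aligned.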
            jhammond John Hammond added a comment -

            We also see misaligned accesses in the llog OSD code:

            Kernel unaligned access at TPC[10325280] llog_osd_write_rec+0xca0/0x1c20 [obdclass]
            Kernel unaligned access at TPC[10325298] llog_osd_write_rec+0xcb8/0x1c20 [obdclass]
            Kernel unaligned access at TPC[103252b4] llog_osd_write_rec+0xcd4/0x1c20 [obdclass]
            Kernel unaligned access at TPC[103252d0] llog_osd_write_rec+0xcf0/0x1c20 [obdclass]
            Kernel unaligned access at TPC[103252f8] llog_osd_write_rec+0xd18/0x1c20 [obdclass]
            

            adilger Andreas Dilger added a comment -

            I know there were efforts to that end at one time or another; maybe Mike will recall the details. We have never officially supported big-endian servers, so this wouldn't impact existing systems except those SPARC systems from Fujitsu (AFAIK).

            liwei Li Wei (Inactive) added a comment -

            Yes, I think that would be excellent, although considerable work is required.

            jhammond John Hammond added a comment -

            For the time being, is it enough just to ensure that future llog records are written in LE?

            People

              wc-triage WC Triage
              liwei Li Wei (Inactive)