Lustre / LU-3263

llog_osd_next_block(): ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.4.0
    • 3
    • 8082

    Description

      On a SPARC machine, "./llmount.sh" hit this assertion failure:

      LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed:
      LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) LBUG
      Pid: 20452, comm: ll_mgs_0002
      
      Call Trace:
      
      Kernel panic - not syncing: LBUG
      Call Trace:
       [00000000103a3194] lbug_with_loc+0x94/0xc0 [libcfs]
       [0000000010535fbc] llog_osd_next_block+0xb5c/0x1000 [obdclass]
       [00000000104f39d0] llog_process_thread+0x2b0/0x13a0 [obdclass]
       [00000000104f4cdc] llog_process_or_fork+0x21c/0x980 [obdclass]
       [000000001090a140] mgs_steal_llog_for_mdt_from_client+0x5e0/0xae0 [mgs]
       [000000001090b120] mgs_write_log_mdt+0xae0/0x3a60 [mgs]
       [00000000109262f8] mgs_write_log_target+0x798/0x20a0 [mgs]
       [00000000108ea624] mgs_handle_target_reg+0xd44/0x17c0 [mgs]
       [00000000108edab8] mgs_handle+0xd18/0x22a0 [mgs]
       [00000000106f5f60] ptlrpc_server_handle_request+0x980/0x16c0 [ptlrpc]
       [00000000106fc130] ptlrpc_main+0xa10/0x1680 [ptlrpc]
       [000000000042ad88] kernel_thread+0x30/0x48
       [00000000103aea44] cfs_create_thread+0x24/0x60 [libcfs]
      Press Stop-A (L1-A) to return to the boot prom
      

      The llog_osd_next_block() lines in question are:

                      tail = (struct llog_rec_tail *)((char *)buf + rc -
                                                      sizeof(struct llog_rec_tail));
                      /* get the last record in block */
                      last_rec = (struct llog_rec_hdr *)((char *)buf + rc -
                                                         le32_to_cpu(tail->lrt_len));
      
                      if (LLOG_REC_HDR_NEEDS_SWABBING(last_rec))
                              lustre_swab_llog_rec(last_rec);
                      LASSERT(last_rec->lrh_index == tail->lrt_index);
      

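      The le32_to_cpu() call above assumes that tail->lrt_len is stored little-endian. That assumption does not hold, because configuration logs (and at least OSP logs as well) are actually written in host endianness, which is big-endian on SPARC Linux. On such a host the length is byte-swapped before use, so last_rec is computed from a wrong offset and the LASSERT trips.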
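
      To make the failure mode concrete, here is a minimal user-space sketch, not Lustre code. It assumes a record tail made of two 32-bit fields (length, then index), and the helpers load_le32(), store_be32(), and tail_len_as_le() are hypothetical names introduced only for illustration. It shows what the offset arithmetic sees when a length written by a big-endian host is decoded as little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAIL_SIZE 8u /* assumed layout: 32-bit length followed by 32-bit index */

/* Decode 4 bytes as little-endian, regardless of host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Lay out v in memory the way a big-endian host would write it. */
static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/*
 * Mimics the first step of the quoted snippet: locate the tail at the
 * end of the rc valid bytes and read its length field as little-endian.
 * The returned value is what would be subtracted to find last_rec.
 */
static uint32_t tail_len_as_le(const unsigned char *buf, size_t rc)
{
	assert(rc >= TAIL_SIZE);
	return load_le32(buf + rc - TAIL_SIZE);
}
```

      For a 64-byte record written by a big-endian host, the tail's length bytes are 00 00 00 40. Decoded as little-endian they read as 0x40000000, so the computed last_rec would land far outside the buffer.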

      It is not clear what the endianness rule should be. The comment above the definition of llog_rec_hdr requires little-endian data, while the LLOG_REC_HDR_NEEDS_SWABBING() calls and the log-writing code suggest host endianness (or adaptive endianness). Enforcing little-endianness would require more extensive changes, while host endianness makes it impossible to find the index of the last record in a chunk in O(1) time, since a record header must be read first to determine the endianness.
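
      The cost of the adaptive approach can be sketched as follows. This is an illustrative user-space model, not the Lustre implementation: struct ex_rec_hdr, ex_hdr_needs_swab(), ex_last_index(), and the constants EX_OP_MAGIC and EX_OP_MASK are assumptions made for the example, and record tails are ignored. Byte order is detected per record from a magic pattern in the header's type field, so the last record's index is found by walking the chunk from the front, which is linear in the number of records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative values only; assumed for this sketch. */
#define EX_OP_MAGIC 0x10600000u
#define EX_OP_MASK  0xfff00000u

/* Simplified stand-in for a log record header. */
struct ex_rec_hdr {
	uint32_t len;   /* total record length in bytes */
	uint32_t index; /* record index within the log */
	uint32_t type;  /* carries the magic in its high bits */
	uint32_t id;
};

static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* A header needs swabbing if its type shows the byte-swapped magic. */
static int ex_hdr_needs_swab(const struct ex_rec_hdr *h)
{
	return (h->type & swab32(EX_OP_MASK)) == swab32(EX_OP_MAGIC);
}

/*
 * Walk the chunk record by record and return the index of the last
 * record in host byte order, or -1 if a length is implausible.
 * Headers without the swapped magic are treated as host order.
 */
static int64_t ex_last_index(const unsigned char *buf, size_t rc)
{
	size_t off = 0;
	int64_t last = -1;

	while (off + sizeof(struct ex_rec_hdr) <= rc) {
		struct ex_rec_hdr h;
		uint32_t len, index;

		memcpy(&h, buf + off, sizeof(h));
		if (ex_hdr_needs_swab(&h)) {
			len = swab32(h.len);
			index = swab32(h.index);
		} else {
			len = h.len;
			index = h.index;
		}
		if (len < sizeof(h) || len > rc - off)
			return -1;
		last = index;
		off += len;
	}
	return last;
}
```

      Reading the tail directly would avoid the walk, but only if the tail's byte order were known in advance, which is the ambiguity described above.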

            People

              wc-triage WC Triage
              liwei Li Wei (Inactive)