Lustre / LU-1238

record_lcfg() failed with ENOSPC

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.2.0, Lustre 2.3.0
    • None
    • 3
    • 10883

    Description

      While running ost-pools test 5 with 2000 OSTs, after adding 2000 OSTs to one OST pool and then removing the OSTs from the pool, the test failed as follows:

      <~snip~>
      client-19-ib: Warning, OST lustre-OST041f_UUID still found in pool lustre.testpool
      client-19-ib: Warning, OST lustre-OST0420_UUID still found in pool lustre.testpool
      <~snip~>
      

      The console log on the combined MGS/MDS showed:

      LustreError: 16312:0:(mgs_llog.c:752:record_lcfg()) failed -28
      LustreError: 16340:0:(mgs_llog.c:752:record_lcfg()) failed -28
      LustreError: 16340:0:(mgs_llog.c:788:record_base()) error -28: lcfg lustre-MDT0000-mdtlov 0xce022 lustre testpool lustre-OST041f_UUID (null)
      LustreError: 16340:0:(mgs_llog.c:788:record_base()) error -28: lcfg lustre-clilov 0xce022 lustre testpool lustre-OST041f_UUID (null)
      LustreError: 16369:0:(mgs_llog.c:752:record_lcfg()) failed -28
      LustreError: 16369:0:(mgs_llog.c:752:record_lcfg()) Skipped 5 previous similar messages
      LustreError: 16369:0:(mgs_llog.c:788:record_base()) error -28: lcfg lustre-MDT0000-mdtlov 0xce022 lustre testpool lustre-OST0420_UUID (null)

      Maloo report: https://maloo.whamcloud.com/test_sets/a610c4b2-71cd-11e1-9716-5254004bbbd3

      Running llog_reader on the CONFIGS/lustre-MDT0000 file on the MGS/MDS node showed 63293 records in that file, with 1474 bits not set. The last several records are:

      #64763 (224)marker 2043193 (flags=0x01, v2.2.0.0) lustre-MDT0000-mdtlov 'rem lustre.testpool.lustre-OST041d_UUID' Mon Mar 19 03:01:52 2012-
      #64764 (144)pool remove 0:lustre-MDT0000-mdtlov 1:lustre 2:testpool 3:lustre-OST041d_UUID
      #64765 (224)marker 2043193 (flags=0x02, v2.2.0.0) lustre-MDT0000-mdtlov 'rem lustre.testpool.lustre-OST041d_UUID' Mon Mar 19 03:01:52 2012-
      #64766 (224)marker 2043195 (flags=0x01, v2.2.0.0) lustre-MDT0000-mdtlov 'rem lustre.testpool.lustre-OST041e_UUID' Mon Mar 19 03:02:02 2012-
      #64767 (144)pool remove 0:lustre-MDT0000-mdtlov 1:lustre 2:testpool 3:lustre-OST041e_UUID

      The OST pool operations consumed most of the records and pushed the record count up to the following limit:

               /* if it's the last idx in log file, then return -ENOSPC */
               if (loghandle->lgh_last_idx >= LLOG_BITMAP_SIZE(llh) - 1)
                       RETURN(-ENOSPC);
      
      /* (8192 - 88 - 8) * 8 = 64768 */
      #define LLOG_BITMAP_SIZE(llh)  ((llh->llh_hdr.lrh_len -         \
                                       llh->llh_bitmap_offset -       \
                                       sizeof(llh->llh_tail)) * 8)
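
      The arithmetic behind this limit can be checked with a small standalone sketch. It is not Lustre code: the constants come from the comment quoted above (8192-byte header, bitmap offset 88, 8-byte tail), and the names llog_bitmap_bits() and check_can_append() are hypothetical helpers introduced only for illustration. The llog_reader figures appear consistent with a full log, since 63293 records plus 1474 unset bits gives 64767, which matches the last record index shown (#64767) and equals LLOG_BITMAP_SIZE - 1.

```c
#include <errno.h>

/* Sizes taken from the comment in the quoted source; treating them as
 * fixed constants is an assumption made for this illustration. */
enum {
    HDR_LEN       = 8192, /* llh_hdr.lrh_len */
    BITMAP_OFFSET = 88,   /* llh_bitmap_offset */
    TAIL_LEN      = 8     /* sizeof(llh_tail) */
};

/* Number of bits in the bitmap, i.e. the number of index slots. */
static int llog_bitmap_bits(void)
{
    return (HDR_LEN - BITMAP_OFFSET - TAIL_LEN) * 8;
}

/* Hypothetical helper mirroring the quoted check: refuse to append
 * once the last used index has reached the final slot. */
static int check_can_append(int last_idx)
{
    if (last_idx >= llog_bitmap_bits() - 1)
        return -ENOSPC;
    return 0;
}
```

      With these values, llog_bitmap_bits() is 64768, so check_can_append(64767) returns -ENOSPC (-28 on Linux, the value seen in the console log), while check_can_append(64766) still returns 0.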
      

      Please find the attached lustre-MDT0000.log, which contains the output of "llog_reader lustre-MDT0000", and advise how to resolve this issue.

          People

            wc-triage WC Triage
            yujian Jian Yu