
osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.1
    • None
    • 3
    • 14808

    Description

      I hit an OSS crash when mounting its targets.
      The issue is the same as LU-4528, but with Lustre 2.4.1.

      <3>LustreError: 27422:0:(osd_io.c:1220:osd_ldiskfs_write_record()) loop21: error reading offset 0 (block 0): rc = -28
      <3>LustreError: 27422:0:(llog_osd.c:160:llog_osd_write_blob()) fs96OST-OST003b-osd: error writing log record: rc = -28
      <0>LustreError: 27422:0:(osd_internal.h:953:osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0
      <0>LustreError: 27422:0:(osd_internal.h:953:osd_trans_exec_op()) LBUG
      <4>Pid: 27422, comm: mount.lustre
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0bfb895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0bfbe97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa161d42d>] osd_trans_exec_op+0x2ad/0x2e0 [osd_ldiskfs]
      <4> [<ffffffffa162e723>] osd_attr_set+0xe3/0x540 [osd_ldiskfs]
      <4> [<ffffffffa163b845>] ? osd_punch+0x1b5/0x600 [osd_ldiskfs]
      <4> [<ffffffffa10e60f1>] llog_osd_write_blob+0x211/0x850 [obdclass]
      <4> [<ffffffffa10e9d34>] llog_osd_write_rec+0x7d4/0x1370 [obdclass]
      <4> [<ffffffffa10b5438>] llog_write_rec+0xc8/0x290 [obdclass]
      <4> [<ffffffffa10b6bad>] llog_write+0x2ad/0x420 [obdclass]
      <4> [<ffffffffa10b6d44>] llog_copy_handler+0x24/0x30 [obdclass]
      <4> [<ffffffffa10b7e0b>] llog_process_thread+0x8fb/0xe00 [obdclass]
      <4> [<ffffffffa10b6d20>] ? llog_copy_handler+0x0/0x30 [obdclass]
      <4> [<ffffffffa10b9c7d>] llog_process_or_fork+0x12d/0x660 [obdclass]
      <4> [<ffffffffa10ba5a2>] llog_backup+0x3d2/0x500 [obdclass]
      <4> [<ffffffff8128cd30>] ? sprintf+0x40/0x50
      <4> [<ffffffffa16a38cf>] mgc_process_log+0x119f/0x18f0 [mgc]
      <4> [<ffffffffa169c8ba>] ? mgc_name2resid+0x4a/0x230 [mgc]
      <4> [<ffffffffa169d370>] ? mgc_blocking_ast+0x0/0x800 [mgc]
      <4> [<ffffffffa1215b20>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      <4> [<ffffffffa16a5514>] mgc_process_config+0x594/0xed0 [mgc]
      <4> [<ffffffffa110164c>] lustre_process_log+0x25c/0xaa0 [obdclass]
      <4> [<ffffffffa112bffc>] ? server_find_mount+0xbc/0x160 [obdclass]
      <4> [<ffffffffa112ebd6>] ? server_register_mount+0x516/0x8f0 [obdclass]
      <4> [<ffffffffa1134467>] server_start_targets+0x5c7/0x19c0 [obdclass]
      <4> [<ffffffffa0bfcb2e>] ? cfs_free+0xe/0x10 [libcfs]
      <4> [<ffffffffa1104eb5>] ? lustre_start_mgc+0x4a5/0x2180 [obdclass]
      <4> [<ffffffffa10fca20>] ? class_config_llog_handler+0x0/0x1890 [obdclass]
      <4> [<ffffffffa113640c>] server_fill_super+0xbac/0x1660 [obdclass]
      <4> [<ffffffffa1106d68>] lustre_fill_super+0x1d8/0x530 [obdclass]
      <4> [<ffffffffa1106b90>] ? lustre_fill_super+0x0/0x530 [obdclass]
      <4> [<ffffffff8118c7cf>] get_sb_nodev+0x5f/0xa0
      <4> [<ffffffffa10fe3b5>] lustre_get_sb+0x25/0x30 [obdclass]
      <4> [<ffffffff8118be2b>] vfs_kern_mount+0x7b/0x1b0
      <4> [<ffffffff8118bfd2>] do_kern_mount+0x52/0x130
      <4> [<ffffffff811acfdb>] do_mount+0x2fb/0x930
      <4> [<ffffffff811ad6a0>] sys_mount+0x90/0xe0
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
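
The two error lines that precede the assertion both end in `rc = -28`. Read as a negated Linux errno, 28 is ENOSPC ("No space left on device"). The sketch below is only a reading aid, not part of Lustre: it pulls the reporting function and return code out of console lines like the ones quoted above and maps the code to its symbolic name. The regular expression and the helper names `decode_rc` and `summarize` are assumptions made for this illustration.

```python
import errno
import re

# Two of the console lines quoted in the description above.
LOG = """\
<3>LustreError: 27422:0:(osd_io.c:1220:osd_ldiskfs_write_record()) loop21: error reading offset 0 (block 0): rc = -28
<3>LustreError: 27422:0:(llog_osd.c:160:llog_osd_write_blob()) fs96OST-OST003b-osd: error writing log record: rc = -28
"""

# Assumed line shape: "(file:line:function())" followed later by "rc = <n>".
LINE_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\).*rc = (?P<rc>-?\d+)"
)


def decode_rc(rc):
    """Return the symbolic errno name for a negative kernel return code."""
    return errno.errorcode.get(-rc, "UNKNOWN")


def summarize(log):
    """Return (function, rc, errno name) for every line that carries an rc."""
    results = []
    for match in LINE_RE.finditer(log):
        rc = int(match.group("rc"))
        results.append((match.group("func"), rc, decode_rc(rc)))
    return results


if __name__ == "__main__":
    for func, rc, name in summarize(LOG):
        print(f"{func}: rc = {rc} ({name})")
```

Both lines decode to ENOSPC, so checking the free space on the target's backing device is a reasonable first step when reproducing the crash.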
      

      Would it be possible to backport the patch http://review.whamcloud.com/#/c/10108/ to the b2_4 branch?

          Activity

            [LU-5303] osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0
            pjones Peter Jones added a comment -

            duplicate of LU-4528


            pichong Gregoire Pichon added a comment -

            Hi Peter,
            The patch http://review.whamcloud.com/#/c/11751/ is ready to be landed into b2_5 (two positive inspections).
            Could it be included in the future 2.5.4 release?

            Thanks,
            Grégoire.

            pjones Peter Jones added a comment -

            Gregoire

            The b2_5 patch is being tracked under LU-4528, of which this issue is believed to be a duplicate.

            Peter


            pichong Gregoire Pichon added a comment -

            Would it be possible to have a patch for b2_5 worked out?
            I see that version 2.5.3 is going to be released soon. It would be good to have the fix integrated into that version, since some of our customers are going to use Lustre 2.5 in September.
            Thanks.

            pichong Gregoire Pichon added a comment -

            The issue did not occur on a production cluster, so it does not require immediate handling. Still, this is a node crash, and I would not like to see the same issue appear at a customer site.

            jfc John Fuchs-Chesney (Inactive) added a comment -

            Gregoire, since we seem to agree that b2_5 is the better branch to fix, can you give us a rough idea when you will need the solution to be in place?
            This will help us with our workload planning.

            Many thanks,
            ~ jfc

            People

              bogl Bob Glossman (Inactive)
              pichong Gregoire Pichon
              Votes: 0
              Watchers: 6
