
osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 14808

    Description

      I hit an OSS crash while mounting its targets.
      The issue is the same as LU-4528, but on Lustre 2.4.1.

      <3>LustreError: 27422:0:(osd_io.c:1220:osd_ldiskfs_write_record()) loop21: error reading offset 0 (block 0): rc = -28
      <3>LustreError: 27422:0:(llog_osd.c:160:llog_osd_write_blob()) fs96OST-OST003b-osd: error writing log record: rc = -28
      <0>LustreError: 27422:0:(osd_internal.h:953:osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0
      <0>LustreError: 27422:0:(osd_internal.h:953:osd_trans_exec_op()) LBUG
      <4>Pid: 27422, comm: mount.lustre
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0bfb895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0bfbe97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa161d42d>] osd_trans_exec_op+0x2ad/0x2e0 [osd_ldiskfs]
      <4> [<ffffffffa162e723>] osd_attr_set+0xe3/0x540 [osd_ldiskfs]
      <4> [<ffffffffa163b845>] ? osd_punch+0x1b5/0x600 [osd_ldiskfs]
      <4> [<ffffffffa10e60f1>] llog_osd_write_blob+0x211/0x850 [obdclass]
      <4> [<ffffffffa10e9d34>] llog_osd_write_rec+0x7d4/0x1370 [obdclass]
      <4> [<ffffffffa10b5438>] llog_write_rec+0xc8/0x290 [obdclass]
      <4> [<ffffffffa10b6bad>] llog_write+0x2ad/0x420 [obdclass]
      <4> [<ffffffffa10b6d44>] llog_copy_handler+0x24/0x30 [obdclass]
      <4> [<ffffffffa10b7e0b>] llog_process_thread+0x8fb/0xe00 [obdclass]
      <4> [<ffffffffa10b6d20>] ? llog_copy_handler+0x0/0x30 [obdclass]
      <4> [<ffffffffa10b9c7d>] llog_process_or_fork+0x12d/0x660 [obdclass]
      <4> [<ffffffffa10ba5a2>] llog_backup+0x3d2/0x500 [obdclass]
      <4> [<ffffffff8128cd30>] ? sprintf+0x40/0x50
      <4> [<ffffffffa16a38cf>] mgc_process_log+0x119f/0x18f0 [mgc]
      <4> [<ffffffffa169c8ba>] ? mgc_name2resid+0x4a/0x230 [mgc]
      <4> [<ffffffffa169d370>] ? mgc_blocking_ast+0x0/0x800 [mgc]
      <4> [<ffffffffa1215b20>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      <4> [<ffffffffa16a5514>] mgc_process_config+0x594/0xed0 [mgc]
      <4> [<ffffffffa110164c>] lustre_process_log+0x25c/0xaa0 [obdclass]
      <4> [<ffffffffa112bffc>] ? server_find_mount+0xbc/0x160 [obdclass]
      <4> [<ffffffffa112ebd6>] ? server_register_mount+0x516/0x8f0 [obdclass]
      <4> [<ffffffffa1134467>] server_start_targets+0x5c7/0x19c0 [obdclass]
      <4> [<ffffffffa0bfcb2e>] ? cfs_free+0xe/0x10 [libcfs]
      <4> [<ffffffffa1104eb5>] ? lustre_start_mgc+0x4a5/0x2180 [obdclass]
      <4> [<ffffffffa10fca20>] ? class_config_llog_handler+0x0/0x1890 [obdclass]
      <4> [<ffffffffa113640c>] server_fill_super+0xbac/0x1660 [obdclass]
      <4> [<ffffffffa1106d68>] lustre_fill_super+0x1d8/0x530 [obdclass]
      <4> [<ffffffffa1106b90>] ? lustre_fill_super+0x0/0x530 [obdclass]
      <4> [<ffffffff8118c7cf>] get_sb_nodev+0x5f/0xa0
      <4> [<ffffffffa10fe3b5>] lustre_get_sb+0x25/0x30 [obdclass]
      <4> [<ffffffff8118be2b>] vfs_kern_mount+0x7b/0x1b0
      <4> [<ffffffff8118bfd2>] do_kern_mount+0x52/0x130
      <4> [<ffffffff811acfdb>] do_mount+0x2fb/0x930
      <4> [<ffffffff811ad6a0>] sys_mount+0x90/0xe0
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
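
The `rc = -28` in the first two log lines is a negative errno value, and 28 is `ENOSPC` ("No space left on device") on Linux. The following is a minimal sketch for pulling the fields out of such lines and naming the error code. The regular expression is only a reading of the lines quoted above, not an official format definition, and `parse_lustre_error` is a hypothetical helper name.

```python
import errno
import re

# Shape of the LustreError lines quoted above (an assumption from this log):
#   <level>LustreError: pid:0:(file:line:function()) message
LOG_RE = re.compile(
    r"<(?P<level>\d)>LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
RC_RE = re.compile(r"rc = (-?\d+)")


def parse_lustre_error(line):
    """Return a dict of fields from one LustreError line, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    rc = RC_RE.search(fields["msg"])
    if rc:
        code = int(rc.group(1))
        fields["rc"] = code
        # Kernel code returns negative errno values; look up the symbolic name.
        fields["errno_name"] = errno.errorcode.get(-code, "unknown")
    return fields


sample = ("<3>LustreError: 27422:0:(llog_osd.c:160:llog_osd_write_blob()) "
          "fs96OST-OST003b-osd: error writing log record: rc = -28")
info = parse_lustre_error(sample)
print(info["func"], info["rc"], info["errno_name"])
```

Lines without the `LustreError` prefix, such as the call-trace entries, simply return `None`.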
      

      Would it be possible to backport the patch http://review.whamcloud.com/#/c/10108/ to the b2_4 branch?
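
For readers unfamiliar with the failed assertion, the counter name `oti_declare_ops_rb` suggests per-operation bookkeeping: operations are declared before they are executed, and an execution with no declared credit left is treated as a rollback of an earlier one. The toy model below illustrates that reading only. It is a sketch under that assumption, not the Lustre implementation, and every name in it (`Transaction`, `ROLLBACK_OF`, the operation ids) is hypothetical.

```python
# Toy model only: names and the rollback table are illustrative,
# not copied from the Lustre source.
ATTR_SET, WRITE = 0, 1                    # hypothetical operation ids

# Which executed operation an undeclared operation is assumed to undo.
ROLLBACK_OF = {ATTR_SET: ATTR_SET, WRITE: WRITE}


class Transaction:
    def __init__(self):
        self.declared = {}      # op -> credits reserved at declare time
        self.executed = {}      # op -> executions that a rollback may undo
        self.rollback = False

    def declare(self, op):
        self.declared[op] = self.declared.get(op, 0) + 1

    def exec_op(self, op):
        if not self.rollback and self.declared.get(op, 0) > 0:
            # Normal path: consume one declared credit.
            self.declared[op] -= 1
            self.executed[op] = self.executed.get(op, 0) + 1
            return
        # No credit left: treat this and all later calls as rollback.
        self.rollback = True
        rb = ROLLBACK_OF[op]
        if self.executed.get(rb, 0) <= 0:
            raise AssertionError(f"rb = {rb}")
        self.executed[rb] -= 1


tx = Transaction()
tx.declare(WRITE)
tx.exec_op(WRITE)            # consumes the declared credit
try:
    tx.exec_op(ATTR_SET)     # never declared, and nothing to roll back
except AssertionError as err:
    print("assertion failed:", err)
```

In this model the failure mirrors the `rb = 0` in the log: an operation with id 0 runs without a declared credit and without an earlier execution to undo.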

          Activity

            [LU-5303] osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0

            jfc John Fuchs-Chesney (Inactive) added a comment

            Gregoire, since we seem to agree that b2_5 is the better branch to fix, can you give us a rough idea of when you will need the solution to be in place?
            This will help us with our workload planning.

            Many thanks,
            ~ jfc

            bogl Bob Glossman (Inactive) added a comment

            It does appear that a back port to b2_5 is more plausible. Fewer than half as many files need manual attention to merge, and the edits needed may not require an expert as they would in b2_4.

            pichong Gregoire Pichon added a comment

            Thanks for looking.
            I wonder if a back port to b2_5, which is the maintenance release, wouldn't be more appropriate in this case. This would benefit the whole community, and I am definitely going to move to a Lustre 2.5.x version at some point.

            jfc John Fuchs-Chesney (Inactive) added a comment

            Mike, we've added you as a watcher on this ticket.
            Can you please advise on the best way forward to resolve Gregoire's problem?
            Thanks,
            ~ jfc.
            bogl Bob Glossman (Inactive) added a comment - edited

            It looks to me like a back port should be possible, but it needs the attention of somebody who really understands the code being modified to do it correctly. Trying to cherry-pick http://review.whamcloud.com/#/c/10108 back into b2_4 leaves 10 or more files that need manual editing to merge. Some appear to need more knowledge than just trying to resolve context diffs.

            I note that the author of the original master patch was Mike Pershin. Maybe it's a job for him.
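
One way to size a backport like the one described above is to count the paths that a cherry-pick leaves unmerged. The sketch below assumes the short format of `git status --porcelain`, where unmerged paths carry one of the two-letter codes listed in the git documentation. `conflicted_paths` is a hypothetical helper, and the sample paths are made up for illustration.

```python
# Two-letter codes that `git status --porcelain` uses for unmerged paths.
UNMERGED = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def conflicted_paths(porcelain_output):
    """Return the paths that still need manual merging."""
    paths = []
    for line in porcelain_output.splitlines():
        # Format: two status characters, one space, then the path.
        if line[:2] in UNMERGED:
            paths.append(line[3:])
    return paths


# Hypothetical output after a conflicting cherry-pick; the paths are made up.
sample = "UU lustre/example/a.c\nM  lustre/example/b.c\nAA lustre/example/c.h\n"
print(conflicted_paths(sample))
```

In a real checkout the input could come from `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`. The count only says how many files conflict, not how much expertise each conflict needs.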

            jfc John Fuchs-Chesney (Inactive) added a comment

            Hello Gregoire,
            Bob Glossman will take a look at the 'portability' of this patch to 2.4.x and will advise if this is possible.
            Thanks,
            ~ jfc.

            jfc John Fuchs-Chesney (Inactive) added a comment

            Bob, can you explore this, please?

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: pichong Gregoire Pichon
              Votes: 0
              Watchers: 6
