[LU-5303] osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.1
    • None
    • 3
    • 14808

    Description

      I hit a crash of OSS when mounting its targets.
      The issue is the same as LU-4528 but with Lustre 2.4.1.

      <3>LustreError: 27422:0:(osd_io.c:1220:osd_ldiskfs_write_record()) loop21: error reading offset 0 (block 0): rc = -28
      <3>LustreError: 27422:0:(llog_osd.c:160:llog_osd_write_blob()) fs96OST-OST003b-osd: error writing log record: rc = -28
      <0>LustreError: 27422:0:(osd_internal.h:953:osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 0
      <0>LustreError: 27422:0:(osd_internal.h:953:osd_trans_exec_op()) LBUG
      <4>Pid: 27422, comm: mount.lustre
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0bfb895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0bfbe97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa161d42d>] osd_trans_exec_op+0x2ad/0x2e0 [osd_ldiskfs]
      <4> [<ffffffffa162e723>] osd_attr_set+0xe3/0x540 [osd_ldiskfs]
      <4> [<ffffffffa163b845>] ? osd_punch+0x1b5/0x600 [osd_ldiskfs]
      <4> [<ffffffffa10e60f1>] llog_osd_write_blob+0x211/0x850 [obdclass]
      <4> [<ffffffffa10e9d34>] llog_osd_write_rec+0x7d4/0x1370 [obdclass]
      <4> [<ffffffffa10b5438>] llog_write_rec+0xc8/0x290 [obdclass]
      <4> [<ffffffffa10b6bad>] llog_write+0x2ad/0x420 [obdclass]
      <4> [<ffffffffa10b6d44>] llog_copy_handler+0x24/0x30 [obdclass]
      <4> [<ffffffffa10b7e0b>] llog_process_thread+0x8fb/0xe00 [obdclass]
      <4> [<ffffffffa10b6d20>] ? llog_copy_handler+0x0/0x30 [obdclass]
      <4> [<ffffffffa10b9c7d>] llog_process_or_fork+0x12d/0x660 [obdclass]
      <4> [<ffffffffa10ba5a2>] llog_backup+0x3d2/0x500 [obdclass]
      <4> [<ffffffff8128cd30>] ? sprintf+0x40/0x50
      <4> [<ffffffffa16a38cf>] mgc_process_log+0x119f/0x18f0 [mgc]
      <4> [<ffffffffa169c8ba>] ? mgc_name2resid+0x4a/0x230 [mgc]
      <4> [<ffffffffa169d370>] ? mgc_blocking_ast+0x0/0x800 [mgc]
      <4> [<ffffffffa1215b20>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      <4> [<ffffffffa16a5514>] mgc_process_config+0x594/0xed0 [mgc]
      <4> [<ffffffffa110164c>] lustre_process_log+0x25c/0xaa0 [obdclass]
      <4> [<ffffffffa112bffc>] ? server_find_mount+0xbc/0x160 [obdclass]
      <4> [<ffffffffa112ebd6>] ? server_register_mount+0x516/0x8f0 [obdclass]
      <4> [<ffffffffa1134467>] server_start_targets+0x5c7/0x19c0 [obdclass]
      <4> [<ffffffffa0bfcb2e>] ? cfs_free+0xe/0x10 [libcfs]
      <4> [<ffffffffa1104eb5>] ? lustre_start_mgc+0x4a5/0x2180 [obdclass]
      <4> [<ffffffffa10fca20>] ? class_config_llog_handler+0x0/0x1890 [obdclass]
      <4> [<ffffffffa113640c>] server_fill_super+0xbac/0x1660 [obdclass]
      <4> [<ffffffffa1106d68>] lustre_fill_super+0x1d8/0x530 [obdclass]
      <4> [<ffffffffa1106b90>] ? lustre_fill_super+0x0/0x530 [obdclass]
      <4> [<ffffffff8118c7cf>] get_sb_nodev+0x5f/0xa0
      <4> [<ffffffffa10fe3b5>] lustre_get_sb+0x25/0x30 [obdclass]
      <4> [<ffffffff8118be2b>] vfs_kern_mount+0x7b/0x1b0
      <4> [<ffffffff8118bfd2>] do_kern_mount+0x52/0x130
      <4> [<ffffffff811acfdb>] do_mount+0x2fb/0x930
      <4> [<ffffffff811ad6a0>] sys_mount+0x90/0xe0
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
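
The rc = -28 values in the first two log lines above are negative Linux errno codes. As a reading aid, the sketch below decodes such a code from a log line. The names decode_rc and RC_PATTERN are illustrative and not part of Lustre, and the sketch assumes Python's errno table matches the kernel's numbering (true on Linux). It only translates the number; it does not establish why the write failed.

```python
import errno
import os
import re

# Matches a trailing "rc = -28" style return code in a LustreError line.
# (Hypothetical helper pattern, not something Lustre provides.)
RC_PATTERN = re.compile(r"rc = (-?\d+)\s*$")


def decode_rc(log_line):
    """Return (rc, symbolic errno name, message), or None if no rc is found."""
    match = RC_PATTERN.search(log_line)
    if match is None:
        return None
    rc = int(match.group(1))
    code = abs(rc)  # kernel code reports errors as negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return rc, name, os.strerror(code)


line = ("LustreError: 27422:0:(llog_osd.c:160:llog_osd_write_blob()) "
        "fs96OST-OST003b-osd: error writing log record: rc = -28")
print(decode_rc(line))
```

On Linux, 28 maps to ENOSPC, so both failing writes reported a no-space condition just before the assertion fired.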
      

      Would it be possible to backport the patch http://review.whamcloud.com/#/c/10108/ to the b2_4 branch?


          Activity

            pjones Peter Jones added a comment -

            duplicate of LU-4528


            pichong Gregoire Pichon added a comment -

            Hi Peter,
            The patch http://review.whamcloud.com/#/c/11751/ is ready to land in b2_5 (two positive inspections).
            Could it be included in the upcoming 2.5.4 release?

            Thanks,
            Grégoire.

            pjones Peter Jones added a comment -

            Gregoire

            The b2_5 patch is being tracked under LU-4528, of which this issue is believed to be a duplicate.

            Peter


            pichong Gregoire Pichon added a comment -

            Would it be possible to have a patch for b2_5 worked out?
            I see that version 2.5.3 is going to be released soon. It would be good to have the fix integrated into that version, since some of our customers are going to use Lustre 2.5 in September.
            Thanks.

            pichong Gregoire Pichon added a comment -

            The issue did not occur on a production cluster, so it does not require immediate handling. Still, this is a node crash, and I would not like to see the same issue appear at a customer site.

            jfc John Fuchs-Chesney (Inactive) added a comment -

            Gregoire, since we seem to agree that b2_5 is the better branch to fix, can you give us a rough idea when you will need the solution to be in place?
            This will help us with our workload planning.

            Many thanks,
            ~ jfc

            bogl Bob Glossman (Inactive) added a comment -

            It does appear that a backport to b2_5 is more plausible. Fewer than half as many files need manual attention to merge, and the edits needed may not require an expert, as they would in b2_4.

            pichong Gregoire Pichon added a comment -

            Thanks for looking.
            I wonder if a backport to b2_5, which is the maintenance release, wouldn't be more appropriate in this case. This would benefit the whole community, and I am definitely going to move to a Lustre 2.5.x version at some point.

            jfc John Fuchs-Chesney (Inactive) added a comment -

            Mike – we've added you as a watcher on this ticket.
            Can you please advise on the best way forward to resolve Gregoire's problem?
            Thanks,
            ~ jfc.

            bogl Bob Glossman (Inactive) added a comment - edited

            It looks to me like a backport should be possible, but it needs the attention of somebody who really understands the code being modified to do it correctly. Trying to cherry-pick http://review.whamcloud.com/#/c/10108 back into b2_4 leaves 10 or more files that need manual editing to merge. Some appear to need more knowledge than just resolving context diffs.

            I note that the author of the original master patch was Mike Pershin. Maybe it's a job for him.

            People

              bogl Bob Glossman (Inactive)
              pichong Gregoire Pichon
              Votes: 0
              Watchers: 6
