  Lustre / LU-6969

osd_internal.h:1090:osd_trans_exec_check()) LBUG for osd_index_ea_delete()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      Hit this during a local racer test run on master.

      5[77465]: segfault at 8 ip 00000031f720b3f3 sp 00007fff8f7c3e50 error 4 in ld-2.12.so[31f7200000+20000]
      LustreError: 20700:0:(mdd_object.c:70:mdd_la_get()) lustre-MDD0000: object [0x200000404:0x4774:0x0] not found: rc = -2
      LustreError: 20700:0:(mdd_object.c:70:mdd_la_get()) Skipped 1 previous similar message
      Lustre: 42406:0:(osd_internal.h:1087:osd_trans_exec_check()) op 9: used 10, used now 10, reserved 5
      Lustre: 42406:0:(osd_handler.c:902:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
      Lustre: 42406:0:(osd_handler.c:909:osd_trans_dump_creds())   attr_set: 2/2/0, xattr_set: 1/64/0
      Lustre: 42406:0:(osd_handler.c:919:osd_trans_dump_creds())   write: 6/14/0, punch: 0/0/0, quota 4/4/0
      Lustre: 42406:0:(osd_handler.c:926:osd_trans_dump_creds())   insert: 0/0/0, delete: 1/5/10
      Lustre: 42406:0:(osd_handler.c:933:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 1/1/0
      LustreError: 42406:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
      Pid: 42406, comm: mdt_out00_004
      
      Call Trace:
       [<ffffffffa05b4875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa05b4e77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0f30631>] osd_index_ea_delete+0x7b1/0xe10 [osd_ldiskfs]
       [<ffffffffa0999f90>] out_obj_index_delete+0x150/0x370 [ptlrpc]
       [<ffffffffa099a1d8>] out_tx_index_delete_exec+0x28/0x190 [ptlrpc]
       [<ffffffffa098e0ca>] out_tx_end+0xda/0x5d0 [ptlrpc]
       [<ffffffffa09931df>] out_handle+0x7af/0x1950 [ptlrpc]
       [<ffffffffa05c0c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa098afc2>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
       [<ffffffffa09331a1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
       [<ffffffffa0932360>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      LustreError: dumping log to /tmp/lustre-log.1438790617.42406
      
      Message from syslogd@testnode at Aug  5 09:48:22 ...
       kernel:LustreError: 8393:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
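The credit dump above can be read mechanically. A minimal sketch, assuming each `name: a/b/c` triple printed by `osd_trans_dump_creds()` means declared ops / reserved credits / used credits. That reading is inferred from this log alone (`op 9: used 10, used now 10, reserved 5` lines up with `delete: 1/5/10`, and the trace ends in `osd_index_ea_delete()`), not taken from the Lustre sources. The helpers `parse_creds()` and `first_overrun()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One "name: declared/reserved/used" triple from an osd_trans_dump_creds()
 * line. Field meanings are an assumption inferred from this ticket's log. */
struct cred_triple {
	char name[32];
	unsigned int declared;
	unsigned int reserved;
	unsigned int used;
};

/* Scan one dump line for "name: a/b/c" or "name a/b/c" (the log prints
 * "quota" without a colon). Stores at most max triples; returns the count. */
size_t parse_creds(const char *line, struct cred_triple *out, size_t max)
{
	size_t n = 0;
	const char *p = line;

	assert(line != NULL);
	while (n < max && *p != '\0') {
		struct cred_triple t;
		int consumed = 0;

		/* lowercase/underscore name, then ':' and/or spaces, then a/b/c */
		if (sscanf(p, " %31[a-z_]%*[: ]%u/%u/%u%n", t.name, &t.declared,
			   &t.reserved, &t.used, &consumed) == 4) {
			out[n++] = t;
			p += consumed;
		} else {
			p++; /* no triple starts here; try the next character */
		}
	}
	return n;
}

/* Index of the first triple whose used count exceeds its reserved count
 * (the condition the LBUG message reports), or -1 if there is none. */
int first_overrun(const struct cred_triple *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].used > t[i].reserved)
			return (int)i;
	return -1;
}

/* Look up a triple by name; returns NULL if it was not parsed. */
const struct cred_triple *find_cred(const struct cred_triple *t, size_t n,
				    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(t[i].name, name) == 0)
			return &t[i];
	return NULL;
}
```

Feeding it the five dump bodies from the description (the text after `osd_trans_dump_creds())`) yields 11 triples, and `first_overrun()` points at `delete` with 5 reserved and 10 used, the same overrun the LBUG reports.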
      


          Activity

            vinayakh Vinayak (Inactive) added a comment - Sorry, my bad.

            bzzz Alex Zhuravlev added a comment - Sorry, this is a different issue that happened on the OST.

            vinayakh Vinayak (Inactive) added a comment - The issue does not appear to be completely resolved. I hit this while running racer test_1 multiple times. Attaching the log file.

            Lustre: DEBUG MARKER: == racer test 1: racer on clients: fre0311,fre0312 DURATION=900 == 22:55:04 (1443740104)
            Lustre: 7246:0:(osd_handler.c:912:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
            Lustre: 7246:0:(osd_handler.c:919:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 1/1/0
            Lustre: 7246:0:(osd_handler.c:929:osd_trans_dump_creds())   write: 2/12/0, punch: 1/4/0, quota 2/2/0
            Lustre: 7246:0:(osd_handler.c:936:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
            Lustre: 7246:0:(osd_handler.c:943:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
            LustreError: 7246:0:(osd_internal.h:1040:osd_trans_exec_op()) lustre-OST0001-osd: op = 7, rb = 7
            LustreError: 7246:0:(osd_internal.h:1048:osd_trans_exec_op()) LBUG
            Pid: 7246, comm: ll_ost_io00_009

            Call Trace:
             [<ffffffffa032a875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
             [<ffffffffa032ae77>] lbug_with_loc+0x47/0xb0 [libcfs]
             [<ffffffffa0b7e0be>] osd_write+0x41e/0x5b0 [osd_ldiskfs]
             [<ffffffffa0477c4d>] dt_record_write+0x3d/0x130 [obdclass]
             [<ffffffffa06f8545>] tgt_client_data_write+0x165/0x1b0 [ptlrpc]
             [<ffffffffa06f9517>] tgt_txn_stop_cb+0x477/0x1110 [ptlrpc]
             [<ffffffffa0477b1e>] dt_txn_hook_stop+0x5e/0x90 [obdclass]
             [<ffffffffa0b5b0ce>] osd_trans_stop+0x1ae/0x990 [osd_ldiskfs]
             [<ffffffffa0b6bd58>] ? osd_attr_set+0x148/0x620 [osd_ldiskfs]
             [<ffffffffa0ce7a7f>] ofd_trans_stop+0x1f/0x60 [ofd]
             [<ffffffffa0ce94aa>] ofd_object_punch+0x35a/0xa30 [ofd]
             [<ffffffffa0cd573e>] ofd_punch_hdl+0x36e/0xb20 [ofd]
             [<ffffffffa07084bc>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
             [<ffffffffa06afb41>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
             [<ffffffffa06aed00>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
             [<ffffffff8109abf6>] kthread+0x96/0xa0
             [<ffffffff8100c20a>] child_rip+0xa/0x20
             [<ffffffff8109ab60>] ? kthread+0x0/0xa0
             [<ffffffff8100c200>] ? child_rip+0x0/0x20

            Correct me if I am wrong.

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8.0

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15924/
            Subject: LU-6969 osd: remove agent inodes in a separate transaction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0887b89c0c4e2b7c5a7ba3365e758a7d94c667fa

            vinayakh Vinayak (Inactive) added a comment - We hit this issue in sanity test_51e. With the patch referenced in this ticket (http://review.whamcloud.com/#/c/15924) applied, we did not hit it again even after running the test 50 times. We have asked the test team to check whether they have any scenarios in which it still reproduces, and we have also started repeated runs of the Intel-specific failures such as racer test_1.

            adilger Andreas Dilger added a comment - Kalpak, did you test http://review.whamcloud.com/15924 to see if it fixes this issue?

            People

              bzzz Alex Zhuravlev
              di.wang Di Wang
              Votes: 0
              Watchers: 12
