
LustreError: 201113:0:(osd_internal.h:1101:osd_trans_exec_check()) LBUG

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3

    Description

      We are hitting this issue frequently while running
      sanity test 51e.

      I am attaching the logs to this ticket.
      Logs from node 51e.windu03.log.windu00:

      LustreError: 201113:0:(osd_internal.h:1101:osd_trans_exec_check()) LBUG
       Pid: 201113, comm: mdt03_000
       
       Call Trace:
        libcfs_debug_dumpstack+0x55/0x80 [libcfs]
        lbug_with_loc+0x47/0xb0 [libcfs]
        osd_xattr_set+0x5d8/0x6c0 [osd_ldiskfs]
        ? ldiskfs_xattr_inode_get+0xdb/0xf0 [ldiskfs]
        lod_sub_object_xattr_set+0x223/0x460 [lod]
        lod_xattr_set_internal+0x126/0x2b0 [lod]
        lod_xattr_set+0x101/0x430 [lod]
        ? mdd_env_info+0x25/0x70 [mdd]
        mdd_links_write+0x235/0x2e0 [mdd]
        mdd_links_rename+0x312/0x620 [mdd]
        mdd_link+0x104c/0x10f0 [mdd]
        mdt_reint_link+0x9b1/0xb40 [mdt]
        ? mdt_root_squash+0x2c/0x3f0 [mdt]
        ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
        mdt_reint_rec+0x5d/0x200 [mdt]
        mdt_reint_internal+0x62b/0xb80 [mdt]
        mdt_reint+0x6b/0x120 [mdt]
        tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
        ptlrpc_main+0xe41/0x1910 [ptlrpc]
        ? ptlrpc_main+0x0/0x1910 [ptlrpc]
        kthread+0x96/0xa0
        child_rip+0xa/0x20
        ? kthread+0x0/0xa0
        ? child_rip+0x0/0x20
      

          Activity

            [LU-7332] LustreError: 201113:0:(osd_internal.h:1101:osd_trans_exec_check()) LBUG

            vinayakh Vinayak (Inactive) added a comment -

            Sure, Peter. I will keep you posted on updates.

            Thanks,
            pjones Peter Jones added a comment -

            Thanks! We'll close this out as a duplicate of LU-5770 then


            vinayakh Vinayak (Inactive) added a comment -

            The multi-run passed in all 100 test instances.

            vinayakh Vinayak (Inactive) added a comment -

            Hello Andreas, Alex,

            We have tried the patch, and sanity test_51e passes in the initial run on a 4-node setup (2 clients, 1 MDS, 1 OSS). We have submitted the test for a multi-run on the same setup and have also asked our testing team to verify the fix in the environment (a 10+ node production environment) where the issue is reproducible. I will update you as soon as I hear back from our testing team.

            Thanks,

            adilger Andreas Dilger added a comment -

            Please reply back if that patch fixed your problem, and we can prioritize the landing of the patch.

            vinayakh Vinayak (Inactive) added a comment -

            Thanks for pointing me to the solution, Alex. I will try it and let you know.

            bzzz Alex Zhuravlev added a comment -

            This is because of a huge LINKEA. Please try http://review.whamcloud.com/#/c/12412/
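
To illustrate the diagnosis above, here is a minimal sketch of the kind of workload that makes a link EA grow. It assumes, as the call trace through mdt_reint_link and mdd_links_write suggests, that every additional hard link to a file adds an entry to that file's link EA. The function name, file names, and link count are hypothetical, and this is not the actual sanity test 51e; to exercise an MDT, the directory would have to live on a Lustre mount.

```python
import os
import tempfile


def make_hard_links(directory, count):
    """Create `count` extra hard links to one source file.

    Returns the link count reported for the source file afterwards.
    """
    source = os.path.join(directory, "source_file")
    # Create an empty source file to link against.
    with open(source, "w"):
        pass
    for i in range(count):
        # Each os.link() call adds one more name for the same inode.
        os.link(source, os.path.join(directory, f"link_file_{i}"))
    return os.stat(source).st_nlink


if __name__ == "__main__":
    # Hypothetical local run; point the directory at a Lustre mount instead
    # to generate link requests against the MDS.
    with tempfile.TemporaryDirectory() as workdir:
        print(make_hard_links(workdir, 100))
```

On a local filesystem this only demonstrates the link count growing; whether a given count is enough to trigger the LBUG on Lustre is not something this sketch establishes.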

            vinayakh Vinayak (Inactive) added a comment -

            Yes, Peter. I meant the same.

            Thanks,
            pjones Peter Jones added a comment -

            I am assuming that by "Latest Intel master" you mean the tip of the community tree master.

            vinayakh Vinayak (Inactive) added a comment - edited

            Hello Andreas,

            Initially we thought that this issue was closely related to LU-6969, but we are still hitting it even after the LU-6969 patch was merged.

            We found this issue on the latest Intel master. Please help me correct the Affects Version field as well.

            People

              Assignee: wc-triage WC Triage
              Reporter: vinayakh Vinayak (Inactive)
              Votes: 0
              Watchers: 6
