Lustre / LU-7138

LBUG: (osd_handler.c:1017:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed:

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Affects Version/s: Lustre 2.7.0
    • Severity: 1

    Description

      This evening we hit this LBUG on the MDT of our production file system. The file system is currently down: every time we attempt to bring the MDT back, we hit the same bug as soon as recovery finishes.

      <0>LustreError: 722:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed:
      <0>LustreError: 722:0:(osd_handler.c:1017:osd_trans_start()) LBUG
      <4>Pid: 722, comm: mdt01_017
      <4>
      <4>Call Trace:
      <4> [<ffffffffa065f895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa065fe97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa17df24d>] osd_trans_start+0x25d/0x660 [osd_ldiskfs]
      <4> [<ffffffffa09b9b4a>] llog_osd_destroy+0x42a/0xd40 [obdclass]
      <4> [<ffffffffa09b2edc>] llog_cat_new_log+0x1ec/0x710 [obdclass]
      <4> [<ffffffffa09b350a>] llog_cat_add_rec+0x10a/0x450 [obdclass]
      <4> [<ffffffffa09ab1e9>] llog_add+0x89/0x1c0 [obdclass]
      <4> [<ffffffffa17f1976>] ? osd_attr_set+0x166/0x460 [osd_ldiskfs]
      <4> [<ffffffffa0d914e2>] mdd_changelog_store+0x122/0x290 [mdd]
      <4> [<ffffffffa0da4d0c>] mdd_changelog_data_store+0x16c/0x320 [mdd]
      <4> [<ffffffffa0dad9b3>] mdd_attr_set+0x12f3/0x1730 [mdd]
      <4> [<ffffffffa088a551>] mdt_reint_setattr+0xf81/0x13a0 [mdt]
      <4> [<ffffffffa087be1c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
      <4> [<ffffffffa08801dd>] mdt_reint_rec+0x5d/0x200 [mdt]
      <4> [<ffffffffa086423b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
      <4> [<ffffffffa08649ab>] mdt_reint+0x6b/0x120 [mdt]
      <4> [<ffffffffa0c6f56e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
      <4> [<ffffffffa0c1f5a1>] ptlrpc_main+0xe41/0x1960 [ptlrpc]
      <4> [<ffffffff8106c4f0>] ? pick_next_task_fair+0xd0/0x130
      <4> [<ffffffffa0c1e760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
      <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      

      The stack trace does not seem to be quite the same as the one in LU-6634 (which, in any case, has no suggested fix).
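      The failed check requires that the current thread has no journal transaction open when osd_trans_start() is called. In the trace, the second start comes from llog_osd_destroy(), reached through llog_cat_new_log() while a changelog record is being stored for mdt_reint_setattr(). Assuming, as the call chain suggests, that mdd_attr_set() already holds its own transaction at that point, the nesting can be modelled as in the sketch below. It is a hypothetical user-space model, not Lustre code; model_trans_start(), model_trans_stop() and replay_trace() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the invariant behind the failed
 * assertion; this is NOT Lustre code. Each thread carries a journal_info
 * pointer that is non-NULL while it holds an open journal transaction. */
struct task {
    void *journal_info;
};

struct handle {
    int id; /* identifies the transaction, for illustration only */
};

/* Stand-in for osd_trans_start(): the real function LBUGs when
 * journal_info is already set; this model returns -1 instead. */
static int model_trans_start(struct task *cur, struct handle *th)
{
    if (cur->journal_info != NULL)
        return -1;
    cur->journal_info = th;
    return 0;
}

/* Stand-in for stopping a transaction: only valid if one is open. */
static void model_trans_stop(struct task *cur)
{
    assert(cur->journal_info != NULL);
    cur->journal_info = NULL;
}

/* Replays the nesting assumed from the call trace and returns the result
 * of the inner start (0 = allowed, -1 = the assertion would fire). */
static int replay_trace(void)
{
    struct task cur = { NULL };
    struct handle outer = { 1 }, inner = { 2 };
    int rc;

    /* Assumed: mdd_attr_set() opens the transaction that also carries the
     * changelog record (mdd_changelog_data_store -> llog_add). */
    rc = model_trans_start(&cur, &outer);
    assert(rc == 0);

    /* llog_cat_new_log() -> llog_osd_destroy() then tries to open a
     * second transaction on the same thread. */
    rc = model_trans_start(&cur, &inner);

    model_trans_stop(&cur); /* closes the outer transaction */
    return rc;
}
```

      In this model replay_trace() returns -1, which corresponds to the point where the real kernel code hits the ASSERTION and panics with LBUG.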

          Activity

            Gerrit Updater added a comment:

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/31478
            Subject: LU-7138 sptlrpc: make srpc_info writable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a2f7394ca5affe35c568ba2971c3bf3aeb2a7843

            Frederik Ferner added a comment:

            As far as I can see, at least the patch for LU-6556 has been merged into master. Unfortunately, I have not managed to merge it cleanly into the 2.7 branch. Could someone point me to a 2.7.x version of that patch? (I don't plan to re-enable changelogs on any of the production systems until we have both patches, but I would still like to apply the first patch on test systems as soon as possible.)
            Peter Jones added a comment:

            Frederik

            As soon as the fixes are finalized for LU-6634 and LU-6556 we'll create 2.7.x versions.

            Peter

            Alex Zhuravlev added a comment:

            Yes, you're right, that's a typo: I meant LU-6634. A port to 2.7 would need llog_trans_destroy() and llog_destroy() from the master branch.
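            The sketch below illustrates why a transaction-aware destroy avoids the nested start. It assumes, judging only by the names, that llog_trans_destroy() performs the destroy inside a transaction the caller already holds, while llog_destroy() opens its own. Everything in the block (model_trans_start(), destroy_own_trans(), destroy_in_trans(), destroy_under_outer()) is a hypothetical user-space model, not the Lustre API or its signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; the names are illustrative and are NOT
 * the Lustre API. journal_info is non-NULL while the thread holds an open
 * transaction, mirroring the failed assertion. */
struct task {
    void *journal_info;
};

struct handle {
    int nops; /* number of operations recorded in this transaction */
};

static int model_trans_start(struct task *cur, struct handle *th)
{
    if (cur->journal_info != NULL)
        return -1; /* the real osd_trans_start() LBUGs here */
    cur->journal_info = th;
    return 0;
}

static void model_trans_stop(struct task *cur)
{
    assert(cur->journal_info != NULL);
    cur->journal_info = NULL;
}

/* Self-contained destroy: opens and closes its own transaction, like the
 * llog_osd_destroy() path in the 2.7 trace. */
static int destroy_own_trans(struct task *cur)
{
    struct handle local = { 0 };
    int rc = model_trans_start(cur, &local);

    if (rc != 0)
        return rc;
    local.nops++;
    model_trans_stop(cur);
    return 0;
}

/* Transaction-aware destroy: records the destroy in the caller's
 * already-open transaction instead of starting a new one. */
static int destroy_in_trans(struct task *cur, struct handle *th)
{
    assert(cur->journal_info == th);
    th->nops++;
    return 0;
}

/* Opens an outer transaction (as mdd_attr_set() is assumed to do), then
 * destroys an old log with the chosen variant; returns the destroy's rc
 * and reports how many operations the outer transaction recorded. */
static int destroy_under_outer(int reuse_caller_handle, int *outer_nops)
{
    struct task cur = { NULL };
    struct handle outer = { 0 };
    int rc;

    rc = model_trans_start(&cur, &outer);
    assert(rc == 0);
    rc = reuse_caller_handle ? destroy_in_trans(&cur, &outer)
                             : destroy_own_trans(&cur);
    model_trans_stop(&cur);
    *outer_nops = outer.nops;
    return rc;
}
```

            With reuse_caller_handle set to 0 the destroy fails (the model's equivalent of the LBUG); with 1 it succeeds and is recorded in the outer transaction. The self-contained variant is still fine when no transaction is open, which is presumably why both functions would be needed.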

            Frederik Ferner added a comment (edited):

            Alex,

            Could you double-check the bug number that this is a duplicate of? LU-6636 doesn't look right; did you mean LU-6634?

            Also, what would be the best way for us to get a fix or patch for Lustre 2.7?

            Cheers,
            Frederik
            Alex Zhuravlev added a comment (edited):

            This is a duplicate of LU-6634.

            People

              Assignee: Oleg Drokin
              Reporter: Frederik Ferner
              Votes: 0
              Watchers: 9
