LU-3892

osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.5.0, Lustre 2.4.2
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • 10156

    Description

      I started to hit this recently running sanity in a loop, in different tests, same crash every time:

      <0>[20903.330989] LustreError: 9397:0:(osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed: 
      <0>[20903.331969] LustreError: 9397:0:(osp_sync.c:356:osp_sync_interpret()) LBUG
      <4>[20903.332470] Pid: 9397, comm: ptlrpcd_2
      <4>[20903.332898] 
      <4>[20903.332899] Call Trace:
      <4>[20903.333610]  [<ffffffffa0ac78a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4>[20903.334192]  [<ffffffffa0ac7ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4>[20903.334705]  [<ffffffffa07f1092>] osp_sync_interpret+0x492/0x500 [osp]
      <4>[20903.335252]  [<ffffffffa127b2aa>] ptlrpc_check_set+0x2ca/0x1da0 [ptlrpc]
      <4>[20903.335824]  [<ffffffffa12a700b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
      <4>[20903.336580]  [<ffffffffa12a7553>] ptlrpcd+0x233/0x390 [ptlrpc]
      <4>[20903.337121]  [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
      <4>[20903.337649]  [<ffffffffa12a7320>] ? ptlrpcd+0x0/0x390 [ptlrpc]
      <4>[20903.338147]  [<ffffffff81094606>] kthread+0x96/0xa0
      <4>[20903.343428]  [<ffffffff8100c10a>] child_rip+0xa/0x20
      <4>[20903.343939]  [<ffffffff81094570>] ? kthread+0x0/0xa0
      <4>[20903.344452]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      <4>[20903.345028] 
      <0>[20903.348400] Kernel panic - not syncing: LBUG
      

      Crash and modules: /exports/crashdumps/192.168.10.219-2013-09-05-20\:55\:33/
      other crashes like this: /exports/crashdumps/192.168.10.224-2013-09-05-19:19:15 /exports/crashdumps/192.168.10.219-2013-09-05-15\:06\:22
      source tag in my tree: master-20130905

          Activity


            paf Patrick Farrell (Inactive) added a comment - edited

            Was this patched in master as well? Cray saw this issue in 2.6 (LU-5193), so it's presumably in master.
            [Edit]

            Sorry, please forget that comment. For some reason I thought the original patch was against 2.5. Not sure what I was thinking here.
            pjones Peter Jones added a comment -

            Hi Lukasz

            It's ok. This issue is under consideration for 2.4.2. There is no need to open a new ticket.

            Peter

            lflis Lukasz Flis added a comment -

            Peter, should I report this bug in a new ticket pointing to 2.4 explicitly?

            Sep 25 20:56:14 <user.notice> mds01.storage 3450:0:(osp_sync.c:359:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed:
            Sep 25 20:56:14 <user.notice> mds01.storage 3450:0:(osp_sync.c:359:osp_sync_interpret()) LBUG

            Sep 25 20:56:14 <user.notice> mds01.storage Kernel[]: panic - not syncing: LBUG
            Sep 25 20:56:14 <user.notice> mds01.storage Pid[]: 3450, comm: ptlrpcd_4 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
            Sep 25 20:56:14 <user.notice> mds01.storage Call[]: Trace:
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffff8150de58>]: ? panic+0xa7/0x16f
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa052beeb>]: ? lbug_with_loc+0x9b/0xb0 [libcfs]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa0fe56b3>]: ? osp_sync_interpret+0x4a3/0x510 [osp]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa07f5edc>]: ? ptlrpc_check_set+0x2ac/0x1b20 [ptlrpc]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa082369b>]: ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa0823bc3>]: ? ptlrpcd+0x233/0x390 [ptlrpc]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffff81063410>]: ? default_wake_function+0x0/0x20
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa0823990>]: ? ptlrpcd+0x0/0x390 [ptlrpc]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffff8100c0ca>]: ? child_rip+0xa/0x20
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa0823990>]: ? ptlrpcd+0x0/0x390 [ptlrpc]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffffa0823990>]: ? ptlrpcd+0x0/0x390 [ptlrpc]
            Sep 25 20:56:14 <user.notice> mds01.storage [<ffffffff8100c0c0>]: ? child_rip+0x0/0x20

            It's exactly the same issue

            Lukasz Flis
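
            One way to check that two reports share a crash path is to compare the function names in their call traces while ignoring addresses and offsets, which differ between builds. The following is a minimal sketch and not part of the original report: the helper names `frame_names` and `common_path` are hypothetical, and the two sample traces are short excerpts of the logs quoted in this ticket.

```python
import re

# Matches a frame such as
#   [<ffffffffa07f1092>] osp_sync_interpret+0x492/0x500 [osp]
# and also the syslog-mangled form
#   [<ffffffffa0fe56b3>]: ? osp_sync_interpret+0x4a3/0x510 [osp]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s*:?\s*(?:\?\s*)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_names(trace_text):
    """Return the function names of all stack frames, in trace order."""
    return FRAME_RE.findall(trace_text)


def common_path(trace_a, trace_b):
    """Return frames of trace_a (in order, de-duplicated) also found in trace_b."""
    in_b = set(frame_names(trace_b))
    shared = []
    for name in frame_names(trace_a):
        if name in in_b and name not in shared:
            shared.append(name)
    return shared


# Excerpt of the trace from the description (master-20130905).
TRACE_A = """
<4>[20903.333610]  [<ffffffffa0ac78a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[20903.334192]  [<ffffffffa0ac7ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[20903.334705]  [<ffffffffa07f1092>] osp_sync_interpret+0x492/0x500 [osp]
<4>[20903.335252]  [<ffffffffa127b2aa>] ptlrpc_check_set+0x2ca/0x1da0 [ptlrpc]
<4>[20903.335824]  [<ffffffffa12a700b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
<4>[20903.336580]  [<ffffffffa12a7553>] ptlrpcd+0x233/0x390 [ptlrpc]
"""

# Excerpt of the trace from the MDS syslog in the comment above.
TRACE_B = """
mds01.storage [<ffffffff8150de58>]: ? panic+0xa7/0x16f
mds01.storage [<ffffffffa052beeb>]: ? lbug_with_loc+0x9b/0xb0 [libcfs]
mds01.storage [<ffffffffa0fe56b3>]: ? osp_sync_interpret+0x4a3/0x510 [osp]
mds01.storage [<ffffffffa07f5edc>]: ? ptlrpc_check_set+0x2ac/0x1b20 [ptlrpc]
mds01.storage [<ffffffffa082369b>]: ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
mds01.storage [<ffffffffa0823bc3>]: ? ptlrpcd+0x233/0x390 [ptlrpc]
"""

if __name__ == "__main__":
    print(" -> ".join(common_path(TRACE_A, TRACE_B)))
```

            A shared path running from `ptlrpcd` through `osp_sync_interpret` to `lbug_with_loc` is consistent with both reports hitting the same assertion, although matching frames alone do not prove a common root cause.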

            lflis Lukasz Flis added a comment -

            Our MDS server panicked today due to this bug.
            The problem is also present in 2.4.1. Please remember to cherry-pick
            the patch for the next 2.4 release.

            Regards

            Lukasz Flis
            ACC Cyfronet

            pjones Peter Jones added a comment -

            Landed for 2.5.0


            bzzz Alex Zhuravlev added a comment -

            I was able to reproduce the issue locally. The last patch should fix the root cause.

            bzzz Alex Zhuravlev added a comment -

            Hopefully a better approach: http://review.whamcloud.com/#/c/7672/
            bzzz Alex Zhuravlev added a comment - http://review.whamcloud.com/7664

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: green Oleg Drokin
              Votes: 0
              Watchers: 6