Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9983

LBUG llog_osd.c:327:llog_osd_declare_write_rec() - all DNE MDS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.6
    • Labels:
    • Environment:
      version=2.10.52_83_g1fc4ed3 lustre-master build 3637 - RHEL 7.3 distro with 7.4 kernels
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      soak is started, jobs begin to run.
      soak-8 (MDT0000) reports and error:

      Sep 13 15:53:37 soak-8 kernel: LustreError: 4873:0:(out_handler.c:597:out_write()) soaked-MDT0000: empty or wrong size 0 pos: rc = -61
      Sep 13 15:53:37 soak-8 kernel: LustreError: 4990:0:(out_handler.c:1000:out_handle()) soaked-MDT0000: invalid update buffer magic 0 expect bdde0002: rc = -71
      Sep 13 15:53:37 soak-8 kernel: LustreError: 4873:0:(out_handler.c:597:out_write()) Skipped 2 previous similar messages
      Sep 13 15:53:44 soak-8 kernel: LustreError: 5117:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344.
      

      All other DNE MDTS (01-03) promptly LBUG:
      soak-9

      Sep 13 15:53:37 soak-9 kernel: LustreError: 11-0: soaked-MDT0000-osp-MDT0001: operation out_update to node 192.168.1.108@o2ib failed: rc = -71
      Sep 13 15:53:37 soak-9 kernel: LustreError: 4734:0:(layout.c:2082:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)#012  req@ffff880808e79500 x1578438980902080/t0(0) o1000->soaked-MDT0002-osp-MDT0001@192.168.1.110@o2ib:24/4 lens 392/192 e 0 to 0 dl 1505318024 ref 2 fl Interpret:RM/0/0 rc -71/-71
      Sep 13 15:53:37 soak-9 kernel: LustreError: Skipped 1 previous similar message
      Sep 13 15:53:47 soak-9 kernel: LustreError: 4719:0:(llog_osd.c:327:llog_osd_declare_write_rec()) ASSERTION( rec ) failed:
      Sep 13 15:53:47 soak-9 kernel: LustreError: 4719:0:(llog_osd.c:327:llog_osd_declare_write_rec()) LBUG
      Sep 13 15:53:47 soak-9 kernel: Pid: 4719, comm: dist_txn-1
      Sep 13 15:53:47 soak-9 kernel: #012Call Trace:
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0e667ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0e6683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f658f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f5a0e4>] llog_declare_write_rec+0x84/0x200 [obdclass]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f5a862>] llog_cancel_rec+0xe2/0x870 [obdclass]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f60f6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0e71ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc11f22d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffff810c12c8>] ? check_preempt_curr+0x78/0xa0 
      Sep 13 15:53:47 soak-9 kernel: [<ffffffff810c4810>] ? default_wake_function+0x0/0x20
      Sep 13 15:53:47 soak-9 kernel: [<ffffffffc11f1d40>] ? distribute_txn_commit_thread+0x0/0xca0 [ptlrpc]
      Sep 13 15:53:47 soak-9 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Sep 13 15:53:47 soak-9 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Sep 13 15:53:47 soak-9 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      Sep 13 15:53:47 soak-9 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Sep 13 15:53:47 soak-9 kernel:
      Sep 13 15:53:47 soak-9 kernel: Kernel panic - not syncing: LBUG
      Sep 13 15:53:50 soak-9 kernel: CPU: 25 PID: 4719 Comm: dist_txn-1 Tainted: P           OE  ------------   3.10.0-693.1.1.el7_lustre.x86_64 #1
      Sep 13 15:53:50 soak-9 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      Sep 13 15:53:50 soak-9 kernel: ffff880419612f00 000000001ae378cc ffff880418eafb90 ffffffff816a3d6d
      Sep 13 15:53:50 soak-9 kernel: ffff880418eafc10 ffffffff8169dc54 ffffffff00000008 ffff880418eafc20
      Sep 13 15:53:50 soak-9 kernel: ffff880418eafbc0 000000001ae378cc 000000001ae378cc ffff88082da4f8b8
      Sep 13 15:53:50 soak-9 kernel: Call Trace:
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff816a3d6d>] dump_stack+0x19/0x1b
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff8169dc54>] panic+0xe8/0x20d
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0e66854>] lbug_with_loc+0x64/0xb0 [libcfs]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f658f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f5a0e4>] llog_declare_write_rec+0x84/0x200 [obdclass]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f5a862>] llog_cancel_rec+0xe2/0x870 [obdclass]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f60f6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0e71ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc11f22d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff810c12c8>] ? check_preempt_curr+0x78/0xa0
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
      Sep 13 15:53:50 soak-9 kernel: [<ffffffffc11f1d40>] ? sub_trans_commit_cb+0x20/0x20 [ptlrpc]
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      Sep 13 15:53:50 soak-9 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      Sep 13 15:53:50 soak-9 kernel: Kernel Offset: disabled
      Sep 13 15:53:50 soak-9 kernel: ------------[ cut here ]------------
      

      soak-10

      Sep 13 15:53:37 soak-10 kernel: LustreError: 11-0: soaked-MDT0000-osp-MDT0002: operation out_update to node 192.168.1.108@o2ib failed: rc = -71
      Sep 13 15:53:37 soak-10 kernel: LustreError: 14955:0:(layout.c:2082:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)#012  req@ffff880818bb5400 x1578438986144240/t0(0) o1000->soaked-MDT0000-osp-MDT0002@192.168.1.108@o2ib:24/4 lens 392/192 e 0 to 0 dl 1505318024 ref 2 fl Interpret:RM/0/0 rc -71/-71
      Sep 13 15:53:37 soak-10 kernel: LustreError: 14939:0:(out_handler.c:597:out_write()) soaked-MDT0002: empty or wrong size 0 pos: rc = -61
      Sep 13 15:53:37 soak-10 kernel: LustreError: 14939:0:(out_handler.c:1000:out_handle()) soaked-MDT0002: invalid update buffer magic 0 expect bdde0002: rc = -71
      Sep 13 15:53:44 soak-10 kernel: LustreError: 14961:0:(llog_osd.c:327:llog_osd_declare_write_rec()) ASSERTION( rec ) failed:
      Sep 13 15:53:44 soak-10 kernel: LustreError: 14961:0:(llog_osd.c:327:llog_osd_declare_write_rec()) LBUG
      Sep 13 15:53:44 soak-10 kernel: Pid: 14961, comm: dist_txn-2
      Sep 13 15:53:44 soak-10 kernel: #012Call Trace:
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0e087ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0e0883c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f208f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f150e4>] llog_declare_write_rec+0x84/0x200 [obdclass]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f15862>] llog_cancel_rec+0xe2/0x870 [obdclass]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f1bf6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0e13ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc11fc2d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffff810c4810>] ? default_wake_function+0x0/0x20
      Sep 13 15:53:44 soak-10 kernel: [<ffffffffc11fbd40>] ? distribute_txn_commit_thread+0x0/0xca0 [ptlrpc]
      Sep 13 15:53:44 soak-10 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Sep 13 15:53:44 soak-10 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Sep 13 15:53:44 soak-10 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      Sep 13 15:53:44 soak-10 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Sep 13 15:53:44 soak-10 kernel:
      Sep 13 15:53:44 soak-10 kernel: Kernel panic - not syncing: LBUG
      Sep 13 15:53:45 soak-10 kernel: CPU: 10 PID: 14961 Comm: dist_txn-2 Tainted: P           OE  ------------   3.10.0-693.1.1.el7_lustre.x86_64 #1
      Sep 13 15:53:45 soak-10 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      Sep 13 15:53:45 soak-10 kernel: ffff8803f8e7bf00 000000009b9e8911 ffff8803f8ea7b90 ffffffff816a3d6d
      Sep 13 15:53:45 soak-10 kernel: ffff8803f8ea7c10 ffffffff8169dc54 ffffffff00000008 ffff8803f8ea7c20
      Sep 13 15:53:45 soak-10 kernel: ffff8803f8ea7bc0 000000009b9e8911 000000009b9e8911 ffff88082d88f8b8
      Sep 13 15:53:45 soak-10 kernel: Call Trace:
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff816a3d6d>] dump_stack+0x19/0x1b
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff8169dc54>] panic+0xe8/0x20d
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0e08854>] lbug_with_loc+0x64/0xb0 [libcfs]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f208f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f150e4>] llog_declare_write_rec+0x84/0x200 [obdclass]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f15862>] llog_cancel_rec+0xe2/0x870 [obdclass]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f1bf6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0e13ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc11fc2d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
      Sep 13 15:53:45 soak-10 kernel: [<ffffffffc11fbd40>] ? sub_trans_commit_cb+0x20/0x20 [ptlrpc]
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      Sep 13 15:53:45 soak-10 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      Sep 13 15:53:45 soak-10 kernel: Kernel Offset: disabled
      Sep 13 15:53:45 soak-10 kernel: ------------[ cut here ]------------
      

      soak-11

      Sep 13 15:53:43 soak-11 kernel: LustreError: 4755:0:(llog_osd.c:327:llog_osd_declare_write_rec()) ASSERTION( rec ) failed:
      Sep 13 15:53:43 soak-11 kernel: LustreError: 4755:0:(llog_osd.c:327:llog_osd_declare_write_rec()) LBUG
      Sep 13 15:53:43 soak-11 kernel: Pid: 4755, comm: dist_txn-3
      Sep 13 15:53:43 soak-11 kernel: #012Call Trace:
      Sep 13 15:53:43 soak-11 kernel: [<ffffffffc0d6f7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Sep 13 15:53:43 soak-11 kernel: [<ffffffffc0d6f83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e878f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e7c0e4>] llog_declare_write_rec+0x84/0x200 [obdclass]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e7c862>] llog_cancel_rec+0xe2/0x870 [obdclass]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e82f6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0d7aba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc11632d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffff810c4810>] ? default_wake_function+0x0/0x20
      Sep 13 15:53:44 soak-11 kernel: [<ffffffffc1162d40>] ? distribute_txn_commit_thread+0x0/0xca0 [ptlrpc]
      Sep 13 15:53:44 soak-11 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Sep 13 15:53:44 soak-11 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Sep 13 15:53:44 soak-11 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      Sep 13 15:53:44 soak-11 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Sep 13 15:53:44 soak-11 kernel:
      Sep 13 15:53:44 soak-11 kernel: Kernel panic - not syncing: LBUG
      

        Attachments

        1. client.log
          80 kB
        2. client.lustre.log.txt.gz
          965 kB
        3. MDS.lustre.log.txt.gz
          12.83 MB
        4. server.log
          455 kB
        5. soak-8.lustre.log.gz
          165 kB
        6. vmcore-dmesg.txt
          139 kB

          Issue Links

            Activity

              People

              • Assignee:
                wc-triage WC Triage
                Reporter:
                cliffw Cliff White (Inactive)
              • Votes:
                0 Vote for this issue
                Watchers:
                15 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: