Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.10.1, Lustre 2.11.0
-
version=2.10.52_83_g1fc4ed3 lustre-master build 3637 - RHEL 7.3 distro with 7.4 kernels
-
3
-
9223372036854775807
Description
soak is started, jobs begin to run.
soak-8 (MDT0000) reports and error:
Sep 13 15:53:37 soak-8 kernel: LustreError: 4873:0:(out_handler.c:597:out_write()) soaked-MDT0000: empty or wrong size 0 pos: rc = -61 Sep 13 15:53:37 soak-8 kernel: LustreError: 4990:0:(out_handler.c:1000:out_handle()) soaked-MDT0000: invalid update buffer magic 0 expect bdde0002: rc = -71 Sep 13 15:53:37 soak-8 kernel: LustreError: 4873:0:(out_handler.c:597:out_write()) Skipped 2 previous similar messages Sep 13 15:53:44 soak-8 kernel: LustreError: 5117:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344.
All other DNE MDTS (01-03) promptly LBUG:
soak-9
Sep 13 15:53:37 soak-9 kernel: LustreError: 11-0: soaked-MDT0000-osp-MDT0001: operation out_update to node 192.168.1.108@o2ib failed: rc = -71 Sep 13 15:53:37 soak-9 kernel: LustreError: 4734:0:(layout.c:2082:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)#012 req@ffff880808e79500 x1578438980902080/t0(0) o1000->soaked-MDT0002-osp-MDT0001@192.168.1.110@o2ib:24/4 lens 392/192 e 0 to 0 dl 1505318024 ref 2 fl Interpret:RM/0/0 rc -71/-71 Sep 13 15:53:37 soak-9 kernel: LustreError: Skipped 1 previous similar message Sep 13 15:53:47 soak-9 kernel: LustreError: 4719:0:(llog_osd.c:327:llog_osd_declare_write_rec()) ASSERTION( rec ) failed: Sep 13 15:53:47 soak-9 kernel: LustreError: 4719:0:(llog_osd.c:327:llog_osd_declare_write_rec()) LBUG Sep 13 15:53:47 soak-9 kernel: Pid: 4719, comm: dist_txn-1 Sep 13 15:53:47 soak-9 kernel: #012Call Trace: Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0e667ae>] libcfs_call_trace+0x4e/0x60 [libcfs] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0e6683c>] lbug_with_loc+0x4c/0xb0 [libcfs] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f658f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f5a0e4>] llog_declare_write_rec+0x84/0x200 [obdclass] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f5a862>] llog_cancel_rec+0xe2/0x870 [obdclass] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0f60f6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc0e71ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] Sep 13 15:53:47 soak-9 kernel: [<ffffffffc11f22d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc] Sep 13 15:53:47 soak-9 kernel: [<ffffffff810c12c8>] ? check_preempt_curr+0x78/0xa0 Sep 13 15:53:47 soak-9 kernel: [<ffffffff810c4810>] ? default_wake_function+0x0/0x20 Sep 13 15:53:47 soak-9 kernel: [<ffffffffc11f1d40>] ? distribute_txn_commit_thread+0x0/0xca0 [ptlrpc] Sep 13 15:53:47 soak-9 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0 Sep 13 15:53:47 soak-9 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0 Sep 13 15:53:47 soak-9 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 Sep 13 15:53:47 soak-9 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0 Sep 13 15:53:47 soak-9 kernel: Sep 13 15:53:47 soak-9 kernel: Kernel panic - not syncing: LBUG Sep 13 15:53:50 soak-9 kernel: CPU: 25 PID: 4719 Comm: dist_txn-1 Tainted: P OE ------------ 3.10.0-693.1.1.el7_lustre.x86_64 #1 Sep 13 15:53:50 soak-9 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Sep 13 15:53:50 soak-9 kernel: ffff880419612f00 000000001ae378cc ffff880418eafb90 ffffffff816a3d6d Sep 13 15:53:50 soak-9 kernel: ffff880418eafc10 ffffffff8169dc54 ffffffff00000008 ffff880418eafc20 Sep 13 15:53:50 soak-9 kernel: ffff880418eafbc0 000000001ae378cc 000000001ae378cc ffff88082da4f8b8 Sep 13 15:53:50 soak-9 kernel: Call Trace: Sep 13 15:53:50 soak-9 kernel: [<ffffffff816a3d6d>] dump_stack+0x19/0x1b Sep 13 15:53:50 soak-9 kernel: [<ffffffff8169dc54>] panic+0xe8/0x20d Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0e66854>] lbug_with_loc+0x64/0xb0 [libcfs] Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f658f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass] Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f5a0e4>] llog_declare_write_rec+0x84/0x200 [obdclass] Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f5a862>] llog_cancel_rec+0xe2/0x870 [obdclass] Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0f60f6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass] Sep 13 15:53:50 soak-9 kernel: [<ffffffffc0e71ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] Sep 13 15:53:50 soak-9 kernel: [<ffffffffc11f22d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc] Sep 13 15:53:50 soak-9 kernel: [<ffffffff810c12c8>] ? check_preempt_curr+0x78/0xa0 Sep 13 15:53:50 soak-9 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20 Sep 13 15:53:50 soak-9 kernel: [<ffffffffc11f1d40>] ? sub_trans_commit_cb+0x20/0x20 [ptlrpc] Sep 13 15:53:50 soak-9 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0 Sep 13 15:53:50 soak-9 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 Sep 13 15:53:50 soak-9 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 Sep 13 15:53:50 soak-9 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 Sep 13 15:53:50 soak-9 kernel: Kernel Offset: disabled Sep 13 15:53:50 soak-9 kernel: ------------[ cut here ]------------
soak-10
Sep 13 15:53:37 soak-10 kernel: LustreError: 11-0: soaked-MDT0000-osp-MDT0002: operation out_update to node 192.168.1.108@o2ib failed: rc = -71 Sep 13 15:53:37 soak-10 kernel: LustreError: 14955:0:(layout.c:2082:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)#012 req@ffff880818bb5400 x1578438986144240/t0(0) o1000->soaked-MDT0000-osp-MDT0002@192.168.1.108@o2ib:24/4 lens 392/192 e 0 to 0 dl 1505318024 ref 2 fl Interpret:RM/0/0 rc -71/-71 Sep 13 15:53:37 soak-10 kernel: LustreError: 14939:0:(out_handler.c:597:out_write()) soaked-MDT0002: empty or wrong size 0 pos: rc = -61 Sep 13 15:53:37 soak-10 kernel: LustreError: 14939:0:(out_handler.c:1000:out_handle()) soaked-MDT0002: invalid update buffer magic 0 expect bdde0002: rc = -71 Sep 13 15:53:44 soak-10 kernel: LustreError: 14961:0:(llog_osd.c:327:llog_osd_declare_write_rec()) ASSERTION( rec ) failed: Sep 13 15:53:44 soak-10 kernel: LustreError: 14961:0:(llog_osd.c:327:llog_osd_declare_write_rec()) LBUG Sep 13 15:53:44 soak-10 kernel: Pid: 14961, comm: dist_txn-2 Sep 13 15:53:44 soak-10 kernel: #012Call Trace: Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0e087ae>] libcfs_call_trace+0x4e/0x60 [libcfs] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0e0883c>] lbug_with_loc+0x4c/0xb0 [libcfs] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f208f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f150e4>] llog_declare_write_rec+0x84/0x200 [obdclass] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f15862>] llog_cancel_rec+0xe2/0x870 [obdclass] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0f1bf6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc0e13ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] Sep 13 15:53:44 soak-10 kernel: [<ffffffffc11fc2d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc] Sep 13 15:53:44 soak-10 kernel: [<ffffffff810c4810>] ? default_wake_function+0x0/0x20 Sep 13 15:53:44 soak-10 kernel: [<ffffffffc11fbd40>] ? distribute_txn_commit_thread+0x0/0xca0 [ptlrpc] Sep 13 15:53:44 soak-10 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0 Sep 13 15:53:44 soak-10 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0 Sep 13 15:53:44 soak-10 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 Sep 13 15:53:44 soak-10 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0 Sep 13 15:53:44 soak-10 kernel: Sep 13 15:53:44 soak-10 kernel: Kernel panic - not syncing: LBUG Sep 13 15:53:45 soak-10 kernel: CPU: 10 PID: 14961 Comm: dist_txn-2 Tainted: P OE ------------ 3.10.0-693.1.1.el7_lustre.x86_64 #1 Sep 13 15:53:45 soak-10 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Sep 13 15:53:45 soak-10 kernel: ffff8803f8e7bf00 000000009b9e8911 ffff8803f8ea7b90 ffffffff816a3d6d Sep 13 15:53:45 soak-10 kernel: ffff8803f8ea7c10 ffffffff8169dc54 ffffffff00000008 ffff8803f8ea7c20 Sep 13 15:53:45 soak-10 kernel: ffff8803f8ea7bc0 000000009b9e8911 000000009b9e8911 ffff88082d88f8b8 Sep 13 15:53:45 soak-10 kernel: Call Trace: Sep 13 15:53:45 soak-10 kernel: [<ffffffff816a3d6d>] dump_stack+0x19/0x1b Sep 13 15:53:45 soak-10 kernel: [<ffffffff8169dc54>] panic+0xe8/0x20d Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0e08854>] lbug_with_loc+0x64/0xb0 [libcfs] Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f208f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass] Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f150e4>] llog_declare_write_rec+0x84/0x200 [obdclass] Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f15862>] llog_cancel_rec+0xe2/0x870 [obdclass] Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0f1bf6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass] Sep 13 15:53:45 soak-10 kernel: [<ffffffffc0e13ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] Sep 13 15:53:45 soak-10 kernel: [<ffffffffc11fc2d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc] Sep 13 15:53:45 soak-10 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20 Sep 13 15:53:45 soak-10 kernel: [<ffffffffc11fbd40>] ? sub_trans_commit_cb+0x20/0x20 [ptlrpc] Sep 13 15:53:45 soak-10 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0 Sep 13 15:53:45 soak-10 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 Sep 13 15:53:45 soak-10 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 Sep 13 15:53:45 soak-10 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 Sep 13 15:53:45 soak-10 kernel: Kernel Offset: disabled Sep 13 15:53:45 soak-10 kernel: ------------[ cut here ]------------
soak-11
Sep 13 15:53:43 soak-11 kernel: LustreError: 4755:0:(llog_osd.c:327:llog_osd_declare_write_rec()) ASSERTION( rec ) failed: Sep 13 15:53:43 soak-11 kernel: LustreError: 4755:0:(llog_osd.c:327:llog_osd_declare_write_rec()) LBUG Sep 13 15:53:43 soak-11 kernel: Pid: 4755, comm: dist_txn-3 Sep 13 15:53:43 soak-11 kernel: #012Call Trace: Sep 13 15:53:43 soak-11 kernel: [<ffffffffc0d6f7ae>] libcfs_call_trace+0x4e/0x60 [libcfs] Sep 13 15:53:43 soak-11 kernel: [<ffffffffc0d6f83c>] lbug_with_loc+0x4c/0xb0 [libcfs] Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e878f5>] llog_osd_declare_write_rec+0x3c5/0x3d0 [obdclass] Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e7c0e4>] llog_declare_write_rec+0x84/0x200 [obdclass] Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e7c862>] llog_cancel_rec+0xe2/0x870 [obdclass] Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0e82f6a>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass] Sep 13 15:53:44 soak-11 kernel: [<ffffffffc0d7aba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] Sep 13 15:53:44 soak-11 kernel: [<ffffffffc11632d9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc] Sep 13 15:53:44 soak-11 kernel: [<ffffffff810c4810>] ? default_wake_function+0x0/0x20 Sep 13 15:53:44 soak-11 kernel: [<ffffffffc1162d40>] ? distribute_txn_commit_thread+0x0/0xca0 [ptlrpc] Sep 13 15:53:44 soak-11 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0 Sep 13 15:53:44 soak-11 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0 Sep 13 15:53:44 soak-11 kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 Sep 13 15:53:44 soak-11 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0 Sep 13 15:53:44 soak-11 kernel: Sep 13 15:53:44 soak-11 kernel: Kernel panic - not syncing: LBUG
Attachments
Issue Links
- is related to
-
LU-10089 kiblnd_fmr_pool_map() Failed to map mr 10/11 elements
- Resolved
-
LU-10068 OST fails to mount:LustreError: 14558:0:(pack_generic.c:588:__lustre_unpack_msg()) message length 0 too small for magic/version check
- Resolved
- is related to
-
LU-9500 MOFED 4/mlx5: Aligning non-aligned page addresses trigger dump_cqe
- Resolved
-
LU-9026 Adapt to the removal of ib_get_dma_mr()
- Resolved
-
LU-9810 Melanox OFED 4.1 support
- Resolved
- mentioned in
-
Page Loading...