[LU-5388] Interop 2.5.2<->2.6 failure on test suite replay-dual test_24: (lu_object.h:855:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed Created: 21/Jul/14 Updated: 25/Aug/14 Resolved: 15/Aug/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.2 |
| Fix Version/s: | Lustre 2.5.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | server: 2.5.2 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 15002 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/512f0146-0d81-11e4-b3f5-5254006e85c2.
The sub-test test_24 failed with the following error:
Not sure, but this may be related to
04:50:42:Lustre: DEBUG MARKER: == replay-dual test 24: reconstruct on non-existing object == 04:50:00 (1405511400)
04:50:42:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x119
04:50:42:Lustre: *** cfs_fail_loc=119, val=2147483648***
04:50:43:LustreError: 1776:0:(ldlm_lib.c:2415:target_send_reply_msg()) @@@ dropping reply req@ffff880079979800 x1473783347046016/t545460846596(0) o36->3d5f7619-adde-ff24-3759-10f38dee94cd@10.2.4.188@tcp:0/0 lens 488/456 e 0 to 0 dl 1405511406 ref 1 fl Interpret:/0/0 rc 0/0
04:50:43:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
04:50:43:Lustre: DEBUG MARKER: lctl set_param -n osd*.*MDT*.force_sync 1
04:50:43:LustreError: 1777:0:(lu_object.h:855:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
04:50:43:LustreError: 1777:0:(lu_object.h:855:lu_object_attr()) LBUG
04:50:43:Pid: 1777, comm: mdt00_001
04:50:43:
04:50:43:Call Trace:
04:50:43: [<ffffffffa0a17895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
04:50:43: [<ffffffffa0a17e97>] lbug_with_loc+0x47/0xb0 [libcfs]
04:50:44: [<ffffffffa041680a>] mdt_attr_get_complex+0x6da/0x770 [mdt]
04:50:44: [<ffffffffa0a22308>] ? libcfs_log_return+0x28/0x40 [libcfs]
04:50:45: [<ffffffffa042fb95>] mdt_reconstruct_setattr+0xd5/0x3f0 [mdt]
04:50:46: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:50:46: [<ffffffffa042f565>] mdt_reconstruct+0x45/0x120 [mdt]
04:50:47: [<ffffffffa040aeab>] mdt_reint_internal+0x6bb/0x780 [mdt]
04:50:47: [<ffffffffa040afb4>] mdt_reint+0x44/0xe0 [mdt]
04:50:47: [<ffffffffa040e58a>] mdt_handle_common+0x52a/0x1470 [mdt]
04:50:47: [<ffffffffa044a755>] mds_regular_handle+0x15/0x20 [mdt]
04:50:47: [<ffffffffa13acbc5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
04:50:47: [<ffffffffa0a293cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
04:50:47: [<ffffffffa13a42a9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
04:50:49: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:50:49: [<ffffffffa13adf2d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
04:50:49: [<ffffffffa13ad440>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
04:50:49: [<ffffffff8109ab56>] kthread+0x96/0xa0
04:50:50: [<ffffffff8100c20a>] child_rip+0xa/0x20
04:50:50: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
04:50:50: [<ffffffff8100c200>] ? child_rip+0x0/0x20
04:50:51:
04:50:51:Kernel panic - not syncing: LBUG
04:50:51:Pid: 1777, comm: mdt00_001 Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1
04:50:52:Call Trace:
04:50:52: [<ffffffff8152795f>] ? panic+0xa7/0x16f
04:50:52: [<ffffffffa0a17eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
04:50:52: [<ffffffffa041680a>] ? mdt_attr_get_complex+0x6da/0x770 [mdt]
04:50:52: [<ffffffffa0a22308>] ? libcfs_log_return+0x28/0x40 [libcfs]
04:50:52: [<ffffffffa042fb95>] ? mdt_reconstruct_setattr+0xd5/0x3f0 [mdt]
04:50:53: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:50:53: [<ffffffffa042f565>] ? mdt_reconstruct+0x45/0x120 [mdt]
04:50:54: [<ffffffffa040aeab>] ? mdt_reint_internal+0x6bb/0x780 [mdt]
04:50:55: [<ffffffffa040afb4>] ? mdt_reint+0x44/0xe0 [mdt]
04:50:56: [<ffffffffa040e58a>] ? mdt_handle_common+0x52a/0x1470 [mdt]
04:50:56: [<ffffffffa044a755>] ? mds_regular_handle+0x15/0x20 [mdt]
04:50:58: [<ffffffffa13acbc5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
04:50:59: [<ffffffffa0a293cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
04:51:00: [<ffffffffa13a42a9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
04:51:00: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:51:00: [<ffffffffa13adf2d>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc]
04:51:02: [<ffffffffa13ad440>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
04:51:02: [<ffffffff8109ab56>] ? kthread+0x96/0xa0
04:51:02: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
04:51:02: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
04:51:02: [<ffffffff8100c200>] ? child_rip+0x0/0x20
04:51:04:Initializing cgroup subsys cpuset |
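The trace shows the reconstruct path (mdt_reconstruct -> mdt_reconstruct_setattr -> mdt_attr_get_complex) reaching lu_object_attr() for an object whose header lacks LOHA_EXISTS, which is what trips the assertion. The following is only a toy model of that failure mode in Python, not Lustre code and not the content of any patch referenced in this ticket. The class ToyObject, the exception LBUG, the flag value, and the guarded variant returning -ENOENT are all assumptions made for illustration.

```python
import errno

LOHA_EXISTS = 1 << 0  # illustrative flag value for this sketch only


class LBUG(Exception):
    """Stands in for the kernel LBUG raised by a failed assertion."""


class ToyObject:
    """Hypothetical stand-in for an object header plus one attribute."""

    def __init__(self, exists: bool, mode: int = 0o644):
        self.loh_attr = LOHA_EXISTS if exists else 0
        self.mode = mode


def lu_object_attr(obj: ToyObject) -> int:
    # Mirrors the shape of the assertion in the log: the caller must
    # guarantee that the object exists before asking for its attributes.
    if (obj.loh_attr & LOHA_EXISTS) == 0:
        raise LBUG("ASSERTION( (loh_attr & LOHA_EXISTS) != 0 ) failed")
    return obj.mode


def reconstruct_setattr_unguarded(obj: ToyObject) -> int:
    # Reads attributes unconditionally, as the failing path appears to do.
    return lu_object_attr(obj)


def reconstruct_setattr_guarded(obj: ToyObject) -> int:
    # One plausible guard (an assumption): check existence first and
    # report an error code instead of asserting.
    if (obj.loh_attr & LOHA_EXISTS) == 0:
        return -errno.ENOENT
    return lu_object_attr(obj)
```

In this model, calling reconstruct_setattr_unguarded(ToyObject(exists=False)) raises LBUG, while the guarded variant returns -ENOENT for the same object and the stored mode for an existing one.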
| Comments |
| Comment by Andreas Dilger [ 22/Jul/14 ] |
|
|
| Comment by Oleg Drokin [ 22/Jul/14 ] |
|
This happens because http://review.whamcloud.com/11025, which contains a fix for this, still needs to be picked into b2_5. The fix may be slightly incomplete, because I still see other failures in this test on master. |
| Comment by Peter Jones [ 15/Aug/14 ] |
|
Should be fixed when 2.5.3 is GA as |
| Comment by Jian Yu [ 20/Aug/14 ] |
|
Lustre client build: https://build.hpdd.intel.com/job/lustre-b2_5/80/
replay-dual test 24 hit the same LBUG:
02:01:30:LustreError: 28062:0:(lu_object.h:867:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
02:01:30:LustreError: 28062:0:(lu_object.h:867:lu_object_attr()) LBUG
02:01:30:Pid: 28062, comm: mdt00_000
02:01:30:
02:01:30:Call Trace:
02:01:30: [<ffffffffa0453895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
02:01:30: [<ffffffffa0453e97>] lbug_with_loc+0x47/0xb0 [libcfs]
02:01:30: [<ffffffffa0e2bf4a>] mdt_attr_get_complex+0x6da/0x770 [mdt]
02:01:30: [<ffffffffa045ed98>] ? libcfs_log_return+0x28/0x40 [libcfs]
02:01:30: [<ffffffffa0e42a45>] mdt_reconstruct_setattr+0xd5/0x3e0 [mdt]
02:01:30: [<ffffffffa04642d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
02:01:30: [<ffffffffa0e42415>] mdt_reconstruct+0x45/0x120 [mdt]
02:01:30: [<ffffffffa0e1dcdb>] mdt_reint_internal+0x6bb/0x780 [mdt]
02:01:30: [<ffffffffa0e1dde4>] mdt_reint+0x44/0xe0 [mdt]
02:01:30: [<ffffffffa0e22a97>] mdt_handle_common+0x647/0x16d0 [mdt]
02:01:30: [<ffffffffa0e5c6e5>] mds_regular_handle+0x15/0x20 [mdt]
02:01:30: [<ffffffffa08fe3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
02:01:30: [<ffffffffa04545de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
02:01:30: [<ffffffffa0465d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
02:01:30: [<ffffffffa08f5719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
02:01:30: [<ffffffffa04642d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
02:01:30: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
02:01:30: [<ffffffffa08ff74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
02:01:30: [<ffffffffa08fec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
02:01:30: [<ffffffff8100c0ca>] child_rip+0xa/0x20
02:01:30: [<ffffffffa08fec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
02:01:30: [<ffffffffa08fec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
02:01:30: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
02:01:30:
02:01:30:Kernel panic - not syncing: LBUG
Maloo report: https://testing.hpdd.intel.com/test_sets/960c99c0-266f-11e4-8ee8-5254006e85c2 |
| Comment by Jian Yu [ 20/Aug/14 ] |
|
I'll upload a patch to add Lustre version check code. |
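A version check of this kind usually means skipping the sub-test when the server is older than the first release carrying the fix. The sketch below illustrates that idea in Python; it is not the patch itself, and the Lustre test suite has its own helpers for this. The function names, the packing of version components into one integer, and the 2.5.3 threshold (taken from the Fix Version field of this ticket) are assumptions for illustration.

```python
def version_code(version: str) -> int:
    """Pack 'major.minor.patch[.fix]' into one comparable integer.

    Hypothetical helper: each component is assumed to fit in 8 bits.
    """
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected version string: {version!r}")
    if any(p < 0 or p > 255 for p in parts):
        raise ValueError(f"component out of range in {version!r}")
    # Pad missing trailing components with zeros, e.g. '2.6' -> 2.6.0.0.
    parts += [0] * (4 - len(parts))
    major, minor, patch, fix = parts
    return (major << 24) | (minor << 16) | (patch << 8) | fix


def should_skip_test_24(server_version: str,
                        first_fixed: str = "2.5.3") -> bool:
    """Return True when the server predates the assumed first fixed release."""
    return version_code(server_version) < version_code(first_fixed)


if __name__ == "__main__":
    for server in ("2.5.2", "2.5.3", "2.6.0"):
        action = "skip" if should_skip_test_24(server) else "run"
        print(f"server {server}: {action} replay-dual test_24")
```

Comparing packed integers rather than strings avoids ordering mistakes such as "2.10.0" sorting before "2.5.3" lexically.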
| Comment by Jian Yu [ 21/Aug/14 ] |
|
Patch for master branch: http://review.whamcloud.com/11262 |