[LU-5388] Interop 2.5.2<->2.6 failure on test suite replay-dual test_24: (lu_object.h:855:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed Created: 21/Jul/14  Updated: 25/Aug/14  Resolved: 15/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.2
Fix Version/s: Lustre 2.5.3

Type: Bug Priority: Critical
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

server: 2.5.2
client: lustre-b2_6-rc2 ldiskfs


Issue Links:
Related
is related to LU-5163 (lu_object.h:852:lu_object_attr()) AS... Resolved
is related to LU-5285 mdt_reconstruct_setattr() calls mdt_a... Resolved
is related to LU-5339 replay-dual defines test_24() twice Resolved
Severity: 3
Rank (Obsolete): 15002

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/512f0146-0d81-11e4-b3f5-5254006e85c2.

The sub-test test_24 failed with the following error:

test failed to respond and timed out

Not sure, but this may be related to LU-5163.
MDS console:

04:50:42:Lustre: DEBUG MARKER: == replay-dual test 24: reconstruct on non-existing object == 04:50:00 (1405511400)
04:50:42:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x119
04:50:42:Lustre: *** cfs_fail_loc=119, val=2147483648***
04:50:43:LustreError: 1776:0:(ldlm_lib.c:2415:target_send_reply_msg()) @@@ dropping reply  req@ffff880079979800 x1473783347046016/t545460846596(0) o36->3d5f7619-adde-ff24-3759-10f38dee94cd@10.2.4.188@tcp:0/0 lens 488/456 e 0 to 0 dl 1405511406 ref 1 fl Interpret:/0/0 rc 0/0
04:50:43:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
04:50:43:Lustre: DEBUG MARKER: lctl set_param -n osd*.*MDT*.force_sync 1
04:50:43:LustreError: 1777:0:(lu_object.h:855:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed: 
04:50:43:LustreError: 1777:0:(lu_object.h:855:lu_object_attr()) LBUG
04:50:43:Pid: 1777, comm: mdt00_001
04:50:43:
04:50:43:Call Trace:
04:50:43: [<ffffffffa0a17895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
04:50:43: [<ffffffffa0a17e97>] lbug_with_loc+0x47/0xb0 [libcfs]
04:50:44: [<ffffffffa041680a>] mdt_attr_get_complex+0x6da/0x770 [mdt]
04:50:44: [<ffffffffa0a22308>] ? libcfs_log_return+0x28/0x40 [libcfs]
04:50:45: [<ffffffffa042fb95>] mdt_reconstruct_setattr+0xd5/0x3f0 [mdt]
04:50:46: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:50:46: [<ffffffffa042f565>] mdt_reconstruct+0x45/0x120 [mdt]
04:50:47: [<ffffffffa040aeab>] mdt_reint_internal+0x6bb/0x780 [mdt]
04:50:47: [<ffffffffa040afb4>] mdt_reint+0x44/0xe0 [mdt]
04:50:47: [<ffffffffa040e58a>] mdt_handle_common+0x52a/0x1470 [mdt]
04:50:47: [<ffffffffa044a755>] mds_regular_handle+0x15/0x20 [mdt]
04:50:47: [<ffffffffa13acbc5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
04:50:47: [<ffffffffa0a293cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
04:50:47: [<ffffffffa13a42a9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
04:50:49: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:50:49: [<ffffffffa13adf2d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
04:50:49: [<ffffffffa13ad440>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
04:50:49: [<ffffffff8109ab56>] kthread+0x96/0xa0
04:50:50: [<ffffffff8100c20a>] child_rip+0xa/0x20
04:50:50: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
04:50:50: [<ffffffff8100c200>] ? child_rip+0x0/0x20
04:50:51:
04:50:51:Kernel panic - not syncing: LBUG
04:50:51:Pid: 1777, comm: mdt00_001 Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1
04:50:52:Call Trace:
04:50:52: [<ffffffff8152795f>] ? panic+0xa7/0x16f
04:50:52: [<ffffffffa0a17eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
04:50:52: [<ffffffffa041680a>] ? mdt_attr_get_complex+0x6da/0x770 [mdt]
04:50:52: [<ffffffffa0a22308>] ? libcfs_log_return+0x28/0x40 [libcfs]
04:50:52: [<ffffffffa042fb95>] ? mdt_reconstruct_setattr+0xd5/0x3f0 [mdt]
04:50:53: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:50:53: [<ffffffffa042f565>] ? mdt_reconstruct+0x45/0x120 [mdt]
04:50:54: [<ffffffffa040aeab>] ? mdt_reint_internal+0x6bb/0x780 [mdt]
04:50:55: [<ffffffffa040afb4>] ? mdt_reint+0x44/0xe0 [mdt]
04:50:56: [<ffffffffa040e58a>] ? mdt_handle_common+0x52a/0x1470 [mdt]
04:50:56: [<ffffffffa044a755>] ? mds_regular_handle+0x15/0x20 [mdt]
04:50:58: [<ffffffffa13acbc5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
04:50:59: [<ffffffffa0a293cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
04:51:00: [<ffffffffa13a42a9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
04:51:00: [<ffffffffa0a27901>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
04:51:00: [<ffffffffa13adf2d>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc]
04:51:02: [<ffffffffa13ad440>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
04:51:02: [<ffffffff8109ab56>] ? kthread+0x96/0xa0
04:51:02: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
04:51:02: [<ffffffff8109aac0>] ? kthread+0x0/0xa0
04:51:02: [<ffffffff8100c200>] ? child_rip+0x0/0x20
04:51:04:Initializing cgroup subsys cpuset


 Comments   
Comment by Andreas Dilger [ 22/Jul/14 ]

LU-5339 is tracking a fix to remove the duplicate test_24 definition; that patch should also skip the new test_24 when the MDS is 2.5.2 or older.

Comment by Oleg Drokin [ 22/Jul/14 ]

This happens because b2_5 lacks the fix from http://review.whamcloud.com/11025, which we need to cherry-pick to b2_5. The fix might be slightly incomplete, since I still see other failures in this test on master.

Comment by Peter Jones [ 15/Aug/14 ]

This should be fixed once 2.5.3 is GA, since the LU-5285 fix has landed on b2_5.

Comment by Jian Yu [ 20/Aug/14 ]

Lustre client build: https://build.hpdd.intel.com/job/lustre-b2_5/80/
Lustre server build: https://build.hpdd.intel.com/job/lustre-b2_4/73/ (2.4.3)
Distro/Arch: RHEL6.5/x86_64

replay-dual test 24 hit the same LBUG:

02:01:30:LustreError: 28062:0:(lu_object.h:867:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed: 
02:01:30:LustreError: 28062:0:(lu_object.h:867:lu_object_attr()) LBUG
02:01:30:Pid: 28062, comm: mdt00_000
02:01:30:
02:01:30:Call Trace:
02:01:30: [<ffffffffa0453895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
02:01:30: [<ffffffffa0453e97>] lbug_with_loc+0x47/0xb0 [libcfs]
02:01:30: [<ffffffffa0e2bf4a>] mdt_attr_get_complex+0x6da/0x770 [mdt]
02:01:30: [<ffffffffa045ed98>] ? libcfs_log_return+0x28/0x40 [libcfs]
02:01:30: [<ffffffffa0e42a45>] mdt_reconstruct_setattr+0xd5/0x3e0 [mdt]
02:01:30: [<ffffffffa04642d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
02:01:30: [<ffffffffa0e42415>] mdt_reconstruct+0x45/0x120 [mdt]
02:01:30: [<ffffffffa0e1dcdb>] mdt_reint_internal+0x6bb/0x780 [mdt]
02:01:30: [<ffffffffa0e1dde4>] mdt_reint+0x44/0xe0 [mdt]
02:01:30: [<ffffffffa0e22a97>] mdt_handle_common+0x647/0x16d0 [mdt]
02:01:30: [<ffffffffa0e5c6e5>] mds_regular_handle+0x15/0x20 [mdt]
02:01:30: [<ffffffffa08fe3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
02:01:30: [<ffffffffa04545de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
02:01:30: [<ffffffffa0465d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
02:01:30: [<ffffffffa08f5719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
02:01:30: [<ffffffffa04642d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
02:01:30: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
02:01:30: [<ffffffffa08ff74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
02:01:30: [<ffffffffa08fec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
02:01:30: [<ffffffff8100c0ca>] child_rip+0xa/0x20
02:01:30: [<ffffffffa08fec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
02:01:30: [<ffffffffa08fec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
02:01:30: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
02:01:30:
02:01:30:Kernel panic - not syncing: LBUG

Maloo report: https://testing.hpdd.intel.com/test_sets/960c99c0-266f-11e4-8ee8-5254006e85c2

Comment by Jian Yu [ 20/Aug/14 ]

I'll upload a patch to add Lustre version check code.
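
For illustration only, a version check of this kind could look roughly like the sketch below. It is not the code from the patches referenced in this ticket: the helper names version_to_code and should_skip_test_24 and the MDS_VERSION variable are hypothetical and not taken from the Lustre test framework. The 2.5.3 threshold is assumed from the earlier comment that test_24 should not run when the MDS is 2.5.2 or older.

```shell
#!/bin/bash
# Hypothetical helper: pack "major.minor.patch" into one comparable integer.
# Missing components are treated as 0.
version_to_code() {
    local major minor patch
    IFS=. read -r major minor patch _ <<< "$1"
    echo $(( (${major:-0} << 16) | (${minor:-0} << 8) | ${patch:-0} ))
}

# Hypothetical guard: returns 0 (true) when test_24 should be skipped,
# i.e. when the MDS version is older than the minimum (assumed 2.5.3).
should_skip_test_24() {
    local mds_version=$1
    local min_version=${2:-2.5.3}
    [ "$(version_to_code "$mds_version")" -lt "$(version_to_code "$min_version")" ]
}

# MDS_VERSION is an assumed variable; a real test would query the server.
if should_skip_test_24 "${MDS_VERSION:-2.5.2}"; then
    echo "SKIP: test_24 needs MDS version at least 2.5.3"
fi
```

With a guard like this, both server versions reported in this ticket (2.5.2 and 2.4.3) would skip the test instead of hitting the LBUG, while a 2.5.3 or newer MDS would still run it.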

Comment by Jian Yu [ 21/Aug/14 ]

Patch for master branch: http://review.whamcloud.com/11262
Patch for b2_5 branch: http://review.whamcloud.com/11539
