[LU-5982] sanity-lfsck test_29c: (8) unexpected status, test_30: timeout Created: 04/Dec/14  Updated: 06/Dec/14  Resolved: 06/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5518 LFSCK 3: recover client visible objec... Resolved
Severity: 3
Rank (Obsolete): 16695

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5278cc92-7b3d-11e4-8c6d-5254006e85c2.

The sub-test test_29c failed with the following error:

(8) unexpected status

More detailed test output:

== sanity-lfsck test 29c: Not verify nlink attr if hark links exceed linkEA limitation == 08:43:50 (1417625030)
#####
There are too much hard links to the object, and exceeds the
	echo object's linkEA limitation, as to NOT all the known name entries
will be recorded in the linkEA. Under such case, LFSCK should
skip the nlink verification for this object.
#####
rm: cannot remove `/mnt/lustre/.lustre/lost+found/MDT0001/[0x340000bd1:0xb:0x0]-P-0': Directory not empty
Inject failure stub on MDT0 to simulate the case that
foo's hard links exceed the object's linkEA limitation.
CMD: onyx-32vm3 /usr/sbin/lctl set_param fail_loc=0x1627
fail_loc=0x1627
/mnt/lustre/d29c.sanity-lfsck/d0/foo
/mnt/lustre/d29c.sanity-lfsck/d0/h1
Trigger namespace LFSCK to repair the nlink count
CMD: onyx-32vm3 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
pdsh@onyx-32vm6: onyx-32vm3: mcmd: xpoll (setting up stderr): Interrupted system call
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 32 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 22 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 12 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 2 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
		awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Update not seen after 32s: wanted 'completed' got ''
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
 sanity-lfsck test_29c: @@@@@@ FAIL: (8) unexpected status 

Info required for matching: sanity-lfsck 29c



 Comments   
Comment by James Nunez (Inactive) [ 04/Dec/14 ]

I was testing a recent build of master, #2760, on the OpenSFS cluster and ran into the same test failure. On the MDS console in my testing, I see

Message from syslogd@mds01 at Dec  3 16:38:19 ...
 kernel:LustreError: 29360:0:(osd_handler.c:1571:osd_object_write_unlock()) ASSERTION( obj->oo_owner == env ) failed: 

Message from syslogd@mds01 at Dec  3 16:38:19 ...
 kernel:LustreError: 29360:0:(osd_handler.c:1571:osd_object_write_unlock()) LBUG

Then the MDS crashed.

Looking at your test logs, it looks like the primary MDS crashed:

09:03:40:Lustre: DEBUG MARKER: == sanity-lfsck test 29c: Not verify nlink attr if hark links exceed linkEA limitation == 08:43:50 (1417625030)
09:03:40:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x1627
09:03:40:Lustre: *** cfs_fail_loc=1627, val=0***
09:03:40:Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
09:03:40:LustreError: 28234:0:(osd_handler.c:1571:osd_object_write_unlock()) ASSERTION( obj->oo_owner == env ) failed: 
09:03:40:LustreError: 28234:0:(osd_handler.c:1571:osd_object_write_unlock()) LBUG
09:03:40:Pid: 28234, comm: lfsck_namespace
09:03:40:
09:03:40:Call Trace:
09:03:40: [<ffffffffa048e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
09:03:40: [<ffffffffa048ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
09:03:40: [<ffffffffa0d1cb66>] osd_object_write_unlock+0xd6/0x110 [osd_ldiskfs]
09:03:40: [<ffffffffa0e3fcd2>] dt_write_unlock+0x22/0xc0 [lfsck]
09:03:40: [<ffffffffa0e51dfc>] lfsck_namespace_insert_orphan+0xc9c/0x1100 [lfsck]
09:03:40: [<ffffffff8128d756>] ? vsnprintf+0x336/0x5e0
09:03:40: [<ffffffffa0e59d26>] lfsck_namespace_double_scan_dir+0x9d6/0xe40 [lfsck]
09:03:40: [<ffffffffa04a3ee9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
09:03:40: [<ffffffffa0e5a514>] lfsck_namespace_double_scan_one+0x384/0x1310 [lfsck]
09:03:40: [<ffffffffa05fe163>] ? lu_object_find_at+0xb3/0x100 [obdclass]
09:03:40: [<ffffffffa0e5c444>] lfsck_namespace_assistant_handler_p2+0xfa4/0x1140 [lfsck]
09:03:40: [<ffffffffa0e3e1bc>] lfsck_assistant_engine+0x148c/0x1dc0 [lfsck]
09:03:40: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
09:03:40: [<ffffffffa0e3cd30>] ? lfsck_assistant_engine+0x0/0x1dc0 [lfsck]
09:03:41: [<ffffffff8109abf6>] kthread+0x96/0xa0
09:03:41: [<ffffffff8100c20a>] child_rip+0xa/0x20
09:03:41: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
09:03:41: [<ffffffff8100c200>] ? child_rip+0x0/0x20
09:03:41:
09:03:41:Kernel panic - not syncing: LBUG
09:03:41:Pid: 28234, comm: lfsck_namespace Not tainted 2.6.32-431.29.2.el6_lustre.g2856d31.x86_64 #1
09:03:41:Call Trace:
09:03:41: [<ffffffff81528fdc>] ? panic+0xa7/0x16f
09:03:41:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_namespace |
09:03:41:		awk '/^status/ { print $2 }'
09:03:41: [<ffffffffa048eeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
09:03:41: [<ffffffffa0d1cb66>] ? osd_object_write_unlock+0xd6/0x110 [osd_ldiskfs]
09:03:41: [<ffffffffa0e3fcd2>] ? dt_write_unlock+0x22/0xc0 [lfsck]
09:03:41: [<ffffffffa0e51dfc>] ? lfsck_namespace_insert_orphan+0xc9c/0x1100 [lfsck]
09:03:41: [<ffffffff8128d756>] ? vsnprintf+0x336/0x5e0
09:03:41: [<ffffffffa0e59d26>] ? lfsck_namespace_double_scan_dir+0x9d6/0xe40 [lfsck]
09:03:41: [<ffffffffa04a3ee9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
09:03:41: [<ffffffffa0e5a514>] ? lfsck_namespace_double_scan_one+0x384/0x1310 [lfsck]
09:03:41: [<ffffffffa05fe163>] ? lu_object_find_at+0xb3/0x100 [obdclass]
09:03:41: [<ffffffffa0e5c444>] ? lfsck_namespace_assistant_handler_p2+0xfa4/0x1140 [lfsck]
09:03:41: [<ffffffffa0e3e1bc>] ? lfsck_assistant_engine+0x148c/0x1dc0 [lfsck]
09:03:41: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
09:03:41: [<ffffffffa0e3cd30>] ? lfsck_assistant_engine+0x0/0x1dc0 [lfsck]
09:03:41: [<ffffffff8109abf6>] ? kthread+0x96/0xa0
09:03:41: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
09:03:41: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
09:03:41: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Comment by Andreas Dilger [ 04/Dec/14 ]

Hit this recently in https://testing.hpdd.intel.com/test_sets/13c4439a-7b11-11e4-9132-5254006e85c2 as well. Looks like this is a new regression.

Comment by Jian Yu [ 04/Dec/14 ]

More instances on master branch:
https://testing.hpdd.intel.com/test_sets/8015e966-7b72-11e4-813e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/c643440e-7bd8-11e4-813e-5254006e85c2

Comment by John Hammond [ 04/Dec/14 ]

https://testing.hpdd.intel.com/test_sets/4ceb1126-7b57-11e4-8c6d-5254006e85c2

Comment by Gerrit Updater [ 05/Dec/14 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/12943
Subject: LU-5982 lfsck: not unlock the object repeatedly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5d9c9240c598eb0bed054679440acb555f3bff1f

Comment by Andreas Dilger [ 06/Dec/14 ]

This problem was added via http://review.whamcloud.com/11536 "LU-5518 lfsck: recover orphans from backend lost+found"

Comment by Gerrit Updater [ 06/Dec/14 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/12943/
Subject: LU-5982 lfsck: not unlock the object repeatedly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8ee9c59a3688e5026ebb914190b63970135ce2e5

Comment by Andreas Dilger [ 06/Dec/14 ]

Patch landed to master.

Generated at Sat Feb 10 01:56:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.