[LU-5982] sanity-lfsck test_29c: (8) unexpected status, test_30: timeout Created: 04/Dec/14 Updated: 06/Dec/14 Resolved: 06/Dec/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 16695 | ||||||||
| Description |
|
This issue was created by maloo for Li Wei <liwei@whamcloud.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5278cc92-7b3d-11e4-8c6d-5254006e85c2. The sub-test test_29c failed with the following error: (8) unexpected status More detailed test output: == sanity-lfsck test 29c: Not verify nlink attr if hark links exceed linkEA limitation == 08:43:50 (1417625030)
#####
There are too much hard links to the object, and exceeds the
echo object's linkEA limitation, as to NOT all the known name entries
will be recorded in the linkEA. Under such case, LFSCK should
skip the nlink verification for this object.
#####
rm: cannot remove `/mnt/lustre/.lustre/lost+found/MDT0001/[0x340000bd1:0xb:0x0]-P-0': Directory not empty
Inject failure stub on MDT0 to simulate the case that
foo's hard links exceed the object's linkEA limitation.
CMD: onyx-32vm3 /usr/sbin/lctl set_param fail_loc=0x1627
fail_loc=0x1627
/mnt/lustre/d29c.sanity-lfsck/d0/foo
/mnt/lustre/d29c.sanity-lfsck/d0/h1
Trigger namespace LFSCK to repair the nlink count
CMD: onyx-32vm3 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
pdsh@onyx-32vm6: onyx-32vm3: mcmd: xpoll (setting up stderr): Interrupted system call
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 32 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 22 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 12 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Waiting 2 secs for update
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
awk '/^status/ { print \$2 }'
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
Update not seen after 32s: wanted 'completed' got ''
CMD: onyx-32vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
onyx-32vm3: error: get_param: mdd/lustre-MDT0000/lfsck_namespace: Found no match
sanity-lfsck test_29c: @@@@@@ FAIL: (8) unexpected status
Info required for matching: sanity-lfsck 29c |
| Comments |
| Comment by James Nunez (Inactive) [ 04/Dec/14 ] |
|
I was testing a recent build of master, #2760, on the OpenSFS cluster and ran into the same test failure. On the MDS console in my testing, I see Message from syslogd@mds01 at Dec 3 16:38:19 ... kernel:LustreError: 29360:0:(osd_handler.c:1571:osd_object_write_unlock()) ASSERTION( obj->oo_owner == env ) failed: Message from syslogd@mds01 at Dec 3 16:38:19 ... kernel:LustreError: 29360:0:(osd_handler.c:1571:osd_object_write_unlock()) LBUG Then the MDS crashed. Looking at your test logs, it looks like the primary MDS crashed: 09:03:40:Lustre: DEBUG MARKER: == sanity-lfsck test 29c: Not verify nlink attr if hark links exceed linkEA limitation == 08:43:50 (1417625030)
09:03:40:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x1627
09:03:40:Lustre: *** cfs_fail_loc=1627, val=0***
09:03:40:Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
09:03:40:LustreError: 28234:0:(osd_handler.c:1571:osd_object_write_unlock()) ASSERTION( obj->oo_owner == env ) failed:
09:03:40:LustreError: 28234:0:(osd_handler.c:1571:osd_object_write_unlock()) LBUG
09:03:40:Pid: 28234, comm: lfsck_namespace
09:03:40:
09:03:40:Call Trace:
09:03:40: [<ffffffffa048e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
09:03:40: [<ffffffffa048ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
09:03:40: [<ffffffffa0d1cb66>] osd_object_write_unlock+0xd6/0x110 [osd_ldiskfs]
09:03:40: [<ffffffffa0e3fcd2>] dt_write_unlock+0x22/0xc0 [lfsck]
09:03:40: [<ffffffffa0e51dfc>] lfsck_namespace_insert_orphan+0xc9c/0x1100 [lfsck]
09:03:40: [<ffffffff8128d756>] ? vsnprintf+0x336/0x5e0
09:03:40: [<ffffffffa0e59d26>] lfsck_namespace_double_scan_dir+0x9d6/0xe40 [lfsck]
09:03:40: [<ffffffffa04a3ee9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
09:03:40: [<ffffffffa0e5a514>] lfsck_namespace_double_scan_one+0x384/0x1310 [lfsck]
09:03:40: [<ffffffffa05fe163>] ? lu_object_find_at+0xb3/0x100 [obdclass]
09:03:40: [<ffffffffa0e5c444>] lfsck_namespace_assistant_handler_p2+0xfa4/0x1140 [lfsck]
09:03:40: [<ffffffffa0e3e1bc>] lfsck_assistant_engine+0x148c/0x1dc0 [lfsck]
09:03:40: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
09:03:40: [<ffffffffa0e3cd30>] ? lfsck_assistant_engine+0x0/0x1dc0 [lfsck]
09:03:41: [<ffffffff8109abf6>] kthread+0x96/0xa0
09:03:41: [<ffffffff8100c20a>] child_rip+0xa/0x20
09:03:41: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
09:03:41: [<ffffffff8100c200>] ? child_rip+0x0/0x20
09:03:41:
09:03:41:Kernel panic - not syncing: LBUG
09:03:41:Pid: 28234, comm: lfsck_namespace Not tainted 2.6.32-431.29.2.el6_lustre.g2856d31.x86_64 #1
09:03:41:Call Trace:
09:03:41: [<ffffffff81528fdc>] ? panic+0xa7/0x16f
09:03:41:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
09:03:41: awk '/^status/ { print $2 }'
09:03:41: [<ffffffffa048eeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
09:03:41: [<ffffffffa0d1cb66>] ? osd_object_write_unlock+0xd6/0x110 [osd_ldiskfs]
09:03:41: [<ffffffffa0e3fcd2>] ? dt_write_unlock+0x22/0xc0 [lfsck]
09:03:41: [<ffffffffa0e51dfc>] ? lfsck_namespace_insert_orphan+0xc9c/0x1100 [lfsck]
09:03:41: [<ffffffff8128d756>] ? vsnprintf+0x336/0x5e0
09:03:41: [<ffffffffa0e59d26>] ? lfsck_namespace_double_scan_dir+0x9d6/0xe40 [lfsck]
09:03:41: [<ffffffffa04a3ee9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
09:03:41: [<ffffffffa0e5a514>] ? lfsck_namespace_double_scan_one+0x384/0x1310 [lfsck]
09:03:41: [<ffffffffa05fe163>] ? lu_object_find_at+0xb3/0x100 [obdclass]
09:03:41: [<ffffffffa0e5c444>] ? lfsck_namespace_assistant_handler_p2+0xfa4/0x1140 [lfsck]
09:03:41: [<ffffffffa0e3e1bc>] ? lfsck_assistant_engine+0x148c/0x1dc0 [lfsck]
09:03:41: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
09:03:41: [<ffffffffa0e3cd30>] ? lfsck_assistant_engine+0x0/0x1dc0 [lfsck]
09:03:41: [<ffffffff8109abf6>] ? kthread+0x96/0xa0
09:03:41: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
09:03:41: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
09:03:41: [<ffffffff8100c200>] ? child_rip+0x0/0x20
|
| Comment by Andreas Dilger [ 04/Dec/14 ] |
|
Hit this recently in https://testing.hpdd.intel.com/test_sets/13c4439a-7b11-11e4-9132-5254006e85c2 as well. Looks like this is a new regression. |
| Comment by Jian Yu [ 04/Dec/14 ] |
|
More instances on master branch: |
| Comment by John Hammond [ 04/Dec/14 ] |
|
https://testing.hpdd.intel.com/test_sets/4ceb1126-7b57-11e4-8c6d-5254006e85c2 |
| Comment by Gerrit Updater [ 05/Dec/14 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/12943 |
| Comment by Andreas Dilger [ 06/Dec/14 ] |
|
This problem was added via http://review.whamcloud.com/11536 " |
| Comment by Gerrit Updater [ 06/Dec/14 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/12943/ |
| Comment by Andreas Dilger [ 06/Dec/14 ] |
|
Patch landed to master. |