[LU-2269] Test failure on test suite recovery-small, subtest test_50: writemany returned rc 2
Created: 03/Nov/12 Updated: 19/Apr/13 Resolved: 19/Nov/12
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Nathaniel Clark |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | NFBlocker |
| Environment: | lustre master build #1011, SLES11 SP2 client |
| Severity: | 3 |
| Rank (Obsolete): | 5428 |
| Description |
This issue was created by maloo for sarah <sarah@whamcloud.com>.

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/163db52a-253c-11e2-9e7c-52540035b04c.

The sub-test test_50 failed with the following error:

ost console log:
01:34:03:Lustre: DEBUG MARKER: == recovery-small test 50: failover MDS under load =================================================== 01:34:02 (1351845242)
01:36:25:LustreError: 20350:0:(qsd_reint.c:58:qsd_reint_completion()) lustre-OST0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x20000:0x0], rc:-5
01:36:25:LustreError: 20350:0:(qsd_reint.c:58:qsd_reint_completion()) Skipped 13 previous similar messages
01:38:16:Lustre: DEBUG MARKER: /usr/sbin/lctl mark recovery-small test_50: @@@@@@ IGNORE \(bz13652\): writemany returned rc 2
01:38:16:Lustre: DEBUG MARKER: recovery-small test_50: @@@@@@ IGNORE (bz13652): writemany returned rc 2
01:38:16:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2012-11-01/lustre-master-el6-x86_64-sles11sp2-x86_64__1011__-69983126486880-163552/recovery-small.test_50.debug_log.$(hostname -s).1351845491.log;
01:38:16: dmesg > /logdir/test_logs/2012-11-01/lustre-master-el6-x86_64
01:38:16:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
01:38:16:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] && echo $(< /proc/sys/lnet/catastrophe) || echo 0);
01:38:16: if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
01:38:16: exit $rc;
01:38:17:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 51: failover MDS during recovery ============================================== 01:38:16 \(1351845496\)
| Comments |
| Comment by Sarah Liu [ 05/Nov/12 ] |
Another failure: https://maloo.whamcloud.com/test_sets/4fc66cd2-2731-11e2-b04c-52540035b04c
| Comment by Nathaniel Clark [ 13/Nov/12 ] |
A line like the following appears in the test log (dmesg on the OSS) whenever test 50 or test 51 fails:
Lustre: 18518:0:(ofd_obd.c:1069:ofd_orphans_destroy()) lustre-OST0000: deleting orphan objects from 5376 to 5419
| Comment by Andreas Dilger [ 16/Nov/12 ] |
Nathaniel, the "deleting orphans" message itself is normal during recovery. It indicates that the MDS asked the OST to delete objects that the OST had precreated, because the MDS no longer needed them after it crashed and they might contain garbage data. Li Wei, could this be related to the "deleting too many orphans" problem? I haven't looked into the logs closely enough to determine the root cause.
| Comment by Li Wei (Inactive) [ 16/Nov/12 ] |
Andreas, it is not obvious from the console logs, but I'll download the debug logs and take a look.
| Comment by Li Wei (Inactive) [ 18/Nov/12 ] |
I didn't find much in the debug logs either. However, there were lines like the following, with quota error messages above them:
LustreError: 19991:0:(osp_precreate.c:274:osp_precreate_send()) lustre-OST0000-osc-MDT0000: can't precreate: rc = -5
LustreError: 19991:0:(osp_precreate.c:609:osp_precreate_thread()) lustre-OST0000-osc-MDT0000: cannot precreate objects: rc = -5
LustreError: 19994:0:(osp_precreate.c:274:osp_precreate_send()) lustre-OST0001-osc-MDT0000: can't precreate: rc = -5
LustreError: 19994:0:(osp_precreate.c:609:osp_precreate_thread()) lustre-OST0001-osc-MDT0000: cannot precreate objects: rc = -5
[...]
LustreError: 20557:0:(osp_precreate.c:563:osp_precreate_thread()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
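The rc values in these console messages are negated Linux errno codes, following the kernel convention of returning -errno on failure. As a minimal sketch, assuming only Python's standard errno, os, and re modules, the hypothetical helper decode_rc below extracts the first "rc = N" or "rc:N" value from a console line and maps it to its symbolic errno name. It is an illustration for reading these logs, not part of the Lustre test framework or lctl tooling.

```python
import errno
import os
import re

# Hypothetical pattern: Lustre console lines report failures as "rc = -5" or "rc:-5".
# Plain "rc 2" (as in "writemany returned rc 2") is deliberately not matched.
RC_PATTERN = re.compile(r"rc\s*[=:]\s*(-?\d+)")


def decode_rc(line):
    """Return (errno name, description) for the first rc value in line, or None."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # rc is -errno, so drop the sign
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


sample = ("LustreError: 19991:0:(osp_precreate.c:274:osp_precreate_send()) "
          "lustre-OST0000-osc-MDT0000: can't precreate: rc = -5")
print(decode_rc(sample))
```

On a typical Linux system this prints ('EIO', 'Input/output error'). The same -5 (EIO) appears in the OST quota message "failed to enqueue global quota lock, ... rc:-5" from the original console log.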
| Comment by Nathaniel Clark [ 19/Nov/12 ] |
The fix in |