[LU-1850] replay-vbr test_1b: FAIL: open succeeded unexpectedly Created: 06/Sep/12 Updated: 12/Aug/16 Resolved: 28/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Peter Jones | Assignee: | Sarah Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4426 |
| Description |
|
https://maloo.whamcloud.com/test_sessions/a693a99c-f682-11e1-8eb0-52540035b04c https://maloo.whamcloud.com/test_logs/04f30584-f686-11e1-8eb0-52540035b04c/show_text [HOST:client-29vm6] [old_quota_type:3] [new_quota_type:ug3] |
| Comments |
| Comment by Peter Jones [ 06/Sep/12 ] |
|
Niu Could you please look at this failure and see if you can work out why this has happened? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 10/Sep/12 ] |
|
The error message is indicating that the /mnt/lustre isn't accessiable (got -ESTALE when trying to open it), it's not related to quota. I think it could probably caused by the insane testing environment after replay-vbr failed. We can see from the maloo report: I think it should be a test script issue (we'd make sure the testing environment is clean after each failed test?) but not a lustre bug. |
| Comment by Peter Jones [ 10/Sep/12 ] |
|
Thanks Niu. Sarah will look at making the necessary test script improvements |
| Comment by Sarah Liu [ 11/Sep/12 ] |
|
Try unmount and remount client, make sure the system is ready for the following tests. |
| Comment by Jian Yu [ 11/Sep/12 ] |
|
The initial issue is that replay-vbr test_1b failed as follows: CMD: client-29vm6 checkstat -v -a /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b Can't lstat /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b: Stale NFS file handle replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly Dmesg on Client client-29vm6: [33466.651998] Lustre: DEBUG MARKER: checkstat -v -a /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b [33466.659487] LustreError: 3642:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 524300 [33466.724390] Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly Dmesg on MDS client-29vm3: LustreError: 8705:0:(llog_lvfs.c:610:llog_lvfs_create()) error looking up logfile 0x60:0x3f5bbfdd: rc -116 LustreError: 8705:0:(llog_cat.c:168:llog_cat_id2handle()) error opening log id 0x60:3f5bbfdd: rc -116 LustreError: 8705:0:(llog_obd.c:314:cat_cancel_cb()) Cannot find handle for log 0x60 LustreError: 8676:0:(llog_obd.c:387:llog_obd_origin_setup()) llog_process() with cat_cancel_cb failed: -116 LustreError: 8676:0:(llog_obd.c:214:llog_setup_named()) obd lustre-OST0003-osc-MDT0000 ctxt 2 lop_setup=ffffffffa03cbdf0 failed -116 LustreError: 8676:0:(osc_request.c:3281:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT LustreError: 8676:0:(osc_request.c:3298:__osc_llog_init()) osc 'lustre-OST0003-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' catid ffff880065dd78d0 rc=-116 LustreError: 8676:0:(osc_request.c:3300:__osc_llog_init()) logid 0x58:0x3f5bbfd2 LustreError: 8676:0:(osc_request.c:3328:osc_llog_init()) rc: -116 LustreError: 8676:0:(lov_log.c:243:lov_llog_init()) error osc_llog_init idx 3 osc 'lustre-OST0003-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' (rc=-116) LustreError: 8676:0:(llog_lvfs.c:610:llog_lvfs_create()) error looking up logfile 0x59:0x3f5bbfd3: rc -116 LustreError: 8676:0:(llog_obd.c:214:llog_setup_named()) obd lustre-OST0004-osc-MDT0000 ctxt 2 lop_setup=ffffffffa03cbdf0 failed -116 LustreError: 8676:0:(osc_request.c:3281:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT LustreError: 8676:0:(osc_request.c:3298:__osc_llog_init()) osc 'lustre-OST0004-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' catid ffff880065dd78d0 rc=-116 LustreError: 8676:0:(osc_request.c:3300:__osc_llog_init()) logid 0x59:0x3f5bbfd3 LustreError: 8676:0:(osc_request.c:3328:osc_llog_init()) rc: -116 LustreError: 8676:0:(lov_log.c:243:lov_llog_init()) error osc_llog_init idx 4 osc 'lustre-OST0004-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' (rc=-116) LustreError: 8706:0:(lov_log.c:155:lov_llog_origin_connect()) error osc_llog_connect tgt 3 (-107) LustreError: 8706:0:(mds_lov.c:873:__mds_lov_synchronize()) lustre-OST0003_UUID failed at llog_origin_connect: -107 LustreError: 8706:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0003_UUID sync failed -107, deactivating LustreError: 8707:0:(lov_log.c:155:lov_llog_origin_connect()) error osc_llog_connect tgt 4 (-107) LustreError: 8707:0:(mds_lov.c:873:__mds_lov_synchronize()) lustre-OST0004_UUID failed at llog_origin_connect: -107 LustreError: 8707:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0004_UUID sync failed -107, deactivating Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly Lustre: DEBUG MARKER: replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly It was really a Lustre issue. We need fix this Lustre issue first. |
| Comment by Niu Yawei (Inactive) [ 11/Sep/12 ] |
|
don't we have ticket for the replay-vbr test_1b failure? If not I think we'd open another ticket for it or change the description/summary of this ticket and fix it here. |
| Comment by Jian Yu [ 12/Sep/12 ] |
I just searched for this failure on Jira and Maloo, and found it did not occur in the past. So I changed the summary of this ticket. For the test script, after the above Lustre failure occurs, I'm not sure whether trying to mount /mnt/lustre again will succeed or not. Li Wei, could you please review the above patch created by Sarah to see whether that will reduce the affection on the subsequent sub-tests/tests after the Lustre failure occurs? Thanks. |
| Comment by Li Wei (Inactive) [ 12/Sep/12 ] |
|
OK, will do the review in a couple of hours. |
| Comment by Niu Yawei (Inactive) [ 12/Sep/12 ] |
|
I highly suspect the replay-vbr test_1b failure is another instance of LVM/VM flush problem, like |
| Comment by Jian Yu [ 15/Sep/12 ] |
|
Here are the historical Maloo reports for replay-vbr test 1b failures: As we can see, the failure in this ticket did not occur regularly. And per the analysis from Niu, this is very likely another instance of LVM/VM flush issue. So, we can lower the priority of this ticket. We need land the test script patch created by Sarah to reduce the affection on the subsequent sub-tests/tests after the failure occurs. |
| Comment by Sarah Liu [ 20/Sep/12 ] |
|
patch for master http://review.whamcloud.com/#change,4063 |
| Comment by Jodi Levi (Inactive) [ 28/Sep/12 ] |
|
Reopening ticket to add Fix Version 2.3 |