Lustre / LU-1850

replay-vbr test_1b: FAIL: open succeeded unexpectedly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Affects Version/s: Lustre 2.3.0
    • None
    • 3
    • 4426

    Description

      https://maloo.whamcloud.com/test_sessions/a693a99c-f682-11e1-8eb0-52540035b04c

      https://maloo.whamcloud.com/test_logs/04f30584-f686-11e1-8eb0-52540035b04c/show_text

      [HOST:client-29vm6] [old_quota_type:3] [new_quota_type:ug3]
      open /mnt/lustre failed: Stale NFS file handle (116)
      quotacheck failed: Stale NFS file handle
      metadata-updates : @@@@@@ FAIL: quotacheck has failed
       Trace dump:
       = /usr/lib64/lustre/tests/test-framework.sh:3638:error_noexit()
       = /usr/lib64/lustre/tests/test-framework.sh:3660:error()
       = /usr/lib64/lustre/tests/test-framework.sh:989:quota_save_version()
       = /u
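
      The "Stale NFS file handle (116)" reported above can be decoded programmatically; a minimal check (assuming Linux errno values, where 116 is ESTALE):

```python
import errno
import os

# "Stale NFS file handle (116)" on the client and "rc -116" on the MDS are
# the same error: ESTALE, errno 116 on Linux, returned when a file handle
# no longer refers to a valid object on the server.
assert errno.ESTALE == 116
print(os.strerror(errno.ESTALE))  # e.g. "Stale file handle"
```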

      Attachments

        Activity


          jlevi Jodi Levi (Inactive) added a comment - Reopening ticket to add Fix Version 2.3
          sarah Sarah Liu added a comment - Patch for master: http://review.whamcloud.com/#change,4063
          yujian Jian Yu added a comment -

          Here are the historical Maloo reports for replay-vbr test 1b failures:
          http://tinyurl.com/8tybkum

          As we can see, the failure in this ticket did not occur regularly. Per the analysis from Niu, it is very likely another instance of the LVM/VM flush issue, so we can lower the priority of this ticket.

          We need to land the test script patch created by Sarah to reduce the impact on the subsequent sub-tests/tests after the failure occurs.


          niu Niu Yawei (Inactive) added a comment - I highly suspect the replay-vbr test_1b failure is another instance of the LVM/VM flush problem, like LU-1012. From the log, it looks like the llog object creation isn't committed before replay-barrier sets the fs read-only.
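
          Niu's hypothesis can be illustrated with a toy model (plain Python with invented names, not Lustre code): if the flush that the replay barrier relies on is silently dropped by the LVM/VM layer, a newly created object never reaches stable storage, so it vanishes when the target restarts for replay and later lookups fail with -ESTALE.

```python
# Toy model of the suspected failure mode: updates not flushed to stable
# storage before the replay barrier freezes the target are lost when the
# target restarts for replay.
class ToyTarget:
    def __init__(self):
        self.stable = set()     # what is actually on disk
        self.volatile = set()   # dirty data, not yet committed
        self.readonly = False

    def create(self, obj):
        assert not self.readonly
        self.volatile.add(obj)

    def sync(self):
        self.stable |= self.volatile
        self.volatile.clear()

    def replay_barrier(self, flushed):
        if flushed:
            self.sync()         # what the barrier's sync is meant to achieve
        self.readonly = True

    def restart(self):
        self.volatile.clear()   # uncommitted state is gone after restart
        self.readonly = False

# If the LVM/VM layer drops the flush, the llog object never reaches disk:
t = ToyTarget()
t.create("llog-object")
t.replay_barrier(flushed=False)   # sync silently ineffective
t.restart()
print("llog-object" in t.stable)  # False -> later lookups fail with -ESTALE
```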

          liwei Li Wei (Inactive) added a comment - OK, will do the review in a couple of hours.
          yujian Jian Yu added a comment -

          Don't we have a ticket for the replay-vbr test_1b failure? If not, I think we should open another ticket for it or change the description/summary of this ticket and fix it here.

          I just searched for this failure on Jira and Maloo and found that it had not occurred in the past, so I changed the summary of this ticket.

          For the test script, after the above Lustre failure occurs, I'm not sure whether trying to mount /mnt/lustre again will succeed or not. Li Wei, could you please review the above patch created by Sarah to see whether it will reduce the impact on the subsequent sub-tests/tests after the Lustre failure occurs? Thanks.


          niu Niu Yawei (Inactive) added a comment - Don't we have a ticket for the replay-vbr test_1b failure? If not, I think we should open another ticket for it or change the description/summary of this ticket and fix it here.
          yujian Jian Yu added a comment - edited

          The initial issue is that replay-vbr test_1b failed as follows:

          CMD: client-29vm6 checkstat -v -a /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b
          Can't lstat /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b: Stale NFS file handle
           replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly 
          

          Dmesg on Client client-29vm6:

          [33466.651998] Lustre: DEBUG MARKER: checkstat -v -a /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b
          [33466.659487] LustreError: 3642:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 524300
          [33466.724390] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly 
          

          Dmesg on MDS client-29vm3:

          LustreError: 8705:0:(llog_lvfs.c:610:llog_lvfs_create()) error looking up logfile 0x60:0x3f5bbfdd: rc -116
          LustreError: 8705:0:(llog_cat.c:168:llog_cat_id2handle()) error opening log id 0x60:3f5bbfdd: rc -116
          LustreError: 8705:0:(llog_obd.c:314:cat_cancel_cb()) Cannot find handle for log 0x60
          LustreError: 8676:0:(llog_obd.c:387:llog_obd_origin_setup()) llog_process() with cat_cancel_cb failed: -116
          LustreError: 8676:0:(llog_obd.c:214:llog_setup_named()) obd lustre-OST0003-osc-MDT0000 ctxt 2 lop_setup=ffffffffa03cbdf0 failed -116
          LustreError: 8676:0:(osc_request.c:3281:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
          LustreError: 8676:0:(osc_request.c:3298:__osc_llog_init()) osc 'lustre-OST0003-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' catid ffff880065dd78d0 rc=-116
          LustreError: 8676:0:(osc_request.c:3300:__osc_llog_init()) logid 0x58:0x3f5bbfd2
          LustreError: 8676:0:(osc_request.c:3328:osc_llog_init()) rc: -116
          LustreError: 8676:0:(lov_log.c:243:lov_llog_init()) error osc_llog_init idx 3 osc 'lustre-OST0003-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' (rc=-116)
          LustreError: 8676:0:(llog_lvfs.c:610:llog_lvfs_create()) error looking up logfile 0x59:0x3f5bbfd3: rc -116
          LustreError: 8676:0:(llog_obd.c:214:llog_setup_named()) obd lustre-OST0004-osc-MDT0000 ctxt 2 lop_setup=ffffffffa03cbdf0 failed -116
          LustreError: 8676:0:(osc_request.c:3281:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
          LustreError: 8676:0:(osc_request.c:3298:__osc_llog_init()) osc 'lustre-OST0004-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' catid ffff880065dd78d0 rc=-116
          LustreError: 8676:0:(osc_request.c:3300:__osc_llog_init()) logid 0x59:0x3f5bbfd3
          LustreError: 8676:0:(osc_request.c:3328:osc_llog_init()) rc: -116
          LustreError: 8676:0:(lov_log.c:243:lov_llog_init()) error osc_llog_init idx 4 osc 'lustre-OST0004-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' (rc=-116)
          LustreError: 8706:0:(lov_log.c:155:lov_llog_origin_connect()) error osc_llog_connect tgt 3 (-107)
          LustreError: 8706:0:(mds_lov.c:873:__mds_lov_synchronize()) lustre-OST0003_UUID failed at llog_origin_connect: -107
          LustreError: 8706:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0003_UUID sync failed -107, deactivating
          LustreError: 8707:0:(lov_log.c:155:lov_llog_origin_connect()) error osc_llog_connect tgt 4 (-107)
          LustreError: 8707:0:(mds_lov.c:873:__mds_lov_synchronize()) lustre-OST0004_UUID failed at llog_origin_connect: -107
          LustreError: 8707:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0004_UUID sync failed -107, deactivating
          Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
          Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null
          Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly 
          Lustre: DEBUG MARKER: replay-vbr test_1b: @@@@@@ FAIL: open succeeded unexpectedly
          

          It was really a Lustre issue. We need to fix this Lustre issue first.

          sarah Sarah Liu added a comment - edited

          Try unmounting and remounting the client to make sure the system is ready for the following tests.
          The patch for b2_3 is here: http://review.whamcloud.com/#change,3929
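
          The workaround's idea can be sketched as follows (a hypothetical Python dry-run for illustration only; the actual patch is shell code in the test scripts, and the device and mountpoint names here are placeholders): force-unmount the stale client mount and mount it again before the next sub-test runs.

```python
import subprocess

def remount_client(mountpoint="/mnt/lustre", device="mgs@tcp:/lustre",
                   dry_run=True):
    """Force-unmount a stale client mount and mount it again (sketch)."""
    cmds = [
        ["umount", "-f", mountpoint],                   # drop the stale mount
        ["mount", "-t", "lustre", device, mountpoint],  # fresh mount for the next test
    ]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))                        # show what would run
        else:
            subprocess.run(cmd, check=True)

remount_client()  # dry_run=True only prints the commands
```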

          pjones Peter Jones added a comment -

          Thanks Niu. Sarah will look at making the necessary test script improvements.


          People

            sarah Sarah Liu
            pjones Peter Jones
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: