[LU-15553] replay-vbr test 12a fails with 'test_12a failed with 4'

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.16.0, Lustre 2.15.0, Lustre 2.15.4, Lustre 2.15.5, Lustre 2.15.6
    • 3

    Description

      replay-vbr test_12a started failing with 'test_12a failed with 4' on August 4, 2021, for Lustre 2.14.53.7; logs are at https://testing.whamcloud.com/test_sets/17efe0ba-7e4a-4e7f-b7f5-02383e1314c5. We have seen this test fail with both ZFS and ldiskfs but, so far, only in DNE configurations.

      Looking at a recent failure at https://testing.whamcloud.com/test_sets/014ce4c3-c654-47f9-9333-1c58ebf545c3, the suite_log shows:

      CMD: onyx-24vm7 e2label /dev/mapper/mds1_flakey 2>/dev/null
      Started lustre-MDT0000
      CMD: onyx-55vm7.onyx.whamcloud.com unlinkmany /mnt/lustre/f12a.replay-vbr- 25
       - unlinked 0 (time 1643080125 ; total 0 ; last 0)
      total: 25 unlinks in 0 seconds: inf unlinks/second
      CMD: onyx-55vm7.onyx.whamcloud.com unlinkmany /mnt/lustre/f12a.replay-vbr-3- 25
       - unlinked 0 (time 1643080125 ; total 0 ; last 0)
      total: 25 unlinks in 0 seconds: inf unlinks/second
      CMD: onyx-55vm7.onyx.whamcloud.com checkstat -v /mnt/lustre/d12a.replay-vbr/f12a.replay-vbr
       replay-vbr test_12a: @@@@@@ FAIL: test_12a failed with 4 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6391:error()
        = /usr/lib64/lustre/tests/test-framework.sh:6695:run_one()
      

      Looking at the code for this test,

          # All 50 files should have been replayed
          do_node $CLIENT1 unlinkmany $DIR/$tfile- 25 || return 2
          do_node $CLIENT1 unlinkmany $DIR/$tfile-3- 25 || return 3
          do_node $CLIENT1 $CHECKSTAT $DIR/$tdir/$tfile && return 4

          return 0
      }
      run_test 12a "lost data due to missed REMOTE client during replay"
      

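      The call to checkstat is what produces this error: because of the `&& return 4`, the test returns 4 only when checkstat succeeds, that is, when $DIR/$tdir/$tfile can be stat'ed although the test expects it to be absent.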
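
      The `cmd && return N` idiom used in the last check can be reproduced outside the test framework. The following is a minimal sketch, not the Lustre test itself: `fake_checkstat` and `check_absent` are hypothetical stand-ins (a plain `[ -e ]` existence test replaces $CHECKSTAT, and no do_node or remote client is involved). It only shows that the return code 4 is reached when the stat succeeds.

```shell
#!/bin/bash
# Hypothetical stand-in for $CHECKSTAT: exits 0 only if the path exists.
fake_checkstat() {
    [ -e "$1" ]
}

# Mirrors the shape of the final check in test_12a: the path is expected
# to be absent, so a *successful* stat is mapped to return code 4.
check_absent() {
    fake_checkstat "$1" && return 4
    return 0
}

tmpdir=$(mktemp -d)
touch "$tmpdir/present"

check_absent "$tmpdir/missing"; rc_missing=$?
check_absent "$tmpdir/present"; rc_present=$?

echo "missing file: rc=$rc_missing"   # prints "missing file: rc=0"
echo "present file: rc=$rc_present"   # prints "present file: rc=4"

rm -rf "$tmpdir"
```

      Read this way, 'test_12a failed with 4' together with the two clean unlinkmany runs in the suite_log indicates that both unlink steps passed and only the final absence check tripped.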

            People

              laisiyao Lai Siyao
              jamesanunez James Nunez (Inactive)