Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15551

Interop: Various tests fail with fallocate “Protocol error” errors

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.15.0
    • Lustre 2.15.0
    • 3
    • 9223372036854775807

    Description

      Several tests are failing with the a variety error messages, but they all have the message “do_preallocate: fallocate: Protocol error” . These failures started September 11, 2021.

      sanity-benchmark test_fsx and sanityn test_16a, test_16b, test_16c fail with the following in the suite_log:

      do_preallocate: fallocate: Protocol error
      

      sanity test_150b fail with the following in the suite_log:

      fallocate failed, error Protocol error, mode 0, offset 62914560, len 4194304
      

      sanity-benchmark test_fsx fails with 'fsx failed' starting with 2.14.54.45 servers and 2.14.0 clients; https://testing.whamcloud.com/test_sets/daab9d17-e641-427e-9572-168989a4b23b. The following in the suite_log:

      == sanity-benchmark test fsx: fsx ==================================================================== 12:02:50 (1631361770)
      keep default fallocate mode: 0
      debug=0
      Using: /usr/lib64/lustre/tests/fsx -c 50 -p 1000 -S 5151 -P /tmp -l 5461544 	     -N 100000  /mnt/lustre/f0.fsxfile
      Chance of close/open is 1 in 50
      Seed set to 5151
      fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported
      fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported
      truncating to largest ever: 0x3531dc
      fallocate: 118b1 to 469048
      do_preallocate: fallocate: Protocol error
      LOG DUMP (10 total operations):
      1: 1631361772.173266 TRUNCATE UP	from 0x00000 to 0x3531dc
      2: 1631361772.176123 WRITE    0x475dee thru 0x476c9e (0x00eb1 bytes) HOLE
      3: 1631361772.176366 TRUNCATE DOWN	from 0x476c9f to 0x18202b
      4: 1631361772.178621 MAPWRITE 0xd455f thru 0xd6920 (0x023c2 bytes)
      5: 1631361772.246526 SKIPPED (no operation)
      6: 1631361772.246564 READ     0x1b680 thru 0x26b75 (0x0b4f6 bytes)
      7: 1631361772.247881 MAPWRITE 0x22e11f thru 0x2313a0 (0x03282 bytes)
      8: 1631361772.280197 MAPWRITE 0x374883 thru 0x3810ae (0x0c82c bytes)
      9: 1631361772.315333 MAPWRITE 0x4796a2 thru 0x47a8f8 (0x01257 bytes)
      10: 1631361772.346572 FALLOC  	from 0x118b1 to 0x1efe9	(0x0d738 bytes)INTERIOR
      Correct content saved for comparison
      (maybe hexdump "/mnt/lustre/f0.fsxfile" vs "/tmp//f0.fsxfile.fsxgood")
       sanity-benchmark test_fsx: @@@@@@ FAIL: fsx failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanity-benchmark.sh:196:test_fsx()
      

      sanityn test_16a, test_16b, test_16c fails with 'fsx failed' starting on same data and same client/server versions;

      == sanityn test 16a: 12500 iterations of dual-mount fsx ============================================== 18:58:04 (1631386684)
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize
      keep default fallocate mode: 0
      7+0 records in
      7+0 records out
      7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.577856 s, 12.7 MB/s
      7+0 records in
      7+0 records out
      7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.46945 s, 15.6 MB/s
      Chance of close/open is 1 in 50
      Seed set to 6688
      fd 0: /mnt/lustre/f16a.sanityn
      fd 1: /mnt/lustre2/f16a.sanityn
      fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported
      fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported
      fallocate: 5233c8 to 4b288
      do_preallocate: fallocate: Protocol error
      LOG DUMP (2 total operations):
      1[1]: 1631386688.420122 WRITE    0x561c9c thru 0x56e64f (0x0c9b4 bytes) HOLE
      2[0]: 1631386688.427553 FALLOC  	from 0x5233c8 to 0x52350f	(0x00147 bytes)INTERIOR
      Correct content saved for comparison
      (maybe hexdump "/mnt/lustre/f16a.sanityn" vs "/mnt/lustre/f16a.sanityn.fsxgood")
       sanityn test_16a: @@@@@@ FAIL: fsx failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanityn.sh:402:test_16a()
      …
      == sanityn test 16b: 12500 iterations of dual-mount fsx at small size ================================ 18:58:12 (1631386692)
      keep default fallocate mode: 0
      7+0 records in
      7+0 records out
      7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.515146 s, 14.2 MB/s
      7+0 records in
      7+0 records out
      7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.453771 s, 16.2 MB/s
      Chance of close/open is 1 in 50
      Seed set to 6694
      fd 0: /mnt/lustre/f16b.sanityn
      fd 1: /mnt/lustre2/f16b.sanityn
      fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported
      fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported
      skipping zero size read
      fallocate: 3a858e to 5e8cc6
      do_preallocate: fallocate: Protocol error
      LOG DUMP (4 total operations):
      1[18446744073708685781]: 1631386695.071343 SKIPPED (no operation)
      2[1]: 1631386695.073361 SKIPPED (no operation)
      3[1]: 1631386695.073642 WRITE    0x98f7df thru 0x991253 (0x01a75 bytes) HOLE
      4[0]: 1631386695.081362 FALLOC  	from 0x3a858e to 0x3aa51d	(0x01f8f bytes)INTERIOR
      Correct content saved for comparison
      (maybe hexdump "/mnt/lustre/f16b.sanityn" vs "/mnt/lustre/f16b.sanityn.fsxgood")
       sanityn test_16b: @@@@@@ FAIL: fsx with tiny write failed. 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanityn.sh:431:test_16b()
      …
      == sanityn test 16c: verify data consistency on ldiskfs with cache disabled (b=17397) ================ 18:58:18 (1631386698)
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize
      keep default fallocate mode: 0
      7+0 records in
      7+0 records out
      7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.632607 s, 11.6 MB/s
      7+0 records in
      7+0 records out
      7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.479825 s, 15.3 MB/s
      CMD: onyx-23vm7 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      Waiting for local destroys to complete
      CMD: onyx-23vm6 /usr/sbin/lctl get_param -n obdfilter.lustre-OST*.read_cache_enable 		osd-*.lustre-OST*.read_cache_enable 2>&1
      pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2
      CMD: onyx-23vm6 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.read_cache_enable=0 		osd-*.lustre-OST*.read_cache_enable=0 2>&1
      pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2
      CMD: onyx-23vm6 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.writethrough_cache_enable=0 		osd-*.lustre-OST*.writethrough_cache_enable=0 2>&1
      pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2
      Chance of close/open is 1 in 50
      Seed set to 6716
      fd 0: /mnt/lustre/f16c.sanityn
      fd 1: /mnt/lustre2/f16c.sanityn
      fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported
      fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported
      fallocate: 4154fd to ffdea255
      do_preallocate: fallocate: Protocol error
      LOG DUMP (5 total operations):
      1[18446744073708595669]: 1631386716.729518 SKIPPED (no operation)
      2[18446744073708595669]: 1631386716.730419 SKIPPED (no operation)
      3[1]: 1631386716.736190 MAPWRITE 0x1f481f thru 0x1ff751 (0x0af33 bytes)
      4[1]: 1631386716.938927 SKIPPED (no operation)
      5[1]: 1631386716.939194 FALLOC  	from 0x4154fd to 0x4246d1	(0x0f1d4 bytes)INTERIOR
      Correct content saved for comparison
      (maybe hexdump "/mnt/lustre/f16c.sanityn" vs "/mnt/lustre/f16c.sanityn.fsxgood")
       sanityn test_16c: @@@@@@ FAIL: fsx failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanityn.sh:462:test_16c()
      
      

      sanity test_150b fails with a similar message ever since the test was introduced; https://testing.whamcloud.com/test_sets/9ea27d14-cf74-483f-98b4-46fc0efa0146.

      == sanity test 150b: Verify fallocate (prealloc) functionality ======================================= 07:55:05 (1623225305)
      keep default fallocate mode: 0
      fallocate failed, error Protocol error, mode 0, offset 62914560, len 4194304
       sanity test_150b: @@@@@@ FAIL: fallocate failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanity.sh:13441:test_150b()
       

      Attachments

        Issue Links

          Activity

            People

              arshad512 Arshad Hussain
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: