Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.15.0
-
3
-
9223372036854775807
Description
Several tests are failing with the a variety error messages, but they all have the message “do_preallocate: fallocate: Protocol error” . These failures started September 11, 2021.
sanity-benchmark test_fsx and sanityn test_16a, test_16b, test_16c fail with the following in the suite_log:
do_preallocate: fallocate: Protocol error
sanity test_150b fail with the following in the suite_log:
fallocate failed, error Protocol error, mode 0, offset 62914560, len 4194304
sanity-benchmark test_fsx fails with 'fsx failed' starting with 2.14.54.45 servers and 2.14.0 clients; https://testing.whamcloud.com/test_sets/daab9d17-e641-427e-9572-168989a4b23b. The following in the suite_log:
== sanity-benchmark test fsx: fsx ==================================================================== 12:02:50 (1631361770) keep default fallocate mode: 0 debug=0 Using: /usr/lib64/lustre/tests/fsx -c 50 -p 1000 -S 5151 -P /tmp -l 5461544 -N 100000 /mnt/lustre/f0.fsxfile Chance of close/open is 1 in 50 Seed set to 5151 fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported truncating to largest ever: 0x3531dc fallocate: 118b1 to 469048 do_preallocate: fallocate: Protocol error LOG DUMP (10 total operations): 1: 1631361772.173266 TRUNCATE UP from 0x00000 to 0x3531dc 2: 1631361772.176123 WRITE 0x475dee thru 0x476c9e (0x00eb1 bytes) HOLE 3: 1631361772.176366 TRUNCATE DOWN from 0x476c9f to 0x18202b 4: 1631361772.178621 MAPWRITE 0xd455f thru 0xd6920 (0x023c2 bytes) 5: 1631361772.246526 SKIPPED (no operation) 6: 1631361772.246564 READ 0x1b680 thru 0x26b75 (0x0b4f6 bytes) 7: 1631361772.247881 MAPWRITE 0x22e11f thru 0x2313a0 (0x03282 bytes) 8: 1631361772.280197 MAPWRITE 0x374883 thru 0x3810ae (0x0c82c bytes) 9: 1631361772.315333 MAPWRITE 0x4796a2 thru 0x47a8f8 (0x01257 bytes) 10: 1631361772.346572 FALLOC from 0x118b1 to 0x1efe9 (0x0d738 bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f0.fsxfile" vs "/tmp//f0.fsxfile.fsxgood") sanity-benchmark test_fsx: @@@@@@ FAIL: fsx failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanity-benchmark.sh:196:test_fsx()
sanityn test_16a, test_16b, test_16c fails with 'fsx failed' starting on same data and same client/server versions;
== sanityn test 16a: 12500 iterations of dual-mount fsx ============================================== 18:58:04 (1631386684) CMD: onyx-23vm7 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize keep default fallocate mode: 0 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.577856 s, 12.7 MB/s 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.46945 s, 15.6 MB/s Chance of close/open is 1 in 50 Seed set to 6688 fd 0: /mnt/lustre/f16a.sanityn fd 1: /mnt/lustre2/f16a.sanityn fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported fallocate: 5233c8 to 4b288 do_preallocate: fallocate: Protocol error LOG DUMP (2 total operations): 1[1]: 1631386688.420122 WRITE 0x561c9c thru 0x56e64f (0x0c9b4 bytes) HOLE 2[0]: 1631386688.427553 FALLOC from 0x5233c8 to 0x52350f (0x00147 bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f16a.sanityn" vs "/mnt/lustre/f16a.sanityn.fsxgood") sanityn test_16a: @@@@@@ FAIL: fsx failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanityn.sh:402:test_16a() … == sanityn test 16b: 12500 iterations of dual-mount fsx at small size ================================ 18:58:12 (1631386692) keep default fallocate mode: 0 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.515146 s, 14.2 MB/s 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.453771 s, 16.2 MB/s Chance of close/open is 1 in 50 Seed set to 6694 fd 0: /mnt/lustre/f16b.sanityn fd 1: /mnt/lustre2/f16b.sanityn fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported skipping zero size read fallocate: 3a858e to 5e8cc6 do_preallocate: fallocate: Protocol error LOG DUMP (4 total operations): 1[18446744073708685781]: 1631386695.071343 SKIPPED (no operation) 2[1]: 1631386695.073361 SKIPPED (no operation) 3[1]: 1631386695.073642 WRITE 0x98f7df thru 0x991253 (0x01a75 bytes) HOLE 4[0]: 1631386695.081362 FALLOC from 0x3a858e to 0x3aa51d (0x01f8f bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f16b.sanityn" vs "/mnt/lustre/f16b.sanityn.fsxgood") sanityn test_16b: @@@@@@ FAIL: fsx with tiny write failed. Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanityn.sh:431:test_16b() … == sanityn test 16c: verify data consistency on ldiskfs with cache disabled (b=17397) ================ 18:58:18 (1631386698) CMD: onyx-23vm7 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize keep default fallocate mode: 0 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.632607 s, 11.6 MB/s 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.479825 s, 15.3 MB/s CMD: onyx-23vm7 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1 CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* Waiting for local destroys to complete CMD: onyx-23vm6 /usr/sbin/lctl get_param -n obdfilter.lustre-OST*.read_cache_enable osd-*.lustre-OST*.read_cache_enable 2>&1 pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2 CMD: onyx-23vm6 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.read_cache_enable=0 osd-*.lustre-OST*.read_cache_enable=0 2>&1 pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2 CMD: onyx-23vm6 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.writethrough_cache_enable=0 osd-*.lustre-OST*.writethrough_cache_enable=0 2>&1 pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2 Chance of close/open is 1 in 50 Seed set to 6716 fd 0: /mnt/lustre/f16c.sanityn fd 1: /mnt/lustre2/f16c.sanityn fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported fallocate: 4154fd to ffdea255 do_preallocate: fallocate: Protocol error LOG DUMP (5 total operations): 1[18446744073708595669]: 1631386716.729518 SKIPPED (no operation) 2[18446744073708595669]: 1631386716.730419 SKIPPED (no operation) 3[1]: 1631386716.736190 MAPWRITE 0x1f481f thru 0x1ff751 (0x0af33 bytes) 4[1]: 1631386716.938927 SKIPPED (no operation) 5[1]: 1631386716.939194 FALLOC from 0x4154fd to 0x4246d1 (0x0f1d4 bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f16c.sanityn" vs "/mnt/lustre/f16c.sanityn.fsxgood") sanityn test_16c: @@@@@@ FAIL: fsx failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanityn.sh:462:test_16c()
sanity test_150b fails with a similar message ever since the test was introduced; https://testing.whamcloud.com/test_sets/9ea27d14-cf74-483f-98b4-46fc0efa0146.
== sanity test 150b: Verify fallocate (prealloc) functionality ======================================= 07:55:05 (1623225305) keep default fallocate mode: 0 fallocate failed, error Protocol error, mode 0, offset 62914560, len 4194304 sanity test_150b: @@@@@@ FAIL: fallocate failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanity.sh:13441:test_150b()
Attachments
Issue Links
- is duplicated by
-
LU-15698 Interop sanity-flr test_50c: fallocate failed: keep size mode is unsupported
- Resolved
- is related to
-
LU-15748 interop: sanity test_150b: fallocate failed, error Operation not supported, mode 0, offset 62914560, len 4194304
- Resolved
- is related to
-
LU-14160 Implement fallocate FALLOCATE_FL_PUNCH_HOLE support
- Resolved