[LU-15551] Interop: Various tests fail with fallocate “Protocol error” errors Created: 11/Feb/22 Updated: 22/Apr/22 Resolved: 18/Mar/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | James Nunez (Inactive) | Assignee: | Arshad Hussain |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | interop | ||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
Several tests are failing with the a variety error messages, but they all have the message “do_preallocate: fallocate: Protocol error” . These failures started September 11, 2021. sanity-benchmark test_fsx and sanityn test_16a, test_16b, test_16c fail with the following in the suite_log: do_preallocate: fallocate: Protocol error sanity test_150b fail with the following in the suite_log: fallocate failed, error Protocol error, mode 0, offset 62914560, len 4194304 sanity-benchmark test_fsx fails with 'fsx failed' starting with 2.14.54.45 servers and 2.14.0 clients; https://testing.whamcloud.com/test_sets/daab9d17-e641-427e-9572-168989a4b23b. The following in the suite_log: == sanity-benchmark test fsx: fsx ==================================================================== 12:02:50 (1631361770) keep default fallocate mode: 0 debug=0 Using: /usr/lib64/lustre/tests/fsx -c 50 -p 1000 -S 5151 -P /tmp -l 5461544 -N 100000 /mnt/lustre/f0.fsxfile Chance of close/open is 1 in 50 Seed set to 5151 fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported truncating to largest ever: 0x3531dc fallocate: 118b1 to 469048 do_preallocate: fallocate: Protocol error LOG DUMP (10 total operations): 1: 1631361772.173266 TRUNCATE UP from 0x00000 to 0x3531dc 2: 1631361772.176123 WRITE 0x475dee thru 0x476c9e (0x00eb1 bytes) HOLE 3: 1631361772.176366 TRUNCATE DOWN from 0x476c9f to 0x18202b 4: 1631361772.178621 MAPWRITE 0xd455f thru 0xd6920 (0x023c2 bytes) 5: 1631361772.246526 SKIPPED (no operation) 6: 1631361772.246564 READ 0x1b680 thru 0x26b75 (0x0b4f6 bytes) 7: 1631361772.247881 MAPWRITE 0x22e11f thru 0x2313a0 (0x03282 bytes) 8: 1631361772.280197 MAPWRITE 0x374883 thru 0x3810ae (0x0c82c bytes) 9: 1631361772.315333 MAPWRITE 0x4796a2 thru 0x47a8f8 (0x01257 bytes) 10: 1631361772.346572 FALLOC from 0x118b1 to 0x1efe9 (0x0d738 bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f0.fsxfile" vs "/tmp//f0.fsxfile.fsxgood") sanity-benchmark test_fsx: @@@@@@ FAIL: fsx failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanity-benchmark.sh:196:test_fsx() sanityn test_16a, test_16b, test_16c fails with 'fsx failed' starting on same data and same client/server versions; == sanityn test 16a: 12500 iterations of dual-mount fsx ============================================== 18:58:04 (1631386684) CMD: onyx-23vm7 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize keep default fallocate mode: 0 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.577856 s, 12.7 MB/s 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.46945 s, 15.6 MB/s Chance of close/open is 1 in 50 Seed set to 6688 fd 0: /mnt/lustre/f16a.sanityn fd 1: /mnt/lustre2/f16a.sanityn fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported fallocate: 5233c8 to 4b288 do_preallocate: fallocate: Protocol error LOG DUMP (2 total operations): 1[1]: 1631386688.420122 WRITE 0x561c9c thru 0x56e64f (0x0c9b4 bytes) HOLE 2[0]: 1631386688.427553 FALLOC from 0x5233c8 to 0x52350f (0x00147 bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f16a.sanityn" vs "/mnt/lustre/f16a.sanityn.fsxgood") sanityn test_16a: @@@@@@ FAIL: fsx failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanityn.sh:402:test_16a() … == sanityn test 16b: 12500 iterations of dual-mount fsx at small size ================================ 18:58:12 (1631386692) keep default fallocate mode: 0 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.515146 s, 14.2 MB/s 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.453771 s, 16.2 MB/s Chance of close/open is 1 in 50 Seed set to 6694 fd 0: /mnt/lustre/f16b.sanityn fd 1: /mnt/lustre2/f16b.sanityn fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported skipping zero size read fallocate: 3a858e to 5e8cc6 do_preallocate: fallocate: Protocol error LOG DUMP (4 total operations): 1[18446744073708685781]: 1631386695.071343 SKIPPED (no operation) 2[1]: 1631386695.073361 SKIPPED (no operation) 3[1]: 1631386695.073642 WRITE 0x98f7df thru 0x991253 (0x01a75 bytes) HOLE 4[0]: 1631386695.081362 FALLOC from 0x3a858e to 0x3aa51d (0x01f8f bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f16b.sanityn" vs "/mnt/lustre/f16b.sanityn.fsxgood") sanityn test_16b: @@@@@@ FAIL: fsx with tiny write failed. Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanityn.sh:431:test_16b() … == sanityn test 16c: verify data consistency on ldiskfs with cache disabled (b=17397) ================ 18:58:18 (1631386698) CMD: onyx-23vm7 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize keep default fallocate mode: 0 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.632607 s, 11.6 MB/s 7+0 records in 7+0 records out 7340032 bytes (7.3 MB, 7.0 MiB) copied, 0.479825 s, 15.3 MB/s CMD: onyx-23vm7 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1 CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* CMD: onyx-23vm7 /usr/sbin/lctl get_param -n osc.*MDT*.sync_* Waiting for local destroys to complete CMD: onyx-23vm6 /usr/sbin/lctl get_param -n obdfilter.lustre-OST*.read_cache_enable osd-*.lustre-OST*.read_cache_enable 2>&1 pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2 CMD: onyx-23vm6 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.read_cache_enable=0 osd-*.lustre-OST*.read_cache_enable=0 2>&1 pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2 CMD: onyx-23vm6 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.writethrough_cache_enable=0 osd-*.lustre-OST*.writethrough_cache_enable=0 2>&1 pdsh@onyx-23vm4: onyx-23vm6: ssh exited with exit code 2 Chance of close/open is 1 in 50 Seed set to 6716 fd 0: /mnt/lustre/f16c.sanityn fd 1: /mnt/lustre2/f16c.sanityn fsx: test_fallocate: filesystem does not support fallocate mode 0x3, disabling!: Operation not supported fsx: test_fallocate: filesystem does not support fallocate mode 0x10, disabling!: Operation not supported fallocate: 4154fd to ffdea255 do_preallocate: fallocate: Protocol error LOG DUMP (5 total operations): 1[18446744073708595669]: 1631386716.729518 SKIPPED (no operation) 2[18446744073708595669]: 1631386716.730419 SKIPPED (no operation) 3[1]: 1631386716.736190 MAPWRITE 0x1f481f thru 0x1ff751 (0x0af33 bytes) 4[1]: 1631386716.938927 SKIPPED (no operation) 5[1]: 1631386716.939194 FALLOC from 0x4154fd to 0x4246d1 (0x0f1d4 bytes)INTERIOR Correct content saved for comparison (maybe hexdump "/mnt/lustre/f16c.sanityn" vs "/mnt/lustre/f16c.sanityn.fsxgood") sanityn test_16c: @@@@@@ FAIL: fsx failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanityn.sh:462:test_16c() sanity test_150b fails with a similar message ever since the test was introduced; https://testing.whamcloud.com/test_sets/9ea27d14-cf74-483f-98b4-46fc0efa0146. == sanity test 150b: Verify fallocate (prealloc) functionality ======================================= 07:55:05 (1623225305) keep default fallocate mode: 0 fallocate failed, error Protocol error, mode 0, offset 62914560, len 4194304 sanity test_150b: @@@@@@ FAIL: fallocate failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6273:error() = /usr/lib64/lustre/tests/sanity.sh:13441:test_150b() |
| Comments |
| Comment by Peter Jones [ 11/Feb/22 ] |
|
Arshad Can you please advise? Does this seem related to Peter |
| Comment by Patrick Farrell [ 11/Feb/22 ] |
|
(Specifically https://review.whamcloud.com/40877 ) |
| Comment by Arshad Hussain [ 12/Feb/22 ] |
|
>Can you please advise? Does this seem related to Hi Peter, I am checking on this. |
| Comment by Arshad Hussain [ 12/Feb/22 ] |
|
>(Specifically https://review.whamcloud.com/#/c/46221/ ) Hi Patrick, The URL fails to open with "Not found " and/or permisison. Sorry I cannot check this out. I am however, looking into other info in bug report. |
| Comment by Andreas Dilger [ 12/Feb/22 ] |
|
I've fixed the URL to https://review.whamcloud.com/40877 " It probably needs the server to return -EOPNOTSUPP instead of -EPROTO for unsupported fallocate modes. On the client (for interop) it should convert -EPROTO to -EOPNOTSUPP in fallocate. I thought there was already a patch to do this, but I guess it is not in the right place - https://review.whamcloud.com/41148 " |
| Comment by Gerrit Updater [ 14/Feb/22 ] |
|
"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/46516 |
| Comment by Arshad Hussain [ 14/Feb/22 ] |
|
Hi Andreas, Thanks for your comments and analysis. The client is getting -EPROTO from ofd_fallocate_hdl functin (server side) Changed it to return EOPNOTSUPP. The client part looks ok to me. Thanks! |
| Comment by Gerrit Updater [ 18/Mar/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46516/ |
| Comment by Peter Jones [ 18/Mar/22 ] |
|
Landed for 2.15 |