[LU-10100] sanity test_27a: setstripe failed with "error on ioctl 0x8008669a for '*' (3): Invalid argument" Created: 06/Oct/17 Updated: 10/Feb/20 Resolved: 20/Aug/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James Casper | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ppc |
| Environment: | trevis, full, x86_64 servers, ppc clients |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.whamcloud.com/test_sessions/ba995751-659c-4e63-9b5b-fbf101137b78

From test_log:

stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
error on ioctl 0x8008669a for '/mnt/lustre/d27/f0' (3): Invalid argument
error: setstripe: create striped file '/mnt/lustre/d27/f0' failed: Invalid argument
sanity test_27a: @@@@@@ FAIL: setstripe failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5289:error()
= /usr/lib64/lustre/tests/sanity.sh:1357:test_27a()
= /usr/lib64/lustre/tests/test-framework.sh:5565:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5604:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5451:run_test()
= /usr/lib64/lustre/tests/sanity.sh:1361:main() |
| Comments |
| Comment by James Nunez (Inactive) [ 02/May/18 ] |
|
Several sanity tests fail at 'lfs setstripe' on PPC architectures, including test_27a, 27b, 27d, 27e, 27k, 27r, 27w, 27wa, 27z, 27C, 27E, 56t, 56u, 56x, 56xa, 56xb, 65k, 78, 101b, … In the full test group results, these tests first fail for PPC on 2017-09-17 20:43:36 UTC, for master build # 3642, version 2.10.53.1. |
| Comment by James Nunez (Inactive) [ 29/Apr/19 ] |
|
Similar setstripe error for ppc for sanityn test 16a; https://testing.whamcloud.com/test_sets/7d68fa34-668f-11e9-8bb1-52540065bddc .

== sanityn test 16a: 12500 iterations of dual-mount fsx ============================================== 03:45:53 (1555991153)
CMD: trevis-55vm12 /usr/sbin/lctl get_param -n lod.lustre-MDT0000*.stripesize
lfs setstripe: setstripe error for '/mnt/lustre/f16a.sanityn': Invalid argument
7+0 records in
7+0 records out
7340032 bytes (7.3 MB) copied, 1.41347 s, 5.2 MB/s
7+0 records in
7+0 records out
7340032 bytes (7.3 MB) copied, 0.707808 s, 10.4 MB/s
lfs setstripe: setstripe error for '/mnt/lustre/f16a.sanityn': Invalid argument
Chance of close/open is 1 in 50
...
dowrite: write: Invalid argument
LOG DUMP (1 total operations):
1[0]: 1555991453.674708 WRITE 0x237000 thru 0x238fff (0x2000 bytes) HOLE
Correct content saved for comparison
(maybe hexdump "/mnt/lustre/f16a.sanityn" vs "/mnt/lustre/f16a.sanityn.fsxgood")
sanityn test_16a: @@@@@@ FAIL: fsx with O_DIRECT failed. |
| Comment by James Nunez (Inactive) [ 30/Apr/19 ] |
|
Similar issue seen on sanity-hsm: tests 11a, 11b, 12a, 12b, 12c, 12n, 13, 25a, 27a, 30a, 31a, 72, 77, 110a, 111a, 201, 222a, 222c, and 223a fail with

trevis-77vm2: lhsmtool_posix: 1555997033.356277 lhsmtool_posix[26740]: importing '/mnt/lustre/d11a.sanity-hsm/f11a.sanity-hsm' from '/tmp/arc1/sanity-hsm.test_11a//d11a.sanity-hsm/f11a.sanity-hsm'
trevis-77vm2: lhsmtool_posix: setstripe error for '/mnt/lustre/d11a.sanity-hsm/f11a.sanity-hsm': Invalid argument
trevis-77vm2: lhsmtool_posix: cannot create '/mnt/lustre/d11a.sanity-hsm/f11a.sanity-hsm' for import: Invalid argument (22)
trevis-77vm2: lhsmtool_posix: 1555997033.361745 lhsmtool_posix[26740]: cannot import '/mnt/lustre/d11a.sanity-hsm/f11a.sanity-hsm' from '/tmp/arc1/sanity-hsm.test_11a//d11a.sanity-hsm/f11a.sanity-hsm': Invalid argument (22)
trevis-77vm2: lhsmtool_posix: 1555997033.361757 lhsmtool_posix[26740]: process finished, errs: 0 major, 0 minor, rc=-22 (Invalid argument)
sanity-hsm test_11a: @@@@@@ FAIL: Failed to import 'd11a.sanity-hsm/f11a.sanity-hsm' to '/mnt/lustre/d11a.sanity-hsm/f11a.sanity-hsm' |
| Comment by James Nunez (Inactive) [ 30/Apr/19 ] |
|
sanity-flr test_0a, 0b, 0c, 0e, 0f, 0g, 1, and 42 all fail with 'create mirrored file /mnt/lustre/d0a.sanity-flr/f*.sanity-flr failed'.

Looking at the suite_log for a recent failure, https://testing.whamcloud.com/test_sets/a1786810-668f-11e9-8bb1-52540065bddc, we see all these tests fail with an ‘Invalid argument’ message:

== sanity-flr test 0a: lfs mirror create with -N option ============================================== 06:36:59 (1556001419)
lfs mirror create: cannot create composite file '/mnt/lustre/d0a.sanity-flr/f0a.sanity-flr': Invalid argument
sanity-flr test_0a: @@@@@@ FAIL: create mirrored file /mnt/lustre/d0a.sanity-flr/f0a.sanity-flr failed

sanity-flr test_0d has a similar failure:

== sanity-flr test 0d: lfs mirror extend with -N option ============================================== 06:39:48 (1556001588)
lfs mirror extend: cannot create composite file '/mnt/lustre/d0d.sanity-flr/. :VOLATILE:0000:70DEDB39': Invalid argument
error: lfs mirror extend: /mnt/lustre/d0d.sanity-flr/f0d.sanity-flr: cannot create volatile file: Operation not permitted
sanity-flr test_0d: @@@@@@ FAIL: convert and extend /mnt/lustre/d0d.sanity-flr/f0d.sanity-flr failed |
| Comment by Andreas Dilger [ 30/May/19 ] |
|
I did some investigation into the debug logs of one of the many failed tests to see where the problem is coming from. It appears that the client is able to create the file with open(O_LOV_DELAY_CREATE) and then calls ioctl(LL_IOC_LOV_SETSTRIPE), but the MDS returns -EINVAL without much information in the logs (I don't think the debug=-1 mask is being set on the MDS for sanity.sh):

mdc_finish_enqueue() @@@ op: 1 disposition: 3, status: -22 req@c000000074ed5100 x1634660749735520/t0(0) o101->lustre-MDT0000-mdc-c00000007457a800@10.9.5.36@tcp:12/10 lens 648/568 e 0 to 0 dl 1558936451 ref 1 fl Complete:R/0/0 rc 301/301
mdc_finish_intent_lock() D_IT dentry intent: open status -22 disp 3 rc -22
mdc_intent_lock() Process leaving (rc=-22)
ll_intent_file_open() lock enqueue: err: -22
ll_intent_file_open() Process leaving via out (rc=-22)
ll_lov_setstripe_ea_info() Process leaving via out_unlock (rc=-22)
ll_lov_setstripe() Process leaving (rc=-22)
ll_file_ioctl() Process leaving (rc=-22)

so it definitely seems that the MDS is not swabbing part or all of the incoming request and/or the client is not doing the same. My preference would be to fix this on both ends, if possible and depending on what the problem is, so that we have maximum coverage for new/old clients talking to old/new servers. |
| Comment by Andreas Dilger [ 31/May/19 ] |
|
Looking on the MDS I see that it is already failing when checking the lmm_magic:

lod_verify_striping()) Process entered
lod_verify_striping()) bad userland LOV MAGIC: 0xd00bd10b
lod_verify_striping()) Process leaving (rc= -22)
lod_qos_parse_config()) Process leaving (rc= -22)
lod_prepare_create()) Process leaving (rc= -22)
lod_declare_striped_create()) Process leaving via out (rc= -22)
lod_declare_xattr_set()) Process leaving (rc= -22)
mdd_create_data()) Process leaving via stop (rc= -22)
mdt_mfd_open()) Process leaving (rc= -22)
mdt_finish_open()) Process leaving (rc= -22)
mdt_open_by_fid_lock()) Process leaving via out_unlock (rc= -22)
mdt_reint_open()) no object for [0x200000405:0x15:0x0]: -22 |
| Comment by Patrick Farrell (Inactive) [ 31/May/19 ] |
|
Andreas didn't call this out specifically (if your brain works the right way, I guess it's obvious), but the magic in this message is byte-swapped:

lod_verify_striping()) bad userland LOV MAGIC: 0xd00bd10b

Which we can see because the magic is:

#define LOV_MAGIC_MAGIC 0x0BD0
#define LOV_MAGIC_V1 (0x0BD10000 | LOV_MAGIC_MAGIC)

That is, LOV_MAGIC_V1 is 0x0BD10BD0, and reversing its four bytes gives 0xd00bd10b. |
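The relationship between the logged value and LOV_MAGIC_V1 can be checked with a few lines of C. This is only an illustrative sketch: the two #defines are the ones quoted in the comment, while swab32_sketch() is a hypothetical helper written for this example and is not a Lustre function.

```c
#include <stdint.h>

#define LOV_MAGIC_MAGIC 0x0BD0
#define LOV_MAGIC_V1    (0x0BD10000 | LOV_MAGIC_MAGIC)

/* Reverse the byte order of a 32-bit value.
 * Hypothetical helper for illustration, not part of Lustre. */
uint32_t swab32_sketch(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Calling swab32_sketch(LOV_MAGIC_V1) gives 0xD00BD10B, the value reported by lod_verify_striping(), which is consistent with a big-endian client sending the magic without converting it.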
| Comment by Peter Jones [ 31/May/19 ] |
|
Jian, can you please follow up on this? Thanks, Peter |
| Comment by Jian Yu [ 21/Jun/19 ] |
|
In llapi_file_open_param(), while setting lmm_magic, we need to use cpu_to_le32() to convert the value into little-endian form. I'm creating the patch. |
| Comment by Gerrit Updater [ 22/Jun/19 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35291 |
| Comment by Jian Yu [ 23/Jun/19 ] |
|
Besides lmm_magic, other fields in struct lov_user_md also need to be converted. |
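A rough sketch of what the client-side conversion amounts to, under several assumptions: struct layout_sketch is a simplified, hypothetical stand-in for struct lov_user_md (the real structure has more fields), and to_le32_sketch()/to_le16_sketch() are portable stand-ins for cpu_to_le32()/cpu_to_le16() written for this example. It is not the code from the patch.

```c
#include <stdint.h>
#include <string.h>

#define LOV_MAGIC_MAGIC 0x0BD0
#define LOV_MAGIC_V1    (0x0BD10000 | LOV_MAGIC_MAGIC)

/* Simplified, hypothetical stand-in for struct lov_user_md. */
struct layout_sketch {
    uint32_t magic;
    uint32_t pattern;
    uint32_t stripe_size;
    uint16_t stripe_count;
    uint16_t stripe_offset;
};

/* Return a value whose in-memory bytes hold v in little-endian order,
 * whatever the host byte order is (stand-in for cpu_to_le32()). */
uint32_t to_le32_sketch(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v & 0xFF),         (uint8_t)((v >> 8) & 0xFF),
        (uint8_t)((v >> 16) & 0xFF), (uint8_t)((v >> 24) & 0xFF)
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* 16-bit counterpart (stand-in for cpu_to_le16()). */
uint16_t to_le16_sketch(uint16_t v)
{
    uint8_t b[2] = { (uint8_t)(v & 0xFF), (uint8_t)((v >> 8) & 0xFF) };
    uint16_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Fill every field in little-endian form, not only the magic. */
void layout_to_wire_sketch(struct layout_sketch *l, uint32_t stripe_size,
                           uint16_t stripe_count, uint16_t stripe_offset)
{
    l->magic = to_le32_sketch(LOV_MAGIC_V1);
    l->pattern = to_le32_sketch(0);
    l->stripe_size = to_le32_sketch(stripe_size);
    l->stripe_count = to_le16_sketch(stripe_count);
    l->stripe_offset = to_le16_sketch(stripe_offset);
}
```

On a little-endian host these helpers leave the values unchanged; on a big-endian host such as the ppc clients they reverse the bytes, so the in-memory layout is the same in both cases.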
| Comment by Andreas Dilger [ 24/Jul/19 ] |
|
Jian, can you please also make a separate patch for the MDS to swab the layout received from the client if it is in big-endian format? This will simplify interop for deployment on systems where there are lots of PPC clients that have not been upgraded. |
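The server-side idea can be sketched as follows: if the magic matches the byte-reversed LOV_MAGIC_V1, treat the layout as having come from an unconverted big-endian client and swab each field in place. Everything here is an assumption made for illustration: struct layout_sketch is a simplified stand-in for struct lov_user_md, the function names are hypothetical, only the V1 magic is handled, and this is not the MDS patch itself.

```c
#include <errno.h>
#include <stdint.h>

#define LOV_MAGIC_MAGIC 0x0BD0
#define LOV_MAGIC_V1    (0x0BD10000 | LOV_MAGIC_MAGIC)

/* Simplified, hypothetical stand-in for struct lov_user_md. */
struct layout_sketch {
    uint32_t magic;
    uint32_t pattern;
    uint32_t stripe_size;
    uint16_t stripe_count;
    uint16_t stripe_offset;
};

/* Hypothetical byte-reversal helpers, not Lustre functions. */
uint16_t swab16_sketch(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

uint32_t swab32_sketch(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Returns 0 if the layout is already in the expected byte order,
 * 1 if it was byte-reversed and has been swabbed in place,
 * and -EINVAL if the magic is not recognized either way. */
int layout_fix_endianness_sketch(struct layout_sketch *l)
{
    if (l->magic == LOV_MAGIC_V1)
        return 0;
    if (l->magic != swab32_sketch(LOV_MAGIC_V1))
        return -EINVAL;

    l->magic = swab32_sketch(l->magic);
    l->pattern = swab32_sketch(l->pattern);
    l->stripe_size = swab32_sketch(l->stripe_size);
    l->stripe_count = swab16_sketch(l->stripe_count);
    l->stripe_offset = swab16_sketch(l->stripe_offset);
    return 1;
}
```

Accepting both orders on the server is what lets clients that have not been upgraded keep working, which is the interop benefit described above.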
| Comment by Jian Yu [ 24/Jul/19 ] |
|
Sure, Andreas. Let me work on this. |
| Comment by Gerrit Updater [ 27/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35291/ |
| Comment by Gerrit Updater [ 29/Jul/19 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35633 |
| Comment by Gerrit Updater [ 11/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35633/ |
| Comment by Jian Yu [ 20/Aug/19 ] |
|
The client patch has landed for Lustre 2.13.0. The MDS patch will be worked on in LU-12673. |