[LU-12656] sanity-flr test 200 fails with 'failed writing to *:*' Created: 09/Aug/19 Updated: 06/Jul/23 Resolved: 19/May/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.5, Lustre 2.12.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | failing_tests, ubuntu, ubuntu18, ubuntu20 | ||
| Environment: |
SLES12SP4, RHEL 8.0 and Ubuntu 18.04 clients |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
sanity-flr test_200 fails with messages similar to 'failed writing to 1540352:850912'. It looks like this test is failing with “new” kernels on Ubuntu 18.04 and RHEL8 clients. This issue started around 5 May 2019.

Looking at the client test_log for https://testing.whamcloud.com/test_sets/f6a78ce6-b5bb-11e9-b88c-52540065bddc, we see a lot of resync failures and a write ‘input/output error’:

== sanity-flr test 200: stress test ================================================================== 03:20:28 (1564716028)
Starting client: trevis-39vm5.trevis.whamcloud.com: -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre2
CMD: trevis-39vm5.trevis.whamcloud.com mkdir -p /mnt/lustre2
CMD: trevis-39vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre2
Starting client: trevis-39vm5.trevis.whamcloud.com: -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre3
CMD: trevis-39vm5.trevis.whamcloud.com mkdir -p /mnt/lustre3
CMD: trevis-39vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre3
fail_loc=0x1A03
CMD: trevis-39vm8 /usr/sbin/lctl set_param fail_loc=0x1A03
fail_loc=0x1A03
resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..Extending file size to 2917280 .. Extending file size to 5634016 .. failed
…
resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
write: Input/output error
sanity-flr test_200: @@@@@@ FAIL: failed writing to 1540352:850912
sanity-flr test_200: @@@@@@ FAIL: read failed

There are errors on the OSS console log:

[33656.735317] Lustre: DEBUG MARKER: == sanity-flr test 200: stress test ================================================================== 03:20:28 (1564716028)
[33746.465968] Lustre: lustre-OST0006: Client 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp) reconnecting
[33746.468305] Lustre: Skipped 2 previous similar messages
[33746.691754] LustreError: 21525:0:(ldlm_lib.c:3255:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9f4601388050 x1640713887127712/t0(0) o3->8f96d56b-3831-a498-7a9f-598dabb943ba@10.9.5.223@tcp:548/0 lens 488/440 e 0 to 0 dl 1564716153 ref 1 fl Interpret:/0/0 rc 0/0
[33746.695913] Lustre: lustre-OST0006: Bulk IO read error with 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp), client will retry: rc -110
[33747.468651] Lustre: 32287:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564716112/real 1564716112] req@ffff9f45f8771f80 x1640688537737184/t0(0) o104->lustre-OST0001@10.9.5.223@tcp:15/16 lens 296/224 e 0 to 1 dl 1564716119 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[33747.473941] Lustre: 32287:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[33752.042829] Lustre: lustre-OST0005: Client 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp) reconnecting
[33752.044858] Lustre: Skipped 2 previous similar messages
[33797.383872] Lustre: lustre-OST0002: Client 9b851642-fac7-baed-c494-00770d32c258 (at 10.9.5.223@tcp) reconnecting
[33797.384269] Lustre: lustre-OST0002: Connection restored to 577b503a-fd94-6a91-d5f9-abcccfabd52d (at 10.9.5.223@tcp)
[33797.384271] Lustre: Skipped 63 previous similar messages
[33797.388702] Lustre: Skipped 8 previous similar messages
[33847.309939] LustreError: 30819:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 106s: evicting client at 10.9.5.223@tcp ns: filter-lustre-OST0000_UUID lock: ffff9f45d47c4fc0/0xc82ebda2297890c1 lrc: 3/0,0 mode: PR/PR res: [0x480c:0x0:0x0].0x0 rrc: 6 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000400000020 nid: 10.9.5.223@tcp remote: 0xb3340d303f93846 expref: 9 pid: 19643 timeout: 33849 lvb_type: 1
[33849.577110] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-flr test_200: @@@@@@ FAIL: failed writing to 1540352:850912
[33849.578735] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-flr test_200: @@@@@@ FAIL: read failed

Other test failures at |
| Comments |
| Comment by Andreas Dilger [ 21/Apr/23 ] |
|
Alex pushed patch: https://review.whamcloud.com/46413 " |
| Comment by James A Simmons [ 19/May/23 ] |
|
With the landing of |