Details
- Bug
- Resolution: Duplicate
- Major
- None
- Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.5, Lustre 2.12.7
- SLES12SP4, RHEL 8.0 and Ubuntu 18.04 clients
- 3
- 9223372036854775807
Description
sanity-flr test_200 fails with messages similar to 'failed writing to 1540352:850912'. The test appears to fail on clients running newer kernels, specifically Ubuntu 18.04 and RHEL 8. The failures started around 5 May 2019.
Looking at the client test_log for https://testing.whamcloud.com/test_sets/f6a78ce6-b5bb-11e9-b88c-52540065bddc, we see a lot of resync failures and a write ‘input/output error’:
== sanity-flr test 200: stress test ================================================================== 03:20:28 (1564716028)
Starting client: trevis-39vm5.trevis.whamcloud.com: -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre2
CMD: trevis-39vm5.trevis.whamcloud.com mkdir -p /mnt/lustre2
CMD: trevis-39vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre2
Starting client: trevis-39vm5.trevis.whamcloud.com: -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre3
CMD: trevis-39vm5.trevis.whamcloud.com mkdir -p /mnt/lustre3
CMD: trevis-39vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre3
fail_loc=0x1A03
CMD: trevis-39vm8 /usr/sbin/lctl set_param fail_loc=0x1A03
fail_loc=0x1A03
resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start'
..Extending file size to 2917280 .. Extending file size to 5634016 .. failed
…
resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync'
..failed
resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync'
..failed
write: Input/output error
sanity-flr test_200: @@@@@@ FAIL: failed writing to 1540352:850912
sanity-flr test_200: @@@@@@ FAIL: read failed
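When comparing this run against the date the failures started, it can help to decode the epoch value in the test banner. The following is a minimal Python sketch, not part of the Lustre test framework; the names `BANNER` and `parse_banner` are hypothetical, and the banner layout is assumed from the single banner line shown in this log.

```python
import re
from datetime import datetime, timezone

# Assumed banner layout, inferred from the one banner line in the log:
#   == <suite> test <num>: <description> ====... HH:MM:SS (<epoch seconds>)
BANNER = re.compile(
    r"== (?P<suite>\S+) test (?P<num>\w+): (?P<desc>.*?) =+ "
    r"(?P<clock>\d\d:\d\d:\d\d) \((?P<epoch>\d+)\)"
)


def parse_banner(line):
    """Return the banner fields plus a UTC datetime, or None if no banner is found."""
    match = BANNER.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["utc"] = datetime.fromtimestamp(int(info["epoch"]), tz=timezone.utc)
    return info


if __name__ == "__main__":
    banner = ("== sanity-flr test 200: stress test "
              "================== 03:20:28 (1564716028)")
    info = parse_banner(banner)
    print(info["suite"], info["num"], info["utc"].isoformat())
```

For the banner above, the epoch 1564716028 decodes to 2 August 2019, 03:20:28 UTC, which matches the wall-clock time printed in the banner.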
There are errors in the OSS console log:
[33656.735317] Lustre: DEBUG MARKER: == sanity-flr test 200: stress test ================================================================== 03:20:28 (1564716028)
[33746.465968] Lustre: lustre-OST0006: Client 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp) reconnecting
[33746.468305] Lustre: Skipped 2 previous similar messages
[33746.691754] LustreError: 21525:0:(ldlm_lib.c:3255:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9f4601388050 x1640713887127712/t0(0) o3->8f96d56b-3831-a498-7a9f-598dabb943ba@10.9.5.223@tcp:548/0 lens 488/440 e 0 to 0 dl 1564716153 ref 1 fl Interpret:/0/0 rc 0/0
[33746.695913] Lustre: lustre-OST0006: Bulk IO read error with 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp), client will retry: rc -110
[33747.468651] Lustre: 32287:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564716112/real 1564716112] req@ffff9f45f8771f80 x1640688537737184/t0(0) o104->lustre-OST0001@10.9.5.223@tcp:15/16 lens 296/224 e 0 to 1 dl 1564716119 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[33747.473941] Lustre: 32287:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[33752.042829] Lustre: lustre-OST0005: Client 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp) reconnecting
[33752.044858] Lustre: Skipped 2 previous similar messages
[33797.383872] Lustre: lustre-OST0002: Client 9b851642-fac7-baed-c494-00770d32c258 (at 10.9.5.223@tcp) reconnecting
[33797.384269] Lustre: lustre-OST0002: Connection restored to 577b503a-fd94-6a91-d5f9-abcccfabd52d (at 10.9.5.223@tcp)
[33797.384271] Lustre: Skipped 63 previous similar messages
[33797.388702] Lustre: Skipped 8 previous similar messages
[33847.309939] LustreError: 30819:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 106s: evicting client at 10.9.5.223@tcp ns: filter-lustre-OST0000_UUID lock: ffff9f45d47c4fc0/0xc82ebda2297890c1 lrc: 3/0,0 mode: PR/PR res: [0x480c:0x0:0x0].0x0 rrc: 6 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000400000020 nid: 10.9.5.223@tcp remote: 0xb3340d303f93846 expref: 9 pid: 19643 timeout: 33849 lvb_type: 1
[33849.577110] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-flr test_200: @@@@@@ FAIL: failed writing to 1540352:850912
[33849.578735] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-flr test_200: @@@@@@ FAIL: read failed
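To see the order of events in a console log like this one, the relevant messages can be pulled out and timed relative to the test banner. The following is a rough Python sketch for triage only, not a Lustre tool. The event names (`reconnecting`, `bulk_io_error`, `eviction`, `test_failure`) and the helpers `parse_console` and `seconds_into_test` are hypothetical, and the regular expressions are written against the messages quoted in this ticket, so other Lustre versions may word them differently. Note that `rc -110` is `-ETIMEDOUT` on Linux.

```python
import re

# "[33656.735317] rest of message": kernel uptime stamp, then the message body.
TIMESTAMP = re.compile(r"^\[\s*(?P<uptime>\d+\.\d+)\]\s*(?P<body>.*)$")
# The test banner marker ends with the wall-clock epoch in parentheses.
MARKER = re.compile(r"DEBUG MARKER: == .* \((?P<epoch>\d+)\)\s*$")

# Event names are labels chosen for this sketch, not Lustre terminology.
PATTERNS = {
    "reconnecting": re.compile(r"Client \S+ \(at (?P<nid>\S+)\) reconnecting"),
    "bulk_io_error": re.compile(
        r"Bulk IO \w+ error with \S+ \(at (?P<nid>\S+)\).*rc (?P<rc>-?\d+)"),
    "eviction": re.compile(
        r"lock callback timer expired after (?P<secs>\d+)s: "
        r"evicting client at (?P<nid>\S+)"),
    "test_failure": re.compile(r"@@@@@@ FAIL: (?P<msg>.*)$"),
}


def parse_console(lines):
    """Return ((uptime, epoch) of the first test banner or None, list of events)."""
    anchor = None
    events = []
    for line in lines:
        head = TIMESTAMP.match(line.strip())
        if head is None:
            continue  # not a timestamped console line
        uptime = float(head.group("uptime"))
        body = head.group("body")
        marker = MARKER.search(body)
        if marker and anchor is None:
            anchor = (uptime, int(marker.group("epoch")))
        for kind, pattern in PATTERNS.items():
            found = pattern.search(body)
            if found:
                events.append({"uptime": uptime, "kind": kind, **found.groupdict()})
                break
    return anchor, events


def seconds_into_test(event, anchor):
    """Seconds between the test banner marker and the event, by kernel uptime."""
    return event["uptime"] - anchor[0]


if __name__ == "__main__":
    import sys
    anchor, events = parse_console(sys.stdin)
    for event in events:
        offset = seconds_into_test(event, anchor) if anchor else float("nan")
        print(f"{offset:8.1f}s  {event['kind']:14s} {event}")
```

Fed the console log above on standard input, this lists the reconnects, the bulk read error, the eviction and the two FAIL markers with their offsets from the banner. By the uptime stamps, the eviction is logged about 190 seconds after the banner marker and about 2 seconds before the FAIL markers.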
Other test failures at
https://testing.whamcloud.com/test_sets/e4e45b80-705e-11e9-bd0e-52540065bddc
Issue Links
- is related to LU-15781 Ubuntu 22.04 LTS release support (Resolved)
- is related to LU-15300 mirror resync can cause EIO to unrelated applications (Resolved)