Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
Lustre 2.12.5
-
None
-
Client running 2.12.5
server running 2.12.4/5
-
2
-
9223372036854775807
Description
We are seeing this error with hdf5 files that results in a job crash due to I/O error.
Wed Jan 27 10:51:01 2021 C r901i5n2 [1611773461.647472] LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: refresh file layout [0x2400498d9:0x1642:0x0] error -5. Wed Jan 27 10:51:01 2021 M r901i5n2 kernel: [1611773461.663472] LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: refresh file layout [0x2400498d9:0x1642:0x0] error -5.
Here is the only thing relevant that I found in debug file. I have enable additional debugging hoping to gather additional info.
0000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST0024-osc-ffff8f3f503c4000 connect to target with instance 4 00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST0024-osc-ffff8f3f503c4000 Resetting ns_connect_flags to server flags 0xa0425af2e3440478 00000080 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target nbp11-OST0024_UUID 00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f57200e7000 nbp11-OST0024_UUID changing import state from CONNECTING to FULL 00000080 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f4de56df380) req@ffff8f533af8edc0 x1683453707600256/t0(0) o101->nbp11-OST0024-osc-ffff8f3f503c4000@10.151.26.168@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST0015-osc-ffff8f3f503c4000 connect to target with instance 5 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST0015-osc-ffff8f3f503c4000 Resetting ns_connect_flags to server flags 0xa0425af2e3440478 00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target nbp11-OST0015_UUID 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f4faa569000 nbp11-OST0015_UUID changing import state from CONNECTING to FULL 00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f3bbd4ef380) req@ffff8f3f3491edc0 x1683453707789248/t0(0) o101->nbp11-OST0015-osc-ffff8f3f503c4000@10.151.26.167@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST001a-osc-ffff8f3f503c4000 connect to target with instance 8 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST001a-osc-ffff8f3f503c4000 Resetting ns_connect_flags to server flags 0xa0425af2e3440478 00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target nbp11-OST001a_UUID 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f5764398800 nbp11-OST001a_UUID changing import state from CONNECTING to FULL 00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f3bbd4efb80) req@ffff8f36219ca000 x1683453707789888/t0(0) o101->nbp11-OST001a-osc-ffff8f3f503c4000@10.151.26.171@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1 00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST000a-osc-ffff8f3f503c4000 connect to target with instance 4 00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST000a-osc-ffff8f3f503c4000 Resetting ns_connect_flags to server flags 0xa0425af2e3440478 00000080 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target nbp11-OST000a_UUID 00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f53b9dde800 nbp11-OST000a_UUID changing import state from CONNECTING to FULL 00000080 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 72/1704 00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f4de56df700) req@ffff8f57a8e38dc0 x1683453707790848/t0(0) o101->nbp11-OST000a-osc-ffff8f3f503c4000@10.151.26.169@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1 00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp16-OST0005-osc-ffff8f3d83ba4000 connect to target with instance 4 00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp16-OST0005-osc-ffff8f3d83ba4000 Resetting ns_connect_flags to server flags 0xa0425af2e3440478 00000080 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 144/600 00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target nbp16-OST0005_UUID 00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f57bb391800 nbp16-OST0005_UUID changing import state from CONNECTING to FULL 00000080 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize 144/600 00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f3f99b8cc00) req@ffff8f3c5228b240 x1683453707981184/t0(0) o101->nbp16-OST0005-osc-ffff8f3d83ba4000@10.151.26.195@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1 00000100 00080000 3.0 Wed Jan 27 11:05:33 PST 2021 0 4789 0 (import.c 1753 ptlrpc_disconnect_idle_interpret()) @@@ inflight=1, refcount=5 rc = 0 req@ffff8f3fb2bb16c0 x1683453708172736/t0(0) o9->nbp11-OST0030-osc-ffff8f3f503c4000@10.151.26.168@o2ib 28/4 lens 224/192 e 0 to 0 d l 1611774613 ref 1 fl Interpret RN/0/0 rc 0/0 00000100 00080000 3.0 Wed Jan 27 11:05:33 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f5764398000 nbp11-OST0030_UUID changing import state from CONNECTING to IDLE 00000100 00080000 18.0 Wed Jan 27 11:06:00 PST 2021 0 4789 0 (import.c 1753 ptlrpc_disconnect_idle_interpret()) @@@ inflight=1, refcount=5 rc = 0 req@ffff8f29b0455240 x1683453708177472/t0(0) o9->nbp11-OST0031-osc-ffff8f3f503c4000@10.151.26.170@o2ib 28/4 lens 224/192 e 0 to 0 dl 1611774640 ref 1 fl Interpret RN/0/0 rc 0/0 00000100 00080000 18.0 Wed Jan 27 11:06:00 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f571ce55000 nbp11-OST0031_UUID changing import state from CONNECTING to IDLE 00000002 00080000 45.0 Wed Jan 27 11:07:35 PST 2021 0 39214 0 (mdc_request.c 903 mdc_close()) @@@ matched open req@ffff8f348f370040 x1683453708196032/t41664090920(41664090920) o101->nbp11-MDT0002-mdc-ffff8f3f503c4000@10.151.26.171@o2ib 12/10 lens 976/1008 e 0 to 0 dl 1611775078 ref 1 fl Complete RP/4/ffffffff rc 0/-1 00000002 00080000 45.0 Wed Jan 27 11:07:35 PST 2021 0 39214 0 (mdc_request.c 903 mdc_close()) @@@ matched open req@ffff8f348f3716c0 x1683453708196416/t41664090924(41664090924) o101->nbp11-MDT0002-mdc-ffff8f3f503c4000@10.151.26.171@o2ib 12/10 lens 1080/1080 e 0 to 0 dl 161177507 8 ref 2 fl Complete RP/4/0 rc 0/0
I did see this error from MDT to OST around the same time. So this could be a network issue.
Jan 27 10:59:06 nbp11-srv1 kernel: [21636687.904930] Lustre: 24143:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1611773322/real 1611773322] req@ffff9e2aca8f6300 x1669611070093696/t0(0) o6->nbp11-OST0035-osc-MDT0000@10.151.26.172@o2ib:28/4 lens 544/432 e 1 to 1 dl 1611773946 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Jan 27 10:59:06 nbp11-srv1 kernel: [21636687.991757] Lustre: nbp11-OST0035-osc-MDT0000: Connection to nbp11-OST0035 (at 10.151.26.172@o2ib) was lost; in progress operations using this service will wait for recovery to complete Jan 27 10:59:06 nbp11-srv6 kernel: [21636637.466674] Lustre: nbp11-OST0035: Client nbp11-MDT0000-mdtlov_UUID (at 10.151.26.167@o2ib) reconnecting
Attachments
Issue Links
- duplicates
-
LU-14644 IOR SSF PFL ill-formed I/O job aborted with EIO during automated FOFB testing
- Resolved