Details
Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.12.0
Labels: None
Environment: Clients: 2.12.0; Servers (Oak): 2.10.5
Severity: 3
Description
Since we upgraded our clients to 2.12.0, our users have been reporting more I/O errors on Oak (2.10 servers). These seem to be related to the following LustreError messages:
Example:
Feb 18 19:25:59 sh-106-64.int kernel: LustreError: 397481:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c67bf646000 x1624748797937520/t0(0) o101->oak-OST005f-osc-ffff8c809690880
NAMD job failing with I/O error:
Info: Working in the current directory /oak/....../aqueous2opls/run2 ... FATAL ERROR: Error on write to binary file step6.6_equilibration.restart.vel: Input/output error
Timestamp of NAMD file: Feb 18 19:25 cluster6.out
The Lustre client logs many of these error messages, against different OSTs. Below are all the Oak-related log entries from one client (sh-106-64) that has generated I/O errors:
Feb 06 11:23:22 sh-106-64.int kernel: Lustre: Mounted oak-client
Feb 07 12:49:28 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549572562/real 1549572562] req@ffff8c6591f3e300 x16247483-
Feb 10 08:43:24 sh-106-64.int kernel: LustreError: 1287:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c66aaa35400 x1624748322417600/t0(0) o101->oak-OST006d-osc-ffff8c8096908800@
Feb 13 10:55:06 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550084100/real 1550084100] req@ffff8c7204ebe900 x16247484-
Feb 15 07:46:03 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550245557/real 1550245557] req@ffff8c7fc423a400 x16247484-
Feb 15 09:17:12 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550251026/real 1550251026] req@ffff8c7fc423b600 x16247484-
Feb 15 09:21:23 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550251277/real 1550251277] req@ffff8c7fc4238f00 x16247484-
Feb 15 10:09:28 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550254162/real 1550254162] req@ffff8c7fc423b000 x16247484-
Feb 15 10:22:01 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550254915/real 1550254915] req@ffff8c7fc423bc00 x16247484-
Feb 15 13:28:05 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266079/real 1550266079] req@ffff8c7fc4238c00 x16247484-
Feb 15 13:31:26 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266280/real 1550266280] req@ffff8c7fc423ce00 x16247484-
Feb 15 13:38:07 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266681/real 1550266681] req@ffff8c7fc423e300 x16247484-
Feb 15 13:44:24 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550267058/real 1550267058] req@ffff8c7fc423ad00 x16247484-
Feb 16 18:12:47 sh-106-64.int kernel: LustreError: 11-0: oak-OST0071-osc-ffff8c8096908800: operation ost_connect to node 10.0.2.109@o2ib5 failed: rc = -19
Feb 17 00:40:54 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550392848/real 1550392848] req@ffff8c8020528300 x16247484-
Feb 18 19:25:59 sh-106-64.int kernel: LustreError: 397481:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c67bf646000 x1624748797937520/t0(0) o101->oak-OST005f-osc-ffff8c809690880
Feb 19 10:55:50 sh-106-64.int kernel: LustreError: 39073:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c77a92f2400 x1624748825028640/t0(0) o101->oak-OST004f-osc-ffff8c8096908800
Feb 19 11:46:27 sh-106-64.int kernel: LustreError: 39926:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c8041fada00 x1624748825812992/t0(0) o101->oak-OST0033-osc-ffff8c8096908800
Feb 19 14:14:03 sh-106-64.int kernel: LustreError: 48889:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c69c2e1bc00 x1624748827483728/t0(0) o101->oak-OST0066-osc-ffff8c8096908800
Feb 19 14:36:38 sh-106-64.int kernel: LustreError: 56018:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff8c67d265e900 x1624748827749392/t0(0) o101->oak-OST0071-osc-ffff8c8096908800
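To check whether the "req wrong generation" errors cluster on particular OSTs, the log above can be tallied per target. The following is only a rough sketch: the function name `count_wrong_generation` and the regular expression are illustrative assumptions based on the message layout seen in this excerpt, not anything provided by Lustre, and the embedded sample is two entries copied from the log above.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   ... req wrong generation: req@<addr> x<xid>/t<transno>(<n>) o<opcode>-><fsname>-OST<index>-osc-...
# \s+ between fields lets an entry that was wrapped across lines still match.
WRONG_GEN = re.compile(
    r"req wrong generation:\s+req@[0-9a-f]+\s+x\d+/t\d+\(\d+\)\s+o\d+->(\w+-OST[0-9a-f]{4})"
)


def count_wrong_generation(log_text: str) -> Counter:
    """Count 'req wrong generation' errors per OST target name."""
    return Counter(match.group(1) for match in WRONG_GEN.finditer(log_text))


# Two entries copied from the client log in this ticket.
sample_log = (
    "Feb 19 14:14:03 sh-106-64.int kernel: LustreError: 48889:0:"
    "(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: "
    "req@ffff8c69c2e1bc00 x1624748827483728/t0(0) "
    "o101->oak-OST0066-osc-ffff8c8096908800\n"
    "Feb 19 14:36:38 sh-106-64.int kernel: LustreError: 56018:0:"
    "(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: "
    "req@ffff8c67d265e900 x1624748827749392/t0(0) "
    "o101->oak-OST0071-osc-ffff8c8096908800\n"
)

for target, hits in sorted(count_wrong_generation(sample_log).items()):
    print(f"{target}: {hits}")
```

Run against the full excerpt above, this kind of tally shows the error on six different OSTs (006d, 005f, 004f, 0033, 0066, 0071), once each, so the problem does not appear tied to a single target.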
Any idea how to troubleshoot this issue? Could this be a 2.10/2.12 compatibility issue?
Thanks,
Stephane