Lustre / LU-11976

req wrong generation leading to I/O errors on 2.12 clients


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.12.0
    • None
    • Clients: 2.12.0; Servers (Oak): 2.10.5
    • 3

    Description

      Since we upgraded our clients to 2.12.0, our users have been reporting more I/O errors on Oak (2.10 servers). These errors seem to be related to the following LustreError messages:

      Example:

      Feb 18 19:25:59 sh-106-64.int kernel: LustreError: 397481:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c67bf646000 x1624748797937520/t0(0) o101->oak-OST005f-osc-ffff8c809690880
      

      NAMD job failing with I/O error:

      Info: Working in the current directory /oak/....../aqueous2opls/run2
      ...
      FATAL ERROR: Error on write to binary file step6.6_equilibration.restart.vel: Input/output error
      

      Timestamp of NAMD file: Feb 18 19:25 cluster6.out

      The Lustre client shows many of these error messages, on different OSTs. These are all the Oak-related log entries from one client (sh-106-64) that has generated I/O errors:

      Feb 06 11:23:22 sh-106-64.int kernel: Lustre: Mounted oak-client
      Feb 07 12:49:28 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549572562/real 1549572562]  req@ffff8c6591f3e300 x16247483-
      Feb 10 08:43:24 sh-106-64.int kernel: LustreError: 1287:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c66aaa35400 x1624748322417600/t0(0) o101->oak-OST006d-osc-ffff8c8096908800@
      Feb 13 10:55:06 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550084100/real 1550084100]  req@ffff8c7204ebe900 x16247484-
      Feb 15 07:46:03 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550245557/real 1550245557]  req@ffff8c7fc423a400 x16247484-
      Feb 15 09:17:12 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550251026/real 1550251026]  req@ffff8c7fc423b600 x16247484-
      Feb 15 09:21:23 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550251277/real 1550251277]  req@ffff8c7fc4238f00 x16247484-
      Feb 15 10:09:28 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550254162/real 1550254162]  req@ffff8c7fc423b000 x16247484-
      Feb 15 10:22:01 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550254915/real 1550254915]  req@ffff8c7fc423bc00 x16247484-
      Feb 15 13:28:05 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266079/real 1550266079]  req@ffff8c7fc4238c00 x16247484-
      Feb 15 13:31:26 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266280/real 1550266280]  req@ffff8c7fc423ce00 x16247484-
      Feb 15 13:38:07 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266681/real 1550266681]  req@ffff8c7fc423e300 x16247484-
      Feb 15 13:44:24 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550267058/real 1550267058]  req@ffff8c7fc423ad00 x16247484-
      Feb 16 18:12:47 sh-106-64.int kernel: LustreError: 11-0: oak-OST0071-osc-ffff8c8096908800: operation ost_connect to node 10.0.2.109@o2ib5 failed: rc = -19
      Feb 17 00:40:54 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550392848/real 1550392848]  req@ffff8c8020528300 x16247484-
      Feb 18 19:25:59 sh-106-64.int kernel: LustreError: 397481:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c67bf646000 x1624748797937520/t0(0) o101->oak-OST005f-osc-ffff8c809690880
      Feb 19 10:55:50 sh-106-64.int kernel: LustreError: 39073:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c77a92f2400 x1624748825028640/t0(0) o101->oak-OST004f-osc-ffff8c8096908800
      Feb 19 11:46:27 sh-106-64.int kernel: LustreError: 39926:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c8041fada00 x1624748825812992/t0(0) o101->oak-OST0033-osc-ffff8c8096908800
      Feb 19 14:14:03 sh-106-64.int kernel: LustreError: 48889:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c69c2e1bc00 x1624748827483728/t0(0) o101->oak-OST0066-osc-ffff8c8096908800
      Feb 19 14:36:38 sh-106-64.int kernel: LustreError: 56018:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c67d265e900 x1624748827749392/t0(0) o101->oak-OST0071-osc-ffff8c8096908800
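
      To see which OSTs are affected and how often, a log excerpt like the one above can be summarized with a short script. This is only a minimal sketch: it assumes the kernel log has been exported to plain text lines in the format shown here, and the names `summarize`, `FUNC_RE` and `OST_RE` are illustrative helpers, not part of Lustre or any Lustre tool.

```python
import re
from collections import Counter

# Lustre function name inside the "(file.c:line:func())" part of a message.
FUNC_RE = re.compile(r"\([\w.]+:\d+:(\w+)\(\)\)")
# OST target name such as "oak-OST005f" (four hex digits after "OST").
OST_RE = re.compile(r"\b([\w-]+-OST[0-9a-fA-F]{4})\b")


def summarize(lines):
    """Count messages per Lustre function and 'req wrong generation' errors per OST."""
    by_func = Counter()
    wrong_gen_by_ost = Counter()
    for line in lines:
        func = FUNC_RE.search(line)
        if func:
            by_func[func.group(1)] += 1
        if "req wrong generation" in line:
            ost = OST_RE.search(line)
            if ost:
                wrong_gen_by_ost[ost.group(1)] += 1
    return by_func, wrong_gen_by_ost


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_lustre.py < client.log
    funcs, osts = summarize(sys.stdin)
    for name, count in funcs.most_common():
        print(f"{count:6d}  {name}")
    for name, count in osts.most_common():
        print(f"{count:6d}  wrong generation on {name}")
```

      Grouping by OST makes it easier to tell whether the errors cluster on a few targets or are spread across the whole filesystem, as the excerpt above suggests.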
      

      Any idea how to troubleshoot this issue? Could this be a 2.10/2.12 compatibility issue?
      Thanks,
      Stephane

            People

              bzzz Alex Zhuravlev
              sthiell Stephane Thiell