Lustre / LU-14372

LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: refresh file layout [0x2400498d9:0x1642:0x0] error -5


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.12.5
    • None
    • Client running 2.12.5
      server running 2.12.4/5
    • 2

    Description

      We are seeing this error with HDF5 files. It causes the job to crash with an I/O error.

       Wed Jan 27 10:51:01 2021 C r901i5n2 [1611773461.647472] LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: refresh file layout [0x2400498d9:0x1642:0x0] error -5.
      Wed Jan 27 10:51:01 2021 M r901i5n2 kernel: [1611773461.663472] LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: refresh file layout [0x2400498d9:0x1642:0x0] error -5.
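
      For triage, the fields of the console message above can be pulled apart with a short script. This is only a sketch: the regular expression and the function name parse_layout_error are hypothetical helpers written for this one message format, not part of Lustre. It splits the FID into its sequence, object ID, and version parts and maps the return code to an errno name.

```python
import errno
import re

# Console message copied from the report above.
LINE = ("LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: "
        "refresh file layout [0x2400498d9:0x1642:0x0] error -5")

# Hypothetical pattern, written only for this message format.
PATTERN = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<fsname>\w+): refresh file layout "
    r"\[(?P<seq>0x[0-9a-f]+):(?P<oid>0x[0-9a-f]+):(?P<ver>0x[0-9a-f]+)\]\s+"
    r"error (?P<rc>-?\d+)"
)


def parse_layout_error(line):
    """Return the parsed fields of a 'refresh file layout' error, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "source": "{}:{}".format(m.group("file"), m.group("line")),
        "function": m.group("func"),
        "fsname": m.group("fsname"),
        "fid": "[{}:{}:{}]".format(m.group("seq"), m.group("oid"), m.group("ver")),
        "rc": rc,
        # Kernel return codes are negated errno values.
        "errno_name": errno.errorcode.get(-rc, "unknown"),
    }


if __name__ == "__main__":
    info = parse_layout_error(LINE)
    for key, value in info.items():
        print("{:>10}: {}".format(key, value))
```

      The return code -5 maps to EIO, which matches the I/O error the job reports. Assuming the file system is mounted on a client, the FID could then be passed to lfs fid2path to find the affected file.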
      

      Here is the only relevant thing I found in the debug file. I have enabled additional debugging in the hope of gathering more information.

      00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST0024-osc-ffff8f3f503c4000  connect to target with instance 4
      00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST0024-osc-ffff8f3f503c4000  Resetting ns_connect_flags to server flags  0xa0425af2e3440478
      00000080 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target  nbp11-OST0024_UUID
      00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f57200e7000 nbp11-OST0024_UUID  changing import state from CONNECTING to FULL
      00000080 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:32:40 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f4de56df380)   req@ffff8f533af8edc0 x1683453707600256/t0(0) o101->nbp11-OST0024-osc-ffff8f3f503c4000@10.151.26.168@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST0015-osc-ffff8f3f503c4000  connect to target with instance 5
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST0015-osc-ffff8f3f503c4000  Resetting ns_connect_flags to server flags  0xa0425af2e3440478
      00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target  nbp11-OST0015_UUID
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f4faa569000 nbp11-OST0015_UUID  changing import state from CONNECTING to FULL
      00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f3bbd4ef380)   req@ffff8f3f3491edc0 x1683453707789248/t0(0) o101->nbp11-OST0015-osc-ffff8f3f503c4000@10.151.26.167@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST001a-osc-ffff8f3f503c4000  connect to target with instance 8
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST001a-osc-ffff8f3f503c4000  Resetting ns_connect_flags to server flags  0xa0425af2e3440478
      00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target  nbp11-OST001a_UUID
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f5764398800 nbp11-OST001a_UUID  changing import state from CONNECTING to FULL
      00000080 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:41:58 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f3bbd4efb80)   req@ffff8f36219ca000 x1683453707789888/t0(0) o101->nbp11-OST001a-osc-ffff8f3f503c4000@10.151.26.171@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1
      00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp11-OST000a-osc-ffff8f3f503c4000  connect to target with instance 4
      00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp11-OST000a-osc-ffff8f3f503c4000  Resetting ns_connect_flags to server flags  0xa0425af2e3440478
      00000080 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target  nbp11-OST000a_UUID
      00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f53b9dde800 nbp11-OST000a_UUID  changing import state from CONNECTING to FULL
      00000080 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  72/1704
      00000100 00080000 1.0 Wed Jan 27 10:41:59 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f4de56df700)   req@ffff8f57a8e38dc0 x1683453707790848/t0(0) o101->nbp11-OST000a-osc-ffff8f3f503c4000@10.151.26.169@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1
      00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 1078 ptlrpc_connect_interpret()) nbp16-OST0005-osc-ffff8f3d83ba4000  connect to target with instance 4
      00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 939 ptlrpc_connect_set_flags()) nbp16-OST0005-osc-ffff8f3d83ba4000  Resetting ns_connect_flags to server flags  0xa0425af2e3440478
      00000080 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  144/600
      00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 1162 ptlrpc_connect_interpret()) connected to replayable target  nbp16-OST0005_UUID
      00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f57bb391800 nbp16-OST0005_UUID  changing import state from CONNECTING to FULL
      00000080 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (lcommon_misc.c 70 cl_init_ea_size()) updating def/max_easize  144/600
      00000100 00080000 1.0 Wed Jan 27 10:51:01 PST 2021 0 4789 0 (recover.c 223 ptlrpc_wake_delayed()) @@@ waking (set ffff8f3f99b8cc00)   req@ffff8f3c5228b240 x1683453707981184/t0(0) o101->nbp16-OST0005-osc-ffff8f3d83ba4000@10.151.26.195@o2ib 28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc W/0/ffffffff rc 0/-1
      00000100 00080000 3.0 Wed Jan 27 11:05:33 PST 2021 0 4789 0 (import.c 1753 ptlrpc_disconnect_idle_interpret()) @@@ inflight=1, refcount=5  rc = 0   req@ffff8f3fb2bb16c0 x1683453708172736/t0(0) o9->nbp11-OST0030-osc-ffff8f3f503c4000@10.151.26.168@o2ib 28/4 lens 224/192 e 0 to 0 dl 1611774613 ref 1 fl Interpret RN/0/0 rc 0/0
      00000100 00080000 3.0 Wed Jan 27 11:05:33 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f5764398000 nbp11-OST0030_UUID  changing import state from CONNECTING to IDLE
      00000100 00080000 18.0 Wed Jan 27 11:06:00 PST 2021 0 4789 0 (import.c 1753 ptlrpc_disconnect_idle_interpret()) @@@ inflight=1, refcount=5  rc = 0   req@ffff8f29b0455240 x1683453708177472/t0(0) o9->nbp11-OST0031-osc-ffff8f3f503c4000@10.151.26.170@o2ib 28/4 lens 224/192 e 0 to 0 dl 1611774640 ref 1 fl Interpret RN/0/0 rc 0/0
      00000100 00080000 18.0 Wed Jan 27 11:06:00 PST 2021 0 4789 0 (import.c 86 import_set_state_nolock()) ffff8f571ce55000 nbp11-OST0031_UUID  changing import state from CONNECTING to IDLE
      00000002 00080000 45.0 Wed Jan 27 11:07:35 PST 2021 0 39214 0 (mdc_request.c 903 mdc_close()) @@@ matched open  req@ffff8f348f370040 x1683453708196032/t41664090920(41664090920) o101->nbp11-MDT0002-mdc-ffff8f3f503c4000@10.151.26.171@o2ib 12/10 lens 976/1008 e 0 to 0 dl 1611775078 ref 1 fl Complete RP/4/ffffffff rc 0/-1
      00000002 00080000 45.0 Wed Jan 27 11:07:35 PST 2021 0 39214 0 (mdc_request.c 903 mdc_close()) @@@ matched open  req@ffff8f348f3716c0 x1683453708196416/t41664090924(41664090924) o101->nbp11-MDT0002-mdc-ffff8f3f503c4000@10.151.26.171@o2ib 12/10 lens 1080/1080 e 0 to 0 dl 1611775078 ref 2 fl Complete RP/4/0 rc 0/0
      

      I did see the following errors between the MDT and an OST around the same time, so this could be a network issue.

      Jan 27 10:59:06 nbp11-srv1 kernel: [21636687.904930] Lustre: 24143:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1611773322/real 1611773322]  req@ffff9e2aca8f6300 x1669611070093696/t0(0) o6->nbp11-OST0035-osc-MDT0000@10.151.26.172@o2ib:28/4 lens 544/432 e 1 to 1 dl 1611773946 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Jan 27 10:59:06 nbp11-srv1 kernel: [21636687.991757] Lustre: nbp11-OST0035-osc-MDT0000: Connection to nbp11-OST0035 (at 10.151.26.172@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Jan 27 10:59:06 nbp11-srv6 kernel: [21636637.466674] Lustre: nbp11-OST0035: Client nbp11-MDT0000-mdtlov_UUID (at 10.151.26.167@o2ib) reconnecting
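
      To check whether the server-side timeout overlaps the client error, the epoch values in the two logs can be converted to local time. The sketch below assumes that the "sent" and "dl" fields and the bracketed client kernel timestamp are Unix epoch seconds, and it uses a fixed UTC-8 offset for PST. Both are assumptions, though they agree with the wall-clock times printed in the logs.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for PST; an assumption based on the "PST" tag in the debug log.
PST = timezone(timedelta(hours=-8), "PST")

# Values copied from the logs above.
CLIENT_ERROR = 1611773461  # client kernel timestamp of the layout refresh error
SENT = 1611773322          # "sent" field of the timed-out MDT-to-OST request
DEADLINE = 1611773946      # "dl" field of the same request


def to_pst(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware PST datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=PST)


def in_window(t, start, end):
    """True if t falls inside the closed interval [start, end]."""
    return start <= t <= end


if __name__ == "__main__":
    for label, ts in (("request sent", SENT),
                      ("client error", CLIENT_ERROR),
                      ("request deadline", DEADLINE)):
        print("{:>16}: {}".format(label, to_pst(ts).strftime("%Y-%m-%d %H:%M:%S %Z")))
    print("request outstanding for", DEADLINE - SENT, "seconds")
    print("client error inside window:", in_window(CLIENT_ERROR, SENT, DEADLINE))
```

      Under those assumptions, the request was sent at 10:48:42 and expired at 10:59:06, so the client error at 10:51:01 falls inside the window in which the MDT was waiting on nbp11-OST0035. That is consistent with a connectivity problem but does not by itself prove one.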
      
      

            People

              jhammond John Hammond
              mhanafi Mahmoud Hanafi
