Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12856

LustreError: 82937:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 0(270336)

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.14.0, Lustre 2.12.4
    • Lustre 2.12.2
    • None
    • L2.12.2 server & L2.10.8 Server
      L2.12.2 client & L2.11 Client
    • 2
    • 9223372036854775807

    Description

      We are seeing

       Oct 14 07:51:06 nbp7-oss7 kernel: [1110415.124675] LNet: 72766:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.151.54.10@o2ib
      Oct 14 07:53:28 nbp7-oss7 kernel: [1110557.738283] LustreError: 48242:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 0(270336)  req@ffff8bd7711d9850 x1647322708401808/t0(0) o3->c1bb3af1-465f-4e4e-3b45-831f7f0aa442@10.151.53.134@o2ib:520/0 lens 488/440 e 0 to 0 dl 1571064920 ref 1 fl Interpret:/2/0 rc 0/0
      Oct 14 07:53:28 nbp7-oss7 kernel: [1110557.820414] Lustre: nbp7-OST0009: Bulk IO read error with c1bb3af1-465f-4e4e-3b45-831f7f0aa442 (at 10.151.53.134@o2ib), client will retry: rc -110
      Oct 14 07:58:40 nbp7-oss7 kernel: [1110869.753867] LNet: 72764:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.151.54.10@o2ib
      Oct 14 08:00:58 nbp7-oss7 kernel: [1111007.747557] LustreError: 7338:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 0(270336)  req@ffff8bd4ef6c3050 x1647322708401808/t0(0) o3->c1bb3af1-465f-4e4e-3b45-831f7f0aa442@10.151.53.134@o2ib:220/0 lens 488/440 e 0 to 0 dl 1571065375 ref 1 fl Interpret:/2/0 rc 0/0
      Oct 14 08:00:58 nbp7-oss7 kernel: [1111007.829410] Lustre: nbp7-OST0009: Bulk IO read error with c1bb3af1-465f-4e4e-3b45-831f7f0aa442 (at 10.151.53.134@o2ib), client will retry: rc -110
      Oct 14 08:02:39 nbp7-oss7 kernel: [1111108.765855] Lustre: nbp7-OST0003: Client c1bb3af1-465f-4e4e-3b45-831f7f0aa442 (at 10.151.53.134@o2ib) reconnecting
      Oct 14 08:02:39 nbp7-oss7 kernel: [1111108.800504] Lustre: Skipped 5 previous similar messages
      Oct 14 08:02:39 nbp7-oss7 kernel: [1111108.818286] Lustre: nbp7-OST0003: Connection restored to 2ee3c4e1-9fd7-3338-5bf8-d1f02bcd8a20 (at 10.151.53.134@o2ib)
      Oct 14 08:02:39 nbp7-oss7 kernel: [1111108.818288] Lustre: Skipped 5 previous similar messages
      Oct 14 08:09:47 nbp7-oss7 kernel: [1111536.849491] LNet: 72766:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.151.54.10@o2ib
      Oct 14 08:11:48 nbp7-oss7 kernel: [1111657.759009] LustreError: 82937:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 0(270336)  req@ffff8bd724b11850 x1647322708401808/t0(0) o3->c1bb3af1-465f-4e4e-3b45-831f7f0aa442@10.151.53.134@o2ib:132/0 lens 488/440 e 0 to 0 dl 1571066042 ref 1 fl Interpret:/2/0 rc 0/0
      Oct 14 08:11:48 nbp7-oss7 kernel: [1111657.841189] Lustre: nbp7-OST0009: Bulk IO read error with c1bb3af1-465f-4e4e-3b45-831f7f0aa442 (at 10.151.53.134@o2ib), client will retry: rc -110
      Oct 14 08:12:57 nbp7-oss7 kernel: [1111726.231189] Lustre: nbp7-OST000f: Client c1bb3af1-465f-4e4e-3b45-831f7f0aa442 (at 10.151.53.134@o2ib) reconnecting
      Oct 14 08:12:57 nbp7-oss7 kernel: [1111726.265819] Lustre: Skipped 5 previous similar messages
      

      We have the 3 patch reported in LU-12385 and LU-12772 applied to both client and server.
      Client Configs

      r593i6n16 ~ # lnetctl global show
      global:
          numa_range: 0
          max_intf: 200
          discovery: 1
          drop_asym_route: 0
          retry_count: 0
          transaction_timeout: 100
          health_sensitivity: 0
          recovery_interval: 1
      ----
      options ko2iblnd require_privileged_port=0 use_privileged_port=0
      options ko2iblnd timeout=100 retry_count=7 peer_timeout=0 map_on_demand=32 peer_credits=32 concurrent_sends=32
      options lnet networks=o2ib(ib1)
      options lnet avoid_asym_router_failure=1 check_routers_before_use=1 small_router_buffers=65536 large_router_buffers=8192
      options lnet lnet_transaction_timeout=100
      options ptlrpc at_max=600 at_min=200
      

      Server Configs

      nbp7-oss7 ~ # lnetctl global show
      global:
          numa_range: 0
          max_intf: 200
          discovery: 1
          drop_asym_route: 0
          retry_count: 0
          transaction_timeout: 100
          health_sensitivity: 0
          recovery_interval: 1
      ----
      options ko2iblnd require_privileged_port=0 use_privileged_port=0
      options ko2iblnd ntx=251072 credits=125536 fmr_pool_size=62769 
      options ko2iblnd timeout=100 retry_count=7 peer_timeout=0 map_on_demand=32 peer_credits=32 concurrent_sends=32
      
      
      #lnet
      options lnet networks=o2ib(ib1) 
      options lnet routes="o2ib233 10.151.26.[80-94]@o2ib; o2ib313 10.151.25.[195-197,202-205,222]@o2ib 10.151.26.[60,127,140-144,150-154]@o2ib; o2ib417 10.151.26.[148,149]@o2ib 10.151.25.[167-170]@o2ib"
      options lnet dead_router_check_interval=60 live_router_check_interval=30
      options lnet avoid_asym_router_failure=1 check_routers_before_use=1 small_router_buffers=65536 large_router_buffers=8192options lnet lnet_transaction_timeout=100
      options ptlrpc at_max=600 at_min=200
      

      These bulk errors are a major issue. We though matching client and server to 2.12.2 would solve it but doesn't look like it.

      Attachments

        1. 0001-LU-12856-debug.patch
          2 kB
        2. 0002-LU-12856-revert-LU-9983.patch
          2 kB
        3. client.10.151.53.134.debug.gz
          805 kB
        4. nbp7-mds.2019-10-16.out.gz
          65.90 MB
        5. r407i2n14.out2.gz
          50.70 MB
        6. server.xid.1647322708401808.debug.gz
          78.16 MB

        Issue Links

          Activity

            People

              tappro Mikhail Pershin
              mhanafi Mahmoud Hanafi
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: