Lustre / LU-3714

Single-client data copy to/from lfs hangs the client; server_bulk_callback and client_bulk_callback errors


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Labels: None
    • Environment: CentOS 6.4, kernel 2.6.32-358.11.1.el6_lustre.x86_64, Intel TrueScale QDR InfiniBand, single rail, in-kernel InfiniBand stack.

    Description

      Fresh boot and mount of the Lustre 2.1.6 filesystem (lfs). The ldiskfs OSTs pre-date the upgrade; lfs was upgraded from 2.1.5. A single client mounts lfs over o2ib. Copying 2GB files to/from lfs causes the client to hang and to lose its connection to two OSS nodes.

      Datafile creation:

      cd /lustre2 ; tar cf ./test.tar /usr
      

      Simple copy test:

      for i in $(cat iter); do cp test.tar "test.tar.$i"; done
      

      After 40GB of data transfer (10 copies of the 2GB file, each read from and written back to lfs: 20GB read plus 20GB written), the client process hangs.
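A scripted form of the copy loop makes it easier to record exactly which iteration stalls. This is a minimal sketch, not the reporter's procedure: the `copy_test` function name, its arguments, and the 600-second default per-copy limit are hypothetical, and it assumes coreutils `timeout`. It logs a timestamp before each copy, so the stalled iteration is visible even if `cp` never returns (a process stuck in uninterruptible sleep may not be killable by `timeout`).

```shell
# Hypothetical helper (not part of the original report): repeat the cp test
# and stop at the first copy that fails or runs longer than a time limit.
# Usage: copy_test SRC COUNT [LIMIT_SECONDS]   (requires coreutils `timeout`)
copy_test() {
    src=$1
    count=$2
    limit=${3:-600}        # assumed per-copy limit in seconds
    i=1
    while [ "$i" -le "$count" ]; do
        dst="$src.$i"
        # Log the start first so the iteration is recorded even if cp never returns.
        echo "$(date '+%F %T') start copy $i -> $dst"
        t0=$(date +%s)
        if ! timeout "$limit" cp "$src" "$dst"; then
            echo "$(date '+%F %T') copy $i failed or exceeded ${limit}s" >&2
            return 1
        fi
        echo "$(date '+%F %T') done copy $i in $(( $(date +%s) - t0 ))s"
        i=$((i + 1))
    done
    echo "all $count copies completed"
}
```

For example, from the Lustre mount: `cd /lustre2 && copy_test test.tar 10 600 2>&1 | tee copy_test.log`.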

      The MDS, OSS, and client logs show no IB LID or other hardware errors.

      Output from /var/log/messages
      MDS:

      Aug  6 13:27:38 lustrefs-sys-mds0 kernel: Lustre: 7848:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has failed due to network error: [sent 1375813658/real 1375813658]  req@ffff88047b2bc800 x1442643895649363/t0(0) o8->lustrefssys-OST000a-osc-MDT0000@10.148.0.154@o2ib:28/4 lens 368/512 e 0 to 1 dl 1375813713 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Aug  6 13:27:38 lustrefs-sys-mds0 kernel: Lustre: 7848:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 23 previous similar messages
      Aug  6 13:28:00 lustrefs-sys-mds0 kernel: Lustre: 7892:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from d053eba6-b0f0-eafb-4a55-cb86e1c046fb@10.148.0.154@o2ib t0 exp (null) cur 1375813680 last 0
      Aug  6 13:28:00 lustrefs-sys-mds0 kernel: Lustre: 7892:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 4 previous similar messages
      Aug  6 13:28:03 lustrefs-sys-mds0 kernel: Lustre: lustrefssys-OST000a-osc-MDT0000: Connection restored to lustrefssys-OST000a (at 10.148.0.154@o2ib)
      Aug  6 13:28:03 lustrefs-sys-mds0 kernel: Lustre: MDS mdd_obd-lustrefssys-MDT0000: lustrefssys-OST000a_UUID now active, resetting orphans
      Aug  6 13:28:03 lustrefs-sys-mds0 kernel: Lustre: Skipped 14 previous similar messages
      

      OSS10:

      Aug  6 13:28:00 lustrefs-sys-oss10 kernel: Lustre: 7777:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST000a: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib recovering/t0 exp (null) cur 1375813680 last 0
      Aug  6 13:28:00 lustrefs-sys-oss10 kernel: Lustre: lustrefssys-OST000a: Denying connection for new client 10.148.0.143@o2ib (at 4ece0c04-00b5-aedd-f612-11cbcc7fb566), waiting for 0 clients in recovery for 5:00
      Aug  6 13:28:00 lustrefs-sys-oss10 kernel: Lustre: MGC10.148.0.142@o2ib: Reactivating import
      Aug  6 13:28:00 lustrefs-sys-oss10 kernel: Lustre: 7777:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST000a: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib recovering/t0 exp (null) cur 1375813680 last 0
      Aug  6 13:28:00 lustrefs-sys-oss10 kernel: Lustre: lustrefssys-OST000a: Denying connection for new client 10.148.0.143@o2ib (at 4ece0c04-00b5-aedd-f612-11cbcc7fb566), waiting for 0 clients in recovery for 4:59
      Aug  6 13:28:03 lustrefs-sys-oss10 ntpd[7983]: Listening on interface #7 ib0, fe80::211:7500:77:dc5a#123 Enabled
      Aug  6 13:28:03 lustrefs-sys-oss10 kernel: Lustre: 7777:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST000a: connection from lustrefssys-MDT0000-mdtlov_UUID@10.148.0.142@o2ib recovering/t0 exp ffff880270301000 cur 1375813683 last 1375812940
      Aug  6 13:28:03 lustrefs-sys-oss10 kernel: Lustre: lustrefssys-OST000a: sending delayed replies to recovered clients
      Aug  6 13:28:03 lustrefs-sys-oss10 kernel: Lustre: lustrefssys-OST000a: received MDS connection from 10.148.0.142@o2ib
      Aug  6 13:28:09 lustrefs-sys-oss10 ntpd[7983]: synchronized to 198.122.144.26, stratum 2
      Aug  6 13:28:25 lustrefs-sys-oss10 kernel: Lustre: 7777:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST000a: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib t0 exp (null) cur 1375813705 last 0
      

      oss06:

      Aug  6 13:31:47 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8804795b6000
      Aug  6 13:31:47 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8804795b6000
      Aug  6 13:31:47 lustrefs-sys-oss06 kernel: LustreError: 8060:0:(ldlm_lib.c:2685:target_bulk_io()) @@@ network error on bulk GET 0(1048576)  req@ffff88026672a850 x1442644628631725/t0(0) o4->4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib:0/0 lens 456/416 e 1 to 0 dl 1375813926 ref 1 fl Interpret:/0/0 rc 0/0
      Aug  6 13:31:47 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Bulk IO write error with 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib), client will retry: rc -110
      Aug  6 13:32:06 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Client 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib) reconnecting
      Aug  6 13:32:06 lustrefs-sys-oss06 kernel: Lustre: 7930:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST0006: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib t9859661 exp ffff88025da7b000 cur 1375813926 last 1375813926
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8802591e8000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8802591e8000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 8056:0:(ldlm_lib.c:2685:target_bulk_io()) @@@ network error on bulk GET 0(1048576)  req@ffff880264a09800 x1442644628631910/t0(0) o4->4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib:0/0 lens 456/416 e 0 to 0 dl 1375813969 ref 1 fl Interpret:/2/0 rc 0/0
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8802592bc000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8802592bc000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8802592be000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8802592be000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Bulk IO write error with 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib), client will retry: rc -110
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8802592c4000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8802592c4000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff88045232c000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff88045232c000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff88045232e000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff88045232e000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880452330000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880452330000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880452332000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880452332000
      Aug  6 13:32:19 lustrefs-sys-oss06 kernel: LustreError: 8056:0:(ldlm_lib.c:2685:target_bulk_io()) Skipped 7 previous similar messages
      Aug  6 13:32:43 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Client 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib) reconnecting
      Aug  6 13:32:43 lustrefs-sys-oss06 kernel: Lustre: 7930:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST0006: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib t9859661 exp ffff88025da7b000 cur 1375813963 last 1375813963
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259354000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259354000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 8056:0:(ldlm_lib.c:2685:target_bulk_io()) @@@ network error on bulk GET 0(1048576)  req@ffff88025a3f6800 x1442644628631943/t0(0) o4->4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib:0/0 lens 456/416 e 0 to 0 dl 1375814006 ref 1 fl Interpret:/2/0 rc 0/0
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259356000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259356000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259366000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Bulk IO write error with 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib), client will retry: rc -110
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: Lustre: Skipped 7 previous similar messages
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259366000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8802592dc000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8802592dc000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8804523b2000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8804523b2000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff88047a6b0000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff88047a6b0000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259360000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259360000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259358000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259358000
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: LustreError: 8056:0:(ldlm_lib.c:2685:target_bulk_io()) Skipped 7 previous similar messages
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: Lustre: 2209:0:(o2iblnd_cb.c:2341:kiblnd_passive_connect()) Conn race 10.148.0.143@o2ib
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Client 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib) reconnecting
      Aug  6 13:32:56 lustrefs-sys-oss06 kernel: Lustre: 7930:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST0006: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib t9859661 exp ffff88025da7b000 cur 1375813976 last 1375813976
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259366000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259366000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259354000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 8053:0:(ldlm_lib.c:2685:target_bulk_io()) @@@ network error on bulk GET 0(1048576)  req@ffff880264abf400 x1442644628631952/t0(0) o4->4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib:0/0 lens 456/416 e 0 to 0 dl 1375814019 ref 1 fl Interpret:/2/0 rc 0/0
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Bulk IO write error with 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib), client will retry: rc -110
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: Lustre: Skipped 7 previous similar messages
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259354000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259358000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259358000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff880259208000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff880259208000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8804523b2000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8804523b2000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff88047a6b0000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff88047a6b0000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff88045236c000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff88045236c000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8804523e2000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8804523e2000
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: Lustre: 2208:0:(o2iblnd_cb.c:2341:kiblnd_passive_connect()) Conn race 10.148.0.143@o2ib
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: Lustre: lustrefssys-OST0006: Client 4ece0c04-00b5-aedd-f612-11cbcc7fb566 (at 10.148.0.143@o2ib) reconnecting
      Aug  6 13:33:09 lustrefs-sys-oss06 kernel: Lustre: 7930:0:(ldlm_lib.c:952:target_handle_connect()) lustrefssys-OST0006: connection from 4ece0c04-00b5-aedd-f612-11cbcc7fb566@10.148.0.143@o2ib t9859661 exp ffff88025da7b000 cur 1375813989 last 1375813989
      Aug  6 13:33:22 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 4, status -103, desc ffff8804523e2000
      Aug  6 13:33:22 lustrefs-sys-oss06 kernel: LustreError: 7738:0:(events.c:396:server_bulk_callback()) event type 2, status -103, desc ffff8804523e2000
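The oss06 excerpt is dominated by repeated `server_bulk_callback()` lines, so a tally by event type and status is easier to compare across nodes than the raw log. A minimal sketch follows; the `summarize_bulk_errors` name is hypothetical, and the patterns assume the message layout shown above. The errno names are the standard Linux values (-5 EIO, -16 EBUSY, -103 ECONNABORTED, -110 ETIMEDOUT); the LNet event type numbers are left as raw numbers because their symbolic names depend on the LNet version. Counts are lower bounds, since Lustre suppresses repeats ("Skipped N previous similar messages").

```shell
# Hypothetical helper (not part of the original report): tally the
# *_bulk_callback() events in a syslog file by LNet event type and status,
# plus "reconnecting" notices. Counts are lower bounds, because Lustre
# suppresses repeats ("Skipped N previous similar messages").
summarize_bulk_errors() {
    awk '
    BEGIN {
        # Standard Linux errno values seen in these logs
        name[-5] = "EIO"; name[-16] = "EBUSY"
        name[-103] = "ECONNABORTED"; name[-110] = "ETIMEDOUT"
    }
    /_bulk_callback/ && /event type/ {
        # Isolate "event type <T>, status <S>" and split it into words
        match($0, /event type [0-9]+, status -?[0-9]+/)
        split(substr($0, RSTART, RLENGTH), f, /[ ,]+/)
        count[f[3] " " f[5]]++
    }
    /reconnecting$/ { reconnects++ }
    END {
        for (k in count) {
            split(k, p, " ")
            rc = p[2] + 0
            printf "event type %s status %s (%s): %d\n", p[1], p[2], ((rc in name) ? name[rc] : "?"), count[k]
        }
        printf "reconnects: %d\n", reconnects
    }' "$1" | sort
}
```

Running it as `summarize_bulk_errors /var/log/messages` on oss06 and on the client gives directly comparable summaries of the server-side status -103 and client-side status -5 events.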
      

      lustre-client:

      Aug  6 13:31:41 lustrefs-sys-mds1 kernel: Lustre: 9547:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for sent delay: [sent 1375813894/real 0]  req@ffff8802cb514400 x1442644628631729/t0(0) o4->lustrefssys-OST0006-osc-ffff88027c008800@10.148.0.150@o2ib:6/4 lens 456/416 e 0 to 1 dl 1375813901 ref 3 fl Rpc:X/0/ffffffff rc 0/-1
      Aug  6 13:31:41 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:31:47 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:32:06 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:32:18 lustrefs-sys-mds1 kernel: Lustre: 9547:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1375813926/real 1375813926]  req@ffff8802cb514400 x1442644628631911/t0(0) o4->lustrefssys-OST0006-osc-ffff88027c008800@10.148.0.150@o2ib:6/4 lens 456/416 e 0 to 1 dl 1375813938 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
      Aug  6 13:32:18 lustrefs-sys-mds1 kernel: Lustre: 9547:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
      Aug  6 13:32:18 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:32:19 lustrefs-sys-mds1 kernel: LustreError: 9522:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:32:43 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9522:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:32:56 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9522:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:33:09 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9522:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:33:22 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: Lustre: 9547:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has failed due to network error: [sent 1375814002/real 1375814002]  req@ffff8802cb514400 x1442644628631991/t0(0) o4->lustrefssys-OST0006-osc-ffff88027c008800@10.148.0.150@o2ib:6/4 lens 456/416 e 0 to 1 dl 1375814019 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: Lustre: 9547:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 32 previous similar messages
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:33:35 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9522:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection to lustrefssys-OST0006 (at 10.148.0.150@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Aug  6 13:33:48 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c362000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9527:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c244000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d23c000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9526:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 9522:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:34:01 lustrefs-sys-mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.148.0.150@o2ib. The ost_connect operation failed with -16
      Aug  6 13:34:26 lustrefs-sys-mds1 kernel: Lustre: lustrefssys-OST0006-osc-ffff88027c008800: Connection restored to lustrefssys-OST0006 (at 10.148.0.150@o2ib)
      Aug  6 13:34:39 lustrefs-sys-mds1 kernel: LustreError: 9521:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d2ae000
      Aug  6 13:34:39 lustrefs-sys-mds1 kernel: LustreError: 9525:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff8802a1370000
      Aug  6 13:34:39 lustrefs-sys-mds1 kernel: LustreError: 9528:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3c6000
      Aug  6 13:34:39 lustrefs-sys-mds1 kernel: LustreError: 9523:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029d272000
      Aug  6 13:34:39 lustrefs-sys-mds1 kernel: LustreError: 9524:0:(events.c:203:client_bulk_callback()) event type 0, status -5, desc ffff88029c3cc000
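To see how often the client drops its connection to OST0006, the timestamps of the "Connection to ... was lost" messages can be differenced. This is a minimal sketch, assuming classic syslog timestamps as in the excerpt above and no midnight rollover; the `loss_intervals` name is hypothetical. On the client lines above it reports gaps of 37 s and 38 s between the first losses, after which the gap settles at 13 s.

```shell
# Hypothetical helper (not part of the original report): print the time
# between successive "Connection to ... was lost" messages in a syslog file.
# Assumes classic syslog timestamps ("Aug  6 13:32:56") and no midnight rollover.
loss_intervals() {
    awk '/Connection to .* was lost/ {
        split($3, t, ":")                       # $3 is HH:MM:SS
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        if (seen) printf "%s +%ds\n", $3, now - prev
        else      printf "%s first loss\n", $3
        prev = now; seen = 1
    }' "$1"
}
```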
      

      Client mount options: -o nochecksum -o flock

      Intel TrueScale IB module options: singleport=1 krcvqs=3 pcie_caps=0x51 rcvhdrcnt=4096
      Lustre module options: ko2iblnd map_on_demand=32
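For anyone reproducing the setup, the reported module options could be persisted as a modprobe.d fragment like the one below. This is a sketch, not the reporter's actual configuration: the file name is hypothetical, and `ib_qib` is assumed to be the TrueScale driver that takes the listed options.

```
# /etc/modprobe.d/lustre-test.conf  (hypothetical file name)
options ko2iblnd map_on_demand=32
# ib_qib is assumed to be the TrueScale driver these options belong to
options ib_qib singleport=1 krcvqs=3 pcie_caps=0x51 rcvhdrcnt=4096
```

The client mount would then be along the lines of `mount -t lustre -o nochecksum,flock 10.148.0.142@o2ib:/lustrefssys /lustre2`, where the MGS NID and fsname are inferred from the log lines above (MGC10.148.0.142@o2ib, lustrefssys-OST...) rather than stated in the report.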

      Filesystem description:
      Lustre 2.1.6 on servers and client. 18 OSSes, 1 OST per OSS. Intel TrueScale QDR InfiniBand, single rail.

      Note: the host mds1 is used as the client; it is not currently configured as an MDS.

          People

            Assignee: WC Triage
            Reporter: Jeff Johnson
            Votes: 0
            Watchers: 2
