Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.0
-
None
-
lustre-master/rhel6-x86_64/#114
client-5 is the client; fat-intel-2 is the OST
-
3
-
10519
Description
The system hangs when running replay-single test_70b with quota enabled.
----------client-5 syslog---------
Lustre: DEBUG MARKER: == replay-single test 70b: mds recovery; 2 clients == 18:43:25 (1305251005)
Lustre: 31242:0:(debug.c:320:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 31242:0:(debug.c:320:libcfs_debug_str2mask()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: Started rundbench load pid=31244 ...
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: test_70b fail mds1 1 times
Lustre: 22450:0:(import.c:529:import_select_connection()) lustre-MDT0000-mdc-ffff88022c570000: tried all connections, increasing latency to 21s
Lustre: 22450:0:(import.c:529:import_select_connection()) Skipped 15 previous similar messages
LustreError: 166-1: MGC192.168.4.128@o2ib: Connection to service MGS via nid 192.168.4.128@o2ib was lost; in progress operations using this service will fail.
LustreError: Skipped 4 previous similar messages
Lustre: 22449:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0xe7668c940984dcd4 to 0xe7668c9409873140
Lustre: MGC192.168.4.128@o2ib: Reactivating import
Lustre: Skipped 4 previous similar messages
Lustre: MGC192.168.4.128@o2ib: Connection restored to service MGS using nid 192.168.4.128@o2ib.
Lustre: Skipped 15 previous similar messages
LustreError: 22449:0:(client.c:2570:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff8801de49d400 x1368643764028074/t519691043276(519691043276) o-1->lustre-MDT0000_UUID@192.168.4.128@o2ib:12/10 lens 552/544 e 0 to 0 dl 1305251097 ref 2 fl Interpret:RP/ffffffff/ffffffff rc 301/-1
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: test_70b fail mds1 2 times
Lustre: 22449:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0xe7668c9409873140 to 0xe7668c940987f1ea
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: test_70b fail mds1 3 times
Lustre: 22449:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0xe7668c940987f1ea to 0xe7668c940988fe75
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: test_70b fail mds1 4 times
LustreError: 11-0: an error occurred while communicating with 192.168.4.128@o2ib. The obd_ping operation failed with -107
LustreError: Skipped 14 previous similar messages
Lustre: 22449:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0xe7668c940988fe75 to 0xe7668c940a3306d6
Lustre: lustre-MDT0000-mdc-ffff88022c570000: Connection to service lustre-MDT0000 via nid 192.168.4.128@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 6 previous similar messages
LustreError: 22449:0:(client.c:2570:ptlrpc_replay_interpret()) @@@ status 116, old was 0 req@ffff880309152800 x1368643764039222/t532575946369(532575946369) o-1>lustre-MDT0000_UUID@192.168.4.128@o2ib:23/10 lens 360/424 e 0 to 0 dl 1305251472 ref 2 fl Interpret:R/ffffffff/ffffffff rc -116/-1
LustreError: 22449:0:(client.c:2570:ptlrpc_replay_interpret()) Skipped 10 previous similar messages
Lustre: 22448:0:(client.c:1775:ptlrpc_expire_one_request()) @@@ Request x1368643764472774 sent from lustre-MDT0000-mdc-ffff88022c570000 to NID 192.168.4.128@o2ib has timed out for slow reply: [sent 1305251440] [real_sent 1305251440] [current 1305251467] [deadline 27s] [delay 0s] req@ffff8802ca0e5400 x1368643764472774/t0(0) o-1->lustre-MDT0000_UUID@192.168.4.128@o2ib:12/10 lens 192/192 e 0 to 1 dl 1305251467 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
Lustre: 22448:0:(client.c:1775:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
----------fat-intel-2(ost)syslog-------
Lustre: DEBUG MARKER: == replay-single test 70b: mds recovery; 2 clients == 18:43:25 (1305251005)
Lustre: DEBUG MARKER: Started rundbench load pid=31244 ...
Lustre: DEBUG MARKER: test_70b fail mds1 1 times
Lustre: 17517:0:(client.c:1775:ptlrpc_expire_one_request()) @@@ Request x1368582162267244 sent from MGC192.168.4.128@o2ib to NID 192.168.4.128@o2ib has timed out for slow reply: [sent 1305251017] [real_sent 1305251017] [current 1305251024] [deadline 7s] [delay 0s] req@ffff880555595c00 x1368582162267244/t0(0) o-1->MGS@MGC192.168.4.128@o2ib_0:26/25 lens 192/192 e 0 to 1 dl 1305251024 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
Lustre: 17517:0:(client.c:1775:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
LustreError: 166-1: MGC192.168.4.128@o2ib: Connection to service MGS via nid 192.168.4.128@o2ib was lost; in progress operations using this service will fail.
LustreError: Skipped 5 previous similar messages
Lustre: 1115:0:(ldlm_lib.c:800:target_handle_connect()) lustre-OST0000: received new MDS connection from NID 192.168.4.128@o2ib, removing former export from same NID
Lustre: 1115:0:(ldlm_lib.c:800:target_handle_connect()) Skipped 29 previous similar messages
Lustre: 1115:0:(ldlm_lib.c:871:target_handle_connect()) lustre-OST0000: connection from lustre-MDT0000-mdtlov_UUID@192.168.4.128@o2ib t0 exp (null) cur 1305251030 last 0
Lustre: 1115:0:(ldlm_lib.c:871:target_handle_connect()) Skipped 54 previous similar messages
Lustre: 1115:0:(filter.c:2806:filter_connect()) lustre-OST0000: Received MDS connection (0xc18765fcb6c04773); group 0
Lustre: 1115:0:(filter.c:2806:filter_connect()) Skipped 46 previous similar messages
Lustre: 1115:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import lustre-OST0000->NET_0x50000c0a80480_UUID netid 50000: select flavor null
Lustre: 1115:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 57 previous similar messages
Lustre: 17519:0:(import.c:529:import_select_connection()) MGC192.168.4.128@o2ib: tried all connections, increasing latency to 6s
Lustre: 17519:0:(import.c:529:import_select_connection()) Skipped 1 previous similar message
Lustre: 17518:0:(import.c:885:ptlrpc_connect_interpret()) MGS@MGC192.168.4.128@o2ib_0 changed server handle from 0xe7668c940984dce2 to 0xe7668c9409873178
Lustre: MGC192.168.4.128@o2ib: Reactivating import
Lustre: Skipped 4 previous similar messages
Lustre: MGC192.168.4.128@o2ib: Connection restored to service MGS using nid 192.168.4.128@o2ib.
Lustre: Skipped 4 previous similar messages
Lustre: lustre-OST0001: received MDS connection from 192.168.4.128@o2ib
Lustre: lustre-OST0000: received MDS connection from 192.168.4.128@o2ib
Lustre: Skipped 29 previous similar messages
Lustre: 1117:0:(lustre_log.h:471:llog_group_set_export()) lustre-OST0004: export for group 0 is changed: 0xffff880283c42400 -> 0xffff8806058d4000
Lustre: 1113:0:(llog_net.c:168:llog_receptor_accept()) changing the import ffff8803254cb800 - ffff88061ba07000
Lustre: 1117:0:(lustre_log.h:471:llog_group_set_export()) Skipped 46 previous similar messages
Lustre: 1113:0:(llog_net.c:168:llog_receptor_accept()) Skipped 58 previous similar messages
Lustre: 1125:0:(filter.c:2510:filter_llog_connect()) lustre-OST0003: Recovery from log 0x21b165/0x0:b36580c9
Lustre: 1125:0:(filter.c:2510:filter_llog_connect()) Skipped 31 previous similar messages
Lustre: Skipped 2 previous similar messages
Lustre: DEBUG MARKER: test_70b fail mds1 2 times
Lustre: 17518:0:(import.c:885:ptlrpc_connect_interpret()) MGS@MGC192.168.4.128@o2ib_0 changed server handle from 0xe7668c9409873178 to 0xe7668c940987f276
Lustre: DEBUG MARKER: test_70b fail mds1 3 times
Lustre: 17518:0:(import.c:885:ptlrpc_connect_interpret()) MGS@MGC192.168.4.128@o2ib_0 changed server handle from 0xe7668c940987f276 to 0xe7668c940988fe28
Lustre: lustre-OST0002: haven't heard from client lustre-MDT0000-mdtlov_UUID (at 192.168.4.128@o2ib) in 54 seconds. I think it's dead, and I am evicting it. exp ffff88062d9b5000, cur 1305251420 expire 1305251390 last 1305251366
Lustre: lustre-OST0005: haven't heard from client lustre-MDT0000-mdtlov_UUID (at 192.168.4.128@o2ib) in 54 seconds. I think it's dead, and I am evicting it. exp ffff88061b45bc00, cur 1305251420 expire 1305251390 last 1305251366
Lustre: DEBUG MARKER: test_70b fail mds1 4 times
Lustre: 17518:0:(import.c:885:ptlrpc_connect_interpret()) MGS@MGC192.168.4.128@o2ib_0 changed server handle from 0xe7668c940988fe28 to 0xe7668c940a33069e
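For triage, the eviction messages in the OST syslog above can be cross-checked against their own timestamps. The following is a minimal sketch, not part of the original report: the regular expression is written against the message wording exactly as it appears in this log, and the names `EVICT_RE`, `parse_evictions`, and `silent_for` are hypothetical helpers introduced here for illustration. It computes `cur - last` for each eviction so it can be compared with the "in N seconds" figure the message reports.

```python
import re

# Assumption: the eviction message has exactly the wording seen in the
# fat-intel-2 syslog above; other Lustre versions may phrase it differently.
EVICT_RE = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<client>\S+) "
    r"\(at (?P<nid>\S+)\) in (?P<idle>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)

def parse_evictions(lines):
    """Yield one dict per eviction message found in `lines`."""
    for line in lines:
        m = EVICT_RE.search(line)
        if m is None:
            continue  # not an eviction message
        rec = m.groupdict()
        for key in ("idle", "cur", "expire", "last"):
            rec[key] = int(rec[key])
        # Seconds between the last request seen and the eviction check.
        rec["silent_for"] = rec["cur"] - rec["last"]
        yield rec

# Two lines copied from the OST syslog in this report.
sample = [
    "Lustre: lustre-OST0002: haven't heard from client "
    "lustre-MDT0000-mdtlov_UUID (at 192.168.4.128@o2ib) in 54 seconds. "
    "I think it's dead, and I am evicting it. exp ffff88062d9b5000, "
    "cur 1305251420 expire 1305251390 last 1305251366",
    "Lustre: DEBUG MARKER: test_70b fail mds1 4 times",
]

evictions = list(parse_evictions(sample))
for ev in evictions:
    print(ev["target"], ev["client"], ev["silent_for"], ev["idle"])
```

Running the same helper over the full syslog would list every OST that evicted lustre-MDT0000-mdtlov_UUID, which may help place the evictions relative to the "fail mds1 N times" markers.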