Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.5.0
Labels: None
Environment: client and server: lustre-b2_5 build #2 RHEL6 ldiskfs
Severity: 3
Rank (Obsolete): 11302
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/de5e20d0-399a-11e3-8e4c-52540035b04c.
The sub-test test_failover_mds failed with the following error:
test_failover_mds returned 1
OST console:
11:33:23:Lustre: 5067:0:(client.c:1897:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1382121196/real 1382121196] req@ffff880068bd7400 x1449258849931308/t0(0) o104->lustre-OST0003@10.10.4.185@tcp:15/16 lens 296/224 e 0 to 1 dl 1382121203 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
11:33:23:Lustre: 5067:0:(client.c:1897:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
11:33:23:LustreError: 138-a: lustre-OST0003: A client on nid 10.10.4.185@tcp was evicted due to a lock blocking callback time out: rc -107
11:33:23:LustreError: 10256:0:(ldlm_lockd.c:662:ldlm_handle_ast_error()) ### client (nid 10.10.4.185@tcp) returned 0 from blocking AST ns: filter-lustre-OST0006_UUID lock: ffff880022cd2940/0xfac1c2f1448c0462 lrc: 4/0,0 mode: PR/PR res: [0x4b:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000010020 nid: 10.10.4.185@tcp remote: 0x7b8061ee0cd77238 expref: 133 pid: 5833 timeout: 4295554319 lvb_type: 1
11:33:23:LustreError: 10257:0:(ldlm_lockd.c:662:ldlm_handle_ast_error()) ### client (nid 10.10.4.185@tcp) returned 0 from blocking AST ns: filter-lustre-OST0006_UUID lock: ffff8800668a3580/0xfac1c2f1448bfad7 lrc: 4/0,0 mode: PR/PR res: [0x1f:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000010020 nid: 10.10.4.185@tcp remote: 0x7b8061ee0cd75b7f expref: 133 pid: 5834 timeout: 4295554409 lvb_type: 1
11:33:23:LustreError: 10188:0:(ldlm_lib.c:2698:target_bulk_io()) @@@ Eviction on bulk GET req@ffff880053968400 x1449258870970988/t0(0) o4->9b574173-5156-a592-3ac3-b5e341956c1f@10.10.4.185@tcp:0/0 lens 488/448 e 0 to 0 dl 1382121246 ref 1 fl Interpret:/0/0 rc 0/0
11:33:23:Lustre: lustre-OST0006: Bulk IO write error with 9b574173-5156-a592-3ac3-b5e341956c1f (at 10.10.4.185@tcp), client will retry: rc -107
11:33:23:LustreError: 138-a: lustre-OST0005: A client on nid 10.10.4.185@tcp was evicted due to a lock blocking callback time out: rc -107
11:33:23:LustreError: Skipped 1 previous similar message
11:33:24:LustreError: 10225:0:(ldlm_lockd.c:662:ldlm_handle_ast_error()) ### client (nid 10.10.4.185@tcp) returned 0 from blocking AST ns: filter-lustre-OST0002_UUID lock: ffff880022cd2340/0xfac1c2f1448c0477 lrc: 4/0,0 mode: PR/PR res: [0x4b:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000010020 nid: 10.10.4.185@tcp remote: 0x7b8061ee0cd77277 expref: 156 pid: 5833 timeout: 4295554644 lvb_type: 1
11:33:24:LustreError: 10225:0:(ldlm_lockd.c:662:ldlm_handle_ast_error()) Skipped 18 previous similar messages
11:33:24:LustreError: 5068:0:(ofd_grant.c:163:ofd_grant_sanity_check()) ofd_statfs: tot_granted 77870080 != fo_tot_granted 77874176
11:33:24:LustreError: 5068:0:(ofd_grant.c:166:ofd_grant_sanity_check()) ofd_statfs: tot_pending 7340032 != fo_tot_pending 7344128
11:33:24:LustreError: 138-a: lustre-OST0004: A client on nid 10.10.4.185@tcp was evicted due to a lock blocking callback time out: rc -107
11:33:24:LustreError: Skipped 1 previous similar message
11:33:24:LustreError: 9818:0:(ldlm_lib.c:2698:target_bulk_io()) @@@ Eviction on bulk GET req@ffff88004269a000 x1449258870971012/t0(0) o4->9b574173-5156-a592-3ac3-b5e341956c1f@10.10.4.185@tcp:0/0 lens 488/448 e 0 to 0 dl 1382121246 ref 1 fl Interpret:/0/0 rc 0/0
11:33:24:Lustre: lustre-OST0002: Bulk IO write error with 9b574173-5156-a592-3ac3-b5e341956c1f (at 10.10.4.185@tcp), client will retry: rc -107
11:33:24:LustreError: 5068:0:(ofd_grant.c:163:ofd_grant_sanity_check()) ofd_statfs: tot_granted 66335744 != fo_tot_granted 66343936
11:33:24:LustreError: 5068:0:(ofd_grant.c:166:ofd_grant_sanity_check()) ofd_statfs: tot_pending 5242880 != fo_tot_pending 5251072
11:33:24:LustreError: 5072:0:(ldlm_lib.c:2698:target_bulk_io()) @@@ Eviction on bulk GET req@ffff8800441a9400 x1449258870971096/t0(0) o4->9b574173-5156-a592-3ac3-b5e341956c1f@10.10.4.185@tcp:0/0 lens 488/448 e 0 to 0 dl 1382121246 ref 1 fl Interpret:/0/0 rc 0/0
11:33:24:LustreError: 5072:0:(ldlm_lib.c:2698:target_bulk_io()) Skipped 4 previous similar messages
11:33:24:Lustre: lustre-OST0002: Bulk IO write error with 9b574173-5156-a592-3ac3-b5e341956c1f (at 10.10.4.185@tcp), client will retry: rc -107
11:33:24:Lustre: Skipped 4 previous similar messages
11:33:24:LustreError: 5068:0:(ofd_grant.c:163:ofd_grant_sanity_check()) ofd_statfs: tot_granted 74724352 != fo_tot_granted 74728448
11:33:24:LustreError: 5068:0:(ofd_grant.c:166:ofd_grant_sanity_check()) ofd_statfs: tot_pending 5242880 != fo_tot_pending 5246976
11:33:25:LustreError: 5057:0:(ldlm_lockd.c:2299:ldlm_cancel_handler()) ldlm_cancel from 10.10.4.185@tcp arrived at 1382121205 with bad export cookie 18068937521480070292
11:33:25:LustreError: 5057:0:(ldlm_lockd.c:2299:ldlm_cancel_handler()) ldlm_cancel from 10.10.4.185@tcp arrived at 1382121205 with bad export cookie 18068937521480070334
11:45:00:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration: 86400
11:45:00:Server failover period: 900 seconds
11:45:01:Exited after: 82 seconds
11:45:01:Number of failovers before exit:
11:45:01:mds1: 1 times
11:45:01:ost1: 0 times
11:45:01:ost2: 0 times
11:45:01:ost3: 0 times
11:45:01:ost4: 0 times
11:45:02:ost5: 0 times
11:45:02:ost6: 0 times
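
To make the ofd_grant_sanity_check() mismatches in the console log above easier to compare, the two counters in each message can be subtracted. The following is only a rough Python sketch: the names GRANT_RE and grant_deltas are hypothetical helpers written against the message text quoted in this report, not part of Lustre or its test framework, and the sample contains only the six grant lines copied from the log.

```python
import re

# Hypothetical pattern, written against the ofd_grant_sanity_check()
# messages quoted in this ticket; it is not taken from Lustre itself.
GRANT_RE = re.compile(r"ofd_statfs: (tot_\w+) (\d+) != (fo_tot_\w+) (\d+)")


def grant_deltas(log_text):
    """Return (left_name, left, right_name, right, right - left) per message."""
    rows = []
    for left_name, left, right_name, right in GRANT_RE.findall(log_text):
        left, right = int(left), int(right)
        rows.append((left_name, left, right_name, right, right - left))
    return rows


# The six grant sanity-check lines from the OST console log in this report.
sample = """\
11:33:24:LustreError: 5068:0:(ofd_grant.c:163:ofd_grant_sanity_check()) ofd_statfs: tot_granted 77870080 != fo_tot_granted 77874176
11:33:24:LustreError: 5068:0:(ofd_grant.c:166:ofd_grant_sanity_check()) ofd_statfs: tot_pending 7340032 != fo_tot_pending 7344128
11:33:24:LustreError: 5068:0:(ofd_grant.c:163:ofd_grant_sanity_check()) ofd_statfs: tot_granted 66335744 != fo_tot_granted 66343936
11:33:24:LustreError: 5068:0:(ofd_grant.c:166:ofd_grant_sanity_check()) ofd_statfs: tot_pending 5242880 != fo_tot_pending 5251072
11:33:24:LustreError: 5068:0:(ofd_grant.c:163:ofd_grant_sanity_check()) ofd_statfs: tot_granted 74724352 != fo_tot_granted 74728448
11:33:24:LustreError: 5068:0:(ofd_grant.c:166:ofd_grant_sanity_check()) ofd_statfs: tot_pending 5242880 != fo_tot_pending 5246976
"""

for left_name, left, right_name, right, delta in grant_deltas(sample):
    print(f"{right_name} - {left_name} = {delta}")
```

On the lines quoted here, each difference is 4096 or 8192 bytes, and within each granted/pending pair the two differences are equal.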