[LU-5167] parallel-scale test iorssf: ERROR: Input/output error
Created: 10/Jun/14  Updated: 20/Jul/16

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.2
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/
Distro/Arch: RHEL6.5/x86_64 + SLES11SP3/x86_64 (Server + Client)


Severity: 3
Rank (Obsolete): 14246

 Description   

parallel-scale test iorssf failed as follows:

** error **
ERROR in aiori-POSIX.c (line 316): cannot close file.
ERROR: Input/output error
** exiting **

Dmesg on client node:

[79443.603331] Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf ================================================================ 08:39:21 (1402241961)
[79539.008049] Lustre: 4915:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1402242048/real 0]  req@ffff88000b268800 x1470307796710476/t0(0) o103->lustre-OST0006-osc-ffff880009b9b000@10.1.6.247@tcp:17/18 lens 328/224 e 0 to 1 dl 1402242057 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[79539.008055] Lustre: 4915:0:(client.c:1908:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[79539.008068] Lustre: lustre-OST0006-osc-ffff880009b9b000: Connection to lustre-OST0006 (at 10.1.6.247@tcp) was lost; in progress operations using this service will wait for recovery to complete
[79543.442585] LustreError: 167-0: lustre-OST0006-osc-ffff880009b9b000: This client was evicted by lustre-OST0006; in progress operations using this service will fail.
[79543.506888] Lustre: lustre-OST0006-osc-ffff880009b9b000: Connection restored to lustre-OST0006 (at 10.1.6.247@tcp)
[79543.506891] Lustre: Skipped 1 previous similar message
[79619.930641] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale test_iorssf: @@@@@@ FAIL: ior failed! 1

Console log on OSS node:

09:45:22:Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf ================================================================ 08:39:21 (1402241961)
09:45:22:Lustre: 10808:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1402242048/real 1402242048]  req@ffff880033dd8400 x1470305068132704/t0(0) o104->lustre-OST0006@10.1.6.249@tcp:15/16 lens 296/224 e 0 to 1 dl 1402242055 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
09:45:22:Lustre: 10808:0:(client.c:1908:ptlrpc_expire_one_request()) Skipped 1 previous similar message
09:45:22:LustreError: 138-a: lustre-OST0006: A client on nid 10.1.6.249@tcp was evicted due to a lock blocking callback time out: rc -107
09:45:22:LustreError: Skipped 26 previous similar messages
09:45:22:LustreError: 10775:0:(ldlm_lib.c:2702:target_bulk_io()) @@@ Eviction on bulk GET  req@ffff88002ba61400 x1470307796710396/t0(0) o4->b4770fd0-a408-2d95-7c2a-fb5d5e6f4b61@10.1.6.249@tcp:0/0 lens 488/448 e 1 to 0 dl 1402242082 ref 1 fl Interpret:/0/0 rc 0/0
09:45:22:Lustre: lustre-OST0006: Bulk IO write error with b4770fd0-a408-2d95-7c2a-fb5d5e6f4b61 (at 10.1.6.249@tcp), client will retry: rc -107
09:45:22:Lustre: Skipped 8 previous similar messages
09:45:22:LustreError: 10775:0:(ldlm_lib.c:2702:target_bulk_io()) Skipped 1 previous similar message
09:45:22:LustreError: 10869:0:(ldlm_lockd.c:2300:ldlm_cancel_handler()) ldlm_cancel from 10.1.6.249@tcp arrived at 1402242061 with bad export cookie 6818992319885353493
09:45:22:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale test_iorssf: @@@@@@ FAIL: ior failed! 1 
09:45:22:Lustre: DEBUG MARKER: parallel-scale test_iorssf: @@@@@@ FAIL: ior failed! 1
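The client and OSS logs above quote several Unix epoch timestamps, and lining them up against the test start makes the order of events easier to follow. The following is a minimal Python sketch, not part of the Lustre test framework: the `EVENTS` list, its labels, and the `timeline()` helper are hypothetical, and the opcode annotations (o103 = LDLM_CANCEL, o104 = LDLM_BL_CALLBACK) are added here for readability rather than taken from the log text.

```python
import re
from datetime import datetime, timezone

# Short excerpts copied from the client dmesg and OSS console logs above.
# Labels are hypothetical annotations: o103 = LDLM_CANCEL, o104 = LDLM_BL_CALLBACK.
EVENTS = [
    ("test start (DEBUG MARKER)", "08:39:21 (1402241961)"),
    ("client o103 sent", "[sent 1402242048/real 0]"),
    ("client o103 deadline", "dl 1402242057"),
    ("OSS o104 sent", "[sent 1402242048/real 1402242048]"),
    ("OSS o104 deadline", "dl 1402242055"),
    ("ldlm_cancel arrived at OSS", "arrived at 1402242061"),
]

# The first standalone 10-digit number in each excerpt is a Unix epoch second.
EPOCH_RE = re.compile(r"\b(\d{10})\b")


def timeline(events):
    """Return (label, epoch, seconds since the earliest event), sorted by epoch."""
    parsed = []
    for label, text in events:
        match = EPOCH_RE.search(text)
        if match is None:
            raise ValueError(f"no epoch timestamp in {text!r}")
        parsed.append((label, int(match.group(1))))
    # Stable sort: events with the same epoch keep their listed order.
    parsed.sort(key=lambda item: item[1])
    start = parsed[0][1]
    return [(label, epoch, epoch - start) for label, epoch in parsed]


if __name__ == "__main__":
    for label, epoch, offset in timeline(EVENTS):
        utc = datetime.fromtimestamp(epoch, tz=timezone.utc)
        print(f"+{offset:4d}s  {utc:%H:%M:%S} UTC  {label}")
```

Run as a script, it prints one line per event in time order. The offsets place the OSS's o104 deadline (+94 s) ahead of the client's ldlm_cancel reaching the OSS (+100 s), which is consistent with the OSS rejecting that cancel with a bad export cookie.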

Maloo report: https://maloo.whamcloud.com/test_sets/a12994bc-ef55-11e3-9713-52540035b04c



 Comments   
Comment by Jian Yu [ 10/Jun/14 ]

The same test passed in another test run: https://maloo.whamcloud.com/test_sets/a5fb05f4-ef85-11e3-b8c2-52540035b04c

Comment by Jian Yu [ 10/Jun/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/59/
Distro/Arch: RHEL6.5/x86_64 + SLES11SP2/x86_64 (Server + Client)

The same failure occurred: https://maloo.whamcloud.com/test_sets/6aecca88-e891-11e3-a0dd-52540035b04c

Generated at Sat Feb 10 01:49:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.