[LU-2520] replay-single test_85b: FAIL: unused locks are not canceled Created: 22/Dec/12  Updated: 09/Sep/16  Resolved: 08/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1, Lustre 2.1.4, Lustre 2.1.6, Lustre 2.5.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: Hongchao Zhang
Resolution: Incomplete
Votes: 0
Labels: None
Environment:

Lustre Tag: v2_1_4_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/159/
Distro/Arch: RHEL6.3/x86_64
Test Group: failover


Issue Links:
Duplicate
is duplicated by LU-3797 replay-single test_85a test_85b: unus... Closed
Severity: 3
Rank (Obsolete): 5936

 Description   

replay-single test 85b failed as follows:

== replay-single test 85b: check the cancellation of unused locks during recovery(EXTENT) == 12:55:40 (1356123340)
error on ioctl 0x4008669a for '/mnt/lustre' (3): Cannot send after transport endpoint shutdown
error: setstripe: create stripe file '/mnt/lustre' failed
before recovery: unused locks count = 0
Failing ost1 on node client-27vm8

<~snip~>

Started lustre-OST0006
after recovery: unused locks count = 0
 replay-single test_85b: @@@@@@ FAIL: unused locks are not canceled

Maloo report: https://maloo.whamcloud.com/test_sets/5d18ad4c-4bb6-11e2-aa80-52540035b04c



 Comments   
Comment by Jian Yu [ 22/Dec/12 ]

The issue also exists in Lustre 2.1.1 release:
https://maloo.whamcloud.com/test_sets/54fc0fde-59fc-11e1-bce3-5254004bbbd3

Comment by Jian Yu [ 06/Jun/13 ]

Lustre Tag: v2_1_6_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/208/
Distro/Arch: RHEL6.4/x86_64
Test Group: failover

The same issue occurred again:
https://maloo.whamcloud.com/test_sets/fcb21874-cdf3-11e2-ba28-52540035b04c

Comment by Hongchao Zhang [ 08/Jul/13 ]

The patch is tracked at http://review.whamcloud.com/#/c/6905/

Comment by Hongchao Zhang [ 22/Jul/13 ]

The MGC could not come back UP after failover, which caused the setstripe ioctl to fail:

...
17:26:27:LustreError: 8459:0:(client.c:1060:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8800339e3c00 x1429788769287186/t0(0) o255->MGC10.10.4.164@tcp@10.10.4.164@tcp:26/25 lens 1216/1216 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
17:26:27:LustreError: 8459:0:(client.c:1060:ptlrpc_import_delay_req()) Skipped 32 previous similar messages
17:26:27:LustreError: 8459:0:(dir.c:685:ll_send_mgc_param()) Failed to set parameter: -108
...

The patch has been updated to try to fix this issue.
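
For readers matching the log to the test output: the -108 returned by ll_send_mgc_param() is a negated errno. On Linux, errno 108 is ESHUTDOWN, whose message is the same "Cannot send after transport endpoint shutdown" text reported for the setstripe ioctl in the Description. Below is a minimal Python sketch of that lookup. The helper names and the regular expression are hypothetical and not part of Lustre or its test suite, and the numeric mapping assumes Linux errno numbering.

```python
import errno
import os
import re

# Matches a trailing negative return code such as ": -108" at the end of a
# log line. This pattern is an illustrative assumption, not a Lustre format spec.
RC_PATTERN = re.compile(r":\s*(-\d+)\s*$")


def rc_from_log_line(line):
    """Return the trailing negative return code of a log line, or None."""
    match = RC_PATTERN.search(line)
    return int(match.group(1)) if match else None


def decode_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Log line taken from the comment above (timestamp prefix dropped).
line = ("LustreError: 8459:0:(dir.c:685:ll_send_mgc_param()) "
        "Failed to set parameter: -108")
rc = rc_from_log_line(line)
print(rc, decode_rc(rc))
```

The name and message printed depend on the platform's errno table, so the lookup should be run on the same kind of host (Linux) that produced the log.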

Comment by Oleg Drokin [ 23/Aug/13 ]

So I see 2.5 is marked as affected too, and I guess 2.4 too.
Do we need 2.4/2.5 patches too?

Comment by Jian Yu [ 26/Aug/13 ]

I did not hit this failure while testing Lustre b2_4 with failover configuration.

Comment by Nathaniel Clark [ 04/Sep/13 ]

This is being hit on master on ZFS (see LU-3797)

Comment by John Fuchs-Chesney (Inactive) [ 08/Jan/16 ]

Incomplete and out of date.
~ jfc.

Generated at Sat Feb 10 01:25:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.