[LU-581] 1.8<->2.1 interop: sanity test 120: FAIL: 1 blocking RPC occured Created: 09/Aug/11  Updated: 27/May/15  Resolved: 27/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0, Lustre 2.1.1, Lustre 2.1.2, Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: Lai Siyao
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Old Lustre Version: 1.8.6-wc1
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/

New Lustre Version: 2.0.66.0
Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/228/

Clean upgrade (Lustre servers and clients were upgraded all at once) from Lustre 1.8.6-wc1 to Lustre 2.0.66.0 under the following configuration:

OSS1: RHEL5/x86_64
OSS2: RHEL5/x86_64
MDS: RHEL5/x86_64
Client1: RHEL6/x86_64
Client2: RHEL5/x86_64


Issue Links:
Related
is related to LU-4909 sanity test_120f failure: 1 blocking ... Resolved
Severity: 3
Bugzilla ID: 23338
Rank (Obsolete): 4383

 Description   

After the upgrade, sanity test 120a failed on Lustre 2.0.66.0 as follows:

== sanity test 120a: Early Lock Cancel: mkdir test == 03:12:14 (1312884734)
ldlm.namespaces.lustre-MDT0000-mdc-ffff88031cc93400.lru_size=400
ldlm.namespaces.lustre-OST0000-osc-ffff88031cc93400.lru_size=400
ldlm.namespaces.lustre-OST0001-osc-ffff88031cc93400.lru_size=400
 sanity test_120a: @@@@@@ FAIL: 1 blocking RPC occured.

Please refer to the Maloo report for more logs: https://maloo.whamcloud.com/test_sets/a570d34e-c278-11e0-8bdf-52540025f9af

sanity test 120{c,d,e,f} also failed.

This is a known issue: bug 23338.



 Comments   
Comment by Jian Yu [ 16/Aug/11 ]

Lai Siyao will work on this ticket.

Comment by Lai Siyao [ 31/Aug/11 ]

The cause is that the ELC (early lock cancel) flags are set in the LMV layer, but in the 1.8 <-> 2.1 case LMV doesn't exist. I will move the ELC flag setting into llite for 2.1.
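The explanation above can be illustrated with a toy model. This is only a sketch of the reasoning, not Lustre code: the layer names come from the comment, while the functions `elc_flags_set` and `blocking_rpcs_for_mkdir` and the list representation of the client stack are hypothetical. It assumes that when the ELC flags are missing, the server has to send one blocking callback for the conflicting cached lock, which is what test 120a counts.

```python
# Toy model only: layer names follow the comment above; all functions are hypothetical.

def elc_flags_set(client_stack):
    """Per the comment, ELC flags are set in the LMV layer, so they are
    present only when LMV is part of the client stack."""
    return "lmv" in client_stack


def blocking_rpcs_for_mkdir(client_stack):
    """Count blocking callbacks for one mkdir under a directory whose lock
    the client already caches (assumption: one callback when ELC is off)."""
    if elc_flags_set(client_stack):
        # The conflicting lock is cancelled early by the client itself.
        return 0
    # Without ELC flags the server must ask the client to drop the lock.
    return 1


stack_from_2x_config = ["llite", "lmv", "mdc"]  # config logs written by 2.x
stack_from_18_config = ["llite", "mdc"]         # config logs carried over from 1.8

print(blocking_rpcs_for_mkdir(stack_from_2x_config))
print(blocking_rpcs_for_mkdir(stack_from_18_config))
```

In this model, moving the flag setting into llite would make `elc_flags_set` independent of whether LMV is configured.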

Comment by Jian Yu [ 05/Sep/11 ]

The same issue occurred while running sanity tests after a clean upgrade from Lustre 1.8.5/1.8.6-wc1 to 2.1.0:
https://maloo.whamcloud.com/test_sets/17b5046c-d7b3-11e0-8d02-52540025f9af

Comment by Lai Siyao [ 09/Oct/11 ]

The patch is in review at http://review.whamcloud.com/#change,1339

Comment by Jian Yu [ 24/Feb/12 ]

The same issue occurred while running sanity tests after a clean upgrade from Lustre 1.8.7-wc1 to 2.1.1:
https://maloo.whamcloud.com/test_sets/3d45f592-5ee0-11e1-ab6b-5254004bbbd3

Comment by Jian Yu [ 07/Jun/12 ]

The same issue occurred while running sanity tests after a clean upgrade from Lustre 1.8.8-wc1 to 2.1.2:

client-1: == sanity test 120a: Early Lock Cancel: mkdir test == 04:40:35 (1339069235)
client-1: ldlm.namespaces.lustre-MDT0000-mdc-ffff8802a90f9800.lru_size=400
client-1: ldlm.namespaces.lustre-OST0000-osc-ffff8802a90f9800.lru_size=400
client-1: ldlm.namespaces.lustre-OST0001-osc-ffff8802a90f9800.lru_size=400
client-1:  sanity test_120a: @@@@@@ FAIL: 1 blocking RPC occured.
client-1: Dumping lctl log to /home/yujian/test_logs/2012-06-07/043300/sanity.test_120a.*.1339069236.log
client-1: FAIL 120a (3s)

Maloo report: https://maloo.whamcloud.com/test_sets/c250c2d6-b0d3-11e1-99ce-52540035b04c
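For triaging logs like the one above across several upgrade runs, a small script can pull out the failing subtests. The following is a minimal sketch, assuming failure lines always have the form shown in this ticket (`sanity test_<name>: @@@@@@ FAIL: <message>`); the helper name `failed_subtests` and the pattern are illustrative and not part of the Lustre test framework.

```python
import re

# Pattern inferred from the log lines quoted in this ticket (an assumption).
FAIL_RE = re.compile(r"sanity (test_\w+): @+ FAIL: (.+)")


def failed_subtests(log_text):
    """Return (subtest, message) pairs for every FAIL line in the log."""
    failures = []
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            failures.append((match.group(1), match.group(2).strip()))
    return failures


SAMPLE_LOG = """\
client-1: == sanity test 120a: Early Lock Cancel: mkdir test == 04:40:35 (1339069235)
client-1:  sanity test_120a: @@@@@@ FAIL: 1 blocking RPC occured.
client-1: FAIL 120a (3s)
"""

print(failed_subtests(SAMPLE_LOG))
```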

Comment by Lai Siyao [ 04/Jul/12 ]

Hi Yujian, the main cause is that the 2.x config contains a section for LMV but the 1.8 config does not, and some ELC logic is implemented in the LMV layer. A simple way to fix it is to regenerate the config logs; please refer to section 4.3.11 of http://wiki.lustre.org/manual/LustreManual18_HTML/ConfiguringLustre.html.

If this works, do you think it's acceptable? IIRC, regenerating config logs is a common way to fix config issues after an upgrade, isn't it?
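As a rough sketch of what regenerating the config logs involves, the helper below only builds the `tunefs.lustre --writeconf` command lines in the usual order (MDT first, then each OST). It assumes all clients and all targets have already been unmounted, and that afterwards the MDT is mounted first, then the OSTs, then the clients. The device paths and the function name `writeconf_commands` are hypothetical; the manual section cited above remains the authoritative procedure.

```python
def writeconf_commands(mdt_device, ost_devices):
    """Build the writeconf command lines, MDT first and then every OST.

    Nothing is executed here; the caller is expected to review the
    commands and run them on the appropriate servers while the file
    system is fully unmounted.
    """
    commands = [f"tunefs.lustre --writeconf {mdt_device}"]
    commands.extend(f"tunefs.lustre --writeconf {dev}" for dev in ost_devices)
    return commands


# Hypothetical device paths, for illustration only.
for command in writeconf_commands("/dev/mdt_dev", ["/dev/ost0_dev", "/dev/ost1_dev"]):
    print(command)
```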

Comment by Jian Yu [ 05/Jul/12 ]

Hi Lai,
Thanks for the explanation. Next time I perform 1.8->2.x upgrade testing, I'll regenerate the config logs before running sanity test 120*. For testing without regenerated config logs, I'll skip sanity test 120*.
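The plan above amounts to a simple rule for building a skip list. The sketch below assumes the test framework accepts a space-separated list of subtests to skip through an `EXCEPT` environment variable; that variable name is an assumption to verify against the framework in use. The function name `except_value` is hypothetical, and the subtest list contains only the ones reported as failing in this ticket.

```python
import re


def except_value(subtests, config_logs_regenerated):
    """Return the space-separated 120* subtests to skip, or "" if the
    config logs were regenerated and the tests are expected to run."""
    if config_logs_regenerated:
        return ""
    return " ".join(t for t in subtests if re.fullmatch(r"120[a-z]?", t))


# Subtests reported as failing in this ticket.
failed_in_this_ticket = ["120a", "120c", "120d", "120e", "120f"]

print(except_value(failed_in_this_ticket, config_logs_regenerated=False))
```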

Comment by Andreas Dilger [ 27/May/15 ]

Haven't seen this in a long time.

Generated at Sat Feb 10 01:08:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.