[LU-11662] interop: multiple sanity-hsm tests failing with master clients with 2.10.5 servers Created: 13/Nov/18  Updated: 27/Nov/18  Resolved: 27/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: interop
Environment:

master clients with 2.10.5 servers


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Recent landings to master have caused sanity-hsm tests 1b, 12c, 12f, 12g, 12h, 12m, 12o, 12p, 14, 15, 24a, 24b, 24d, 24e, 24f, 30b, 30c, 31b, 31c, 34, 35, 36, 37, 57, 58, 59, 110b, 222b, 222d, 223b, 228, 260c, and 407 to fail in interop testing with master clients and 2.10.5 servers.

In the failed test session at https://testing.whamcloud.com/test_sets/0de77f1c-e73e-11e8-b67f-52540065bddc , the client 2 (vm6) console log shows the following error for most of the failed tests:

[19290.833447] LustreError: 13720:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200007936:0x6:0x0] error -61.
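
The -61 in the log line above is a negated errno value. As a minimal sketch (the helper name `decode_error` is hypothetical and not part of Lustre or its test framework), the symbolic name can be looked up with Python's standard `errno` module. On Linux, errno 61 is ENODATA, which matches the error named in the fix discussed in the comments.

```python
import errno
import re

# Log line copied from the client console output quoted above.
LOG_LINE = (
    "[19290.833447] LustreError: 13720:0:(vvp_io.c:1495:vvp_io_init()) "
    "lustre: refresh file layout [0x200007936:0x6:0x0] error -61."
)


def decode_error(line):
    """Return (code, symbolic name) for a trailing 'error -N' in a log line.

    Hypothetical helper for illustration only. Returns None when the
    line does not end with an error code.
    """
    match = re.search(r"error (-?\d+)\.?\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    # Kernel code reports failures as negative errno values, so look up
    # the absolute value in the platform's errno table.
    name = errno.errorcode.get(abs(code), "UNKNOWN")
    return code, name


if __name__ == "__main__":
    print(decode_error(LOG_LINE))
```

The errno table is platform specific, so the lookup should be run on Linux, the platform that produced the log.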

The above failure is for Lustre master 2.11.56.112 build #3824 clients and 2.10.5 servers. The last master build with interop test results is build #3821, and that testing passed. Nothing has landed to 2.10.5 for a couple of months. Thus, a patch causing this interop failure landed to master between builds #3821 and #3824.

This is what landed between builds #3821 and #3824:

LU-11508 mdt: reject DoM file migration
LU-11468 lnet: set recovery interval from lnetctl
LU-6142 lov: Fix style issues for lov_ea.c
LU-6142 quota: Fix style issues for qsd_lock.c
LU-11380 mdc: move empty xattr handling to mdc layer
LU-11445 obd: remove portals handle from OBD import
LU-11599 ldlm: printing negative time on logs for recovery
LU-9538 utils: update description of ldiskfs xattrs
LU-11570 lnet: update changelog
LU-11561 ofd: return attr syncjournal to sync_journal
LU-11611 mdt: incorrect return value in mdt_reint_unlink
LU-11525 kernel: new kernel [RHEL7.6 3.10.0-957.el7]
Revert "LU-8130 ptlrpc: convert conn_hash to rhashtable"



 Comments   
Comment by John Hammond [ 14/Nov/18 ]

I think my change 0f42b38843 (LU-11380 mdc: move empty xattr handling to mdc layer) is the most likely culprit here.

Comment by Peter Jones [ 15/Nov/18 ]

Have you tried pushing a test to check this theory?

Comment by Gerrit Updater [ 15/Nov/18 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33663
Subject: LU-11662 mdc: revert "LU-11380 mdc: move empty xattr ..."
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 64b77da5828d50011f9d4d584a7820abae8c7fab

Comment by John Hammond [ 15/Nov/18 ]

Yes, and I will likely have a fix.

Comment by Gerrit Updater [ 15/Nov/18 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33665
Subject: LU-11662 llite: handle -ENODATA in ll_layout_fetch()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 99f2377b11b814842f6d0d4a21ffdebaf799ef96
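
The patch subject says only that ll_layout_fetch() should handle -ENODATA; the ticket does not show the change itself. The following is a rough model of one plausible reading, not the actual kernel code: it assumes that an older server answers the layout xattr request with -ENODATA when a file has no layout, and that the client should treat this as an empty layout rather than a failure. The function `fetch_layout` and the `getxattr` callable are hypothetical stand-ins.

```python
import errno


def fetch_layout(getxattr):
    """Fetch a layout blob, treating a missing layout xattr as 'no layout'.

    `getxattr` is a hypothetical callable standing in for the client's
    xattr request. It returns bytes or raises OSError.
    """
    try:
        return getxattr("trusted.lov")
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            # Assumption: an absent xattr means an empty layout,
            # so report success with no data instead of failing.
            return b""
        # Any other error is still a real failure and is passed on.
        raise


def old_server(name):
    """Hypothetical server stub that reports the xattr as absent."""
    raise OSError(errno.ENODATA, "No data available")


if __name__ == "__main__":
    print(fetch_layout(old_server))
```

In this model only ENODATA is tolerated, so unrelated errors such as EIO still reach the caller.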

Comment by Gerrit Updater [ 27/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33665/
Subject: LU-11662 llite: handle -ENODATA in ll_layout_fetch()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e3f367f3660dc53b690934d42bc5019a292d81bc

Comment by Peter Jones [ 27/Nov/18 ]

Landed for 2.12
