[LU-11375] Client 2.10 fails to mount Server 2.11.54 Created: 13/Sep/18 Updated: 25/Oct/18 Resolved: 10/Oct/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Nathaniel Clark | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | interop |
| Environment: | 2 MDS, 2 OSS ~2.11.54_91 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Mounting client fails with:

# mount -t lustre 192.168.56.12@tcp:/zfs2 /mnt/fs2
mount.lustre: mount 192.168.56.12@tcp:/zfs2 at /mnt/fs2 failed: Bad address

MDT0 has the following error in debug log:

00000100:02000000:1.0:1536858056.517840:0:2585:0:(import.c:1534:ptlrpc_import_recovery_state_machine()) zfs2-MDT0000: Connection restored to 880830c7-7d38-ac3b-c683-8d949f550eaa (at 192.168.56.30@tcp)
00000100:00020000:1.0:1536858056.518193:0:2585:0:(layout.c:2067:__req_capsule_get()) @@@ Wrong buffer for field `mdt_body' (1 of 1) in format `MDS_STATFS': 0 vs. 216 (client) req@ffff9a06cd249b00 x1611512228611360/t0(0) o41->1953ab2d-accd-2169-bea6-82979f16a7d4@192.168.56.30@tcp:452/0 lens 224/0 e 0 to 0 dl 1536858067 ref 1 fl Interpret:/0/ffffffff rc 0/-1
00000020:00000080:1.0:1536858056.524976:0:2585:0:(genops.c:1561:class_disconnect()) disconnect: cookie 0x3f757c9924a4d7eb

Setup: |
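For triage, the debug-log entries quoted in the description can be picked apart with a small script. The following is only a sketch: the regular expressions are inferred from the three lines shown in this ticket, the group names (`subsys`, `mask`, `cpu`, `lhs`, `rhs`, and so on) are illustrative guesses rather than official Lustre field names, and the ticket does not say which of the two lengths in "0 vs. 216" is the received one and which is the expected one, so they are kept neutral here.

```python
import re

# Prefix layout inferred from the log lines in this ticket (an assumption,
# not taken from the Lustre source):
#   hex:hex:cpu:timestamp:num:pid:num:(file:line:function()) message
PREFIX = re.compile(
    r"^(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[\d.]+):"
    r"(?P<time>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)

# Matches the "Wrong buffer" message body exactly as it is quoted above.
WRONG_BUFFER = re.compile(
    r"Wrong buffer for field `(?P<field>\w+)' \((?P<idx>\d+) of (?P<count>\d+)\) "
    r"in format `(?P<fmt>\w+)': (?P<lhs>\d+) vs\. (?P<rhs>\d+) \((?P<side>\w+)\)"
)


def parse_line(line):
    """Return a dict describing one debug-log line, or None if it does not match."""
    m = PREFIX.match(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    info["line"] = int(info["line"])
    info["pid"] = int(info["pid"])
    wb = WRONG_BUFFER.search(info["msg"])
    if wb is not None:
        # Merge the buffer-mismatch details into the result.
        info.update(wb.groupdict())
        info["lhs"] = int(info["lhs"])
        info["rhs"] = int(info["rhs"])
    return info


if __name__ == "__main__":
    sample = (
        "00000100:00020000:1.0:1536858056.518193:0:2585:0:"
        "(layout.c:2067:__req_capsule_get()) @@@ Wrong buffer for field "
        "`mdt_body' (1 of 1) in format `MDS_STATFS': 0 vs. 216 (client)"
    )
    print(parse_line(sample))
```

Run against a saved debug log, this makes it easy to filter for the `MDS_STATFS` format and confirm that every failing request shows the same pair of lengths.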
| Comments |
| Comment by John Hammond [ 13/Sep/18 ] |
|
Likely from https://review.whamcloud.com/29136 |
| Comment by James Nunez (Inactive) [ 13/Sep/18 ] |
|
We run automated testing between 2.10.4 clients and master servers and we do see these tests failing in lustre-initialization. Looking at https://testing.whamcloud.com/test_sets/73262750-b59f-11e8-9df3-52540065bddc, we see the mount fail:

2018-09-11T07:51:28 Starting client: trevis-19vm1.trevis.whamcloud.com: -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre
2018-09-11T07:51:28 CMD: trevis-19vm1.trevis.whamcloud.com mkdir -p /mnt/lustre
2018-09-11T07:51:28 CMD: trevis-19vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre
2018-09-11T07:51:28 mount.lustre: mount trevis-19vm4@tcp:/lustre at /mnt/lustre failed: Bad address

On that client (vm1), we see:

[ 221.250210] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre
[ 221.731388] LustreError: 11-0: lustre-MDT0000-mdc-ffff90943bde8800: operation mds_statfs to node 10.9.4.227@tcp failed: rc = -14
[ 221.732646] LustreError: 15598:0:(lmv_obd.c:1387:lmv_statfs()) can't stat MDS #0 (lustre MDT0000-mdc-ffff90943bde8800), error -14
[ 221.738793] LustreError: 15598:0:(lov_obd.c:878:lov_cleanup()) lustre-clilov-ffff90943bde8800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
[ 221.742206] Lustre: Unmounted lustre-client
[ 221.743475] LustreError: 15598:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-14) |
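The `rc = -14` in the client log and the "Bad address" text printed by mount.lustre are the same error seen from two sides: errno 14 is `EFAULT`. A minimal check using only the Python standard library (the variable names are illustrative):

```python
import errno
import os

rc = -14  # return code reported for mds_statfs and lustre_fill_super() in the client log

name = errno.errorcode[-rc]  # symbolic errno name for 14
text = os.strerror(-rc)      # message text as provided by the local C library

print(f"rc={rc}: {name} ({text})")
```

On a Linux host the message text should read "Bad address", which matches the mount.lustre failure string above.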
| Comment by Gerrit Updater [ 13/Sep/18 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33162 |
| Comment by Alex Zhuravlev [ 14/Sep/18 ] |
|
The patch isn't ready yet. I've hit an interesting issue: the initial MDS_STATFS request is formed before the connection (and its connect flags) is set up properly, so the client and the server end up with different connect flags. |
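The ordering problem described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not Lustre code and not the content of the patch: the flag `FLAG_NEW_STATFS`, the `Client` class, and the idea that a single negotiated flag decides whether `mdt_body` is packed are all hypothetical stand-ins. Only the 216-byte length comes from the server log.

```python
from dataclasses import dataclass

FLAG_NEW_STATFS = 0x1  # hypothetical connect flag, for illustration only
MDT_BODY_SIZE = 216    # mdt_body length reported in the server log


@dataclass
class Client:
    supported: int       # flags the client advertises at connect time
    negotiated: int = 0  # flags agreed with the server; 0 until connect completes

    def connect(self, server_flags: int) -> None:
        # After connect, both sides agree on the intersection of their flags.
        self.negotiated = self.supported & server_flags

    def build_statfs(self) -> int:
        """Return the mdt_body length this client would pack right now."""
        return MDT_BODY_SIZE if self.negotiated & FLAG_NEW_STATFS else 0


def server_expected_len(negotiated: int) -> int:
    """Length the (hypothetical) server expects for the negotiated flags."""
    return MDT_BODY_SIZE if negotiated & FLAG_NEW_STATFS else 0


if __name__ == "__main__":
    client = Client(supported=FLAG_NEW_STATFS)
    early = client.build_statfs()      # request formed before the connection is set up
    client.connect(FLAG_NEW_STATFS)    # flags become valid only at this point
    expected = server_expected_len(client.negotiated)
    print(f"packed {early} bytes, server expects {expected}")
```

In this model, a request built before `connect()` carries a zero-length body while the server, working from the negotiated flags, expects 216 bytes. That is the same shape of mismatch as the "0 vs. 216" message, although the real cause in Lustre may differ in detail.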
| Comment by James Nunez (Inactive) [ 18/Sep/18 ] |
|
Since master build #3787, it looks like Ubuntu 18.04 clients can't mount due to this issue. In the MDS console log, we see:

[ 191.353309] LustreError: 15466:0:(layout.c:2067:__req_capsule_get()) @@@ Wrong buffer for field `mdt_body' (1 of 1) in format `MDS_STATFS': 0 vs. 216 (client)

Logs at |
| Comment by Alex Zhuravlev [ 18/Sep/18 ] |
|
please try the updated patch.
|
| Comment by Alex Zhuravlev [ 19/Sep/18 ] |
|
@James, please suggest how to test this with the appropriate commit parameter. |
| Comment by James Nunez (Inactive) [ 19/Sep/18 ] |
|
Added the following to the commit message to test b2_10 clients with servers from this patch: |
| Comment by James Nunez (Inactive) [ 19/Sep/18 ] |
|
Looks like this patch allows a 2.10.5 client to mount a later server. Results at https://testing.whamcloud.com/test_sessions/319c062d-ad5a-4f4c-bcb1-a7c23ebff084 |
| Comment by James A Simmons [ 05/Oct/18 ] |
|
The Lustre Linux client (also 2.10) now works as well. |
| Comment by Gerrit Updater [ 10/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33162/ |
| Comment by Peter Jones [ 10/Oct/18 ] |
|
Landed for 2.12 |