[LU-13402] sanity test_252: Invalid number of mdtlov clients returned by /usr/sbin/lr_reader Created: 31/Mar/20  Updated: 16/Apr/23  Resolved: 01/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-13408 tgt_cancel_slc_locks()) ASSERTION( lo... Resolved
Related
is related to LU-13379 interop b2_12/master: lustre-initiali... Resolved
is related to LU-13469 MDS hung during mount Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/59cb919c-f797-4928-908e-b8cc42e8b5bd

test_252 failed with the following error:

Invalid number of mdtlov clients returned by /usr/sbin/lr_reader

The mdtlov client for MDT0002 seems to be missing: the lr_reader output shows only lustre-MDT0001-mdtlov_UUID and lustre-MDT0003-mdtlov_UUID.
So is the MDT target that is local to MDT0 the one that is missing?
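
A minimal sketch of the kind of check that fails here, assuming the only thing that matters is which mdtlov UUIDs appear in the lr_reader output. The function names, the four-MDT count, and the sample text are illustrative assumptions; the sample is not real lr_reader output, only the two UUIDs quoted above.

```python
import re


def mdtlov_indices(text, fsname="lustre"):
    """Return the set of MDT indices that have an mdtlov UUID in `text`.

    UUIDs are assumed to look like "<fsname>-MDT0001-mdtlov_UUID",
    matching the names quoted in this report.
    """
    pattern = re.compile(re.escape(fsname) + r"-MDT([0-9a-fA-F]{4})-mdtlov_UUID")
    return {int(idx, 16) for idx in pattern.findall(text)}


def missing_mdtlov(text, mdt_count, inspected_index, fsname="lustre"):
    """List MDT indices expected in `text` but not found.

    Assumption: every MDT other than the one being inspected should
    be listed as an mdtlov client.
    """
    expected = set(range(mdt_count)) - {inspected_index}
    return sorted(expected - mdtlov_indices(text, fsname))


# Placeholder text containing only the two UUIDs named in the report.
sample = "lustre-MDT0001-mdtlov_UUID\nlustre-MDT0003-mdtlov_UUID\n"

# Assumes 4 MDTs in total, with MDT0000 being the target inspected.
print(missing_mdtlov(sample, mdt_count=4, inspected_index=0))
```

With the two UUIDs from the report and an assumed four-MDT setup, the only index left unaccounted for is 2, which matches the missing MDT0002 entry.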




 Comments   
Comment by Andreas Dilger [ 31/Mar/20 ]

This is seen during testing of patch https://review.whamcloud.com/38022 "LU-13379 tests: don't use localrecov for older servers", which just moves the "-o localrecov" setting later in the test setup so that we can check the server version before adding the mount option.

I thought that this patch was functionally identical to the original code, but I'm wondering whether "localrecov" is somehow causing the local MDT not to register itself in the last_rcvd file. In the failure case, lustre-MDT0002 is missing from the log on lustre-MDT0000, so it would be a "local" client in that respect.
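
The version gating described above can be sketched roughly as follows. This is an illustration only, not the test-framework code: the function names are hypothetical, and the minimum version is left as a parameter because the actual cutoff is not stated in this ticket.

```python
def parse_version(version):
    """Turn a dotted version string like '2.12.4' into a 4-part tuple of ints."""
    parts = tuple(int(part) for part in version.split("."))
    # Pad with zeros so that '2.13' and '2.13.0' compare as equal.
    return parts + (0,) * (4 - len(parts))


def mds_mount_opts(base_opts, server_version, min_version):
    """Append 'localrecov' only when the server is at least `min_version`.

    `base_opts` is a comma-separated option string without the leading '-o'.
    """
    opts = [opt for opt in base_opts.split(",") if opt]
    new_enough = parse_version(server_version) >= parse_version(min_version)
    if new_enough and "localrecov" not in opts:
        opts.append("localrecov")
    return ",".join(opts)


# "2.13.0" is a placeholder cutoff chosen only for the example.
print(mds_mount_opts("user_xattr", "2.14.0", min_version="2.13.0"))
print(mds_mount_opts("user_xattr", "2.12.4", min_version="2.13.0"))
```

The point of the sketch is the ordering: the option string can only be finalized once the server version is known, which is why the setting has to move later in the setup.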

Comment by Andreas Dilger [ 02/Apr/20 ]

Alex, any thoughts on this? The 38022 patch is failing consistently due to moving the "-o localrecov" flag definition, so it is definitely the fault of the patch.

Comment by Andreas Dilger [ 06/Apr/20 ]

This looks to be only failing for the LU-13379 patch.

Comment by Gerrit Updater [ 06/Apr/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38138
Subject: LU-13402 target: never exclude MDT from last_rcvd
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7962e141c6c6468527b9c329eeaa81b2f8a8f325

Comment by Gerrit Updater [ 01/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38138/
Subject: LU-13402 target: never exclude MDT/OST from last_rcvd
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6682b74280ed778f8668f942c808eb70ed7bc67f

Comment by Peter Jones [ 01/May/20 ]

Landed for 2.14
