[LU-9875] conf-sanity test 70e fails with 'start mdt1 failed' Created: 13/Aug/17  Updated: 26/Aug/19  Resolved: 26/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: test
Environment:

Separate MGS and MDS or MGT and MDT


Issue Links:
Related
is related to LU-10717 several conf-sanity tests failed: FAI... Resolved
is related to LU-8688 All Lustre test suites should run/PAS... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_70e fails with the following error when run on a Lustre file system whose MGT is separate from the MDT:

conf-sanity test_70e: @@@@@@ FAIL: start mdt1 failed

The test_log shows that the MDT cannot be mounted after it has been formatted:

Starting mds1:   /dev/lvm-Role_MDS/P2 /mnt/lustre-mds1
CMD: onyx-44vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P2 /mnt/lustre-mds1
onyx-44vm7: mount.lustre: mount /dev/mapper/lvm--Role_MDS-P2 at /mnt/lustre-mds1 failed: Address already in use
onyx-44vm7: The target service's index is already in use. (/dev/mapper/lvm--Role_MDS-P2)
Start of /dev/lvm-Role_MDS/P2 on mds1 failed 98

In the MGS dmesg log, we see:

[10988.278664] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P2 /mnt/lustre-mds1
[10988.392026] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[10988.518785] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[10988.568637] LustreError: 140-5: Server lustre-MDT0000 requested index 0, but that index is already in use. Use --writeconf to force
[10988.569912] LustreError: 9140:0:(mgs_handler.c:537:mgs_target_reg()) Failed to write lustre-MDT0000 log (-98)
[10988.575542] LustreError: 15f-b: lustre-MDT0000: cannot register this server with the MGS: rc = -98. Is the MGS running?
[10988.592411] LustreError: 27288:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -98
[10988.593547] LustreError: 27288:0:(obd_mount_server.c:1576:server_put_super()) no obd lustre-MDT0000
[10988.594456] LustreError: 27288:0:(obd_mount_server.c:135:server_deregister_mount()) lustre-MDT0000 not registered
[10988.657486] LustreError: 27288:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-98)

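The MGS still remembers that an MDT with index 0 was already registered for the existing file system, so it refuses to let the newly formatted MDT use index 0. The registration fails with rc = -98 (-EADDRINUSE on Linux, "Address already in use"), which is the error that mount.lustre reports.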
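To make this failure mode concrete, below is a toy Python model of an MGS that tracks which MDT indices it has seen for each file system name. It is a hypothetical illustration only: `ToyMGS` and `register_mdt` are made-up names, this is not Lustre's `mgs_target_reg()`, and the `writeconf` argument is a simplified stand-in for the `--writeconf` hint in the error message (the real option regenerates configuration logs rather than setting a flag).

```python
import errno
import os


class ToyMGS:
    """Hypothetical, simplified model of MGS index bookkeeping.

    NOT Lustre code: it only shows why a freshly formatted MDT that asks
    for an index the MGS already knows about is refused.
    """

    def __init__(self):
        # fsname -> set of MDT indices this (separate) MGS has registered
        self.mdt_indices = {}

    def register_mdt(self, fsname, index, writeconf=False):
        """Return 0 on success or a negative errno, kernel-style."""
        used = self.mdt_indices.setdefault(fsname, set())
        if index in used and not writeconf:
            # Analogue of "requested index 0, but that index is already in use"
            return -errno.EADDRINUSE
        used.add(index)
        return 0


if __name__ == "__main__":
    mgs = ToyMGS()
    # First file system: MDT index 0 registers normally.
    print(mgs.register_mdt("lustre", 0))
    # The MDT is reformatted, but the separate MGS keeps its old state,
    # so registering index 0 again is refused.
    rc = mgs.register_mdt("lustre", 0)
    print(rc, os.strerror(-rc))
    # The modelled writeconf override lets the registration through.
    print(mgs.register_mdt("lustre", 0, writeconf=True))
```

The key point the model captures is that the MGS's state outlives the MDT's reformat when the two are on separate devices, so the second registration collides with the first. On Linux, `errno.EADDRINUSE` is 98, matching the `-98` seen in the logs.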

Test sessions with logs for this failure are at:
https://testing.hpdd.intel.com/test_sets/c59d75f4-7e5c-11e7-b716-5254006e85c2
https://testing.hpdd.intel.com/test_sets/131fc342-7efb-11e7-9785-5254006e85c2



 Comments   
Comment by Andreas Dilger [ 26/Aug/19 ]

This appears to have been fixed by the landing of patch https://review.whamcloud.com/33589 "LU-10717 tests: tests should not start mgs".

Generated at Sat Feb 10 02:30:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.