[LU-9200] sanity-lfsck: failed to mount OST Created: 09/Mar/17  Updated: 03/Mar/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.5, Lustre 2.10.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

Please provide additional information about the failure here.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f11eec1a-f8f8-11e6-aac4-5254006e85c2.

Server and client: lustre-master tag-2.9.53, el7, zfs

There is no error log. This failure prevents sanity-lfsck from running at all, and it also affects the following two test suites.

Starting ost1:   lustre-ost1/ost1 /mnt/lustre-ost1
CMD: trevis-36vm8 mkdir -p /mnt/lustre-ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/lustre-ost1
trevis-36vm8: mount.lustre: mount lustre-ost1/ost1 at /mnt/lustre-ost1 failed: Cannot send after transport endpoint shutdown
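
"Cannot send after transport endpoint shutdown" is -ESHUTDOWN (errno 108), which mount.lustre reports when the connection toward the MGS is already gone. A minimal sketch of re-checking the failing step by hand on the OSS (the commands are the same ones the test framework runs in its debug markers above; nothing here is specific to this ticket):

# Is the OST pool imported and its target label intact?
zpool list -H lustre-ost1
zfs get -H -o value lustre:svname lustre-ost1/ost1
# Are any obd devices left over from a previous, uncleanly stopped target?
lctl dl
# Retry the mount by hand:
mkdir -p /mnt/lustre-ost1
mount -t lustre lustre-ost1/ost1 /mnt/lustre-ost1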


 Comments   
Comment by Andreas Dilger [ 15/Mar/17 ]

There isn't much in the logs. It looks like a service from the previous test may still have been running because it was not unmounted cleanly, and this prevented the new OST mount from succeeding. Will see if this happens again.

Comment by James Nunez (Inactive) [ 15/Feb/18 ]

We have seen this failure several times since this ticket was opened. A recent example of this failure is at
https://testing.hpdd.intel.com/test_sets/874a01ea-111f-11e8-a7cd-52540065bddc

In the OSS console log, we see the mount fail:

[  274.852821] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1
[  275.205895] Lustre: DEBUG MARKER: lsmod | grep zfs >&/dev/null || modprobe zfs;
[  275.205895] 			zpool list -H lustre-ost1 >/dev/null 2>&1 ||
[  275.205895] 			zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_OSS lustre-ost1
[  275.688869] Lustre: DEBUG MARKER: zfs get -H -o value 						lustre:svname lustre-ost1/ost1
[  276.020363] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/lustre-ost1
[  284.401047] LustreError: 15f-b: lustre-OST0000: cannot register this server with the MGS: rc = -108. Is the MGS running?
[  284.403459] LustreError: 9448:0:(obd_mount_server.c:1934:server_fill_super()) Unable to start targets: -108
[  284.405669] LustreError: 9448:0:(obd_mount_server.c:1586:server_put_super()) no obd lustre-OST0000
[  284.408071] LustreError: 9448:0:(obd_mount_server.c:132:server_deregister_mount()) lustre-OST0000 not registered
[  284.410875] Lustre: server umount lustre-OST0000 complete
[  284.412717] LustreError: 9448:0:(obd_mount.c:1583:lustre_fill_super()) Unable to mount  (-108)

In this case, lustre-initialization ran right before sanity-lfsck failed to mount OST0000.
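
The rc = -108 above is -ESHUTDOWN, the same "Cannot send after transport endpoint shutdown" from the original report, and the 15f-b message points at the MGS. A quick way to check both ends by hand (a sketch only; it assumes the MGS is co-located with the MDS, as is usual in these test configurations, and <mgs_nid> is a placeholder for the real NID):

# On the MDS/MGS node: is the MGS obd device present and set up?
lctl dl | grep -i mgs
# On the OSS: can LNet still reach the MGS?
lctl ping <mgs_nid>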

Comment by James Nunez (Inactive) [ 20/Feb/18 ]

What looks like another instance of this, with Ubuntu clients, is at https://testing.hpdd.intel.com/test_sets/49dd5b22-12b9-11e8-a6ad-52540065bddc. In this case, the OSS console log has an interesting message about the obd reference count:

[  242.176504] Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 2. Is it stuck?
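
If this is the same problem, the previous target's unmount is stalling because references on the obd are still held by unlinked exports. One way to see which exports are still pinning the target while the unmount waits (the parameter paths are an assumption for a 2.x OSS and would need checking against the running version):

# NIDs/UUIDs of exports the OST still knows about:
lctl get_param obdfilter.lustre-OST0000.exports.*.uuid
# Total export count for the target:
lctl get_param obdfilter.lustre-OST0000.num_exports
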
Comment by Alex Zhuravlev [ 06/Mar/18 ]

AFAICS, I'm hitting this very frequently. For example: https://testing.hpdd.intel.com/test_sessions/bdf41958-f133-4ab6-a190-a7d9cdfd785f

Comment by James Nunez (Inactive) [ 14/Aug/18 ]

For Lustre 2.10.5 RC1, we see this OST mount issue cause sanity-lfsck to fail without running any tests, and we also see the next three test suites (sanityn, sanity-hsm, sanity-lsnapshot) fail due to this issue.

An example of this is at https://testing.whamcloud.com/test_sessions/1e4b0840-3c0f-4c74-8d51-0005c0b498c8
