[LU-9200] sanity-lfsck: failed to mount OST Created: 09/Mar/17 Updated: 03/Mar/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.5, Lustre 2.10.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

Please provide additional information about the failure here.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f11eec1a-f8f8-11e6-aac4-5254006e85c2.

server and client: lustre-master tag-2.9.53, el7, zfs

There is no error log. This failure prevents sanity-lfsck from running at all, and it also affects the following two test suites.

Starting ost1: lustre-ost1/ost1 /mnt/lustre-ost1
CMD: trevis-36vm8 mkdir -p /mnt/lustre-ost1; mount -t lustre lustre-ost1/ost1 /mnt/lustre-ost1
trevis-36vm8: mount.lustre: mount lustre-ost1/ost1 at /mnt/lustre-ost1 failed: Cannot send after transport endpoint shutdown |
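The "Cannot send after transport endpoint shutdown" message corresponds to the rc = -108 seen in later logs. A minimal sketch to decode it from the standard Linux errno table (the python3 one-liner is illustrative only, not part of the Lustre tooling, and assumes a Linux errno layout):

```shell
# Look up errno 108 in the standard Linux errno table; the mount
# failure's rc = -108 is the negated form of this code.
python3 -c 'import errno, os; print(errno.errorcode[108], "-", os.strerror(108))'
# prints: ESHUTDOWN - Cannot send after transport endpoint shutdown
```

This confirms the mount failure and the later "cannot register this server with the MGS: rc = -108" messages are the same ESHUTDOWN condition surfacing at different layers.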
| Comments |
| Comment by Andreas Dilger [ 15/Mar/17 ] |
|
There isn't much in the logs. It looks like a service may still have been running from the previous test that was not unmounted cleanly, and this caused the new OST mount to fail. Will see if this happens again. |
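If a target from the previous test really was left mounted, that state is visible in /proc/mounts before the remount is attempted. A minimal sketch of such a check (the `is_mounted` helper is hypothetical, not part of the Lustre test framework):

```shell
#!/bin/sh
# Hypothetical helper: report whether anything is still mounted at a
# given mount point, by scanning /proc/mounts. Fields in /proc/mounts
# are space-separated: device mountpoint fstype options ...
is_mounted() {
    grep -q " $1 " /proc/mounts
}

# Illustration: before re-mounting the OST, detect a leftover mount
# from the previous test run and unmount it first.
if is_mounted /mnt/lustre-ost1; then
    echo "leftover mount at /mnt/lustre-ost1, unmounting first"
    umount /mnt/lustre-ost1
fi
```

A check along these lines would distinguish "stale local mount" from the other suspected cause here, an MGS that is unreachable at registration time.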
| Comment by James Nunez (Inactive) [ 15/Feb/18 ] |
|
We have seen this failure several times since this ticket was opened. In the OSS console log of a recent example, we see the mount fail:

[ 274.852821] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1
[ 275.205895] Lustre: DEBUG MARKER: lsmod | grep zfs >&/dev/null || modprobe zfs;
[ 275.205895] zpool list -H lustre-ost1 >/dev/null 2>&1 ||
[ 275.205895] zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_OSS lustre-ost1
[ 275.688869] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-ost1/ost1
[ 276.020363] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1; mount -t lustre lustre-ost1/ost1 /mnt/lustre-ost1
[ 284.401047] LustreError: 15f-b: lustre-OST0000: cannot register this server with the MGS: rc = -108. Is the MGS running?
[ 284.403459] LustreError: 9448:0:(obd_mount_server.c:1934:server_fill_super()) Unable to start targets: -108
[ 284.405669] LustreError: 9448:0:(obd_mount_server.c:1586:server_put_super()) no obd lustre-OST0000
[ 284.408071] LustreError: 9448:0:(obd_mount_server.c:132:server_deregister_mount()) lustre-OST0000 not registered
[ 284.410875] Lustre: server umount lustre-OST0000 complete
[ 284.412717] LustreError: 9448:0:(obd_mount.c:1583:lustre_fill_super()) Unable to mount (-108)

In this case, lustre-initialization took place right before sanity-lfsck failed mounting OST0. |
| Comment by James Nunez (Inactive) [ 20/Feb/18 ] |
|
There's what looks like another instance of this with Ubuntu clients at https://testing.hpdd.intel.com/test_sets/49dd5b22-12b9-11e8-a6ad-52540065bddc. In this case, the console log for the OSS has an interesting message about the obd reference count:

[ 242.176504] Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 2. Is it stuck? |
| Comment by Alex Zhuravlev [ 06/Mar/18 ] |
|
AFAICS, I'm getting this very frequently. For example, https://testing.hpdd.intel.com/test_sessions/bdf41958-f133-4ab6-a190-a7d9cdfd785f |
| Comment by James Nunez (Inactive) [ 14/Aug/18 ] |
|
For Lustre 2.10.5 RC1, we see the OST mount failure cause sanity-lfsck to fail without running any tests, and we also see the next three test suites (sanityn, sanity-hsm, sanity-lsnapshot) fail due to this issue. An example of this is at https://testing.whamcloud.com/test_sessions/1e4b0840-3c0f-4c74-8d51-0005c0b498c8 |