Details
Bug
Resolution: Cannot Reproduce
Minor
Lustre 2.9.0
autotest review-dne-part-2
3
Description
replay-single test_87 fails with 'Restart of ost1 failed!'.
From the test_log, we see that after the failover of the OST, the mount of ost1 fails:
Failover ost1 to trevis-4vm3
10:12:00 (1464603120) waiting for trevis-4vm3 network 900 secs ...
10:12:00 (1464603120) network interface is UP
CMD: trevis-4vm3 hostname
mount facets: ost1
CMD: trevis-4vm3 test -b /dev/lvm-Role_OSS/P1
CMD: trevis-4vm3 e2label /dev/lvm-Role_OSS/P1
Starting ost1: /dev/lvm-Role_OSS/P1 /mnt/ost1
CMD: trevis-4vm3 mkdir -p /mnt/ost1; mount -t lustre /dev/lvm-Role_OSS/P1 /mnt/ost1
trevis-4vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P1 at /mnt/ost1 failed: No such file or directory
trevis-4vm3: Is the MGS specification correct?
trevis-4vm3: Is the filesystem name correct?
trevis-4vm3: If upgrading, is the copied client log valid? (see upgrade docs)
Start of /dev/lvm-Role_OSS/P1 on ost1 failed 2
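The number in parentheses in the test_log lines above is a Unix epoch timestamp. As a minimal sketch for correlating the test_log with other logs, it can be converted to a calendar date as shown below. Interpreting the value as UTC is an assumption; the helper name `epoch_to_utc` is hypothetical and not part of the test framework.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Convert a Unix epoch timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Timestamp copied from the "waiting for trevis-4vm3 network" line.
stamp = epoch_to_utc(1464603120)
print(stamp.isoformat())  # 2016-05-30T10:12:00+00:00
```

The result is consistent with the 10:12:00 wall-clock value printed on the same log line and with the 2016-05-30 run in the list of failures.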
This issue has occurred 4 times in the past two weeks. The logs are not consistent across all failures, but in two of the failures the OST console shows:
18:17:53:[ 5498.685473] Lustre: Failing over lustre-OST0000
18:17:53:[ 5498.697452] Removing read-only on unknown block (0xfc00000)
18:17:53:[ 5498.707991] Lustre: server umount lustre-OST0000 complete
18:17:53:[ 5498.869364] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
18:17:53:[ 5499.538728] LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 10.2.4.216@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
18:17:53:[ 5499.548875] LustreError: Skipped 7 previous similar messages
18:17:53:[ 5509.256355] Lustre: DEBUG MARKER: hostname
18:17:53:[ 5509.645443] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
18:17:53:[ 5509.955019] Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre /dev/lvm-Role_OSS/P1 /mnt/ost1
18:17:53:[ 5510.316293] LDISKFS-fs (dm-0): recovery complete
18:17:53:[ 5510.334090] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache
18:17:53:[ 5510.375257] LustreError: 26044:0:(llog_osd.c:256:llog_osd_read_header()) lustre-OST0000-osd: bad log lustre-client [0xa:0x74:0x0] header magic: 0x2083300 (expected 0x10645539)
18:17:53:[ 5510.385449] LustreError: 26044:0:(mgc_request.c:1741:mgc_llog_local_copy()) MGC10.2.4.216@tcp: failed to copy remote log lustre-client: rc = -5
18:17:53:[ 5510.389682] LustreError: 13a-8: Failed to get MGS log lustre-client and no local copy.
18:17:53:[ 5510.391675] LustreError: 15c-8: MGC10.2.4.216@tcp: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
18:17:53:[ 5510.397556] LustreError: 26044:0:(obd_mount_server.c:1326:server_start_targets()) lustre-OST0000: failed to start LWP: -2
18:17:53:[ 5510.399960] LustreError: 26044:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -2
18:17:53:[ 5510.402275] Lustre: Failing over lustre-OST0000
18:17:53:[ 5510.447063] Lustre: server umount lustre-OST0000 complete
18:17:53:[ 5510.452435] LustreError: 26044:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-2)
18:17:53:[ 5510.807253] Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-single test_87: @@@@@@ FAIL: Restart of ost1 failed!
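The return codes in the console output are negative errno values: -5 is EIO and -2 is ENOENT, the latter being what mount.lustre reports as "No such file or directory". The sketch below, which may help when scanning the other failures for the same signature, pulls the found and expected llog header magic out of the `llog_osd_read_header()` message and decodes the return codes. The helper names `parse_magic` and `errno_name` are hypothetical, and the regular expression assumes the message format shown in this log.

```python
import errno
import os
import re

# Console message copied from the OST log above (timestamp prefix omitted).
LINE = ("LustreError: 26044:0:(llog_osd.c:256:llog_osd_read_header()) "
        "lustre-OST0000-osd: bad log lustre-client [0xa:0x74:0x0] "
        "header magic: 0x2083300 (expected 0x10645539)")

# Assumed message format: "header magic: <found> (expected <expected>)".
MAGIC_RE = re.compile(
    r"header magic: (0x[0-9a-fA-F]+) \(expected (0x[0-9a-fA-F]+)\)")

def parse_magic(line):
    """Return (found, expected) magic values as ints, or None if absent."""
    match = MAGIC_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)

def errno_name(rc):
    """Map a negative kernel-style return code to its symbolic errno name."""
    return errno.errorcode.get(abs(rc), "UNKNOWN")

found, expected = parse_magic(LINE)
print(f"magic found={found:#x} expected={expected:#x} match={found == expected}")

# Return codes seen in the log: rc = -5 and (-2).
for rc in (-5, -2):
    print(rc, errno_name(rc), os.strerror(abs(rc)))
```

Read this way, the log suggests that the local copy of the lustre-client configuration log had a corrupt header (EIO on copy), after which the mount failed with ENOENT because no valid local copy was available.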
All failures are on review-dne-part-2.
The following links to logs are all the failures seen to date:
2016-05-16 19:50:47 - 2.8.52.70.g49cd5fd - https://testing.hpdd.intel.com/test_sets/4ad651a2-1c0a-11e6-855a-5254006e85c2
2016-05-25 23:55:13 - 2.8.51.3.ge28e633 - https://testing.hpdd.intel.com/test_sets/3b82c9ec-2326-11e6-a8f9-5254006e85c2
2016-05-30 08:47:44 - 2.8.53.38.gd685fc5 - https://testing.hpdd.intel.com/test_sets/20dd7d0e-2690-11e6-ab39-5254006e85c2
2016-05-31 18:40:42 - 2.8.53.28.g3c47b56 - https://testing.hpdd.intel.com/test_sets/a2e361f2-27b1-11e6-aac3-5254006e85c2