Lustre / LU-8228

replay-single test 87 fails with 'Restart of ost1 failed!'


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0
    • Component/s: None
    • Labels: autotest review-dne-part-2
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      replay-single test_87 fails with

       'Restart of ost1 failed!' 
      

      From the test log, we see that the OST mount fails after failover:

      Failover ost1 to trevis-4vm3
      10:12:00 (1464603120) waiting for trevis-4vm3 network 900 secs ...
      10:12:00 (1464603120) network interface is UP
      CMD: trevis-4vm3 hostname
      mount facets: ost1
      CMD: trevis-4vm3 test -b /dev/lvm-Role_OSS/P1
      CMD: trevis-4vm3 e2label /dev/lvm-Role_OSS/P1
      Starting ost1:   /dev/lvm-Role_OSS/P1 /mnt/ost1
      CMD: trevis-4vm3 mkdir -p /mnt/ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/ost1
      trevis-4vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P1 at /mnt/ost1 failed: No such file or directory
      trevis-4vm3: Is the MGS specification correct?
      trevis-4vm3: Is the filesystem name correct?
      trevis-4vm3: If upgrading, is the copied client log valid? (see upgrade docs)
      Start of /dev/lvm-Role_OSS/P1 on ost1 failed 2
      

      This issue has occurred four times in the past two weeks. The logs are not consistent across all failures, but on two of them the OST console shows:

      18:17:53:[ 5498.685473] Lustre: Failing over lustre-OST0000
      18:17:53:[ 5498.697452] Removing read-only on unknown block (0xfc00000)
      18:17:53:[ 5498.707991] Lustre: server umount lustre-OST0000 complete
      18:17:53:[ 5498.869364] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      18:17:53:[ 5499.538728] LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 10.2.4.216@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      18:17:53:[ 5499.548875] LustreError: Skipped 7 previous similar messages
      18:17:53:[ 5509.256355] Lustre: DEBUG MARKER: hostname
      18:17:53:[ 5509.645443] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
      18:17:53:[ 5509.955019] Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/ost1
      18:17:53:[ 5510.316293] LDISKFS-fs (dm-0): recovery complete
      18:17:53:[ 5510.334090] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache
      18:17:53:[ 5510.375257] LustreError: 26044:0:(llog_osd.c:256:llog_osd_read_header()) lustre-OST0000-osd: bad log lustre-client [0xa:0x74:0x0] header magic: 0x2083300 (expected 0x10645539)
      18:17:53:[ 5510.385449] LustreError: 26044:0:(mgc_request.c:1741:mgc_llog_local_copy()) MGC10.2.4.216@tcp: failed to copy remote log lustre-client: rc = -5
      18:17:53:[ 5510.389682] LustreError: 13a-8: Failed to get MGS log lustre-client and no local copy.
      18:17:53:[ 5510.391675] LustreError: 15c-8: MGC10.2.4.216@tcp: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      18:17:53:[ 5510.397556] LustreError: 26044:0:(obd_mount_server.c:1326:server_start_targets()) lustre-OST0000: failed to start LWP: -2
      18:17:53:[ 5510.399960] LustreError: 26044:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -2
      18:17:53:[ 5510.402275] Lustre: Failing over lustre-OST0000
      18:17:53:[ 5510.447063] Lustre: server umount lustre-OST0000 complete
      18:17:53:[ 5510.452435] LustreError: 26044:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-2)
      18:17:53:[ 5510.807253] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_87: @@@@@@ FAIL: Restart of ost1 failed! 
      

      All failures are on review-dne-part-2.

      The following log links cover all failures seen to date:
      2016-05-16 19:50:47 - 2.8.52.70.g49cd5fd - https://testing.hpdd.intel.com/test_sets/4ad651a2-1c0a-11e6-855a-5254006e85c2
      2016-05-25 23:55:13 - 2.8.51.3.ge28e633 - https://testing.hpdd.intel.com/test_sets/3b82c9ec-2326-11e6-a8f9-5254006e85c2
      2016-05-30 08:47:44 - 2.8.53.38.gd685fc5 - https://testing.hpdd.intel.com/test_sets/20dd7d0e-2690-11e6-ab39-5254006e85c2
      2016-05-31 18:40:42 - 2.8.53.28.g3c47b56 - https://testing.hpdd.intel.com/test_sets/a2e361f2-27b1-11e6-aac3-5254006e85c2


          People

            Assignee: Mikhail Pershin (tappro)
            Reporter: James Nunez (jamesanunez) (Inactive)
            Votes: 0
            Watchers: 3
