LU-1362: replay-dual test_16 fails to remount mdt

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.1
    • Component/s: None
    • Severity: 3
    • Rank: 6409

    Description

      replay-dual test_16 failed:

      == replay-dual test 16: fail MDS during recovery (3571) == 17:38:52 (1335573532)
      Filesystem 1K-blocks Used Available Use% Mounted on
      service360@o2ib:/lustre
      3937056 205112 3531816 6% /mnt/nbp0-1
      total: 25 creates in 0.04 seconds: 678.21 creates/second
      total: 1 creates in 0.00 seconds: 389.26 creates/second
      Failing mds1 on node service360
      Stopping /mnt/mds1 (opts
      affected facets: mds1
      Failover mds1 to service360
      17:39:07 (1335573547) waiting for service360 network 900 secs ...
      17:39:07 (1335573547) network interface is UP
      Starting mds1: -o errors=panic,acl /dev/sdb1 /mnt/mds1
      service360: mount.lustre: mount /dev/sdb1 at /mnt/mds1 failed: Invalid argument
      service360: This may have multiple causes.
      service360: Are the mount options correct?
      service360: Check the syslog for more info.
      mount -t lustre /dev/sdb1 /mnt/mds1
      Start of /dev/sdb1 on mds1 failed 22
      replay-dual test_16: @@@@@@ FAIL: Restart of mds1 failed!
      Dumping lctl log to /var/acc-sm/test_logs//1335573120/replay-dual.test_16.*.1335573548.log
      tar: Removing leading `/' from member names
      /var/acc-sm/test_logs//1335573120/replay-dual-1335573548.tar.bz2
      FAIL 16 (45s)

      The "Invalid argument" was about extents, which we do not turn on on MDS.

      The replay-dual.test_16.dmesg.service360.1335573548.log seemed to suggest
      the problem was filesystem corruption:
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts:
      LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
      LDISKFS-fs (sdb1): ldiskfs_check_descriptors: Checksum for group 0 failed (27004!=29265)
      LDISKFS-fs (sdb1): group descriptors corrupted!
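
      If the group descriptors really are corrupted on disk, a forced read-only
      fsck should report it with Lustre out of the picture (a sketch, assuming
      Lustre's patched e2fsprogs and an unmounted /dev/sdb1):

      # Full check, answering "no" to every repair prompt (read-only):
      e2fsck -fn /dev/sdb1
      # If the primary superblock area is damaged, retry with a backup
      # superblock (32768 is the usual location on 4k-block filesystems):
      e2fsck -fn -b 32768 /dev/sdb1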

      LU-699 seemed to have encountered a data corruption problem in replay-dual test_1. I applied its patch and rebuilt the Lustre server package, but the test still failed.

      REPLAY_DUAL-16.tgz is attached.

      The failure is 100% reproducible.

      Could the data corruption be caused by trying to fail over the MDS to the
      same node? In other words, is this a test-case problem or a real problem?
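
      The log line "Failover mds1 to service360" shows the framework restarting
      the MDS on its primary node, which is what happens when no failover
      partner is configured. For comparison, a sketch of how a separate
      failover node would be declared in the test configuration (following the
      test-framework ${facet}failover_HOST convention; service361 is a
      hypothetical second server):

      # e.g. in the local test config sourced by the test suites:
      mds1_HOST=service360
      mds1failover_HOST=service361   # fail mds1 over to a different node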

      Attachments

        1. REPLAY_DUAL-16.tgz
          6.43 MB
        2. replay-dual-14b.tar.bz2
          4.56 MB
        3. replay-dual-16.tar.bz2
          1.66 MB
        4. replay-dual-16.tar.bz2
          9.87 MB

          People

            Assignee: Lai Siyao (laisiyao)
            Reporter: Jay Lan (jaylan, Inactive)
