[LU-5404] sanity test_228b FAIL: Fail to start MDT. Created: 24/Jul/14 Updated: 14/Dec/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | before upgrade: 2.5.2 ldiskfs |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 15038 |
| Description |
|
After a rolling upgrade of the OSS and MDS from 2.5.2 to b2_6-rc2, with two clients still on 2.5.2, running sanity hit the following error. The same error occurs if only the OSS is upgraded to b2_6-rc2 and all the other nodes (MDS, clients) stay on 2.5.2.

test console:

== sanity test 228b: idle OI blocks can be reused after MDT restart == 19:18:39 (1406168319)
Lustre: DEBUG MARKER: == sanity test 228b: idle OI blocks can be reused after MDT restart == 19:18:39 (1406168319)
fail_loc=0x80001002
Lustre: *** cfs_fail_loc=1002, val=0***
total: 10000 creates in 15.96 seconds: 626.67 creates/second
fail_loc=0
onyx-26: debugfs 1.42.9.wc1 (24-Feb-2014)
onyx-26: /dev/sdb1: catastrophic mode - not reading inode or group bitmaps
- unlinked 0 (time 1406168342 ; total 0 ; last 0)
total: 10000 unlinks in 19 seconds: 526.315796 unlinks/second
Starting mds: -o user_xattr,acl /dev/sdb1 /mnt/mds
onyx-26: mount.lustre: mount /dev/sdb1 at /mnt/mds failed: Operation already in progress
onyx-26: The target service is already running. (/dev/sdb1)
Start of /dev/sdb1 on mds failed 114
sanity test_228b: @@@@@@ FAIL: Fail to start MDT.
Lustre: DEBUG MARKER: sanity test_228b: @@@@@@ FAIL: Fail to start MDT.
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4374:error()
= sanity.sh:11773:test_228b()
= /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
= sanity.sh:11787:main()
Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_228b.*.1406168362.log
FAIL 228b (51s) |
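The "Operation already in progress" failure (errno 114, EALREADY) indicates that the remount races with teardown of the previous mount. Below is a minimal sketch of the stop/start pattern that triggers it, with a retry-on-busy workaround; the device and mountpoint are taken from the console log above, but the retry loop itself is illustrative, not the actual sanity.sh code:

    #!/bin/bash
    # Illustrative only: retry the mount until the prior unmount has
    # fully released the superblock. /dev/sdb1 and /mnt/mds come from
    # the console log above; the retry policy is an assumption.
    MDSDEV=/dev/sdb1
    MDSMNT=/mnt/mds

    umount $MDSMNT
    for i in $(seq 1 10); do
        # "Operation already in progress" (114) means the old target
        # is still shutting down, so wait briefly and try again
        mount -t lustre -o user_xattr,acl $MDSDEV $MDSMNT && break
        echo "mount of $MDSDEV busy, retry $i"
        sleep 1
    done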
| Comments |
| Comment by Sarah Liu [ 24/Jul/14 ] |
|
logs |
| Comment by Andreas Dilger [ 24/Jul/14 ] |
|
I don't think this problem is related specifically to the upgrade; rather, the MDS is being unmounted and then quickly mounted again, and the MDS still seems to be cleaning something up internally, which keeps the MDS mountpoint busy for a short time. I've seen this problem in local testing with sanity.sh test_17o, which also does stop mds; start mds in quick succession. It may also be that sanity test_160a could fail in the same manner. What needs to be done is to get the Lustre debug logs to see what is still happening with the MDS mountpoint between when the unmount syscall completes and when the superblock is finally released internally. |
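One way to capture the window described above is to dump the Lustre kernel debug buffer immediately after the unmount returns, and again if the remount fails. A sketch using lctl; the debug mask and dump file paths are assumptions, not from this ticket:

    #!/bin/bash
    # Illustrative only: record what the MDS is still doing between the
    # unmount syscall returning and the superblock being released.
    lctl set_param debug=-1          # enable all debug flags (assumed acceptable in a test run)
    lctl clear                       # start from an empty debug buffer
    umount /mnt/mds                  # returns before internal cleanup finishes
    lctl dk /tmp/lu5404-unmount.dk   # dump: post-unmount cleanup activity
    mount -t lustre /dev/sdb1 /mnt/mds ||
        lctl dk /tmp/lu5404-mount-fail.dk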
| Comment by Andreas Dilger [ 08/Dec/14 ] |
|
The MGS restart issue I referred to in my previous comment is |