[LU-5404] sanity test_228b FAIL: Fail to start MDT. Created: 24/Jul/14  Updated: 14/Dec/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

before upgrade: 2.5.2 ldiskfs
after upgrade:
server: b2_6-rc2
clients: 2.5.2


Attachments: File 5404.tgz    
Issue Links:
Related
is related to LU-5420 Failure on test suite sanity test_17m... Resolved
Severity: 3
Rank (Obsolete): 15038

 Description   

After a rolling upgrade of the OSS and MDS from 2.5.2 to b2_6-rc2, with the two clients still on 2.5.2, running sanity hits the following error.

The same error occurs if only the OSS is upgraded to b2_6-rc2 while all the other nodes (MDS, clients) stay on 2.5.2.
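
For reference, a hedged sketch of re-running just this subtest on the upgraded system; the ONLY= selection is the standard Lustre test-framework convention, but the paths below are assumptions:

  # Run only sanity subtest 228b from the installed test suite
  cd /usr/lib64/lustre/tests
  ONLY=228b sh sanity.sh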

test console

== sanity test 228b: idle OI blocks can be reused after MDT restart == 19:18:39 (1406168319)
Lustre: DEBUG MARKER: == sanity test 228b: idle OI blocks can be reused after MDT restart == 19:18:39 (1406168319)
fail_loc=0x80001002
Lustre: *** cfs_fail_loc=1002, val=0***
total: 10000 creates in 15.96 seconds: 626.67 creates/second
fail_loc=0
onyx-26: debugfs 1.42.9.wc1 (24-Feb-2014)
onyx-26: /dev/sdb1: catastrophic mode - not reading inode or group bitmaps
 - unlinked 0 (time 1406168342 ; total 0 ; last 0)
total: 10000 unlinks in 19 seconds: 526.315796 unlinks/second
Starting mds: -o user_xattr,acl  /dev/sdb1 /mnt/mds
onyx-26: mount.lustre: mount /dev/sdb1 at /mnt/mds failed: Operation already in progress
onyx-26: The target service is already running. (/dev/sdb1)
Start of /dev/sdb1 on mds failed 114
 sanity test_228b: @@@@@@ FAIL: Fail to start MDT. 
Lustre: DEBUG MARKER: sanity test_228b: @@@@@@ FAIL: Fail to start MDT.
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4374:error()
  = sanity.sh:11773:test_228b()
  = /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
  = sanity.sh:11787:main()
Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_228b.*.1406168362.log
FAIL 228b (51s)
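
The error code 114 in "Start of /dev/sdb1 on mds failed 114" is EALREADY ("Operation already in progress"), matching the mount.lustre message above. A quick way to confirm the mapping on a typical Linux node (the header path is an assumption about the distro layout):

  # EALREADY is defined as 114 in the asm-generic errno header
  grep -w EALREADY /usr/include/asm-generic/errno.h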


 Comments   
Comment by Sarah Liu [ 24/Jul/14 ]

logs

Comment by Andreas Dilger [ 24/Jul/14 ]

I don't think this problem is related specifically to the upgrade; rather, the MDS is being unmounted and quickly mounted again. It appears the MDS is still cleaning something up internally, which keeps the MDS mount point busy for a short time. I've seen this problem in local testing with sanity.sh test_17o, which also does "stop mds; start mds" in quick succession. sanity test_160a may fail in the same manner.
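
A purely illustrative workaround along those lines would be to retry the mount until the old target finishes tearing down; the device, mount point, and options below are taken from the log above, while the retry loop itself is an assumption, not something the test framework does:

  # Retry the MDT mount while the previous instance is still being cleaned up.
  for i in $(seq 1 10); do
      mount -t lustre -o user_xattr,acl /dev/sdb1 /mnt/mds && break
      # "Operation already in progress" (EALREADY, 114) means the old target
      # is still registered; wait briefly and try again.
      sleep 2
  done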

What needs to be done is to get the Lustre debug logs to see what is still happening with the MDS mountpoint between when the unmount syscall is completed and when the superblock is finally released internally.
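
A minimal sketch of collecting those logs around the unmount/mount window on the MDS node; the lctl subcommands (set_param, clear, dk) are standard, but the debug mask and output path are assumptions:

  lctl set_param debug=-1                  # enable full Lustre debug logging
  lctl clear                               # drop the current debug buffer
  umount /mnt/mds                          # stop the MDT
  mount -t lustre -o user_xattr,acl /dev/sdb1 /mnt/mds   # immediate restart
  lctl dk > /tmp/mdt-restart-debug.log     # dump the debug buffer for analysis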

Comment by Andreas Dilger [ 08/Dec/14 ]

The MGS restart issue I referred to in my previous comment is LU-5420.
