[LU-5404] sanity test_228b FAIL: Fail to start MDT.

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.6.0
    • Environment:
      before upgrade: 2.5.2 ldiskfs
      after upgrade:
        server: b2_6-rc2
        clients: 2.5.2
    • Severity: 3

    Description

      After a rolling upgrade of the OSS and MDS from 2.5.2 to b2_6-rc2, with the two clients still on 2.5.2, running sanity hit the following error.

      The same error occurs if only the OSS is upgraded to b2_6-rc2 and all other nodes (MDS, clients) stay at 2.5.2.

      Test console output:

      == sanity test 228b: idle OI blocks can be reused after MDT restart == 19:18:39 (1406168319)
      Lustre: DEBUG MARKER: == sanity test 228b: idle OI blocks can be reused after MDT restart == 19:18:39 (1406168319)
      fail_loc=0x80001002
      Lustre: *** cfs_fail_loc=1002, val=0***
      total: 10000 creates in 15.96 seconds: 626.67 creates/second
      fail_loc=0
      onyx-26: debugfs 1.42.9.wc1 (24-Feb-2014)
      onyx-26: /dev/sdb1: catastrophic mode - not reading inode or group bitmaps
       - unlinked 0 (time 1406168342 ; total 0 ; last 0)
      total: 10000 unlinks in 19 seconds: 526.315796 unlinks/second
      Starting mds: -o user_xattr,acl  /dev/sdb1 /mnt/mds
      onyx-26: mount.lustre: mount /dev/sdb1 at /mnt/mds failed: Operation already in progress
      onyx-26: The target service is already running. (/dev/sdb1)
      Start of /dev/sdb1 on mds failed 114
       sanity test_228b: @@@@@@ FAIL: Fail to start MDT. 
      Lustre: DEBUG MARKER: sanity test_228b: @@@@@@ FAIL: Fail to start MDT.
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:4374:error()
        = sanity.sh:11773:test_228b()
        = /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
        = sanity.sh:11787:main()
      Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_228b.*.1406168362.log
      FAIL 228b (51s)
      

          Activity

            Andreas Dilger added a comment -

            The MGS restart issue I referred to in my previous comment is LU-5420.

            Andreas Dilger added a comment -

            I don't think this problem is specific to the upgrade. Rather, the MDS is being unmounted and then quickly mounted again, and the MDS appears to still be cleaning something up internally, which keeps the MDS mountpoint busy for a short time. I've seen this problem in local testing with sanity.sh test_17o, which also does "stop mds; start mds" in quick succession. sanity test_160a may fail in the same manner.

            What needs to be done is to get the Lustre debug logs to see what is still happening with the MDS mountpoint between when the unmount syscall completes and when the superblock is finally released internally.
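            In the failing run the MDT start failed with status 114 ("Start of /dev/sdb1 on mds failed 114"), which on Linux is EALREADY, matching the "Operation already in progress" message from mount.lustre. Until the root cause is found, a test script could tolerate the short busy window by retrying the start only on that status. The sketch below is a hypothetical helper (retry_on_ealready is not part of test-framework.sh) and assumes the command's exit status carries the errno as in the log above. It papers over the race rather than explaining it, so it does not replace collecting the Lustre debug logs.

```shell
# Hypothetical helper, not part of test-framework.sh.
# Usage: retry_on_ealready MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND while it exits with 114 (EALREADY on Linux),
# up to MAX_TRIES attempts, sleeping DELAY_SECONDS between attempts.
retry_on_ealready() {
    local max_tries=$1 delay=$2
    shift 2
    local try rc=1
    for ((try = 1; try <= max_tries; try++)); do
        rc=0
        # "|| rc=$?" keeps this safe under "set -e" and captures the status.
        "$@" || rc=$?
        # Success or any error other than EALREADY is final: return it as-is.
        if [ "$rc" -ne 114 ]; then
            return "$rc"
        fi
        echo "attempt $try/$max_tries: target busy (rc=114), retrying in ${delay}s" >&2
        sleep "$delay"
    done
    # Still busy after MAX_TRIES attempts: report the last status (114).
    return "$rc"
}
```

            For example, "retry_on_ealready 10 1 mount -t lustre -o user_xattr,acl /dev/sdb1 /mnt/mds" would retry the MDT mount from the log up to 10 times, one second apart. Any other non-zero status is returned immediately, so genuine mount failures still fail the test.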
            Sarah Liu added a comment -

            logs

            People

              Assignee: WC Triage
              Reporter: Sarah Liu
