Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.8.0
    • None
    • 3
    • 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sessions/e089506c-5bf0-11e5-9dac-5254006e85c2

      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
      Lustre: Skipped 79 previous similar messages
      Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
      Lustre: Skipped 26 previous similar messages
      Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
      Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
      Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
      Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
      Lustre: DEBUG MARKER: sync; sync; sync
      Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
      Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
      LustreError: 14276:0:(osd_handler.c:1380:osd_ro()) *** setting lustre-MDT0000 read-only ***
      Turning device dm-0 (0xfd00000) read-only
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
      Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0x20000709 fail_val=5
      Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      Lustre: DEBUG MARKER: umount -d /mnt/mds1
      Lustre: Failing over lustre-MDT0000
      Removing read-only on unknown block (0xfd00000)
      Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      Lustre: DEBUG MARKER: hostname
      Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
      Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o recovery_time_hard=60,recovery_time_soft=60                                   /dev/lvm-Role_MDS/P1 /mnt/mds1
      LDISKFS-fs (dm-0): recovery complete
      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023
      Aborting journal on device dm-0-8.
      LDISKFS-fs (dm-0): Remounting filesystem read-only
      LDISKFS-fs error (device dm-0): ldiskfs_put_super: Couldn't clean up the journal
      LustreError: 14732:0:(obd_config.c:575:class_setup()) setup lustre-MDT0000-osd failed (-30)
      LustreError: 14732:0:(obd_mount.c:203:lustre_start_simple()) lustre-MDT0000-osd setup error -30
      LustreError: 14732:0:(obd_mount_server.c:1760:server_fill_super()) Unable to start osd on /dev/mapper/lvm--Role_MDS-P1: -30
      LustreError: 14732:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-30)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_84: @@@@@@ FAIL: Restart of mds1 failed!
      

      Looks the filesystem is corrupted somehow.

      Attachments

        Issue Links

          Activity

            [LU-7169] conf-sanity 84 restart mds1 failed
            pjones Peter Jones added a comment -

            believed to be a duplicate of LU-7428

            pjones Peter Jones added a comment - believed to be a duplicate of LU-7428
            adilger Andreas Dilger added a comment - - edited

            The Free blocks/inodes count wrong errors do not indicate any kind of problem. These fields in the superblock are not updated during normal usage, so after replay_barrier() they will be out of date.

            I thought about putting the sync_all_data(); sleep 5; call inside replay_barrier() but I didn't want to slow down any of the other tests that use this function that don't immediately follow reformatting the filesystem.

            We'll see if this patch actually fixes the problem.

            adilger Andreas Dilger added a comment - - edited The Free blocks/inodes count wrong errors do not indicate any kind of problem. These fields in the superblock are not updated during normal usage, so after replay_barrier() they will be out of date. I thought about putting the sync_all_data(); sleep 5; call inside replay_barrier() but I didn't want to slow down any of the other tests that use this function that don't immediately follow reformatting the filesystem. We'll see if this patch actually fixes the problem.

            The patch from Andreas:
            Gerrit Updater added a comment - Yesterday
            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/17371
            Subject: LU-7428 tests: write superblock in conf-sanity test_84
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d3e516e59697f7c48e5cb97054ee04cf97dc7132

            yong.fan nasf (Inactive) added a comment - The patch from Andreas: Gerrit Updater added a comment - Yesterday Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/17371 Subject: LU-7428 tests: write superblock in conf-sanity test_84 Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: d3e516e59697f7c48e5cb97054ee04cf97dc7132
            yong.fan nasf (Inactive) added a comment - - edited

            You are right. LU-7428 is the duplication of this ticket. According to the latest test logs (marked as LU-7428)
            https://testing.hpdd.intel.com/sub_tests/15602932-9477-11e5-a5ac-5254006e85c2

            The former test_83 reformat the system after test_83 done, then move to test_84, after the replay_barrier(), the e2fsck found some inconsistency:

            Warning: skipping journal recovery because doing a read-only filesystem check.
            Pass 1: Checking inodes, blocks, and sizes
            Pass 1: Memory used: 292k/0k (86k/207k), time:  0.01/ 0.00/ 0.00
            Pass 1: I/O read: 1MB, write: 0MB, rate: 104.44MB/s
            Pass 2: Checking directory structure
            Pass 2: Memory used: 292k/0k (98k/195k), time:  0.02/ 0.00/ 0.01
            Pass 2: I/O read: 1MB, write: 0MB, rate: 47.50MB/s
            Pass 3: Checking directory connectivity
            Peak memory: Memory used: 292k/0k (98k/195k), time:  0.04/ 0.01/ 0.01
            Pass 3: Memory used: 292k/0k (96k/197k), time:  0.00/ 0.00/ 0.00
            Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
            Pass 4: Checking reference counts
            Pass 4: Memory used: 292k/0k (62k/231k), time:  0.00/ 0.00/ 0.00
            Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
            Pass 5: Checking group summary information
            Free blocks count wrong (33296, counted=32960).
            Fix? no
            
            Free inodes count wrong (99987, counted=99754).
            Fix? no
            
            Pass 5: Memory used: 292k/0k (61k/232k), time:  0.01/ 0.01/ 0.00
            ...
            

            It is reasonable to suspect that the replay_barrier() marked the system as readonly too early before super block synced to disk. But seems the replay_barrier() has considered that:

            replay_barrier() {
                    local facet=$1
                    do_facet $facet "sync; sync; sync"
                    df $MOUNT
            ..
            

            So seems the "sync" does not work. As for your patch http://review.whamcloud.com/#/c/17371/, it should be helpful. But why not make the sync_all_data to be inside replay_barrier() to replace the 'do_facet $facet "sync; sync; sync"', then it can help others.

            yong.fan nasf (Inactive) added a comment - - edited You are right. LU-7428 is the duplication of this ticket. According to the latest test logs (marked as LU-7428 ) https://testing.hpdd.intel.com/sub_tests/15602932-9477-11e5-a5ac-5254006e85c2 The former test_83 reformat the system after test_83 done, then move to test_84, after the replay_barrier(), the e2fsck found some inconsistency: Warning: skipping journal recovery because doing a read-only filesystem check. Pass 1: Checking inodes, blocks, and sizes Pass 1: Memory used: 292k/0k (86k/207k), time: 0.01/ 0.00/ 0.00 Pass 1: I/O read: 1MB, write: 0MB, rate: 104.44MB/s Pass 2: Checking directory structure Pass 2: Memory used: 292k/0k (98k/195k), time: 0.02/ 0.00/ 0.01 Pass 2: I/O read: 1MB, write: 0MB, rate: 47.50MB/s Pass 3: Checking directory connectivity Peak memory: Memory used: 292k/0k (98k/195k), time: 0.04/ 0.01/ 0.01 Pass 3: Memory used: 292k/0k (96k/197k), time: 0.00/ 0.00/ 0.00 Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s Pass 4: Checking reference counts Pass 4: Memory used: 292k/0k (62k/231k), time: 0.00/ 0.00/ 0.00 Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s Pass 5: Checking group summary information Free blocks count wrong (33296, counted=32960). Fix? no Free inodes count wrong (99987, counted=99754). Fix? no Pass 5: Memory used: 292k/0k (61k/232k), time: 0.01/ 0.01/ 0.00 ... It is reasonable to suspect that the replay_barrier() marked the system as readonly too early before super block synced to disk. But seems the replay_barrier() has considered that: replay_barrier() { local facet=$1 do_facet $facet "sync; sync; sync" df $MOUNT .. So seems the "sync" does not work. As for your patch http://review.whamcloud.com/#/c/17371/ , it should be helpful. But why not make the sync_all_data to be inside replay_barrier() to replace the 'do_facet $facet "sync; sync; sync"', then it can help others.

            There are many, many failures of this test, but unfortunately they have all been assigned different bugs because the error messages are different.

            In the tests I've seen, the e2fsck run is clean, except for the superblock inside and block counts, which is expected.

            I pushed a patch under LU-7428 that may fix the problem, which I think is caused by test_84() setting the MDS read-only right after mount, and that is causing some of the recently written data to be discarded (e.g. superblock label, llog records, etc). Unfortunately, it will take a few days to be tested.

            adilger Andreas Dilger added a comment - There are many, many failures of this test, but unfortunately they have all been assigned different bugs because the error messages are different. In the tests I've seen, the e2fsck run is clean, except for the superblock inside and block counts, which is expected. I pushed a patch under LU-7428 that may fix the problem, which I think is caused by test_84() setting the MDS read-only right after mount, and that is causing some of the recently written data to be discarded (e.g. superblock label, llog records, etc). Unfortunately, it will take a few days to be tested.

            Right, we are still waiting for more failure instances after landing the patch.

            yong.fan nasf (Inactive) added a comment - Right, we are still waiting for more failure instances after landing the patch.

            The landed patch was only for debugging. This issue is not resolved.

            adilger Andreas Dilger added a comment - The landed patch was only for debugging. This issue is not resolved.

            Landed for 2.8

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16664/
            Subject: LU-7169 tests: check disk corruption during failover
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f84e06eead85de5cd7832855bab5ff72a542e971

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16664/ Subject: LU-7169 tests: check disk corruption during failover Project: fs/lustre-release Branch: master Current Patch Set: Commit: f84e06eead85de5cd7832855bab5ff72a542e971
            yong.fan nasf (Inactive) added a comment - - edited

            We hit the trouble with the patch applied, the log shows that there is really some disk inconsistency as following:
            https://testing.hpdd.intel.com/test_sets/e6f060ac-8707-11e5-bf92-5254006e85c2

            CMD: onyx-44vm3 e2fsck -d -v -t -t -f -n /dev/lvm-Role_MDS/P1
            onyx-44vm3: e2fsck 1.42.13.wc3 (28-Aug-2015)
            Warning: skipping journal recovery because doing a read-only filesystem check.
            Pass 1: Checking inodes, blocks, and sizes
            Pass 1: Memory used: 292k/0k (87k/206k), time:  0.00/ 0.00/ 0.00
            Pass 1: I/O read: 1MB, write: 0MB, rate: 553.40MB/s
            Pass 2: Checking directory structure
            Pass 2: Memory used: 292k/0k (98k/195k), time:  0.00/ 0.00/ 0.00
            Pass 2: I/O read: 1MB, write: 0MB, rate: 655.74MB/s
            Pass 3: Checking directory connectivity
            Peak memory: Memory used: 292k/0k (98k/195k), time:  0.01/ 0.00/ 0.00
            Pass 3: Memory used: 292k/0k (96k/197k), time:  0.00/ 0.00/ 0.00
            Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
            Pass 4: Checking reference counts
            Pass 4: Memory used: 292k/0k (62k/231k), time:  0.00/ 0.00/ 0.00
            Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
            Pass 5: Checking group summary information
            Free blocks count wrong (33296, counted=32947).
            Fix? no
            
            Free inodes count wrong (99987, counted=99750).
            Fix? no
            
            Pass 5: Memory used: 292k/0k (62k/231k), time:  0.00/ 0.00/ 0.00
            Pass 5: I/O read: 1MB, write: 0MB, rate: 333.78MB/s
            
                      13 inodes used (0.01%, out of 100000)
                       6 non-contiguous files (46.2%)
                       0 non-contiguous directories (0.0%)
                         # of inodes with ind/dind/tind blocks: 0/0/0
                   16704 blocks used (33.41%, out of 50000)
                       0 bad blocks
                       1 large file
            
                     125 regular files
                     116 directories
                       0 character device files
                       0 block device files
                       0 fifos
                       0 links
                       0 symbolic links (0 fast symbolic links)
                       0 sockets
            ------------
                     241 files
            Memory used: 292k/0k (61k/232k), time:  0.01/ 0.01/ 0.00
            I/O read: 1MB, write: 0MB, rate: 92.42MB/s
            reboot facets: mds1
            

            Such inconsistency does not means the super block crashed. Because without disk checksum, the e2fsck cannot detect per-block based data corruption. Only with above logs, we cannot say it is the root reason for test_84 failure. I will update the patch with more debug information.

            yong.fan nasf (Inactive) added a comment - - edited We hit the trouble with the patch applied, the log shows that there is really some disk inconsistency as following: https://testing.hpdd.intel.com/test_sets/e6f060ac-8707-11e5-bf92-5254006e85c2 CMD: onyx-44vm3 e2fsck -d -v -t -t -f -n /dev/lvm-Role_MDS/P1 onyx-44vm3: e2fsck 1.42.13.wc3 (28-Aug-2015) Warning: skipping journal recovery because doing a read-only filesystem check. Pass 1: Checking inodes, blocks, and sizes Pass 1: Memory used: 292k/0k (87k/206k), time: 0.00/ 0.00/ 0.00 Pass 1: I/O read: 1MB, write: 0MB, rate: 553.40MB/s Pass 2: Checking directory structure Pass 2: Memory used: 292k/0k (98k/195k), time: 0.00/ 0.00/ 0.00 Pass 2: I/O read: 1MB, write: 0MB, rate: 655.74MB/s Pass 3: Checking directory connectivity Peak memory: Memory used: 292k/0k (98k/195k), time: 0.01/ 0.00/ 0.00 Pass 3: Memory used: 292k/0k (96k/197k), time: 0.00/ 0.00/ 0.00 Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s Pass 4: Checking reference counts Pass 4: Memory used: 292k/0k (62k/231k), time: 0.00/ 0.00/ 0.00 Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s Pass 5: Checking group summary information Free blocks count wrong (33296, counted=32947). Fix? no Free inodes count wrong (99987, counted=99750). Fix? no Pass 5: Memory used: 292k/0k (62k/231k), time: 0.00/ 0.00/ 0.00 Pass 5: I/O read: 1MB, write: 0MB, rate: 333.78MB/s 13 inodes used (0.01%, out of 100000) 6 non-contiguous files (46.2%) 0 non-contiguous directories (0.0%) # of inodes with ind/dind/tind blocks: 0/0/0 16704 blocks used (33.41%, out of 50000) 0 bad blocks 1 large file 125 regular files 116 directories 0 character device files 0 block device files 0 fifos 0 links 0 symbolic links (0 fast symbolic links) 0 sockets ------------ 241 files Memory used: 292k/0k (61k/232k), time: 0.01/ 0.01/ 0.00 I/O read: 1MB, write: 0MB, rate: 92.42MB/s reboot facets: mds1 Such inconsistency does not means the super block crashed. Because without disk checksum, the e2fsck cannot detect per-block based data corruption. Only with above logs, we cannot say it is the root reason for test_84 failure. I will update the patch with more debug information.

            People

              yong.fan nasf (Inactive)
              niu Niu Yawei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: