Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Severity: 3

    Description

      https://testing.hpdd.intel.com/test_sessions/e089506c-5bf0-11e5-9dac-5254006e85c2

      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
      Lustre: Skipped 79 previous similar messages
      Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
      Lustre: Skipped 26 previous similar messages
      Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
      Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
      Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
      Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
      Lustre: DEBUG MARKER: sync; sync; sync
      Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
      Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
      LustreError: 14276:0:(osd_handler.c:1380:osd_ro()) *** setting lustre-MDT0000 read-only ***
      Turning device dm-0 (0xfd00000) read-only
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
      Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0x20000709 fail_val=5
      Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      Lustre: DEBUG MARKER: umount -d /mnt/mds1
      Lustre: Failing over lustre-MDT0000
      Removing read-only on unknown block (0xfd00000)
      Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      Lustre: DEBUG MARKER: hostname
      Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
      Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o recovery_time_hard=60,recovery_time_soft=60                                   /dev/lvm-Role_MDS/P1 /mnt/mds1
      LDISKFS-fs (dm-0): recovery complete
      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
      LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023
      Aborting journal on device dm-0-8.
      LDISKFS-fs (dm-0): Remounting filesystem read-only
      LDISKFS-fs error (device dm-0): ldiskfs_put_super: Couldn't clean up the journal
      LustreError: 14732:0:(obd_config.c:575:class_setup()) setup lustre-MDT0000-osd failed (-30)
      LustreError: 14732:0:(obd_mount.c:203:lustre_start_simple()) lustre-MDT0000-osd setup error -30
      LustreError: 14732:0:(obd_mount_server.c:1760:server_fill_super()) Unable to start osd on /dev/mapper/lvm--Role_MDS-P1: -30
      LustreError: 14732:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-30)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_84: @@@@@@ FAIL: Restart of mds1 failed!
      

      It looks like the filesystem is corrupted somehow.
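
      A quick way to confirm on-disk corruption after such a failure is a read-only filesystem check of the MDT backing device (a sketch based on the e2fsck invocation used later in this ticket; the device path is from this test setup):

        # -f forces a full check, -n answers "no" to every prompt, so this
        # makes no changes and is safe to run on the failed target for triage.
        e2fsck -f -n /dev/lvm-Role_MDS/P1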

    Activity

            [LU-7169] conf-sanity 84 restart mds1 failed

            adilger Andreas Dilger added a comment -

            There are many, many failures of this test, but unfortunately they have all been assigned to different bugs because the error messages are different.

            In the tests I've seen, the e2fsck run is clean except for the superblock free inode and block counts, which is expected.

            I pushed a patch under LU-7428 that may fix the problem, which I think is caused by test_84() setting the MDS read-only right after mount, causing some of the recently written data to be discarded (e.g. the superblock label, llog records, etc.). Unfortunately, it will take a few days to be tested.

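            For context, the read-only "replay barrier" that test_84 applies is visible in the console log above; a sketch of the equivalent manual sequence (device names taken from this test setup) is:

                # Flush dirty data, stop committing new transactions, then force
                # the MDT backing device read-only so a failover replays from the
                # last committed state.
                sync; sync; sync
                lctl --device lustre-MDT0000 notransno
                lctl --device lustre-MDT0000 readonly
                lctl mark "mds1 REPLAY BARRIER on lustre-MDT0000"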

            yong.fan nasf (Inactive) added a comment -

            Right, we are still waiting for more failure instances after landing the patch.


            adilger Andreas Dilger added a comment -

            The landed patch was only for debugging. This issue is not resolved.


            jgmitter Joseph Gmitter (Inactive) added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16664/
            Subject: LU-7169 tests: check disk corruption during failover
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f84e06eead85de5cd7832855bab5ff72a542e971

            yong.fan nasf (Inactive) added a comment - edited

            We hit the problem with the patch applied; the log shows that there really is some disk inconsistency, as follows:
            https://testing.hpdd.intel.com/test_sets/e6f060ac-8707-11e5-bf92-5254006e85c2

            CMD: onyx-44vm3 e2fsck -d -v -t -t -f -n /dev/lvm-Role_MDS/P1
            onyx-44vm3: e2fsck 1.42.13.wc3 (28-Aug-2015)
            Warning: skipping journal recovery because doing a read-only filesystem check.
            Pass 1: Checking inodes, blocks, and sizes
            Pass 1: Memory used: 292k/0k (87k/206k), time:  0.00/ 0.00/ 0.00
            Pass 1: I/O read: 1MB, write: 0MB, rate: 553.40MB/s
            Pass 2: Checking directory structure
            Pass 2: Memory used: 292k/0k (98k/195k), time:  0.00/ 0.00/ 0.00
            Pass 2: I/O read: 1MB, write: 0MB, rate: 655.74MB/s
            Pass 3: Checking directory connectivity
            Peak memory: Memory used: 292k/0k (98k/195k), time:  0.01/ 0.00/ 0.00
            Pass 3: Memory used: 292k/0k (96k/197k), time:  0.00/ 0.00/ 0.00
            Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
            Pass 4: Checking reference counts
            Pass 4: Memory used: 292k/0k (62k/231k), time:  0.00/ 0.00/ 0.00
            Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
            Pass 5: Checking group summary information
            Free blocks count wrong (33296, counted=32947).
            Fix? no
            
            Free inodes count wrong (99987, counted=99750).
            Fix? no
            
            Pass 5: Memory used: 292k/0k (62k/231k), time:  0.00/ 0.00/ 0.00
            Pass 5: I/O read: 1MB, write: 0MB, rate: 333.78MB/s
            
                      13 inodes used (0.01%, out of 100000)
                       6 non-contiguous files (46.2%)
                       0 non-contiguous directories (0.0%)
                         # of inodes with ind/dind/tind blocks: 0/0/0
                   16704 blocks used (33.41%, out of 50000)
                       0 bad blocks
                       1 large file
            
                     125 regular files
                     116 directories
                       0 character device files
                       0 block device files
                       0 fifos
                       0 links
                       0 symbolic links (0 fast symbolic links)
                       0 sockets
            ------------
                     241 files
            Memory used: 292k/0k (61k/232k), time:  0.01/ 0.01/ 0.00
            I/O read: 1MB, write: 0MB, rate: 92.42MB/s
            reboot facets: mds1
            

            Such inconsistency does not mean the super block was corrupted, because without disk checksums e2fsck cannot detect per-block data corruption. From these logs alone we cannot say this is the root cause of the test_84 failure. I will update the patch with more debug information.
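
            A clean e2fsck only shows that the metadata bookkeeping is self-consistent; it says nothing about the contents of individual blocks unless checksumming is enabled. One way to see which checksum-related features the backing device has (a sketch, assuming standard e2fsprogs; the device path is from this test setup):

                # Print the superblock feature flags; look for uninit_bg or
                # metadata_csum in the "Filesystem features" line.
                dumpe2fs -h /dev/lvm-Role_MDS/P1 | grep -i features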

            pjones Peter Jones added a comment -

            Excellent. Thanks Fan Yong.


            yong.fan nasf (Inactive) added a comment -

            The issue can no longer be reproduced with the debug patch applied, even after several repeats of the failed test case. So I suggest landing the patch on master to give it a chance in normal Maloo runs. I will update the patch to drop "fortestonly" and ask for landing permission.

            pjones Peter Jones added a comment -

            Fan Yong

            What are the next steps here? It looks like the debug patch did not trigger the failure. Should we land the debug patch, or does it just need a higher number of runs to improve the chances of hitting the failure?

            Thanks

            Peter


            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16664
            Subject: LU-7169 tests: check disk corruption during failover
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: aa1a896b1fddadce8b95b2c9af6c4a8509535a19


            yong.fan nasf (Inactive) added a comment -

            It is NOT the same as LU-6895. The direct reason for the failure is the following:

            LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023
            

            ldiskfs_lookup() found a deleted inode; its error handler then set the filesystem read-only, which caused cascading failures for all subsequent operations.
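
            The inode number from that error can be inspected directly on the unmounted backing device (a sketch; debugfs is part of e2fsprogs, and 75023 is the inode reported above):

                # Dump the on-disk state of the inode that ldiskfs_lookup flagged;
                # a links count of 0 together with a set deletion time is what
                # "deleted inode referenced" means.
                debugfs -R "stat <75023>" /dev/lvm-Role_MDS/P1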

            In fact, the filesystem was already corrupted before the ldiskfs_lookup() failure. Here is the MDS debug log:

            00100000:00000001:0.0:1442342253.462893:0:14732:0:(osd_scrub.c:2423:osd_scrub_setup()) Process entered
            00080000:00000001:0.0:1442342253.462936:0:14732:0:(osd_handler.c:2401:osd_ea_fid_set()) Process entered
            00080000:00000001:0.0:1442342253.462944:0:14732:0:(osd_handler.c:2429:osd_ea_fid_set()) Process leaving (rc=0 : 0 : 0)
            00100000:10000000:0.0:1442342253.463278:0:14732:0:(osd_scrub.c:277:osd_scrub_file_reset()) lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x2
            

            The related source code is as follows:

            int osd_scrub_setup(const struct lu_env *env, struct osd_device *dev)
            {
            ...
                    } else {
                            if (memcmp(sf->sf_uuid, es->s_uuid, 16) != 0) {
                                    osd_scrub_file_reset(scrub, es->s_uuid, SF_INCONSISTENT);
                                    dirty = 1;
            ...
            }
            

            The log shows that during the MDT0000 mount, osd_scrub_setup() found that the super block's UUID had changed. Usually that only happens after an MDT file-level backup/restore, but in our conf-sanity test cases there were no backup/restore operations. So the local filesystem must have been corrupted during the MDT failover. As for what caused the super block corruption, I have no idea yet.
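
            For reference, the on-disk UUID that osd_scrub_setup() compares against the copy saved in the OI scrub file can be read off the superblock like this (a sketch; tune2fs is part of e2fsprogs, and the device path is from this test setup):

                # Print the current ldiskfs superblock UUID; osd_scrub_setup()
                # resets the OI scrub file with SF_INCONSISTENT when this value
                # no longer matches the one it recorded earlier.
                tune2fs -l /dev/lvm-Role_MDS/P1 | grep UUID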


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: niu Niu Yawei (Inactive)
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved: