[LU-12068] sanity-lfsck test_6b: (7.2) 0x0 is not larger than 0x0

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.2
    • Lustre 2.13.0, Lustre 2.12.1

    Description

      This issue was created by maloo for wangshilong <wshilong@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/36dcb5b6-45d2-11e9-9646-52540065bddc

      test_6b failed with the following error:

      (7.2) 0x0 is not larger than 0x0
      


      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lfsck test_6b - (7.2) 0x0 is not larger than 0x0

          Activity

            mdiep Minh Diep added a comment -

            Landed in 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34525/
            Subject: LU-12068 test: compare position for ZFS dot entry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 42adbae36f206a6ed4170e7619cd993c8fa80b1d

            mdiep Minh Diep added a comment - +1 on b2_12 https://testing.whamcloud.com/test_sets/eb96e502-5523-11e9-9646-52540065bddc
            hongchao.zhang Hongchao Zhang added a comment - edited

            Hi Andreas,

            Yes, there is one failure on 2019-02-20 and another on 2019-02-25. Both are on branch "master-next",
            version "2.12.51.85". Is it possible that this version contains some other patches?

            https://testing.whamcloud.com/test_sessions/ae657cad-8359-4ff5-a42a-9b0e496a025b
            https://testing.whamcloud.com/test_sessions/b49849ff-483d-4982-84c5-92721cae1afe


            adilger Andreas Dilger added a comment -

            The patch https://review.whamcloud.com/34098 "LU-11330 osd-zfs: hash for ./.. must be 0" only landed on 2019-02-27, while there were a few tests failing on 2019-02-20 to 2019-02-25, so it is close to the first date this problem was seen, but not exactly the same. However, most of the tests started failing after 2019-02-27, so it is possible there are a couple of different issues here, and LU-11330 made the problem much worse. There aren't any cases where this test failed during the testing of LU-11330, but it is definitely not being hit on ldiskfs, so this is the likely cause of most of these failures.

            hongchao.zhang Hongchao Zhang added a comment -

            I managed to reproduce the issue locally. The cause is that the LFSCK process is stopped just as it is scanning the "." or ".." entry
            of a directory; on ZFS, the stored position of both entries is zero.

            static __u64 osd_dir_it_store(const struct lu_env *env, const struct dt_it *di)
            {
                    struct osd_zap_it *it = (struct osd_zap_it *)di;
                    __u64              pos;
                    ENTRY;

                    /* positions up to and including ".." are all stored as 0 */
                    if (it->ozi_pos <= OZI_POS_DOTDOT)
                            pos = 0;
                    else
                            pos = osd_zap_cursor_serialize(it->ozi_zc);

                    RETURN(pos);
            }

            The patch is tracked at https://review.whamcloud.com/34525

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34525
            Subject: LU-12068 test: compare position for ZFS dot entry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 617cd0fa4b53453cd88295a7c2265817ebc38f48

            adilger Andreas Dilger added a comment -

            HongChao, could you please look into this issue? It is causing a large number of test failures.

            pfarrell Patrick Farrell (Inactive) added a comment -

            I dug into this for a while yesterday (I found several examples with debug; James's example there is very much representative), and I mostly concluded that I don't understand the lfsck tests very well. It would be good if someone who knows the lfsck architecture could dig in. I was spending most of my time trying to figure out what normal looks like for this.

            jamesanunez James Nunez (Inactive) added a comment -

            We have at least one sanity-lfsck test 6b failure with the debug information at https://testing.whamcloud.com/test_sets/5c532a1a-4fe0-11e9-9720-52540065bddc. The debug info for this test is:

            Additional debug for 6b
            CMD: trevis-12vm8 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
            name: lfsck_namespace
            magic: 0xa06249ff
            version: 2
            status: scanning-phase1
            flags:
            param:
            last_completed_time: 1553561433
            time_since_last_completed: 14 seconds
            latest_start_time: 1553561446
            time_since_latest_start: 1 seconds
            last_checkpoint_time: 1553561444
            time_since_last_checkpoint: 3 seconds
            latest_start_position: 1943, [0x200000405:0x1:0x0], 0x0
            last_checkpoint_position: 1680, [0x200000405:0x1:0x0], 0x0
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 4
            checked_phase2: 0
            updated_phase1: 0
            updated_phase2: 0
            failed_phase1: 0
            failed_phase2: 0
            directories: 2
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 0
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 0
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 0
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 0
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            agent_entries_repaired: 0
            success_count: 11
            run_time_phase1: 7 seconds
            run_time_phase2: 0 seconds
            average_speed_phase1: 0 items/sec
            average_speed_phase2: N/A
            average_speed_total: 0 items/sec
            real_time_speed_phase1: 0 items/sec
            real_time_speed_phase2: N/A
            current_position: 1942, [0x200000405:0x1:0x0], 0x0
            sanity-lfsck test_6b: @@@@@@ FAIL: (7.2) 0x0 is not larger than 0x0
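The position lines in the debug output above can be pulled apart mechanically, which makes it easier to see which field stayed at 0x0. The sketch below is a hypothetical helper, not part of sanity-lfsck or lctl. The field names (`oit`, `seq`, `oid`, `ver`, `cookie`) are an assumed reading of the "number, [FID], number" triple. That the "(7.2)" check compares the third field is inferred from the 0x0 values in the error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One "<n>, [seq:oid:ver], <n>" triple from the lfsck_namespace output. */
struct lfsck_pos_sample {
	unsigned long long oit;    /* first field (assumed: object-table position) */
	unsigned long long seq;    /* FID sequence */
	unsigned int       oid;    /* FID object id */
	unsigned int       ver;    /* FID version */
	unsigned long long cookie; /* third field (assumed: directory position) */
};

/* Returns true if the line held a full triple; "N/A, N/A, N/A" yields false. */
static bool parse_pos_line(const char *line, struct lfsck_pos_sample *out)
{
	assert(line != NULL && out != NULL);
	/* %*[^:] skips the key name; %llx and %x accept the 0x prefix. */
	return sscanf(line, "%*[^:]: %llu, [%llx:%x:%x], %llx",
		      &out->oit, &out->seq, &out->oid, &out->ver,
		      &out->cookie) == 5;
}

/* True when the third field of the restart did not move past the checkpoint. */
static bool cookie_did_not_advance(const struct lfsck_pos_sample *checkpoint,
				   const struct lfsck_pos_sample *restart)
{
	return restart->cookie <= checkpoint->cookie;
}
```

Fed the `latest_start_position` and `last_checkpoint_position` lines from this run, the first field differs (1943 versus 1680) while the third field is 0x0 in both. That is consistent with the "0x0 is not larger than 0x0" message.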

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/34417/
            Subject: LU-12068 tests: add debug for sanity-lfsck test_6b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 698209700faf51c227f9ba16626de5ed70fa97c8

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 8
