sanity-lfsck test_6b: (7.2) 0x0 is not larger than 0x0

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.2
    • Lustre 2.13.0, Lustre 2.12.1
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for wangshilong <wshilong@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/36dcb5b6-45d2-11e9-9646-52540065bddc

      test_6b failed with the following error:

      (7.2) 0x0 is not larger than 0x0
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lfsck test_6b - (7.2) 0x0 is not larger than 0x0

          Activity

            [LU-12068] sanity-lfsck test_6b: (7.2) 0x0 is not larger than 0x0

            I managed to reproduce the issue locally. The cause is that the LFSCK process is stopped just as it is scanning the "." or ".." entry
            of a directory; on ZFS, the position of both of these entries is zero.

            static __u64 osd_dir_it_store(const struct lu_env *env, const struct dt_it *di)
            {       
                    struct osd_zap_it *it = (struct osd_zap_it *)di;
                    __u64              pos;
                    ENTRY;
            
                    if (it->ozi_pos <= OZI_POS_DOTDOT)
                            pos = 0;
                    else
                            pos = osd_zap_cursor_serialize(it->ozi_zc);
                                              
                    RETURN(pos);
            }       
            

            The patch is tracked at https://review.whamcloud.com/34525
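
To illustrate the cause, here is a minimal Python model of the behaviour described above. It is only a sketch and not Lustre code: the names `dir_it_store` and `strictly_larger` are hypothetical, and the numeric values given to `OZI_POS_DOT` and `OZI_POS_DOTDOT` are assumed for illustration.

```python
# Hypothetical model of the position logic quoted above; not Lustre code.
OZI_POS_DOT = 1      # assumed value, for illustration only
OZI_POS_DOTDOT = 2   # assumed value, for illustration only


def dir_it_store(ozi_pos, serialized_cursor):
    """Mirror osd_dir_it_store(): "." and ".." both report position 0."""
    if ozi_pos <= OZI_POS_DOTDOT:
        return 0
    return serialized_cursor


def strictly_larger(current, previous):
    """Strict comparison of the kind behind '(7.2) X is not larger than Y'."""
    return current > previous


# Checkpoint taken while at ".", LFSCK stopped while at "..".
checkpoint_pos = dir_it_store(OZI_POS_DOT, serialized_cursor=0x1234)
stop_pos = dir_it_store(OZI_POS_DOTDOT, serialized_cursor=0x5678)
print(hex(stop_pos), hex(checkpoint_pos), strictly_larger(stop_pos, checkpoint_pos))
```

In this model both positions collapse to 0, so a strict "larger than" comparison cannot succeed whenever the scan is stopped inside the dot entries.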

            hongchao.zhang Hongchao Zhang added a comment

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34525
            Subject: LU-12068 test: compare position for ZFS dot entry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 617cd0fa4b53453cd88295a7c2265817ebc38f48

            gerrit Gerrit Updater added a comment

            HongChao, could you please look into this issue? It is causing a large number of test failures.

            adilger Andreas Dilger added a comment

            I dug into this for a while yesterday (I found several examples with debug; James' example there is very representative), and I mostly concluded that I don't understand the lfsck tests very well. It would be good if someone who knows the lfsck architecture could dig in. I was spending most of my time trying to figure out what normal looks like for this test.

            pfarrell Patrick Farrell (Inactive) added a comment

            We have at least one sanity-lfsck test 6b failure with the debug information at https://testing.whamcloud.com/test_sets/5c532a1a-4fe0-11e9-9720-52540065bddc. The debug info for this test is:

            Additional debug for 6b
            CMD: trevis-12vm8 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
            name: lfsck_namespace
            magic: 0xa06249ff
            version: 2
            status: scanning-phase1
            flags:
            param:
            last_completed_time: 1553561433
            time_since_last_completed: 14 seconds
            latest_start_time: 1553561446
            time_since_latest_start: 1 seconds
            last_checkpoint_time: 1553561444
            time_since_last_checkpoint: 3 seconds
            latest_start_position: 1943, [0x200000405:0x1:0x0], 0x0
            last_checkpoint_position: 1680, [0x200000405:0x1:0x0], 0x0
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 4
            checked_phase2: 0
            updated_phase1: 0
            updated_phase2: 0
            failed_phase1: 0
            failed_phase2: 0
            directories: 2
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 0
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 0
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 0
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 0
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            agent_entries_repaired: 0
            success_count: 11
            run_time_phase1: 7 seconds
            run_time_phase2: 0 seconds
            average_speed_phase1: 0 items/sec
            average_speed_phase2: N/A
            average_speed_total: 0 items/sec
            real_time_speed_phase1: 0 items/sec
            real_time_speed_phase2: N/A
            current_position: 1942, [0x200000405:0x1:0x0], 0x0
             sanity-lfsck test_6b: @@@@@@ FAIL: (7.2) 0x0 is not larger than 0x0 
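
For reference, the position lines in the output above can be split into their three fields as in the following Python sketch. It assumes that the value quoted in the failure message corresponds to the third field of each position; `parse_position` and the field labels are hypothetical and not part of the test suite.

```python
def parse_position(line):
    """Split 'key: N, [fid], 0xM' into (key, first_field, fid, third_field)."""
    # Only the first ':' separates the key; the FID contains colons too.
    key, _, value = line.partition(":")
    first, fid, third = [field.strip() for field in value.split(",")]
    return key.strip(), int(first), fid, int(third, 16)


# Lines copied from the debug output above.
sample = [
    "latest_start_position: 1943, [0x200000405:0x1:0x0], 0x0",
    "last_checkpoint_position: 1680, [0x200000405:0x1:0x0], 0x0",
]

positions = {}
for line in sample:
    key, first, fid, third = parse_position(line)
    positions[key] = (first, fid, third)

start = positions["latest_start_position"]
checkpoint = positions["last_checkpoint_position"]
print("first field advanced:", start[0] > checkpoint[0])
print("third field advanced:", start[2] > checkpoint[2])
```

In this sample the first field advances (1943 against 1680) while the third field is 0x0 in both lines, which matches the values reported in the failure message.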
            
            jamesanunez James Nunez (Inactive) added a comment

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/34417/
            Subject: LU-12068 tests: add debug for sanity-lfsck test_6b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 698209700faf51c227f9ba16626de5ed70fa97c8

            gerrit Gerrit Updater added a comment

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34417
            Subject: LU-12068 tests: add debug for sanity-lfsck test_6b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3a81468d0f0fff77de01e67a479c9ea82878d4c8

            gerrit Gerrit Updater added a comment

            It looks like this has been happening a lot recently, but the first recent occurrence is on 2019-02-20:
            https://testing.whamcloud.com/test_sets/07e0f5ee-3513-11e9-b4f9-52540065bddc

            A number of patches landed on 2019-02-18 and may have triggered this issue, but none of them looks like it would be the cause. The most recent patch that changed LFSCK is LU-11111, but it landed on 2019-02-11.

            adilger Andreas Dilger added a comment - edited

            People

              hongchao.zhang Hongchao Zhang
              maloo Maloo