[LU-3231] fld_cache_lookup() copies fld_cache_entry onto lu_seq_range

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 7896

    Description

      fld_cache_lookup() keeps a prev pointer to a struct fld_cache_entry. But when it uses this pointer to return the previous range, it passes the entry itself to memcpy() as the source instead of the entry's fce_range member, so the leading bytes of the entry are copied onto the range argument.

      int fld_cache_lookup(struct fld_cache *cache,
                           const seqno_t seq, struct lu_seq_range *range)
      {
              struct fld_cache_entry *flde;
              struct fld_cache_entry *prev = NULL;
              cfs_list_t *head;
              ENTRY;
      
              ...
              cfs_list_for_each_entry(flde, head, fce_list) {
                      if (flde->fce_range.lsr_start > seq) {
                              if (prev != NULL)
                                      memcpy(range, prev, sizeof(*range));
                              break;
                      }
              ...
      }
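A minimal sketch of the difference between the reported copy and a copy that takes the range member, for illustration only. The struct definitions are simplified stand-ins (plain `struct list_head` instead of `cfs_list_t`, and an assumed field order for both structs), the helper names `copy_prev_buggy()` and `copy_prev_fixed()` are hypothetical, and the code is not taken from the landed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the Lustre types; the layout is an assumption. */
struct list_head {
        struct list_head *next, *prev;
};

struct lu_seq_range {
        uint64_t lsr_start;
        uint64_t lsr_end;
        uint32_t lsr_index;
        uint32_t lsr_flags;
};

struct fld_cache_entry {
        struct list_head    fce_lru;
        struct list_head    fce_list;
        struct lu_seq_range fce_range;
};

/* Reported behavior: the source is the entry, so the first
 * sizeof(*range) bytes of the entry (its list heads) are copied. */
static void copy_prev_buggy(struct lu_seq_range *range,
                            const struct fld_cache_entry *prev)
{
        assert(range != NULL && prev != NULL);
        memcpy(range, prev, sizeof(*range));
}

/* Intended behavior: copy the range stored inside the entry. */
static void copy_prev_fixed(struct lu_seq_range *range,
                            const struct fld_cache_entry *prev)
{
        assert(range != NULL && prev != NULL);
        *range = prev->fce_range;
}
```

Struct assignment is used in the second helper because the compiler then checks that both sides have the same type, which a memcpy() through pointer arguments cannot do.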
      

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.4

            di.wang Di Wang added a comment -

            I think it will affect non-DNE setups too, when a setup uses up its current meta-sequence (128k seqs) and then tries to get a new seq. This bug should be hit during the sequence range merging in FLD, IMHO. Thanks again, John.

            jhammond John Hammond added a comment -

            Sorry, I spent too much time trying to understand what was going on.

            DNE blocker. I cannot see any evidence that it affects non-DNE setups.

            On the current master (2.3.64-43-g507dc87) I'm using MDSCOUNT=2 MOUNT_2=y llmount.sh to set up. About 1 in 5 times this results in an unusable client mount:

            # DURATION=10 sh ./lustre/tests/racer.sh 
            Logging to shared log directory: /tmp/test_logs/1366998603
            racer: /root/lustre-release/lustre/tests/racer/racer.sh with 2 MDTs
            excepting tests: 
            m: Checking config lustre mounted on /mnt/lustre2
            Checking servers environments
            Checking clients m environments
            m: Checking config lustre mounted on /mnt/lustre
            Checking servers environments
            Checking clients m environments
            Using TIMEOUT=20
            disable quota as required
            RACERDIRS=/mnt/lustre /mnt/lustre2
            
            == racer test 1: racer on clients: m DURATION=10 == 12:50:04 (1366998604)
            racers pids: 18395 18396 18397 18398
            ./dir_remote.sh: line 13: /mnt/lustre/racer/5/3: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre/racer/0/2: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre/racer/5/18: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer/16/5: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer1/7/3: No such file or directory
            ./dir_remote.sh: line 13: /mnt/lustre2/racer/15/19: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer/14/3: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer1/2/9: No such file or directory
            ./dir_remote.sh: line 13: /mnt/lustre/racer1/4/7: No such file or directory
            ./dir_remote.sh: line 13: /mnt/lustre/racer/12/9: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer/10/9: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer/3/16: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer1/3/8: Not a directory
            ./dir_remote.sh: line 13: /mnt/lustre2/racer/3/16: Input/output error
            ./dir_remote.sh: line 13: /mnt/lustre2/racer1/3/19: Not a directory
            ./dir_remote.sh: line 13: /mnt/lustre/racer1/2/5: Not a directory
            ...
            

            Note the requested flag "ffff8801" below is really part of a list_head from the entry:

            Lustre: DEBUG MARKER: == racer test 1: racer on clients: m DURATION=10 == 12:50:04 (1366998604)
            Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400):0:mdt
            Lustre: cli-ctl-lustre-MDT0000-osp-MDT0001: Allocated super-sequence [0x00000002c0000400-0x0000000300000400):1:mdt]
            LustreError: 15736:0:(fld_handler.c:158:fld_server_lookup()) srv-lustre-MDT0000: FLD cache range [0x0000000280000400-0x00000002c0000400):0:mdt does not matchrequested flag ffff8801: rc = -5
            LustreError: 18811:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
            LustreError: 15736:0:(fld_handler.c:158:fld_server_lookup()) srv-lustre-MDT0000: FLD cache range [0x0000000280000400-0x00000002c0000400):0:mdt does not matchrequested flag ffff8801: rc = -5
            LustreError: 18788:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
            LustreError: 15715:0:(mdt_reint.c:332:mdt_md_create()) lustre-MDT0001: remote dir is only permitted on MDT0 or set_param mdt.*.enable_remote_dir=1
            LustreError: 15714:0:(mdt_reint.c:332:mdt_md_create()) lustre-MDT0001: remote dir is only permitted on MDT0 or set_param mdt.*.enable_remote_dir=1
            LustreError: 19275:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
            LustreError: 19275:0:(lmv_fld.c:78:lmv_fld_lookup()) Skipped 284 previous similar messages
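A small sketch of why a value like "ffff8801" can appear as the flag. It assumes a 64-bit little-endian machine and a simplified entry layout (two list heads followed by the range, with `lsr_flags` as the last 32-bit field of the range); the struct definitions and the helper name `flags_after_buggy_copy()` are illustrative, not copied from the Lustre headers. Under those assumptions, bytes 20 to 23 of the entry are the upper half of `fce_list.next`, and the mis-sourced memcpy() places them in `lsr_flags`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the Lustre types; the layout is an assumption. */
struct list_head {
        struct list_head *next, *prev;
};

struct lu_seq_range {
        uint64_t lsr_start;
        uint64_t lsr_end;
        uint32_t lsr_index;
        uint32_t lsr_flags;   /* offset 20 within the range */
};

struct fld_cache_entry {
        struct list_head    fce_lru;    /* offset 0 on a 64-bit build */
        struct list_head    fce_list;   /* offset 16 on a 64-bit build */
        struct lu_seq_range fce_range;
};

/* Return the lsr_flags value a caller would see after copying the
 * start of the entry, rather than entry->fce_range, into a range. */
static uint32_t flags_after_buggy_copy(const struct fld_cache_entry *entry)
{
        struct lu_seq_range range;

        memcpy(&range, entry, sizeof(range));
        return range.lsr_flags;
}
```

With a kernel address whose upper 32 bits are 0xffff8801 stored in `fce_list.next`, the helper returns 0xffff8801 on such a machine, which is consistent with the flag printed by fld_server_lookup() in the log above.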
            

            adilger Andreas Dilger added a comment -

            Di, John, how serious is this bug? Is it crashing, corrupting data, etc.? Does it affect normal usage, or only DNE? Please describe the severity of the problem, and mark it a blocker if so.
            jhammond John Hammond added a comment - Please see http://review.whamcloud.com/6171 .

            People

              jhammond John Hammond
              jhammond John Hammond