Lustre / LU-8414

DNE: Setting of remote_dir_gid parameter not persistent

    Description

      An error occurred during soak testing of build '20160713' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160713). MDSes have been configured using ldiskfs, OSTs using zfs. The test environment consists of 4 MDSes with 1 MDT each and 6 OSSes with 4 OSTs each. MDS and OSS nodes are configured in an active-active HA configuration.
      Roles:

      • lola-8 MGS/MDS
      • lola-[9-11] MDS

      DNE has been enabled using the command sequence (see Lustre manual page 96):

      pdsh -g mds 'lctl set_param mdt.*.enable_remote_dir=1'
      pdsh -g mds 'lctl set_param mdt.*.enable_remote_dir_gid=-1'
      respectively
      pdsh -w lola-8 'lctl set_param -P mdt.*.enable_remote_dir=1'
      pdsh -w lola-8 'lctl set_param -P mdt.*.enable_remote_dir_gid=-1'
      

      (The latter two commands only work on the MGS node.)
      The problem occurs after each of the nodes lola-[9-11] has been restarted or its resources have been failed over / failed back.
      While the parameter 'enable_remote_dir' is persistent on the non-MGS MDSes, the parameter 'enable_remote_dir_gid' isn't.
      Therefore the command:

      [soaktest@lola-16 ~]$ lfs setdirstripe -c 4 -i 1  /mnt/soaked/soaktest/hsm_rbh/
      error on LL_IOC_LMV_SETSTRIPE '/mnt/soaked/soaktest/hsm_rbh/' (3): Operation not permitted
      error: setdirstripe: create stripe dir '/mnt/soaked/soaktest/hsm_rbh/' failed
      
      ---------------
      --> Remote dir setting:
      ----------------
      lola-8
      ----------------
      Remote dir_gid setting soaked-MDT0000: -1
      ----------------
      lola-9
      ----------------
      Remote dir_gid setting soaked-MDT0001: 0
      ----------------
      lola-10
      ----------------
      Remote dir_gid setting soaked-MDT0002: -1
      ----------------
      lola-11
      ----------------
      Remote dir_gid setting soaked-MDT0003: 0
      

      failed. This will break all test (slurm) jobs that rely on this functionality.
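The per-MDT comparison above can be automated. The following is a minimal sketch, assuming the `host: mdt.<fsname>-MDTxxxx.enable_remote_dir_gid=<value>` output format that `pdsh ... 'lctl get_param ...'` prints elsewhere in this ticket. The helper name `find_reset_mdts` is hypothetical and not part of Lustre; the demo input restates the values reported above in that format.

```shell
#!/bin/sh
# Sketch: list MDTs whose enable_remote_dir_gid differs from the expected value.
# find_reset_mdts is a hypothetical helper name, not a Lustre tool.
# Input (stdin): lines such as
#   lola-9: mdt.soaked-MDT0001.enable_remote_dir_gid=0
find_reset_mdts() {
    awk -v want="$1" '
        /enable_remote_dir_gid=/ {
            split($NF, kv, "=")          # kv[1]: parameter path, kv[2]: value
            split(kv[1], parts, "[.]")   # parts[2]: MDT name, e.g. soaked-MDT0001
            if (kv[2] != want) print parts[2]
        }'
}

# Demo with the values reported in this ticket (no Lustre node required).
find_reset_mdts -1 <<'EOF'
lola-8: mdt.soaked-MDT0000.enable_remote_dir_gid=-1
lola-9: mdt.soaked-MDT0001.enable_remote_dir_gid=0
lola-10: mdt.soaked-MDT0002.enable_remote_dir_gid=-1
lola-11: mdt.soaked-MDT0003.enable_remote_dir_gid=0
EOF
```

On a live system the same function could presumably be fed from `pdsh -g mds 'lctl get_param mdt.*.enable_remote_dir_gid'`; an empty result would then mean that no MDT has lost the setting.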

      After setting the parameters on the nodes again, the command

      [soaktest@lola-16 ~]$ lfs setdirstripe -c 4 -i 1  /mnt/soaked/soaktest/hsm_rbh/
      [soaktest@lola-16 ~]$ lfs setdirstripe -c 4 -i 1  -D /mnt/soaked/soaktest/hsm_rbh/
      [soaktest@lola-16 ~]$ lfs getdirstripe  /mnt/soaked/soaktest/hsm_rbh/
      /mnt/soaked/soaktest/hsm_rbh/
      [soaktest@lola-16 ~]$ lfs getdirstripe  /mnt/soaked/soaktest/hsm_rbh/
      /mnt/soaked/soaktest/hsm_rbh/
      lmv_stripe_count: 4 lmv_stripe_offset: 1
      mdtidx           FID[seq:oid:ver]
           1           [0x240007160:0x3:0x0]
           2           [0x28000d714:0x3:0x0]
           3           [0x2c000a810:0x1:0x0]
           0           [0x20000fe01:0x3:0x0]
      

      completes successfully.
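For scripted soak jobs, the success check can be made explicit by comparing `lmv_stripe_count` with the number of `mdtidx` rows in the `lfs getdirstripe` output. This is only a sketch under the assumption that the output looks like the listing above; `check_stripe_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that the number of FID rows matches lmv_stripe_count.
# check_stripe_count is a hypothetical helper; it reads
# `lfs getdirstripe <dir>` output on stdin and prints "ok" or "mismatch".
check_stripe_count() {
    awk '
        /lmv_stripe_count:/ { want = $2 }                        # declared stripe count
        /\[0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+\]/ { rows++ }     # one row per stripe FID
        END { print ((want > 0 && rows == want) ? "ok" : "mismatch") }'
}

# Demo with the listing from this ticket (no Lustre client required).
check_stripe_count <<'EOF'
/mnt/soaked/soaktest/hsm_rbh/
lmv_stripe_count: 4 lmv_stripe_offset: 1
mdtidx           FID[seq:oid:ver]
     1           [0x240007160:0x3:0x0]
     2           [0x28000d714:0x3:0x0]
     3           [0x2c000a810:0x1:0x0]
     0           [0x20000fe01:0x3:0x0]
EOF
```

A directory that prints only its path, as in the first `getdirstripe` call above, yields "mismatch" because no stripe count line is present.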

          Activity

            pjones Peter Jones added a comment -

            Closing as no longer appearing


            cliffw Cliff White (Inactive) added a comment -

            On current tip of master, after 5 MDT failovers, remote dir is persistent

            # lfs setdirstripe -c 4 -i 1 /mnt/soaked/bah
            [root@lola-16 jobs]# lfs getdirstripe /mnt/soaked/bah
            lmv_stripe_count: 4 lmv_stripe_offset: 1
            mdtidx           FID[seq:oid:ver]
                 1           [0x240002b10:0x21ad:0x0]
                 2           [0x280001b74:0x21ad:0x0]
                 3           [0x2c0003ac8:0x21ad:0x0]
                 0           [0x200002b4f:0x21ad:0x0]

            heckes Frank Heckes (Inactive) added a comment -

            Done (LUDOC-355).
            I just wonder whether there is a use case in which customers would like to enable only a subset of the available remote MDTs. The fix enables all or nothing, so the 'set_param -P' procedure would need to be executed in that case.

            adilger Andreas Dilger added a comment -

            Frank, can you please file an LUDOC ticket with details of what needs to be fixed in the manual so that this is documented correctly?

            heckes Frank Heckes (Inactive) added a comment -

            Sorry, I didn't read and think carefully enough. Indeed, conf_param fixes the problem:

            [root@lola-16 ~]# pdsh -g mds 'lctl get_param mdt.*.enable_remote_dir ; lctl get_param mdt.*.enable_remote_dir_gid' | sort -k 2,2
            lola-8: mdt.soaked-MDT0000.enable_remote_dir=0
            lola-8: mdt.soaked-MDT0000.enable_remote_dir_gid=0
            lola-9: mdt.soaked-MDT0001.enable_remote_dir=0
            lola-9: mdt.soaked-MDT0001.enable_remote_dir_gid=0
            lola-10: mdt.soaked-MDT0002.enable_remote_dir=0
            lola-10: mdt.soaked-MDT0002.enable_remote_dir_gid=0
            lola-11: mdt.soaked-MDT0003.enable_remote_dir=0
            lola-11: mdt.soaked-MDT0003.enable_remote_dir_gid=0
            [root@lola-16 ~]# ssh lola-8 'lctl conf_param soaked.mdt.enable_remote_dir=1'
            [root@lola-16 ~]# pdsh -g mds 'lctl get_param mdt.*.enable_remote_dir ; lctl get_param mdt.*.enable_remote_dir_gid' | sort -k 2,2
            lola-8: mdt.soaked-MDT0000.enable_remote_dir=0
            lola-8: mdt.soaked-MDT0000.enable_remote_dir_gid=0
            lola-9: mdt.soaked-MDT0001.enable_remote_dir=1
            lola-9: mdt.soaked-MDT0001.enable_remote_dir_gid=0
            lola-10: mdt.soaked-MDT0002.enable_remote_dir=0
            lola-10: mdt.soaked-MDT0002.enable_remote_dir_gid=0
            lola-11: mdt.soaked-MDT0003.enable_remote_dir=1
            lola-11: mdt.soaked-MDT0003.enable_remote_dir_gid=0
            [root@lola-16 ~]# ssh lola-8 'lctl conf_param soaked.mdt.enable_remote_dir_gid=-1'
            [root@lola-16 ~]# pdsh -g mds 'lctl get_param mdt.*.enable_remote_dir ; lctl get_param mdt.*.enable_remote_dir_gid' | sort -k 2,2
            lola-8: mdt.soaked-MDT0000.enable_remote_dir=1
            lola-8: mdt.soaked-MDT0000.enable_remote_dir_gid=-1
            lola-9: mdt.soaked-MDT0001.enable_remote_dir=1
            lola-9: mdt.soaked-MDT0001.enable_remote_dir_gid=-1
            lola-10: mdt.soaked-MDT0002.enable_remote_dir=1
            lola-10: mdt.soaked-MDT0002.enable_remote_dir_gid=-1
            lola-11: mdt.soaked-MDT0003.enable_remote_dir=1
            lola-11: mdt.soaked-MDT0003.enable_remote_dir_gid=-1

            adilger Andreas Dilger added a comment -

            Frank, as shown in my previous comment, the formats of conf_param and set_param are different. For conf_param you need to specify the filesystem name first instead of the device name:

            lctl conf_param soaked.mdt.enable_remote_dir=1
            lctl conf_param soaked.mdt.enable_remote_dir_gid=-1
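The naming difference between the two commands can be illustrated with a small conversion helper. This is a sketch only: `to_conf_param_name` is a hypothetical function, and it covers just the filesystem-wide form used in this ticket (`<fsname>.mdt.<parameter>`), derived from a get_param-style name such as `mdt.soaked-MDT0000.enable_remote_dir`. It only prints the resulting command and does not run `lctl`.

```shell
#!/bin/sh
# Sketch: rewrite a get_param/set_param-style name
#   mdt.soaked-MDT0000.enable_remote_dir
# into the filesystem-wide conf_param form used in this ticket
#   soaked.mdt.enable_remote_dir
# to_conf_param_name is a hypothetical helper, not a Lustre tool.
# Names that do not match the expected pattern are printed unchanged.
to_conf_param_name() {
    printf '%s\n' "$1" |
        sed -E 's/^([a-z]+)\.([^.-]+)-[A-Za-z]+[0-9a-fA-F]{4}\.(.+)$/\2.\1.\3/'
}

# Demo: print (not execute) the conf_param commands for both parameters.
echo "lctl conf_param $(to_conf_param_name mdt.soaked-MDT0000.enable_remote_dir)=1"
echo "lctl conf_param $(to_conf_param_name mdt.soaked-MDT0000.enable_remote_dir_gid)=-1"
```

The printed commands match the two shown in the comment above and would be run on the MGS node (lola-8 in this setup).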

            People

              laisiyao Lai Siyao
              heckes Frank Heckes (Inactive)