Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Affects Version/s: Lustre 2.9.0
3
Description
An error occurred during soak testing of build '20160713' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160713). The MDSes were configured using ldiskfs and the OSTs using zfs. The test environment consists of 4 MDSes with 1 MDT each and 6 OSSes with 4 OSTs each. MDS and OSS nodes are set up in an active-active HA configuration.
Roles:
- lola-8 MGS/MDS
- lola-[9-11] MDS
DNE was enabled using the following command sequence (see the Lustre manual, page 96):
pdsh -g mds 'lctl set_param mdt.*.enable_remote_dir=1'
pdsh -g mds 'lctl set_param mdt.*.enable_remote_dir_gid=-1'
and in addition:
pdsh -w lola-8 'lctl set_param -P mdt.*.enable_remote_dir=1'
pdsh -w lola-8 'lctl set_param -P mdt.*.enable_remote_dir_git=-1'
(The latter two commands work only on the MGS node.)
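The enable sequence above can be wrapped in a reusable bash function. This is a minimal sketch, assuming a pdsh group named `mds` and passing the MGS node (lola-8 here) as an argument; the function name `enable_remote_dir`, the `run` wrapper and the `DRY_RUN` switch are hypothetical. Note that the last persistent command above spells the parameter `enable_remote_dir_git`; the sketch uses `enable_remote_dir_gid`, the name used in the runtime command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable DNE remote directories on all MDSes.
# Assumptions: a pdsh group "mds" exists and the MGS node is passed as $1.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

enable_remote_dir() {
    local mgs_node=$1
    local gid=${2:--1}   # default -1, the value used in this report

    # Runtime settings on every MDS (per this report, the gid setting
    # did not survive a restart or failover on lola-[9-11]).
    run pdsh -g mds "lctl set_param mdt.*.enable_remote_dir=1"
    run pdsh -g mds "lctl set_param mdt.*.enable_remote_dir_gid=$gid"

    # Persistent settings; "set_param -P" has to be run on the MGS node.
    run pdsh -w "$mgs_node" "lctl set_param -P mdt.*.enable_remote_dir=1"
    run pdsh -w "$mgs_node" "lctl set_param -P mdt.*.enable_remote_dir_gid=$gid"
}
```

Running `DRY_RUN=1 enable_remote_dir lola-8` first shows the four commands without touching the file system, which makes a misspelled parameter name easier to spot before it is stored on the MGS.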
The problem occurs after any of the nodes lola-[9-11] has been restarted or its resources have been failed over / failed back.
While the parameter 'enable_remote_dir' persists on the non-MGS MDSes, the parameter 'enable_remote_dir_gid' does not.
Therefore the following command failed:
[soaktest@lola-16 ~]$ lfs setdirstripe -c 4 -i 1 /mnt/soaked/soaktest/hsm_rbh/
error on LL_IOC_LMV_SETSTRIPE '/mnt/soaked/soaktest/hsm_rbh/' (3): Operation not permitted
error: setdirstripe: create stripe dir '/mnt/soaked/soaktest/hsm_rbh/' failed
---------------
--> Remote dir setting:
---------------- lola-8 ----------------
Remote dir_gid setting soaked-MDT0000: -1
---------------- lola-9 ----------------
Remote dir_gid setting soaked-MDT0001: 0
---------------- lola-10 ----------------
Remote dir_gid setting soaked-MDT0002: -1
---------------- lola-11 ----------------
Remote dir_gid setting soaked-MDT0003: 0
This breaks all test (slurm) jobs that rely on this functionality.
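Because the reset value only shows up when a job fails, a quick check before starting jobs may help. The sketch below is hypothetical (the function name `find_restricted_mdts` is made up): it assumes the `name=value` output format of `lctl get_param`, optionally prefixed with a `host: ` tag as pdsh adds, and prints every MDT whose `enable_remote_dir_gid` is not -1.

```shell
#!/usr/bin/env bash
# Hypothetical check: read "lctl get_param mdt.*.enable_remote_dir_gid"
# output on stdin and print "<target> <value>" for each MDT whose value
# is not -1. Assumed input, one line per MDT, optional pdsh host prefix:
#   lola-9: mdt.soaked-MDT0001.enable_remote_dir_gid=0

find_restricted_mdts() {
    local name value target
    while IFS='=' read -r name value; do
        [ -n "$name" ] || continue               # skip blank lines
        name=${name##*: }                        # drop "host: " prefix if present
        target=${name#mdt.}                      # drop leading "mdt."
        target=${target%.enable_remote_dir_gid}  # drop parameter suffix
        if [ "$value" != "-1" ]; then
            printf '%s %s\n' "$target" "$value"
        fi
    done
}
```

A possible invocation is `pdsh -g mds 'lctl get_param mdt.*.enable_remote_dir_gid' | find_restricted_mdts`; for the values reported above it would list soaked-MDT0001 and soaked-MDT0003.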
After setting the parameters on the nodes again, the commands
[soaktest@lola-16 ~]$ lfs setdirstripe -c 4 -i 1 /mnt/soaked/soaktest/hsm_rbh/
[soaktest@lola-16 ~]$ lfs setdirstripe -c 4 -i 1 -D /mnt/soaked/soaktest/hsm_rbh/
[soaktest@lola-16 ~]$ lfs getdirstripe /mnt/soaked/soaktest/hsm_rbh/
/mnt/soaked/soaktest/hsm_rbh/
[soaktest@lola-16 ~]$ lfs getdirstripe /mnt/soaked/soaktest/hsm_rbh/
/mnt/soaked/soaktest/hsm_rbh/
lmv_stripe_count: 4 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x240007160:0x3:0x0]
2 [0x28000d714:0x3:0x0]
3 [0x2c000a810:0x1:0x0]
0 [0x20000fe01:0x3:0x0]
completed successfully.
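To confirm the result after re-applying the parameters, a hypothetical check (`count_dir_stripes` and `check_dir_stripes` are made-up names) can count the stripe entries printed by `lfs getdirstripe` and compare them with the requested `-c 4`. It assumes each stripe is printed on its own line as an MDT index followed by a bracketed FID, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical check: count stripe entries in "lfs getdirstripe" output
# read on stdin. A stripe line is assumed to look like:
#   1 [0x240007160:0x3:0x0]
# i.e. an MDT index followed by a bracketed FID.

count_dir_stripes() {
    grep -cE '^[[:space:]]*[0-9]+[[:space:]]+\[0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+]'
}

check_dir_stripes() {
    local expected=$1 actual
    actual=$(count_dir_stripes)   # grep -c prints 0 when nothing matches
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual stripes"
    else
        echo "MISMATCH: expected $expected, found $actual"
        return 1
    fi
}
```

For example, `lfs getdirstripe /mnt/soaked/soaktest/hsm_rbh/ | check_dir_stripes 4` returns a non-zero status if the directory ended up with fewer stripes than requested, which a Slurm job script could test before continuing.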
Issue Links
is related to: LU-7004 fix "lctl set_param -P" to allow deprecation of "lctl conf_param" (Resolved)
Activity
Resolution | New: Cannot Reproduce [ 5 ] | |
Status | Original: In Progress [ 3 ] | New: Resolved [ 5 ] |
Affects Version/s | New: Lustre 2.9.0 [ 11891 ] |
Link | Original: This issue is related to JFC-23 [ JFC-23 ] |
Link | New: This issue is related to JFC-22 [ JFC-22 ] |
Link | New: This issue is related to JFC-23 [ JFC-23 ] |
Status | Original: Open [ 1 ] | New: In Progress [ 3 ] |
Assignee | Original: WC Triage [ wc-triage ] | New: Lai Siyao [ laisiyao ] |
Fix Version/s | New: Lustre 2.9.0 [ 11891 ] |