Lustre / LU-8951

lctl conf_param not retaining *_cache_enable settings

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Components: None
    • Environment: CentOS-6.8
      lustre-2.7.2-2nasS_mofed32v3f_2.6.32_573.26.1.el6.20160517.x86_64.lustre272.x86_64
      kernel-2.6.32-573.26.1.el6.20160517.x86_64.lustre272
    • Severity: 3

    Description

      The read_cache_enable=1 and writethrough_cache_enable=1 settings don't appear to be retained through unmount/remount of the OSTs.  However, the readcache_max_filesize=4194304 DOES get retained.

      Is there a step I'm missing? Are there additional debugging procedures I should follow to track down the source of the problem?

      Example commands showing that the settings do get propagated after a "lctl conf_param", but then go away...

      nbp7-mds2 ~ # for i in $(lctl list_param osc.fscache*OST* | sed 's/osc\.//; s/-osc-.*//'); do
        lctl conf_param $i.ost.read_cache_enable=1;
        lctl conf_param $i.ost.writethrough_cache_enable=1;
        lctl conf_param $i.ost.readcache_max_filesize=4M;
      done
      

      (wait ~10 seconds to propagate)

      Nothing unusual in the OSS logs. In the logs on the MGS:

      Dec 16 14:28:26 nbp7-mds2 kernel: Lustre: Modifying parameter fscache-OST0005.ost.read_cache_enable in log fscache-OST0005
      Dec 16 14:28:26 nbp7-mds2 kernel: Lustre: Skipped 17 previous similar messages
      
      nbp1-oss6 ~ # lctl get_param obdfilter.fscache*.{*_cache_enable,readcache_max_filesize}
      obdfilter.fscache-OST0005.read_cache_enable=1
      obdfilter.fscache-OST0005.writethrough_cache_enable=1
      obdfilter.fscache-OST000b.read_cache_enable=1
      obdfilter.fscache-OST000b.writethrough_cache_enable=1
      obdfilter.fscache-OST0011.read_cache_enable=1
      obdfilter.fscache-OST0011.writethrough_cache_enable=1
      obdfilter.fscache-OST0017.read_cache_enable=1
      obdfilter.fscache-OST0017.writethrough_cache_enable=1
      obdfilter.fscache-OST001d.read_cache_enable=1
      obdfilter.fscache-OST001d.writethrough_cache_enable=1
      obdfilter.fscache-OST0023.read_cache_enable=1
      obdfilter.fscache-OST0023.writethrough_cache_enable=1
      obdfilter.fscache-OST0005.readcache_max_filesize=4194304
      obdfilter.fscache-OST000b.readcache_max_filesize=4194304
      obdfilter.fscache-OST0011.readcache_max_filesize=4194304
      obdfilter.fscache-OST0017.readcache_max_filesize=4194304
      obdfilter.fscache-OST001d.readcache_max_filesize=4194304
      obdfilter.fscache-OST0023.readcache_max_filesize=4194304
      
      # lmount -u -v -f fscache.csv --host service636
      umounting OSTs...
      ssh -n service636 umount /mnt/lustre/OST29
      ssh -n service636 umount /mnt/lustre/OST5
      ssh -n service636 umount /mnt/lustre/OST35
      ssh -n service636 umount /mnt/lustre/OST17
      ssh -n service636 umount /mnt/lustre/OST11
      ssh -n service636 umount /mnt/lustre/OST23
      
      nbp1-oss6 ~ # lctl get_param obdfilter.fscache*.{*_cache_enable,readcache_max_filesize}
      error: get_param: obdfilter/fscache*/*_cache_enable: Found no match
      error: get_param: obdfilter/fscache*/readcache_max_filesize: Found no match
      
      # lmount -m -v -f fscache.csv --host service636
      mounting OSTs...
      ssh -n service636 'mkdir -p /mnt/lustre/OST29 ; mount -t lustre $(journal-dev-of.sh dev/intelcas1-29) -o errors=panic,extents,mballoc /dev/intelcas1-29 /mnt/lustre/OST29' 
      ssh -n service636 'mkdir -p /mnt/lustre/OST5 ; mount -t lustre $(journal-dev-of.sh /dev/intelcas1-5) -o errors=panic,extents,mballoc /dev/intelcas1-5 /mnt/lustre/OST5' 
      ssh -n service636 'mkdir -p /mnt/lustre/OST35 ; mount -t lustre $(journal-dev-of.sh /dev/intelcas2-35) -o errors=panic,extents,mballoc /dev/intelcas2-35 /mnt/lustre/OST35' 
      ssh -n service636 'mkdir -p /mnt/lustre/OST17 ; mount -t lustre $(journal-dev-of.sh /dev/intelcas1-17) -o errors=panic,extents,mballoc /dev/intelcas1-17 /mnt/lustre/OST17' 
      ssh -n service636 'mkdir -p /mnt/lustre/OST11 ; mount -t lustre $(journal-dev-of.sh /dev/intelcas2-11) -o errors=panic,extents,mballoc /dev/intelcas2-11 /mnt/lustre/OST11' 
      ssh -n service636 'mkdir -p /mnt/lustre/OST23 ; mount -t lustre $(journal-dev-of.sh /dev/intelcas2-23) -o errors=panic,extents,mballoc /dev/intelcas2-23 /mnt/lustre/OST23' 
      
      # lctl get_param obdfilter.fscache*.{*_cache_enable,readcache_max_filesize}
      obdfilter.fscache-OST0005.read_cache_enable=0
      obdfilter.fscache-OST0005.writethrough_cache_enable=0
      obdfilter.fscache-OST000b.read_cache_enable=0
      obdfilter.fscache-OST000b.writethrough_cache_enable=0
      obdfilter.fscache-OST0011.read_cache_enable=0
      obdfilter.fscache-OST0011.writethrough_cache_enable=0
      obdfilter.fscache-OST0017.read_cache_enable=0
      obdfilter.fscache-OST0017.writethrough_cache_enable=0
      obdfilter.fscache-OST001d.read_cache_enable=0
      obdfilter.fscache-OST001d.writethrough_cache_enable=0
      obdfilter.fscache-OST0023.read_cache_enable=0
      obdfilter.fscache-OST0023.writethrough_cache_enable=0
      obdfilter.fscache-OST0005.readcache_max_filesize=4194304
      obdfilter.fscache-OST000b.readcache_max_filesize=4194304
      obdfilter.fscache-OST0011.readcache_max_filesize=4194304
      obdfilter.fscache-OST0017.readcache_max_filesize=4194304
      obdfilter.fscache-OST001d.readcache_max_filesize=4194304
      obdfilter.fscache-OST0023.readcache_max_filesize=4194304
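
      To make it easier to see exactly which tunables revert across an unmount/remount, a small helper script can diff two `lctl get_param` snapshots. This is my own sketch (`parse_params`/`diff_params` are hypothetical helpers, not part of any Lustre tool); the parameter names mirror the obdfilter output pasted above.

      ```python
      # Sketch: diff two "lctl get_param" snapshots taken before and after a
      # remount, to list tunables whose values changed. Hypothetical helpers,
      # not part of lctl.

      def parse_params(text):
          """Parse 'name=value' lines from lctl get_param output into a dict."""
          params = {}
          for line in text.strip().splitlines():
              name, _, value = line.strip().partition("=")
              if name and value:
                  params[name] = value
          return params

      def diff_params(before, after):
          """Return {name: (old, new)} for parameters whose value changed."""
          return {name: (before[name], after[name])
                  for name in before
                  if name in after and before[name] != after[name]}

      if __name__ == "__main__":
          before = parse_params("""
          obdfilter.fscache-OST0005.read_cache_enable=1
          obdfilter.fscache-OST0005.readcache_max_filesize=4194304
          """)
          after = parse_params("""
          obdfilter.fscache-OST0005.read_cache_enable=0
          obdfilter.fscache-OST0005.readcache_max_filesize=4194304
          """)
          for name, (old, new) in sorted(diff_params(before, after).items()):
              print(f"{name}: {old} -> {new}")
      ```

      Run against the two snapshots above, this would flag only the *_cache_enable parameters, matching the symptom described.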
      

      I should note that this MDS hosts a single MGS serving two file systems, in case that is relevant to reproducing the problem...

      nbp7-mds2 ~ # lctl dl
        0 UP osd-ldiskfs MGS-osd MGS-osd_UUID 5
        1 UP mgs MGS MGS 9
        2 UP mgc MGC10.151.27.39@o2ib 2eb7f880-6a2e-ea5e-3631-922183627327 5
        3 UP osd-ldiskfs nocache-MDT0000-osd nocache-MDT0000-osd_UUID 13
        4 UP mds MDS MDS_uuid 3
        5 UP lod nocache-MDT0000-mdtlov nocache-MDT0000-mdtlov_UUID 4
        6 UP mdt nocache-MDT0000 nocache-MDT0000_UUID 19
        7 UP mdd nocache-MDD0000 nocache-MDD0000_UUID 4
        8 UP qmt nocache-QMT0000 nocache-QMT0000_UUID 4
        9 UP osp nocache-OST0029-osc-MDT0000 nocache-MDT0000-mdtlov_UUID 5
       10 UP osp nocache-OST002f-osc-MDT0000 nocache-MDT0000-mdtlov_UUID 5
       11 UP osp nocache-OST003b-osc-MDT0000 nocache-MDT0000-mdtlov_UUID 5
       12 UP osp nocache-OST0035-osc-MDT0000 nocache-MDT0000-mdtlov_UUID 5
       13 UP osp nocache-OST0041-osc-MDT0000 nocache-MDT0000-mdtlov_UUID 5
       14 UP osp nocache-OST0047-osc-MDT0000 nocache-MDT0000-mdtlov_UUID 5
       15 UP lwp nocache-MDT0000-lwp-MDT0000 nocache-MDT0000-lwp-MDT0000_UUID 5
       16 UP osd-ldiskfs fscache-MDT0000-osd fscache-MDT0000-osd_UUID 13
       17 UP lod fscache-MDT0000-mdtlov fscache-MDT0000-mdtlov_UUID 4
       18 UP mdt fscache-MDT0000 fscache-MDT0000_UUID 17
       19 UP mdd fscache-MDD0000 fscache-MDD0000_UUID 4
       20 UP qmt fscache-QMT0000 fscache-QMT0000_UUID 4
       21 UP osp fscache-OST0023-osc-MDT0000 fscache-MDT0000-mdtlov_UUID 5
       22 UP osp fscache-OST0011-osc-MDT0000 fscache-MDT0000-mdtlov_UUID 5
       23 UP osp fscache-OST000b-osc-MDT0000 fscache-MDT0000-mdtlov_UUID 5
       24 UP osp fscache-OST001d-osc-MDT0000 fscache-MDT0000-mdtlov_UUID 5
       25 UP osp fscache-OST0017-osc-MDT0000 fscache-MDT0000-mdtlov_UUID 5
       26 UP osp fscache-OST0005-osc-MDT0000 fscache-MDT0000-mdtlov_UUID 5
       27 UP lwp fscache-MDT0000-lwp-MDT0000 fscache-MDT0000-lwp-MDT0000_UUID 5
      

      Attachments

        Activity

          [LU-8951] lctl conf_param not retaining *_cache_enable settings

          Close old issue that cannot be reproduced.

          adilger Andreas Dilger added a comment

          As a baseline, I tried to duplicate the problem on a completely different system (running lustre-2.5.42.8.ddn4) and was not able to get the same (bad) symptom.  So, I will continue to try to figure out what is different about the test system where this is happening.  One possibility is that we are using external journal devices.  Please let me know if you have suggestions of where else to look for differences.

          ndauchy Nathan Dauchy (Inactive) added a comment

          Emoly,

          Thanks for the clarification on the history and goals for "set_param -P". To help get through the "transition period", I highly recommend that 1) the manual be updated to clarify that set_param -P is preferred and that conf_param will be removed, and 2) conf_param report a warning or error for any tunable for which set_param -P should be working.

          I can easily duplicate the problem using the procedure posted in the original description. I did this on the other file system for this MDS/OSS pair (to rule out any effect from the CAS cache testing) and saw the same symptoms. There's not much of use in the logs, but here you go...

          Dec 21 10:11:09 nbp7-mds2 kernel: Lustre: Modifying parameter nocache-OST0029.ost.read_cache_enable in log nocache-OST0029
          Dec 21 10:11:20 nbp7-mds2 kernel: Lustre: 7277:0:(client.c:1941:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1482343725/real 1482343725]  req@ffff8806f06ba680 x1553901315768044/t0(0) o8->nocache-OST0047-osc-MDT0000@10.151.26.123@o2ib:28/4 lens 400/544 e 0 to 1 dl 1482343880 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
          Dec 21 10:12:22 nbp7-mds2 kernel: Lustre: Modifying parameter nocache-OST0029.ost.read_cache_enable in log nocache-OST0029
          Dec 21 10:12:22 nbp7-mds2 kernel: Lustre: Skipped 17 previous similar messages
          
          Dec 21 10:13:37 nbp7-mds2 kernel: Lustre: nocache-OST0029-osc-MDT0000: Connection to nocache-OST0029 (at 10.151.26.123@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Dec 21 10:13:37 nbp7-mds2 kernel: Lustre: Skipped 5 previous similar messages
          Dec 21 10:13:40 nbp7-mds2 kernel: Lustre: nocache-OST0047-osc-MDT0000: Connection restored to nocache-OST0047 (at 10.151.26.123@o2ib)
          Dec 21 10:13:40 nbp7-mds2 kernel: Lustre: Skipped 5 previous similar messages
          
          
          Dec 21 10:13:07 nbp1-oss6 kernel: Lustre: Failing over nocache-OST0029
          Dec 21 10:13:07 nbp1-oss6 kernel: Lustre: Skipped 2 previous similar messages
          Dec 21 10:13:08 nbp1-oss6 kernel: Lustre: server umount nocache-OST0035 complete
          Dec 21 10:13:10 nbp1-oss6 kernel: LNet: 12436:0:(lib-move.c:1485:lnet_parse_put()) Dropping PUT from 12345-10.151.27.39@o2ib portal 7 match 1553901315769072 offset 0 length 224: 4
          Dec 21 10:13:10 nbp1-oss6 kernel: LNet: 12436:0:(lib-move.c:1485:lnet_parse_put()) Skipped 5 previous similar messages
          Dec 21 10:13:11 nbp1-oss6 kernel: perl[45541]: segfault at 0 ip 00007fffebca713e sp 00007fffffffddd0 error 4 in libpcp_pmda.so.3[7fffebca3000+11000]
          Dec 21 10:13:24 nbp1-oss6 kernel: LDISKFS-fs (dm-27): mounted filesystem with ordered data mode. quota=on. Opts: 
          Dec 21 10:13:24 nbp1-oss6 kernel: LDISKFS-fs (dm-29): mounted filesystem with ordered data mode. quota=on. Opts: 
          Dec 21 10:13:24 nbp1-oss6 kernel: LDISKFS-fs (dm-28): mounted filesystem with ordered data mode. quota=on. Opts: 
          Dec 21 10:13:24 nbp1-oss6 kernel: LDISKFS-fs (dm-33): 
          Dec 21 10:13:24 nbp1-oss6 kernel: LDISKFS-fs (dm-26): mounted filesystem with ordered data mode. quota=on. Opts: 
          Dec 21 10:13:24 nbp1-oss6 kernel: mounted filesystem with ordered data mode. quota=on. Opts: 
          Dec 21 10:13:24 nbp1-oss6 kernel: LDISKFS-fs (dm-47): mounted filesystem with ordered data mode. quota=on. Opts: 
          Dec 21 10:13:30 nbp1-oss6 kernel: LustreError: 137-5: nocache-OST002f_UUID: not available for connect from 10.151.27.19@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
          Dec 21 10:13:30 nbp1-oss6 kernel: LustreError: Skipped 3 previous similar messages
          
          Dec 21 10:13:37 nbp1-oss6 kernel: Lustre: nocache-OST0029: Will be in recovery for at least 5:00, or until 2 clients reconnect
          Dec 21 10:13:37 nbp1-oss6 kernel: Lustre: Skipped 5 previous similar messages
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST0047: Recovery over after 0:03, of 2 clients 2 recovered and 0 were evicted.
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST0047: deleting orphan objects from 0x0:2678325 to 0x0:2678417
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST0029: deleting orphan objects from 0x0:2797909 to 0x0:2798001
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST002f: deleting orphan objects from 0x0:2711637 to 0x0:2711729
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: Skipped 6 previous similar messages
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST003b: deleting orphan objects from 0x0:2714774 to 0x0:2714865
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST0041: deleting orphan objects from 0x0:2714453 to 0x0:2714545
          Dec 21 10:13:40 nbp1-oss6 kernel: Lustre: nocache-OST0035: deleting orphan objects from 0x0:2749318 to 0x0:2749409
          
          

           

          ndauchy Nathan Dauchy (Inactive) added a comment
          emoly.liu Emoly Liu added a comment (edited)

          Nathan,

          • Why is readcache_max_filesize behaving differently than *_cache_enable?

          I ran this test several times, on both a single node and multiple nodes, with one MGS serving two filesystems, but still failed to reproduce it. conf_param works well for me. I'm using b2_7_fe (new tag 2.7.2-RC1) + el6.7 (2.6.32-573.26.1.el6_lustre.x86_64). Could you please provide more details about how to reproduce it, along with any logs?

          • If conf_param doesn't make a given setting persistent, and the "set_param -P" option does, shouldn't conf_param then return an error message? As it is, it happily accepts the argument but does not do what one would think it does... since the manual clearly says "Use the lctl conf_param command to set permanent parameters."
          • Is "set_param -P" being phased in and "conf_param" being deprecated? Or, is conf_param being retained only for filesystem-wide persistent options, and set_param will be for operations that work on all (including remote) hosts? The manual could use some clarification on this point... perhaps indicating that "set_param -P" is not just available, but preferred for 2.5+?

          The "lctl set_param -P" functionality was landed via LU-3155 for 2.5.0, and intended to replace "lctl conf_param" with a more uniform interface that matches the existing "lctl set_param" and "lctl get_param" usage. However, there are still unresolved issues that need to be addressed and a transition period needed before the old functionality can be removed.

          pjones Peter Jones added a comment

          Emoly

          Could you please advise on this one?

          Thanks

          Peter


          Update... Mahmoud clued me in to the "set_param -P" option, and this seems to work!

          nbp7-mds2 ~ # lctl set_param -P obdfilter.fscache*.read_cache_enable=1
          nbp7-mds2 ~ # lctl set_param -P obdfilter.fscache*.writethrough_cache_enable=1
          nbp7-mds2 ~ # lctl set_param -P obdfilter.fscache*.readcache_max_filesize=4M  
          
          

          So, the main problem is solved. The only questions remaining on this issue, then, are...

          • Why is readcache_max_filesize behaving differently than *_cache_enable?
          • If conf_param doesn't make a given setting persistent, and the "set_param -P" option does, shouldn't conf_param then return an error message? As it is, it happily accepts the argument but does not do what one would think it does... since the manual clearly says "Use the lctl conf_param command to set permanent parameters."
          • Is "set_param -P" being phased in and "conf_param" being deprecated? Or, is conf_param being retained only for filesystem-wide persistent options, and set_param will be for operations that work on all (including remote) hosts? The manual could use some clarification on this point... perhaps indicating that "set_param -P" is not just available, but preferred for 2.5+?
          ndauchy Nathan Dauchy (Inactive) added a comment

          To try to rule out the multi-FS MGS as the source of the problem, I completely stopped the "nocache" file system, then stopped and restarted all targets (MGS, MDT, OSTs) of the "fscache" file system, and symptoms did not change.

          ndauchy Nathan Dauchy (Inactive) added a comment

          People

            Assignee: emoly.liu Emoly Liu
            Reporter: ndauchy Nathan Dauchy (Inactive)
            Votes: 1
            Watchers: 8
