[LU-3676] Error setting llite.max_cached_mb on MGS Created: 31/Jul/13  Updated: 15/Jan/15  Resolved: 14/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.7.0, Lustre 2.5.2
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug Priority: Critical
Reporter: Kelsey Prantis (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9491

 Description   

When the conf_param max_cached_mb was set on the MGS, clients mounting its filesystem did not pick up the configured value and used the default instead. The client logs this corresponding error in /var/log/messages:

Jul 31 23:01:17 vm4 kernel: LustreError: 5911:0:(obd_config.c:1308:class_process_proc_param()) writing proc entry max_cached_mb err -19

Since -19 is ENODEV, this may point to a bug in which llite parameters are applied before the client is ready for them.
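For reference, the mapping from "err -19" in the log line to ENODEV can be checked from user space. This is a minimal sketch, assuming a Linux client where ENODEV is 19; kernel_err_text is a hypothetical helper name and not part of Lustre.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not a Lustre function): kernel code returns
 * negative errno values, so flip the sign before looking up the text.
 */
const char *kernel_err_text(int rc)
{
        return strerror(rc < 0 ? -rc : rc);
}
```

Calling kernel_err_text(-19) returns the same text as strerror(ENODEV) on such a system.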

Version of Lustre used: IEEL Lustre and client

Steps to reproduce:
1. Install Lustre and manually configure a filesystem. In this case, I had 4 Lustre servers: 2 were an active-active pair for the MGT and MDT, and 2 were an active-active pair for a pair of OSTs.
2. Mount the filesystem and run "lctl get_param llite.*.max_cached_mb" on the client. Note what the default value is.
3. Unmount the filesystem.
4. Run 'lctl conf_param <fsname>.llite.max_cached_mb=<a value other than the default noted above>' on the MGS.
5. Mount the filesystem on a client.
6. Check /var/log/messages, see the above error message.
7. Re-run "lctl get_param llite.*.max_cached_mb" on the client, and see that it is still the default value, not the value set on the MGS.



 Comments   
Comment by Jinshan Xiong (Inactive) [ 31/Jul/13 ]

This is an issue about client parameter settings. It failed at:

static int ll_wr_max_cached_mb(struct file *file, const char *buffer,
                               unsigned long count, void *data)
{
        ...

        if (sbi->ll_dt_exp == NULL)
                RETURN(-ENODEV);

        ...
}

The root cause is that when this parameter is applied, the client has not yet set up its data connection to the OST. In that case there is no need to return -ENODEV; we should return success instead.
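The difference between the two behaviors can be sketched in user space. This is only an illustration under stated assumptions, not the code from the patch: struct sbi_model and both function names are hypothetical stand-ins, and recording the value for later enforcement is an assumption about how a successful early write might be handled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for the client superblock info. */
struct sbi_model {
        void *dt_exp;                /* data export; NULL until the OST connection exists */
        unsigned long max_cached_mb; /* configured cache limit, in MB */
        int limit_applied;           /* nonzero once the limit has been enforced */
};

/* Behavior reported in the ticket: the write fails while dt_exp is NULL. */
ssize_t wr_max_cached_mb_old(struct sbi_model *sbi, unsigned long mb,
                             size_t count)
{
        if (sbi->dt_exp == NULL)
                return -ENODEV;

        sbi->max_cached_mb = mb;
        sbi->limit_applied = 1;
        return (ssize_t)count;
}

/*
 * Sketch of the proposed behavior: report success while the client is
 * still initializing. Storing the value first is an assumption made
 * here so that it could be enforced once the data export is ready.
 */
ssize_t wr_max_cached_mb_new(struct sbi_model *sbi, unsigned long mb,
                             size_t count)
{
        sbi->max_cached_mb = mb;

        if (sbi->dt_exp == NULL)
                return (ssize_t)count; /* still initializing: not an error */

        sbi->limit_applied = 1;
        return (ssize_t)count;
}
```

With dt_exp still NULL, the old variant returns -ENODEV and leaves the default in place, while the new variant returns the byte count and keeps the requested value.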

Comment by Jinshan Xiong (Inactive) [ 31/Jul/13 ]

patch is at: http://review.whamcloud.com/7194

Comment by Jodi Levi (Inactive) [ 17/Sep/13 ]

Patch landed to Master so closing ticket.

Comment by Jinshan Xiong (Inactive) [ 05/Sep/14 ]

Sorry, this patch has to be reworked. Here is the new patch: http://review.whamcloud.com/11783

Comment by Andreas Dilger [ 09/Sep/14 ]

Jinshan, what is the priority for landing this new version of the patch, and which versions should it be landed on?

Comment by Jinshan Xiong (Inactive) [ 09/Sep/14 ]

This patch will affect the usability of conf_param for max_cached_mb so the priority should be medium.

I'd like it to land on master for now, and I will backport it to other branches.

Comment by Jodi Levi (Inactive) [ 14/Nov/14 ]

Patches landed to Master.

Comment by Gerrit Updater [ 03/Dec/14 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/12924
Subject: LU-3676 llite: to configure max_cached_mb correctly
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 4da46d136b433e45a9c4a669e34fd349739ad2e5

Comment by Gerrit Updater [ 15/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12924/
Subject: LU-3676 llite: to configure max_cached_mb correctly
Project: fs/lustre-release
Branch: b2_5
Current Patch Set:
Commit: 016d6276ed20f4f61894278d055e4fe80fd0a470
