[LU-9104] Unknown config param in llog fails mounting target Created: 10/Feb/17  Updated: 29/Nov/18  Resolved: 29/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Rahul Deshmukh (Inactive) Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

If the llog contains an unknown config parameter, mounting the target fails.

Steps to reproduce:

$>sh llmount.sh
$>export NOFORMAT=yes
$>for OST in $(lctl get_param mgs.MGS.live.lustre | grep OST); do echo "max_rpcs_in_flight=50 on $OST" ; lctl conf_param $OST.ost.max_rpcs_in_flight=50 ; done <--- Note: ost.max.. is used here instead of osc.max..
$>sh llmountcleanup.sh
$>sh llmount.sh <------- this hangs, as the OST is not available


 Comments   
Comment by Gerrit Updater [ 10/Feb/17 ]

Rahul Deshmukh (rahul.deshmukh@seagate.com) uploaded a new patch: https://review.whamcloud.com/25368
Subject: LU-9104 tests: check for unknown config param while mounting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5864f7b957c946dc4726cce180c84e7c2b323eee

Comment by Rahul Deshmukh (Inactive) [ 15/Feb/17 ]

Pushed the patch and a reproducer; please review.

Comment by Peter Jones [ 29/May/17 ]

James

Can you please organize reviews of this proposed test?

Thanks

Peter

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25368/
Subject: LU-9104 obd: Ignore unknown config param while mounting
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1385dbf8b9caedca7bb32f35db1529e4d5c52d4f

Comment by Peter Jones [ 19/Jul/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 26/Jul/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28232
Subject: LU-9104 obd: Ignore unknown config param while mounting
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 902a5fba10096e3196cfed25a2bfc7db0bfe692a

Comment by Gerrit Updater [ 14/Sep/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28232/
Subject: LU-9104 obd: Ignore unknown config param while mounting
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 542815f43c2fbbc29e528fc1de496e54851ca720

Comment by Taizeng Wu [ 29/Nov/18 ]

I am using lustre-2.10.5. When I mount the MGS and MDT, it reports an unknown config parameter. How can I remove the wrong parameter?

 

```

[Thu Nov 29 15:00:14 2018] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[Thu Nov 29 15:00:17 2018] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[Thu Nov 29 15:00:18 2018] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 29 15:00:19 2018] Lustre: MGS: Connection restored to 0b413a06-995f-b510-6aba-9810cdf71a17 (at 0@lo)
[Thu Nov 29 15:00:19 2018] Lustre: Found index 0 for public4-MDT0000, updating log
[Thu Nov 29 15:00:19 2018] LustreError: 11-0: public4-OST0000-osc-MDT0000: operation ost_connect to node 10.10.1.11@o2ib failed: rc = -114
[Thu Nov 29 15:00:19 2018] LustreError: Skipped 4 previous similar messages
[Thu Nov 29 15:00:19 2018] LustreError: 19825:0:(obd_config.c:1361:class_process_proc_param()) public4-MDT0000: unknown config parameter 'mdt.job_cleanaup_interval=60'
[Thu Nov 29 15:00:19 2018] LustreError: 19825:0:(obd_config.c:1682:class_config_llog_handler()) MGC10.10.1.14@o2ib: cfg command failed: rc = -38
[Thu Nov 29 15:00:19 2018] Lustre: cmd=cf00f 0:public4-MDT0000 1:mdt.job_cleanaup_interval=60

[Thu Nov 29 15:00:19 2018] LustreError: 15c-8: MGC10.10.1.14@o2ib: The configuration from log 'public4-MDT0000' failed (-38). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[Thu Nov 29 15:00:19 2018] LustreError: 19761:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server public4-MDT0000: -38
[Thu Nov 29 15:00:19 2018] LustreError: 19761:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -38
[Thu Nov 29 15:00:19 2018] Lustre: Failing over public4-MDT0000
[Thu Nov 29 15:00:25 2018] Lustre: 19761:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1543474819/real 1543474819] req@ffff99bab7d6a100 x1618446900006864/t0(0) o251->MGC10.10.1.14@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 1543474825 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[Thu Nov 29 15:00:26 2018] Lustre: server umount public4-MDT0000 complete
[Thu Nov 29 15:00:26 2018] LustreError: 19761:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-38)

```

Comment by Taizeng Wu [ 29/Nov/18 ]

I am trying `mount -t lustre -o nosvc xxx`, which starts only the MGS, and then running `lctl conf_param -d public4-MDT0000.mdt.job_cleanaup_interval` to remove the wrong config.
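
The target and parameter name needed for that command can be read from the log above. The following is a minimal sketch, assuming the message format `<target>: unknown config parameter '<name>=<value>'` seen in that log. The function name `suggest_param_removal` is hypothetical, and it only prints the candidate `lctl conf_param -d` commands; it does not run them.

```shell
#!/bin/sh
# Read kernel log text on stdin and print one dry-run removal command
# for each distinct unknown parameter that was reported.
# Assumed message format (taken from the log above):
#   ... <target>: unknown config parameter '<name>=<value>'
suggest_param_removal() {
    # \1 = target name (e.g. public4-MDT0000), \2 = parameter name without "=value"
    sed -n "s/.* \([^ :]*\): unknown config parameter '\([^'=]*\)=.*/lctl conf_param -d \1.\2/p" |
        sort -u
}
```

Usage would look something like `dmesg | suggest_param_removal`; the printed command should be reviewed before being run on the MGS node.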

Comment by Li Xi [ 29/Nov/18 ]

The patch at https://review.whamcloud.com/25368/ might still have a bug in osd_process_config().

When an llog config record of type MDT cannot be understood by class_process_proc_param(PARAM_MDT) or by any other prefix (e.g. PARAM_HSM), it is passed to osd_process_config(), and class_process_proc_param() is called for it twice.

The question is: why is the -ENOSYS failure of class_process_proc_param(PARAM_OST) not ignored in osd_process_config()?

I am not sure about the llog processing code, but I guess the sequence is: class_process_proc_param(PARAM_MDT) -> class_process_proc_param(PARAM_HSM) -> class_process_proc_param(PARAM_HSM) -> class_process_proc_param(PARAM_OSD) -> class_process_proc_param(PARAM_OST). So if osd_process_config() doesn't understand the llog record, it should ignore the -ENOSYS. Right?
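
The behaviour being asked for can be modelled outside the kernel. The following is a toy sketch in shell, not Lustre code: exit status 38 stands in for -ENOSYS, the prefix order follows the guess above, and `handle_prefix`, `process_record`, and the `KNOWN_PARAMS` list are all hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Toy model of "try every handler, ignore ENOSYS". Not Lustre code.
ENOSYS=38   # stands in for the kernel's -ENOSYS ("not recognized")

# Hypothetical list of parameter names the handlers recognize.
KNOWN_PARAMS="mdt.job_cleanup_interval ost.job_cleanup_interval"

# Hypothetical handler for one prefix: returns 0 if it recognizes the
# parameter, ENOSYS otherwise.
handle_prefix() {
    hp_prefix=$1
    hp_name=${2%%=*}                 # strip the "=value" part
    case $hp_name in
        "$hp_prefix".*) ;;           # belongs to this prefix, keep checking
        *) return "$ENOSYS" ;;
    esac
    for hp_known in $KNOWN_PARAMS; do
        if [ "$hp_known" = "$hp_name" ]; then
            return 0
        fi
    done
    return "$ENOSYS"
}

# Try each prefix in turn. A record that no handler recognizes is logged
# and skipped; any other error is propagated to the caller.
process_record() {
    pr_param=$1
    for pr_prefix in mdt hsm osd ost; do
        handle_prefix "$pr_prefix" "$pr_param"
        pr_rc=$?
        if [ "$pr_rc" -eq 0 ]; then
            return 0
        elif [ "$pr_rc" -ne "$ENOSYS" ]; then
            return "$pr_rc"
        fi
    done
    echo "ignoring unknown config parameter '$pr_param'" >&2
    return 0
}
```

In this model the misspelled `mdt.job_cleanaup_interval=60` produces a warning on stderr but still returns 0, so a mount that depends on processing the whole log would continue.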

Comment by Li Xi [ 29/Nov/18 ]

I think we should fix this problem by ignoring ENOSYS in osd_process_config(), so I am reopening the ticket.

Comment by Peter Jones [ 29/Nov/18 ]

IMHO, given that this bug has existed in multiple shipped releases, it would be better to track this issue in a new ticket so that we can more easily track getting it into future releases.

Comment by James A Simmons [ 29/Nov/18 ]

I can fix this under LU-8066 since we will be moving to sysfs which handles these error codes differently.

Comment by Peter Jones [ 29/Nov/18 ]

great - thanks James

Generated at Sat Feb 10 02:23:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.