[LU-1253] obd_set_info_async: dev 0 no operation
Created: 22/Mar/12  Updated: 14/Jun/15  Resolved: 14/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Christopher Morrone
Assignee: Hongchao Zhang
Resolution: Won't Fix
Votes: 0
Labels: llnl
Environment: Lustre 1.8 and 2.1 clients; servers are Lustre 2.1.0-24chaos

Severity: 3
Rank (Obsolete): 10085

 Description   

Our admins are expanding a 2.1 filesystem with new OSTs. Because of known, never-solved issues in 1.8 that made adding OSTs in non-sequential order problematic (break out your hex editor to fix), the admins are adding the new OSTs one at a time, in sequential order.

Each addition causes the MGS lock to be revoked from all clients, which causes an MGS reconnect storm. When this happens we see the following:

2012-03-22 15:37:11 Lustre: MGS: Client 1aea7ac4-5b27-3e30-3c57-467cd6aed36f (at 192.168.120.118@o2ib7) reconnecting
2012-03-22 15:37:11 Lustre: Skipped 995 previous similar messages
2012-03-22 15:37:11 LustreError: 12649:0:(obd_class.h:501:obd_set_info_async()) obd_set_info_async: dev 0 no operation
2012-03-22 15:37:11 LustreError: 12649:0:(obd_class.h:501:obd_set_info_async()) Skipped 861 previous similar messages

I assume that the call to obd_set_info_async() is the one in target_handle_connect().

Device 0 is the MGS device.
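
For illustration, the "no operation" message is the kind of error a method-table dispatch logs when the target device does not provide the requested method. Below is a minimal Python model of that pattern. It is not the Lustre code (which is C); the names ObdDevice and obd_call, the contents of the method table, and the returned error code are all assumptions made for the sketch. Only the message format is taken from the log above.

```python
import errno


class ObdDevice:
    """Toy stand-in for an OBD device: a minor number plus a method table."""

    def __init__(self, minor, name, ops):
        self.minor = minor
        self.name = name
        self.ops = ops  # dict: operation name -> callable


def obd_call(dev, op, *args, log=print):
    """Dispatch `op` on `dev`; log and fail if the device lacks that method."""
    handler = dev.ops.get(op)
    if handler is None:
        # Message format copied from the console log in this ticket.
        log(f"obd_{op}: dev {dev.minor} no operation")
        return -errno.EOPNOTSUPP  # assumed error code, for illustration only
    return handler(dev, *args)


# Hypothetical method table: this toy MGS has get_info but no set_info_async.
mgs = ObdDevice(0, "MGS", {"get_info": lambda dev, key: 0})

messages = []
rc = obd_call(mgs, "set_info_async", "some_key", "some_value",
              log=messages.append)
```

In this model, every reconnecting client that triggers the call produces one such message, which would be consistent with the error count tracking the reconnect count in the log.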



 Comments   
Comment by Peter Jones [ 23/Mar/12 ]

Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Christopher Morrone [ 23/Mar/12 ]

FYI, console logs from the MDS/MGS node for the whole day are attached to LU-1257 if you want more context.

Comment by Hongchao Zhang [ 27/Apr/12 ]

Currently, there is only one ldlm_resource per Lustre filesystem (say, "lustre" or "scratch"), so if any one config log is changed,
all nodes belonging to that filesystem will re-enqueue the config lock to reprocess the config.

Using the target (MDT or OST) name (say, Lustre-OST0001) or the address of the client's super block could fix this issue,
but I am afraid it would require a huge amount of work.
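
The fan-out described above can be sketched with a toy lock table. This is a simplified model, not the LDLM implementation: the LockTable class, the node names, and the mapping of nodes to config logs are hypothetical, and keying the resource by config log name is just one assumed form of the finer-grained scheme.

```python
from collections import defaultdict


class LockTable:
    """Toy lock table: resource name -> set of nodes holding a lock on it."""

    def __init__(self):
        self.holders = defaultdict(set)

    def enqueue(self, resource, node):
        self.holders[resource].add(node)

    def revoke(self, resource):
        """Drop every lock on `resource`; return the nodes that must re-enqueue."""
        return self.holders.pop(resource, set())


# Hypothetical nodes and the config log each one consumes.
consumers = {
    "client-a": "lustre-client",
    "client-b": "lustre-client",
    "oss-1": "lustre-OST0000",
    "oss-2": "lustre-OST0001",
}


def nodes_disturbed(resource_for, changed_log):
    """Count nodes whose lock is revoked when `changed_log` is modified."""
    table = LockTable()
    for node, log in consumers.items():
        table.enqueue(resource_for(log), node)
    return len(table.revoke(resource_for(changed_log)))


# Scheme as described in this comment: one resource per filesystem name.
per_filesystem = nodes_disturbed(lambda log: log.split("-")[0], "lustre-OST0001")
# Finer-grained scheme (assumption): one resource per config log.
per_log = nodes_disturbed(lambda log: log, "lustre-OST0001")
```

With a single resource per filesystem, changing one log disturbs every node in the model; with one resource per log, only the consumers of that log are disturbed.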

Comment by Christopher Morrone [ 30/Apr/12 ]

I don't think either of those will be sufficient. In the first case (target name), we fail to protect against the admins incorrectly giving two targets the same name. In the second case, if the OST reboots, it will no longer have the same super block address.

We need something unique that we store ON DISK with the OST before beginning the registration process.

I agree that this would be a fairly big project. The original design really didn't consider the possibility of failures during configuration.
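
As a rough sketch of the on-disk identifier idea: generate a random ID once, persist it durably before any registration step, and read it back on every later start. The function name load_or_create_target_id and the file layout are hypothetical and stand in for whatever storage a real target would use; nothing here reflects an existing Lustre mechanism.

```python
import os
import tempfile
import uuid


def load_or_create_target_id(path):
    """Return the identifier stored at `path`, creating it on first use."""
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        pass
    target_id = str(uuid.uuid4())
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(target_id + "\n")
        f.flush()
        os.fsync(f.fileno())  # flush the id to disk before it is ever used
    os.replace(tmp, path)  # atomic rename: never leaves a half-written id
    return target_id


# The id survives a simulated restart, and two targets get distinct ids
# even if an admin were to give them the same name.
with tempfile.TemporaryDirectory() as d:
    ost_a = os.path.join(d, "ost_a.id")
    ost_b = os.path.join(d, "ost_b.id")
    first = load_or_create_target_id(ost_a)
    after_reboot = load_or_create_target_id(ost_a)
    other = load_or_create_target_id(ost_b)
```

Because the ID is independent of both the target name and any in-memory address, it addresses the two objections raised above, at least in this simplified model.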

Comment by D. Marc Stearman (Inactive) [ 11/Jun/15 ]

I have closed our local Jira issue as obsolete. Unless there are objections, I'm happy to close this one out.

Comment by Peter Jones [ 14/Jun/15 ]

OK, thanks Marc
