[LU-1253] obd_set_info_async: dev 0 no operation Created: 22/Mar/12 Updated: 14/Jun/15 Resolved: 14/Jun/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Christopher Morrone | Assignee: | Hongchao Zhang |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
Lustre 1.8 and 2.1 clients, servers are lustre 2.1.0-24chaos |
||
| Severity: | 3 |
| Rank (Obsolete): | 10085 |
| Description |
|
Our admins are expanding a 2.1 filesystem with new OSTs. Because of known-and-never-solved issues in 1.8 that made adding OSTs in non-sequential order problematic (break out your hexeditor to fix), the admins are adding the new OSTs one at a time in sequential order.

Each addition causes the MGS lock to be revoked from all clients, which causes an MGS reconnect storm. When this happens we see the following:

2012-03-22 15:37:11 Lustre: MGS: Client 1aea7ac4-5b27-3e30-3c57-467cd6aed36f (at 192.168.120.118@o2ib7) reconnecting
2012-03-22 15:37:11 Lustre: Skipped 995 previous similar messages
2012-03-22 15:37:11 LustreError: 12649:0:(obd_class.h:501:obd_set_info_async()) obd_set_info_async: dev 0 no operation
2012-03-22 15:37:11 LustreError: 12649:0:(obd_class.h:501:obd_set_info_async()) Skipped 861 previous similar messages

I assume that the call to obd_set_info_async() is the one in target_handle_connect(). Device 0 is the MGS device. |
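The wording of the error suggests a dispatch through a per-device method table in which device 0 has no set_info_async entry. The following is a toy Python model of that pattern, under that assumption only. The class, the function, and the key name are hypothetical, and the -EOPNOTSUPP return value is also an assumption; the real check is C code in obd_class.h, as the log line indicates.

```python
import errno

class ObdDevice:
    """Toy stand-in for a device with an optional method table (hypothetical)."""
    def __init__(self, minor, name, ops=None):
        self.minor = minor      # device number, e.g. 0 for the MGS in this ticket
        self.name = name
        self.ops = ops or {}    # operation name -> callable

def obd_call(dev, op, *args, log=print):
    """Dispatch `op` on `dev`; log and return an error if no handler exists."""
    handler = dev.ops.get(op)
    if handler is None:
        log(f"obd_{op}: dev {dev.minor} no operation")
        return -errno.EOPNOTSUPP  # assumed error code, not confirmed by the ticket
    return handler(dev, *args)

# Assumption: the MGS device registers no set_info_async handler,
# while some other device type does.
mgs = ObdDevice(0, "MGS")
mdt = ObdDevice(1, "MDT", ops={"set_info_async": lambda dev, key, val: 0})

messages = []
rc_mgs = obd_call(mgs, "set_info_async", "some_key", b"", log=messages.append)
rc_mdt = obd_call(mdt, "set_info_async", "some_key", b"", log=messages.append)
```

In this model each connect request that reaches the missing handler logs one message, which would be consistent with the hundreds of skipped similar messages seen during the reconnect storm.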
| Comments |
| Comment by Peter Jones [ 23/Mar/12 ] |
|
Hongchao

Could you please look into this one?

Thanks

Peter |
| Comment by Christopher Morrone [ 23/Mar/12 ] |
|
FYI, console logs from the MDS/MGS node for the whole day are attached to |
| Comment by Hongchao Zhang [ 27/Apr/12 ] |
|
Currently, there is only one ldlm_resource per Lustre filesystem (say, "lustre" or "scratch"), so a change to any one config log affects the lock held by every client. Using the target (MDT or OST) name (say, Lustre-OST0001) or the address of the client's super block could fix this issue. |
| Comment by Christopher Morrone [ 30/Apr/12 ] |
|
I don't think either of those will be sufficient. In the first case (target name), we fail to protect against the admins incorrectly giving two targets the same name. In the second case, if the OST reboots it will no longer have the same super block address. We need something unique that we store ON DISK with the OST before beginning the registration process. I agree that this would be a fairly big project. The original design really didn't consider the possibility of failures during configuration. |
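The requirement stated above (an identifier that is unique and is stored on disk with the OST before registration starts) can be illustrated with a small sketch. This is only an illustration in Python, assuming a random UUID written to a file on the target; the file name and helper function are hypothetical and are not part of Lustre.

```python
import os
import tempfile
import uuid

ID_FILE = "target_id"  # hypothetical file name, not a real Lustre on-disk object

def load_or_create_target_id(target_dir):
    """Return the target's persistent ID, creating it on first use."""
    path = os.path.join(target_dir, ID_FILE)
    try:
        with open(path, "r") as f:
            return f.read().strip()
    except FileNotFoundError:
        pass
    new_id = str(uuid.uuid4())
    # Write to a temp file, fsync, then rename, so that a partially
    # written ID never appears under the final name.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_id + "\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    return new_id

# Two targets that an admin mistakenly gave the same name still get distinct
# IDs, and each ID survives a simulated reboot (a second call on the same
# directory returns the stored value).
with tempfile.TemporaryDirectory() as ost_a, tempfile.TemporaryDirectory() as ost_b:
    id_a = load_or_create_target_id(ost_a)
    id_b = load_or_create_target_id(ost_b)
    id_a_after_reboot = load_or_create_target_id(ost_a)
```

Unlike a target name, the ID does not depend on what the admins typed, and unlike an in-memory super block address, it is read back unchanged after a restart.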
| Comment by D. Marc Stearman (Inactive) [ 11/Jun/15 ] |
|
I have closed our local Jira issue as obsolete. Unless there are objections, I'm happy to close this one out. |
| Comment by Peter Jones [ 14/Jun/15 ] |
|
ok - thanks Marc |