[LU-17185] After deactivating OSTs, some clients see them as active
Created: 11/Oct/23  Updated: 13/Oct/23
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.9 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Blocker |
| Reporter: | Roger Sersted |
| Assignee: | WC Triage |
| Resolution: | Unresolved |
| Votes: | 0 |
| Labels: | None |
| Environment: | Client: Servers: |
| Epic/Theme: | client, mgs |
| Severity: | 3 |
| Epic: | client, mgs |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
After running lctl conf_param lustrefc-OST0018.osc.active=0 on the MGS for multiple OSTs, some clients see the OSTs as inactive and work fine, while other clients still see the OSTs as active and hang. The software stack is the same on the working and non-working clients. Here is some sample output from a working client:
Output from a non-working client:
This has rendered our cluster unusable.
| Comments |
| Comment by Roger Sersted [ 11/Oct/23 ] |
I should add that I tried rebooting one of the problem nodes, and that did not resolve the issue.
| Comment by Andreas Dilger [ 11/Oct/23 ] |
I can't say for sure why this setting is not being applied to some of the clients. As a workaround, you could manually deactivate these OSTs on the affected clients, for example:

client# lctl set_param osc.*OST{0007,000e,000f}*.active=0

possibly using pdsh or another tool to run it on multiple clients at once. It shouldn't be harmful to run this on clients that already have the OSTs deactivated.
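To find out which clients still need the workaround, one option is to check each client's current OSC state. The sketch below is a minimal, hypothetical helper (the function name list_active_osts is illustrative, not part of Lustre). It assumes only that lctl get_param prints one name=value line per parameter, and it prints the OSC devices whose active flag is still 1.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): read "param=value" lines, as
# printed by "lctl get_param osc.*.active", on stdin and print the OSC
# device names whose active flag is still 1.
list_active_osts() {
    # Split each line at "=": $1 is the parameter name, $2 the value.
    awk -F= '$2 == 1 {
        name = $1
        sub(/^osc\./, "", name)     # drop the leading "osc." prefix
        sub(/\.active$/, "", name)  # drop the trailing ".active" suffix
        print name
    }'
}

# Example usage on a client (requires the Lustre client utilities):
#   lctl get_param osc.*.active | list_active_osts
```

Any client where this prints one of the deactivated OSTs is a candidate for the set_param command above.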
| Comment by Andreas Dilger [ 11/Oct/23 ] |
It would be worthwhile to check that the conf_param command is present in the client configuration log:

mgs# lctl --device MGS llog_print lustrefc-client | grep OST0007

There should be initial commands that add the OST, and the last one should be the one that marks it inactive. In newer releases there is an "lctl del_ost" command that removes the OST setup commands from the configuration log completely.
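Since the record that matters is the last one mentioning the OST, the grep can be narrowed to show only that record. This is a minimal sketch under one assumption: llog_print emits each configuration record on its own line. The function name last_ost_record is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given an OST name (e.g. OST0007) as $1, read the
# output of "lctl --device MGS llog_print <fsname>-client" on stdin and
# print only the last line that mentions that OST. Assumes one record
# per line of llog_print output.
last_ost_record() {
    # -F: treat the OST name as a fixed string, not a regex.
    grep -F -- "$1" | tail -n 1
}

# Example usage on the MGS:
#   lctl --device MGS llog_print lustrefc-client | last_ost_record OST0007
```

If the printed record is not the deactivation, the conf_param setting did not reach the client configuration log for that OST.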
| Comment by Roger Sersted [ 12/Oct/23 ] |
Thank you. The lctl ...active=0 command on the clients fixed the hang problem. I'll run the llog_print command later and attach the results.
| Comment by Roger Sersted [ 13/Oct/23 ] |
I thought that had fixed the problem, but on the problem clients I'm seeing this:

[root@puppy83 ~]# lfs df -h
[root@puppy83 ~]# lctl dl