[LU-642] LBUG in client when activating an OST which was registered as initially inactive Created: 25/Aug/11 Updated: 28/May/17 Resolved: 28/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Spray (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Low Priority | Votes: | 0 |
| Labels: | ptr |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 7751 |
| Description |
|
We want an OST to be inactive when it is first registered, which we do by tunefs'ing the osc.active setting on the OST before we first mount it. When I later activate an OST that was initially inactive, the client hits an LBUG when it tries to write to that OST. The config is MGS+MDT+OST0+OST1. I tried with everything on one host, and with the OSTs and client on different hosts, with the same effect. Using lustre-2.1.0-2.6.18_238.19.1.el5_lustre.g65156ed_gf426fb9 + other packages from the same build on CentOS5. Logs etc. to follow. |
| Comments |
| Comment by Robert Read (Inactive) [ 25/Aug/11 ] |
|
This would be a good test to add to conf-sanity.sh. |
| Comment by John Spray (Inactive) [ 26/Aug/11 ] |
|
Relevant syslog excerpts:
On the MGS/MDT when the OST is activated:
On the initially inactive OST when it's activated:
Aug 26 00:53:19 flint03 kernel: Lustre: 15810:0:(ldlm_lib.c:877:target_handle_connect()) flintfs-OST0000: connection from flintfs-MDT0000-mdtlov_UUID@1
On the client after the OST is activated, just this one message before the LBUG |
| Comment by John Spray (Inactive) [ 26/Aug/11 ] |
|
New observation: I can get this LBUG without ever trying to activate the OST, i.e. by skipping steps 5+6 in the original report. Here's the output from the client now that I've hooked up a serial console:
Lustre: 2429:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC192.168.0.8@tcp1->MGC192.168.0.8@tcp1_0 netid 20000: select flavor null
Call Trace:
Kernel panic - not syncing: LBUG |
| Comment by Peter Jones [ 30/Dec/11 ] |
|
Bobijam, could you please look into this one? Thanks, Peter |
| Comment by Jeremy Filizetti [ 19/Oct/12 ] |
|
This bug should probably be linked to |
| Comment by Zhenyu Xu [ 04/Nov/12 ] |
|
Found the root cause. When the OSC is inactive before lov tries to connect it (as in this scenario), lov_connect() does not connect the OST device, and the import to it is set to invalid. When we activate it later, the following happens:
ptlrpc_set_import_active() sets the import valid
ptlrpc_recover_import()
--> ptlrpc_recover_import_no_retry() fails with -EALREADY, since the import is in the NEW state, not in the expected DISCON state.
We need to supply the missing obd_connect RPC if the import is still in the NEW state when we activate the OSC later. |
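The failure path described above can be sketched as a small state model. This is only an illustration under stated assumptions, not the Lustre code or the actual patch: the names `imp_state`, `struct import`, `model_connect()`, `model_recover()` and the two `activate_*()` helpers are hypothetical, and the real import has more states and fields than shown here.

```c
#include <errno.h>

/* Hypothetical, reduced set of import states (the real enum has more). */
enum imp_state { IMP_NEW, IMP_DISCON, IMP_FULL };

/* Hypothetical stand-in for the client-side import of one OST. */
struct import {
    enum imp_state state;
    int invalid;
};

/* Stand-in for sending the obd_connect RPC; always succeeds in this model. */
static int model_connect(struct import *imp)
{
    imp->state = IMP_FULL;
    return 0;
}

/* Models the recovery path: it only handles an import that was connected
 * before and then disconnected, so any other state yields -EALREADY. */
static int model_recover(struct import *imp)
{
    if (imp->state != IMP_DISCON)
        return -EALREADY;
    imp->state = IMP_FULL;
    return 0;
}

/* Behaviour described in the comment: mark the import valid, then try to
 * recover. A never-connected import stays in NEW and recovery fails. */
static int activate_before_fix(struct import *imp)
{
    imp->invalid = 0;
    return model_recover(imp);
}

/* Idea of the fix: if the import was never connected, make up the
 * missing connect instead of going through recovery. */
static int activate_with_makeup_connect(struct import *imp)
{
    imp->invalid = 0;
    if (imp->state == IMP_NEW)
        return model_connect(imp);
    return model_recover(imp);
}
```

In this model, calling `activate_before_fix()` on an import that starts in `IMP_NEW` returns `-EALREADY` and leaves the state unchanged, while `activate_with_makeup_connect()` returns 0 and leaves the import in `IMP_FULL`.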
| Comment by Zhenyu Xu [ 08/Nov/12 ] |
|
b2_1 patch tracking at http://review.whamcloud.com/4463
Patch description:
LU-642 lov: make up obd_connect for inactive OSC
When the OSC is inactivated before lov tries to connect it, lov_connect() misses the chance to connect it to OST devices, even when it is activated later. We need to make up the connection for the initially inactive OSC when it is activated. |