[LU-1809] Clients unable to mount (-108) Created: 31/Aug/12 Updated: 19/Oct/12 Resolved: 19/Oct/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Kit Westneat (Inactive) | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6342 |
| Description |
|
NOAA hit a problem that looks a lot like Here's the client's syslog: MDS logs to come. |
| Comments |
| Comment by Isaac Huang (Inactive) [ 31/Aug/12 ] |
|
Likely this is a dup of This looked like a local error, i.e. the message did not go out on wire. Please: If this is a dup of |
| Comment by Kit Westneat (Inactive) [ 05/Sep/12 ] |
|
The patch looks fairly simple; can we get it landed on b1_8? In the meantime I will communicate the workaround to the customer, though I think the problem is pretty rare. Thanks, |
| Comment by Kit Westneat (Inactive) [ 06/Sep/12 ] |
|
Hi Isaac, What are the implications of peer_timeout=0? That is to say, what exactly does it do? Also, does it have to be set on all the servers and clients, or can it be set on just the servers or just the clients? Thanks, |
| Comment by Isaac Huang (Inactive) [ 06/Sep/12 ] |
|
peer_timeout=0 disables a feature that should only be turned on for routers - it was a bug to be able to enable it anywhere but the routers. In other words, peer_timeout=0 fixes it without any code changes. The feature does not work on clients and servers and will cause messages to be dropped, so "peer_timeout=0" must be set on all clients and servers. |
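As a rough illustration of the workaround described above (a sketch, not taken from this ticket): LND module options are normally set in the modprobe configuration. The file path and the choice of LND modules (ksocklnd for TCP, ko2iblnd for InfiniBand) are assumptions here; use whichever LND the site actually loads.

```
# /etc/modprobe.d/lustre.conf (assumed path; older systems may use /etc/modprobe.conf)
# Disable the LNet peer health feature on clients and servers (not routers).
# Which LND module applies is an assumption; keep only the one in use.
options ksocklnd peer_timeout=0
options ko2iblnd peer_timeout=0
```

Module options are read when the module loads, so the setting would only take effect on a node after its LNet modules are reloaded.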
| Comment by Kit Westneat (Inactive) [ 06/Sep/12 ] |
|
Is it ok to do "peer_timeout=0" on the clients before the servers? Or does it need to be set at the same time everywhere? |
| Comment by Isaac Huang (Inactive) [ 06/Sep/12 ] |
|
There's no requirement on order. You can do it in any order that's most convenient. |
| Comment by Kit Westneat (Inactive) [ 25/Sep/12 ] |
|
Could we get this landed to b1_8? It appears to have fixed the issue. |
| Comment by Isaac Huang (Inactive) [ 17/Oct/12 ] |
|
I likely missed some notifications when JIRA was upgraded a while back. I agree that it's a simple patch that fixes a class of problems that are hard to diagnose when they manifest themselves at upper layers. I'd defer to Peter on whether to land it to b1_8. |
| Comment by Peter Jones [ 17/Oct/12 ] |
|
Thanks, Isaac. Keith, can you please backport the patch from master to b1_8? |
| Comment by Isaac Huang (Inactive) [ 17/Oct/12 ] |
|
Quite likely the patch would apply to b1_8 without any changes; just ignore whitespace changes with patch --ignore-whitespace. |
| Comment by Keith Mannthey (Inactive) [ 17/Oct/12 ] |
|
I was able to cherry-pick the patch from master. http://review.whamcloud.com/4287 is the b1_8 patch. |
| Comment by Peter Jones [ 19/Oct/12 ] |
|
duplicate of |