[LU-4214] Hyperion - OST never recovers on failover node Created: 06/Nov/13 Updated: 03/Nov/17 Resolved: 11/Jun/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Cliff White (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 11463 |
| Description |
|
On Hyperion, doing a manual failover. The OSTs are formatted as follows:
mkfs.lustre --reformat --ost --fsname lustre --mgsnode=$MGSNODE --index=$stinx --servicenode=${PRI[$i]} --servicenode=${SEC[$i]} --mkfsoptions='-t ext4 -J size=2048 -O extents -G 256 -i 69905' /dev/sd${DISK[$i]} &
Resulting on-disk configuration:
Permanent disk data:
Target: lustre-OST0013
Index: 19
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.120.5@o2ib failover.node=192.168.127.62@o2ib failover.node=192.168.127.66@o2ib
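When checking a target like this one, the same "Permanent disk data" block can be printed without modifying the device by running `tunefs.lustre --dryrun <device>`. The bash sketch below is a hypothetical helper, not a Lustre tool: it decodes the two flag bits seen in this report and lists the service NIDs recorded on the `Parameters:` line. The bit values (0x0002 for OST, 0x1000 for no_primnode) are assumed from Lustre's `LDD_F_*` constants and are consistent with `Flags: 0x1002` being shown as `(OST no_primnode )` above; only those two bits are decoded.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading the "Permanent disk data" block printed by
# `tunefs.lustre --dryrun <device>`. Not part of Lustre itself.

# decode_ldd_flags FLAGS
# Prints the names of the two known bits set in FLAGS (hex such as 0x1002).
# Assumption: 0x0002 = OST and 0x1000 = no_primnode, as in Lustre's LDD_F_*
# constants; verify against your source tree before relying on this.
decode_ldd_flags() {
    local flags=$(( $1 ))            # bash arithmetic accepts 0x-prefixed hex
    local -a names=()
    (( flags & 0x0002 )) && names+=("OST")
    (( flags & 0x1000 )) && names+=("no_primnode")
    echo "${names[*]}"               # space-separated list of decoded names
}

# list_service_nids < disk_data
# Prints every failover.node=<NID> value from the "Parameters:" line,
# one NID per line; other parameters (e.g. mgsnode=) are skipped.
list_service_nids() {
    local line word
    local -a words
    # "|| [[ -n $line ]]" also handles a final line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ ^[[:space:]]*Parameters:(.*)$ ]] || continue
        read -ra words <<< "${BASH_REMATCH[1]}"
        for word in "${words[@]}"; do
            if [[ $word == failover.node=* ]]; then
                echo "${word#failover.node=}"
            fi
        done
    done
    return 0
}
```

For example, `decode_ldd_flags 0x1002` prints `OST no_primnode`, and piping this report's `Parameters:` line into `list_service_nids` prints `192.168.127.62@o2ib` and `192.168.127.66@o2ib` on separate lines, matching the two `--servicenode` options used at format time.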
Procedure:
Clients continue to time out on the primary NID and the OST never recovers on the failover node. The system remains in this state for further data gathering; suggestions appreciated. |
| Comments |
| Comment by Mikhail Pershin [ 28/Nov/13 ] |
|
Cliff, this bug is set as 'related' to |
| Comment by Cliff White (Inactive) [ 02/Dec/13 ] |
|
I have no idea why it is marked as related; I did not set that link. There was very little information in the logs, and I posted what there was to the bug. The lack of any error messages in this situation is rather frustrating. |
| Comment by Mikhail Pershin [ 03/Dec/13 ] |
|
OK, I see. I don't yet have a good idea of what is wrong there, but I do have one about that message:
It looks like we need to fix target_handle_connect() to establish a new connection for the LWP client when its NID has changed, as we already do for MDS connections. The patch is at http://review.whamcloud.com/#/c/8465/ and I am waiting for Johann's reply on it. |
| Comment by Andreas Dilger [ 25/Apr/14 ] |
|
Mike, Johann has commented on the patch http://review.whamcloud.com/8465, so it needs to be refreshed. |
| Comment by Jodi Levi (Inactive) [ 11/Jun/14 ] |
|
Patch landed to Master. |
| Comment by Gerrit Updater [ 11/Feb/15 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13726 |