[LU-7797] Can't mount zpools after OSS restart Created: 19/Feb/16 Updated: 24/Feb/16 Resolved: 24/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | soak | ||
| Environment: | lola |
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The error happened during soak testing of build '20160218' (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160218). DNE is enabled. Sequence of events:
Messages and console log files of lola-5 are attached. |
| Comments |
| Comment by Frank Heckes (Inactive) [ 19/Feb/16 ] |
|
The states of the zpools are as follows:
[root@lola-5 ~]# zpool status -v soaked-ost3
pool: soaked-ost3
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: none requested
config:
  NAME                    STATE     READ WRITE CKSUM
  soaked-ost3             ONLINE       0     0     0
    raidz2-0              ONLINE       0     0     0
      lola-5_ost3_disk_0  ONLINE       0     0     0
      lola-5_ost3_disk_1  ONLINE       0     0     0
      lola-5_ost3_disk_2  ONLINE       0     0     0
      lola-5_ost3_disk_3  ONLINE       0     0     0
      lola-5_ost3_disk_4  ONLINE       0     0     0
      lola-5_ost3_disk_5  ONLINE       0     0     0
      lola-5_ost3_disk_6  ONLINE       0     0     0
      lola-5_ost3_disk_7  ONLINE       0     0     0
      lola-5_ost3_disk_8  ONLINE       0     0     0
      lola-5_ost3_disk_9  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        soaked-ost3/ost3:/oi.10
[root@lola-5 ~]# zpool status -v soaked-ost7
pool: soaked-ost7
state: ONLINE
scan: none requested
config:
  NAME                    STATE     READ WRITE CKSUM
  soaked-ost7             ONLINE       0     0     0
    raidz2-0              ONLINE       0     0     0
      lola-5_ost7_disk_0  ONLINE       0     0     0
      lola-5_ost7_disk_1  ONLINE       0     0     0
      lola-5_ost7_disk_2  ONLINE       0     0     0
      lola-5_ost7_disk_3  ONLINE       0     0     0
      lola-5_ost7_disk_4  ONLINE       0     0     0
      lola-5_ost7_disk_5  ONLINE       0     0     0
      lola-5_ost7_disk_6  ONLINE       0     0     0
      lola-5_ost7_disk_7  ONLINE       0     0     0
      lola-5_ost7_disk_8  ONLINE       0     0     0
      lola-5_ost7_disk_9  ONLINE       0     0     0
errors: No known data errors
|
| Comment by Frank Heckes (Inactive) [ 19/Feb/16 ] |
|
One remark: the called-out zpool 'soaked-ost11' ('...has unrecoverable errors...') can be mounted and is operational after Lustre recovery completes. |
| Comment by Peter Jones [ 19/Feb/16 ] |
|
Alex, could you please look into how this could have occurred? Thanks, Peter |
| Comment by Andreas Dilger [ 19/Feb/16 ] |
|
Unfortunately, there is no OI scrub functionality for ZFS today, so it isn't possible to just delete the corrupted OI file and have OI Scrub rebuild it. ZFS is not supposed to become corrupted during usage, and none of the APIs that Lustre uses to modify the filesystem should allow the pool to become corrupt. However, ZFS is definitely vulnerable to corruption if it is mounted on multiple nodes at the same time, and there is currently no MMP feature like ldiskfs has that actively prevents it from being accessed by two nodes. This kind of corruption would show up first for files that are modified frequently (e.g. the OI file seen here), because each node will be assigning different blocks and updating the tree differently.
There needs to be strict Linux HA control of the zpools so that they are not imported on the backup node unless they are failed over, and when failover happens there needs to be STONITH to turn off the primary node before the pool is imported on the backup, to avoid concurrent access. It is worthwhile to contact Gabriele or Zhiqi to see if there is a best-practices guide for installing ZFS in an HA failover configuration. Also, see patch http://review.whamcloud.com/16611.
You may be able to recover the corrupted OST zpool, if it hasn't been mounted for a long time, by reverting to an older uberblock or snapshot that does not have the corruption in it. See for example http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script or https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSMetadataRecovery for details. If that doesn't work then it would be necessary to reformat the filesystem.
However, at a minimum /etc/hostid should be set (and unique!) to prevent gratuitous imports, and failover between OSSes should be disabled until proper HA configuration is done. |
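A minimal command-line sketch of the hostid and rewind-import steps described in the comment above, assuming the damaged pool is soaked-ost3 from this ticket. zgenhostid only ships with newer ZFS releases, and the -F/-X/-T rewind imports are last-resort options that discard recent transactions, so treat this as an illustration rather than a verified recovery procedure:
# On each OSS, make sure /etc/hostid exists and is unique across the cluster
# (zgenhostid is only available in newer ZFS releases; on older ones the
# 4-byte /etc/hostid file has to be created by other means).
zgenhostid
hostid

# Try importing the damaged pool read-only, asking ZFS to roll back to an
# older, hopefully consistent transaction group instead of the latest one.
zpool import -o readonly=on -F soaked-ost3
zpool status -v soaked-ost3

# If plain -F is not enough, -F -X (extreme rewind) or -T <txg> can go further
# back, at the cost of losing the most recent writes; if nothing works the OST
# has to be reformatted.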
| Comment by Frank Heckes (Inactive) [ 22/Feb/16 ] |
|
Andreas: Many thanks for the pointers. I'll try to fix the OSTs and enhance the soak framework to follow the HA best practices for ZFS. |
| Comment by Frank Heckes (Inactive) [ 24/Feb/16 ] |
|
I think this ticket can be closed. The error was caused by the node set-up not respecting the ZFS constraints, i.e. the same zpool (OST) was imported simultaneously on two nodes (OSSes). |
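As a closing illustration, a minimal sketch of the kind of import guard such a set-up needs, assuming the pool names from this ticket; the helper script below is an assumption about how imports could be gated, not part of the actual fix, and the fencing step itself is left to the HA stack:
#!/bin/bash
# guarded_import.sh <pool> - import an OST zpool only when it is safe to do so
# (hypothetical helper; the default pool name is just an example).
pool=${1:-soaked-ost3}

# Nothing to do if the pool is already imported on this OSS.
if zpool list "$pool" >/dev/null 2>&1; then
    echo "$pool is already imported on $(hostname)"
    exit 0
fi

# A plain import (no -f) lets ZFS refuse a pool whose on-disk labels show it
# is still active under another hostid - exactly the protection that importing
# the same pool on two OSSes at once defeats.
if ! zpool import "$pool"; then
    echo "refusing to force-import $pool: fence (STONITH) the peer OSS first" >&2
    exit 1
fi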