[LU-11013] Data Corruption error on Lustre ZFS dRaid Created: 10/May/18 Updated: 11/May/21 Resolved: 11/May/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Kurniawan Alfizah (Inactive) | Assignee: | Isaac Huang (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 7.4, in-kernel OFED, Mellanox FDR10, Lustre 2.10.3, dRAID (pull-7078), dm-multipath |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Setting up a Lustre testbed at ANL with:
|
| Comments |
| Comment by Andreas Dilger [ 10/May/18 ] |
|
Have you tried this with native ZFS+dRAID to see whether it hits the same corruption? That would isolate the problem to dRAID itself vs. an interaction between Lustre and dRAID. Note that you should create the native ZFS dataset the same way Lustre does, i.e. with recordsize=1024k, dnodesize=auto, and multimount enabled. It might be easiest to format the OST with mkfs.lustre as you do today, then set canmount=on and mount it locally for your testing. |
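A minimal sketch of that locally-mounted test, assuming a hypothetical OST dataset named ostpool/ost0 and an illustrative mountpoint (adjust to the actual pool layout):

    # Format the OST with mkfs.lustre as usual, then expose the dataset for native ZFS testing.
    # ostpool/ost0 and /mnt/ost0-native are illustrative names, not the real testbed layout.
    zfs set canmount=on ostpool/ost0
    zfs set mountpoint=/mnt/ost0-native ostpool/ost0
    # Match the Lustre-style dataset settings mentioned above:
    zfs set recordsize=1024K ostpool/ost0
    zfs set dnodesize=auto ostpool/ost0
    zfs mount ostpool/ost0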
| Comment by Isaac Huang (Inactive) [ 14/May/18 ] |
|
When was the dRAID code last refreshed? I pushed quite a few changes a couple of weeks ago, so please make sure you are running the latest code. Did you build zfs and spl with --enable-debug and set the zfs module option draid_debug_lvl=5? Also, please see the link below for the debug information to gather for dRAID bugs: https://github.com/zfsonlinux/zfs/wiki/dRAID-HOWTO#troubleshooting |
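A hedged sketch of the debug build and module-option steps described above, assuming spl/zfs source trees checked out from the dRAID branch (paths and job counts are illustrative):

    # Build spl and zfs with debugging/assertions enabled; run ./autogen.sh first for a git checkout.
    cd ~/src/spl && ./configure --enable-debug && make -j8 && sudo make install
    cd ~/src/zfs && ./configure --enable-debug --with-spl=$HOME/src/spl && make -j8 && sudo make install
    # Make the dRAID debug level persistent across module loads:
    echo "options zfs draid_debug_lvl=5" | sudo tee /etc/modprobe.d/zfs-draid-debug.conf
    # Or set it at runtime, if the module is already loaded and the parameter is writable:
    echo 5 | sudo tee /sys/module/zfs/parameters/draid_debug_lvl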
| Comment by Kurniawan Alfizah (Inactive) [ 24/May/18 ] |
|
We're using this one: https://github.com/zfsonlinux/zfs/pull/7078, which I cloned around early March 2018. By the way, following Andreas' suggestion, I think I was able to re-create the problem in our VM cluster. I created a VM with 30 virtual HDDs, filled them up to about 98%, and after removal I got data corruption. It's the same with or without Lustre, but I don't see the problem with raidz2. On wolf-16, I created the dRAID with 30 HDDs and then filled them up; I even managed to crash ZFS itself. That one was Isaac's ZFS build, though, so it could be a different problem.
|
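A rough reproduction sketch along the lines of this comment, using loopback files in place of the 30 virtual HDDs and a raidz2 control pool (the dRAID creation syntax in the pull-7078 prototype is branch-specific, so it is omitted); all names and sizes are hypothetical:

    # 30 backing files standing in for the virtual HDDs.
    for i in $(seq -w 0 29); do truncate -s 1G /var/tmp/d$i.img; done
    # Control pool with raidz2 and the Lustre-style dataset options:
    sudo zpool create -o ashift=12 testpool raidz2 /var/tmp/d*.img
    sudo zfs set recordsize=1024K testpool
    sudo zfs set dnodesize=auto testpool
    # Fill the pool (the report fills to roughly 98%), delete, then scrub and check for CKSUM errors:
    sudo dd if=/dev/urandom of=/testpool/fill bs=1M || true
    sudo rm /testpool/fill
    sudo zpool scrub testpool
    sudo zpool status -v testpool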
| Comment by Andreas Dilger [ 11/May/21 ] |
|
This is presumably fixed in the ZFS 2.1 dRAID implementation upstream. |
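For anyone retesting on a current release, a small sketch of the upstream OpenZFS >= 2.1 dRAID syntax (which differs from the pull-7078 prototype); the geometry and file-backed devices are illustrative only:

    # draid2:5d:30c:2s = double parity, 5 data disks per group, 30 children, 2 distributed spares.
    for i in $(seq -w 0 29); do truncate -s 1G /var/tmp/nd$i.img; done
    sudo zpool create tank draid2:5d:30c:2s /var/tmp/nd*.img
    sudo zpool status tank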