[LU-11013] Data Corruption error on Lustre ZFS dRaid Created: 10/May/18  Updated: 11/May/21  Resolved: 11/May/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Kurniawan Alfizah (Inactive) Assignee: Isaac Huang (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

RHEL-7.4, in-kernel ofed, mellanox FDR10, Lustre-2.10.3, dRaid-(pull-7078), dm-multipath


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Setting up a Lustre testbed at ANL with: 

  • 4 OSSs, 16 OSTs in total (8 JBODs, each with 60 HDDs)
  • Hybrid Lustre: MGT/MDTs on mdraid raid10 with ldiskfs, OSTs on ZFS dRAID
  • MGT - 2 SSDs, raid10 - ldiskfs
  • MDT0 - 12 SSDs, raid10 - ldiskfs
  • MDT1 - 10 SSDs, raid10 - ldiskfs
  • Each OST - 30 HDDs, ZFS dRAID laid out as 3*(8+1) + 1 (see the sketch below)
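
For reference, a pool with that geometry could be sketched with the dRAID syntax that later landed upstream in ZFS 2.1 (the pull-7078 branch used a different, draidcfg-based workflow, so this is illustrative only; device names are placeholders, and 3*(8+1)+1 taken literally is 28 children rather than 30):

    # Illustrative sketch, ZFS 2.1+ dRAID vdev syntax (not the pull-7078 draidcfg workflow).
    # draid1:8d:28c:1s = 1 parity, 8 data per group, 28 children, 1 distributed spare
    # (28 children - 1 spare = 27 drives = 3 groups of 8+1). Device names are placeholders.
    zpool create -o ashift=12 ost0pool draid1:8d:28c:1s /dev/mapper/ost0-disk{0..27}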

 

  • We filled the fs up to about 99%, and after cleaning it up and running zfs scrub we got a data corruption problem. It was quite severe and ended up crashing the fs. (A repro sketch follows this list.)
  • Rebuilt the Lustre dRAID fs and tested again in order to duplicate the problem.
  • On the first iteration of fill and clean-up, the fs held up. We only got "One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected." on two dRAID zpools, so we just cleared those errors.
  • On the 2nd iteration we were finally able to reproduce the error: after emptying the file system and running a scrub, we got the same data corruption problem ("One or more devices has experienced an error resulting in data corruption. Applications may be affected").
  • Changing the zpool to raidz2 with 3*(8+2), we don't have this problem.
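
A minimal sketch of the fill / clean-up / scrub cycle above, assuming a pool named ost0pool with its data reachable at /mnt/ost0 (both names are hypothetical; the actual runs went through Lustre clients):

    # Hypothetical repro loop; pool and mount point names are assumptions, not from the ticket.
    for i in $(seq 1 100000); do
        dd if=/dev/urandom of=/mnt/ost0/file.$i bs=1M count=1024 || break   # stop once nearly full
    done
    rm -rf /mnt/ost0/*          # clean up the file system
    zpool scrub ost0pool        # corruption was reported after this step
    zpool status -v ost0pool    # look for the "data corruption" status message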


 Comments   
Comment by Andreas Dilger [ 10/May/18 ]

Have you tried this with native ZFS+dRAID to see if it hits the same corruption? That would isolate the problem to dRAID vs. an interaction between Lustre and dRAID. 

Note that you should create the native ZFS dataset the same way as Lustre does, namely with recordsize=1024k, dnodesize=auto, and multihost (MMP) enabled. It might be best to format the OST with mkfs.lustre as today, then set canmount=on and mount it locally for your testing.
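
A sketch of what that local-mount test might look like, assuming mkfs.lustre already created an OST dataset named ost0pool/ost0 (pool/dataset names and the mount point are hypothetical):

    # Hypothetical names: ost0pool/ost0 is the dataset mkfs.lustre created.
    zpool set multihost=on ost0pool          # MMP is a pool-level property in ZFS
    zfs set recordsize=1M ost0pool/ost0      # i.e. the 1024k record size mentioned above
    zfs set dnodesize=auto ost0pool/ost0
    zfs set canmount=on ost0pool/ost0
    zfs set mountpoint=/mnt/ost0-test ost0pool/ost0
    zfs mount ost0pool/ost0                  # then drive native I/O against /mnt/ost0-test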

Comment by Isaac Huang (Inactive) [ 14/May/18 ]

When was the dRAID code last refreshed? I pushed quite a few changes a couple of weeks ago - please make sure to run the latest code. Did you build zfs and spl with --enable-debug and set the zfs module option draid_debug_lvl=5? Also, please see the link below for what debug information to gather for dRAID bugs:

https://github.com/zfsonlinux/zfs/wiki/dRAID-HOWTO#troubleshooting
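
A hedged sketch of that debug setup, assuming the spl and zfs trees from the pull-7078 branch (draid_debug_lvl exists only on the dRAID branch, and the zfs configure may additionally need --with-spl pointing at the spl tree):

    # Build both modules with debugging enabled.
    cd spl && ./autogen.sh && ./configure --enable-debug && make -j$(nproc) && sudo make install
    cd ../zfs && ./autogen.sh && ./configure --enable-debug && make -j$(nproc) && sudo make install

    # Raise dRAID verbosity at runtime (if the parameter is writable) ...
    echo 5 | sudo tee /sys/module/zfs/parameters/draid_debug_lvl
    # ... or persistently via modprobe options:
    echo "options zfs draid_debug_lvl=5" | sudo tee /etc/modprobe.d/zfs-draid.conf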

Comment by Kurniawan Alfizah (Inactive) [ 24/May/18 ]

We're using this one: https://github.com/zfsonlinux/zfs/pull/7078. I cloned it around early March '18.

Btw, following Andreas' suggestion, I think I might be able to re-create the problem in our VM cluster. I created a VM with 30 virtual HDDs, filled them up to about 98%, and then removed the files; I got data corruption. Same thing with or without Lustre, but I don't see the problem with raidz2. (A sketch of a file-backed equivalent follows.)
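
For anyone repeating the VM test, a minimal sketch using file-backed disks in place of the virtual HDDs (sizes, names, and the 30-child/3-spare geometry are assumptions for illustration, written in the upstream ZFS 2.1 syntax rather than the pull-7078 one):

    # Hypothetical file-backed "disks" standing in for the 30 virtual HDDs.
    for i in $(seq -w 1 30); do truncate -s 10G /var/tmp/draid-disk$i; done
    zpool create vmdraid draid1:8d:30c:3s /var/tmp/draid-disk*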

On wolf-16, I created the dRAID with 30 HDDs and then filled them up; I even managed to crash ZFS itself. That one was Isaac's ZFS build though, so it could be a different problem.

 

 

Comment by Andreas Dilger [ 11/May/21 ]

This is presumably fixed in the ZFS 2.1 dRAID implementation upstream.
