[LU-5481] MMP updates can sometimes fail T10PI checks Created: 13/Aug/14  Updated: 17/Aug/18  Resolved: 16/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

T10PI enabled RAID (netapp 5400)
Lustre OSS running with T10PI (DIF & DIX) checking


Issue Links:
Duplicate
duplicates LU-11187 MMP updated sometimes failes T10PI ch... Resolved
Severity: 3
Rank (Obsolete): 15295

 Description   

MMP updates on an OSS with T10PI can sometimes cause a block guard check failure. We have seen this across several OSSes where MMP is configured on the OST.
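
For readers unfamiliar with the guard tag: T10 protection information attaches a 16-bit guard to each sector, and the target rejects a write when the guard does not match the data it received. The sketch below is a minimal illustration only, assuming the T10 CRC guard type (polynomial 0x8BB7, initial value 0) and a 512-byte sector; the sector contents are made up, and nothing here is a claim about why the guard mismatched in this ticket.

```python
def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with the T10-DIF polynomial 0x8BB7, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Hypothetical 512-byte sector; only the first bytes are non-zero.
sector = bytearray(512)
sector[0:4] = b"PMM\x00"

guard = crc16_t10dif(bytes(sector))   # tag computed over the original data

sector[4] ^= 0x01                     # data changes after the tag was computed
guard_after = crc16_t10dif(bytes(sector))

print(f"guard before: {guard:#06x}, after: {guard_after:#06x}")
print("guard check would fail:", guard != guard_after)
```

Any difference between the data the guard was computed over and the data that reaches the target is enough to produce a "Logical block guard check failed" sense code.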

Aug 13 06:17:04 nbp7-oss13 kernel: sd 12:0:0:37: [sdci] Host Data Integrity Failure
Aug 13 06:17:04 nbp7-oss13 kernel: sd 12:0:0:37: [sdci] Result: hostbyte=DID_ABORT driverbyte=DRIVER_SENSE
Aug 13 06:17:04 nbp7-oss13 kernel: sd 12:0:0:37: [sdci] Sense Key : Illegal Request [current] [descriptor]
Aug 13 06:17:04 nbp7-oss13 kernel: Descriptor sense data with sense descriptors (in hex):
Aug 13 06:17:04 nbp7-oss13 kernel: 72 05 10 01 00 00 00 0c 00 0a 80 00 00 00 00 00
Aug 13 06:17:04 nbp7-oss13 kernel: 00 00 c7 d0
Aug 13 06:17:04 nbp7-oss13 kernel: sd 12:0:0:37: [sdci] Add. Sense: Logical block guard check failed
Aug 13 06:17:04 nbp7-oss13 kernel: sd 12:0:0:37: [sdci] CDB: Write(10): 2a 20 00 00 c7 c8 00 00 08 00
Aug 13 06:17:04 nbp7-oss13 kernel: Buffer I/O error on device dm-15, logical block 6393
Aug 13 06:17:04 nbp7-oss13 kernel: lost page write due to I/O error on dm-15
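
The CDB and sense bytes in the log can be decoded to relate the failed write to the filesystem block. The sketch below is an illustration under stated assumptions: 512-byte logical sectors on sdci, a 4096-byte filesystem block, and dm-15 mapping sdci from offset 0. The function names are hypothetical helpers, not part of any tool.

```python
import struct

# Bytes copied from the kernel log above.
cdb = bytes.fromhex("2a 20 00 00 c7 c8 00 00 08 00")
sense = bytes.fromhex(
    "72 05 10 01 00 00 00 0c 00 0a 80 00 00 00 00 00 00 00 c7 d0"
)

SECTOR_SIZE = 512      # assumption: logical sector size of sdci
FS_BLOCK_SIZE = 4096   # block size of the filesystem on dm-15


def decode_write10(cdb: bytes) -> dict:
    """Split a 10-byte WRITE(10) CDB into its fields (big-endian)."""
    opcode, flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    if opcode != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {"wrprotect": flags >> 5, "lba": lba, "sectors": length}


def decode_descriptor_sense(sense: bytes) -> dict:
    """Decode descriptor-format sense data (response code 0x72)."""
    result = {
        "response_code": sense[0] & 0x7F,
        "sense_key": sense[1] & 0x0F,
        "asc": sense[2],
        "ascq": sense[3],
        "information": None,
    }
    pos, end = 8, min(len(sense), 8 + sense[7])
    while pos + 2 <= end:
        dtype, dlen = sense[pos], sense[pos + 1]
        body = sense[pos + 2:pos + 2 + dlen]
        # Descriptor type 0x00 is the information descriptor; bit 7 of its
        # first body byte is the VALID flag, followed by an 8-byte value.
        if dtype == 0x00 and dlen == 0x0A and body[0] & 0x80:
            result["information"] = int.from_bytes(body[2:10], "big")
        pos += 2 + dlen
    return result


write = decode_write10(cdb)
info = decode_descriptor_sense(sense)
fs_block = write["lba"] * SECTOR_SIZE // FS_BLOCK_SIZE

print(write)
print(info)
print("filesystem block:", fs_block)
```

Under those assumptions the write covers 8 sectors starting at LBA 51144 with WRPROTECT set to 1, which corresponds to filesystem block 6393, the same block named in the buffer I/O error. The sense data decodes to sense key 0x5 with ASC/ASCQ 0x10/0x01, and its information field reads 51152.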

Here is a dump of the block that was called out:

nbp7-oss13 ~ # debugfs -c /dev/mapper/nbp7-ost36 
debugfs 1.42.7.wc2 (07-Nov-2013)
/dev/mapper/nbp7-ost36: catastrophic mode - not reading inode or group bitmaps
debugfs:  bd 6393
0000  504d 4d00 bf6e 0000 5781 eb53 0000 0000  PMM..n..W..S....
0020  6e62 7037 2d6f 7373 3133 0000 0000 0000  nbp7-oss13......
0040  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0120  646d 2d31 3500 6170 7065 722f 6e62 7037  dm-15.apper/nbp7
0140  2d6f 7374 3336 0000 0000 0000 0000 0000  -ost36..........
0160  0a00 0000 0000 0000 0000 0000 0000 0000  ................
0200  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
nbp7-oss13 ~ # tune2fs -l /dev/mapper/nbp7-ost36
tune2fs 1.42.7.wc2 (07-Nov-2013)
Filesystem volume name:   nbp7-OST0024
Last mounted on:          /
Filesystem UUID:          e57be5e4-9ba7-46de-b1d8-7d0ba9e6c536
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              22888704
Block count:              5859483648
Reserved block count:     292974182
Free blocks:              1575153870
Free inodes:              21282853
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         128
Inode blocks per group:   8
Flex block group size:    256
Filesystem created:       Sun Jun 16 15:50:03 2013
Last mount time:          Mon Aug 11 16:54:09 2014
Last write time:          Mon Aug 11 16:54:09 2014
Mount count:              2
Maximum mount count:      -1
Last checked:             Mon Aug 11 15:47:35 2014
Check interval:           0 (<none>)
Lifetime writes:          28 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     28
Desired extra isize:      28
Journal UUID:             95f8602b-579f-4d4d-a778-7230fae604cc
Journal device:	          0xfd00
Default directory hash:   half_md4
Directory Hash Seed:      865cd6af-0f3e-4ed6-87c5-82a838b2b5d9
MMP block number:         6393
MMP update interval:      5
User quota inode:         3
Group quota inode:        4
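
The block dump above can be decoded field by field to confirm that block 6393 holds an MMP structure. The sketch below is an illustration, not a replacement for debugfs: the field layout (magic, sequence, time, 64-byte node name, 32-byte device name, 16-bit check interval, all little-endian) is an assumption based on the ext4 `mmp_struct` definition, and the offsets printed by `bd` are treated as octal.

```python
import struct
from datetime import datetime, timezone

# Rebuild the non-zero parts of the block from the "bd 6393" output above.
# debugfs prints offsets in octal, so 0020 is byte 16 and 0120 is byte 80.
block = bytearray(4096)
block[0o000:0o020] = bytes.fromhex("504d 4d00 bf6e 0000 5781 eb53 0000 0000")
block[0o020:0o040] = bytes.fromhex("6e62 7037 2d6f 7373 3133 0000 0000 0000")
block[0o120:0o140] = bytes.fromhex("646d 2d31 3500 6170 7065 722f 6e62 7037")
block[0o140:0o160] = bytes.fromhex("2d6f 7374 3336 0000 0000 0000 0000 0000")
block[0o160:0o200] = bytes.fromhex("0a00 0000 0000 0000 0000 0000 0000 0000")

MMP_MAGIC = 0x004D4D50  # the bytes "PMM\0" read as a little-endian 32-bit value

# Assumed layout: __le32 magic, __le32 seq, __le64 time,
# char nodename[64], char bdevname[32], __le16 check_interval.
magic, seq, mmp_time, nodename, bdevname, check_interval = struct.unpack_from(
    "<IIQ64s32sH", block, 0
)


def cstring(raw: bytes) -> str:
    """Return the text up to the first NUL byte."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


mmp = {
    "magic_ok": magic == MMP_MAGIC,
    "seq": seq,
    "time": datetime.fromtimestamp(mmp_time, tz=timezone.utc).isoformat(),
    "nodename": cstring(nodename),
    "bdevname": cstring(bdevname),
    "check_interval": check_interval,
}

for key, value in mmp.items():
    print(f"{key}: {value}")
```

With that layout the magic matches, the node name is nbp7-oss13, and the device name is dm-15. The bytes after the first NUL in the device name field ("apper/nbp7-ost36") are ignored by the decoder.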


 Comments   
Comment by Peter Jones [ 13/Aug/14 ]

Niu

Could you please advise?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 14/Aug/14 ]

Hi, Mahmoud

Do you know when this check failure usually happens? During mount/umount, or on every MMP write while the system is running? Did you observe any failures on blocks other than the MMP block? Thanks.

Comment by Mahmoud Hanafi [ 14/Aug/14 ]

The failure appears to happen at random times. It does not necessarily happen during mount/umount.
The failure is observed only on the MMP block.

Comment by Mahmoud Hanafi [ 15/Oct/15 ]

Please close this

Comment by Peter Jones [ 16/Oct/15 ]

ok Mahmoud

Generated at Sat Feb 10 01:51:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.