
LU-9305: Running File System Aging creates write checksum errors

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.0, Lustre 2.11.0

    Description

      My most recent reproduction of this was:
      ZFS based on 0.7.0 RC4 fs/zfs:coral-rc1-combined
      Lustre tagged release 2.9.57 (but 2.9.58 fails as well)
      CentOS 7.3, kernel 3.10.0-514.16.1.el7.x86_64

      I have personally verified that this fails on Lustre 2.8, 2.9, and the latest tagged release; on ZFS from 0.6.5 through current ZoL master; and on the most recent CentOS 7.1, 7.2, and 7.3 kernels.

      This may well be a Lustre issue; I need to try to reproduce it on raidz, without large RPCs, etc.

      On both the clients and the OSS nodes we see checksum errors such as the following while the file aging test is running:
      [ 9354.968454] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000401:0x254:0x0] object 0x0:292 extent [117440512-125698047]: client csum de357896, server csum 5cd77893

      [ 9394.315856] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000401:0x28c:0x0] object 0x0:320 extent [67108864-82968575]: client csum df6bd34a, server csum 8480d352
      [ 9404.371609] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000401:0x298:0x0] object 0x0:326 extent [67108864-74448895]: client csum 2ced4ec0, server csum 1f814ec4
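
      These messages mean that the checksum the client computed over the bulk write data does not match the one the OST recomputed over the pages it actually received, so the page contents changed somewhere between the two computations. The following is only an illustrative userspace model of that comparison; it is not Lustre code, and a simple Adler-32 stands in for whatever checksum algorithm the client and server negotiated:

      /*
       * Illustrative model of the check behind "BAD WRITE CHECKSUM":
       * the client checksums the pages it is about to send, the server
       * checksums the pages it received, and a mismatch means the data
       * was modified or torn in between.  Not Lustre code.
       */
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      static uint32_t adler32(const unsigned char *buf, size_t len)
      {
              uint32_t a = 1, b = 0;

              for (size_t i = 0; i < len; i++) {
                      a = (a + buf[i]) % 65521;
                      b = (b + a) % 65521;
              }
              return (b << 16) | a;
      }

      int main(void)
      {
              unsigned char page[4096];

              memset(page, 0xab, sizeof(page));
              uint32_t client_csum = adler32(page, sizeof(page));   /* client side */

              /* Simulate the page changing after the client computed its
               * checksum but before the server verified it. */
              page[100] ^= 0xff;
              uint32_t server_csum = adler32(page, sizeof(page));   /* server side */

              if (client_csum != server_csum)
                      printf("BAD WRITE CHECKSUM: client csum %08x, server csum %08x\n",
                             (unsigned)client_csum, (unsigned)server_csum);
              return 0;
      }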

      Attachments

        1. BasicLibs.py
          6 kB
        2. debug_info.20170406_143409_48420_wolf-3.wolf.hpdd.intel.com.tgz
          3.45 MB
        3. debug_vmalloc_lustre.patch
          6 kB
        4. debug_vmalloc_spl.patch
          14 kB
        5. debug_vmalloc.patch
          22 kB
        6. FileAger-wolf6.py
          6 kB
        7. FileAger-wolf7.py
          6 kB
        8. FileAger-wolf8.py
          6 kB
        9. FileAger-wolf9.py
          6 kB
        10. Linux_x64_Memory_Address_Mapping.pdf
          224 kB
        11. wolf-6_client.tgz
          5.67 MB


          Activity

            jay Jinshan Xiong (Inactive) added a comment - edited

            This is indeed a race condition. I wonder why I couldn't catch the race by enabling VM_BUG_ON_PAGE() in put_page_testzero().

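            (For context: put_page_testzero() is the kernel helper that drops a page reference and reports whether the count hit zero; with CONFIG_DEBUG_VM it contains a VM_BUG_ON_PAGE()-style check against the count already being zero. One plausible reason such a check can stay silent in a double-release race is sketched below as a plain userspace model, not kernel code: if the page has already been recycled and a new owner holds a reference, the stray put sees a nonzero count and simply steals that reference.)

            /*
             * Userspace model, not kernel code: why a stray extra "put" is not
             * necessarily caught by the VM_BUG_ON_PAGE()-style check in
             * put_page_testzero().  If another user has already taken a
             * reference to the recycled page, the extra put sees a nonzero
             * count, decrements it silently, and the new owner's reference is
             * lost; the data is later freed or reused under it.
             */
            #include <assert.h>
            #include <stdio.h>

            struct fake_page {
                    int refcount;
            };

            /* Model of put_page_testzero(): assert the count is not already
             * zero, then decrement and report whether it reached zero. */
            static int put_testzero(struct fake_page *p)
            {
                    assert(p->refcount != 0);   /* the VM_BUG_ON_PAGE-style check */
                    return --p->refcount == 0;
            }

            int main(void)
            {
                    struct fake_page page = { .refcount = 1 };

                    put_testzero(&page);        /* legitimate owner drops the page   */
                    page.refcount = 1;          /* page recycled, new owner takes it */
                    put_testzero(&page);        /* stray second put: count is 1, so the
                                                 * assertion never trips, yet the new
                                                 * owner's reference is now gone */
                    printf("refcount now %d (new owner's reference lost)\n",
                           page.refcount);
                    return 0;
            }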

            bzzz Alex Zhuravlev added a comment

            With https://review.whamcloud.com/27950 I can't reproduce the issue.


            gerrit Gerrit Updater added a comment

            Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: https://review.whamcloud.com/27950
            Subject: LU-9305 osd: do not release pages twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 12d3b3dcfbc17fc201dc9de463720e3a3a994f49


            bzzz Alex Zhuravlev added a comment

            No, I'm not familiar with the wolf cluster.


            jay Jinshan Xiong (Inactive) added a comment

            The wolf cluster should be available to use. Do you have access to wolf?


            bzzz Alex Zhuravlev added a comment

            May I ask you to run https://review.whamcloud.com/27913 on real iron, please?
            I'm not sure about the exact sequence, but my theory is that osd_bufs_put() may try to release pages that were already released by dmu_assign_arcbuf() when the blocksize mismatches.

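            (The theory above describes an ownership-transfer bug: on the blocksize-mismatch path the pages are effectively consumed by the lower layer, and osd_bufs_put() then releases them a second time, after which another thread can already be reusing them, which would explain the checksum mismatches. The sketch below is only a userspace model of that bug shape and of the kind of guard a "do not release pages twice" fix implies; it is not the osd-zfs code or the actual patch.)

            /*
             * Userspace sketch of the suspected bug shape, not the actual
             * osd-zfs code or the fix in https://review.whamcloud.com/27950.
             * One branch of the write path hands ownership of a buffer to a
             * lower layer; if the release path then frees the same buffer
             * unconditionally, it is freed twice and a concurrent user of the
             * recycled memory sees corruption.  The "consumed" flag is the
             * kind of bookkeeping a "release pages only once" fix implies.
             */
            #include <stdbool.h>
            #include <stdio.h>
            #include <stdlib.h>

            struct buf {
                    void *data;
                    bool  consumed;     /* ownership passed to the lower layer */
            };

            /* Stand-in for the path that takes ownership of the buffer. */
            static void lower_layer_consume(struct buf *b)
            {
                    free(b->data);      /* the lower layer now frees the data */
                    b->data = NULL;
                    b->consumed = true;
            }

            /* Stand-in for osd_bufs_put(): must skip consumed buffers. */
            static void bufs_put(struct buf *b)
            {
                    if (b->consumed)    /* without this check: double free */
                            return;
                    free(b->data);
                    b->data = NULL;
            }

            int main(void)
            {
                    struct buf b = { .data = malloc(4096), .consumed = false };

                    lower_layer_consume(&b);    /* e.g. the mismatch branch */
                    bufs_put(&b);               /* safe thanks to the guard */
                    printf("released once, second release skipped\n");
                    return 0;
            }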

            jay Jinshan Xiong (Inactive) added a comment

            Alex:

            That's a good finding. What would be the next step?

            bzzz Alex Zhuravlev added a comment - edited

            I added a few CDEBUG() and BUG() calls in tgt_warn_on_cksum()...

            00080000:00080000:1.0:1498681682.290436:0:13445:0:(osd_io.c:541:osd_bufs_get()) pages: ffffea00003ac3c0/4096@12582912
            00000020:00080000:1.0:1498681682.363189:0:13445:0:(tgt_handler.c:2121:tgt_warn_on_cksum()) pages: ffffea00003ac3c0/4096@12582912

            00080000:00080000:0.0:1498681682.396783:0:14108:0:(osd_io.c:541:osd_bufs_get()) pages: ffffea00003ac3c0/4096@20086784

            So tgt_warn_on_cksum() in thread 13445 did hit BUG() and, in theory, could not proceed to osd_bufs_put() to release page ffffea00003ac3c0, yet thread 14108 was able to get this same page a few cycles later.

            sarah Sarah Liu added a comment

            It is automatically generated if a node crashes. Please refer to https://wiki.hpdd.intel.com/display/TEI/Core+dump+location+for+autotest+nodes


            bzzz Alex Zhuravlev added a comment

            Thanks. Did you upload those files, or did autotest do it automatically?

            sarah Sarah Liu added a comment - edited

            Alex, please go to trevis and check the following directory; I hope it has what you are looking for:

            -sh-4.1$ ls /scratch/dumps/trevis-36vm4.trevis.hpdd.intel.com
            10.9.5.195-2017-06-22-22:11:54  10.9.5.195-2017-06-23-12:20:38
            10.9.5.195-2017-06-23-06:33:26  2016-06-26-05:38
            

            When an autotest node crashes, the core dump will be placed in /scratch/dumps/<hostname>.


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: jsalians_intel John Salinas (Inactive)
              Votes: 1
              Watchers: 23
