[LU-12945] Avoid sending PageSlab pages through tcp stack
Created: 06/Nov/19  Updated: 17/Feb/21  Resolved: 10/Jun/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug
Priority: Minor
Reporter: Shaun Tancheff
Assignee: Shaun Tancheff
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Oct 26 12:58:25 oss kernel: BUG: Bad page state in process socknal_sd03_00 pfn:b0641
Oct 26 12:58:25 oss kernel: page:ffffeea482c19040 count:0 mapcount:-1 mapping: (null) index:0x0
Oct 26 12:58:25 oss kernel: page flags: 0x1fffff00008000(tail)
Oct 26 12:58:25 oss kernel: page dumped because: nonzero mapcount
Oct 26 12:58:25 oss kernel: Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) fuse zfs(POE) snd_hda_codec_generic zunicode(POE) zlua(POE) ppdev zcommon(POE) znvpair(POE) zavl(POE) snd_hda_intel icp(POE) iosf_mbi crc32_pclmul snd_hda_codec spl(OE) snd_hda_core snd_hwdep ghash_clmulni_intel snd_seq snd_seq_device aesni_intel snd_pcm lrw gf128mul joydev glue_helper snd_timer parport_pc ablk_helper snd virtio_balloon cryptd pcspkr parport sg soundcore i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod crc_t10dif cdrom crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_net virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm crct10dif_pclmul libata virtio_pci crct10dif_common
Oct 26 12:58:25 oss kernel: virtio_ring crc32c_intel serio_raw floppy virtio drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
Oct 26 12:58:25 oss kernel: CPU: 6 PID: 4165 Comm: socknal_sd03_00 Kdump: loaded Tainted: P OE ------------ 3.10.0-957.27.3.ldiskfs.el7.x86_64 #1

In addition, on kernels built with debugging enabled [CONFIG_DEBUG_VM], upstream Linux (since v5.0-3279-ga10674bf2406) tests PageSlab() on the memory passed to tcp_sendpages and blocks the send when the test is true:

Oct 30 23:47:01 oss kernel: WARNING: CPU: 2 PID: 3642 at net/ipv4/tcp.c:966 do_tcp_sendpages+0xb56/0xcb0

The fix is to avoid sending PageSlab pages through the TCP stack.
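
A minimal userspace sketch of that idea follows. It is not the code from the landed patches: every name in it (demo_page, demo_frag, demo_pick_send_mode) is hypothetical, and the is_slab field merely stands in for the kernel's PageSlab() test. It assumes the sender can fall back to a buffered (copying) send whenever any fragment of a message is backed by a slab page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. Only the slab
 * flag matters here; in the kernel the test would be PageSlab(page). */
struct demo_page {
	bool is_slab;
};

/* Hypothetical fragment descriptor: one page plus offset and length. */
struct demo_frag {
	const struct demo_page *page;
	size_t offset;
	size_t len;
};

enum demo_send_mode {
	DEMO_SEND_ZERO_COPY,	/* hand page references to the TCP stack */
	DEMO_SEND_BUFFERED	/* copy the payload into a socket buffer */
};

/* Pick the send path for one message: zero copy is allowed only when
 * no fragment is backed by a slab page. */
enum demo_send_mode demo_pick_send_mode(const struct demo_frag *frags,
					size_t nfrags)
{
	assert(frags != NULL || nfrags == 0);

	for (size_t i = 0; i < nfrags; i++) {
		if (frags[i].page->is_slab)
			return DEMO_SEND_BUFFERED;
	}
	return DEMO_SEND_ZERO_COPY;
}
```

In this sketch a single slab-backed fragment forces the whole message onto the buffered path, which keeps the decision simple at the cost of one extra copy.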



 Comments   
Comment by Gerrit Updater [ 06/Nov/19 ]

Shaun Tancheff (stancheff@cray.com) uploaded a new patch: https://review.whamcloud.com/36691
Subject: LU-12945 lnet: Avoid sending PageSlab pages via tcp
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9f59d66ab83ce50de606b063da9a8bc6693ed55a

Comment by Shaun Tancheff [ 08/Nov/19 ]

Frequency of PageSlab occurrence:
     ksocknal_lib_send_iov: buffered 142330 / zero 197893

The vast majority of buffered sends use 2 pages per instance.
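
To put the two counters in proportion, the share of buffered sends can be worked out as below. This assumes "buffered" counts sends that took the copying path and "zero" counts zero-copy sends; the helper name buffered_share_percent is hypothetical.

```c
#include <assert.h>

/* Percentage of all sends that took the buffered (copying) path. */
double buffered_share_percent(unsigned long buffered, unsigned long zero_copy)
{
	unsigned long total = buffered + zero_copy;

	assert(total > 0);
	return 100.0 * (double)buffered / (double)total;
}
```

Called as buffered_share_percent(142330UL, 197893UL), this gives roughly 42% of sends on the buffered path under that reading of the counters.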

Comment by Shaun Tancheff [ 08/Nov/19 ]

This BUG:
   page:ffffeea482c19040 count:0 mapcount:-1 mapping: (null) index:0x0

also appears to depend on the kernel version and ZFS version used.
With kernel 5.4 and zfs 0.8.2 or later, the issue is not reproducible.
With the CentOS 7.6 kernel or an older ZFS (early 0.8.0), it does reproduce.

Comment by Shaun Tancheff [ 28/Nov/19 ]

https://review.whamcloud.com/36691, as of patch sets 4/5/6, resolves the following for all zfs/kernel combinations:

Oct 26 12:58:25 oss kernel: BUG: Bad page state in process socknal_sd03_00 pfn:b0641
Oct 26 12:58:25 oss kernel: page:ffffeea482c19040 count:0 mapcount:-1 mapping: (null) index:0x0

Comment by Gerrit Updater [ 21/Jan/20 ]

Shaun Tancheff (stancheff@cray.com) uploaded a new patch: https://review.whamcloud.com/37297
Subject: LU-12945 lnet: Disable zero copy when running on VM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6dd727501adf35064d4d6dcd9b1c5f60688883bc

Comment by Gerrit Updater [ 21/Jan/20 ]

Shaun Tancheff (stancheff@cray.com) uploaded a new patch: https://review.whamcloud.com/37300
Subject: LU-12945 lnet: Disable zero copy when running on VM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9d485b153fd18cfc739012a864944eaf0c29fd42

Comment by Gerrit Updater [ 01/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37300/
Subject: LU-12945 lnet: Disable zero copy when running on VM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0c45e49457a3f61ca661f4f7b0ad749cceaf7709

Comment by Shaun Tancheff [ 10/Jun/20 ]

All patches needed for this issue have landed.
