Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0
    • None
    • 3

    Description

      While Lustre was running with the following combination:
      clients: aarch64 6.8.0-49-generic
      servers: x86_64 5.14.0-503.21.1_lustre.el9.x86_64

      The test failed as follows:

       sanityn test_109: FAIL: Mount /mnt/lustre fails with 1
      
      
      [  897.637539] ------------[ cut here ]------------
      [  897.637909] WARNING: CPU: 0 PID: 14361 at net/core/skbuff.c:7006 skb_splice_from_iter+0x17c/0x2e0
      [  897.638504] Modules linked in: lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) crc32_generic rpcsec_gss_krb5 nfsv4 nfs netfs qrtr cfg80211 8021q garp mrp stp llc binfmt_misc nls_iso8859_1 dm_multipath efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: lzstd(OE)]
      [  897.642756] CPU: 0 PID: 14361 Comm: socknal_sd00_01 Kdump: loaded Tainted: G           OE      6.8.0-49-generic #49-Ubuntu
      [  897.643470] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [  897.643930] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  897.644385] pc : skb_splice_from_iter+0x17c/0x2e0
      [  897.644692] lr : skb_splice_from_iter+0xc8/0x2e0
      [  897.644997] sp : ffff80008553b8e0
      [  897.645216] x29: ffff80008553b960 x28: ffff0000f4364a00 x27: ffff0000f4364a00
      [  897.645681] x26: 0000000000001000 x25: 0000000000001000 x24: ffff80008553b918
      [  897.646145] x23: 0000000000000001 x22: 0000000000001000 x21: 0000000000000000
      [  897.646611] x20: fffffc0003151140 x19: 0000000000001000 x18: ffff800083d5f050
      [  897.647087] x17: 0000000000000000 x16: 0000000000000000 x15: 0000303900003039
      [  897.647553] x14: 000200000af01e70 x13: 0000000000000000 x12: 0000000000000000
      [  897.648019] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff80008144d128
      [  897.648483] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
      [  897.648951] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
      [  897.649419] x2 : 0000000000000000 x1 : fffffc0003151001 x0 : 017fffc000000840
      [  897.649888] Call trace:
      [  897.650057]  skb_splice_from_iter+0x17c/0x2e0
      [  897.650342]  tcp_sendmsg_locked+0x330/0xc50
      [  897.650616]  tcp_sendmsg+0x44/0x88
      [  897.650844]  inet_sendmsg+0x50/0xb8
      [  897.651077]  __sock_sendmsg+0x80/0x108
      [  897.651329]  sock_sendmsg+0x84/0xf0
      [  897.651560]  ksocknal_lib_sendpage+0x7c/0xd8 [ksocklnd]
      [  897.651916]  ksocknal_lib_send_kiov+0xf8/0x290 [ksocklnd]
      [  897.652276]  ksocknal_scheduler+0x9f8/0x1b30 [ksocklnd]
      [  897.652629]  kthread+0xf8/0x110
      [  897.652838]  ret_from_fork+0x10/0x20
      [  897.653074] ---[ end trace 0000000000000000 ]---
      [  897.653462] LNet: There was an unexpected network error while writing to 10.240.22.173: rc = -5
      [  897.654084] LustreError: 14361:0:(events.c:209:client_bulk_callback()) event type 1, status -5, desc ffff
      
      
      

      This condition has probably been hit many times, but because the check is a WARN_ONCE, only one instance is reported in dmesg.
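      The trace above is consistent with a page eligibility check in the zero-copy send path: the send is refused with -EIO (the rc = -5 in the log) and the warning is printed only the first time. The userspace sketch below models that behaviour. It assumes the warning comes from a sendpage_ok()-style test (page is not a slab page and has a reference count of at least 1). All mock_* names are hypothetical and are not taken from the Lustre patch or the kernel source.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical stand-in for the two page properties that matter here. */
      struct mock_page {
          bool is_slab;   /* page belongs to the slab allocator */
          int  refcount;  /* models page_count(page) */
      };

      /* Eligibility rule modelled on sendpage_ok(): not slab, and refcounted. */
      static bool mock_sendpage_ok(const struct mock_page *page)
      {
          assert(page != NULL);
          return !page->is_slab && page->refcount >= 1;
      }

      static int warnings_printed; /* how many times the warning reached the log */

      /* Models WARN_ON_ONCE(): returns the condition every time, logs only once. */
      static bool mock_warn_on_once(bool cond)
      {
          static bool warned;

          if (cond && !warned) {
              warned = true;
              warnings_printed++;
              fprintf(stderr, "WARNING: page not eligible for zero-copy\n");
          }
          return cond;
      }

      /* Models the splice path: an ineligible page is refused with -EIO. */
      static int mock_splice_page(const struct mock_page *page)
      {
          if (mock_warn_on_once(!mock_sendpage_ok(page)))
              return -EIO;
          return 0;
      }

      /* Caller-side check: decide up front whether zero-copy may be used. */
      static bool mock_use_zerocopy(const struct mock_page *page)
      {
          return mock_sendpage_ok(page);
      }
      ```

      A sender that consults something like mock_use_zerocopy() before choosing the zero-copy path would fall back to a copying send for ineligible pages instead of triggering the warning and the -EIO failure. Whether the merged patch takes exactly this shape is not stated in this ticket.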

          Activity

            [LU-18749] Check page for zerocopy
            pjones Peter Jones added a comment -

            Merged for 2.17

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/58205/
            Subject: LU-18749 socklnd: check page for zerocopy
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f01456a070afbb317e31ac0f99caa81375331fa4

            ys Yang Sheng added a comment -

            Yes. So it looks like this issue has been lurking since RHEL8 and only became noticeable once the warning message was added. I think we can then close LU-16441 as a duplicate of this one.

            simmonsja James A Simmons added a comment - - edited

            It's an early RHEL8 on x86 with 200GB ethernet cards. If I remember correctly, you can see this problem if you shrink the ksocklnd module parameter zc_min_payload.

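            To illustrate why shrinking zc_min_payload could make the problem easier to reproduce, here is a minimal sketch. It assumes the parameter acts as a byte threshold at or above which a send becomes a zero-copy candidate. The mock_* names, the comparison rule, and the payload sizes are hypothetical and are not taken from the ksocklnd source.

            ```c
            #include <assert.h>
            #include <stdbool.h>
            #include <stddef.h>

            /* Hypothetical rule: a send is a zero-copy candidate once its payload
             * reaches the zc_min_payload threshold (both values in bytes). */
            static bool mock_zc_candidate(size_t payload_bytes, size_t zc_min_payload)
            {
                return payload_bytes >= zc_min_payload;
            }

            /* Count how many payloads in a workload would take the zero-copy path. */
            static size_t mock_count_zc(const size_t *payloads, size_t n,
                                        size_t zc_min_payload)
            {
                size_t i, count = 0;

                assert(payloads != NULL || n == 0);
                for (i = 0; i < n; i++)
                    if (mock_zc_candidate(payloads[i], zc_min_payload))
                        count++;
                return count;
            }
            ```

            Under this model, lowering the threshold sends more small payloads down the zero-copy path, so any page that is not eligible for zero-copy is more likely to be encountered there.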
            ys Yang Sheng added a comment -

            Which arch and distro for LU-16441?

            simmonsja James A Simmons added a comment -

            Would this resolve LU-16441?

            gerrit Gerrit Updater added a comment -

            "Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58205
            Subject: LU-18749 socklnd: check page for zerocopy
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: be176f2359cbcd8a1d5224fbf16b4e5dd9d0fd1c

            People

              ys Yang Sheng
              ys Yang Sheng
              Votes: 0
              Watchers: 4
