Lustre / LU-9304

BUG: Bad page state in process ll_ost_io01_013 pfn:1a01bcd kernel BUG at include/linux/scatterlist.h:65!

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Affects Version: Lustre 2.10.0
    • Fix Version: None
    • Severity: 3

    Description

      Running 4 Lustre Clients, 2 OSS nodes each with 1 zpool, and 1 mds.
      This OSS node:

      # zpool status -v
        pool: ost0
       state: ONLINE
        scan: none requested
      config:

              NAME              STATE     READ WRITE CKSUM
              ost0              ONLINE       0     0     0
                draid1-0        ONLINE       0     0     0
                  mpathaj       ONLINE       0     0     0
                  mpathai       ONLINE       0     0     0
                  mpathah       ONLINE       0     0     0
                  mpathag       ONLINE       0     0     0
                  mpathaq       ONLINE       0     0     0
                  mpathap       ONLINE       0     0     0
                  mpathak       ONLINE       0     0     0
                  mpathz        ONLINE       0     0     0
                  mpatham       ONLINE       0     0     0
                  mpathal       ONLINE       0     0     0
                  mpathao       ONLINE       0     0     0
              spares
                $draid1-0-s0    AVAIL

      errors: No known data errors

      This build of ZFS was from the coral-prototype branch, and Lustre was built from Lustre master as of Dec 1st.

      We were running our file system aging utility, FileAger.py (1-2 copies on each of the 4 client nodes), alongside an IOR run: mpirun -wdir /mnt/lustre/ -np 4 -rr -machinefile hosts -env I_MPI_EXTRA_FILESYSTEM=on -env I_MPI_EXTRA_FILESYSTEM_LIST=lustre /home/johnsali/wolf-3/ior/src/ior -a POSIX -F -N 4 -d 2 -i 1 -s 20000 -b 16MB -t 16MB -k -w -r

      While this was running it appears we hit this failure.

      [159898.950714] BUG: Bad page state in process ll_ost_io01_013 pfn:1a01bcd
      [159898.960045] page:ffffea006806f340 count:-1 mapcount:0 mapping: (null) index:0x0
      [159898.970667] page flags: 0x6fffff00000000()
      [159898.976808] page dumped because: nonzero _count
      [159898.983412] Modules linked in: nfsv3 nfs_acl raid10 osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) zfs(OE) zunicode(OE) zavl(OE) zcommon(OE) znvpair(OE) spl(OE) zlib_deflate lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) sha512_generic crypto_null rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ses dm_service_time enclosure intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mpt3sas ipmi_devintf ipmi_ssif ipmi_si
      [159899.072452] raid_class sb_edac iTCO_wdt iTCO_vendor_support scsi_transport_sas sg edac_core pcspkr ipmi_msghandler wmi ioatdma mei_me mei lpc_ich shpchp i2c_i801 mfd_core acpi_pad acpi_power_meter dm_multipath dm_mod ip_tables ext4 mbcache jbd2 mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel udp_tunnel ib_core ib_addr sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt drm_kms_helper crct10dif_pclmul igb crct10dif_common ttm ptp crc32c_intel ahci pps_core drm mlx4_core libahci dca i2c_algo_bit libata i2c_core [last unloaded: zunicode]
      [159899.135473] CPU: 57 PID: 98747 Comm: ll_ost_io01_013 Tainted: G IOE ------------ 3.10.0-327.36.3.el7.x86_64 #1
      [159899.149461] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
      [159899.162801] ffffea006806f340 00000000424e76b3 ffff880f9e233908 ffffffff81636431
      [159899.172821] ffff880f9e233930 ffffffff81631645 ffffea006806f340 0000000000000000
      [159899.182870] 000fffff00000000 ffff880f9e233978 ffffffff811714dd fff00000fe000000
      [159899.192895] Call Trace:
      [159899.197269] [<ffffffff81636431>] dump_stack+0x19/0x1b
      [159899.204667] [<ffffffff81631645>] bad_page.part.59+0xdf/0xfc
      [159899.212639] [<ffffffff811714dd>] free_pages_prepare+0x16d/0x190
      [159899.220965] [<ffffffff81171e21>] free_hot_cold_page+0x31/0x140
      [159899.229171] [<ffffffff8117200f>] __free_pages+0x3f/0x60
      [159899.236690] [<ffffffffa100bad3>] osd_bufs_put+0x123/0x1f0 [osd_zfs]
      [159899.245372] [<ffffffffa118284a>] ofd_commitrw_write+0xea/0x1c20 [ofd]
      [159899.254234] [<ffffffffa1186f2d>] ofd_commitrw+0x51d/0xa40 [ofd]
      [159899.262551] [<ffffffffa0d538d5>] obd_commitrw+0x2ec/0x32f [ptlrpc]
      [159899.271488] [<ffffffffa0d2bf71>] tgt_brw_write+0xea1/0x1640 [ptlrpc]
      [159899.280509] [<ffffffff810c15cc>] ? update_curr+0xcc/0x150
      [159899.288372] [<ffffffff810be46e>] ? account_entity_dequeue+0xae/0xd0
      [159899.297010] [<ffffffffa0c82560>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
      [159899.306746] [<ffffffffa0d28225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
      [159899.316058] [<ffffffffa0cd41ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [159899.326348] [<ffffffffa0967128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
      [159899.335679] [<ffffffffa0cd1d68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [159899.345029] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
      [159899.353394] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
      [159899.361264] [<ffffffffa0cd8260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
      [159899.369596] [<ffffffffa0cd77c0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
      [159899.379160] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
      [159899.385881] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
      [159899.394413] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
      [159899.401653] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
      [159899.410157] Disabling lock debugging due to kernel taint
      [163012.964891] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.8@o2ib inode [0x200000406:0x3c5:0x0] object 0x0:44785 extent [67108864-80752639]: client csum 7f08fe36, server csum f8fbfe4c
      [163012.990138] LustreError: Skipped 2 previous similar messages
      [163020.008131] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.8@o2ib inode [0x200000406:0x3d6:0x0] object 0x0:44794 extent [83886080-100270079]: client csum 886feb33, server csum ccc0eb4a
      [163042.829796] -----------[ cut here ]-----------
      [163042.837389] kernel BUG at include/linux/scatterlist.h:65!
      [163042.845758] invalid opcode: 0000 1 SMP
      [163042.852645] Modules linked in: nfsv3 nfs_acl raid10 osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) zfs(OE) zunicode(OE) zavl(OE) zcommon(OE) znvpair(OE) spl(OE) zlib_deflate lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) sha512_generic crypto_null rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ses dm_service_time enclosure intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mpt3sas ipmi_devintf ipmi_ssif ipmi_si
      [163042.944819] raid_class sb_edac iTCO_wdt iTCO_vendor_support scsi_transport_sas sg edac_core pcspkr ipmi_msghandler wmi ioatdma mei_me mei lpc_ich shpchp i2c_i801 mfd_core acpi_pad acpi_power_meter dm_multipath dm_mod ip_tables ext4 mbcache jbd2 mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel udp_tunnel ib_core ib_addr sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt drm_kms_helper crct10dif_pclmul igb crct10dif_common ttm ptp crc32c_intel ahci pps_core drm mlx4_core libahci dca i2c_algo_bit libata i2c_core [last unloaded: zunicode]
      [163043.010335] CPU: 12 PID: 84956 Comm: ll_ost_io00_002 Tainted: G B IOE ------------ 3.10.0-327.36.3.el7.x86_64 #1
      [163043.025057] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
      [163043.038989] task: ffff880fc52bc500 ti: ffff880fc55bc000 task.ti: ffff880fc55bc000
      [163043.049639] RIP: 0010:[<ffffffffa0960fef>] [<ffffffffa0960fef>] cfs_crypto_hash_update_page+0x9f/0xb0 [libcfs]
      [163043.063453] RSP: 0018:ffff880fc55bfab8 EFLAGS: 00010202
      [163043.071687] RAX: 0000000000000002 RBX: ffff8810f6db9b80 RCX: 0000000000000000
      [163043.081918] RDX: 0000000000000020 RSI: 0000000000000000 RDI: ffff880fc55bfad8
      [163043.092095] RBP: ffff880fc55bfb00 R08: 00000000000195a0 R09: ffff880fc55bfab8
      [163043.103441] R10: ffff88103e807900 R11: 0000000000000001 R12: 3635343332313036
      [163043.113462] R13: 0000000033323130 R14: 0000000000000534 R15: 0000000000000000
      [163043.123487] FS: 0000000000000000(0000) GS:ffff88103ef00000(0000) knlGS:0000000000000000
      [163043.134599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [163043.143101] CR2: 00007fce5afab000 CR3: 000000000194a000 CR4: 00000000001407e0
      [163043.153184] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [163043.163242] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [163043.173280] Stack:
      [163043.177580] 0000000000000002 0000000000000000 0000000000000000 0000000000000000
      [163043.188354] 00000000f43b381e 0000000000000000 ffff880fcc7d1301 ffff880e73ecc200
      [163043.199140] 0000000000000000 ffff880fc55bfb68 ffffffffa0d5345c ffff88202563f0a8
      [163043.209907] Call Trace:
      [163043.215455] [<ffffffffa0d5345c>] tgt_checksum_bulk.isra.33+0x35a/0x4e7 [ptlrpc]
      [163043.226242] [<ffffffffa0d2c21d>] tgt_brw_write+0x114d/0x1640 [ptlrpc]
      [163043.235986] [<ffffffff810c15cc>] ? update_curr+0xcc/0x150
      [163043.244558] [<ffffffff810be46e>] ? account_entity_dequeue+0xae/0xd0
      [163043.254271] [<ffffffffa0c82560>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
      [163043.264858] [<ffffffffa0d28225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
      [163043.275043] [<ffffffffa0cd41ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [163043.286074] [<ffffffffa0967128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
      [163043.296175] [<ffffffffa0cd1d68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [163043.306194] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
      [163043.315553] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
      [163043.324714] [<ffffffffa0cd8260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
      [163043.334070] [<ffffffffa0cd77c0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
      [163043.344635] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
      [163043.352181] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
      [163043.361606] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
      [163043.369571] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
      [163043.378772] Code: 89 43 38 48 8b 43 20 ff 50 c0 48 8b 55 d8 65 48 33 14 25 28 00 00 00 75 0d 48 83 c4 28 5b 41 5c 41 5d 41 5e 5d c3 e8 61 a0 71 e0 <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
      [163043.406113] RIP [<ffffffffa0960fef>] cfs_crypto_hash_update_page+0x9f/0xb0 [libcfs]
      [163043.416991] RSP <ffff880fc55bfab8>

      This happened fairly quickly. After this run I restarted the system and it happened again almost immediately.

      Issue Links

        This issue duplicates LU-9305

        Activity
            jgmitter Joseph Gmitter (Inactive) made changes -
            Link New: This issue duplicates LU-9305 [ LU-9305 ]
            jay Jinshan Xiong (Inactive) made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]

            jay Jinshan Xiong (Inactive) added a comment -

            memory corruption.

            jsalians_intel John Salinas (Inactive) added a comment -

            Here is a test case:
            #!/usr/bin/python3.4

            import os
            import uuid
            import tempfile

            def WriteFileBasic(multiplysize, blocksize, writechar, path):
                """
                Writes a basic file in a sequential fashion and fills it with writechar; each chunk
                written is writechar * multiplysize, and blocksize chunks are written one at a time.
                """
                writethis = writechar * multiplysize
                unique_filename = uuid.uuid4()
                filetowrite = path + '/basic-' + str(unique_filename)

                fd = open(filetowrite, 'wb')
                for x in range(blocksize):
                    fd.write(bytes(writethis, 'UTF-8'))
                fd.close()

            directory = tempfile.mkdtemp(dir="/mnt/lustre")
            if not os.path.exists(directory):
                os.makedirs(directory)

            print("Writing files to: {0}".format(directory))
            for i in range(100):
                for x in range(2):
                    WriteFileBasic(multiplysize=204800, blocksize=1024, writechar='0123456', path=directory)
                    WriteFileBasic(multiplysize=204800, blocksize=1024, writechar='0123456', path=directory)
                    WriteFileBasic(multiplysize=204800, blocksize=128, writechar='0', path=directory)
                    WriteFileBasic(multiplysize=204800, blocksize=128, writechar='0', path=directory)
                    WriteFileBasic(multiplysize=204800, blocksize=1024, writechar='0123456', path=directory)
                    WriteFileBasic(multiplysize=124, blocksize=1024, writechar='0', path=directory)

            I start multiple copies of that on one client node: at least 2, but 4 sometimes seems to make this come out a little quicker.

            Here is an example from tonight running this.
            mpirun -np 4 -wdir /mnt/lustre -machinefile hosts -env I_MPI_EXTRA_FILESYSTEM=on -env I_MPI_EXTRA_FILESYSTEM_LIST=lustre /home/johnsali/wolf-3/ior/src/ior -a POSIX -F -N 4 -d 2 -i 1 -s 20480 -b 8m -t 8m
            ./testcase_pagestate.py &
            ./testcase_pagestate.py &

            almost immediately:
            [103132.581173] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000bd0:0xa98:0x0] object 0x0:1018 extent [83886080-92979199]: client csum 3e2f59b2, server csum cf13a5a5

            But it took several minutes to get:
            [103332.411485] BUG: Bad page state in process ll_ost_io01_000 pfn:171cd2c
            [103332.420695] page:ffffea005c734b00 count:-1 mapcount:0 mapping: (null) index:0x0
            [103332.431396] page flags: 0x6fffff00000000()
            [103332.437504] page dumped because: nonzero _count
            [103332.444053] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate dm_service_time xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ses enclosure intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt mpt3sas iTCO_vendor_support lpc_ich ipmi_ssif sb_edac ipmi_devintf mei_me raid_class scsi_transport_sas sg mei edac_core i2c_i801 mfd_core pcspkr ipmi_si ioatdma shpchp ipmi_msghandler acpi_pad acpi_power_meter wmi dm_multipath dm_mod binfmt_misc nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 mlx4_en mlx4_ib vxlan ib_sa ip6_udp_tunnel ib_mad udp_tunnel ib_core ib_addr sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt drm_kms_helper igb crct10dif_pclmul ttm crct10dif_common ahci ptp crc32c_intel libahci pps_core drm mlx4_core dca libata i2c_algo_bit i2c_core
            [103332.588503] CPU: 58 PID: 7806 Comm: ll_ost_io01_000 Tainted: P B IOE ------------ 3.10.0-327.36.3.el7.x86_64 #1
            [103332.601978] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
            [103332.614871] ffffea005c734b00 000000002ff1b307 ffff880ff5e47908 ffffffff81636431
            [103332.624444] ffff880ff5e47930 ffffffff81631645 ffffea005c734b00 0000000000000000
            [103332.634042] 000fffff00000000 ffff880ff5e47978 ffffffff811714dd fff00000fe000000
            [103332.643624] Call Trace:
            [103332.647573] [<ffffffff81636431>] dump_stack+0x19/0x1b
            [103332.654528] [<ffffffff81631645>] bad_page.part.59+0xdf/0xfc
            [103332.662089] [<ffffffff811714dd>] free_pages_prepare+0x16d/0x190
            [103332.670015] [<ffffffff81171e21>] free_hot_cold_page+0x31/0x140
            [103332.677872] [<ffffffff8117200f>] __free_pages+0x3f/0x60
            [103332.685071] [<ffffffffa0aeead3>] osd_bufs_put+0x123/0x1f0 [osd_zfs]
            [103332.693421] [<ffffffffa0b5c84a>] ofd_commitrw_write+0xea/0x1c20 [ofd]
            [103332.701942] [<ffffffffa0b60f2d>] ofd_commitrw+0x51d/0xa40 [ofd]
            [103332.709944] [<ffffffffa0ece8d2>] obd_commitrw+0x2ec/0x32f [ptlrpc]
            [103332.718231] [<ffffffffa0ea6f71>] tgt_brw_write+0xea1/0x1640 [ptlrpc]
            [103332.726897] [<ffffffff810c15cc>] ? update_curr+0xcc/0x150
            [103332.734244] [<ffffffff810be46e>] ? account_entity_dequeue+0xae/0xd0
            [103332.742599] [<ffffffffa0dfd560>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
            [103332.751928] [<ffffffffa0ea3225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
            [103332.761126] [<ffffffffa0e4f1ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
            [103332.770928] [<ffffffffa09c1128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
            [103332.779767] [<ffffffffa0e4cd68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
            [103332.788569] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
            [103332.796803] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
            [103332.804581] [<ffffffffa0e53260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
            [103332.812826] [<ffffffffa0e527c0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
            [103332.822297] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
            [103332.828979] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
            [103332.837491] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
            [103332.844738] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140

            The case is basically writing large file, large file, very small file (or large file, very small file) while streaming I/O from IOR is going on.


            jsalians_intel John Salinas (Inactive) added a comment -

            For the dumps in 0.7.0 RC3: after an OST process hits "BUG: Bad page state in process ll_ost_io01" but the OSS node doesn't crash, is that process "hung" / dead forever?


            jsalians_intel John Salinas (Inactive) added a comment -

            Do you have any ideas why in the one case the OSS node hits the "Bad page state in process" in stock RC3 but the node does not crash/reboot? Is that just a difference in configuration options (like panic on this), or is it a symptom that something additional is wrong in beta-coral-combined?

            Is there any way to isolate the memory for ptlrpc? I am not sure how to figure out what is stepping on these values. I can run these same tests directly on ZFS without issues; figuring out the interactions between ZFS and Lustre is a bit challenging.


            utopiabound Nathaniel Clark added a comment -

            Do you want me to keep looking into this from the Lustre side?


            jsalians_intel John Salinas (Inactive) added a comment -

            I did a mostly successful run with Lustre 2.9.0 + 0.7.0 RC3 (none of our patches), record size 1M on both OSSes (both raidz), which forces a max RPC size of 1MB. With this combination I did not see "Bad page state in process XXX". My assumption based on these runs is that the ZFS record size, and the associated brw size of 16MB, is the issue.
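
            As a rough sanity check of that assumption (back-of-the-envelope arithmetic only, not something measured on the system), the practical difference between a 1MB and a 16MB brw is how many 4KiB pages a single bulk RPC has to carry in its kiov array:

                PAGE_SIZE = 4096  # x86_64 page size

                def pages_per_bulk(brw_size_bytes):
                    # Number of kiov entries (pages) one bulk RPC of this size must track.
                    return brw_size_bytes // PAGE_SIZE

                print(pages_per_bulk(1 * 1024 * 1024))    # 1MB RPC  ->  256 pages
                print(pages_per_bulk(16 * 1024 * 1024))   # 16MB RPC -> 4096 pages

            So the 16MB case puts 16x as many page pointers into each bulk descriptor.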


            utopiabound Nathaniel Clark added a comment -

            Looking at "wolf-3 OSS 10.8.1.3-2017-04-15-00:39:07":

            Dump the backtrace with stack:

            crash> bt -f
            ...
             #9 [ffff880fdb36bab0] cfs_crypto_hash_update_page at ffffffffa0813fce [libcfs]
                ffff880fdb36bab8: 3433323130363536 3130363500000332 
                ffff880fdb36bac8: 0000000000000000 0000000000000000 
                ffff880fdb36bad8: 00000000b643a6e1 0000000000000000 
                ffff880fdb36bae8: ffff882027752a01 ffff881dc0c9a600 
                ffff880fdb36baf8: 0000000000000000 ffff880fdb36bb68 
                ffff880fdb36bb08: ffffffffa0eb7459 
            #10 [ffff880fdb36bb08] tgt_checksum_bulk at ffffffffa0eb7459 [ptlrpc]
                ffff880fdb36bb10: ffff881969ae18a8 ffff880fd957e780 
                ffff880fdb36bb20: 00000004810b8940 ffff881d1a0d1c80 
                ffff880fdb36bb30: dead000000200200 00000000b643a6e1 
                ffff880fdb36bb40: ffff8817ffd40050 ffff882027752a80 
                ffff880fdb36bb50: ffff881f26180000 0000000000000000 
                ffff880fdb36bb60: ffff8818dee4cc00 ffff880fdb36bcd0 
                ffff880fdb36bb70: ffffffffa0e9021d 
            ...
            

            Hunt until we find the ptlrpc_bulk_desc:

            crash> struct ptlrpc_bulk_desc ffff881dc0c9a600
            struct ptlrpc_bulk_desc {
              bd_failure = 0, 
              bd_registered = 0, 
              bd_lock = {
                {
                  rlock = {
                    raw_lock = {
                      {
                        head_tail = 4587590, 
                        tickets = {
                          head = 70, 
                          tail = 70
                        }
                      }
                    }
                  }
                }
              }, 
              bd_import_generation = 0, 
              bd_type = 41, 
              bd_portal = 8, 
              bd_export = 0xffff8818dee4cc00, 
              bd_import = 0x0, 
              bd_req = 0xffff8817ffd40050, 
              bd_frag_ops = 0xffffffffa0ec16a0 <ptlrpc_bulk_kiov_nopin_ops>, 
              bd_waitq = {
                lock = {
                  {
                    rlock = {
                      raw_lock = {
                        {
                          head_tail = 393222, 
                          tickets = {
                            head = 6, 
                            tail = 6
                          }
                        }
                      }
                    }
                  }
                }, 
                task_list = {
                  next = 0xffff881dc0c9a640, 
                  prev = 0xffff881dc0c9a640
                }
              }, 
              bd_iov_count = 3872, 
              bd_max_iov = 3872, 
              bd_nob = 15859712, 
              bd_nob_transferred = 15859712, 
              bd_last_mbits = 0, 
              bd_cbid = {
                cbid_fn = 0xffffffffa0e30b80 <reply_out_callback+736>, 
                cbid_arg = 0xffff881dc0c9a600
              }, 
              bd_sender = 1407378115789062, 
              bd_md_count = 0, 
              bd_md_max_brw = 16, 
              bd_mds = {{
                  cookie = 237797
                }, {
                  cookie = 237805
                }, {
                  cookie = 237813
                }, {
                  cookie = 237821
                }, {
                  cookie = 237829
                }, {
                  cookie = 237837
                }, {
                  cookie = 237845
                }, {
                  cookie = 237853
                }, {
                  cookie = 237861
                }, {
                  cookie = 237869
                }, {
                  cookie = 237877
                }, {
                  cookie = 237885
                }, {
                  cookie = 237893
                }, {
                  cookie = 237901
                }, {
                  cookie = 237909
                }, {
                  cookie = 237917
                }}, 
              bd_u = {
                bd_kiov = {
                  bd_enc_vec = 0x0, 
                  bd_vec = 0xffff881be3ca0000
                }, 
                bd_kvec = {
                  bd_enc_kvec = 0x0, 
                  bd_kvec = 0xffff881be3ca0000
                }
              }
            }
            

            Since we know the page pointer in the bd_vec is the issue:

            crash> lnet_kiov_t ffff881be3ca0000
            struct lnet_kiov_t {
              kiov_page = 0x3433323130363534, 
              kiov_len = 825243189, 
              kiov_offset = 892613426
            }
            

            The kiov_page isn't even remotely a valid kernel pointer. I'll work on tracking down where the bad values could have come from.
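
            For what it's worth, if that bogus kiov_page value is read back as the raw bytes it occupies in (little-endian) memory, it spells out ASCII from the '0123456' pattern the test case above writes, which would be consistent with the kiov array getting overwritten by bulk file data. A minimal decode sketch (the constants are copied from the dumps; the script itself is only illustrative):

                import struct

                bad_kiov_page = 0x3433323130363534        # kiov_page reported by crash above
                print(struct.pack('<Q', bad_kiov_page))   # b'45601234' -- a slice of the repeating '0123456' pattern

                r12_value = 0x3635343332313036            # R12 from the scatterlist.h:65 oops in the description
                print(struct.pack('<Q', r12_value))       # b'60123456'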


            jsalians_intel John Salinas (Inactive) added a comment -

            The above dumps are now on onyx:
            /scratch/johnsali/
            -rw-r--r-- 1 johnsali johnsali 430M Apr 15 08:24 10.8.1.3-2017-04-14-224609.tgz
            -rw-r--r-- 1 johnsali johnsali 142M Apr 15 08:24 10.8.1.3-2017-04-15-003907.tgz
            -rw-r--r-- 1 johnsali johnsali 707M Apr 15 08:26 10.8.1.3-2017-04-15-132245.tgz
            -rw-r--r-- 1 johnsali johnsali 274M Apr 15 08:27 10.8.1.4-2017-04-15-002617.tgz
            -rw-r--r-- 1 johnsali johnsali 485M Apr 15 08:29 10.8.1.4-2017-04-15-014743.tgz
            -rw-r--r-- 1 johnsali johnsali 782M Apr 15 08:31 10.8.1.4-2017-04-15-132247.tgz


            People

              Assignee: utopiabound Nathaniel Clark
              Reporter: jsalians_intel John Salinas (Inactive)
              Votes: 0
              Watchers: 5
