
sanityn test 71a fails with ‘data is not flushed from client’

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.4
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.2, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6, Lustre 2.15.0, Lustre 2.15.3
    • Severity: 3

    Description

      sanityn test_71a fails with ‘data is not flushed from client’ for ARM clients. These failures started on October 26, 2018.

      Looking at a recent failure, https://testing.whamcloud.com/test_sets/f1121dd4-fdef-11e8-b837-52540065bddc, we see the following in the client test_log:

      == sanityn test 71a: correct file map just after write operation is finished ========================= 00:36:26 (1544488586)
      CMD: trevis-24vm4 /usr/sbin/lctl get_param -n version 2>/dev/null ||
      				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
      				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
      1+0 records in
      1+0 records out
      40960 bytes (41 kB) copied, 0.00437335 s, 9.4 MB/s
      1+0 records in
      1+0 records out
      40960 bytes (41 kB) copied, 0.00155542 s, 26.3 MB/s
        File: '/mnt/lustre2/f71a.sanityn'
        Size: 163840    	Blocks: 1          IO Block: 4194304 regular file
      Device: 2c54f966h/743766374d	Inode: 144115205322894149  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2018-12-11 00:36:27.000000000 +0000
      Modify: 2018-12-11 00:36:27.000000000 +0000
      Change: 2018-12-11 00:36:27.000000000 +0000
       Birth: -
      209708
      fd: 3
      extent in offset 0, length 122880
      flags: 80000000
      extent in offset 122880, length 40960
      flags: 80000001
      No unwritten extents, extents number 2, file size 163840, original size 81920
       sanityn test_71a: @@@@@@ FAIL: data is not flushed from client 
      

      There are no errors in any of the console logs.

      Logs for more of these test failures are at
      https://testing.whamcloud.com/test_sets/8b2f4282-d9d0-11e8-b46b-52540065bddc
      https://testing.whamcloud.com/test_sets/83526f8c-df5f-11e8-a871-52540065bddc
      https://testing.whamcloud.com/test_sets/1de368ba-fa38-11e8-bb6b-52540065bddc

          Activity

            [LU-11787] sanityn test 71a fails with ‘data is not flushed from client’

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51287/
            Subject: LU-11787 test: Fix checkfilemap tests for 64K page
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 66d9916dc947064b598f52476fb6482c0bbaff10

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51287
            Subject: LU-11787 test: Fix checkfilemap tests for 64K page
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 5755b9c9148cfe235fb2f4dcb066eb0c83fab274
            pjones Peter Jones added a comment -

            Landed for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45629/
            Subject: LU-11787 test: Fix checkfilemap tests for 64K page
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7c88dfd28b5cc6114a85f187ecb2473657d42c9d

            Looking at the test itself, it seems clear that it won't work properly on a 64KB PAGE_SIZE system:

                     # write data this way: hole - data - hole - data
                    dd if=/dev/urandom of=$DIR1/$tfile bs=40K seek=1 count=1
                    dd if=/dev/urandom of=$DIR1/$tfile bs=40K seek=3 count=1
            

            because the "hole" is from 80KB-120KB, and with 64KB PAGE_SIZE the 40-80KB write will actually dirty the 64KB-128KB page and no hole is created...
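The page rounding behind this can be sketched with shell arithmetic (`page_align` is a hypothetical helper for illustration, not part of the Lustre test suite):

```shell
#!/bin/sh
# Hypothetical helper: round a write of $2 bytes at offset $1 out to whole
# pages of size $3, since dirtying any byte of a page dirties the entire
# PAGE_SIZE page on the client.
page_align() {
	start=$(( $1 / $3 * $3 ))
	end=$(( ($1 + $2 + $3 - 1) / $3 * $3 ))
	echo "$start $end"
}

page_align $((40 * 1024))  $((40 * 1024)) $((64 * 1024))  # 1st dd, 64KB pages: 0 131072
page_align $((120 * 1024)) $((40 * 1024)) $((64 * 1024))  # 2nd dd, 64KB pages: 65536 196608
# The dirtied ranges 0-128KB and 64-192KB overlap, so no hole is left between them.
```

With a 4KB page size the same writes stay at 40-80KB and 120-160KB, which is why the test passes on x86 clients but not on 64KB-page ARM clients.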

            The test needs to be updated to use seek=0 for the first dd and seek=4 for the second dd and it should pass. It may need some adjustments to the later checks, but at least there will be a hole from 64KB-128KB for 64KB PAGE_SIZE clients, and 40-160KB for 4KB PAGE_SIZE clients. If we want to make the file consistent, it should use bs=64K.
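A minimal sketch of that suggested rewrite (hedged: the landed patch may differ in detail; a scratch file stands in for $DIR1/$tfile here):

```shell
#!/bin/sh
# Sketch of the suggested fix: put the data blocks at the extremes so a hole
# survives on both 4KB and 64KB PAGE_SIZE clients.
F=$(mktemp)
dd if=/dev/urandom of="$F" bs=40K seek=0 count=1 2>/dev/null   # data at 0-40KB
dd if=/dev/urandom of="$F" bs=40K seek=4 count=1 2>/dev/null   # data at 160-200KB
# GNU dd truncates the output to the seek offset (keeping earlier data) and then
# writes, leaving a hole at 40-160KB on 4KB-page clients, of which at least
# 64-128KB also survives page rounding on 64KB-page clients.
wc -c < "$F"   # total file size: 204800 bytes (200KB)
```

Using bs=64K instead would additionally make the file layout identical across page sizes, as suggested above.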

            xinliang Xinliang Liu added a comment -

            Currently, ldiskfs blocks are allocated PAGE_SIZE-aligned (see function osd_ldiskfs_map_inode_pages()), so the test case needs to be modified to use PAGE_SIZE-aligned writes for Arm.


            gerrit Gerrit Updater added a comment -

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/45629
            Subject: LU-11787 test: re-enable sanityn 71a for 64K page systems
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 49b892c53b16a7d02c29b4a041175d5303022cbb
            xinliang Xinliang Liu added a comment - edited

            Note that the above crash is a separate issue, tracked in LU-15122.
            xinliang Xinliang Liu added a comment -

            Arm crashes when running sanityn test 71a on the master branch:

            [ 1330.973746] Lustre: DEBUG MARKER: Using TIMEOUT=20
            [ 1333.254497] Lustre: Modifying parameter general.*.*.lbug_on_grant_miscount in log params
            [ 1338.681588] Lustre: DEBUG MARKER: beegfs-test-01.novalocal: executing check_logdir /tmp/test_logs/2021-10-11/020444
            [ 1339.086489] Lustre: DEBUG MARKER: beegfs-test-01.novalocal: executing yml_node
            [ 1339.419862] Lustre: DEBUG MARKER: Client: 2.14.52.80
            [ 1339.495670] Lustre: DEBUG MARKER: MDS: 2.14.52.80
            [ 1339.567308] Lustre: DEBUG MARKER: OSS: 2.14.52.80
            [ 1339.627097] Lustre: DEBUG MARKER: -----============= acceptance-small: sanityn ============----- Mon Oct 11 02:05:23 UTC 2021
            [ 1339.980338] Lustre: DEBUG MARKER: excepting tests: 28
            [ 1340.091948] Lustre: Mounted lustre-client
            [ 1340.468327] Lustre: DEBUG MARKER: beegfs-test-01.novalocal: executing check_config_client /mnt/lustre
            [ 1341.262854] Lustre: DEBUG MARKER: Using TIMEOUT=20
            [ 1341.443750] Lustre: Modifying parameter general.*.*.lbug_on_grant_miscount in log params
            [ 1342.212768] Lustre: DEBUG MARKER: == sanityn test 71a: correct file map just after write operation is finished ========================================================== 02:05:25 (1633917925)
            [ 1342.335774] LustreError: 4391:0:(osd_io.c:538:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 1, block_idx 16, i 14,start_blocks: 0, count: 30, npages: 3
            [ 1342.338924] LustreError: 4391:0:(osd_io.c:538:osd_do_bio()) LBUG
            [ 1342.340168] Pid: 4391, comm: ll_ost_io06_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
            [ 1342.342344] Call Trace:
            [ 1342.342858] [<0>] libcfs_call_trace+0xb8/0x118 [libcfs]
            [ 1342.343908] [<0>] lbug_with_loc+0x60/0xa0 [libcfs]
            [ 1342.344888] [<0>] osd_do_bio.constprop.21+0x6c4/0xcf8 [osd_ldiskfs]
            [ 1342.346130] [<0>] osd_ldiskfs_map_write.constprop.20+0xb4/0xf0 [osd_ldiskfs]
            [ 1342.347535] [<0>] osd_ldiskfs_map_inode_pages+0x66c/0x958 [osd_ldiskfs]
            [ 1342.348841] [<0>] osd_write_commit+0x4fc/0xbb8 [osd_ldiskfs]
            [ 1342.349971] [<0>] ofd_commitrw_write+0x658/0x1e68 [ofd]
            [ 1342.351002] [<0>] ofd_commitrw+0x360/0xa18 [ofd]
            [ 1342.352134] [<0>] tgt_brw_write+0x1624/0x2d90 [ptlrpc]
            [ 1342.353229] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
            [ 1342.354356] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
            [ 1342.355513] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
            [ 1342.356824] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
            [ 1342.357786] [<0>] kthread+0x130/0x138
            [ 1342.358508] [<0>] ret_from_fork+0x10/0x18
            [ 1342.359295] Kernel panic - not syncing: LBUG
            [ 1342.360146] CPU: 12 PID: 4391 Comm: ll_ost_io06_002 Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-305.7.1.el8_lustre.aarch64 #1
            [ 1342.362770] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
            [ 1342.364231] Call trace:
            [ 1342.364751]  dump_backtrace+0x0/0x188
            [ 1342.365526]  show_stack+0x24/0x30
            [ 1342.366234]  dump_stack+0x9c/0xbc
            [ 1342.366937]  panic+0x130/0x2f8
            [ 1342.367608]  param_set_delay_minmax.isra.1+0x0/0xd0 [libcfs]
            [ 1342.368816]  osd_do_bio.constprop.21+0x6c4/0xcf8 [osd_ldiskfs]
            [ 1342.370058]  osd_ldiskfs_map_write.constprop.20+0xb4/0xf0 [osd_ldiskfs]
            [ 1342.371477]  osd_ldiskfs_map_inode_pages+0x66c/0x958 [osd_ldiskfs]
            [ 1342.372837]  osd_write_commit+0x4fc/0xbb8 [osd_ldiskfs]
            [ 1342.373951]  ofd_commitrw_write+0x658/0x1e68 [ofd]
            [ 1342.374970]  ofd_commitrw+0x360/0xa18 [ofd]
            [ 1342.375934]  tgt_brw_write+0x1624/0x2d90 [ptlrpc]
            [ 1342.376999]  tgt_handle_request0+0xd0/0x978 [ptlrpc]
            [ 1342.378123]  tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
            [ 1342.379258]  ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
            [ 1342.380587]  ptlrpc_main+0xd28/0x15f0 [ptlrpc]
            [ 1342.381530]  kthread+0x130/0x138
            [ 1342.382242]  ret_from_fork+0x10/0x18
            [ 1342.383021] SMP: stopping secondary CPUs
            [ 1342.389567] Starting crashdump kernel...
            [ 1342.390356] Bye!
            
            

             


            jamesanunez James Nunez (Inactive) added a comment -

            The patch that landed, https://review.whamcloud.com/37561/, puts sanityn tests 16a and 71a on the ALWAYS_EXCEPT list for PPC client testing. This ticket should remain open until those tests are fixed and taken off the list.
            sarah Sarah Liu added a comment -

            Hit a similar issue on a PPC client on 2.12.3:
            https://testing.whamcloud.com/test_sets/dd84edc0-eb0e-11e9-b62b-52540065bddc

            == sanityn test 71a: correct file map just after write operation is finished ========================= 23:54:23 (1570578863)
            CMD: trevis-55vm2 /usr/sbin/lctl get_param -n version 2>/dev/null ||
            				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
            				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
            1+0 records in
            1+0 records out
            40960 bytes (41 kB) copied, 0.00459198 s, 8.9 MB/s
            1+0 records in
            1+0 records out
            40960 bytes (41 kB) copied, 0.000819573 s, 50.0 MB/s
              File: '/mnt/lustre2/f71a.sanityn'
              Size: 163840    	Blocks: 1          IO Block: 4194304 regular file
            Device: 2c54f966h/743766374d	Inode: 144115205272562497  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2019-10-08 23:54:24.000000000 +0000
            Modify: 2019-10-08 23:54:24.000000000 +0000
            Change: 2019-10-08 23:54:24.000000000 +0000
             Birth: -
            149703
            fd: 3
            error while ioctl 2
             sanityn test_71a: @@@@@@ FAIL: data is not flushed from client 
            

            OSS crash

            [17669.845624] Lustre: DEBUG MARKER: dmesg
            [17670.427979] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanityn test 71a: correct file map just after write operation is finished ========================= 23:54:23 \(1570578863\)
            [17670.623376] Lustre: DEBUG MARKER: == sanityn test 71a: correct file map just after write operation is finished ========================= 23:54:23 (1570578863)
            [17671.194035] general protection fault: 0000 [#1] SMP 
            [17671.195205] Modules linked in: dm_flakey osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too crct10dif_pclmul crct10dif_common ata_piix crc32c_intel serio_raw
            [17671.209748]  libata virtio_pci virtio_ring virtio 8139cp mii floppy [last unloaded: dm_flakey]
            [17671.211639] CPU: 1 PID: 27825 Comm: ps Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7_lustre.x86_64 #1
            [17671.213628] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [17671.214627] task: ffff9b3dd354c100 ti: ffff9b3debae0000 task.ti: ffff9b3debae0000
            [17671.216012] RIP: 0010:[<ffffffffbccf7ab9>]  [<ffffffffbccf7ab9>] cap_capable+0x29/0x80
            [17671.217533] RSP: 0018:ffff9b3debae3c60  EFLAGS: 00010283
            [17671.218461] RAX: ffffffffbccf7a90 RBX: 000000000000000d RCX: 0000000000000000
            [17671.219703] RDX: 0000000000000013 RSI: 80d764bdffffffff RDI: ffff9b3dd3a0d900
            [17671.220929] RBP: ffff9b3debae3c60 R08: ffffffffbd64d780 R09: ffff9b3dfd001300
            [17671.222150] R10: ffff9b3dfd001300 R11: ffffffffbcbd753d R12: ffff9b3df8fd0000
            [17671.223393] R13: 0000000000000000 R14: ffff9b3def57d980 R15: ffffffffbd654920
            [17671.224582] FS:  00007fd3c6583880(0000) GS:ffff9b3dffd00000(0000) knlGS:0000000000000000
            [17671.225989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [17671.226966] CR2: 00007fd3c5aba060 CR3: 000000005b5da000 CR4: 00000000000606e0
            [17671.228175] Call Trace:
            [17671.228700]  [<ffffffffbccf9e6b>] security_capable_noaudit+0x1b/0x20
            [17671.229815]  [<ffffffffbcaa6e45>] has_ns_capability_noaudit+0x15/0x20
            [17671.230908]  [<ffffffffbcaa7105>] ptrace_has_cap+0x35/0x40
            [17671.231875]  [<ffffffffbcaa7b51>] __ptrace_may_access+0x71/0x150
            [17671.232896]  [<ffffffffbcaa7efe>] ptrace_may_access+0x2e/0x50
            [17671.233885]  [<ffffffffbccc0841>] do_task_stat+0x91/0xbb0
            [17671.234850]  [<ffffffffbcc657a4>] ? mntput+0x24/0x40
            [17671.235711]  [<ffffffffbcc53ac2>] ? path_openat+0x172/0x640
            [17671.236682]  [<ffffffffbcc3f69a>] ? __check_object_size+0x1ca/0x250
            [17671.237757]  [<ffffffffbccc1e84>] proc_tgid_stat+0x14/0x20
            [17671.238689]  [<ffffffffbccbc062>] proc_single_show+0x52/0x90
            [17671.239649]  [<ffffffffbcc69fc0>] seq_read+0x130/0x440
            [17671.240555]  [<ffffffffbcc4256f>] vfs_read+0x9f/0x170
            [17671.241425]  [<ffffffffbcc4342f>] SyS_read+0x7f/0xf0
            [17671.242295]  [<ffffffffbd177d21>] ? system_call_after_swapgs+0xae/0x146
            [17671.243447]  [<ffffffffbd177ddb>] system_call_fastpath+0x22/0x27
            [17671.244932]  [<ffffffffbd177d21>] ? system_call_after_swapgs+0xae/0x146
            [17671.246185] Code: 1f 00 66 66 66 66 90 55 4c 8b 87 80 00 00 00 48 89 e5 eb 09 66 0f 1f 44 00 00 48 89 c6 4c 39 c6 74 28 48 81 fe 80 d7 64 bd 74 3f <48> 8b 86 c8 00 00 00 4c 39 c0 75 e3 8b 86 d0 00 00 00 39 47 14 
            [17671.251802] RIP  [<ffffffffbccf7ab9>] cap_capable+0x29/0x80
            [17671.252816]  RSP <ffff9b3debae3c60>
            

            People

              Assignee: xinliang Xinliang Liu
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: