Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      We are hitting this crash on master semi-regularly in full testing.

      [11400.772957] Lustre: DEBUG MARKER: == sanity-quota test 51: Test project accounting with mv/cp ========================================== 11:53:35 (1549367615)
      [11402.675816] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0
      [11403.528916] Lustre: DEBUG MARKER: lctl set_param -n osd*.*OS*.force_sync=1
      [11404.500328] Lustre: DEBUG MARKER: lctl set_param -n osd*.*OS*.force_sync=1
      [11404.961396] BUG: unable to handle kernel paging request at ffff9775f906a000
      [11404.962319] IP: [<ffffffffc0b453d5>] lustre_swab_fiemap+0x85/0xa0 [ptlrpc]
      [11404.963223] PGD 56852067 PUD 56856067 PMD 7a5ba063 PTE 800000007906a061
      [11404.963934] Oops: 0003 [#1] SMP 
      [11404.964307] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul i2c_piix4 parport_pc pcspkr joydev parport glue_helper virtio_balloon ablk_helper cryptd ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix
      [11404.972408]  8139too libata virtio_pci virtio_ring virtio 8139cp mii
      [11404.973038] CPU: 0 PID: 16619 Comm: ll_ost00_001 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.1.3.el7_lustre.x86_64 #1
      [11404.974208] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [11404.974750] task: ffff9775f9b51040 ti: ffff9775f9d58000 task.ti: ffff9775f9d58000
      [11404.975462] RIP: 0010:[<ffffffffc0b453d5>]  [<ffffffffc0b453d5>] lustre_swab_fiemap+0x85/0xa0 [ptlrpc]
      [11404.976388] RSP: 0018:ffff9775f9d5bbb8  EFLAGS: 00010202
      [11404.976899] RAX: ffff9775f9069fd8 RBX: 0000000000000000 RCX: 00000000000048c7
      [11404.977570] RDX: 00000000000002d7 RSI: 0000000000000002 RDI: ffff9775f90600e8
      [11404.978244] RBP: ffff9775f9d5bbb8 R08: 00000000000000b8 R09: 00000000000000e8
      [11404.978917] R10: 0000000000000000 R11: 0000000000000020 R12: ffffffffc0b41120
      [11404.979595] R13: ffffffffc0c3c260 R14: ffff9775f90600e8 R15: ffffffffc0c41120
      [11404.980273] FS:  0000000000000000(0000) GS:ffff9775ffc00000(0000) knlGS:0000000000000000
      [11404.981037] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [11404.981586] CR2: ffff9775f906a000 CR3: 0000000079bc6000 CR4: 00000000000606f0
      [11404.982265] Call Trace:
      [11404.982573]  [<ffffffffc0b69477>] __req_capsule_get+0x4c7/0x740 [ptlrpc]
      [11404.983253]  [<ffffffffc0b45350>] ? lustre_swab_obd_quotactl+0xb0/0xb0 [ptlrpc]
      [11404.983989]  [<ffffffffc0b69705>] req_capsule_client_get+0x15/0x20 [ptlrpc]
      [11404.984682]  [<ffffffffc0ed7d4e>] ofd_get_info_hdl+0x54e/0x10f0 [ofd]
      [11404.985332]  [<ffffffffc0b41752>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
      [11404.986050]  [<ffffffffc0bab149>] ? tgt_request_preprocess.isra.31+0x299/0x7a0 [ptlrpc]
      [11404.986839]  [<ffffffffc0bac40a>] tgt_request_handle+0xafa/0x1590 [ptlrpc]
      [11404.987523]  [<ffffffffc070cf07>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11404.988188]  [<ffffffffc0b4f99e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [11404.988951]  [<ffffffff8d4cba9b>] ? __wake_up_common+0x5b/0x90
      [11404.989539]  [<ffffffffc0b5347c>] ptlrpc_main+0xbbc/0x2090 [ptlrpc]
      [11404.990155]  [<ffffffff8d4d0880>] ? finish_task_switch+0x50/0x1c0
      [11404.990760]  [<ffffffffc0b528c0>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [11404.991477]  [<ffffffff8d4c1c31>] kthread+0xd1/0xe0
      [11404.991958]  [<ffffffff8d4c1b60>] ? insert_kthread_work+0x40/0x40
      [11404.992556]  [<ffffffff8db74c37>] ret_from_fork_nospec_begin+0x21/0x21
      [11404.993184]  [<ffffffff8d4c1b60>] ? insert_kthread_work+0x40/0x40
      [11404.993761] Code: 29 c8 48 8d 44 07 20 48 8b 08 48 0f c9 48 89 08 48 8b 48 08 48 0f c9 48 89 48 08 48 8b 48 10 48 0f c9 48 89 48 10 8b 48 28 0f c9 <89> 48 28 8b 48 2c 0f c9 89 48 2c 39 57 14 77 b3 5d c3 66 0f 1f 
      [11404.996791] RIP  [<ffffffffc0b453d5>] lustre_swab_fiemap+0x85/0xa0 [ptlrpc]
      [11404.997499]  RSP <ffff9775f9d5bbb8>
      [11404.997845] CR2: ffff9775f906a000
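
      The register dump can be cross-checked against the fault address. The sketch below is not Lustre code. It assumes the Linux UAPI layout (a 32-byte struct fiemap header followed by 56-byte struct fiemap_extent entries, with fe_flags at offset 0x28) and that RDI holds the start of the buffer passed to lustre_swab_fiemap, as the `lea 0x20(%rdi,%rax,1)` in the Code bytes suggests. All numeric values are copied from the Oops above.

```python
# Values copied from the Oops above.
OOPS_ERROR_CODE = 0x0003
PTE = 0x800000007906A061
RAX = 0xFFFF9775F9069FD8
RDI = 0xFFFF9775F90600E8
CR2 = 0xFFFF9775F906A000

# Assumed layout (Linux UAPI fiemap.h), not taken from the ticket.
FIEMAP_HEADER_SIZE = 32   # struct fiemap without its extent array
FIEMAP_EXTENT_SIZE = 56   # struct fiemap_extent
FE_FLAGS_OFFSET = 0x28    # fe_flags within struct fiemap_extent


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # clear: page not present
        "write": bool(code & 0x2),                 # clear: read access
        "user_mode": bool(code & 0x4),             # clear: kernel mode
    }


def decode_pte(pte):
    """Decode the x86-64 page-table-entry bits relevant to this fault."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "no_execute": bool(pte & (1 << 63)),
    }


fault = decode_error_code(OOPS_ERROR_CODE)
pte = decode_pte(PTE)

# The faulting instruction bytes "<89> 48 28" encode mov %ecx,0x28(%rax).
store_address = RAX + FE_FLAGS_OFFSET

# Which extent slot RAX points at, if RDI is the start of the fiemap buffer.
extent_index, remainder = divmod(
    RAX - RDI - FIEMAP_HEADER_SIZE, FIEMAP_EXTENT_SIZE
)

print(fault)
print(pte)
print(hex(store_address), store_address == CR2)
print(extent_index, remainder)
```

      Under those assumptions the error code describes a kernel-mode write to a present but read-only page, the store address equals CR2, and RAX sits exactly on extent index 726, about 40 KB past the start of the buffer.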
      

      Sample reports:
      https://testing.whamcloud.com/test_sessions/a7100cee-9175-47ca-aaa4-509431ebb316
      https://testing.whamcloud.com/test_sessions/99cd3009-304d-4d61-901e-1eeebea118af
      https://testing.whamcloud.com/test_sets/247fe132-2970-11e9-b901-52540065bddc (this one failed on Feb 5th)

          Activity

            [LU-11997] Crash in lustre_swab_fiemap

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36481/
            Subject: LU-11997 ptlrpc: Properly swab ll_fiemap_info_key
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 6b8ea93881682346bfc201b0859f95ef26e89163

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36481
            Subject: LU-11997 ptlrpc: Properly swab ll_fiemap_info_key
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: e4961a5cb556cab20849891f011e2d338f23b5b2

            pjones Peter Jones added a comment -

            Landed for 2.13

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36308/
            Subject: LU-11997 ptlrpc: Properly swab ll_fiemap_info_key
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2b905746ee3b5d9dbafcdb1af5930aea18120a7b

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36308
            Subject: LU-11997 ptlrpc: Properly swab ll_fiemap_info_key
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cae79cbea1918b364c2d6870ba6f6dee32e5127c

            green Oleg Drokin added a comment -

            I think this is a very similar crash in master, in replay-single test 88: https://testing.whamcloud.com/test_sets/51edf2bc-9c0c-11e9-8fc1-52540065bddc

            [ 7123.688069] BUG: unable to handle kernel paging request at ffff9a8376000000
            [ 7123.688836] IP: [<ffffffffc0e8afd7>] lustre_swab_fiemap+0x67/0xa0 [ptlrpc]
            [ 7123.689681] PGD 76452067 PUD 76453067 PMD 36078063 PTE 8000000036000061
            [ 7123.690367] Oops: 0003 [#1] SMP 
            [ 7123.690721] Modules linked in: osp(OE) (OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy 8139too ata_piix
            [ 7123.698556]  libata 8139cp mii virtio_pci virtio_ring virtio
            [ 7123.699084] CPU: 1 PID: 1626 Comm: ll_ost00_021 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.12.2.el7_lustre.x86_64 #1
            [ 7123.700207] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [ 7123.700730] task: ffff9a83984dc100 ti: ffff9a839c954000 task.ti: ffff9a839c954000
            [ 7123.701412] RIP: 0010:[<ffffffffc0e8afd7>]  [<ffffffffc0e8afd7>] lustre_swab_fiemap+0x67/0xa0 [ptlrpc]
            [ 7123.702320] RSP: 0018:ffff9a839c957bb8  EFLAGS: 00010206
            [ 7123.702812] RAX: ffff9a8376000000 RBX: 0000000000000000 RCX: 6973737565000000
            [ 7123.703463] RDX: 0000000000006fdf RSI: 0000000000000002 RDI: ffff9a8375e78750
            [ 7123.704109] RBP: ffff9a839c957bb8 R08: 00000000000000b8 R09: 00000000000000e8
            [ 7123.704761] R10: 0000000000000000 R11: 0000000000000020 R12: ffffffffc0e86cb0
            [ 7123.705427] R13: ffffffffc0f845e0 R14: ffff9a8375e78750 R15: ffffffffc0f894a0
            [ 7123.706077] FS:  0000000000000000(0000) GS:ffff9a83bfd00000(0000) knlGS:0000000000000000
            [ 7123.706820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [ 7123.707350] CR2: ffff9a8376000000 CR3: 000000007a600000 CR4: 00000000000606e0
            [ 7123.708003] Call Trace:
            [ 7123.708292]  [<ffffffffc0eafa27>] __req_capsule_get+0x4c7/0x740 [ptlrpc]
            [ 7123.708941]  [<ffffffffc0e8af70>] ? lustre_swab_obd_quotactl+0xb0/0xb0 [ptlrpc]
            [ 7123.709648]  [<ffffffffc0eafcb5>] req_capsule_client_get+0x15/0x20 [ptlrpc]
            [ 7123.710341]  [<ffffffffc0e872e2>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
            [ 7123.711023]  [<ffffffffc0ef24ca>] ? tgt_request_handle+0x91a/0x15c0 [ptlrpc]
            [ 7123.711707]  [<ffffffffc098c2e7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
            [ 7123.712356]  [<ffffffffc0e965de>] ? ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
            [ 7123.713106]  [<ffffffff9a2ced54>] ? __wake_up+0x44/0x50
            [ 7123.713620]  [<ffffffffc0e9a0cc>] ? ptlrpc_main+0xbac/0x1560 [ptlrpc]
            [ 7123.714248]  [<ffffffffc0e99520>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
            [ 7123.714937]  [<ffffffff9a2c1d21>] ? kthread+0xd1/0xe0
            [ 7123.715412]  [<ffffffff9a2c1c50>] ? insert_kthread_work+0x40/0x40
            [ 7123.715987]  [<ffffffff9a975c37>] ? ret_from_fork_nospec_begin+0x21/0x21
            [ 7123.716609]  [<ffffffff9a2c1c50>] ? insert_kthread_work+0x40/0x40
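
            The same arithmetic can be applied to this trace to estimate how far the swab loop ran. This is a rough sketch, not Lustre code: it assumes a 32-byte struct fiemap header and 56-byte extents (the Linux UAPI layout), that RDI is the start of the buffer and RAX the extent being swabbed, and that RDX is the loop counter after its increment. The last point is inferred from the Code bytes of the first trace, since this trace has no Code line. The helper name extent_index is hypothetical.

```python
# Assumed layout (Linux UAPI fiemap.h), not taken from the ticket.
FIEMAP_HEADER_SIZE = 32   # struct fiemap without its extent array
FIEMAP_EXTENT_SIZE = 56   # struct fiemap_extent

# Values copied from the register dump above.
RDI = 0xFFFF9A8375E78750
RAX = 0xFFFF9A8376000000
RCX = 0x6973737565000000
RDX = 0x6FDF


def extent_index(buffer_start, extent_address):
    """Return the extent slot at extent_address, or raise if misaligned."""
    offset = extent_address - buffer_start - FIEMAP_HEADER_SIZE
    index, remainder = divmod(offset, FIEMAP_EXTENT_SIZE)
    if offset < 0 or remainder != 0:
        raise ValueError("address is not on an extent boundary")
    return index


index = extent_index(RDI, RAX)
bytes_walked = RAX - RDI

# If RCX holds a value that was just byte-swapped, its big-endian byte
# order is the order in which the bytes sat in (little-endian) memory.
memory_bytes = RCX.to_bytes(8, "big")

print(index, RDX - 1)
print(bytes_walked)
print(memory_bytes)
```

            Under those assumptions RAX lands exactly on extent index 28638, which matches RDX minus one, so the loop had walked roughly 1.5 MB from the buffer start. RCX decodes to the ASCII text "issue" followed by NUL bytes, which would suggest the loop was swabbing unrelated memory rather than fiemap extents.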
            

            jamesanunez James Nunez (Inactive) added a comment -

            We see this crash in sanityn test 71a for PPC client testing. See https://testing.whamcloud.com/test_sets/7d68fa34-668f-11e9-8bb1-52540065bddc

            People

              Assignee: green Oleg Drokin
              Reporter: green Oleg Drokin