Lustre / LU-12002

Invalid spinlock on transaction start/stop on shutting-down service


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0

    Description

      I am seeing this sort of crash quite a bit on my test rig when a test (typically racer) is trying to wind down for cleanup:

      [96026.688680] Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
      [96026.688683] Lustre: Skipped 5 previous similar messages
      [96026.704392] LustreError: Skipped 5 previous similar messages
      [96028.465664] Lustre: server umount lustre-MDT0001 complete
      [96029.930089] LustreError: 9719:0:(update_trans.c:1086:top_trans_stop()) lustre-MDT0001-osp-MDT0002: stop trans failed: rc = -5
      [96029.956001] LustreError: 15886:0:(ldlm_lockd.c:1360:ldlm_handle_enqueue0()) ### lock on destroyed export ffff88009263e800 ns: mdt-lustre-MDT0002_UUID lock: ffff8800a63d6d80/0x8d081f666e29586e lrc: 4/0,0 mode: PR/PR res: [0x280000404:0x328:0x0].0x0 bits 0x12/0x0 rrc: 31 type: IBT flags: 0x50200400000020 nid: 0@lo remote: 0x8d081f666e295860 expref: 13 pid: 15886 timeout: 0 lvb_type: 0
      [96029.964042] Lustre: 9719:0:(service.c:2173:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (2403:3461s); client may timeout.  req@ffff8802cd52db40 x1626162606418368/t0(0) o36->95fd6501-6ffc-35eb-4985-e93c989c4db4@0@lo:736/0 lens 488/408 e 21 to 0 dl 1550832939 ref 1 fl Complete:/0/0 rc -61/-61
      [96029.987068] LustreError: 15886:0:(ldlm_lockd.c:1360:ldlm_handle_enqueue0()) Skipped 7 previous similar messages
      [96030.015729] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [96030.016014] IP: [<ffffffff813fb685>] do_raw_spin_lock+0x5/0xa0
      [96030.016014] PGD 0 
      [96030.016014] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [96030.057454] LustreError: 24998:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88022b0eeb40 x1626162613733760/t0(0) o101->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
      [96030.016014] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_mod loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd virtio_console i2c_piix4 pcspkr virtio_balloon ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix crct10dif_pclmul crct10dif_common drm_panel_orientation_quirks crc32c_intel serio_raw virtio_blk libata i2c_core floppy [last unloaded: dm_flakey]
      [96030.016014] CPU: 14 PID: 25864 Comm: mdt07_007 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.6-debug #1
      [96030.016014] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [96030.016014] task: ffff8800880d63c0 ti: ffff88008dc30000 task.ti: ffff88008dc30000
      [96030.016014] RIP: 0010:[<ffffffff813fb685>]  [<ffffffff813fb685>] do_raw_spin_lock+0x5/0xa0
      [96030.016014] RSP: 0018:ffff88008dc33a98  EFLAGS: 00010202
      [96030.016014] RAX: ffff88008dc33fd8 RBX: 0000000000000000 RCX: 0000000000000000
      [96030.016014] RDX: ffff8800a5753e00 RSI: ffff8802ab1f5e78 RDI: 0000000000000000
      [96030.016014] RBP: ffff88008dc33aa0 R08: 0000000000000000 R09: ffff88026d410fc0
      [96030.016014] R10: ffff880000000038 R11: 0000000000000000 R12: ffff8802ffaccc00
      [96030.016014] R13: ffff88008a01c4c0 R14: ffff88026d410fc0 R15: ffff880286ec2d78
      [96030.016014] FS:  0000000000000000(0000) GS:ffff88033dd80000(0000) knlGS:0000000000000000
      [96030.016014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [96030.161412] LustreError: 31885:0:(ldlm_resource.c:1146:ldlm_resource_complain()) mdt-lustre-MDT0002_UUID: namespace resource [0x280000404:0x328:0x0].0x0 (ffff880270be84c0) refcount nonzero (7) after lock cleanup; forcing cleanup.
      [96030.161428] LustreError: 31885:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 5 previous similar messages
      [96030.016014] CR2: 0000000000000004 CR3: 00000000bb8e8000 CR4: 00000000001607e0
      [96030.016014] Call Trace:
      [96030.016014]  [<ffffffff817b996e>] _raw_spin_lock+0x1e/0x20
      [96030.016014]  [<ffffffffa06f050c>] top_trans_start+0x52c/0x960 [ptlrpc]
      [96030.016014]  [<ffffffffa0eb687d>] ? lod_declare_xattr_del+0x21d/0x320 [lod]
      [96030.016014]  [<ffffffffa0e99484>] lod_trans_start+0x34/0x40 [lod]
      [96030.016014]  [<ffffffffa0d910d4>] mdd_trans_start+0x14/0x20 [mdd]
      [96030.016014]  [<ffffffffa0d8475e>] mdd_xattr_del+0x20e/0x550 [mdd]
      [96030.016014]  [<ffffffffa0dfaabc>] mdt_reint_setxattr+0xb9c/0xfc0 [mdt]
      [96030.016014]  [<ffffffffa0df7c70>] mdt_reint_rec+0x80/0x210 [mdt]
      [96030.016014]  [<ffffffffa0dd4860>] mdt_reint_internal+0x770/0xb40 [mdt]
      [96030.016014]  [<ffffffffa0ddc9e7>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
      [96030.016014]  [<ffffffffa0ddf9b7>] mdt_reint+0x67/0x140 [mdt]
      [96030.016014]  [<ffffffffa06dcac5>] tgt_request_handle+0x915/0x15c0 [ptlrpc]
      [96030.016014]  [<ffffffffa0294fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [96030.016014]  [<ffffffffa0683249>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
      [96030.016014]  [<ffffffff810bfbd8>] ? __wake_up_common+0x58/0x90
      [96030.016014]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
      [96030.016014]  [<ffffffffa068723c>] ptlrpc_main+0xb5c/0x2040 [ptlrpc]
      [96030.016014]  [<ffffffff810c32ed>] ? finish_task_switch+0x5d/0x1b0
      [96030.016014]  [<ffffffffa06866e0>] ? ptlrpc_register_service+0xfe0/0xfe0 [ptlrpc]
      [96030.016014]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
      [96030.016014]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
      [96030.016014]  [<ffffffff817c4c5d>] ret_from_fork_nospec_begin+0x7/0x21
      [96030.016014]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
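
      The faulting address 0000000000000004 is consistent with a spinlock taken through a NULL struct pointer: &obj->lock is just NULL plus the offset of the lock field inside its container, so the CPU faults at that small offset rather than at 0. A minimal user-space sketch of the arithmetic (struct names here are hypothetical, not the actual Lustre/top_trans layout):

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Hypothetical layout for illustration only. If the lock sits at
       * offset 4 in its containing struct, then spin_lock(&t->lock) with
       * t == NULL touches address 0x4 -- matching the oops above. */
      struct fake_lock {
          unsigned int raw;
      };

      struct fake_thandle {
          int state;              /* offset 0 */
          struct fake_lock lock;  /* offset 4 on common ABIs */
      };

      int main(void)
      {
          /* &t->lock is computed without dereferencing t, so this value is
           * exactly the address a NULL-pointer spin_lock would fault on. */
          size_t off = offsetof(struct fake_thandle, lock);
          printf("faulting address would be: 0x%zx\n", off);
          assert(off == 4);
          return 0;
      }
      ```

      This matches the trace: top_trans_start() reaching do_raw_spin_lock() through a pointer that has already been torn down by the concurrent umount ("server umount lustre-MDT0001 complete" precedes the oops), i.e. a shutdown race rather than random corruption.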
      

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Oleg Drokin (green)
            Votes: 0
            Watchers: 3
