
Quota code sleeping in atomic context

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.0
    • Severity: 3

    Description

      I tried to run a debugging kernel with latest master in autotest, and sanity-quota fell apart with multiple sleeping-under-spinlock problems, culminating in a spinlock being taken twice:

      08:09:46:[10054.232995] BUG: sleeping function called from invalid context at mm/slab.c:3054
      08:09:46:[10054.237505] in_atomic(): 1, irqs_disabled(): 0, pid: 25313, name: mdt00_003
      08:09:46:[10054.241875] CPU: 0 PID: 25313 Comm: mdt00_003 Tainted: G        W  OE  ------------   3.10.0-327.22.2.el7_lustre.x86_64 #1
      08:09:46:[10054.243752] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      08:09:46:[10054.245245]  ffff8800399ede50 000000001b0e0c50 ffff88003cec7a28 ffffffff8164bed6
      08:09:46:[10054.246953]  ffff88003cec7a38 ffffffff810b5639 ffff88003cec7ad0 ffffffff811cb595
      08:09:46:[10054.248630]  ffffffffa016233f 0000000000000046 ffff880027ae7410 ffffffffa0bfcfb5
      08:09:46:[10054.250317] Call Trace:
      08:09:46:[10054.251601]  [<ffffffff8164bed6>] dump_stack+0x19/0x1b
      08:09:46:[10054.253108]  [<ffffffff810b5639>] __might_sleep+0xd9/0x100
      08:09:46:[10054.254619]  [<ffffffff811cb595>] kmem_cache_alloc_trace+0x65/0x630
      08:09:46:[10054.256179]  [<ffffffffa016233f>] ? jbd2_journal_stop+0x1ef/0x400 [jbd2]
      08:09:46:[10054.257791]  [<ffffffffa0bfcfb5>] ? qmt_glimpse_lock+0x155/0x780 [lquota]
      08:09:46:[10054.259396]  [<ffffffff810bcab6>] ? try_to_wake_up+0x1b6/0x320
      08:09:46:[10054.260937]  [<ffffffffa0bfcfb5>] qmt_glimpse_lock+0x155/0x780 [lquota]
      08:09:46:[10054.262581]  [<ffffffffa0c00a2f>] qmt_glb_lock_notify+0x12f/0x310 [lquota]
      08:09:46:[10054.264180]  [<ffffffffa0bfae19>] qmt_set.constprop.14+0x4d9/0x700 [lquota]
      08:09:46:[10054.265796]  [<ffffffffa0bfb1fe>] qmt_quotactl+0x1be/0x630 [lquota]
      08:09:46:[10054.267394]  [<ffffffffa0dde014>] mdt_quotactl+0x514/0x610 [mdt]
      08:09:46:[10054.269032]  [<ffffffffa0a8b7e5>] tgt_request_handle+0x925/0x1330 [ptlrpc]
      08:09:46:[10054.270655]  [<ffffffffa0a3924e>] ptlrpc_server_handle_request+0x22e/0xaa0 [ptlrpc]
      08:09:46:[10054.272376]  [<ffffffffa0a37aee>] ? ptlrpc_wait_event+0xae/0x350 [ptlrpc]
      08:09:46:[10054.273983]  [<ffffffff810bcc92>] ? default_wake_function+0x12/0x20
      08:09:46:[10054.275549]  [<ffffffff810b2cd8>] ? __wake_up_common+0x58/0x90
      08:09:46:[10054.277121]  [<ffffffffa0a3d018>] ptlrpc_main+0xa58/0x1db0 [ptlrpc]
      08:09:46:[10054.278700]  [<ffffffffa0a3c5c0>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
      08:09:46:[10054.280352]  [<ffffffff810a8a24>] kthread+0xe4/0xf0
      08:09:46:[10054.281827]  [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
      08:09:46:[10054.283415]  [<ffffffff8165d3d8>] ret_from_fork+0x58/0x90
      08:09:46:[10054.284925]  [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
      
      08:09:46:[10058.183002] BUG: scheduling while atomic: qmt_reba_lustre/24447/0x10000002
      08:09:46:[10058.184521] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too drm ata_piix i2c_core serio_raw virtio_pci virtio_ring virtio libata 8139cp mii floppy
      08:09:46:[10058.198939] CPU: 0 PID: 24447 Comm: qmt_reba_lustre Tainted: G        W  OE  ------------   3.10.0-327.22.2.el7_lustre.x86_64 #1
      08:09:46:[10058.202211] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      08:09:46:[10058.203895]  ffff880027ae3fd8 00000000444ee29f ffff880027ae3c20 ffffffff8164bed6
      08:09:46:[10058.205776]  ffff880027ae3c30 ffffffff81648241 ffff880027ae3c90 ffffffff8165223c
      08:09:46:[10058.207645]  ffff880027ae7410 ffff880027ae3fd8 ffff880027ae3fd8 ffff880027ae3fd8
      08:09:46:[10058.209514] Call Trace:
      08:09:46:[10058.210983]  [<ffffffff8164bed6>] dump_stack+0x19/0x1b
      08:09:46:[10058.212645]  [<ffffffff81648241>] __schedule_bug+0x4d/0x5b
      08:09:46:[10058.214325]  [<ffffffff8165223c>] __schedule+0x7bc/0x900
      08:09:46:[10058.215984]  [<ffffffff810b9ce6>] __cond_resched+0x26/0x30
      08:09:46:[10058.217643]  [<ffffffff8165264a>] _cond_resched+0x3a/0x50
      08:09:46:[10058.219298]  [<ffffffff811cb59a>] kmem_cache_alloc_trace+0x6a/0x630
      08:09:46:[10058.221019]  [<ffffffffa0bfcfb5>] ? qmt_glimpse_lock+0x155/0x780 [lquota]
      08:09:46:[10058.222777]  [<ffffffffa0bfcfb5>] qmt_glimpse_lock+0x155/0x780 [lquota]
      08:09:46:[10058.224528]  [<ffffffffa0bfdcf5>] qmt_reba_thread+0x715/0xc90 [lquota]
      08:09:46:[10058.226260]  [<ffffffff810bcc80>] ? wake_up_state+0x20/0x20
      08:09:46:[10058.227914]  [<ffffffffa0bfd5e0>] ? qmt_glimpse_lock+0x780/0x780 [lquota]
      08:09:46:[10058.229668]  [<ffffffff810a8a24>] kthread+0xe4/0xf0
      08:09:46:[10058.231276]  [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
      08:09:46:[10058.232990]  [<ffffffff8165d3d8>] ret_from_fork+0x58/0x90
      08:09:46:[10058.234594]  [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
      08:09:46:[10058.240296] BUG: spinlock cpu recursion on CPU#0, ldlm_cn00_002/17112
      08:09:46:[10058.242027]  lock: 0xffff880036e6a4a0, .magic: dead4ead, .owner: qmt_reba_lustre/24447, .owner_cpu: 0
      08:10:08:[10058.243896] CPU: 0 PID: 17112 Comm: ldlm_cn00_002 Tainted: G        W  OE  ------------   3.10.0-327.22.2.el7_lustre.x86_64 #1
      08:10:08:[10058.247148] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      08:10:08:[10058.248814]  ffff880027ae7410 00000000d868321e ffff88002a1f7b98 ffffffff8164bed6
      08:10:08:[10058.250668]  ffff88002a1f7bb8 ffffffff8164bf64 ffff880036e6a4a0 ffffffff818b3096
      08:10:08:[10058.252541]  ffff88002a1f7bd8 ffffffff8164bf8a ffff880036e6a4a0 0000000000000000
      08:10:08:[10058.254397] Call Trace:
      08:10:08:[10058.255848]  [<ffffffff8164bed6>] dump_stack+0x19/0x1b
      08:10:08:[10058.257488]  [<ffffffff8164bf64>] spin_dump+0x8c/0x91
      08:10:08:[10058.259116]  [<ffffffff8164bf8a>] spin_bug+0x21/0x26
      08:10:08:[10058.260725]  [<ffffffff8131c008>] do_raw_spin_lock+0x118/0x170
      08:10:08:[10058.262421]  [<ffffffff8165413e>] _raw_spin_lock+0x1e/0x20
      08:10:08:[10058.264094]  [<ffffffffa09d902c>] lock_res_and_lock+0x2c/0x50 [ptlrpc]
      08:10:08:[10058.265831]  [<ffffffffa09e15dd>] ldlm_lock_cancel+0x2d/0x1e0 [ptlrpc]
      08:10:08:[10058.267560]  [<ffffffffa0a06251>] ldlm_request_cancel+0x151/0x710 [ptlrpc]
      08:10:08:[10058.269316]  [<ffffffffa0a09b4a>] ldlm_handle_cancel+0xba/0x250 [ptlrpc]
      08:10:08:[10058.271051]  [<ffffffffa0a09e21>] ldlm_cancel_handler+0x141/0x490 [ptlrpc]
      08:10:08:[10058.272791]  [<ffffffffa0a3924e>] ptlrpc_server_handle_request+0x22e/0xaa0 [ptlrpc]
      08:10:08:[10058.274534]  [<ffffffffa0a37aee>] ? ptlrpc_wait_event+0xae/0x350 [ptlrpc]
      08:10:08:[10058.276176]  [<ffffffff810bcc92>] ? default_wake_function+0x12/0x20
      08:10:08:[10058.277743]  [<ffffffff810b2cd8>] ? __wake_up_common+0x58/0x90
      08:10:08:[10058.279278]  [<ffffffffa0a3d018>] ptlrpc_main+0xa58/0x1db0 [ptlrpc]
      08:10:08:[10058.280804]  [<ffffffffa0a3c5c0>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
      08:10:08:[10058.282370]  [<ffffffff810a8a24>] kthread+0xe4/0xf0
      08:10:08:[10058.283727]  [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
      08:10:08:[10058.285182]  [<ffffffff8165d3d8>] ret_from_fork+0x58/0x90
      08:10:08:[10058.286542]  [<ffffffff810a8940>] ? kthread_create_on_node+0x140/0x140
      

      Full report is at https://testing.hpdd.intel.com/test_sets/c73d6a92-5e4e-11e6-b5b1-5254006e85c2
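
      For reference, the pattern behind these splats can be reproduced with a few lines of standalone kernel code (hypothetical names below, not the actual lquota code): a GFP_KERNEL allocation is allowed to sleep, so calling it with a spinlock held trips __might_sleep() ("sleeping function called from invalid context") and, if the thread really does get scheduled out, "scheduling while atomic".

      #include <linux/slab.h>
      #include <linux/spinlock.h>

      struct demo_item {
              int payload;
      };

      static DEFINE_SPINLOCK(demo_lock);

      static void demo_buggy_path(void)
      {
              struct demo_item *item;

              spin_lock(&demo_lock);
              /* GFP_KERNEL may sleep: invalid while holding a spinlock */
              item = kmalloc(sizeof(*item), GFP_KERNEL);
              if (item)
                      item->payload = 0;
              kfree(item);
              spin_unlock(&demo_lock);
      }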

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.10

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21923/
          Subject: LU-8491 quota: sleep while holding spinlock
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6cb38c3a863993f3bba8332194c5ee8c939ad25d

          gerrit Gerrit Updater added a comment -

          Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/21923
          Subject: LU-8491 quota: sleep while holding spinlock
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 6c1016218f9d8f7421aea95405a20e15cf07b817

          jgmitter Joseph Gmitter (Inactive) added a comment -

          Hi Niu,

          Can you please have a look at this issue?

          Thanks.
          Joe

          bzzz Alex Zhuravlev added a comment -

          This is because of OBD_ALLOC_PTR(work) being called while the lock on the resource is held (lock_res(res)).
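
          For illustration only (hypothetical names below; the actual change is the review.whamcloud.com/21923 patch referenced above), the usual remedy for this class of bug is to do the possibly-sleeping allocation before taking the spinlock, for example:

          #include <linux/slab.h>
          #include <linux/spinlock.h>

          struct demo_work {
                  int payload;
          };

          static DEFINE_SPINLOCK(demo_res_lock);

          static int demo_fixed_path(void)
          {
                  struct demo_work *work;

                  /* allocate while sleeping is still allowed */
                  work = kmalloc(sizeof(*work), GFP_KERNEL);
                  if (!work)
                          return -ENOMEM;

                  spin_lock(&demo_res_lock);
                  work->payload = 0;      /* fill it in under the lock */
                  spin_unlock(&demo_res_lock);

                  kfree(work);
                  return 0;
          }

          The other common option is a non-sleeping GFP_ATOMIC allocation under the lock, at the cost of being more likely to fail under memory pressure.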

          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: green Oleg Drokin
            Votes: 0
            Watchers: 7
