LU-195: OSS nodes hung due to multiple threads spinning on dq_list_lock with Lustre quotas enabled

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • Fix Version/s: Lustre 2.1.0
    • Affects Version/s: Lustre 2.0.0
    • Labels: None
    • Severity: 3
    • 8538

    Description

      At CEA, they sometimes see multiple OSS nodes hang completely. Those nodes are dead and have to be crashed in order to recover them.

      In the crash dump, they see multiple tasks spinning on "dq_list_lock" with the following stack traces:
      ============================================================================
      #7 [ffff8805cf859180] _spin_lock at ffffffff81454fee
      #8 [ffff8805cf859188] dqget at ffffffff811b0914
      #9 [ffff8805cf8591d8] vfs_get_dqblk at ffffffff811b0f5a
      #10 [ffff8805cf8591f8] fsfilt_ldiskfs_quotactl at ffffffffa03fcbff
      #11 [ffff8805cf8592a8] compute_remquota at ffffffffa07cb7ce
      #12 [ffff8805cf859328] quota_check_common at ffffffffa07d4ade
      #13 [ffff8805cf859468] quota_chk_acq_common at ffffffffa07d5561
      #14 [ffff8805cf8595e8] filter_commitrw_write at ffffffffa0797488
      #15 [ffff8805cf8597d8] filter_commitrw at ffffffffa078a535
      #16 [ffff8805cf859898] obd_commitrw at ffffffffa0655ffa
      #17 [ffff8805cf859918] ost_brw_write at ffffffffa065e644
      #18 [ffff8805cf859af8] ost_handle at ffffffffa066337a
      #19 [ffff8805cf859ca8] ptlrpc_server_handle_request at ffffffffa06c5b11
      #20 [ffff8805cf859de8] ptlrpc_main at ffffffffa06c6f0a
      #21 [ffff8805cf859f48] kernel_thread at ffffffff8100d1aa

      and

      #6 [ffff8804e59817d0] _spin_lock at ffffffff81454fee
      #7 [ffff8804e59817d8] dqget at ffffffff811b0914
      #8 [ffff8804e5981828] dquot_initialize at ffffffff811b1077
      #9 [ffff8804e5981898] filter_destroy at ffffffffa0779496
      #10 [ffff8804e5981a78] ost_destroy at ffffffffa0656de3
      #11 [ffff8804e5981af8] ost_handle at ffffffffa066252b
      #12 [ffff8804e5981ca8] ptlrpc_server_handle_request at ffffffffa06c5b11
      #13 [ffff8804e5981de8] ptlrpc_main at ffffffffa06c6f0a
      #14 [ffff8804e5981f48] kernel_thread at ffffffff8100d1aa
      ============================================================================

      Meanwhile, the thread that actually holds "dq_list_lock" is itself looping forever, with the following stack trace:
      ============================================================================
      #6 [ffff88039cdeb8c0] vfs_quota_sync at ffffffff811b128b
      #7 [ffff88039cdeb918] fsfilt_ldiskfs_quotactl at ffffffffa03fc6fe
      #8 [ffff88039cdeb9c8] filter_quota_ctl at ffffffffa07d1bc2
      #9 [ffff88039cdebaf8] ost_handle at ffffffffa06627d9
      #10 [ffff88039cdebca8] ptlrpc_server_handle_request at ffffffffa06c5b11
      #11 [ffff88039cdebde8] ptlrpc_main at ffffffffa06c6f0a
      #12 [ffff88039cdebf48] kernel_thread at ffffffff8100d1aa
      ============================================================================

      We can also see that the (struct super_block *)->s_dquot.info[cnt].dqi_dirty_list list contains a single "struct dquot" whose dq_dirty.next points to itself and whose dq_flags has both the DQ_ACTIVE_B and DQ_MOD_B bits unset. This appears to lead to an infinite loop in vfs_quota_sync()/clear_dquot_dirty().
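
      For reference, here is a simplified sketch of the code path that seems to be involved, paraphrased from the 2.6.32-era fs/quota/dquot.c (not copied verbatim): if a dquot sits on dqi_dirty_list with DQ_MOD_B already clear, clear_dquot_dirty() never unlinks it, so the loop in vfs_quota_sync() keeps picking the same entry forever while holding "dq_list_lock", which would explain both the looping owner and all the other threads spinning on the lock:
      ============================================================================
      /* Simplified sketch, paraphrased from 2.6.32-era fs/quota/dquot.c (not
       * verbatim). Error handling, the per-quota-type loop and the
       * dqonoff_mutex are omitted. */

      /* Caller must hold dq_list_lock; only unlinks the dquot if DQ_MOD_B was set. */
      static inline int clear_dquot_dirty(struct dquot *dquot)
      {
              if (!test_and_clear_bit(DQ_MOD_B, &dquot->dq_flags))
                      return 0;               /* DQ_MOD_B clear: entry is NOT removed */
              list_del_init(&dquot->dq_dirty);
              return 1;
      }

      int vfs_quota_sync(struct super_block *sb, int type)
      {
              struct quota_info *dqopt = sb_dqopt(sb);
              struct list_head *dirty = &dqopt->info[type].dqi_dirty_list;
              struct dquot *dquot;

              spin_lock(&dq_list_lock);
              while (!list_empty(dirty)) {
                      dquot = list_first_entry(dirty, struct dquot, dq_dirty);
                      if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
                              /* Inactive dquots are simply dropped from the list.
                               * But if DQ_MOD_B is also clear (as seen in the dump),
                               * clear_dquot_dirty() unlinks nothing, the list stays
                               * non-empty and the next iteration picks the very same
                               * dquot again: infinite loop with dq_list_lock held. */
                              clear_dquot_dirty(dquot);
                              continue;
                      }
                      /* Normal case (elided in this sketch): take a reference, drop
                       * the lock, write the dquot out, then re-acquire the lock. */
                      break;
              }
              spin_unlock(&dq_list_lock);
              return 0;
      }
      ============================================================================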

      So maybe there is a place (in the kernel or in Lustre) where a dquot struct can be chained onto, or unchained from, the dqi_dirty_list without the protection of "dq_list_lock".
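
      For comparison, the path that normally chains a dquot onto dqi_dirty_list does take "dq_list_lock" (again a paraphrased sketch of the 2.6.32-era dquot_mark_dquot_dirty(), not verbatim). Note that the __list_add warnings below are reported from inside this very function, so the list apparently gets corrupted before this properly locked code is reached:
      ============================================================================
      /* Sketch: the usual, dq_list_lock-protected way a dquot becomes dirty. */
      int dquot_mark_dquot_dirty(struct dquot *dquot)
      {
              spin_lock(&dq_list_lock);
              if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
                      list_add(&dquot->dq_dirty, &sb_dqopt(dquot->dq_sb)->
                               info[dquot->dq_type].dqi_dirty_list);
              spin_unlock(&dq_list_lock);
              return 0;
      }
      ============================================================================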

      On the OSSes, we also very often see the following messages in the syslog:

      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: -----------[ cut here ]-----------
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Tainted: GF W )
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: Hardware name: bullx super-node
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: list_add corruption. next->prev should be prev (ffff88087da265c0), but was ffff88087c9bb2b0. (next=ffff88087c9bb2b0).
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: Modules linked in: iptable_filter(U) ip_tables(U) x_tables(U) obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(F)(U) lpfc(U) scsi_transport_fc(U) scsi_tgt(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U) usbhid(U) hid(U) mlx4_core(U) igb(U) ioatdma(U) i2c_i801(U) sg(U) i2c_core(U) uhci_hcd(U) dca(U) ehci_hcd(U) iTCO_wdt(U) iTCO_vendor_support(U) ext3(U) jbd(U) mbcache(U) sd_mod(U) crc_t10dif(U) ahci(U) dm_mod(U) [last unloaded: scsi_tgt]
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: Pid: 10660, comm: ll_ost_io_185 Tainted: GF W 2.6.32-30.el6.Bull.16.x86_64 #1
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: Call Trace:
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: [<ffffffff8105caa3>] warn_slowpath_common+0x83/0xc0
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: [<ffffffff8105cb41>] warn_slowpath_fmt+0x41/0x50
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: [<ffffffff8124ca5d>] __list_add+0x6d/0xa0
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: [<ffffffff811aef9d>] dquot_mark_dquot_dirty+0x5d/0x70
      2011-03-31 11:38:17 Mar 31 11:38:17 node206 kernel: [<ffffffffa087f251>] ldiskfs_mark_dquot_dirty+0x31/0x60 [ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff811af887>] __dquot_free_space+0x197/0x2f0
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff811afa10>] dquot_free_space+0x10/0x20
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa084b3a3>] ldiskfs_free_blocks+0xf3/0x110 [ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa085033e>] ldiskfs_ext_truncate+0x82e/0x9c0 [ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff811106e2>] ? pagevec_lookup+0x22/0x30
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa085cfc8>] ldiskfs_truncate+0x4c8/0x660 [ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa084d43b>] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8112303b>] ? unmap_mapping_range+0x6b/0x140
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff81111ebe>] vmtruncate+0x5e/0x70
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff811729c5>] inode_setattr+0x35/0x170
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa085dba6>] ldiskfs_setattr+0x186/0x390 [ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08c500e>] fsfilt_ldiskfs_setattr+0x17e/0x200 [fsfilt_ldiskfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff810fee3f>] ? find_or_create_page+0x3f/0xb0
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08f0fd4>] filter_setattr_internal+0xcc4/0x22c0 [obdfilter]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08de14f>] ? filter_fmd_find_nolock+0x24f/0x2f0 [obdfilter]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08d6633>] ? filter_fmd_put+0x33/0x190 [obdfilter]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa00f4dc1>] ? push_ctxt+0x281/0x3e0 [lvfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08f272d>] filter_setattr+0x15d/0x610 [obdfilter]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa0600e0b>] ? lustre_pack_reply_v2+0x23b/0x310 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa05ffc65>] ? lustre_msg_buf+0x85/0x90 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa062ba7b>] ? __req_capsule_get+0x14b/0x6b0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa0600fb1>] ? lustre_pack_reply_flags+0xd1/0x1f0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08f2cb9>] filter_truncate+0xd9/0x290 [obdfilter]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa00f973c>] ? lprocfs_counter_add+0x12c/0x170 [lvfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08b0741>] ost_punch+0x2a1/0x8c0 [ost]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa06019dc>] ? lustre_msg_get_version+0x7c/0xe0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa05ff884>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa0601b9c>] ? lustre_msg_get_conn_cnt+0x7c/0xe0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa08b86b0>] ost_handle+0x31d0/0x4f40 [ost]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8124a390>] ? __bitmap_weight+0x50/0xb0
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa05ff884>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa060eb11>] ptlrpc_server_handle_request+0x421/0xef0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8104079e>] ? activate_task+0x2e/0x40
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8104e0b6>] ? try_to_wake_up+0x286/0x380
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8104e1c2>] ? default_wake_function+0x12/0x20
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff81041059>] ? __wake_up_common+0x59/0x90
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa012c5ae>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa060ff0a>] ptlrpc_main+0x92a/0x15b0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8104e1b0>] ? default_wake_function+0x0/0x20
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8100d1aa>] child_rip+0xa/0x20
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffffa060f5e0>] ? ptlrpc_main+0x0/0x15b0 [ptlrpc]
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: [<ffffffff8100d1a0>] ? child_rip+0x0/0x20
      2011-03-31 11:38:18 Mar 31 11:38:17 node206 kernel: --[ end trace bb3c2f07eefda023 ]--

      ..........

      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: -----------[ cut here ]-----------
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Tainted: GF W )
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: Hardware name: bullx super-node
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: list_add corruption. next->prev should be prev (ffff88087da265c0), but was ffff88087c9bb2b0. (next=ffff88087c9bb2b0).
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: Modules linked in: iptable_filter(U) ip_tables(U) x_tables(U) obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(F)(U) lpfc(U) scsi_transport_fc(U) scsi_tgt(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U) usbhid(U) hid(U) mlx4_core(U) igb(U) ioatdma(U) i2c_i801(U) sg(U) i2c_core(U) uhci_hcd(U) dca(U) ehci_hcd(U) iTCO_wdt(U) iTCO_vendor_support(U) ext3(U) jbd(U) mbcache(U) sd_mod(U) crc_t10dif(U) ahci(U) dm_mod(U) [last unloaded: scsi_tgt]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: Pid: 20096, comm: ll_ost_io_45 Tainted: GF W 2.6.32-30.el6.Bull.16.x86_64 #1
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: Call Trace:
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8105caa3>] warn_slowpath_common+0x83/0xc0
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8105cb41>] warn_slowpath_fmt+0x41/0x50
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8124ca5d>] __list_add+0x6d/0xa0
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff811aef9d>] dquot_mark_dquot_dirty+0x5d/0x70
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa087f251>] ldiskfs_mark_dquot_dirty+0x31/0x60 [ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff811afb83>] __dquot_alloc_space+0x133/0x220
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff81250078>] ? __percpu_counter_add+0x68/0x90
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff811afc9e>] dquot_alloc_space+0xe/0x10
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa0865e96>] ldiskfs_mb_new_blocks+0xf6/0x660 [ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff81453c1e>] ? mutex_lock+0x1e/0x50
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff811b0a7b>] ? dqget+0x1cb/0x380
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08c6d1b>] ldiskfs_ext_new_extent_cb+0x59b/0x6f0 [fsfilt_ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff811851cc>] ? __getblk+0x2c/0x2e0
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa084f7e9>] ldiskfs_ext_walk_space+0x109/0x2c0 [ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08c6780>] ? ldiskfs_ext_new_extent_cb+0x0/0x6f0 [fsfilt_ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08c644d>] fsfilt_map_nblocks+0xed/0x120 [fsfilt_ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08c659b>] fsfilt_ldiskfs_map_ext_inode_pages+0x11b/0x260 [fsfilt_ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff810cea15>] ? call_rcu_sched+0x15/0x20
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff81088cad>] ? commit_creds+0x11d/0x1e0
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08c6775>] fsfilt_ldiskfs_map_inode_pages+0x95/0xa0 [fsfilt_ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa087efb8>] ? ldiskfs_journal_start_sb+0x58/0x90 [ldiskfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08ff4c5>] filter_do_bio+0xd75/0x1860 [obdfilter]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa0901bd8>] filter_commitrw_write+0x13d8/0x284c [obdfilter]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08f4535>] filter_commitrw+0x2c5/0x2f0 [obdfilter]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa05ffc65>] ? lustre_msg_buf+0x85/0x90 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa062ba7b>] ? __req_capsule_get+0x14b/0x6b0 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa00f973c>] ? lprocfs_counter_add+0x12c/0x170 [lvfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08abffa>] obd_commitrw+0x11a/0x410 [ost]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08b4644>] ost_brw_write+0xff4/0x1e90 [ost]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa05f9e44>] ? ptlrpc_send_reply+0x284/0x6f0 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08f2cb9>] ? filter_truncate+0xd9/0x290 [obdfilter]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa00f973c>] ? lprocfs_counter_add+0x12c/0x170 [lvfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8104e1b0>] ? default_wake_function+0x0/0x20
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa05ff884>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa08b937a>] ost_handle+0x3e9a/0x4f40 [ost]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8124a390>] ? __bitmap_weight+0x50/0xb0
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa05ff884>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa060eb11>] ptlrpc_server_handle_request+0x421/0xef0 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8104079e>] ? activate_task+0x2e/0x40
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8104e0b6>] ? try_to_wake_up+0x286/0x380
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8104e1c2>] ? default_wake_function+0x12/0x20
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff81041059>] ? __wake_up_common+0x59/0x90
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa012c5ae>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa060ff0a>] ptlrpc_main+0x92a/0x15b0 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8104e1b0>] ? default_wake_function+0x0/0x20
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8100d1aa>] child_rip+0xa/0x20
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffffa060f5e0>] ? ptlrpc_main+0x0/0x15b0 [ptlrpc]
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: [<ffffffff8100d1a0>] ? child_rip+0x0/0x20
      2011-04-02 10:59:08 Apr 2 10:59:08 node206 kernel: --[ end trace bb3c2f07eefda100 ]--

      .........

      =============================================================

      To me, this problem looks very similar to bugzilla 22363. But it is strange that the fix for that bug only landed on the 1.8 branch. In comment 28, Andrew says that master does not need it because it is a SLES11-only fix, but now that we support RHEL6 in master, is that still true?
      I also noticed that the quota-support-64-bit-quota-format.patch patch is not applied in the 2.6-rhel6.series file.

      What do you think?

      TIA,
      Sebastien.

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: sebastien.buisson Sebastien Buisson (Inactive)