lustre_quota.h:326:lqs_putref() LBUG

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 1.8.9
    • Lustre 1.8.7
    • None
    • lustre-1.8.7-wc1, RHEL5(Server), RHEL6(Client)
    • 3
    • 6372

    Description

      Our customer hit the following LBUG, probably with quota enabled.
      This looks very similar to LU-1224, but it is not fixed yet.

      Jun 13 17:03:57 nos021i kernel: LustreError: 8625:0:(lustre_quota.h:326:lqs_putref()) ASSERTION(atomic_read(&lqs->lqs_refcount) > 0) failed
      Jun 13 17:03:57 nos021i kernel: LustreError: 8625:0:(lustre_quota.h:326:lqs_putref()) LBUG
      Jun 13 17:03:57 nos021i kernel: Pid: 8625, comm: ll_ost_io_21
      Jun 13 17:03:57 nos021i kernel: 
      Jun 13 17:03:57 nos021i kernel: Call Trace:
      Jun 13 17:03:57 nos021i kernel:  [<ffffffff886fe6a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
      Jun 13 17:03:57 nos021i kernel:  [<ffffffff886febda>] lbug_with_loc+0x7a/0xd0 [libcfs]
      Jun 13 17:03:57 nos021i kernel:  [<ffffffff88706fc0>] tracefile_init+0x0/0x110 [libcfs]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff889572ef>] quota_pending_commit+0x41f/0x5b0 [lquota]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff800a2dff>] autoremove_wake_function+0x0/0x2e
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88b4690b>] fsfilt_ldiskfs_commit_wait+0xab/0xd0 [fsfilt_ldiskfs]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88b8786f>] filter_commitrw_write+0x253f/0x2dd0 [obdfilter]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88b21e1a>] ost_checksum_bulk+0x2aa/0x5a0 [ost]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88b28d09>] ost_brw_write+0x1c99/0x2480 [ost]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88862ac8>] ptlrpc_send_reply+0x5e8/0x600 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff8882d8b0>] target_committed_to_req+0x40/0x120 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff8008e7f9>] default_wake_function+0x0/0xe
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff888670a8>] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88b2c09e>] ost_handle+0x2bae/0x55b0 [ost]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff887b8d00>] class_handle2object+0xe0/0x170 [obdclass]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff8882119a>] lock_res_and_lock+0xba/0xd0 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88826168>] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff888766d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88876e35>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff8008cc1e>] __wake_up_common+0x3e/0x68
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88877dc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff88876e60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
      Jun 13 17:03:58 nos021i kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
      

          Activity

            [LU-1563] lustre_quota.h:326:lqs_putref() LBUG
            pjones Peter Jones added a comment -

            OK. Then I am marking this as resolved, as the fix has landed on b1_8. We can reopen it if the fix turns out not to resolve the reported problem.

            niu Niu Yawei (Inactive) added a comment -

            Hi, Peter

            No. The code in master differs from b1_8: quota_pending_commit() there already has a check similar to the one in this patch. Thanks.

            pjones Peter Jones added a comment -

            Niu

            Will this same change be needed for master?

            Peter

            niu Niu Yawei (Inactive) added a comment -

            This part of the code is racy: quota_pending_commit() can find an lqs that was not held in quota_check_common() and then put it twice.
            I posted a patch for b1_8: http://review.whamcloud.com/#change,3187

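            As a rough illustration of the race described above (a self-contained sketch, not the patch in change 3187 and not Lustre code): the type lqs_sketch and the functions lqs_get(), lqs_put(), check_phase(), commit_racy() and commit_guarded() are hypothetical names. lqs_put() returns false where the real lqs_putref() would hit the assertion and LBUG. The guard shown here, remembering whether the check phase took a reference, is only one possible way to avoid the second put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a quota state entry (not the real Lustre struct). */
struct lqs_sketch {
    atomic_int refcount; /* number of references currently held */
};

/* Take one reference. */
static void lqs_get(struct lqs_sketch *lqs)
{
    atomic_fetch_add(&lqs->refcount, 1);
}

/*
 * Drop one reference. Returns false, without changing the count, when the
 * count is already <= 0. That is the condition the ASSERTION in the log
 * above checks for; the sketch reports it instead of crashing.
 */
static bool lqs_put(struct lqs_sketch *lqs)
{
    int old = atomic_load(&lqs->refcount);

    do {
        if (old <= 0)
            return false;
    } while (!atomic_compare_exchange_weak(&lqs->refcount, &old, old - 1));
    return true;
}

/*
 * Check phase: takes a reference only when quota is enforced for this
 * request (an assumption made for the sketch). Returns whether it did.
 */
static bool check_phase(struct lqs_sketch *lqs, bool quota_enforced)
{
    if (!quota_enforced)
        return false;
    lqs_get(lqs);
    return true;
}

/*
 * Racy commit: looks the entry up (one get), then puts twice, assuming the
 * check phase always left a reference behind.
 */
static bool commit_racy(struct lqs_sketch *lqs)
{
    bool ok;

    lqs_get(lqs);            /* reference from the lookup */
    ok = lqs_put(lqs);       /* drop the lookup reference */
    ok = lqs_put(lqs) && ok; /* drop the "check" reference, held or not */
    return ok;
}

/* Guarded commit: drops the check-phase reference only if it was taken. */
static bool commit_guarded(struct lqs_sketch *lqs, bool held_by_check)
{
    bool ok = true;

    lqs_get(lqs);            /* reference from the lookup */
    if (held_by_check)
        ok = lqs_put(lqs);   /* drop the check-phase reference */
    ok = lqs_put(lqs) && ok; /* drop the lookup reference */
    return ok;
}
```

            With an entry whose count starts at zero and a check phase that took no reference, commit_racy() reports the underflow that lqs_putref() asserts on, while commit_guarded() leaves the count balanced.
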
            pjones Peter Jones added a comment -

            Niu

            Could you please look into this one?

            Thanks

            Peter

            liang Liang Zhen (Inactive) added a comment -

            FYI, I suspect this is the same as BZ24188. However, the patch on BZ24188 is for 2.x, which has a totally new implementation of cfs_hash, so we can't backport it directly.

            ihara Shuichi Ihara (Inactive) added a comment -

            The OSS log files are uploaded to /uploads/LU-1563. We hit the LBUG on these three OSSs.

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: ihara Shuichi Ihara (Inactive)