Lustre / LU-4336

Client LBUG ASSERTION( id == qid[type]

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.1

    • Environment:
      client: SLES 11 SP2, Lustre 2.4.1
      server: CentOS 6.4, Lustre 2.4.0
    • Severity: 3
    • Rank (Obsolete): 11861

    Description

      We have a crash dump available for upload.

      [692937.813459] LustreError: 82893:0:(osc_quota.c:62:osc_quota_chkdq()) ASSERTION( id == qid[type] ) failed: The ids don't match 1515870810 != 11629
      [692937.826479] LustreError: 82893:0:(osc_quota.c:62:osc_quota_chkdq()) LBUG
      [692937.833294] Pid: 82893, comm: wrf.exe
      [692937.837070]
      [692937.837071] Call Trace:
      [692937.841208] [<ffffffff810048c5>] dump_trace+0x75/0x310
      [692937.846529] [<ffffffffa04c082a>] libcfs_debug_dumpstack+0x4a/0x70 [libcfs]
      [692937.853581] [<ffffffffa04c0d5e>] lbug_with_loc+0x3e/0xb0 [libcfs]
      [692937.859853] [<ffffffffa090247a>] osc_quota_chkdq+0x20a/0x210 [osc]
      [692937.866207] [<ffffffffa0915ec8>] osc_queue_async_io+0xb58/0x1c90 [osc]
      [692937.872906] [<ffffffffa08fa6b4>] osc_page_cache_add+0xc4/0x200 [osc]
      [692937.879447] [<ffffffffa062a466>] cl_page_cache_add+0x86/0x230 [obdclass]
      [692937.886341] [<ffffffffa0992532>] lov_page_cache_add+0x1a2/0x1f0 [lov]
      [692937.892968] [<ffffffffa062a466>] cl_page_cache_add+0x86/0x230 [obdclass]
      [692937.899869] [<ffffffffa0a71649>] vvp_io_commit_write+0x429/0x640 [lustre]
      [692937.906843] [<ffffffffa0639d8c>] cl_io_commit_write+0x9c/0x1d0 [obdclass]
      [692937.913826] [<ffffffffa0a45f04>] ll_commit_write+0x104/0x1f0 [lustre]
      [692937.920440] [<ffffffffa0a5efca>] ll_write_end+0x2a/0x60 [lustre]
      [692937.926622] [<ffffffff810ef312>] generic_perform_write+0x122/0x1c0
      [692937.932960] [<ffffffff810ef411>] generic_file_buffered_write+0x61/0xa0
      [692937.939647] [<ffffffff810f231a>] __generic_file_aio_write+0x28a/0x480
      [692937.946245] [<ffffffff810f2568>] generic_file_aio_write+0x58/0xc0
      [692937.952509] [<ffffffffa0a74101>] vvp_io_write_start+0xc1/0x2e0 [lustre]
      [692937.959312] [<ffffffffa06368d9>] cl_io_start+0x69/0x140 [obdclass]
      [692937.965686] [<ffffffffa063ac93>] cl_io_loop+0xa3/0x190 [obdclass]
      [692937.971968] [<ffffffffa0a19d51>] ll_file_io_generic+0x461/0x600 [lustre]
      [692937.978842] [<ffffffffa0a1a126>] ll_file_aio_write+0x236/0x290 [lustre]
      [692937.985624] [<ffffffffa0a1b333>] ll_file_write+0x203/0x290 [lustre]
      [692937.992052] [<ffffffff81150e9e>] vfs_write+0xce/0x140
      [692937.997266] [<ffffffff81151013>] sys_write+0x53/0xa0
      [692938.002393] [<ffffffff8145ab12>] system_call_fastpath+0x16/0x1b
      [692938.008486] [<00002aaaab6946f0>] 0x2aaaab6946f0
      [692938.013179]
      [692938.019447] Kernel panic - not syncing: LBUG
      [692938.023804] Pid: 82893, comm: wrf.exe Tainted: G C NX 3.0.74-0.6.6.2.20130516-nasa #1
      [692938.032398] Call Trace:

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.5.1 and 2.6. Will track landing on older branches separately

          niu Niu Yawei (Inactive) added a comment -

          b2_1: http://review.whamcloud.com/8596
          b2_4: http://review.whamcloud.com/8595

          niu Niu Yawei (Inactive) added a comment -

          "Could this be related to LU-4249? They are both on the same filesystem."

          I don't think so. This is a client bug, whereas LU-4249 is a server-side problem. Unlike the oqi hash on the client, the server-side lqe hash keeps a reference count for each lqe, so in theory it has no such race.
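
The reference-counting idea mentioned in the comment above can be sketched in plain C. This is only a minimal illustration under stated assumptions: a one-slot table guarded by a pthread mutex stands in for a hash, and `struct entry`, `table_insert`, `entry_get`, `entry_put`, and `table_remove` are hypothetical names, not Lustre's lqe or oqi code. It shows why a reader that holds a reference cannot see its entry freed underneath it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cached entry; not a real Lustre structure. */
struct entry {
    uint32_t id;
    int refcount;                 /* protected by table_lock */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *table_slot;  /* one-slot "hash table" for brevity */

/* Insert a new entry; the table itself holds one reference. */
static struct entry *table_insert(uint32_t id)
{
    struct entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return NULL;
    e->id = id;
    e->refcount = 1;
    pthread_mutex_lock(&table_lock);
    if (table_slot != NULL) {     /* slot already taken: give up */
        pthread_mutex_unlock(&table_lock);
        free(e);
        return NULL;
    }
    table_slot = e;
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* Look up an entry and take a reference while still holding the lock. */
static struct entry *entry_get(uint32_t id)
{
    struct entry *e;

    pthread_mutex_lock(&table_lock);
    e = table_slot;
    if (e != NULL && e->id == id)
        e->refcount++;
    else
        e = NULL;
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int entry_put(struct entry *e)
{
    int last;

    pthread_mutex_lock(&table_lock);
    last = (--e->refcount == 0);
    pthread_mutex_unlock(&table_lock);
    if (last)
        free(e);
    return last;
}

/* Unlink the entry and drop the table's reference. Returns 1 if freed. */
static int table_remove(uint32_t id)
{
    struct entry *e;

    pthread_mutex_lock(&table_lock);
    e = table_slot;
    if (e != NULL && e->id == id)
        table_slot = NULL;
    else
        e = NULL;
    pthread_mutex_unlock(&table_lock);
    return e != NULL ? entry_put(e) : 0;
}
```

In this sketch a caller pairs every successful `entry_get()` with an `entry_put()`. A concurrent `table_remove()` only unlinks the entry, and the memory is released when the last holder drops its reference.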

          mhanafi Mahmoud Hanafi added a comment -

          Could this be related to LU-4249? They are both on the same filesystem.

          niu Niu Yawei (Inactive) added a comment -

          http://review.whamcloud.com/8460

          niu Niu Yawei (Inactive) added a comment -

          1515870810 is 0x5a5a5a5a, which means the oqi in the hash had already been freed at that moment. There appears to be an obvious race between osc_quota_setdq() and osc_quota_chkdq(). I'll cook a patch to fix it soon.
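
The arithmetic behind the comment above can be checked with a short C sketch. It assumes, as the comment's reasoning implies, that freed memory is overwritten with the byte 0x5a. The names `struct quota_entry`, `poison_freed`, and `looks_poisoned` are hypothetical and are not taken from the Lustre source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed poison byte: the comment reads 0x5a5a5a5a as "already freed". */
#define FREED_POISON_BYTE 0x5a

/* Hypothetical stand-in for a cached quota entry; not the real oqi layout. */
struct quota_entry {
    uint32_t id;
};

/* Overwrite an entry the way a poisoning free routine would. */
static void poison_freed(struct quota_entry *e)
{
    memset(e, FREED_POISON_BYTE, sizeof(*e));
}

/* Return 1 if all four bytes of the id equal the poison byte. */
static int looks_poisoned(uint32_t id)
{
    return id == 0x5a5a5a5au;
}
```

Because all four bytes are identical, the result does not depend on byte order: 0x5a is 90, and 90 * 0x01010101 = 90 * 16843009 = 1515870810, the value reported in the assertion message, while the expected id 11629 does not match the pattern.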

          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: mhanafi Mahmoud Hanafi