  LU-2279

replay-single test 88 LustreError: 10305:0:(osp_precreate.c:694:osp_precreate_reserve()) LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • 3
    • 5462

    Description

      Running replay-single, I hit this LBUG:

      [50576.766529] Lustre: DEBUG MARKER: == replay-single test 88: MDS should not assign same objid to different files == 01:10:48 (1352095848)
      [50576.964168] Turning device loop1 (0x700001) read-only
      [50576.985608] Lustre: DEBUG MARKER: ost1 REPLAY BARRIER on lustre-OST0000
      [50576.989899] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-OST0000
      [50577.290548] Turning device loop0 (0x700000) read-only
      [50577.316807] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      [50577.323211] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
      [50577.541027] Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=8104, next=8104, reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-19
      [50577.541472] LustreError: 10305:0:(osp_precreate.c:694:osp_precreate_reserve()) LBUG
      [50577.541985] Pid: 10305, comm: mdt00_000
      [50577.542331] 
      [50577.542332] Call Trace:
      [50577.542866]  [<ffffffffa0455915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [50577.543256]  [<ffffffffa0455f27>] lbug_with_loc+0x47/0xb0 [libcfs]
      [50577.543618]  [<ffffffffa0e4473c>] osp_precreate_reserve+0x50c/0x600 [osp]
      [50577.544000]  [<ffffffffa0e3d7a0>] ? osp_object_alloc+0x120/0x290 [osp]
      [50577.544368]  [<ffffffffa0e43533>] osp_declare_object_create+0x163/0x540 [osp]
      [50577.544785]  [<ffffffffa07384bd>] lod_qos_declare_object_on+0xed/0x4c0 [lod]
      [50577.545173]  [<ffffffffa073b65b>] lod_qos_prep_create+0x92b/0x1848 [lod]
      [50577.545549]  [<ffffffffa073686b>] lod_declare_striped_object+0x14b/0x920 [lod]
      [50577.546558]  [<ffffffffa0737348>] lod_declare_object_create+0x308/0x4f0 [lod]
      [50577.546905]  [<ffffffffa0cf1c4f>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
      [50577.547452]  [<ffffffffa0d02aca>] mdd_create+0x39a/0x1550 [mdd]
      [50577.547770]  [<ffffffffa0de9d28>] mdt_reint_open+0x1078/0x1870 [mdt]
      [50577.548094]  [<ffffffffa0d08b0e>] ? md_ucred+0x1e/0x60 [mdd]
      [50577.548403]  [<ffffffffa0db5655>] ? mdt_ucred+0x15/0x20 [mdt]
      [50577.548718]  [<ffffffffa0dd4c81>] mdt_reint_rec+0x41/0xe0 [mdt]
      [50577.549059]  [<ffffffffa0dce413>] mdt_reint_internal+0x4e3/0x7e0 [mdt]
      [50577.549395]  [<ffffffffa0dce9dd>] mdt_intent_reint+0x1ed/0x500 [mdt]
      [50577.549715]  [<ffffffffa0dca3ae>] mdt_intent_policy+0x38e/0x770 [mdt]
      [50577.550077]  [<ffffffffa0783dda>] ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
      [50577.550424]  [<ffffffffa07ab187>] ldlm_handle_enqueue0+0x4e7/0x1010 [ptlrpc]
      [50577.550761]  [<ffffffffa0dca8e6>] mdt_enqueue+0x46/0x130 [mdt]
      [50577.551076]  [<ffffffffa0dbe1f2>] mdt_handle_common+0x932/0x1740 [mdt]
      [50577.551425]  [<ffffffffa0dbf0d5>] mdt_regular_handle+0x15/0x20 [mdt]
      [50577.551797]  [<ffffffffa07d9573>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
      [50577.552413]  [<ffffffffa045666e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [50577.552697]  [<ffffffffa07d2241>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [50577.552950]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [50577.553303]  [<ffffffffa07dc10a>] ptlrpc_main+0xb9a/0x1960 [ptlrpc]
      [50577.553709]  [<ffffffffa07db570>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
      [50577.553927]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [50577.554222]  [<ffffffffa07db570>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
      [50577.554605]  [<ffffffffa07db570>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
      [50577.554960]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [50577.555289] 
      [50577.560016] Kernel panic - not syncing: LBUG
      

      crashdump and modules are at /exports/crashdumps/192.168.10.210-2012-11-05-01\:10\:50

      Attachments

        Activity

          jlevi Jodi Levi (Inactive) made changes -
          Fix Version/s New: Lustre 2.4.0 [ 10154 ]
          bzzz Alex Zhuravlev made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Resolved [ 5 ]

          bzzz Alex Zhuravlev added a comment -

          the patch is landed

          bzzz Alex Zhuravlev added a comment -

          http://review.whamcloud.com/4847
          morrone Christopher Morrone (Inactive) made changes -
          Labels Original: MB New: MB sequoia
          jlevi Jodi Levi (Inactive) made changes -
          Labels Original: sequoia New: MB
          pjones Peter Jones made changes -
          Labels Original: topsequoia New: sequoia

          bzzz Alex Zhuravlev added a comment -

          yes, sorry for the confusion.. I totally forgot my own code. there was a problem long ago when OSP was reconnecting too frequently, leading to a storm of wakeups.

          as for the OST being selected - it's valid, as other OSTs might be in the same state (or just empty, for example).
          there is also a race between the disconnect and the selection procedure.

          to understand whether this was a bad choice or not, it's important to know the other OSTs' states.
          di.wang Di Wang added a comment -

          I am confused here, or perhaps I am missing something. According to the log we see here:

          [50577.541027] Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=8104, next=8104, reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-19
          

          syn_changes and syn_rpc_in_progress are zero as expected, and status is -19 because the OST is not set up in time, so the loop in osp_precreate_reserve will
          not sleep at all (why do you say it was woken up so often?), i.e. it will keep running until the count reaches 1000 and hits the LBUG, unless it reaches the expiry time first, which seems unlikely given that the expiry time is normally 100 seconds.

          Another thing I do not understand is why this OST is still being chosen for precreate if it is not ready yet. Probably another bug?

          Please correct me if I misunderstand something.

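          For illustration only, the spin Di Wang describes above can be sketched as a small stand-alone C program. The struct and field names below (osp_state, status, syn_changes, syn_rpc_in_progress) are simplifications based on the console message, and the 1000-iteration limit is the one quoted in the comment; this is not the actual osp_precreate.c code:

          #include <stdio.h>
          #include <stdlib.h>
          #include <time.h>

          /* Simplified stand-in for the OSP precreate state; field names
           * approximate the console message above, not the real structures. */
          struct osp_state {
              int status;              /* e.g. -19 (-ENODEV) while the OST is not ready */
              int syn_changes;         /* pending sync changes */
              int syn_rpc_in_progress; /* sync RPCs in flight */
              int timeout;             /* how long to keep trying, in seconds */
          };

          /* Sketch of the spin: with nothing in flight to wait for and a
           * non-zero status, the loop never sleeps, so the retry counter
           * reaches its limit long before the ~100 second expiry. */
          static int precreate_reserve(struct osp_state *d)
          {
              time_t expire = time(NULL) + d->timeout;
              int count = 0;

              while (time(NULL) < expire) {
                  if (d->status == 0)
                      return 0;        /* precreated objects available: reserve one */

                  if (d->syn_changes == 0 && d->syn_rpc_in_progress == 0) {
                      if (++count > 1000) {
                          fprintf(stderr, "LBUG: looped %d times, status=%d\n",
                                  count, d->status);
                          abort();     /* stands in for LBUG() */
                      }
                      continue;        /* nothing to wait for: spin again */
                  }

                  /* otherwise the real code would sleep on a waitqueue here */
              }

              return d->status;        /* expired without the OST becoming ready */
          }

          int main(void)
          {
              struct osp_state d = { .status = -19, .timeout = 100 };

              return precreate_reserve(&d);
          }

          Each pass through the spin costs only microseconds, so under this reading the counter reaches its limit almost immediately, consistent with the LBUG firing well inside the 100-second window.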

          bzzz Alex Zhuravlev added a comment -

          but we do not want to be blocked forever. this is why osp_precreate_reserve() exits when the specified period expires. again, the question is why it is woken up so often.
          given there is no connection, why has the status changed that many times?
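
          As a rough user-space analogue of the wait-with-timeout pattern referred to above, the sketch below uses a pthread condition variable in place of the kernel waitqueue; none of the names here (reserve_with_timeout, waitq, the 1000-wakeup limit) come from the real Lustre code. In this model the counter only advances when the thread is actually woken, so exhausting it would indeed take a storm of wakeups rather than a busy spin:

          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <time.h>

          /* Hypothetical shared state; these are not the real osp_precreate fields. */
          static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
          static pthread_cond_t  waitq = PTHREAD_COND_INITIALIZER;
          static int status = -19;     /* -ENODEV: the OST import is not ready */

          /* Wait for the status to clear, but never block forever: give up
           * when the deadline passes, and assert if woken too many times. */
          static int reserve_with_timeout(int timeout_sec)
          {
              struct timespec deadline;
              int count = 0;
              int rc;

              clock_gettime(CLOCK_REALTIME, &deadline);
              deadline.tv_sec += timeout_sec;

              pthread_mutex_lock(&lock);
              while (status != 0) {
                  if (++count > 1000) {
                      fprintf(stderr, "assertion: %d wakeups, status=%d\n",
                              count, status);
                      abort();         /* analogue of LBUG() */
                  }
                  /* sleep until something signals waitq or the deadline passes */
                  rc = pthread_cond_timedwait(&waitq, &lock, &deadline);
                  if (rc != 0)         /* ETIMEDOUT: stop waiting */
                      break;
              }
              rc = status;
              pthread_mutex_unlock(&lock);
              return rc;
          }

          int main(void)
          {
              /* nothing ever signals waitq, so this times out and returns -19 */
              printf("rc = %d\n", reserve_with_timeout(2));
              return 0;
          }

          Build with -pthread; since nothing signals the waitq, the call simply waits out the deadline and returns the stale status, i.e. the "exit when the specified period expires" behaviour rather than the LBUG.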

          People

            Assignee: bzzz Alex Zhuravlev
            Reporter: green Oleg Drokin
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated:
              Resolved: