Details

    • Technical task
    • Resolution: Unresolved
    • Minor

    Description

      Maloo testing reports the following panic:

      LustreError: 7:0:(wbc.c:866:wbc_inode_flush_lockdrop()) ASSERTION( !wbc_inode_has_protected(wbci) ) failed: WBC flags: 85 inode 00000000982468e8
      [ 1026.402024] LustreError: 7:0:(wbc.c:866:wbc_inode_flush_lockdrop()) LBUG
      [ 1026.404520] Pid: 7, comm: kworker/u4:0 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Apr 8 19:01:30 UTC 2021
      [ 1026.407533] Call Trace TBD:
      [ 1026.408826] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
      [ 1026.410524] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
      [ 1026.412388] [<0>] wbc_inode_flush_lockdrop+0xed/0xf0 [lustre]
      [ 1026.414304] [<0>] wbc_write_inode+0x184/0x1a0 [lustre]
      [ 1026.416076] [<0>] __writeback_single_inode+0x2da/0x370
      [ 1026.417803] [<0>] writeback_sb_inodes+0x1e7/0x440
      [ 1026.419369] [<0>] __writeback_inodes_wb+0x5f/0xc0
      [ 1026.420905] [<0>] wb_writeback+0x25b/0x2f0
      [ 1026.422171] [<0>] wb_workfn+0x192/0x4a0
      [ 1026.423350] [<0>] process_one_work+0x1a7/0x360
      [ 1026.424651] [<0>] worker_thread+0x30/0x390
      [ 1026.425884] [<0>] kthread+0x112/0x130
      [ 1026.427060] [<0>] ret_from_fork+0x35/0x40
      [ 1026.428258] Kernel panic - not syncing: LBUG
      

      The panic is triggered by a write followed by a sync call.
      Analysis shows that the sync call tries to write back the dirty inodes in WB_SYNC_NONE mode:

      writeback_sb_inodes()
          if ((inode->i_state & I_SYNC) && wbc.sync_mode != WB_SYNC_ALL) {
              /*
               * If this inode is locked for writeback and we are not
               * doing writeback-for-data-integrity, move it to
               * b_more_io so that writeback can proceed with the
               * other inodes on s_io.
               *
               * We'll have another go at writing back this inode
               * when we completed a full scan of b_io.
               */
              spin_unlock(&inode->i_lock);
              requeue_io(inode, wb);
              trace_writeback_sb_inodes_requeue(inode);
              continue;
          }
      

      While the kernel is flushing the parent inode P (which therefore has the I_SYNC flag set), writeback in WB_SYNC_NONE mode (in the sync call context) skips this inode and moves it to b_more_io so that writeback can proceed with the other inodes on s_io.

      However, the child inodes under a directory cannot be flushed until the parent inode has been flushed to the server. Because P is skipped, writeback moves on to its children while P is still unflushed, and this triggers the panic.
      Thus the WB_SYNC_NONE sync mode needs to be handled carefully.
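
      The ordering problem described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the Lustre fix: every toy_* name is hypothetical, under_writeback stands in for I_SYNC, and TOY_DEFER_FOR_PARENT is one possible way to avoid flushing a child under an unflushed parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical, not kernel or Lustre code. */
enum toy_sync_mode { TOY_WB_SYNC_NONE, TOY_WB_SYNC_ALL };

struct toy_inode {
	struct toy_inode *parent;  /* NULL for the top of the cached subtree */
	bool under_writeback;      /* stands in for I_SYNC */
	bool flushed;              /* true once the server knows this inode */
};

enum toy_action { TOY_REQUEUE, TOY_DEFER_FOR_PARENT, TOY_FLUSH };

/* Decide what a writeback pass should do with one inode. */
static enum toy_action toy_writeback_decide(const struct toy_inode *inode,
					    enum toy_sync_mode mode)
{
	/* Same condition as the writeback_sb_inodes() excerpt: skip a busy
	 * inode unless this is data-integrity writeback. */
	if (inode->under_writeback && mode != TOY_WB_SYNC_ALL)
		return TOY_REQUEUE;

	/* Assumed guard: do not flush a child before its parent. */
	if (inode->parent != NULL && !inode->parent->flushed)
		return TOY_DEFER_FOR_PARENT;

	return TOY_FLUSH;
}

/* Flush one inode, checking the toy invariant that the parent went first. */
static void toy_flush(struct toy_inode *inode)
{
	assert(inode->parent == NULL || inode->parent->flushed);
	inode->flushed = true;
}
```

      In this model, a parent with under_writeback set is requeued under TOY_WB_SYNC_NONE, and its child is deferred rather than flushed. Without the second check the child would reach toy_flush() with an unflushed parent and trip the assert, which is the same kind of ordering violation as the one reported here.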

          People

            qian_wc Qian Yingjin