LU-5700: async IO LBUG obj->cob_transient_pages == 0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version: Lustre 2.7.0
    • Fix Version: Lustre 2.7.0
    • 3
    • 15957

    Description

      LustreError: 58185:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
      LustreError: 58184:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
      LustreError: 58184:0:(rw26.c:474:ll_direct_IO_26()) LBUG
      Pid: 58184, comm: sio

      Call Trace:
      [<ffffffffa057d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa057de97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0c03044>] ll_direct_IO_26+0xb64/0x1140 [lustre]
      LustreError: 58186:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
      LustreError: 58186:0:(rw26.c:474:ll_direct_IO_26()) LBUG
      Pid: 58186, comm: sio

      Call Trace:
      [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
      [<ffffffff8112158b>] generic_file_aio_read+0x6bb/0x700
      [<ffffffff811b4b5f>] ? list_move+0x1f/0x30
      [<ffffffff811b51ff>] ? __mark_inode_dirty+0x13f/0x160
      [<ffffffff811a59cd>] ? touch_atime+0x14d/0x1a0
      [<ffffffffa0c173be>] vvp_io_read_start+0x22e/0x450 [lustre]
      [<ffffffffa057d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa057de97>] lbug_with_loc+0x47/0xb0 [libcfs]
      LustreError: 58183:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
      LustreError: 58183:0:(rw26.c:474:ll_direct_IO_26()) LBUG
      Pid: 58183, comm: sio

      Call Trace:
      [<ffffffffa0c03044>] ll_direct_IO_26+0xb64/0x1140 [lustre]
      [<ffffffffa057d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
      [<ffffffffa057de97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffff8112158b>] generic_file_aio_read+0x6bb/0x700
      [<ffffffffa06f14da>] cl_io_start+0x6a/0x140 [obdclass]
      [<ffffffff81078fa7>] ? current_fs_time+0x27/0x30
      [<ffffffffa0bce4d5>] ? ll_inode_size_unlock+0x15/0x20 [lustre]
      [<ffffffffa06f5664>] cl_io_loop+0xb4/0x1b0 [obdclass]
      [<ffffffffa0bb3f41>] ll_file_io_generic+0x481/0xaa0 [lustre]
      [<ffffffffa0c0d6e5>] ? ccc_object_size_unlock+0x35/0x40 [lustre]
      [<ffffffffa0bb4da0>] ll_file_aio_read+0x130/0x2b0 [lustre]
      [<ffffffff811a58f1>] ? touch_atime+0x71/0x1a0
      [<ffffffffa0bb5079>] ll_file_read+0x159/0x290 [lustre]
      [<ffffffffa0c173be>] vvp_io_read_start+0x22e/0x450 [lustre]
      [<ffffffffa0c03044>] ll_direct_IO_26+0xb64/0x1140 [lustre]
      [<ffffffff811894d5>] vfs_read+0xb5/0x1a0
      [<ffffffff81189802>] sys_pread64+0x82/0xa0
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

      [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
      [<ffffffff8112158b>] generic_file_aio_read+0x6bb/0x700
      [<ffffffff81078fa7>] ? current_fs_time+0x27/0x30
      [<ffffffffa0bce4d5>] ? ll_inode_size_unlock+0x15/0x20 [lustre]
      [<ffffffffa0c0d6e5>] ? ccc_object_size_unlock+0x35/0x40 [lustre]
      [<ffffffff811a58f1>] ? touch_atime+0x71/0x1a0
      [<ffffffffa0c173be>] vvp_io_read_start+0x22e/0x450 [lustre]
      [<ffffffffa06f14da>] cl_io_start+0x6a/0x140 [obdclass]
      [<ffffffffa06f5664>] cl_io_loop+0xb4/0x1b0 [obdclass]
      [<ffffffffa0bb3f41>] ll_file_io_generic+0x481/0xaa0 [lustre]
      [<ffffffffa0bb4da0>] ll_file_aio_read+0x130/0x2b0 [lustre]
      [<ffffffffa0bb5079>] ll_file_read+0x159/0x290 [lustre]
      [<ffffffff811894d5>] vfs_read+0xb5/0x1a0
      [<ffffffff81189802>] sys_pread64+0x82/0xa0
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

      [<ffffffffa06f14da>] cl_io_start+0x6a/0x140 [obdclass]
      [<ffffffffa06f5664>] cl_io_loop+0xb4/0x1b0 [obdclass]
      [<ffffffffa0bb3f41>] ll_file_io_generic+0x481/0xaa0 [lustre]
      [<ffffffffa0bb4da0>] ll_file_aio_read+0x130/0x2b0 [lustre]
      [<ffffffffa0bb5079>] ll_file_read+0x159/0x290 [lustre]
      [<ffffffff811894d5>] vfs_read+0xb5/0x1a0
      [<ffffffff81189802>] sys_pread64+0x82/0xa0
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

      Kernel panic - not syncing: LBUG
      Pid: 58184, comm: sio Not tainted 2.6.32-431.20.3.hz100.el6.x86_64 #1
      Call Trace:
      [<ffffffff81528067>] ? panic+0xa7/0x16f
      [<ffffffffa057deeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      [<ffffffffa0c03044>] ? ll_direct_IO_26+0xb64/0x1140 [lustre]
      [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
      [<ffffffff8112158b>] ? generic_file_aio_read+0x6bb/0x700
      [<ffffffff811b4b5f>] ? list_move+0x1f/0x30
      [<ffffffff811b51ff>] ? __mark_inode_dirty+0x13f/0x160
      [<ffffffff811a59cd>] ? touch_atime+0x14d/0x1a0
      [<ffffffffa0c173be>] ? vvp_io_read_start+0x22e/0x450 [lustre]
      [<ffffffffa06f14da>] ? cl_io_start+0x6a/0x140 [obdclass]
      [<ffffffffa06f5664>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
      [<ffffffffa0bb3f41>] ? ll_file_io_generic+0x481/0xaa0 [lustre]
      [<ffffffffa0bb4da0>] ? ll_file_aio_read+0x130/0x2b0 [lustre]
      [<ffffffffa0bb5079>] ? ll_file_read+0x159/0x290 [lustre]
      [<ffffffff811894d5>] ? vfs_read+0xb5/0x1a0
      [<ffffffff81189802>] ? sys_pread64+0x82/0xa0
      [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

      Activity

            [LU-5700] async IO LBUG obj->cob_transient_pages == 0
            pjones Peter Jones added a comment -

            ok thanks Steve


            Works for now.

            We're not entirely convinced that this section of code is safe for reentry, but this clearly fixed one problem, and we can open a new ticket if we identify another.

            schamp Stephen Champion added a comment -

            Stephen,
            Your patch 12179 has been merged into master (October 10).

            Do you need any more work on this ticket, or can I mark it as resolved, fixed?

            Thanks,
            ~ jfc.

            jfc John Fuchs-Chesney (Inactive) added a comment -

            Our test system had another crash with normal (not async) DIO last night. This led to

            http://review.whamcloud.com/#/c/12179/

            which lets us get on with the performance evaluation we were doing.

            The patch removes the assertions and makes the counter atomic. We are not convinced that this section is safe for multiple threads - it needs to be considered carefully. The patch should not introduce any new bugs, but may expose something by not crashing.

            schamp Stephen Champion added a comment -
            pjones Peter Jones added a comment -

            Lai

            Could you please advise on this issue?

            Thanks

            Peter


            A bit more info from the commit header: the change was LU-1669

            commit ed5ebb87bfc2b684958daac90c4369f395482a16
            Author: Prakash Surya <surya1@llnl.gov>
            Date: Wed Oct 2 17:16:51 2013 -0700

            LU-1669 vvp: Use lockless __generic_file_aio_write
            ...
            Change-Id: I0023132b5d941b3304f39f015f95106542998072
            Reviewed-on: http://review.whamcloud.com/6672

            olaf Olaf Weber (Inactive) added a comment -

            This is easy to reproduce using sio on an 8-socket UV system with 160 threads and 4 TB of memory. I have been unable to reproduce this on a smaller configuration thus far.

            root@cy024-4-sys:/mnt/cy024/schamp # ./sio -tDcw -b 2048 -A 16 -s 83865632 -g /mnt/cy024/schamp/sio.1
            Warning: filesize adjusted to 83865600 blocks
            <crash>

            3 OSS with five OSTs each, stripe count -1:

            lfs getstripe /mnt/cy024-mnt/schamp/sio.1
              /mnt/cy024-mnt/schamp/sio.1
              lmm_stripe_count: 15
              lmm_stripe_size: 1048576
              lmm_pattern: 1
              lmm_layout_gen: 0
              lmm_stripe_offset: 7
              obdidx objid objid group
              7 6276 0x1884 0
              11 6276 0x1884 0
              15 6276 0x1884 0
              2 6276 0x1884 0
              13 6276 0x1884 0
              9 6276 0x1884 0
              6 6276 0x1884 0
              10 6276 0x1884 0
              12 6276 0x1884 0
              4 6276 0x1884 0
              1 6276 0x1884 0
              0 6276 0x1884 0
              14 6276 0x1884 0
              5 6276 0x1884 0
              8 6276 0x1884 0

            We have cores if they are useful.

            Olaf Weber's notes:

            There are four threads of interest, with stack traces like this showing they're doing direct IO.

            PID: 58183 TASK: ffff893de6382080 CPU: 151 COMMAND: "sio"
            #0 [ffff893dd1fed9b0] panic at ffffffff81528001
            #1 [ffff893dd1feda30] lbug_with_loc at ffffffffa057deeb [libcfs]
            #2 [ffff893dd1feda50] ll_direct_IO_26 at ffffffffa0c03044 [lustre]
            #3 [ffff893dd1fedb80] generic_file_aio_read at ffffffff8112158b
            #4 [ffff893dd1fedc60] vvp_io_read_start at ffffffffa0c173be [lustre]
            #5 [ffff893dd1fedcd0] cl_io_start at ffffffffa06f14da [obdclass]
            #6 [ffff893dd1fedd00] cl_io_loop at ffffffffa06f5664 [obdclass]
            #7 [ffff893dd1fedd30] ll_file_io_generic at ffffffffa0bb3f41 [lustre]
            #8 [ffff893dd1fede20] ll_file_aio_read at ffffffffa0bb4da0 [lustre]
            #9 [ffff893dd1fede80] ll_file_read at ffffffffa0bb5079 [lustre]
            #10 [ffff893dd1fedef0] vfs_read at ffffffff811894d5
            #11 [ffff893dd1fedf30] sys_pread64 at ffffffff81189802
            #12 [ffff893dd1fedf80] system_call_fastpath at ffffffff8100b072
            RIP: 00007ffff7495e13 RSP: 00007ffff7ff8dd8 RFLAGS: 00010206
            RAX: 0000000000000011 RBX: ffffffff8100b072 RCX: 0000000000000000
            RDX: 0000000000800000 RSI: 00007ffff699c000 RDI: 0000000000000003
            RBP: 0000000000606240 R8: 00007ffff7ff9700 R9: 00007ffff7ff9700
            R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffff7ff9700
            R13: 0000000000000003 R14: 000000000061f3e8 R15: 0000000000000003
            ORIG_RAX: 0000000000000011 CS: 0033 SS: 002b

            Approximately they're here:

            lustre/llite/rw26.c
            373 static ssize_t ll_direct_IO_26(int rw, struct kiocb *iocb,
            374                                const struct iovec *iov, loff_t file_offset,
            375                                unsigned long nr_segs)
            376 {
            377         struct lu_env *env;
            378         struct cl_io *io;
            379         struct file *file = iocb->ki_filp;
            380         struct inode *inode = file->f_mapping->host;
            381         struct ccc_object *obj = cl_inode2ccc(inode);
            382         long count = iov_length(iov, nr_segs);
            ...
            433                 page_count = ll_get_user_pages(rw, user_addr, bytes,
            434                                                &pages, &max_pages);
            435                 if (likely(page_count > 0)) {
            436                         if (unlikely(page_count < max_pages))
            437                                 bytes = page_count << PAGE_CACHE_SHIFT;
            438                         result = ll_direct_IO_26_seg(env, io, rw, inode,
            439                                                      file->f_mapping,
            440                                                      bytes, file_offset,
            441                                                      pages, page_count);
            442                         ll_free_user_pages(pages, max_pages, rw==READ);
            ...
            474 out:

            475         LASSERT(obj->cob_transient_pages == 0);

            They are all referencing the same obj:

            crash> ccc_object.cob_transient_pages ffff88bdd31dd078
            cob_transient_pages = 36

            Looking at how cob_transient_pages is manipulated:

            lustre/include/lclient.h:
            194 /**
            195 * ccc-private object state.
            196 */
            197 struct ccc_object {
            ...
            210 /**
            211 * Access this counter is protected by inode->i_sem. Now that
            212 * the lifetime of transient pages must be covered by inode sem,
            213 * we don't need to hold any lock..
            214 */
            215 int cob_transient_pages;

            Protected by i_sem? A very bad sign, as i_sem has been replaced by i_mutex for
            quite some time now, and the i_mutex is not held across a read operation. Lustre
            would have to do this itself.

            This points to commit ed5ebb87bfc2b684958daac90c4369f395482a16, part of which is this:

            diff --git a/lustre/llite/rw26.c b/lustre/llite/rw26.c
            index b605fa4..b9c8293 100644
            --- a/lustre/llite/rw26.c
            +++ b/lustre/llite/rw26.c
            @@ -412,13 +412,6 @@ static ssize_t ll_direct_IO_26(int rw, struct kiocb *iocb,
                     io = ccc_env_io(env)->cui_cl.cis_io;
                     LASSERT(io != NULL);
             
            -        /* 0. Need locking between buffered and direct access. and race with
            -         * size changing by concurrent truncates and writes.
            -         * 1. Need inode mutex to operate transient pages.
            -         */
            -        if (rw == READ)
            -                mutex_lock(&inode->i_mutex);
            -
                     LASSERT(obj->cob_transient_pages == 0);
                     for (seg = 0; seg < nr_segs; seg++) {
                             long iov_left = iov[seg].iov_len;
            @@ -480,8 +473,6 @@ static ssize_t ll_direct_IO_26(int rw, struct kiocb *iocb,
             }
             out:
                     LASSERT(obj->cob_transient_pages == 0);
            -        if (rw == READ)
            -                mutex_unlock(&inode->i_mutex);
             
                     if (tot_bytes > 0) {
                             struct ccc_io *cio = ccc_env_io(env);

            schamp Stephen Champion added a comment -

            People

              laisiyao Lai Siyao
              schamp Stephen Champion
              Votes: 0
              Watchers: 7
