[LU-5700] async IO LBUG obj->cob_transient_pages == 0 Created: 02/Oct/14  Updated: 20/Oct/14  Resolved: 20/Oct/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Critical
Reporter: Stephen Champion Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-1669 lli->lli_write_mutex (single shared f... Resolved
Severity: 3
Rank (Obsolete): 15957

 Description   

LustreError: 58185:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
LustreError: 58184:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
LustreError: 58184:0:(rw26.c:474:ll_direct_IO_26()) LBUG
Pid: 58184, comm: sio

Call Trace:
[<ffffffffa057d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa057de97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0c03044>] ll_direct_IO_26+0xb64/0x1140 [lustre]
LustreError: 58186:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
LustreError: 58186:0:(rw26.c:474:ll_direct_IO_26()) LBUG
Pid: 58186, comm: sio

Call Trace:
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffff8112158b>] generic_file_aio_read+0x6bb/0x700
[<ffffffff811b4b5f>] ? list_move+0x1f/0x30
[<ffffffff811b51ff>] ? __mark_inode_dirty+0x13f/0x160
[<ffffffff811a59cd>] ? touch_atime+0x14d/0x1a0
[<ffffffffa0c173be>] vvp_io_read_start+0x22e/0x450 [lustre]
[<ffffffffa057d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa057de97>] lbug_with_loc+0x47/0xb0 [libcfs]
LustreError: 58183:0:(rw26.c:474:ll_direct_IO_26()) ASSERTION( obj->cob_transient_pages == 0 ) failed:
LustreError: 58183:0:(rw26.c:474:ll_direct_IO_26()) LBUG
Pid: 58183, comm: sio

Call Trace:
[<ffffffffa0c03044>] ll_direct_IO_26+0xb64/0x1140 [lustre]
[<ffffffffa057d895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffffa057de97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffff8112158b>] generic_file_aio_read+0x6bb/0x700
[<ffffffffa06f14da>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffff81078fa7>] ? current_fs_time+0x27/0x30
[<ffffffffa0bce4d5>] ? ll_inode_size_unlock+0x15/0x20 [lustre]
[<ffffffffa06f5664>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0bb3f41>] ll_file_io_generic+0x481/0xaa0 [lustre]
[<ffffffffa0c0d6e5>] ? ccc_object_size_unlock+0x35/0x40 [lustre]
[<ffffffffa0bb4da0>] ll_file_aio_read+0x130/0x2b0 [lustre]
[<ffffffff811a58f1>] ? touch_atime+0x71/0x1a0
[<ffffffffa0bb5079>] ll_file_read+0x159/0x290 [lustre]
[<ffffffffa0c173be>] vvp_io_read_start+0x22e/0x450 [lustre]
[<ffffffffa0c03044>] ll_direct_IO_26+0xb64/0x1140 [lustre]
[<ffffffff811894d5>] vfs_read+0xb5/0x1a0
[<ffffffff81189802>] sys_pread64+0x82/0xa0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffff8112158b>] generic_file_aio_read+0x6bb/0x700
[<ffffffff81078fa7>] ? current_fs_time+0x27/0x30
[<ffffffffa0bce4d5>] ? ll_inode_size_unlock+0x15/0x20 [lustre]
[<ffffffffa0c0d6e5>] ? ccc_object_size_unlock+0x35/0x40 [lustre]
[<ffffffff811a58f1>] ? touch_atime+0x71/0x1a0
[<ffffffffa0c173be>] vvp_io_read_start+0x22e/0x450 [lustre]
[<ffffffffa06f14da>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa06f5664>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0bb3f41>] ll_file_io_generic+0x481/0xaa0 [lustre]
[<ffffffffa0bb4da0>] ll_file_aio_read+0x130/0x2b0 [lustre]
[<ffffffffa0bb5079>] ll_file_read+0x159/0x290 [lustre]
[<ffffffff811894d5>] vfs_read+0xb5/0x1a0
[<ffffffff81189802>] sys_pread64+0x82/0xa0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

[<ffffffffa06f14da>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa06f5664>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0bb3f41>] ll_file_io_generic+0x481/0xaa0 [lustre]
[<ffffffffa0bb4da0>] ll_file_aio_read+0x130/0x2b0 [lustre]
[<ffffffffa0bb5079>] ll_file_read+0x159/0x290 [lustre]
[<ffffffff811894d5>] vfs_read+0xb5/0x1a0
[<ffffffff81189802>] sys_pread64+0x82/0xa0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Kernel panic - not syncing: LBUG
Pid: 58184, comm: sio Not tainted 2.6.32-431.20.3.hz100.el6.x86_64 #1
Call Trace:
[<ffffffff81528067>] ? panic+0xa7/0x16f
[<ffffffffa057deeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0c03044>] ? ll_direct_IO_26+0xb64/0x1140 [lustre]
[<ffffffff81058bd3>] ? __wake_up+0x53/0x70
[<ffffffff8112158b>] ? generic_file_aio_read+0x6bb/0x700
[<ffffffff811b4b5f>] ? list_move+0x1f/0x30
[<ffffffff811b51ff>] ? __mark_inode_dirty+0x13f/0x160
[<ffffffff811a59cd>] ? touch_atime+0x14d/0x1a0
[<ffffffffa0c173be>] ? vvp_io_read_start+0x22e/0x450 [lustre]
[<ffffffffa06f14da>] ? cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa06f5664>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0bb3f41>] ? ll_file_io_generic+0x481/0xaa0 [lustre]
[<ffffffffa0bb4da0>] ? ll_file_aio_read+0x130/0x2b0 [lustre]
[<ffffffffa0bb5079>] ? ll_file_read+0x159/0x290 [lustre]
[<ffffffff811894d5>] ? vfs_read+0xb5/0x1a0
[<ffffffff81189802>] ? sys_pread64+0x82/0xa0
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b



 Comments   
Comment by Stephen Champion [ 02/Oct/14 ]

This is easy to reproduce using sio on an 8-socket UV system with 160 threads and 4 TB of memory. I have been unable to reproduce it on a smaller configuration so far.

root@cy024-4-sys:/mnt/cy024/schamp # ./sio -tDcw -b 2048 -A 16 -s 83865632 -g /mnt/cy024/schamp/sio.1
Warning: filesize adjusted to 83865600 blocks
<crash>

3 OSS with five OSTs each, stripe count -1:

# lfs getstripe /mnt/cy024-mnt/schamp/sio.1
    /mnt/cy024-mnt/schamp/sio.1
    lmm_stripe_count: 15
    lmm_stripe_size: 1048576
    lmm_pattern: 1
    lmm_layout_gen: 0
    lmm_stripe_offset: 7
    obdidx objid objid group
    7 6276 0x1884 0
    11 6276 0x1884 0
    15 6276 0x1884 0
    2 6276 0x1884 0
    13 6276 0x1884 0
    9 6276 0x1884 0
    6 6276 0x1884 0
    10 6276 0x1884 0
    12 6276 0x1884 0
    4 6276 0x1884 0
    1 6276 0x1884 0
    0 6276 0x1884 0
    14 6276 0x1884 0
    5 6276 0x1884 0
    8 6276 0x1884 0

We have cores if they are useful.

Olaf Weber's notes:

There are four threads of interest, with stack traces like this showing they're doing direct IO.

PID: 58183 TASK: ffff893de6382080 CPU: 151 COMMAND: "sio"
#0 [ffff893dd1fed9b0] panic at ffffffff81528001
#1 [ffff893dd1feda30] lbug_with_loc at ffffffffa057deeb [libcfs]
#2 [ffff893dd1feda50] ll_direct_IO_26 at ffffffffa0c03044 [lustre]
#3 [ffff893dd1fedb80] generic_file_aio_read at ffffffff8112158b
#4 [ffff893dd1fedc60] vvp_io_read_start at ffffffffa0c173be [lustre]
#5 [ffff893dd1fedcd0] cl_io_start at ffffffffa06f14da [obdclass]
#6 [ffff893dd1fedd00] cl_io_loop at ffffffffa06f5664 [obdclass]
#7 [ffff893dd1fedd30] ll_file_io_generic at ffffffffa0bb3f41 [lustre]
#8 [ffff893dd1fede20] ll_file_aio_read at ffffffffa0bb4da0 [lustre]
#9 [ffff893dd1fede80] ll_file_read at ffffffffa0bb5079 [lustre]
#10 [ffff893dd1fedef0] vfs_read at ffffffff811894d5
#11 [ffff893dd1fedf30] sys_pread64 at ffffffff81189802
#12 [ffff893dd1fedf80] system_call_fastpath at ffffffff8100b072
RIP: 00007ffff7495e13 RSP: 00007ffff7ff8dd8 RFLAGS: 00010206
RAX: 0000000000000011 RBX: ffffffff8100b072 RCX: 0000000000000000
RDX: 0000000000800000 RSI: 00007ffff699c000 RDI: 0000000000000003
RBP: 0000000000606240 R8: 00007ffff7ff9700 R9: 00007ffff7ff9700
R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffff7ff9700
R13: 0000000000000003 R14: 000000000061f3e8 R15: 0000000000000003
ORIG_RAX: 0000000000000011 CS: 0033 SS: 002b

Approximately they're here:

lustre/llite/rw26.c
373 static ssize_t ll_direct_IO_26(int rw, struct kiocb *iocb,
374                                const struct iovec *iov, loff_t file_offset,
375                                unsigned long nr_segs)
376 {
377         struct lu_env *env;
378         struct cl_io *io;
379         struct file *file = iocb->ki_filp;
380         struct inode *inode = file->f_mapping->host;
381         struct ccc_object *obj = cl_inode2ccc(inode);
382         long count = iov_length(iov, nr_segs);
...
433                         page_count = ll_get_user_pages(rw, user_addr, bytes,
434                                                        &pages, &max_pages);
435                         if (likely(page_count > 0)) {
436                                 if (unlikely(page_count < max_pages))
437                                         bytes = page_count << PAGE_CACHE_SHIFT;
438                                 result = ll_direct_IO_26_seg(env, io, rw, inode,
439                                                              file->f_mapping,
440                                                              bytes, file_offset,
441                                                              pages, page_count);
442                                 ll_free_user_pages(pages, max_pages, rw==READ);
...
474 out:
475         LASSERT(obj->cob_transient_pages == 0);

They are all referencing the same obj:

crash> ccc_object.cob_transient_pages ffff88bdd31dd078
cob_transient_pages = 36

Looking at how cob_transient_pages is manipulated:

lustre/include/lclient.h:
194 /**
195  * ccc-private object state.
196  */
197 struct ccc_object {
...
210         /**
211          * Access this counter is protected by inode->i_sem. Now that
212          * the lifetime of transient pages must be covered by inode sem,
213          * we don't need to hold any lock..
214          */
215         int cob_transient_pages;

Protected by i_sem? That is a very bad sign: i_sem was replaced by i_mutex quite some time ago, and i_mutex is not held across a read operation, so Lustre would have to take it itself.

That points to commit ed5ebb87bfc2b684958daac90c4369f395482a16, part of which is this:

diff --git a/lustre/llite/rw26.c b/lustre/llite/rw26.c
index b605fa4..b9c8293 100644
--- a/lustre/llite/rw26.c
+++ b/lustre/llite/rw26.c
@@ -412,13 +412,6 @@ static ssize_t ll_direct_IO_26(int rw, struct kiocb *iocb,
         io = ccc_env_io(env)->cui_cl.cis_io;
         LASSERT(io != NULL);
 
-        /* 0. Need locking between buffered and direct access. and race with
-         * size changing by concurrent truncates and writes.
-         * 1. Need inode mutex to operate transient pages.
-         */
-        if (rw == READ)
-                mutex_lock(&inode->i_mutex);
-
         LASSERT(obj->cob_transient_pages == 0);
         for (seg = 0; seg < nr_segs; seg++) {
                 long iov_left = iov[seg].iov_len;
@@ -480,8 +473,6 @@ static ssize_t ll_direct_IO_26(int rw, struct kiocb *iocb,
 }
 out:
         LASSERT(obj->cob_transient_pages == 0);
-        if (rw == READ)
-                mutex_unlock(&inode->i_mutex);
 
         if (tot_bytes > 0) {
                 struct ccc_io *cio = ccc_env_io(env);
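
With that mutex_lock()/mutex_unlock() pair gone, nothing serializes concurrent direct IO reads on the same inode, so one thread can reach the LASSERT at out: while another thread still has transient pages in flight. Below is a minimal userspace sketch (plain C with pthreads, illustration only, not Lustre code) of that pattern: every thread balances its own increments and decrements, yet an end-of-operation assertion on the shared counter can still fire.

/* race_sketch.c -- illustration only, NOT Lustre code.
 * Each "direct IO" bumps a shared per-inode counter while its pages are
 * in flight and drops it when done, then asserts the counter is zero,
 * much like ll_direct_IO_26() did.  Without a lock around the whole
 * operation, the assert races with the other threads.
 * Build: gcc -O2 -pthread race_sketch.c -o race_sketch */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

static int transient_pages;             /* analogue of cob_transient_pages */

static void *dio_thread(void *arg)
{
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
                __atomic_add_fetch(&transient_pages, 16, __ATOMIC_SEQ_CST);
                /* ... the data transfer would happen here ... */
                __atomic_sub_fetch(&transient_pages, 16, __ATOMIC_SEQ_CST);
                /* Even though the updates themselves are atomic, another
                 * thread may be mid-transfer right now, so this can trip: */
                assert(__atomic_load_n(&transient_pages, __ATOMIC_SEQ_CST) == 0);
        }
        return NULL;
}

int main(void)
{
        pthread_t t[4];

        for (int i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, dio_thread, NULL);
        for (int i = 0; i < 4; i++)
                pthread_join(t[i], NULL);
        printf("no assertion fired this run\n");
        return 0;
}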

Comment by Olaf Weber [ 02/Oct/14 ]

A bit more info from the commit header: the change was LU-1669

commit ed5ebb87bfc2b684958daac90c4369f395482a16
Author: Prakash Surya <surya1@llnl.gov>
Date: Wed Oct 2 17:16:51 2013 -0700

LU-1669 vvp: Use lockless __generic_file_aio_write
...
Change-Id: I0023132b5d941b3304f39f015f95106542998072
Reviewed-on: http://review.whamcloud.com/6672

Comment by Peter Jones [ 02/Oct/14 ]

Lai

Could you please advise on this issue?

Thanks

Peter

Comment by Stephen Champion [ 03/Oct/14 ]

Our test system had another crash with normal (not async) DIO last night. This led to

http://review.whamcloud.com/#/c/12179/

Which lets us get on with the performance evaluation we were doing.

The patch removes the assertions and makes the counter atomic. We are not convinced that this section of code is safe for multiple threads; it needs to be considered carefully. The patch should not introduce any new bugs, but by not crashing it may expose something else.
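
For reference, the general shape of that change is roughly the following; this is a sketch of the idea only, not the literal content of patch 12179 (see the Gerrit link above for the real thing):

/* Sketch only -- the actual change is in http://review.whamcloud.com/#/c/12179/.
 * The plain int counter becomes an atomic_t so that concurrent direct IO
 * threads can update it without holding inode->i_mutex, and the "== 0"
 * LASSERTs, which assumed per-inode serialization, are removed. */
struct ccc_object {
        /* ... */
        atomic_t cob_transient_pages;   /* was: int cob_transient_pages */
        /* ... */
};

/* counter updates become: */
atomic_inc(&obj->cob_transient_pages);
atomic_dec(&obj->cob_transient_pages);

/* and in ll_direct_IO_26() the assertions
 *      LASSERT(obj->cob_transient_pages == 0);
 * are dropped. */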

Comment by John Fuchs-Chesney (Inactive) [ 20/Oct/14 ]

Stephen,
Your patch 12179 has been merged into master (October 10).

Do you need any more work on this ticket, or can I mark it as resolved, fixed?

Thanks,
~ jfc.

Comment by Stephen Champion [ 20/Oct/14 ]

Works for now.

We're not entirely convinced that this section of code is safe for reentry, but this clearly fixed one problem, and we can open a new ticket if we identify another.

Comment by Peter Jones [ 20/Oct/14 ]

ok thanks Steve
