Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16616

crash in osc_brw_prep_request() ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) ...

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      An assertion in the Lustre client is triggered by the following calls, which leads to a kernel crash during executing the write syscall shown below:
       
       // Strace
      open("./file1", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_DIRECT|O_LARGEFILE|O_NOFOLLOW, 000) = 3
      write(3, "#! ./file1\n}!F\270\v\21k;{\310T+z\311-\311@:\312\27S"..., 3595

      Attachments

        Issue Links

          Activity

            [LU-16616] crash in osc_brw_prep_request() ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) ...
            panda Andrew Perepechko added a comment - - edited

            I'm currently playing with smth like

             static bool ll_iov_iter_aligned(const struct iov_iter *i)
            {
                    const struct iovec *iov = i->iov;
                    size_t seg, size = i->count;
            
                    for (seg = 0; seg < i->nr_segs; ++seg) {
                            uintptr_t base = (uintptr_t)iov[seg].iov_base;
                            size_t len = iov[seg].iov_len;
            
                            if (seg == 0) {
                                    BUG_ON(len < i->iov_offset);
                                    base += i->iov_offset;
                                    len  -= i->iov_offset;
                            }
            
                            if (base & ~PAGE_MASK)
                                    return false;
            
                            if (size > len)
                                    size -= len;
                            else
                                    break;
                    }
            
                    return true;
            
            }  @@ -492,7 +483,7 @@ ll_direct_IO_impl(struct kiocb *iocb, struct iov_iter *iter, int rw)
                           MAX_DIO_SIZE >> PAGE_SHIFT);
             
                    /* Check that all user buffers are aligned as well */
            -       if (ll_iov_iter_alignment(iter) & ~PAGE_MASK)
            +       if (!ll_iov_iter_aligned(iter))
                            RETURN(-EINVAL);
              
                    lcc = ll_cl_find(inode);

            It needs to be ported to EL9+. I'll update the ticket when the fix is ready.

            panda Andrew Perepechko added a comment - - edited I'm currently playing with smth like static bool ll_iov_iter_aligned( const struct iov_iter *i) {         const struct iovec *iov = i->iov;        size_t seg, size = i->count;         for (seg = 0; seg < i->nr_segs; ++seg) {                uintptr_t base = (uintptr_t)iov[seg].iov_base;                size_t len = iov[seg].iov_len;                 if (seg == 0) {                        BUG_ON(len < i->iov_offset);                        base += i->iov_offset;                        len  -= i->iov_offset;                }                 if (base & ~PAGE_MASK)                         return false ;                 if (size > len)                        size -= len;                 else                         break ;        }         return true ; }  @@ -492,7 +483,7 @@ ll_direct_IO_impl(struct kiocb *iocb, struct iov_iter *iter, int rw)               MAX_DIO_SIZE >> PAGE_SHIFT);           /* Check that all user buffers are aligned as well */ -        if (ll_iov_iter_alignment(iter) & ~PAGE_MASK) +        if (!ll_iov_iter_aligned(iter))                RETURN(-EINVAL);          lcc = ll_cl_find(inode); It needs to be ported to EL9+. I'll update the ticket when the fix is ready.
            paf0186 Patrick Farrell added a comment -

            Interesting, OK - do you have a fix for this?  2.15 should have UDIO support but it might be broken in some limited way.  There's also PCC involved here, I think?

            paf0186 Patrick Farrell added a comment - Interesting, OK - do you have a fix for this?  2.15 should have UDIO support but it might be broken in some limited way.  There's also PCC involved here, I think?
            panda Andrew Perepechko added a comment -

            the following check at least in Lustre 2.15 is not correct:

            static unsigned long ll_iov_iter_alignment(struct iov_iter *i)
            
            {
            
            ...
                    res = iov_iter_alignment_vfs(i);
                    /* start address is page aligned */
                    if ((res & ~PAGE_MASK) == orig_size)
                            return PAGE_SIZE;
            
                    return res;
            }
             

            Since "start address is page aligned" is not always true, it's possible that this check passes when it shouldn't.

            iov_iter_alignment() OR's userspace bufs addresses and lengths. It's possible that the userspace buf has a bit in the ~PAGE_MASK set which is also set in the orig_size. Then the check passes.

            Example:  the userspace buffer is @0000561f0deb5800, reading [0:0xc0b] from file,

            here 0x800 is a bit subset of 0xc0b so "((res & ~PAGE_MASK) == orig_size)" check will pass, and Lustre versions without unaligned DIO support will crash with the stack trace above.

            panda Andrew Perepechko added a comment - the following check at least in Lustre 2.15 is not correct: static unsigned long ll_iov_iter_alignment(struct iov_iter *i) { ...        res = iov_iter_alignment_vfs(i);         /* start address is page aligned */         if ((res & ~PAGE_MASK) == orig_size)                 return PAGE_SIZE;         return res; } Since "start address is page aligned" is not always true, it's possible that this check passes when it shouldn't. iov_iter_alignment() OR's userspace bufs addresses and lengths. It's possible that the userspace buf has a bit in the ~PAGE_MASK set which is also set in the orig_size. Then the check passes. Example:  the userspace buffer is @0000561f0deb5800, reading [0:0xc0b] from file, here 0x800 is a bit subset of 0xc0b so "((res & ~PAGE_MASK) == orig_size)" check will pass, and Lustre versions without unaligned DIO support will crash with the stack trace above.
            tao.lyu Tao Lyu added a comment - - edited

            Hi Patrick,

            Would you mind debugging and fixing this bug?
            Thanks!

            Best,
            Tao

            tao.lyu Tao Lyu added a comment - - edited Hi Patrick, Would you mind debugging and fixing this bug? Thanks! Best, Tao
            tao.lyu Tao Lyu added a comment -

            Sure, glad to help collect the debug logs.

            tao.lyu Tao Lyu added a comment - Sure, glad to help collect the debug logs.

            Tao,

            Ahh, thank you for explaining.  OK - I'll have to set up an Ubuntu VM on my end, the poc works fine on my RHEL kernel based system.  It is probably a difference in kernel versions.

            Is it practical for you to collect debug logs from the client if I give you instructions?  It's not particularly difficult.

            paf0186 Patrick Farrell added a comment - Tao, Ahh, thank you for explaining.  OK - I'll have to set up an Ubuntu VM on my end, the poc works fine on my RHEL kernel based system.  It is probably a difference in kernel versions. Is it practical for you to collect debug logs from the client if I give you instructions?  It's not particularly difficult.
            tao.lyu Tao Lyu added a comment -

            Hi, Patrick,

            We are developing a bug-finding tool for distributed systems. In order to detect the newest bugs, we run the latest developing version.

            Yes, this is generated by our tool. It's for directly call syscalls instead of going to libc.
            Here is the strace results (I also posted in above description part):
            // Strace
            open("./file1", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_DIRECT|O_LARGEFILE|O_NOFOLLOW, 000) = 3
            write(3, "#! ./file1\n}!F\270\v\21k;{\310T+z\311-\311@:\312\27S"..., 3595

            tao.lyu Tao Lyu added a comment - Hi, Patrick, We are developing a bug-finding tool for distributed systems. In order to detect the newest bugs, we run the latest developing version. Yes, this is generated by our tool. It's for directly call syscalls instead of going to libc. Here is the strace results (I also posted in above description part): // Strace open("./file1", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_DIRECT|O_LARGEFILE|O_NOFOLLOW, 000) = 3 write(3, "#! ./file1\n}!F\270\v\21k;{\310T+z\311-\311@:\312\27S"..., 3595

            Tao,

            With the poc - it looks like maybe you converted that directly from the strace using some tool?  It's very odd to see "syscall(__NR_mmap, ... )" rather than mmap().  Are you able to replace any part of the oic with text and/or symbolic representations instead of all those hex values?  For example, it creates a file called 'tmpfile', but that string appears nowhere in poc.c, so I assume it must be encoded in there.

            paf0186 Patrick Farrell added a comment - Tao, With the poc - it looks like maybe you converted that directly from the strace using some tool?  It's very odd to see "syscall(__NR_mmap, ... )" rather than mmap().  Are you able to replace any part of the oic with text and/or symbolic representations instead of all those hex values?  For example, it creates a file called 'tmpfile', but that string appears nowhere in poc.c, so I assume it must be encoded in there.

            Tao,

            Why are you running based on 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7 ?  Are you intending to test an unreleased version?  That's a recent-ish pull of our development branch.  We appreciate if people want to test it, but we don't recommend it for production.  Our current public maintenance release is b2_15.

            paf0186 Patrick Farrell added a comment - Tao, Why are you running based on 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7 ?  Are you intending to test an unreleased version?  That's a recent-ish pull of our development branch.  We appreciate if people want to test it, but we don't recommend it for production.  Our current public maintenance release is b2_15.
            tao.lyu Tao Lyu added a comment - - edited

            Sure.

            Lustre commit: 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7

            The stack trace:

            root@dfs:~# [  154.265547] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 0/2 pg: 000000005a02f487 off: 0, count: 3595
            [  154.268801] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) LBUG
            [  154.269714] Kernel panic - not syncing: LBUG
            [  154.270224] CPU: 3 PID: 298 Comm: ptlrpcd_00_00 Tainted: G           O      5.4.148+ #7
            [  154.271135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
            [  154.272152] Call Trace:
            [  154.272455]  dump_stack+0x50/0x63
            [  154.272875]  panic+0xfb/0x2bc
            [  154.274264]  lbug_with_loc.cold+0x2c/0x2c [libcfs]
            [  154.275445]  osc_brw_prep_request+0x5214/0x6d20 [osc]
            [  154.280285]  osc_build_rpc+0x1487/0x3770 [osc]
            [  154.281284]  osc_io_unplug0+0x2f0d/0x5110 [osc]
            [  154.286077]  brw_queue_work+0xbe/0x220 [osc]
            [  154.287007]  work_interpreter+0xb3/0x340 [ptlrpc]
            [  154.287904]  ptlrpc_check_set+0x1244/0x7a90 [ptlrpc]
            [  154.291430]  ptlrpcd+0x1296/0x23c0 [ptlrpc]
            [  154.298351]  kthread+0xfb/0x130
            [  154.299703]  ret_from_fork+0x1f/0x40
            [  154.300279] Kernel Offset: 0xa400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
            [  154.301513] ---[ end Kernel panic - not syncing: LBUG ]---
            
            tao.lyu Tao Lyu added a comment - - edited Sure. Lustre commit: 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7 The stack trace: root@dfs:~# [ 154.265547] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 0/2 pg: 000000005a02f487 off: 0, count: 3595 [ 154.268801] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) LBUG [ 154.269714] Kernel panic - not syncing: LBUG [ 154.270224] CPU: 3 PID: 298 Comm: ptlrpcd_00_00 Tainted: G O 5.4.148+ #7 [ 154.271135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 154.272152] Call Trace: [ 154.272455] dump_stack+0x50/0x63 [ 154.272875] panic+0xfb/0x2bc [ 154.274264] lbug_with_loc.cold+0x2c/0x2c [libcfs] [ 154.275445] osc_brw_prep_request+0x5214/0x6d20 [osc] [ 154.280285] osc_build_rpc+0x1487/0x3770 [osc] [ 154.281284] osc_io_unplug0+0x2f0d/0x5110 [osc] [ 154.286077] brw_queue_work+0xbe/0x220 [osc] [ 154.287007] work_interpreter+0xb3/0x340 [ptlrpc] [ 154.287904] ptlrpc_check_set+0x1244/0x7a90 [ptlrpc] [ 154.291430] ptlrpcd+0x1296/0x23c0 [ptlrpc] [ 154.298351] kthread+0xfb/0x130 [ 154.299703] ret_from_fork+0x1f/0x40 [ 154.300279] Kernel Offset: 0xa400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 154.301513] ---[ end Kernel panic - not syncing: LBUG ]---

            People

              wc-triage WC Triage
              tao.lyu Tao Lyu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: