
crash in osc_brw_prep_request() ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) ...

Details

    • Bug
    • Resolution: Unresolved
    • Critical

    Description

      The following calls trigger an assertion in the Lustre client, which crashes the kernel while it executes the write syscall shown below:
       
       // Strace
      open("./file1", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_DIRECT|O_LARGEFILE|O_NOFOLLOW, 000) = 3
      write(3, "#! ./file1\n}!F\270\v\21k;{\310T+z\311-\311@:\312\27S"..., 3595

          Activity

            [LU-16616] crash in osc_brw_prep_request() ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) ...
            tao.lyu Tao Lyu added a comment - - edited

            Hi Patrick,

            Would you mind debugging and fixing this bug?
            Thanks!

            Best,
            Tao

            tao.lyu Tao Lyu added a comment -

            Sure, glad to help collect the debug logs.


            paf0186 Patrick Farrell added a comment -

            Tao,

            Ahh, thank you for explaining.  OK - I'll have to set up an Ubuntu VM on my end, the poc works fine on my RHEL kernel based system.  It is probably a difference in kernel versions.

            Is it practical for you to collect debug logs from the client if I give you instructions?  It's not particularly difficult.
            tao.lyu Tao Lyu added a comment -

            Hi, Patrick,

            We are developing a bug-finding tool for distributed systems. To detect the newest bugs, we run the latest development version.

            Yes, this was generated by our tool. It calls syscalls directly instead of going through libc.
            Here are the strace results (I also posted them in the description above):
            // Strace
            open("./file1", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_DIRECT|O_LARGEFILE|O_NOFOLLOW, 000) = 3
            write(3, "#! ./file1\n}!F\270\v\21k;{\310T+z\311-\311@:\312\27S"..., 3595


            paf0186 Patrick Farrell added a comment -

            Tao,

            With the poc - it looks like maybe you converted that directly from the strace using some tool?  It's very odd to see "syscall(__NR_mmap, ... )" rather than mmap().  Are you able to replace any part of the poc with text and/or symbolic representations instead of all those hex values?  For example, it creates a file called 'tmpfile', but that string appears nowhere in poc.c, so I assume it must be encoded in there.

            paf0186 Patrick Farrell added a comment -

            Tao,

            Why are you running based on 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7 ?  Are you intending to test an unreleased version?  That's a recent-ish pull of our development branch.  We appreciate if people want to test it, but we don't recommend it for production.  Our current public maintenance release is b2_15.
            tao.lyu Tao Lyu added a comment - - edited

            Sure.

            Lustre commit: 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7

            The stack trace:

            root@dfs:~# [  154.265547] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 0/2 pg: 000000005a02f487 off: 0, count: 3595
            [  154.268801] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) LBUG
            [  154.269714] Kernel panic - not syncing: LBUG
            [  154.270224] CPU: 3 PID: 298 Comm: ptlrpcd_00_00 Tainted: G           O      5.4.148+ #7
            [  154.271135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
            [  154.272152] Call Trace:
            [  154.272455]  dump_stack+0x50/0x63
            [  154.272875]  panic+0xfb/0x2bc
            [  154.274264]  lbug_with_loc.cold+0x2c/0x2c [libcfs]
            [  154.275445]  osc_brw_prep_request+0x5214/0x6d20 [osc]
            [  154.280285]  osc_build_rpc+0x1487/0x3770 [osc]
            [  154.281284]  osc_io_unplug0+0x2f0d/0x5110 [osc]
            [  154.286077]  brw_queue_work+0xbe/0x220 [osc]
            [  154.287007]  work_interpreter+0xb3/0x340 [ptlrpc]
            [  154.287904]  ptlrpc_check_set+0x1244/0x7a90 [ptlrpc]
            [  154.291430]  ptlrpcd+0x1296/0x23c0 [ptlrpc]
            [  154.298351]  kthread+0xfb/0x130
            [  154.299703]  ret_from_fork+0x1f/0x40
            [  154.300279] Kernel Offset: 0xa400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
            [  154.301513] ---[ end Kernel panic - not syncing: LBUG ]---
            

            paf0186 Patrick Farrell added a comment -

            Tao,

            Can you please share the crash messages as well?  Specifically the stack trace and LBUG.  Also, what Lustre version are you running?  (The poc is appreciated, but we need some more general info.)

            People

              wc-triage WC Triage
              tao.lyu Tao Lyu