Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2910

racer test_1, sanity test_54c: ASSERTION( cio->cui_tot_nrsegs >= cio->cui_nrsegs ) failed

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 7001

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/3fa2a912-85b2-11e2-9f8d-52540035b04c.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out

      client console shows:

      18:47:35:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-22-ib,client-23-ib.lab.whamcloud.com DURATION=900 == 18:47:28 (1362451648)
      18:47:35:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
      18:47:35:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      18:51:19:LustreError: 8649:0:(file.c:2601:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000bd0:0x9095:0x0] error: rc = -116
      18:56:24:LustreError: 28058:0:(lcommon_cl.c:821:ccc_io_advance()) ASSERTION( cio->cui_tot_nrsegs >= cio->cui_nrsegs ) failed: 
      18:56:24:LustreError: 28058:0:(lcommon_cl.c:821:ccc_io_advance()) LBUG
      18:56:24:Pid: 28058, comm: cat
      18:56:24:
      18:56:24:Call Trace:
      18:56:24: [<ffffffffa03fe895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      18:56:24: [<ffffffffa03fee97>] lbug_with_loc+0x47/0xb0 [libcfs]
      18:56:24: [<ffffffffa1264c0e>] ccc_io_advance+0x11e/0x190 [lustre]
      18:56:24: [<ffffffffa05a96d6>] cl_io_rw_advance+0x66/0x150 [obdclass]
      18:56:24: [<ffffffffa05ae602>] cl_io_loop+0xe2/0x1b0 [obdclass]
      18:56:24: [<ffffffffa1215323>] ll_file_io_generic+0x223/0x570 [lustre]
      18:56:24: [<ffffffffa12157af>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      18:56:25: [<ffffffffa121635c>] ll_file_read+0x16c/0x2a0 [lustre]
      18:56:25: [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
      18:56:25: [<ffffffff81176df1>] sys_read+0x51/0x90
      18:56:25: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      18:56:25:
      18:56:25:Kernel panic - not syncing: LBUG
      18:56:25:Pid: 28058, comm: cat Not tainted 2.6.32-279.19.1.el6.x86_64 #1
      18:56:25:Call Trace:
      18:56:25: [<ffffffff814e9541>] ? panic+0xa0/0x168
      18:56:25: [<ffffffffa03feeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      18:56:25: [<ffffffffa1264c0e>] ? ccc_io_advance+0x11e/0x190 [lustre]
      18:56:25: [<ffffffffa05a96d6>] ? cl_io_rw_advance+0x66/0x150 [obdclass]
      18:56:25: [<ffffffffa05ae602>] ? cl_io_loop+0xe2/0x1b0 [obdclass]
      18:56:25: [<ffffffffa1215323>] ? ll_file_io_generic+0x223/0x570 [lustre]
      18:56:25: [<ffffffffa12157af>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
      18:56:25: [<ffffffffa121635c>] ? ll_file_read+0x16c/0x2a0 [lustre]
      18:56:25: [<ffffffff81176cb5>] ? vfs_read+0xb5/0x1a0
      18:56:25: [<ffffffff81176df1>] ? sys_read+0x51/0x90
      18:56:26: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      18:56:26:Initializing cgroup subsys cpuset
      18:56:26:Initializing cgroup subsys cpu
      

      Attachments

        Activity

          [LU-2910] racer test_1, sanity test_54c: ASSERTION( cio->cui_tot_nrsegs >= cio->cui_nrsegs ) failed

          ah no, i didn't

          jay Jinshan Xiong (Inactive) added a comment - ah no, i didn't
          sarah Sarah Liu added a comment -

          Xiong, have you added the file into Makefile.am? If no, then it will not be built into the rpms

          sarah Sarah Liu added a comment - Xiong, have you added the file into Makefile.am? If no, then it will not be built into the rpms

          Are you in master tree? Try to run the following command and see what you get:

          [jinxiong@intel lustre]$ git annotate tests/racer/file_swap.sh
          4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 1)#!/bin/bash
          4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 2)
          4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 3)DIR=$1
          4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 4)MAX=$2
          4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 5)
          4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 6)while : ; do

          jay Jinshan Xiong (Inactive) added a comment - Are you in master tree? Try to run the following command and see what you get: [jinxiong@intel lustre] $ git annotate tests/racer/file_swap.sh 4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 1)#!/bin/bash 4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 2) 4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 3)DIR=$1 4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 4)MAX=$2 4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 5) 4af3ab19 (Jinshan Xiong 2013-02-01 10:33:09 -0800 6)while : ; do
          sarah Sarah Liu added a comment -

          I ran this test on Friday 4 times and cannot reproduce it on my test nodes. BTW Xiong, I cannot find file_swap.sh under tests/racer, could you please point me out? Thanks.

          sarah Sarah Liu added a comment - I ran this test on Friday 4 times and cannot reproduce it on my test nodes. BTW Xiong, I cannot find file_swap.sh under tests/racer, could you please point me out? Thanks.

          Xiong, in the racer test, file size could be changed anytime (mostly from rename, I think), so it's possible trigger io restart when ccc_prep_size() thinking that the ppos is beyond of file size, isn't it?

          From the current code, seems we support only the whole io restart when there isn't any bit of read/write done yet, we don't support io restart when partial io done, right?

          niu Niu Yawei (Inactive) added a comment - Xiong, in the racer test, file size could be changed anytime (mostly from rename, I think), so it's possible trigger io restart when ccc_prep_size() thinking that the ppos is beyond of file size, isn't it? From the current code, seems we support only the whole io restart when there isn't any bit of read/write done yet, we don't support io restart when partial io done, right?

          Now that the log is missed, I'd like to wait for a while to see if this issue can be reproduced. I'd like to make it clear before making a fix.

          jay Jinshan Xiong (Inactive) added a comment - Now that the log is missed, I'd like to wait for a while to see if this issue can be reproduced. I'd like to make it clear before making a fix.

          I'm thinking in which situation it would make IO restart. Also it may be incorrect to restart whole IO in some circumstances.

          jay Jinshan Xiong (Inactive) added a comment - I'm thinking in which situation it would make IO restart. Also it may be incorrect to restart whole IO in some circumstances.

          Xiong, I think the file_swap.sh should be disabled by default. Sarah knows it better.

          niu Niu Yawei (Inactive) added a comment - Xiong, I think the file_swap.sh should be disabled by default. Sarah knows it better.

          is file_swap.sh enabled in this run?

          jay Jinshan Xiong (Inactive) added a comment - is file_swap.sh enabled in this run?

          Looks the LASSERT could be triggered when we restart io with not restored iovector: http://review.whamcloud.com/5652

          niu Niu Yawei (Inactive) added a comment - Looks the LASSERT could be triggered when we restart io with not restored iovector: http://review.whamcloud.com/5652

          The client debug log in above maloo link is unfortunately missed, Sarah, could you try to reproduce it and collect the client debug log (with D_VFSTRACE enabled)? Thanks in advance.

          niu Niu Yawei (Inactive) added a comment - The client debug log in above maloo link is unfortunately missed, Sarah, could you try to reproduce it and collect the client debug log (with D_VFSTRACE enabled)? Thanks in advance.

          People

            niu Niu Yawei (Inactive)
            maloo Maloo
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: