LU-2909: Failure on test suite sanity-benchmark test_fsx: fsx bus error, core dumped

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Severity: 3
    • 6999
    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/5a852a8a-85b1-11e2-9f8d-52540035b04c.

      The sub-test test_fsx failed with the following error:

      fsx failed

      truncating to largest ever: 0x25701f
      018000 1362442021.951758 trunc from 0x169dd8  to  0x2e6079	(0x17c2a2 bytes)
      019000 1362442023.494191 mapwrite   0x1dd935 thru 0x1ec6de	(0xedaa bytes)
      020000 1362442025.029135 read       0x17d39e thru 0x1848a8	(0x750b bytes)
      021000 1362442026.608768 mapwrite   0x21696e thru 0x223441	(0xcad4 bytes)
      022000 1362442028.147407 write      0x065ab3 thru 0x071b3c	(0xc08a bytes)
      023000 1362442029.708216 write      0x142a88 thru 0x149749	(0x6cc2 bytes)
      024000 1362442031.298952 trunc from 0x16cef5  to  0x19c5bf	(0x2f6cb bytes)
      025000 1362442032.760608 mapread    0x050a4e thru 0x05ea37	(0xdfea bytes)
      026000 1362442034.337654 write      0x100d42 thru 0x10c4b8	(0xb777 bytes)
      027000 1362442035.889125 mapread    0x0a0142 thru 0x0a0412	(0x2d1 bytes)
      028000 1362442037.474502 read       0x1556d1 thru 0x15c086	(0x69b6 bytes)
      029000 1362442039.086174 write      0x009b42 thru 0x00e01c	(0x44db bytes)
      030000 1362442040.692829 mapwrite   0x18135d thru 0x184f86	(0x3c2a bytes)
      031000 1362442042.344050 write      0x142774 thru 0x14e54f	(0xbddc bytes)
      032000 1362442043.844112 mapwrite   0x0a2628 thru 0x0a295f	(0x338 bytes)
      033000 1362442045.438181 write      0x18f5c4 thru 0x19c18c	(0xcbc9 bytes)
      034000 1362442047.032646 trunc from 0x1e89cc  to  0x3b16df	(0x1c8d14 bytes)
      035000 1362442048.552844 write      0x1e22ba thru 0x1eb190	(0x8ed7 bytes)
      036000 1362442050.084864 read       0x0abc33 thru 0x0b5f8c	(0xa35a bytes)
      037000 1362442051.712007 trunc from 0x0af05a  to  0x0b8740	(0x96e7 bytes)
      038000 1362442053.320923 mapread    0x055a8e thru 0x05a120	(0x4693 bytes)
      039000 1362442054.897862 write      0x236d76 thru 0x23733f	(0x5ca bytes)
      040000 1362442056.462382 mapread    0x116f2e thru 0x12395c	(0xca2f bytes)
      041000 1362442058.009744 mapwrite   0x1493c5 thru 0x156198	(0xcdd4 bytes)
      042000 1362442059.557940 write      0x1cc6db thru 0x1cf588	(0x2eae bytes)
      /usr/lib64/lustre/tests/sanity-benchmark.sh: line 186: 22372 Bus error               (core dumped) $CMD
       sanity-benchmark test_fsx: @@@@@@ FAIL: fsx failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:3973:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:3996:error()
        = /usr/lib64/lustre/tests/sanity-benchmark.sh:199:test_fsx()
        = /usr/lib64/lustre/tests/test-framework.sh:4251:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4284:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4139:run_test()
        = /usr/lib64/lustre/tests/sanity-benchmark.sh:203:main()
      Dumping lctl log to /logdir/test_logs/2013-03-04/lustre-master-el6-x86_64--full--2_1_1__1286__-70011898087180-120645/sanity-benchmark.test_fsx.*.1362442060.log
      CMD: client-20-ib,client-21-ib,client-22-ib,client-23-ib.lab.whamcloud.com /usr/sbin/lctl dk > /logdir/test_logs/2013-03-04/lustre-master-el6-x86_64--full--2_1_1__1286__-70011898087180-120645/sanity-benchmark.test_fsx.debug_log.\$(hostname -s).1362442060.log;
               dmesg > /logdir/test_logs/2013-03-04/lustre-master-el6-x86_64--full--2_1_1__1286__-70011898087180-120645/sanity-benchmark.test_fsx.dmesg.\$(hostname -s).1362442060.log
      

Activity
niu Niu Yawei (Inactive) added a comment -

Hi Keith, could you rerun the test with a larger obd_max_dirty_pages to see whether the situation is alleviated? If the test confirms the root cause, we can take the next step and consider how to fix it. Thanks.
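For anyone rerunning this, here is a minimal sketch of how the dirty-page limits can be checked and raised on a client. The /proc path and the example values are assumptions for a typical 2.x client and may differ between Lustre versions:

#!/bin/bash
# Global client-wide cap backing obd_max_dirty_pages, expressed in MB
# (classic sysctl location; newer clients may expose it elsewhere).
cat /proc/sys/lustre/max_dirty_mb
echo 1024 > /proc/sys/lustre/max_dirty_mb

# Per-OSC cap backing cl_dirty_max, also in MB.
lctl get_param osc.*.max_dirty_mb
lctl set_param osc.*.max_dirty_mb=256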

johann Johann Lombardi (Inactive) added a comment -

I think we do wait if there are BRWs in flight (which might return grant space). However, there is still the case where no bulk write is in flight. It is exactly for this problem that, when Jinshan implemented support for mkwrite, I suggested adding a new RPC type (or an "empty" write) to request grant from the OSTs.

bzzz Alex Zhuravlev added a comment -

In that case shouldn't there be a blocking point somewhere, so that we do not return to the kernel with an error?
niu Niu Yawei (Inactive) added a comment - edited

Alex, osc_enter_cache() may return -EDQUOT in two cases:

• out of grant;
• too many dirty pages (in osc_enter_cache() we can only trigger a dirty flush on the current OSC, not on all OSCs, so the global dirty max can be exceeded).

I guess the test failed because of the second case.
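A quick way to tell which of the two limits is being hit while fsx runs is to watch the per-OSC grant and dirty counters. The parameter names are the standard OSC proc entries; the poll interval and the global-cap path are assumptions:

#!/bin/bash
# Poll grant and dirty-page accounting on the client while fsx is running.
while sleep 5; do
    lctl get_param osc.*.cur_grant_bytes osc.*.cur_dirty_bytes osc.*.max_dirty_mb
    # Global cap for comparison (classic sysctl location; may vary by version).
    cat /proc/sys/lustre/max_dirty_mb
done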

bzzz Alex Zhuravlev added a comment -

How can the client have space granted while the server is out of space?

niu Niu Yawei (Inactive) added a comment -

So, this needs to be fixed, right? Unless there is no free space on the FS, the grant reservation should not really fail.

The problem is that the client doesn't know whether the server is out of space; it can only rely on grant so far. I think enlarging obd_max_dirty_pages and cl_dirty_max may alleviate the problem.

niu Niu Yawei (Inactive) added a comment -

I think the problem is that the mmap write failed to reserve grant at the OSC layer, so -EDQUOT was returned to the kernel, which caused the SIGBUS.

Yes, probably. If this is the case, then I think delivering SIGBUS to the application could be the best we can do. What do you think?
green Oleg Drokin added a comment -

So, this needs to be fixed, right? Unless there is no free space on the FS, the grant reservation should not really fail.

jay Jinshan Xiong (Inactive) added a comment -

The problem looks like mmap is trying to access an invalid memory address; I can't imagine how it can be related to the fix in osc_enter_cache() (d1eb3357bab215018bcb29877ed8b2990e65cc73).

I think the problem is that the mmap write failed to reserve grant at the OSC layer, so -EDQUOT was returned to the kernel, which caused the SIGBUS.

keith Keith Mannthey (Inactive) added a comment -

I was able to revert just that patch (it was clean to revert). I ran for 18 to 20 hours. It took at least 10 hours to reproduce on master, so I felt safe calling it at 18-20 hours, together with Ben's data point. I can set it up again and let it crank for days if needed.

Ben, did you roll back the whole git tree, or just that patch as well?

I ran a modified version of Ben's script with our call to fsx:

#!/bin/bash

set -e

# Loop: mount the client, run fsx (built locally as ./a.out), clean up, unmount.
for i in `seq 1 100`; do
    echo kmannth $i
    mount -t lustre mds@tcp:/lustre /mnt/lustre/
    ./a.out -c 50 -p 1000 -S 5662 -P /tmp -l 2453703 -N 100000 /mnt/lustre/f0.fsxfile 2>&1
    rm -rf /mnt/lustre/testfile*
    umount /mnt/lustre
done

On my system it does not happen right away (1 GB of memory in the client). On master it took roughly 10-12 hours to reproduce. Once it started reproducing, it would fail quite easily (50% of runs or more); the system has to be exhausted in some way to get into the reproduction state.
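For completeness, a hedged sketch of how reverting only that commit on a lustre-release checkout might look before rebuilding; the build commands are the usual autogen/configure/make-rpms sequence and may need adjusting for the local tree and kernel:

#!/bin/bash
# Revert just the suspect osc_enter_cache() change on top of master.
git checkout master
git revert --no-edit d1eb3357bab215018bcb29877ed8b2990e65cc73

# Typical client/server RPM rebuild (adjust kernel source path as needed).
sh autogen.sh
./configure --with-linux=/usr/src/kernels/$(uname -r)
make rpms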

niu Niu Yawei (Inactive) added a comment -

The problem looks like mmap is trying to access an invalid memory address; I can't imagine how it can be related to the fix in osc_enter_cache() (d1eb3357bab215018bcb29877ed8b2990e65cc73).

Keith, how did you run your test? With master that reverted only the patch d1eb3357bab215018bcb29877ed8b2990e65cc73, or with an old master that predates all the patches since d1eb3357bab215018bcb29877ed8b2990e65cc73? Thanks.

People

    Assignee: niu Niu Yawei (Inactive)
    Reporter: maloo Maloo
    Votes: 0
    Watchers: 12
