Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12818

replay-single test_70b and other tests fail with “Numerical result out of range” error

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4
    • RHEL8 and Ubuntu 18.04
    • 3
    • 9223372036854775807

    Description

      Many replay-single tests fail for RHEL8 and Ubuntu18.04 with the “Numerical result out of range” error. All tests have different failure messages, but they all have the “Numerical result out of range” error during file open.

      Looking at the suite_log for https://testing.whamcloud.com/test_sets/0f9f03d0-c158-11e9-90ad-52540065bddc, we see replay-single tests 13, 14, 15, 18, 21, 22, 23, 31, 32, 33a, 53b, 53c, 53g, 53h, and 70b all fail

      == replay-single test 13: open chmod 0 |x| write close =============================================== 20:22:03 (1566073323)
      multiop /mnt/lustre/f13.replay-single vO_wc
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_13: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f13.replay-single failed 
      …
      == replay-single test 14: open(O_CREAT), unlink |X| close ============================================ 20:22:07 (1566073327)
      multiop /mnt/lustre/f14.replay-single vO_tSc
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_14: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f14.replay-single failed 
      …
      == replay-single test 15: open(O_CREAT), unlink |X|  touch new, close ================================ 20:22:11 (1566073331)
      multiop /mnt/lustre/f15.replay-single vO_tSc
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_15: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f15.replay-single failed 
      …
      trevis-8vm6: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      touch: cannot touch '/mnt/lustre/f18.replay-single-3': Numerical result out of range
       replay-single test_18: @@@@@@ FAIL: touch /mnt/lustre/f18.replay-single-3 failed 
      …
      trevis-8vm6: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      touch: cannot touch '/mnt/lustre/f21.replay-single-2': Numerical result out of range
       replay-single test_21: @@@@@@ FAIL: touch /mnt/lustre/f21.replay-single-2 failed 
      …
      == replay-single test 22: open(O_CREAT), |X| unlink, replay, close (test mds_cleanup_orphans) ======== 20:33:35 (1566074015)
      multiop /mnt/lustre/f22.replay-single vO_tSc
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
      …
      == replay-single test 23: open(O_CREAT), |X| unlink touch new, replay, close (test mds_cleanup_orphans) ====================================================================================================== 20:33:39 (1566074019)
      multiop /mnt/lustre/f23.replay-single vO_tSc
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_23: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f23.replay-single failed 
      …
      == replay-single test 31: open(O_CREAT) two, unlink one, |X| unlink one, close two (test mds_cleanup_orphans) ====================================================================================================== 20:44:50 (1566074690)
      multiop /mnt/lustre/f31.replay-single-1 vO_tSc
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_31: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f31.replay-single-1 failed 
      …
      == replay-single test 32: close() notices client eviction; close() after client eviction ============= 20:44:54 (1566074694)
      multiop /mnt/lustre/f32.replay-single vO_c
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_32: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f32.replay-single failed 
      …
      == replay-single test 33a: fid seq shouldn't be reused after abort recovery ========================== 20:44:58 (1566074698)
      open(/mnt/lustre/f33a.replay-single-0) error: Numerical result out of range
      total: 0 open/close in 0.00 seconds: 0.00 ops/second
       replay-single test_33a: @@@@@@ FAIL: createmany create /mnt/lustre/f33a.replay-single failed 
      …
      == replay-single test 53b: |X| open request while two MDC requests in flight ========================= 21:35:13 (1566077713)
      multiop /mnt/lustre/d53b.replay-single-1/f vO_c
      TMPPIPE=/tmp/multiop_open_wait_pipe.29190
      open(O_RDWR|O_CREAT): Numerical result out of range
       replay-single test_53b: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/d53b.replay-single-1/f failed 
      …
      == replay-single test 53c: |X| open request and close request while two MDC requests in flight ======= 21:35:17 (1566077717)
      open(O_RDWR|O_CREAT): Numerical result out of range
      CMD: trevis-8vm10 lctl set_param fail_loc=0x80000107
      fail_loc=0x80000107
      CMD: trevis-8vm10 lctl set_param fail_loc=0x80000115
      fail_loc=0x80000115
      /usr/lib64/lustre/tests/replay-single.sh: line 1294: kill: (15973) - No such process
      …
      == replay-single test 53g: |X| drop open reply and close request while close and open are both in flight ====================================================================================================== 21:39:58 (1566077998)
      open(O_RDWR|O_CREAT): Numerical result out of range
      CMD: trevis-8vm11 lctl set_param fail_loc=0x119
      fail_loc=0x119
      CMD: trevis-8vm11 lctl set_param fail_loc=0x80000115
      fail_loc=0x80000115
      /usr/lib64/lustre/tests/replay-single.sh: line 1437: kill: (22835) - No such process
      CMD: trevis-8vm11 lctl set_param fail_loc=0
      fail_loc=0
       replay-single test_53g: @@@@@@ FAIL: close_pid doesn't exist 
      …
      == replay-single test 53h: open request and close reply while two MDC requests in flight ============= 21:40:07 (1566078007)
      open(O_RDWR|O_CREAT): Numerical result out of range
      CMD: trevis-8vm11 lctl set_param fail_loc=0x80000107
      fail_loc=0x80000107
      CMD: trevis-8vm11 lctl set_param fail_loc=0x8000013b
      fail_loc=0x8000013b
      /usr/lib64/lustre/tests/replay-single.sh: line 1475: kill: (23667) - No such process
       replay-single test_53h: @@@@@@ FAIL: close_pid doesn't exist 
      …
      trevis-8vm7: [44191] open ./clients/client0/~dmtmp/PARADOX/__40D6B.DB failed for handle 10888 (Numerical result out of range)
      trevis-8vm7: (44193) ERROR: handle 10888 was not found
      trevis-8vm7: Child failed with status 1
      trevis-8vm7: dbench: no process found
      pdsh@trevis-8vm4: trevis-8vm7: ssh exited with exit code 1
      trevis-8vm4: [44069] open ./clients/client0/~dmtmp/PARADOX/__3F2C4.DB failed for handle 10851 (Numerical result out of range)
      trevis-8vm4: (44071) ERROR: handle 10851 was not found
      trevis-8vm6:    1     44591     0.68 MB/sec  execute 193 sec  latency 72567.258 ms
      trevis-8vm4: Child failed with status 1
      trevis-8vm4: dbench: no process found
      pdsh@trevis-8vm4: trevis-8vm4: ssh exited with exit code 1
      trevis-8vm6: [44660] open ./clients/client0/~dmtmp/PARADOX/__50172.DB failed for handle 11004 (Numerical result out of range)
      trevis-8vm6: (44662) ERROR: handle 11004 was not found
      trevis-8vm6: Child failed with status 1
      trevis-8vm6: dbench: no process found
      …
      trevis-8vm6: dbench: no process found
      pdsh@trevis-8vm4: trevis-8vm6: ssh exited with exit code 1
       replay-single test_70b: @@@@@@ FAIL: dbench stopped on some of trevis-8vm4.trevis.whamcloud.com,trevis-8vm6,trevis-8vm7! 
      

      Different test sessions have different tests that fail. For example tests 28, 29, 30, 31, 32, 53f, 53g, and 70b for the Ubuntu 18.04 test session https://testing.whamcloud.com/test_sets/9742e3aa-cc8f-11e9-98c8-52540065bddc.

      Although the error messages are different, this issue may be related to LU-10613.

      Logs for other failed test sessions are at:
      https://testing.whamcloud.com/test_sets/b80265a8-d8af-11e9-a25b-52540065bddc
      https://testing.whamcloud.com/test_sets/c68821fe-b9fa-11e9-97d5-52540065bddc
      https://testing.whamcloud.com/test_sets/6615df92-d105-11e9-90ad-52540065bddc
      https://testing.whamcloud.com/test_sets/95922fb6-d80b-11e9-98c8-52540065bddc
      https://testing.whamcloud.com/test_sets/211af22a-ddcf-11e9-a197-52540065bddc

      Attachments

        Issue Links

          Activity

            People

              hongchao.zhang Hongchao Zhang
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: