Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8101

sanity test_118c fails with 'No page in writeback, writeback=0'

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.9.0, Lustre 2.13.0, Lustre 2.10.7, Lustre 2.15.0
    • None
    • autotest reivedw-zfs-part-1
    • 3
    • 9223372036854775807

    Description

      sanity test 118c fails in review-zfs-part-1 with

      'No page in writeback, writeback=0' 
      

      In OST console logs when sanity test 118c PASS, there is only one call to cfs_fail_timeout_set(). When test 118c fails, on the OSTs, the console log has multiple calls to cfs_fail_timeout_set():

      10:48:43:Lustre: DEBUG MARKER: == sanity test 118c: Fsync blocks on EROFS until dirty pages are flushed ============ 10:48:38 (1462297718)
      10:48:43:Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0x216
      10:48:43:LustreError: 4699:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 216 sleeping for 1000ms
      10:48:43:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_118c: @@@@@@ FAIL: No page in writeback, writeback=0 
      10:48:43:Lustre: DEBUG MARKER: sanity test_118c: @@@@@@ FAIL: No page in writeback, writeback=0
      10:48:43:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2016-05-03/lustre-reviews-el6_7-x86_64--review-zfs-part-1--1_8_1__38579__-70130514263840-074815/sanity.test_118c.debug_log.$(hostname -s).1462297721.log;
      10:48:43:         dmesg > /logdir/test_logs/2016-05-03/lustre-reviews-el
      10:48:43:LustreError: 4699:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 216 awake
      10:50:24:LustreError: 4699:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 216 sleeping for 1000ms
      10:50:24:LustreError: 4699:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 216 awake
      10:50:24:LustreError: 4699:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 216 sleeping for 1000ms
      10:50:24:LustreError: 4699:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 216 awake
      10:50:24:Lustre: *** cfs_fail_loc=216, val=0***
      10:50:24:Lustre: Skipped 2 previous similar messages
      10:50:24:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
      10:50:24:Lustre: DEBUG MARKER: for i in $(find /proc/spl -type f); do
      10:50:24:				echo ====${i}=======================;
      10:50:24:				cat ${i};
      10:50:24:				done
      10:50:24:Lustre: DEBUG MARKER: echo ====zpool events -v=======================;
      10:50:24:				zpool events -v
      10:50:24:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 118d: Fsync validation inject a delay of the bulk ============ 10:48:54 \(1462297734\)
      

      Similarly, on the client, there are multiple calls to osc_brw_redo_request()in the console logs for the failed test

      10:49:59:Lustre: DEBUG MARKER: == sanity test 118c: Fsync blocks on EROFS until dirty pages are flushed ============ 10:48:38 (1462297718)
      10:49:59:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_118c: @@@@@@ FAIL: No page in writeback, writeback=0 
      10:49:59:Lustre: DEBUG MARKER: sanity test_118c: @@@@@@ FAIL: No page in writeback, writeback=0
      10:49:59:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2016-05-03/lustre-reviews-el6_7-x86_64--review-zfs-part-1--1_8_1__38579__-70130514263840-074815/sanity.test_118c.debug_log.$(hostname -s).1462297721.log;
      10:49:59:         dmesg > /logdir/test_logs/2016-05-03/lustre-reviews-el
      10:49:59:LustreError: 11-0: lustre-OST0001-osc-ffff8800623bc800: operation ost_write to node 10.2.5.40@tcp failed: rc = -30
      10:49:59:LustreError: Skipped 1 previous similar message
      10:49:59:LustreError: 4786:0:(osc_request.c:1527:osc_brw_redo_request()) @@@ redo for recoverable error -30  req@ffff880077ddac80 x1533326821172416/t0(0) o4->lustre-OST0001-osc-ffff8800623bc800@10.2.5.40@tcp:6/4 lens 608/192 e 0 to 0 dl 1462297765 ref 2 fl Interpret:RM/0/0 rc -30/-30
      10:49:59:LustreError: 4786:0:(osc_request.c:1527:osc_brw_redo_request()) Skipped 1 previous similar message
      10:49:59:LustreError: 4786:0:(osc_request.c:1527:osc_brw_redo_request()) @@@ redo for recoverable error -30  req@ffff88007d2f1cc0 x1533326821172448/t0(0) o4->lustre-OST0001-osc-ffff8800623bc800@10.2.5.40@tcp:6/4 lens 608/192 e 0 to 0 dl 1462297767 ref 2 fl Interpret:RM/0/0 rc -30/-30
      10:49:59:LustreError: 4786:0:(osc_request.c:1527:osc_brw_redo_request()) @@@ redo for recoverable error -30  req@ffff880077ddac80 x1533326821172464/t0(0) o4->lustre-OST0001-osc-ffff8800623bc800@10.2.5.40@tcp:6/4 lens 608/192 e 0 to 0 dl 1462297770 ref 2 fl Interpret:RM/0/0 rc -30/-30
      10:49:59:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
      10:49:59:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 118d: Fsync validation inject a delay of the bulk ============ 10:48:54 \(1462297734\)
      

      Logs for this failure are at:
      https://testing.hpdd.intel.com/test_sets/f12f9b3e-1184-11e6-b5f1-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/721631e0-f399-11e5-b32f-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/f0c2aa46-c7f2-11e5-80b3-5254006e85c2

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: