  1. Lustre
  2. LU-11182

parallel-scale test_cascading_rw fails with 'cascading_rw failed! 1'

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0, Lustre 2.14.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for James Nunez <james.a.nunez@intel.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f888d9f2-8d67-11e8-87f3-52540065bddc

      test_cascading_rw failed with the following error:

      cascading_rw failed! 1
      

      In this failure, cascading_rw runs a number of write-to-file iterations (104 in this case) before a write hits some problem and returns -1. From the test_log, we see:

      23:41:23: Running test #/usr/lib64/lustre/tests/cascading_rw(iter 104)
      23:41:23: Process 0 (trevis-9vm1.trevis.whamcloud.com)
      	FAILED in cascading_rw.c:150:rw_file()
      write of file /mnt/lustre/d0.cascading_rw/cascading_rw return -1--------------------------------------------------------------------------
      MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
      

      The only interesting output in the console or dmesg logs is in the logs for the client running the test. The client console log shows a warning about the old LL_IOC_LOV_GETSTRIPE ioctl, but this shouldn’t be causing any issues ... should it?

      [88550.129323] Lustre: DEBUG MARKER: == parallel-scale test cascading_rw: cascading_rw ==================================================== 23:40:10 (1532216410)
      [88550.536177] Lustre: cascading_rw: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x20006bacf:0x10e05:0x0], use llapi_layout_get_by_path()
      [88623.058629] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale test_cascading_rw: @@@@@@ FAIL: cascading_rw failed! 1 
      

      The initial thought is that we are filling the file system. We need to add some debug logging to confirm this, and then we can clean up the message in functions.sh/run_cascading_rw():

      # FIXME
      # Need space estimation here.
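
As a rough illustration of the kind of debug logging and space check the FIXME asks for, here is a minimal sketch. The helper names `avail_kb` and `check_space` are hypothetical and are not part of functions.sh, and the amount of space cascading_rw actually needs is not stated in this ticket, so it is passed in as a parameter rather than computed.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of functions.sh.

# Print the available space, in KB, of the file system holding $1.
# "df -P -k" gives one POSIX-format line per file system; column 4 is "Available".
avail_kb() {
	df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Log the available and required space for directory $1, then return 0
# if at least $2 KB are available and 1 otherwise.
check_space() {
	local dir=$1
	local need_kb=$2
	local have_kb

	have_kb=$(avail_kb "$dir")
	echo "space check: $dir has ${have_kb} KB available, need ${need_kb} KB"
	[ "$have_kb" -ge "$need_kb" ]
}
```

Calling something like `check_space "$testdir" "$need_kb"` before the run, and logging `avail_kb "$testdir"` again after a failure, would show whether the write returned -1 because the file system was full. Both variable names here are placeholders.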
      

      It’s hard to tell exactly when this started, but the failures appear to begin around 2018-07-19.

      Here are a few other logs for this failure:
      https://testing.whamcloud.com/test_sets/c36b7bf8-8b55-11e8-9028-52540065bddc
      https://testing.whamcloud.com/test_sets/e22eba50-8dad-11e8-87f3-52540065bddc

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      parallel-scale test_cascading_rw - cascading_rw failed! 1

            People

              wc-triage WC Triage
              maloo Maloo