Lustre / LU-13230

sanity-benchmark test fsx hangs

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.4
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      sanity-benchmark test_fsx hangs.

      For the hang at https://testing.whamcloud.com/test_sets/9c812454-4b41-11ea-a1c8-52540065bddc, the last output in the client test_log is:

      == sanity-benchmark test fsx: fsx ==================================================================== 15:41:36 (1581176496)
      debug=0
      Using: fsx -c 50 -p 1000 -S 20400 -P /tmp -l 3438416         -N 100000  /mnt/lustre/f0.fsxfile
      Chance of close/open is 1 in 50
      Seed set to 20400
      truncating to largest ever: 0x1240bb
      truncating to largest ever: 0x1b0cac
      truncating to largest ever: 0x331506
      truncating to largest ever: 0x338a0b
      truncating to largest ever: 0x3443e1
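
      To put the truncation sizes above in context, they can be converted from hex to bytes and compared with the 3438416 value passed to fsx with -l (understood here to be the upper bound on file length; that reading of the flag is an assumption). A minimal Python sketch, with illustrative variable names that are not part of the test suite:

```python
# Lines copied from the client test_log above.
log_lines = [
    "truncating to largest ever: 0x1240bb",
    "truncating to largest ever: 0x1b0cac",
    "truncating to largest ever: 0x331506",
    "truncating to largest ever: 0x338a0b",
    "truncating to largest ever: 0x3443e1",
]

# Value given to fsx via -l on the command line in the log
# (assumed to be the maximum file length in bytes).
max_file_len = 3438416

# Take the hex token after the last colon and parse it as base 16.
sizes = [int(line.rsplit(":", 1)[1].strip(), 16) for line in log_lines]

for size in sizes:
    print(f"{size:#x} = {size} bytes, {max_file_len - size} bytes below the -l value")
```

      The last size logged before the hang is 3425249 bytes, which is 13167 bytes below the -l value, so the log stops while fsx is still operating within its configured file length.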
      

      The console logs contain no call traces and few error messages, so they do not explain why the test hangs. The client1 (vm6) console log shows:

      [31869.584851] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-benchmark test fsx: fsx ==================================================================== 15:41:36 \(1581176496\)
      [31869.821328] Lustre: DEBUG MARKER: == sanity-benchmark test fsx: fsx ==================================================================== 15:41:36 (1581176496)
      [31869.901535] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x6ac8:0x0], use llapi_layout_get_by_path()
      [31869.917359] Lustre: lustre-OST0002-osc-ffff98bc9b52c000: reconnect after 1s idle
      [31869.918649] Lustre: Skipped 5 previous similar messages
      [31916.662148] Lustre: 24080:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1581176536/real 1581176536]  req@ffff98bc805cb600 x1657965685190528/t0(0) o400->lustre-MDT0000-mdc-ffff98bc9b52c000@10.9.5.70@tcp:12/10 lens 224/224 e 0 to 1 dl 1581176543 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [31916.667085] Lustre: lustre-MDT0000-mdc-ffff98bc9b52c000: Connection to lustre-MDT0000 (at 10.9.5.70@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [31916.669901] LustreError: 166-1: MGC10.9.5.70@tcp: Connection to MGS (at 10.9.5.70@tcp) was lost; in progress operations using this service will fail
      
      <ConMan> Console [trevis-26vm6] disconnected from <trevis-26:6005> at 02-08 16:14.
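
      The timing in the console log above can be cross-checked by pulling the fields out of the ptlrpc_expire_one_request() line. The sketch below is only an illustration: the regular expression and function name are hypothetical, and it assumes the message layout shown in this log. Opcode 400 is OBD_PING in Lustre's wire protocol definitions, so the request that expired here is a ping to the MDT, not an fsx I/O request.

```python
import re
from datetime import datetime, timezone

# Line copied from the client1 (vm6) console log above.
expire_line = (
    "[31916.662148] Lustre: 24080:0:(client.c:2133:ptlrpc_expire_one_request()) "
    "@@@ Request sent has timed out for slow reply: "
    "[sent 1581176536/real 1581176536]  req@ffff98bc805cb600 "
    "x1657965685190528/t0(0) "
    "o400->lustre-MDT0000-mdc-ffff98bc9b52c000@10.9.5.70@tcp:12/10 "
    "lens 224/224 e 0 to 1 dl 1581176543 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1"
)

# Kernel timestamp and epoch time of the test's first DEBUG MARKER line above.
marker_kernel_ts = 31869.584851
marker_epoch = 1581176496

# Hypothetical pattern for the fields of interest; it assumes the layout
# of the message in this log and is not an official parser.
PATTERN = re.compile(
    r"^\[\s*(?P<kernel_ts>\d+\.\d+)\].*?"
    r"\[sent (?P<sent>\d+)/real \d+\].*?"
    r"\bo(?P<opcode>\d+)->(?P<target>[^@\s]+)@(?P<nid>\S+?):.*?"
    r"\bdl (?P<deadline>\d+)\b"
)


def summarize_expired_request(line):
    """Return the timing fields of an expired-request line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    sent = int(m["sent"])
    deadline = int(m["deadline"])
    return {
        "kernel_ts": float(m["kernel_ts"]),
        "opcode": int(m["opcode"]),
        "target": m["target"],
        "nid": m["nid"],
        "sent": sent,
        "deadline": deadline,
        "wait_seconds": deadline - sent,
        "sent_utc": datetime.fromtimestamp(sent, tz=timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S"
        ),
    }


info = summarize_expired_request(expire_line)
# Seconds between the test's DEBUG MARKER and the timeout message.
elapsed = info["kernel_ts"] - marker_kernel_ts
print(info)
print(f"timeout logged {elapsed:.3f} s after the test marker")
```

      Under these assumptions, the ping was sent 40 seconds after the test started, had a 7 second deadline, and the timeout was logged about 47 seconds after the test marker, which lines up with the dl value of 1581176543. The marker's epoch time is printed slightly before the kernel logs it, so the match is approximate.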
      

      Although sanity-benchmark test fsx has hung before, there isn’t enough information here to match this hang to a past failure.

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez, Inactive)