Lustre / LU-762

Hyperion - mdtest failure

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.2.0, Lustre 2.1.1
    • Lustre 1.8.7
    • None
    • RHEL5/x86_64
    • 3
    • 4852

    Description

      mdtest in a shared directory fails with 50 clients and MDTEST_NFILES=1024; it also fails with 100 clients and MDTEST_NFILES=256. The failure is silent: there are no Lustre errors and no server messages. It is consistent and repeatable, and it occurs only in shared-directory mode.
      Failure example:

      000: Command line used: /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/white215/hyperion.3867/mdtest -i3 -n1024
      000: Path: /p/l_wham/white215/hyperion.3867
      000: FS: 87.0 TiB Used FS: 0.0% Inodes: 546.9 Mi Used Inodes: 2.1%
      000:
      000: 400 tasks, 409600 files/directories
      000: 10/14/2011 01:17:15: Process 0(hyperion319): FAILED in create_remove_items_helper, unable to remove directory: No such file or directory
      srun: mvapich: 2011-10-14T01:17:15: ABORT from MPI rank 0 [on hyperion319]
      000: [0] [MPI Abort by user] Aborting Program!
      000: [0:hyperion319] Abort: MPI_Abort() code: 1, rank 0, MPI Abort by user Aborting program ! at line 99 in file mpid_init.c
      000: slurmd[hyperion319]: *** STEP 1219936.0 KILLED AT 2011-10-14T01:17:15 WITH SIGNAL 9 ***
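
      As a rough cross-check of the scale involved, the figures in the log above can be recomputed from the run parameters. This is only a sketch: it assumes the example is the 50-client run (the -n1024 matches MDTEST_NFILES=1024) and that tasks are spread evenly across clients, neither of which the log states outright.

```python
# Figures taken from the description and the mdtest log above.
clients = 50            # client nodes in the failing run (from the description)
tasks = 400             # "400 tasks" reported by mdtest
items_per_task = 1024   # -n1024, i.e. MDTEST_NFILES=1024

# Assumption: tasks are distributed evenly across the clients.
tasks_per_client = tasks // clients

# mdtest's -n is a per-task item count, so the total is tasks * items.
# In shared-directory mode all of these go into the same directory tree.
total_items = tasks * items_per_task

print(tasks_per_client)  # 8
print(total_items)       # 409600, matching "409600 files/directories"
```

      Under those assumptions each client runs 8 tasks, and the total agrees with the "409600 files/directories" line that mdtest printed.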

      Attachments

        1. e4-38-74
          10 kB
          Yang Sheng
        2. fs38-74
          33 kB
          Yang Sheng
        3. parallel-scale.test_metabench.debug_log.hyperion-mds1.1319834069.log.gz
          2.33 MB
          Cliff White

            People

              ys Yang Sheng
              cliffw Cliff White (Inactive)