Lustre / LU-4123

lfsck: @@@@@@ FAIL: /data/test/output isn't a shared directory

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.1
    • Environment: Lustre 1.8 -> master, multi rail IB cluster
    • Severity: 3
    • Rank (Obsolete): 11126

    Description

      check_write_access() expects each node's local name to match the host name
      passed via the xxx_HOST environment variables, and fails when the two differ.

      My cluster uses the nodename as the host name of the management interface
      (Ethernet). The InfiniBand interfaces use 'hostname-ibX'. To use an IB
      interface for acceptance tests, the IB host name must be used in the xxx_HOST
      environment variables.

      I have a patch to use the remote nodename in check_write_access().

        Activity

          yujian Jian Yu added a comment -

          Patch was back-ported to Lustre b2_4 branch: http://review.whamcloud.com/8343

          yujian Jian Yu added a comment -

          Patch landed on master branch.

          schamp Stephen Champion added a comment (edited)

          A little more detail: my startup script runs from an NFS-exported directory. It specifies hosts by the name of an IB interface:

          # grep _HOST= run-acc-accfs.sh
            mgs_HOST=n013-ib1
            mds_HOST=n013-ib1
            mds1_HOST=n013-ib1
            ost1_HOST=n008-ib1
            ost_HOST=n008-ib1
            ost2_HOST=n009-ib1

          check_logdir() creates the marker files:
          touch $dir/check_file.$(hostname -s)

          but it must run on each node to actually be useful:
          do_rpc_nodes "$list" check_logdir $dir
          check_write_access $dir "$list" || return 1

          This works just fine:

          # ls
            check_file.n008 rpmlist-client.n006 rpmlist-server.n009 shared
            check_file.n009 rpmlist-client.n007 rpmlist-server.n013
            check_file.n013 rpmlist-server.n008 run-acc-accfs.sh

          But because 'n013' != 'n013-ib1', the check for shared access fails: check_file.n013 exists, but no file is named after the configured host name n013-ib1.

          The patch ensures that the check looks for the same file name that was created, regardless of how host and interface names are arranged.
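
          The mismatch and the idea behind the fix can be reproduced on a single machine. The following is only a sketch, not the code from the patch: remote_short_hostname, create_markers, check_by_configured_name and check_by_reported_name are hypothetical names, and the lookup table stands in for running hostname -s on each node.

```shell
#!/bin/bash

# Hypothetical stand-in for asking a node for its own short hostname.
# On a real cluster this would be a remote call; the lookup below only
# simulates it so the sketch runs on one machine.
remote_short_hostname() {
    case "$1" in
        n013-ib1) echo n013 ;;
        n008-ib1) echo n008 ;;
        n009-ib1) echo n009 ;;
        *)        echo "${1%%.*}" ;;
    esac
}

# Simulates check_logdir running on every node: each marker file is named
# after the node's own short hostname, not the name used to reach it.
create_markers() {
    local dir=$1 list=$2 node
    for node in ${list//,/ }; do
        touch "$dir/check_file.$(remote_short_hostname "$node")"
    done
}

# Looks for markers named after the configured (xxx_HOST) names.
# Fails as soon as a configured name differs from the node's own name.
check_by_configured_name() {
    local dir=$1 list=$2 node
    for node in ${list//,/ }; do
        [ -f "$dir/check_file.$node" ] || return 1
    done
}

# Looks for markers named after what each node reports for itself,
# so the name checked is always the name that was created.
check_by_reported_name() {
    local dir=$1 list=$2 node
    for node in ${list//,/ }; do
        [ -f "$dir/check_file.$(remote_short_hostname "$node")" ] || return 1
    done
}
```

          With the host list n013-ib1,n008-ib1,n009-ib1, create_markers produces check_file.n013, check_file.n008 and check_file.n009; check_by_configured_name then fails while check_by_reported_name succeeds, because only the latter derives the file name the same way the creator did.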

          schamp Stephen Champion added a comment - http://review.whamcloud.com/#/c/8009/

          People

            yujian Jian Yu
            schamp Stephen Champion