[LU-8813] Kerberos: sanity and sanity-krb5 test suites fail on non-root user trying to touch file

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.10.0
    • Lustre 2.9.0
    • 3

    Description

      I’m having problems running some test suites with Kerberos enabled. For example, when I run sanity.sh on 2.8.60 plus the patch http://review.whamcloud.com/#/c/23600/, the test fails with:

      # ./auster -v -k sanity --only 0a
      Started at Tue Nov  8 14:39:14 UTC 2016
      eagle-48vm6.eagle.hpdd.intel.com: Checking config lustre mounted on /lustre/scratch
      Checking servers environments
      Checking clients eagle-48vm6.eagle.hpdd.intel.com environments
      Logging to local directory: /tmp/test_logs/2016-11-08/143914
      Client: Lustre version: 2.8.60_1_g35d09c7
      MDS: Lustre version: 2.8.60_1_g35d09c7
      OSS: Lustre version: 2.8.60_1_g35d09c7
      running: sanity ONLY=0a 
      run_suite sanity /usr/lib64/lustre/tests/sanity.sh
      -----============= acceptance-small: sanity ============----- Tue Nov  8 14:39:21 UTC 2016
      Running: bash /usr/lib64/lustre/tests/sanity.sh
      eagle-48vm6.eagle.hpdd.intel.com: Checking config lustre mounted on /lustre/scratch
      Checking servers environments
      Checking clients eagle-48vm6.eagle.hpdd.intel.com environments
      Using TIMEOUT=20
      disable quota as required
      osd-ldiskfs.track_declares_assert=1
      osd-ldiskfs.track_declares_assert=1
      debug=-1
      running as uid/gid/euid/egid 500/500/500/500, groups:
       [touch] [/lustre/scratch/d0_runas_test/f7025]
      touch: cannot touch `/lustre/scratch/d0_runas_test/f7025': Permission denied
       sanity : @@@@@@ FAIL: unable to write to /lustre/scratch/d0_runas_test as UID 500.
              Please set RUNAS_ID to some UID which exists on MDS and client or
              add user 500:500 on these nodes. 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4841:error()
        = /usr/lib64/lustre/tests/test-framework.sh:5670:check_runas_id()
        = /usr/lib64/lustre/tests/sanity.sh:126:main()
      Dumping lctl log to /tmp/test_logs/2016-11-08/143914/sanity..*.1478615970.log
      eagle-48vm1: Host key verification failed.
      eagle-48vm1: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
      eagle-48vm1: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
      pdsh@eagle-48vm6: eagle-48vm1: ssh exited with exit code 12
      eagle-48vm2: Host key verification failed.
      eagle-48vm2: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
      eagle-48vm2: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
      pdsh@eagle-48vm6: eagle-48vm2: ssh exited with exit code 12
      sanity returned 0
      Finished at Tue Nov  8 14:39:31 UTC 2016 in 17s
      ./auster: completed with rc 0
      

      The code that is failing in sanity.sh is

      # $RUNAS_ID may get set incorrectly somewhere else
      [ $UID -eq 0 -a $RUNAS_ID -eq 0 ] && error "\$RUNAS_ID set to 0, but \$UID is also 0!"
      
      check_runas_id $RUNAS_ID $RUNAS_GID $RUNAS
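
In the failing run this guard passes, because the suite runs as root with RUNAS_ID=500; the trace dump confirms that error() is reached from check_runas_id() instead. The sketch below reproduces only the guard as a standalone function. `is_uint` and `validate_runas_id` are hypothetical names, not part of test-framework.sh, and the write test that check_runas_id performs is deliberately not modeled.

```shell
#!/bin/bash
# Hypothetical helpers, not taken from test-framework.sh.

# Succeed only for a non-empty string of decimal digits.
is_uint() {
	case "$1" in
	'' | *[!0-9]*) return 1 ;;
	esac
	return 0
}

# Mirror the sanity.sh guard: refuse to run when both the invoking UID
# and RUNAS_ID are 0, since the "non-root" tests would then run as root.
# Usage: validate_runas_id <uid> <runas_id>
# Returns 0 if acceptable, 1 if both are 0, 2 on non-numeric input.
validate_runas_id() {
	local uid=$1 runas_id=$2

	if ! is_uint "$uid" || ! is_uint "$runas_id"; then
		echo "UID and RUNAS_ID must be numeric" >&2
		return 2
	fi
	if [ "$uid" -eq 0 ] && [ "$runas_id" -eq 0 ]; then
		echo "\$RUNAS_ID set to 0, but \$UID is also 0!" >&2
		return 1
	fi
	return 0
}

# The configuration from this ticket (root running with RUNAS_ID=500)
# passes the guard.
validate_runas_id 0 500 && echo "guard passed"
```
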
      

      UID/GID 500 belongs to sanityusr, which obtained a Kerberos ticket before sanity.sh was run:

      # su sanityusr
      bash-4.1$ klist
      Ticket cache: FILE:/tmp/krb5cc_500
      Default principal: sanityusr@CO.CFS
      
      Valid starting     Expires            Service principal
      11/08/16 14:38:48  11/09/16 14:38:48  krbtgt/CO.CFS@CO.CFS
      

      Note that CO.CFS is the realm being used.
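
Because the failure appears only for the non-root user, one possible pre-run check is to confirm that the RUNAS user's cache still holds an unexpired ticket. This is a minimal sketch, assuming MIT Kerberos `klist` (whose `-s` option prints nothing and reports cache validity through its exit status) and the FILE:/tmp/krb5cc_<uid> cache naming shown in the klist output above. `ccache_for_uid` and `has_valid_tgt` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers; assumes MIT Kerberos klist and FILE-type caches
# named as in the klist output above (FILE:/tmp/krb5cc_<uid>). Sites that
# set default_ccache_name in krb5.conf may use a different location.

# Print the credential-cache name for a numeric UID.
ccache_for_uid() {
	printf 'FILE:/tmp/krb5cc_%s\n' "$1"
}

# Return 0 if the UID's cache can be read and its credentials have not
# expired; klist -s prints nothing and reports via its exit status.
has_valid_tgt() {
	KRB5CCNAME=$(ccache_for_uid "$1") klist -s
}

# Example: check RUNAS_ID (500 in this ticket) before starting auster.
#   has_valid_tgt 500 || echo "UID 500 has no valid Kerberos ticket" >&2
```
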

      Since Lustre itself is not failing, it is not surprising that dmesg contains nothing of interest. For example, from the MGS/MDS:

      Lustre: DEBUG MARKER: -----============= acceptance-small: sanity ============----- Tue Nov 8 14:39:21 UTC 2016
      Lustre: DEBUG MARKER: Using TIMEOUT=20
      Lustre: DEBUG MARKER: sanity : @@@@@@ FAIL: unable to write to /lustre/scratch/d0_runas_test as UID 500.
      

      Logs for this run are at https://testing.hpdd.intel.com/test_sets/b87e2568-a5f8-11e6-964e-5254006e85c2.
      More logs will be attached to this ticket.

      The last time I tested Kerberos and the above tests ran was with tag 2.8.54. Since then, some flags have been added to lsvcgssd. I tried starting lsvcgssd in two different ways: the way that worked in 2.8.54, ‘/usr/sbin/lsvcgssd’ on all Lustre servers; and the new recommended way, ‘/usr/sbin/lsvcgssd -m -g -k -vvv’ on the MGS/MDS and ‘/usr/sbin/lsvcgssd -o -k -vvv’ on the OSS (the verbosity flag is optional). All tests were run on RHEL 6.8 (for some reason, Maloo reports it as el6.7).
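
The two ways of starting lsvcgssd described above can be captured in a small wrapper so that each server gets a consistent command line. This is a sketch only: `lsvcgssd_opts` and its role names are hypothetical, and the option strings are copied verbatim from this ticket, not from the lsvcgssd documentation.

```shell
#!/bin/bash
# Hypothetical helper: print the lsvcgssd options used in this ticket for
# a given server role. Option strings are copied from the ticket text.
lsvcgssd_opts() {
	case "$1" in
	mgs_mds) printf '%s\n' "-m -g -k" ;; # combined MGS/MDS node
	oss)     printf '%s\n' "-o -k" ;;    # OSS node
	legacy)  : ;;                        # 2.8.54 style: no options at all
	*)
		echo "unknown role: $1" >&2
		return 1
		;;
	esac
}

# Example on the MGS/MDS, with the optional verbosity flag (the unquoted
# substitution is intentional so the options split into separate words):
#   /usr/sbin/lsvcgssd $(lsvcgssd_opts mgs_mds) -vvv
```

The `legacy` case matters because a later patch on this ticket, "gss: allow svcgssd to start without "-k"", restored compatibility for existing Kerberos configurations that do not pass `-k`.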

      Confirmation that others are or are not experiencing this problem with these test suites and Kerberos would be helpful.

      Attachments

        1. CopyofGSSKerberossetupguideforLustre.pdf
          140 kB
          Andrew Perepechko
        2. sanity..debug_log.eagle-48vm1.1478615970.log
          3.28 MB
          James Nunez
        3. sanity..debug_log.eagle-48vm2.1478615970.log
          3.28 MB
          James Nunez
        4. sanity..debug_log.eagle-48vm6.1478615970.log
          3.54 MB
          James Nunez
        5. sanity..dmesg.eagle-48vm1.1478615970.log
          29 kB
          James Nunez
        6. sanity..dmesg.eagle-48vm2.1478615970.log
          31 kB
          James Nunez
        7. sanity..dmesg.eagle-48vm6.1478615970.log
          26 kB
          James Nunez
        8. sanity.suite_log.eagle-48vm6.log
          2 kB
          James Nunez

          Activity

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25584/
            Subject: LU-8813 gss: limit the number of error messages in logs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4ed67efd13cddd7ec41d29e853601ce862aaae9e

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/25584
            Subject: LU-8813 gss: limit the number of error messages in logs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9a04aec1e2692cf32aedadee0bf745b657724012

            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.10

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23925/
            Subject: LU-8813 gss: allow svcgssd to start without "-k"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: faf53524cdb90eee45e9425e529a7a6868679c56

            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.10.0 [ 12204 ]
            Fix Version/s Original: Lustre 2.9.0 [ 11891 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-3425 [ LU-3425 ]

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/23925
            Subject: LU-8813 gss: allow svcgssd to start without "-k"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6f63a934d3f771c479f296b203f0f717cfae5313

            adilger Andreas Dilger made changes -
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

            Reopened to fix compatibility with existing Kerberos configurations in which svcgssd is not passed the -k option.

            People

              yong.fan nasf (Inactive)
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 8
