
Incorrect permission handling when creating existing directories at ICHEC

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.4, Lustre 2.5.0, Lustre 2.6.0
    • None
    • 3
    • 11322

    Description

      One of our customers is hitting what appears to be LU-1101. They are also running Torque; here is the message from the customer:

      We use Torque as our resource manager, and job start fails because mkdir() returns EPERM. The only workaround we have is to not set tmpdir in Torque, which is suboptimal. Other applications might suffer from the same bug. Quantum ESPRESSO is reported to fail as well (in the Lustre bug report), and we have users running it.

      The bug was originally reported in bz 23459:
      https://bugzilla.lustre.org/show_bug.cgi?id=23459

      The patch which appears to be the root cause was added as part of bz 18534:
      https://bugzilla.lustre.org/show_bug.cgi?id=18534

      Thanks.

          Activity

            [LU-4185] Incorrect permission handling when creating existing directories at ICHEC

            jgmitter Joseph Gmitter (Inactive) added a comment -

            In discussion with Bobijam, this issue is resolved with the landing of https://review.whamcloud.com/#/c/8257/

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/8257/
            Subject: LU-4185 llite: Revise create with no open optimization
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a2d5b2e83c0a512a3ea59698e8481621ab5856c2

            green Oleg Drokin added a comment -

            After some discussion with the kernel guys, it looks like returning EEXIST is not a requirement, so the applications that depend on it are broken.

            See http://marc.info/?l=linux-kernel&m=146803277310677&w=2 and http://marc.info/?l=linux-nfs&m=146803381011004&w=2

            So I guess it's a good idea to file bug reports against those apps, even though we are also adjusting this in Lustre.
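
On the application side, one way to avoid depending on EEXIST is to treat any mkdir() failure as acceptable when the path already turns out to be a directory. The following is only a minimal sketch under that assumption; `ensure_dir` is a hypothetical helper name and is not taken from Lustre, Torque, or any application mentioned in this ticket.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if `path` is a directory once the call completes.
 * Returns -1 otherwise, with errno set to the original mkdir() error.
 */
static int ensure_dir(const char *path, mode_t mode)
{
	struct stat st;
	int saved_errno;

	/* Fast path: the directory did not exist and we created it. */
	if (mkdir(path, mode) == 0)
		return 0;

	/*
	 * mkdir() failed. Do not assume the error is EEXIST: a file system
	 * may report a permission error for an existing directory when the
	 * caller cannot write to the parent. Check what is actually there.
	 */
	saved_errno = errno;
	if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
		return 0;

	/* Not a directory (or not visible): report the mkdir() error. */
	errno = saved_errno;
	return -1;
}
```

An application that walks a path component by component could call such a helper for each prefix and stop only when it returns -1, instead of stopping on every error other than EEXIST.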

            green Oleg Drokin added a comment -

            BTW, interestingly, NFS apparently has the same problem as Lustre, and I have not heard any complaints.
            Time to file some more bugs, I guess.

            [green@fedora1 crash]$ mkdir aaa 
            mkdir: cannot create directory 'aaa': Permission denied
            [green@fedora1 crash]$ mkdir lost+found
            mkdir: cannot create directory 'lost+found': Permission denied
            [green@fedora1 crash]$ ls -ld lost+found
            drwx------ 2 root root 16384 May 25  2013 lost+found
            [green@fedora1 crash]$ mkdir lost+found
            mkdir: cannot create directory 'lost+found': File exists
            
            green Oleg Drokin added a comment -

            Patrick: It would be great if you could verify that the current patch (either one) is OK for the majority of operations, using mdtest and the like.

            green Oleg Drokin added a comment -

            Bobi's patch includes a test, which makes it better than mine, which does not.


            paf Patrick Farrell (Inactive) added a comment -

            Oleg - I think you've captured the problem as I understand end users are reporting it (as well as the Cray reproducer).

            One question - Which Gerrit link are we using? Bobi Jam has updated to a patch like yours as well.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/21169
            Subject: LU-4185 llite: Revise create with no open optimization.
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 571495cff60cab20d687402524eb051474b3e32b

            green Oleg Drokin added a comment -

            I guess if somebody can confirm that the case boils down to apps that get a work path of the form /mnt/lustre/homes/user/somedir/app_folder, where the user only has write permissions from user onward, but the app does the equivalent of:
            mkdir /mnt ; mkdir /mnt/lustre ; mkdir /mnt/lustre/homes ; mkdir /mnt/lustre/homes/user ; .... and stops on any error other than EEXIST,
            then we can probably add a much cheaper workaround in ll_lookup_nd of the form:
            if this is a CREATE but NOT an OPEN (a good proxy for mkdir, mknod and so on)
            AND this user does NOT have write permissions in the parent, THEN go on with the lookup.

            This way we do not penalize the important fast cases:
            1. User has write permissions - great, we'll do the create, and if it fails, that will be on the server with the correct error.
            2. User does not have write permissions, but the directory exists - that's OK too; we go to the server anyway, get our dentry and never get into create in the end.

            The remaining case:
            3. User does not have permissions and the directory does not exist - this will get slower; we jump from 0 to 1 RPC in this case. But I assume this is unimportant enough that nobody would notice or care?
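
The rule described in this comment can be written down as a small decision function. This is a sketch of the logic only, not the code from either Gerrit change: the enum, the function name `create_lookup_action`, and the boolean parameters are all hypothetical stand-ins for the lookup flags and the parent permission check that ll_lookup_nd would actually consult.

```c
#include <stdbool.h>

/* Hypothetical outcome of the lookup-time decision (illustration only). */
enum lookup_action {
	SKIP_LOOKUP,	/* no lookup RPC; let the create go to the server */
	DO_LOOKUP	/* ask the server whether the name already exists */
};

/*
 * is_create:       the operation intends to create the name
 * is_open:         the operation is also an open
 * parent_writable: the caller has write permission in the parent directory
 */
static enum lookup_action create_lookup_action(bool is_create, bool is_open,
					       bool parent_writable)
{
	/* Anything other than "create without open" takes the normal path. */
	if (!is_create || is_open)
		return DO_LOOKUP;

	/*
	 * Create without open (mkdir, mknod and so on). If the caller can
	 * write to the parent, skip the lookup: a failing create is reported
	 * by the server with the correct error (case 1).
	 */
	if (parent_writable)
		return SKIP_LOOKUP;

	/*
	 * No write permission in the parent: do the lookup so that an
	 * existing entry is found (case 2). When the name does not exist,
	 * this costs one extra RPC (case 3).
	 */
	return DO_LOOKUP;
}
```

Only the combination "create, not open, parent not writable" changes behaviour relative to always skipping the lookup for create-without-open, which is why the common fast cases are not penalized.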

            green Oleg Drokin added a comment -

            I tried to read the comments to get a better handle on this issue.

            The reproducer from Cray is basically:
            in a directory that user1 owns and where user2 does not have write permissions, create a directory as user1; then, with cold caches, try to create the same directory as user2. That fails with EPERM because we do not know that the directory already exists. (I imagine that if user1 created a non-directory, we would get the same issue.)

            This seems to be a pretty narrow case, though.
            So the other EPERM cases - do they happen when all processes have write permissions in the directory where we are trying to create? Has the source of those failures been distilled to a test case or, even better, understood yet?


            People

              bobijam Zhenyu Xu
              rganesan@ddn.com Rajeshwaran Ganesan
              Votes: 2
              Watchers: 15