To clarify:
The problem occurs when Torque's pbs_mom has the $tmpdir variable in its config file (/var/torque/mom_priv/config) set to a directory on a Lustre filesystem (in our case, $tmpdir /mnt/lustre/scratch/jobs). We occasionally get errors like:
Feb 14 13:56:23 n6-4-16 pbs_mom: LOG_ERROR::Permission denied (13) in TMakeTmpDir, Unable to make job transient directory: /mnt/lustre/scratch/jobs/18555647.batch.grid.cyf-kr.edu.pl
Feb 14 14:37:35 n6-4-16 pbs_mom: LOG_ERROR::Permission denied (13) in TMakeTmpDir, Unable to make job transient directory: /mnt/lustre/scratch/jobs/18557701.batch.grid.cyf-kr.edu.pl
Feb 14 14:38:17 n6-4-16 pbs_mom: LOG_ERROR::Permission denied (13) in TMakeTmpDir, Unable to make job transient directory: /mnt/lustre/scratch/jobs/18557716.batch.grid.cyf-kr.edu.pl
Feb 14 14:50:46 n6-4-16 pbs_mom: LOG_ERROR::Permission denied (13) in TMakeTmpDir, Unable to make job transient directory: /mnt/lustre/scratch/jobs/18559037.batch.grid.cyf-kr.edu.pl
Feb 14 15:01:44 n6-4-16 pbs_mom: LOG_ERROR::Permission denied (13) in TMakeTmpDir, Unable to make job transient directory: /mnt/lustre/scratch/jobs/18559949.batch.grid.cyf-kr.edu.pl
An example output of the reproducer:
[b14flis@n6-4-16 repro]$ ./a.out /mnt/lustre/scratch/jobs/
Iteration: 1
Creating directory: /mnt/lustre/scratch/jobs/1804289383
mkdir(/mnt,mode) errno: 17
mkdir(/mnt/lustre,mode) errno: 17
mkdir(/mnt/lustre/scratch,mode) errno: 17
mkdir(/mnt/lustre/scratch/jobs,mode) errno: 13
mkdirtree: failed: rc=13
sleeping for 2 seconds
Iteration: 2
doing stat before creating directory
Creating directory: /mnt/lustre/scratch/jobs/846930886
mkdir(/mnt,mode) errno: 17
mkdir(/mnt/lustre,mode) errno: 17
mkdir(/mnt/lustre/scratch,mode) errno: 17
mkdir(/mnt/lustre/scratch/jobs,mode) errno: 17
mkdirtree: successful: rc=0
ERROR: inconsistency detected: previous rc: 13 vs current rc: 0
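The reproducer source is not included here, so the following is only a hypothetical sketch of what its output suggests it does. It walks the path like `mkdir -p`, treats EEXIST (17) on an already existing component as success, and stops on any other errno such as EACCES (13). Across iterations it compares return codes, since the parent directories do not change and the rc should therefore stay the same. The names `mkdirtree()` and `reproduce()`, the 0700 mode, and the choice to stat() the base directory (rather than the new path) before creating are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create every component of `path` in turn, like
 * `mkdir -p`. EEXIST (17) on an existing component is tolerated; any other
 * errno (e.g. EACCES, 13) stops the walk and is returned to the caller. */
static int mkdirtree(const char *path, mode_t mode)
{
    char buf[4096];
    size_t len = strlen(path);

    if (len == 0 || len >= sizeof(buf))
        return ENAMETOOLONG;
    memcpy(buf, path, len + 1);

    /* Cut the path at each '/' (and at the end) and mkdir() that prefix. */
    for (char *p = buf + 1;; p++) {
        if (*p != '/' && *p != '\0')
            continue;
        if (p[-1] == '/') {            /* empty component: "//" or trailing '/' */
            if (*p == '\0')
                break;
            continue;
        }
        char saved = *p;
        *p = '\0';
        if (mkdir(buf, mode) != 0) {
            int err = errno;
            printf("mkdir(%s,mode) errno: %d\n", buf, err);
            if (err != EEXIST)
                return err;
        }
        *p = saved;
        if (saved == '\0')
            break;
    }
    return 0;
}

/* Hypothetical driver: each iteration builds "<base><rand()>" (so `base` is
 * expected to end in '/'), stat()s `base` first on every iteration after the
 * first, and reports when mkdirtree() returns a different rc than last time.
 * Returns the number of inconsistencies seen. */
static int reproduce(const char *base, int iterations, unsigned pause_s)
{
    int prev_rc = 0, mismatches = 0;

    for (int i = 1; i <= iterations; i++) {
        char path[4096];
        struct stat st;

        printf("Iteration: %d\n", i);
        if (i > 1) {
            printf("doing stat before creating directory\n");
            (void)stat(base, &st);     /* result ignored: only the lookup matters */
        }
        snprintf(path, sizeof(path), "%s%d", base, rand());
        printf("Creating directory: %s\n", path);

        int rc = mkdirtree(path, 0700);
        printf("mkdirtree: %s: rc=%d\n", rc ? "failed" : "successful", rc);
        if (i > 1 && rc != prev_rc) {
            printf("ERROR: inconsistency detected: previous rc: %d vs current rc: %d\n",
                   prev_rc, rc);
            mismatches++;
        }
        prev_rc = rc;
        if (pause_s > 0 && i < iterations) {
            printf("sleeping for %u seconds\n", pause_s);
            sleep(pause_s);
        }
    }
    return mismatches;
}
```

A `main()` would call something like `reproduce(argv[1], 2, 2)`, with the target passed with a trailing slash as in the run above. On a local filesystem, mkdir() of an existing directory should keep returning EEXIST, so any nonzero mismatch count points at the filesystem rather than at pbs_mom.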
Closing as a duplicate of LU-4185.