[LU-10948] client cache open lock after N opens Created: 24/Apr/18  Updated: 22/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Bob Ciotti Assignee: Oleg Drokin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

cent server/sles client


Attachments: PDF File LU-10948_testing.pdf     File fortran.100iter.strace    
Issue Links:
Duplicate
is duplicated by LU-5426 Add more controls for open file cache Resolved
Related
is related to LU-4398 mdt_object_open_lock() may not flush ... Resolved
is related to LU-10269 Fixes for selective trybits Resolved
is related to LU-11199 mdsrate open() performance degradation Resolved
is related to LU-7915 investigate heuristics for SPARK clie... Open
is related to LU-8585 All Lustre test suites should pass wi... Open
is related to LU-10955 Cancel open lock as part of close req... Open
is related to LU-12262 Improve sbi_flags checking Resolved
is related to LU-14694 racer timeouts with LU-10948 Resolved
is related to LU-14743 Interop: sanity test 429 fails with '... Resolved
is related to LU-16463 llapi_open_by_fid open should not be ... Resolved
is related to LU-10957 Return more lock bits from MDS for op... Open
is related to LU-11623 Allow caching of open-created dentries Reopened
Epic/Theme: Performance
Rank (Obsolete): 9223372036854775807

 Description   

Listed as minor, but when a user does this we start to get phone calls from other users, and then have to page the POC to identify the offending code/user. The current workaround is to terminate the user's job(s).

Oleg has said that Lustre has an existing feature allowing a client to acquire the open lock, but it is off by default. It exists to mimic NFS behavior.

The ideal change would let us specify the number of times a file must be opened on a single client before the lock is acquired (e.g. on the 10th open).

The use case is a naive user who loops like this on 5000+ threads in Java:

do_until_the_sun_turns_black() {
    fd = open(*my_thread_ID, O_APPEND)
    calculate_something_small_but_useful()
    write(fd, *fortytwo, 42)
    close(fd)
}

Users often don't have complete control over the code they run and as a result may not be able to quickly make even simple changes.



 Comments   
Comment by Oleg Drokin [ 25/Apr/18 ]

So I took initial look into this with a simple patch that implements an inode open counter. This uncovered a number of unanticipated problems:

1. If you never properly look up the file before opening it (or do an ls in the directory that contains it), the lookups don't match (due to the invalid flag in the dentry, so dcompare rejects it) and we always enter Lustre via ll_atomic_open() with a new/negative dentry => we have no inode, so we cannot reach the counter until it's too late.
2. If we do ls -l beforehand and have the inode, then the moment we request an open lock (which ends up being lookup|open), a mode conflict invalidates the lookup|update|perms|... lock we got from ls -l. This lock revocation invalidates the file dentry, and ll_intent_file_open() does not rehash it on receiving the lookup lock, which puts us back at square one: we have the open lock, but we cannot find it. (The rehashing is easy to solve; I'll do a simple patch for that soon.)
3. The conflict arising from #2 is not detected by ELC, so it's an actual client-server-client RPC ping-pong, which is also undesirable.

I am somewhat surprised to find we do not return a lookup lock from an open request even if the open lock was not requested, since normally we'd want to do that anyway and also include a layout lock (and save an RPC fetching layout so we can start doing io right away), but I guess this is a function of mdtest penalizing us if we return any lock? There's still some looking to be done here on my part, I guess.

Comment by Gerrit Updater [ 25/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32156
Subject: LU-10948 mdt: Always return lookup lock on opens
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d7bced6acd3a1ef77db2afdbadbe4bedc0ef54c4

Comment by Gerrit Updater [ 25/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32157
Subject: LU-10948 llite: Revalidate dentries in ll_intent_file_open
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e4066c651e0fc03ec967c45fa2d99a04d82e4447

Comment by Gerrit Updater [ 25/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32158
Subject: LU-10948 llite: WIP! Introduce inode open counter
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 07d1059dcd4f0e4965b8403cd1bcf52ba391343b

Comment by Oleg Drokin [ 25/Apr/18 ]

I just pushed 3 patches that make the problem go away. The 3rd patch is a bit of a WIP in that you cannot actually tune the counter value, but it's good for quick verification. I don't know if this will apply to 2.9; that's to try at some other time, I guess (please confirm the exact version you would be able to try this on).

I tested the patches with this reproducer and it does the expected thing:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
	int fd;
	char fortytwo[42];
	int i;

	for (i = 0; i < 100 ; i++) {
		fd = open(argv[1], O_WRONLY|O_APPEND);
		if (fd == -1) {
			perror("open");
			return -1;
		}

		write(fd, fortytwo, sizeof(fortytwo));
		close(fd);
	}

	return 0;
}

Note you need to create the file beforehand.

I also verified that if you add O_CREAT (which does not require creating the file beforehand), things still work as expected and the caching still works too.
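For reference, a minimal way to exercise the reproducer and watch the cache take effect might look like the following (the mount point and stat-file names are assumptions based on the /proc paths quoted later in this ticket, not prescribed by the patches):

```shell
# Create the target file first, as noted above (mount point is an assumption).
touch /mnt/lustre/testfile

# Run the reproducer; with the patches applied, only the first ~10 opens
# should reach the MDS before the open lock is cached client-side.
./reproducer /mnt/lustre/testfile

# Compare client-side open/close counts vs. RPCs actually sent to the MDS:
# llite stats should show 100 opens while mdc md_stats stays near 10.
lctl get_param llite.*.stats | grep -E 'open|close'
lctl get_param mdc.*.md_stats | grep -E 'close|intent_lock'
```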

Likely side effect: you won't be able to write to your executables on Lustre after 10 executions; we'll need to pull in LU-4398 for that.

Comment by Oleg Drokin [ 25/Apr/18 ]

Also, I forgot to add that the patches do not fix problem #3 outlined above, but it's sidestepped by virtue of adding the lookup bit into the mix, which suddenly makes the whole lock discoverable by ELC as a conflicting one.

We still need to add the update bit into the ELC match pattern for open requests as a separate patch, I suspect.

Comment by Bob Ciotti [ 25/Apr/18 ]

From JLan (our build engineer)

All 3 patches were applied to nas-2.10.3 cleanly.

Jay

 

— so we test soon

Comment by Mahmoud Hanafi [ 18/May/18 ]

Attaching our testing results.

  1. We observed a big difference in run time.
  2. Non-patched clients run alongside patched clients showed the same results as patched clients.

LU-10948_testing.pdf

 

Comment by Oleg Drokin [ 18/May/18 ]

Thanks for the results!

Yes, the change 32156 is server-side and it's a pretty important part of the set, that's why when you don't patch the server you don't observe any positive impact.

That said, your open counts are still elevated, which means at least something is not working as planned, unless I misunderstood your test case.

Comment by Mahmoud Hanafi [ 20/May/18 ]

This is my test case.

 

      program multi_stats
! compile with ifort -o multi_stats multi_stats.f -lmpi
      use mpi
      integer (kind=8), parameter :: niter=10000
      integer (kind=8) :: i
      real (kind=8) :: t0, t1, t2
      logical :: ex
      character*50 :: filename
      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world, myid, ierr)
      call mpi_comm_size(mpi_comm_world, nprocs, ierr)
      t0 = mpi_wtime()
      write(filename,'(a,i0.8)') "test.", myid
!      print *, 'my filename is', filename
      open(10, file=filename, status="new", IOSTAT=IERR)
      close(10)
      do i = 1,niter
         open(10, file=filename, status='old', position='append')
         write (10,*) "test", i
         close(10)
!        if (myid .eq. nprocs-1) write(0,*) i
!        call sleep(1)
         call mpi_barrier(mpi_comm_world, ierr)
      enddo
 60   call mpi_barrier(mpi_comm_world, ierr)
      t1 = mpi_wtime()
      if (myid .eq. 0) print *, 'Total runtime = ', t1 - t0
      call mpi_finalize(ierr)
      end
Comment by Oleg Drokin [ 21/May/18 ]

Does the result change if you run only one instance of it, not 10 (are all 10 on the same node)?

Do you have the ability to test on 2.11? I wonder if it's just some 2.10 difference that makes this not work as expected, though a cursory check does not seem to indicate anything like that.

If a single-process test still results in elevated open counts, please try my C reproducer; if that one works as expected, please run your reproducer under strace.
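For the strace run, something along these lines should capture the relevant metadata syscalls of a single-rank run (the option set is a suggestion, not a required invocation):

```shell
# Trace file-metadata syscalls of the Fortran reproducer, following any
# child processes; write the trace to a file suitable for attaching.
strace -f -o fortran.strace.out \
    -e trace=open,openat,close,lseek,ftruncate,ioctl,stat \
    ./multi_stats
```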

Comment by Mahmoud Hanafi [ 23/May/18 ]

Here is a comparison of the C code vs. Fortran, including an strace of the Fortran run.

 C 100 iterations
----------------  
/proc/fs/lustre/mdc/nbptest2-MDT0000-mdc-ffff88203d2dd800/md_stats
close               9
intent_lock         11
/proc/fs/lustre/mdc/nbptest2-MDT0000-mdc-ffff88203d2dd800/stats
req_waittime        29
req_active          29
mds_close           9
ldlm_cancel         9
/proc/fs/lustre/llite/nbptest2-ffff88203d2dd800/stats
write_bytes         100
open                100
close               100
getxattr            10
getxattr_hits       9
inode_permission    200

/proc/fs/lustre/mdt/nbptest2-MDT0000/exports/10.151.31.132@o2ib/ldlm_stats
ldlm_enqueue        11
ldlm_cancel         9
/proc/fs/lustre/mdt/nbptest2-MDT0000/exports/10.151.31.132@o2ib/stats
open                10
close               9
getxattr            1

Fortran 100 iterations
---------------------
/proc/fs/lustre/mdc/nbptest2-MDT0000-mdc-ffff88203d2dd800/md_stats
close               99
intent_lock         302 
setattr             100
/proc/fs/lustre/mdc/nbptest2-MDT0000-mdc-ffff88203d2dd800/stats
req_waittime        497
req_active          497
mds_close           99
ldlm_cancel         99
obd_ping            1
/proc/fs/lustre/llite/nbptest2-ffff88203d2dd800/stats
write_bytes         100
ioctl               200
open                100
close               100
seek                200
truncate            100
getattr             101
getxattr            100
getxattr_hits       100 
inode_permission    406

/proc/fs/lustre/mdt/nbptest2-MDT0000/exports/10.151.31.132@o2ib/ldlm_stats
ldlm_enqueue        198
ldlm_cancel         99
/proc/fs/lustre/mdt/nbptest2-MDT0000/exports/10.151.31.132@o2ib/stats
open                99
close               99
getattr             100
setattr             100

 

fortran.100iter.strace
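The elevated seek/truncate/setattr counts suggest the Fortran runtime does more per iteration than a plain O_APPEND open. A rough, hypothetical C equivalent of one Fortran open(status='old', position='append') under that assumption (the exact syscall sequence is a guess on my part; the attached strace is authoritative):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch of what the Fortran runtime may issue for
 * open(status='old', position='append'): a getattr (stat), an open
 * without O_APPEND, a seek to the end, and an explicit truncate to the
 * current size. The ftruncate would appear as a setattr RPC on the MDS,
 * conflicting with (and cancelling) the cached open lock each time,
 * which would explain why the open cache never takes hold here. */
int fortran_append_open(const char *path)
{
	struct stat st;
	int fd;

	if (stat(path, &st) < 0)		/* getattr */
		return -1;
	fd = open(path, O_WRONLY);		/* open without O_APPEND */
	if (fd < 0)
		return -1;
	if (lseek(fd, 0, SEEK_END) < 0 ||	/* seek to end of file */
	    ftruncate(fd, st.st_size) < 0) {	/* truncate -> setattr RPC */
		close(fd);
		return -1;
	}
	return fd;
}
```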

Comment by Patrick Farrell (Inactive) [ 01/Aug/18 ]

For what it's worth, we've observed a 70-80% reduction in the mdsrate open() benchmark from the change that https://review.whamcloud.com/#/c/32156/ reverses.

Just something to consider - The impact of that change alone is enormous.

Comment by Andreas Dilger [ 02/Aug/18 ]

Patrick, when you say "reduction in the mdsrate open() benchmark", does that mean "reduction in the time taken" (== good), or "reduction in the open rate" (== bad)?

Comment by Patrick Farrell (Inactive) [ 02/Aug/18 ]

Ah, sorry, that was ambiguous.  Reduction in the open rate, so, bad news.

On a real system, this mdsrate open benchmark drops from 35K opens/second to around 11K.
This is 10K opens per process, 64 processes, opens are random files from among 300000 existing files (created by mdsrate earlier): 
aprun -n 64 /usr/lib64/lustre/tests/mdsrate -d /mnt/lustre/mdsrate --open --iters 10000 --nfile=300000

On a much smaller VM, I see a drop from 8K opens/second to 4K with this benchmark.
This is 8K opens per process, 4 processes, opens randomly selected from among 30000 existing files:
mpirun -n 4 /usr/lib64/lustre/tests/mdsrate -d /mnt/lustre/mdsrate --open --iters 8000 --nfile=30000

As discussed elsewhere, I'll open an LU for this in a few minutes.

Comment by Gerrit Updater [ 01/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32157/
Subject: LU-10948 llite: Revalidate dentries in ll_intent_file_open
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 14ca3157b21d8bd22be29c9578819b72fd39a1e5

Comment by Gerrit Updater [ 03/Jun/19 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35039
Subject: LU-10948 mdt: Remove openlock compat code with 2.1
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1f0f1746f2b2f368e76b8b188f85efe96a7f69e3

Comment by Gerrit Updater [ 16/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35039/
Subject: LU-10948 mdt: Remove openlock compat code with 2.1
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f4e39f710f9069208594870b5cdd37879b46a404

Comment by James A Simmons [ 22/Sep/19 ]

Is this resolved?

Comment by Alex Kulyavtsev [ 17/Apr/20 ]

Is this resolved?

In what Lustre release is this patch available?

 

Comment by Andreas Dilger [ 29/Apr/20 ]

This issue has not been resolved, as the main functional patches have not been landed yet. The few patches that have been landed so far are just cleaning up issues in the code.

There are two patches outstanding on this ticket that add a simple client-side per-inode counter+threshold to decide whether the client should fetch the openlock:

There is also an older patch on LU-7915 that just provides a whole-client tunable to enable opencache:

Getting the patches for this ticket updated to the latest master and reviewing/investigating the test failures would be useful for getting it landed sooner.

The LU-7915 patch might be useful for testing how much opencache would help a specific workload, but always-on opencache had been shown to be a performance hit in some cases in the past, which is why we didn't enable opencache permanently. Still, landing patch 19664 would at least give an available tunable to turn this on/off on a per-client basis if it shows a net improvement for a site's workload.
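For anyone finding this ticket later: in releases where the "open heat counter" patch (32158, merged to master 12/May/21) is present, the behavior is controlled by llite tunables. The parameter names below are taken from recent master and may differ or be absent in a given release:

```shell
# Cache the open lock once a file has been opened 10 times on this client
# within the heat-decay window (0 disables the threshold, i.e. no caching).
lctl set_param llite.*.opencache_threshold_count=10

# Related knobs introduced by the same patch (defaults are version-dependent).
lctl get_param llite.*.opencache_threshold_ms llite.*.opencache_max_ms
```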

Comment by Gerrit Updater [ 25/Dec/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41091
Subject: LU-10948 llite: Revalidate dentries in ll_intent_file_open
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: d1242a54f369712b5c39cc51e3118e9f83c5895e

Comment by Gerrit Updater [ 04/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41091/
Subject: LU-10948 llite: Revalidate dentries in ll_intent_file_open
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6ad2679352ee9a14eceb8c70b37df084bc07a6f7

Comment by Gerrit Updater [ 12/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32158/
Subject: LU-10948 llite: Introduce inode open heat counter
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 41d99c4902836b7265db946dfa49cf99381f0db4

Comment by Jay Lan (Inactive) [ 22/May/21 ]

Can you port #32158 to b2_12? Thanks!

Comment by Gerrit Updater [ 03/Jun/21 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43903
Subject: LU-10948 mdt: New connect flag for non-open-by-fid lock request
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 14edc80c617871e550d41a0f2e0b0f1491e02f37

Comment by Gerrit Updater [ 03/Jun/21 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43907
Subject: LU-10948 mdt: New connect flag for non-open-by-fid lock request
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f6acd02aa06420264b83e4ef2d4bf2f0f95794c7

Comment by Gerrit Updater [ 30/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43907/
Subject: LU-10948 mdt: New connect flag for non-open-by-fid lock request
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 72c9a6e5fb6e11fca1b1438ac18f58ff7849ed7d

Comment by Andreas Dilger [ 22/Jan/24 ]

Patch https://review.whamcloud.com/32156 "LU-10948 mdt: Always return lookup lock on opens" is not landed yet.

Generated at Sat Feb 10 02:39:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.