
[LU-10948] client cache open lock after N opens

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version: Lustre 2.9.0
    • Environment: cent server/sles client

    Description

      This is listed as minor, but when a user does this we start to get phone calls from other users, and then we page the POC to identify the offending code/user. The workaround is to terminate the user's job(s).

      Oleg has said that Lustre has an existing feature for a client to acquire an open lock, but it is off by default. This is to mimic NFS behavior.

      The ideal change would be that we can specify the number of times a file must be opened on a single client before the open lock is acquired (e.g. on the 10th open).

      The use case is a naive user who loops in this way on 5000+ threads in Java:

      do_until_the_sun_turns_black()
      {
          fd = open(*my_thread_ID, O_APPEND)
          calculate_something_small_but_useful()
          write(fd, *fortytwo, 42)
          close(fd)
      }

      Users often don't have complete control over the code they run and as a result may not be able to quickly make even simple changes.
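
      For illustration only, here is the loop above as a compilable C sketch using POSIX threads (the thread count, iteration count, and file naming are hypothetical, not taken from the actual user code):

      /* Hypothetical stand-in for the user's per-thread open/append/close loop.
       * Build with: cc -pthread -o openloop openloop.c (file name illustrative). */
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      #define NTHREADS 16     /* the real workload used 5000+ threads */
      #define NITER    1000   /* illustrative iteration count */

      static void *worker(void *arg)
      {
              char path[64];
              char fortytwo[42] = { 0 };
              long id = (long)arg;
              int i;

              /* one file per thread, named after the thread ID */
              snprintf(path, sizeof(path), "thread.%ld", id);

              for (i = 0; i < NITER; i++) {
                      /* every iteration re-opens the file, so without a cached
                       * open lock each open() is an RPC round trip to the MDS */
                      int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
                      if (fd == -1) {
                              perror("open");
                              return NULL;
                      }
                      /* calculate_something_small_but_useful() would go here */
                      write(fd, fortytwo, sizeof(fortytwo));
                      close(fd);
              }
              return NULL;
      }

      int main(void)
      {
              pthread_t tid[NTHREADS];
              long i;

              for (i = 0; i < NTHREADS; i++)
                      pthread_create(&tid[i], NULL, worker, (void *)i);
              for (i = 0; i < NTHREADS; i++)
                      pthread_join(tid[i], NULL);
              return 0;
      }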


          Activity

            [LU-10948] client cache open lock after N opens
            green Oleg Drokin added a comment -

            Does the result change if you only run one instance of it, not 10 (are all 10 on the same node)?

            Do you have the ability to test on 2.11? I wonder if it's just some 2.10 difference that makes this not work as expected, though a cursory check does not seem to indicate anything like that.

            If a single-process test still results in elevated open counts, please try my C reproducer; if that one works as expected, please run your reproducer under strace.


            mhanafi Mahmoud Hanafi added a comment -

            This is my test case.

             

                   program multi_stats
            ! compile with ifort -o multi_stats multi_stats.f -lmpi
                  use mpi
                  integer (kind=8), parameter :: niter=10000
                  integer (kind=8) :: i
                  real (kind=8) :: t0, t1, t2
                  logical :: ex
                  character*50 :: filename
                  call mpi_init(ierr)
                  call mpi_comm_rank(mpi_comm_world, myid, ierr)
                  call mpi_comm_size(mpi_comm_world, nprocs, ierr)
                  t0 = mpi_wtime()
                  write(filename,'(a,i0.8)') "test.", myid
            !      print *, 'my filename is', filename
                  open(10, file=filename, status="new", IOSTAT=IERR)
                  close(10)
                  do i = 1,niter
                     open(10, file=filename, status='old', position='append')
                     write ( 10,*) "test", i
                     close(10)
            !      if (myid .eq. nprocs-1) write(0,*) i
            !      call sleep(1)
                  call mpi_barrier(mpi_comm_world, ierr)
                  enddo
             60   call mpi_barrier(mpi_comm_world, ierr)
                  t1 = mpi_wtime()
                  if (myid .eq. 0) print *, 'Total runtime = ',t1 - t0
                  call mpi_finalize(ierr)
                  end 
            
            green Oleg Drokin added a comment -

            Thanks for the results!

            Yes, the change 32156 is server-side and it's a pretty important part of the set; that's why, when you don't patch the server, you don't observe any positive impact.

            That said, your open counts are still elevated, which means at least something is not working as planned, unless I misunderstood your test case.


            mhanafi Mahmoud Hanafi added a comment -

            Attaching our testing results.

            1. We observed a big difference in run time.
            2. Non-patched clients had the same results as patched clients.

            LU-10948_testing.pdf

             


            Bob.C Bob Ciotti (Inactive) added a comment -

            From JLan (our build engineer):

            All 3 patches were applied to nas-2.10.3 cleanly.

            Jay

             

            — so we test soon

            green Oleg Drokin added a comment -

            Also, I forgot to add that the patches do not fix problem #3 outlined below, but it is sidestepped by virtue of adding the lookup bit into the mix, which suddenly makes that whole lock discoverable by ELC as a conflicting one.

            We still need to add the update bit into the ELC match pattern for open requests as a separate patch, I suspect.
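
            As a standalone illustration of why the match pattern bits matter for early lock cancel (ELC): inodebit locks are bitmasks, and ELC can only cancel a cached lock whose bits overlap the bits being matched. The bit values and function below are hypothetical, not the real MDS_INODELOCK_* definitions or Lustre code.

            /* Illustration only: hypothetical bit values, not Lustre's. */
            #include <stdbool.h>
            #include <stdio.h>

            #define BIT_LOOKUP 0x01
            #define BIT_UPDATE 0x02
            #define BIT_OPEN   0x04
            #define BIT_PERM   0x08

            /* ELC-style check: can this cached lock be cancelled early because
             * it conflicts with the bits the new request matches on? */
            static bool elc_would_cancel(unsigned cached_bits, unsigned match_bits)
            {
                    return (cached_bits & match_bits) != 0;
            }

            int main(void)
            {
                    /* lock cached from an earlier "ls -l": lookup|update|perm */
                    unsigned cached = BIT_LOOKUP | BIT_UPDATE | BIT_PERM;

                    /* matching on lookup|open finds the conflict via the lookup bit */
                    printf("%d\n", elc_would_cancel(cached, BIT_LOOKUP | BIT_OPEN));

                    /* but a lock holding only update|perm would still be missed
                     * unless the update bit is added to the match pattern */
                    printf("%d\n", elc_would_cancel(BIT_UPDATE | BIT_PERM,
                                                    BIT_LOOKUP | BIT_OPEN));
                    return 0;
            }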

            green Oleg Drokin added a comment - - edited

            I just pushed 3 patches that make the problem go away. The 3rd patch is a bit of a WIP in that you cannot actually tune the counter value, but it is good for quick verification. I don't know if this will apply to 2.9; that is something to try at some other time, I guess (please confirm the exact version you would be able to try this on).

            I tested the patches with this reproducer and it does the expected thing:

            #include <stdio.h>
            #include <stdlib.h>
            #include <unistd.h>
            #include <fcntl.h>
            
            int main(int argc, char **argv)
            {
            	int fd;
            	char fortytwo[42];
            	int i;
            
            	/* open/append/close the same file 100 times in a row */
            	for (i = 0; i < 100 ; i++) {
            		fd = open(argv[1], O_WRONLY|O_APPEND);
            		if (fd == -1) {
            			perror("open");
            			return -1;
            		}
            
            		write(fd, fortytwo, sizeof(fortytwo));
            		close(fd);
            	}
            
            	return 0;
            }
            

            Note you need to create the file beforehand.

            I also verified that if you add O_CREAT (which does not require creating the file beforehand), things still work as expected and the open lock is still cached.

            Likely side effect: you won't be able to write into your executables on Lustre after 10 executions - we'll need to pull in LU-4398 for that.


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32158
            Subject: LU-10948 llite: WIP! Introduce inode open counter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 07d1059dcd4f0e4965b8403cd1bcf52ba391343b


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32157
            Subject: LU-10948 llite: Revalidate dentries in ll_intent_file_open
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e4066c651e0fc03ec967c45fa2d99a04d82e4447


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32156
            Subject: LU-10948 mdt: Always return lookup lock on opens
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d7bced6acd3a1ef77db2afdbadbe4bedc0ef54c4

            green Oleg Drokin added a comment - - edited

            So I took an initial look into this with a simple patch that implements an inode open counter. This uncovered a number of unanticipated problems:

            1. If you never properly look up the file before doing the open (or do an ls in the directory that contains it), lookups don't match (the invalid flag is set on the dentry, so dcompare rejects it) and we always enter Lustre via ll_atomic_open() with a new/negative dentry => we have no inode, so we cannot reach the counter until it's too late.
            2. If we do ls -l beforehand and have the inode, then the moment we request an open lock (which ends up being lookup|open), a mode conflict invalidates the lookup|update|perms|... lock we got from ls -l. This lock revocation invalidates the file dentry, and ll_intent_file_open() does not rehash it on receiving the lookup lock, which puts us back at square 1: we have the open lock, but we cannot find it (the rehashing is easy to solve; I'll do a simple patch for that soon).
            3. The conflict arising from #2 is not detected by ELC, so it's an actual client-server-client RPC ping-pong, which is also undesirable.

            I am somewhat surprised to find we do not return a lookup lock from an open request even if the open lock was not requested, since normally we'd want to do that anyway and also include a layout lock (and save an RPC fetching the layout so we can start doing I/O right away), but I guess this is a function of mdtest penalizing us if we return any lock? There's still some looking to be done here on my part, I guess.
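
            As a rough illustration of what the open-counter idea amounts to, here is a standalone sketch with hypothetical names and a hypothetical threshold; it is not the actual llite patch (which in its WIP form has no tunable at all).

            /* Standalone sketch of a per-inode open counter with a caching threshold.
             * Names and the threshold value are hypothetical; the real change lives
             * in llite and is not tunable in the WIP patch. */
            #include <stdatomic.h>
            #include <stdbool.h>
            #include <stdio.h>

            #define OPEN_LOCK_THRESHOLD 10  /* e.g. ask for the open lock on the 10th open */

            struct open_state {
                    atomic_uint open_count;  /* opens seen by this client for this inode */
            };

            /* Called on every open; returns true once the file has been opened
             * often enough that requesting a cached open lock is worthwhile. */
            static bool want_open_lock(struct open_state *s)
            {
                    unsigned n = atomic_fetch_add(&s->open_count, 1) + 1;

                    return n >= OPEN_LOCK_THRESHOLD;
            }

            int main(void)
            {
                    struct open_state st = { .open_count = 0 };
                    int i;

                    for (i = 1; i <= 12; i++)
                            printf("open #%d: request open lock = %d\n",
                                   i, want_open_lock(&st));
                    return 0;
            }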


            People

              Assignee: green Oleg Drokin
              Reporter: Bob.C Bob Ciotti (Inactive)
              Votes: 0
              Watchers: 18
