[LU-2300] Wide striping leads to confusion for old clients Created: 07/Nov/12  Updated: 06/Nov/13  Resolved: 06/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Nathan Rutman Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: None

Rank (Obsolete): 5493

 Description   

1.8 client on 2.1 server:
> mkdir 161
> lfs setstripe -c 161 161
error: bad stripe count 161: Invalid argument (22)
error: setstripe: create stripe file '161' failed

Expectation #1: stripe count of "-1" gives a stripe width of 160 since that's all 1.8 can support. On large 1.8.x Lustre clusters -1 gives you 160 (without error, silent or otherwise) even if you have more OSTs.

Expectation #2: stripe count of "-1" gives a stripe width of "all OSTs", regardless of whether some clients can understand it.

Expectation #3: stripe count of "-1" gives a stripe count of 180 iff there are any clients < 2.1

Maybe we should have a MDS conf_param setting which is the "-1" stripe count value?
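The three expectations differ only in what "-1" resolves to. Below is a minimal Python sketch. It assumes a hypothetical `resolve_wide_count` helper and a hypothetical `wide_count_override` argument standing in for the MDS conf_param suggested above (no real Lustre parameter is implied). It also treats 160 as the 1.8 limit, as stated in this ticket:

```python
OLD_MAX_STRIPES = 160  # widest layout 1.8 can support, as stated in this ticket


def resolve_wide_count(ost_count, old_clients_present, expectation,
                       wide_count_override=None):
    """Return the stripe count that "-c -1" would produce (hypothetical model)."""
    if wide_count_override is not None:
        # Hypothetical admin setting: "-1" means this many stripes,
        # but never more than the number of OSTs.
        return min(wide_count_override, ost_count)
    if expectation == 1:
        # Always cap at what 1.8 can handle.
        return min(OLD_MAX_STRIPES, ost_count)
    if expectation == 2:
        # "All OSTs", whether or not every client can use the layout.
        return ost_count
    if expectation == 3:
        # Cap only while clients older than 2.1 are present.
        if old_clients_present:
            return min(OLD_MAX_STRIPES, ost_count)
        return ost_count
    raise ValueError(f"unknown expectation: {expectation}")
```

Consider a hypothetical 200-OST filesystem with 1.8 clients connected. There, expectations 1 and 3 give 160 and expectation 2 gives 200. Once the old clients are gone, expectation 3 also gives 200.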



 Comments   
Comment by Andreas Dilger [ 07/Nov/12 ]

Nathan, could you clarify your expectations more? I'm not sure which of the expectations you are advocating for, and it conflates "-c {> 160}" with "-c -1" (which could mean different things).

In #3, is 180 the number of OSTs in the system, or is this a typo and you meant 160? Also, when you write "stripe count -1", do you really mean "any number larger than 160"?

I tried "lfs setstripe -c 161" on my local filesystem and it created a file with 5 stripes (== $OSTCOUNT).

Comment by Nathan Rutman [ 08/Nov/12 ]

> I'm not sure which of the expectations you are advocating for

I'm not sure either

> it conflates "-c {> 160}" with "-c -1"

I'm specifically referring to -1 = "all available OSTs"

I kind of think #2 makes the most sense, but it leads to confusing errors on 1.8 clients: "Why did it let me set that if it can't use it?"

> is this a typo and you meant 160?

Yes. Our system in the description has more than 160 OSTs.

The error message was confusing to our customer, and it's not clear to me what the appropriate response should actually be.

Comment by Andreas Dilger [ 17/Nov/12 ]

Nathan, I think I'm open to any reasonable interpretation of usage. I recall from Oleg's early wide-striping testing that, even though the 1.8 MDT cannot store layouts with more than 160 stripes, the clients can handle more than this, possibly in the neighborhood of 250 stripes.

It definitely makes sense that 1.8 clients should not create files that they cannot use. That means the MDS should limit the layout for clients.

Do you think that the presence of 1.8 clients should limit the layout usable by 2.x clients?

Comment by Nathan Rutman [ 19/Nov/12 ]

> It definitely makes sense that 1.8 clients should not create files that they cannot use. That means the MDS should limit the layout for clients.

It means something should. Maybe the best answer is to change lfs on the clients to explicitly change -1 to 160 before submitting?
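Nathan's client-side alternative would amount to rewriting the `-c` argument in `lfs` before the request is sent. Below is a minimal sketch, assuming a hypothetical `translate_before_submit` helper. The range check mirrors the "bad stripe count" error in the description but is not taken from the actual `lfs` source:

```python
OLD_MAX_STRIPES = 160  # largest count a 1.8 client can use, per this ticket


def translate_before_submit(user_count: int) -> int:
    """Hypothetical 1.8 `lfs setstripe` pre-processing of the -c argument."""
    if user_count == -1:
        # "All OSTs" is narrowed to the widest layout a 1.8 client can use.
        # Presumably the MDS still caps it at the real OST count.
        return OLD_MAX_STRIPES
    if user_count < -1 or user_count > OLD_MAX_STRIPES:
        # Explicit counts outside the supported range are rejected.
        raise ValueError(f"bad stripe count {user_count}")
    return user_count
```

Andreas later points out that changing the 1.8 clients is no longer practical. That is why the discussion moves to enforcing the limit on the MDS.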

> Do you think that the presence of 1.8 clients should limit the layout usable by 2.x clients?

I don't really like that, but on the other hand, someone who creates a 161+ stripe file on a 2.x client and can't access it from a 1.8 client may be equally confused. If we do this, there are more questions:

  • what if a 1.8 client connects "after" a 161-stripe file is created?
  • how long do we wait since the "last" 1.8 client was seen before allowing a 161+ file?
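The second question amounts to tracking when a 1.8 client was last seen and waiting out a grace period. Below is a minimal sketch. It assumes a hypothetical `OldClientTracker` with caller-supplied timestamps in seconds and version tuples such as `(1, 8)`. The grace period is an arbitrary parameter, not a value proposed in this ticket:

```python
class OldClientTracker:
    """Hypothetical MDS-side record of when a pre-2.0 client was last seen."""

    def __init__(self, grace_seconds: float):
        self.grace_seconds = grace_seconds
        self.last_old_client_seen = None  # None: no 1.x client seen yet

    def note_connect(self, client_version: tuple, now: float) -> None:
        # Only 1.x clients (e.g. 1.8) restrict wide layouts.
        if client_version < (2, 0):
            self.last_old_client_seen = now

    def wide_layouts_allowed(self, now: float) -> bool:
        # Allow 161+ stripe layouts once the grace period has passed
        # since the last 1.x client was seen.
        if self.last_old_client_seen is None:
            return True
        return now - self.last_old_client_seen >= self.grace_seconds
```

This only gates file creation. It does not help a 1.8 client that connects after a wide file already exists (the first question). That case still needs a check when the layout is accessed.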
Comment by Nathan Rutman [ 19/Nov/12 ]

Perhaps the simplest policy is

  • MDS changes a 1.8 client -1 stripe count to min(160, ost_count)
  • MDS leaves 2.x client -1 alone
  • 1.8 clients trying to access files with 161+ stripes get an EINVAL return code from the MDS
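The three rules above can be sketched as two MDS-side checks: one at create time and one when a client fetches a layout. This is a minimal sketch, not the patch discussed in this ticket (none was landed). The function names, the `(major, minor)` version tuples, and capping explicit counts at the OST count are all assumptions:

```python
import errno

OLD_MAX_STRIPES = 160  # widest layout a 1.8 client is assumed to handle here


def stripe_count_for_create(requested: int, ost_count: int,
                            client_version: tuple) -> int:
    """Stripe count the MDS would assign at create time (hypothetical)."""
    if requested == -1:
        if client_version < (2, 0):
            # Rule 1: a 1.8 client's -1 becomes min(160, ost_count).
            return min(OLD_MAX_STRIPES, ost_count)
        # Rule 2: a 2.x client's -1 is left alone, i.e. all OSTs.
        return ost_count
    # Explicit counts are outside the proposed policy; capping them at the
    # OST count here is an assumption.
    return min(requested, ost_count)


def check_layout_access(stripe_count: int, client_version: tuple) -> int:
    """Return 0 if the client may use the layout, else -EINVAL (rule 3)."""
    if client_version < (2, 0) and stripe_count > OLD_MAX_STRIPES:
        return -errno.EINVAL
    return 0
```

Only 1.8 clients see a different create result or an access error. 2.x clients keep full-width "-1" layouts, which is the trade-off Peggy anticipates will draw user complaints.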
Comment by Peggy Gazzola [ 20/Nov/12 ]

I agree with Nathan's suggestions: a 1.8 client's stripe count of -1 should be min(160, ost_count); for 2.x clients, -1 behaves as today; 1.8 clients fail when accessing files with more than 160 stripes. I expect we'd see user complaints about the last item. Maybe add a server-side option to allow a 2.x MDS to set the -1 stripe count default to 160 (the admin can then choose to set that if their environment includes 1.8 clients)?

Comment by Andreas Dilger [ 20/Nov/12 ]

I'm largely in agreement with having the MDS enforce the limits. I don't think it is practical to change the 1.8 clients at this point.

However, it should be noted that 1.8 clients can actually handle more than 160 stripes, something like 250 or so. I suspect this was reported by Oleg in the original bug (b=4224, IIRC). I believe 1.8 clients (or possibly the MDS) will already report an error if the reply buffer is too small.

Comment by Nathan Rutman [ 20/Nov/12 ]

Xyratex CLSTR-730

Comment by Nathan Rutman [ 20/Nov/12 ]

Ok, we'll work on a patch for this as I outlined above. (Andreas - testing shows anything above 160 returns an error.)

Comment by Andreas Dilger [ 06/Nov/13 ]

I don't think 1.8 interop on systems with > 160 OSTs is a huge concern anymore, and no patch is available to fix this. Closing this old bug.

Generated at Sat Feb 10 01:24:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.