[LU-9290] max_pages_per_rpc can't be smaller than ZFS recordsize Created: 04/Apr/17  Updated: 20/Feb/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Erich Focht Assignee: Nathaniel Clark
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Lustre 2.9.0
ZFS based OSTs


Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

At a customer site we hit LU-5718 and tried to set max_pages_per_rpc to a value lower than 256. This didn't work. Reading the source showed:

 

lustre/obdclass/lprocfs_status.c: osc_obd_max_pages_per_rpc_seq_write()

 

chunk_mask = ~((1 << (cli->cl_chunkbits - PAGE_CACHE_SHIFT)) - 1);
/* max_pages_per_rpc must be chunk aligned */
val = (val + ~chunk_mask) & chunk_mask;
if (val == 0 || (ocd->ocd_brw_size != 0 &&
                 val > ocd->ocd_brw_size >> PAGE_CACHE_SHIFT)) {
        LPROCFS_CLIMP_EXIT(dev);
        return -ERANGE;
}

chunkbits is 20. It is set in lustre/osc/osc_request.c:osc_init_grant() to

cli->cl_chunkbits = max_t(int, PAGE_SHIFT, ocd->ocd_grant_blkbits);

and ocd_grant_blkbits is set to 20. Its comment line says: /* log2 of the backend filesystem blocksize */

Once I reduced the ZFS recordsize from 1MB to 512kB and remounted the OST, I
was able to reduce max_pages_per_rpc.
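
For illustration, the rounding and range check quoted above can be reproduced in a small standalone C function. This is only a sketch: the function name, the 4 KiB page size (PAGE_CACHE_SHIFT = 12) and the 1 MiB ocd_brw_size used in the note below are assumptions, not values taken from the affected system.

```c
#include <errno.h>

/* Assumption for illustration: 4 KiB pages (PAGE_CACHE_SHIFT == 12). */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper mirroring the check in the quoted snippet.
 *   val       - requested max_pages_per_rpc
 *   chunkbits - log2 of the chunk size (cli->cl_chunkbits)
 *   brw_size  - maximum bulk RPC size in bytes (ocd->ocd_brw_size), 0 if unset
 * Returns the chunk-aligned value that would be accepted, or -ERANGE.
 */
static long sketch_max_pages_per_rpc(unsigned long val, int chunkbits,
                                     unsigned long brw_size)
{
        unsigned long chunk_mask =
                ~((1UL << (chunkbits - SKETCH_PAGE_SHIFT)) - 1);

        /* Round up to a whole number of chunks, expressed in pages. */
        val = (val + ~chunk_mask) & chunk_mask;

        if (val == 0 ||
            (brw_size != 0 && val > (brw_size >> SKETCH_PAGE_SHIFT)))
                return -ERANGE;

        return (long)val;
}
```

Under these assumptions, with chunkbits = 20 (1MB recordsize) one chunk is 256 pages, so a request for 64 pages is rounded up to 256 and a request for 257 pages is rejected with -ERANGE. With chunkbits = 19 (512kB recordsize) the same request for 64 pages is rounded up to 128, which matches the behaviour described above.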

I believe that the value set in ocd_grant_blkbits is wrong, and should actually be the ZFS ashift value (i.e. the block size) and not the recordsize.



 Comments   
Comment by Erich Focht [ 04/Apr/17 ]

The value seems to come from

lustre/osd-zfs/osd_object.c:osd_mkreg()

               rc = -dmu_object_set_blocksize(osd->od_os, db->db_object,
                                               osd->od_max_blksz, 0, oh->ot_tx);


Comment by Peter Jones [ 04/Apr/17 ]

Nathaniel

Could you please advise with this one?

Thanks

Peter

Comment by Erich Focht [ 04/Apr/17 ]

Ignore my previous comment. I have no clue, yet, where the value is being set.

Comment by Jinshan Xiong (Inactive) [ 06/Apr/17 ]

Hi Erich,

The chunk size comes from ofd_brw_size, which is derived from ofd_block_bits and, in turn, from od_max_blksz of ZFS. The restriction exists because ZFS incurs a huge penalty for writes that cover only part of a record.

May I ask why the customer would like to set max_pages_per_rpc to less than the record size? Essentially, this will cause every single write to ZFS to be smaller than a record.

Comment by Erich Focht [ 06/Apr/17 ]

Hi Jinshan,

The issue we have is NEC-37. We cannot apply the workaround described in LU-5718 if this is left as it is, unless we reduce the record size in ZFS. I expected to be able to change this, even if the values lead to lower performance...

Best regards
Erich

 
