[LU-9348] brw_size is not always dynamically changeable Created: 15/Apr/17  Updated: 26/Nov/17  Resolved: 26/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: John Salinas (Inactive) Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: LS_RZ

Attachments: Text File OUTPUT.dmesg.txt     Text File OUTPUT.kernel_debug_trace.txt     Text File OUTPUT.lctl_dl.txt     Text File OUTPUT.show_kernelmod_params.txt     Text File OUTPUT.zpool_events.txt     Text File OUTPUT.zpool_events_verbose.txt     HTML File messages    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

On one OSS I could do:
[root@wolf-3 combined]# lctl set_param obdfilter.lsdraid-OST0000.brw_size=1
obdfilter.lsdraid-OST0000.brw_size=1
[root@wolf-3 combined]# lctl set_param obdfilter.*.brw_size=1
obdfilter.lsdraid-OST0000.brw_size=1

But on another OSS I could not do the same thing:

[root@wolf-4 combined]# lctl set_param obdfilter.*.brw_size=1
error: set_param: setting /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size=1: Numerical result out of range
[root@wolf-4 combined]# cat /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size
16
[root@wolf-4 combined]# echo "0" > /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size
-bash: echo: write error: Numerical result out of range
[root@wolf-4 combined]# echo "1" > /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size
-bash: echo: write error: Numerical result out of range
[root@wolf-4 combined]# echo "16" > /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size
[root@wolf-4 combined]# echo "17" > /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size
-bash: echo: write error: Numerical result out of range
[root@wolf-4 combined]# echo "10" > /proc/fs/lustre/obdfilter/lsdraid-OST0001/brw_size
-bash: echo: write error: Numerical result out of range

dmesg:
[ 91.466648] SPL: using hostid 0x61303830
[ 99.368278] LNet: HW CPU cores: 72, npartitions: 2
[ 99.375528] alg: No test for adler32 (adler32-zlib)
[ 99.381114] alg: No test for crc32 (crc32-table)
[ 107.466028] Lustre: Lustre: Build Version: 2.9.0_dirty
[ 107.583440] LNet: Added LNI 192.168.1.4@o2ib [8/256/0/180]
[ 107.912856] LustreError: 11-0: lsdraid-MDT0000-lwp-OST0001: operation mds_connect to node 192.168.1.5@o2ib failed: rc = -114
[ 107.977949] Lustre: lsdraid-OST0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
[ 113.824128] Lustre: lsdraid-OST0001: Will be in recovery for at least 2:30, or until 1 client reconnects
[ 113.834861] Lustre: lsdraid-OST0001: Connection restored to lsdraid-MDT0000-mdtlov_UUID (at 192.168.1.5@o2ib)
[ 114.012466] Lustre: lsdraid-OST0001: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[ 114.105679] Lustre: lsdraid-OST0001: deleting orphan objects from 0x0:82221 to 0x0:83201
[ 132.876656] LustreError: 11-0: lsdraid-MDT0000-lwp-OST0001: operation mds_connect to node 192.168.1.5@o2ib failed: rc = -114
[ 157.849631] LustreError: 11-0: lsdraid-MDT0000-lwp-OST0001: operation mds_connect to node 192.168.1.5@o2ib failed: rc = -114
[ 699.046036] Lustre: Failing over lsdraid-OST0001
[ 699.058930] Lustre: server umount lsdraid-OST0001 complete
[ 754.958545] Lustre: lsdraid-OST0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
[ 756.682541] Lustre: lsdraid-OST0001: Will be in recovery for at least 2:30, or until 1 client reconnects
[ 756.693277] Lustre: lsdraid-OST0001: Connection restored to lsdraid-MDT0000-mdtlov_UUID (at 192.168.1.5@o2ib)
[ 756.867379] Lustre: lsdraid-OST0001: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[ 756.940023] Lustre: lsdraid-OST0001: deleting orphan objects from 0x0:82221 to 0x0:83233

The impact is that this pool has to be unmounted and destroyed before testing with a different brw_size can continue. This only happens with OSTs that already contain data; on a newly formatted OST I seem to be able to change the size every time.



 Comments   
Comment by John Salinas (Inactive) [ 18/Apr/17 ]

Andreas looked this up:

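static ssize_t
ofd_brw_size_seq_write(struct file *file, const char __user *buffer,
                       size_t count, loff_t *off)
{
        /* ... */

        /* reject values above DT_MAX_BRW_SIZE or below the OST block
         * size (1 << ofd_blockbits) */
        if (val > DT_MAX_BRW_SIZE || val < (1 << ofd->ofd_blockbits))
                return -ERANGE;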

In this case I had set the ZFS recordsize to 16M on this node (but to 1M on the other node), which is why I could not change brw_size here: it cannot be set below the OST block size. I did not see this limit mentioned in the documentation, but perhaps I missed it.

Comment by Gerrit Updater [ 19/Apr/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26726
Subject: LU-9348 proc: describe brw_size limits better
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: f358669d0ee6f0cbf475e6505864dd056edf0aa7

Comment by Gerrit Updater [ 19/Apr/17 ]

Joseph Gmitter (joseph.gmitter@intel.com) merged in patch https://review.whamcloud.com/26726/
Subject: LU-9348 proc: describe brw_size limits better
Project: doc/manual
Branch: master
Current Patch Set:
Commit: b55d752a8dd903aaaeacaeaecb0ffbfa294a214f

Generated at Sat Feb 10 02:25:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.