[LU-11157] sanity test_42e: invalid arithmetic operator (error token is ".9") Created: 18/Jul/18  Updated: 21/Nov/19  Resolved: 10/May/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.4

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: James A Simmons
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
is related to LU-10990 Get rid of per-osc max_dirty_mb setting Resolved
is related to LU-9091 Replace lprocfs_str_with_units_to_s64... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for John Hammond <jhammond@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f2a78f4c-8aa6-11e8-808e-52540065bddc

test_42e failed with the following error:

test_42e returned 1
== sanity test 42e: verify sub-RPC writes are not done synchronously ================================= 14:42:04 (1531924924)
total: 3500 open/close in 6.19 seconds: 565.32 ops/second
/usr/lib64/lustre/tests/sanity.sh: line 3978: 209.9: syntax error: invalid arithmetic operator (error token is ".9")

Some proc/sys/... files are returning decimal values. This one is from max_dirty_mb.
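
The failure mode can be reproduced outside the test suite. The following is a minimal sketch, not the code at sanity.sh line 3978: the variable `value` is a hypothetical stand-in for what the test reads from max_dirty_mb, and `try_arith` is an illustrative helper name. It shows that bash arithmetic expansion accepts only integers and rejects a string such as 209.9.

```shell
#!/bin/bash
# Hypothetical stand-in for the value read from max_dirty_mb.
value="209.9"

# Illustrative helper: returns 0 if bash can use its argument in
# arithmetic expansion, non-zero otherwise.
try_arith() {
    # Run in a subshell so an arithmetic error cannot abort the caller,
    # and discard the "invalid arithmetic operator" message.
    ( : $(( $1 * 2 )) ) 2>/dev/null
}

if try_arith "$value"; then
    echo "arithmetic ok for $value"
else
    echo "arithmetic failed for $value"
fi
```

An integer string such as 210 passes the same check, which is why the tests only break once the file starts reporting a fractional value.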

Looks like 14 instances of this failure today.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_42e - test_42e returned 1



 Comments   
Comment by Gerrit Updater [ 18/Jul/18 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32831
Subject: LU-11157 obd: keep dirty_max_pages a round number of MB
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2c46819ed70b222a8b762d1b2e7e1f4226de89e8

Comment by Andreas Dilger [ 18/Jul/18 ]
LU-8066 obdclass: move lustre sysctl to sysfs
    
    Backport from upstream the changes to port lustre
    sysctl to sysfs. Needed to re-export the function
    lprocfs_read_frac_helper for later work. The
    following patches were backported:
    
    Linux-commit: e2424a1265f2772b66f068c205256e2aef5f74a0
    
    Move max_dirty_mb from sysctl to sysfs. max_dirty_mb is
    now a parameter in /sys/fs/lustre.
Comment by James A Simmons [ 19/Jul/18 ]

Ouch, it's been broken upstream for some time.

Comment by John Hammond [ 19/Jul/18 ]

> Ouch, it's been broken upstream for some time.

It was only when LU-10990 and LU-8066 were combined that this started failing.

I am really not a fan of displaying decimal fractional values in these files. Generally it means that we cannot faithfully save and restore the parameter values.
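
The save/restore concern can be illustrated with a small sketch. It assumes 4 KiB pages (256 pages per MB) and a display format of one truncated decimal digit, which matches the 209.9 seen in the logs but is not taken from the kernel helpers themselves. The function names are hypothetical.

```shell
#!/bin/bash
# Assumption: 4 KiB pages, so 256 pages per MB.
PAGES_PER_MB=256

# Format a page count as MB with one decimal digit (truncating).
pages_to_mb_str() {
    local tenths=$(( $1 * 10 / PAGES_PER_MB ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

# Parse an "N.D" string (exactly one decimal digit) back into pages,
# truncating any fractional page.
mb_str_to_pages() {
    local whole=${1%%.*} frac=${1#*.}
    echo $(( (10#$whole * 10 + 10#$frac) * PAGES_PER_MB / 10 ))
}

saved=$(pages_to_mb_str 53735)
restored=$(mb_str_to_pages "$saved")
echo "original_pages=53735 saved=$saved restored_pages=$restored"
```

Under these assumptions, 53735 pages is saved as 209.9 and restored as 53734 pages, so the round trip silently changes the setting.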

However if we do keep fractional values then I think there is more to do here. There are substantial differences between lprocfs_seq_read_frac_helper() and lprocfs_read_frac_helper(). I don't know why the second function is so complicated. But this is bad. Switching from proc to sys shouldn't be changing the format used to display the values.

Comment by Gerrit Updater [ 19/Jul/18 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/32831/
Subject: LU-11157 obd: keep dirty_max_pages a round number of MB
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d3f88d376c49e4520a0d695a4b4e9b0c2dbebaaf

Comment by James Nunez (Inactive) [ 19/Jul/18 ]

In addition to sanity test 42e failing, we also see the following tests fail:
sanity test 64d with the same error message: '209.9: syntax error: invalid arithmetic operator (error token is ".9")'
conf-sanity test 76a with the error message '209.9: syntax error: invalid arithmetic operator (error token is ".9")'
recovery-small test 55 with the error 'error: set_param: setting /sys/fs/lustre/osc/lustre-OST0001-osc-ffff9b4353f50800/max_dirty_mb=209.9: Invalid argument'
sanity-dom test 42e, which runs sanity test 42e

Comment by James A Simmons [ 08/Feb/19 ]

I started looking into this. As John pointed out, the failures Nunez posted occur because the tests treat the value returned by, say, "max_dirty_mb" as an integer rather than a floating-point string. We could update the tests to feed the result into bc, since bash can't do floating-point math. The question is whether we want to make sites do the same kind of crazy. If we don't, that means we end up implementing round_up() handling like we did for dirty_max_pages. Is that okay with people?
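
For comparison, here is a rough sketch of what the test-side workaround would involve. It uses plain bash string handling instead of bc so that it has no external dependency. `ceil_decimal` is a hypothetical helper, not an existing test-framework function, and it only handles non-negative values.

```shell
#!/bin/bash
# Hypothetical helper: round a non-negative decimal string up to the
# next whole number, e.g. for a value read from max_dirty_mb.
ceil_decimal() {
    local value=$1
    local whole=${value%%.*}    # text before the first '.'
    local frac=
    [[ $value == *.* ]] && frac=${value#*.}    # text after the first '.'
    whole=${whole:-0}
    # Any non-zero digit in the fraction means rounding up.
    if [[ $frac =~ [1-9] ]]; then
        echo $(( 10#$whole + 1 ))
    else
        echo $(( 10#$whole ))
    fi
}

ceil_decimal "209.9"
```

Every script that consumes these files would need a helper like this (or a call to bc), which is the burden on sites that the question above is about.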

Comment by Andreas Dilger [ 09/Feb/19 ]

I'm fine with rounding the output to a whole number of MB. This is the only code that is using lprocfs_read_frac_helper() so it could just be removed. There are other places that are using lprocfs_seq_read_frac_helper() that will have the same issues. Back when this code was written, a few MB was a lot of memory, but now it is a rounding error for a client.
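
The arithmetic behind rounding to a whole number of MB can be sketched as follows. This is an illustration in bash integer arithmetic, not the kernel code from any patch on this ticket. The helper name `pages_to_mib` and the 4096-byte page size in the example call are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: convert a page count to MiB, rounded to nearest.
# Arguments: <pages> <page_size_in_bytes>
pages_to_mib() {
    local pages=$1 page_size=$2
    local mib=$(( 1024 * 1024 ))
    # Adding half a MiB before the integer division rounds to nearest.
    echo $(( (pages * page_size + mib / 2) / mib ))
}

# 53734 pages of 4096 bytes is roughly 209.9 MiB.
pages_to_mib 53734 4096
```

With output of this form, scripts can use the value directly in bash arithmetic and pass it back to set_param unchanged.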

Comment by Gerrit Updater [ 25/Feb/19 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34317
Subject: LU-11157 obd: round values to nearest MiB for *_mb sysfs files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3e9d90e56042d5ff9076ae057c1ff117c348b2fa

Comment by Gerrit Updater [ 10/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34317/
Subject: LU-11157 obd: round values to nearest MiB for *_mb syfs files
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ba2817fe3ead1b8e32be6d6c6ce25b490626118a

Comment by Peter Jones [ 10/May/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 07/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36393
Subject: LU-11157 obd: round values to nearest MiB for *_mb syfs files
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 350b4ee688585c034f0c0e520e2587aa4d389eda

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36393/
Subject: LU-11157 obd: round values to nearest MiB for *_mb syfs files
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 72c2d383c3e77206b0d169d894a18b5e43b7b72b
