[LU-16409] ior-hard-write test fails with EINVAL Created: 16/Dec/22  Updated: 16/Dec/22  Resolved: 16/Dec/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Lukasz Flis Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Server: b2_15 @9d1805c8b9cc1067b9b3ba186e5e3531112e08a3
+ LU-16286 + LU-15894
EL8 4.18.0-372.26.1.el8_6.x86_64

Client: b2_15 @ 906b5d9dbe82beed41f191bd69ce1f72504a77c5
EL9 5.14.0-70.22.1.el9_0.x86_64


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During an io500 run (SC22 version) we encountered a write error in the ior-hard-write phase when the posix.odirect option is supplied.

The problem does not occur without O_DIRECT.

We modified ior to be more verbose about the errno of the failing call.
The write call fails with errno 22 (EINVAL):

ERROR: write(114, 0x4c33000, 47008) failed, rc = 22, (aiori-POSIX.c:703)
ERROR: write(114, 0x2e80000, 47008) failed, rc = 22, (aiori-POSIX.c:703)
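
For reference, the fields of these messages can be decoded mechanically. The following is a minimal sketch, assuming the message format shown above (it comes from our modified ior, not from stock ior); the `decode` helper and its regular expression are hypothetical and exist only for illustration.

```python
import errno
import re

# One of the error lines reported above, copied verbatim.
LINE = "ERROR: write(114, 0x4c33000, 47008) failed, rc = 22, (aiori-POSIX.c:703)"

# Hypothetical pattern matching the modified ior message format.
PATTERN = re.compile(r"write\((\d+), (0x[0-9a-fA-F]+), (\d+)\) failed, rc = (\d+)")


def decode(line):
    """Return fd, buffer address, length and symbolic errno, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fd, buf, length, code = match.groups()
    return {
        "fd": int(fd),
        "buf": int(buf, 16),
        "len": int(length),
        # errno.errorcode maps numeric values to names on the local platform.
        "errno": errno.errorcode.get(int(code), "unknown"),
    }


if __name__ == "__main__":
    print(decode(LINE))
```

On Linux, errno 22 maps to EINVAL, which matches the value reported in the messages.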

clients: 10, 32 processes per client
servers: 6 servers, 2 osts per server

ior commandline:

./ior --dataPacketType=timestamp -C -Q 1 -g -G=1853316614 -k -e -o /net/tscratch/tests/dw//2022.12.16-15.30.44/ior-hard/file -O stoneWallingStatusFile=./results/2022.12.16-15.30.44/ior-hard.stonewall -t 47008 -b 47008 -s 200000 -w -D 300 -a POSIX --posix.odirect -O saveRankPerformanceDetailsCSV=./results/2022.12.16-15.30.44/ior-hard-write.csv -O stoneWallingWearOut=1

striping setup: -c -1 -S 32m

lfs getstripe /net/tscratch/tests/dw//2022.12.16-15.30.44/ior-hard/file
/net/tscratch/tests/dw//2022.12.16-15.30.44/ior-hard/file
lmm_stripe_count:  12
lmm_stripe_size:   33554432
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 9
    obdidx         objid         objid         group
         9           2216075         0x21d08b                 0
        11           2211594         0x21bf0a                 0
         0            848231          0xcf167                 0
         2           2231886         0x220e4e                 0
         4            800648          0xc3788                 0
         6           2251269         0x225a05                 0
         8           2235886         0x221dee                 0
        10           2199341         0x218f2d                 0
         1           2008582         0x1ea606                 0
         3           2203243         0x219e6b                 0
         5            827881          0xca1e9                 0
         7           2201995         0x21998b                 0

Additional information:

No timeouts, no evictions happened during the test phase

This issue is reproducible on every run

 

Lukasz Flis

 Comments   
Comment by Andreas Dilger [ 16/Dec/22 ]

I don't think this is a bug. O_DIRECT requires read/write be sized and aligned on 4096-byte boundaries. The ior-hard-write is doing 47008-byte writes, so this is not expected to work.
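
The arithmetic behind this can be checked quickly. This is a minimal sketch, assuming the 4096-byte alignment quoted above (the exact requirement may depend on the filesystem and kernel); `is_aligned` and `round_up` are hypothetical helpers, and the buffer address and transfer size are taken from the error messages in the description.

```python
ALIGN = 4096  # alignment assumed from the comment above; not queried from the system


def is_aligned(value, align=ALIGN):
    """True if value is a multiple of align."""
    return value % align == 0


def round_up(value, align=ALIGN):
    """Smallest multiple of align that is >= value."""
    return (value + align - 1) // align * align


XFER = 47008      # ior transfer size (-t 47008)
BUF = 0x4c33000   # buffer address from the first error message

if __name__ == "__main__":
    print(is_aligned(BUF))    # True: the buffer address is 4096-aligned
    print(is_aligned(XFER))   # False: the length is not a multiple of 4096
    print(XFER % ALIGN)       # 1952 bytes past the last 4096-byte boundary
    print(round_up(XFER))     # 49152, the next aligned size
```

The buffer address passes the check, but the 47008-byte length leaves a remainder of 1952, so the size requirement is not met. File offsets that are multiples of 47008 would mostly be unaligned for the same reason.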

Comment by Lukasz Flis [ 16/Dec/22 ]

Andreas, thank you for pointing this out. You are indeed right. Let's close this one.

Generated at Sat Feb 10 03:26:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.