[LU-15748] interop: sanity test_150b: fallocate failed, error Operation not supported, mode 0, offset 62914560, len 4194304 Created: 14/Apr/22  Updated: 27/Apr/23  Resolved: 29/Nov/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Arshad Hussain
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15551 Interop: Various tests fail with fall... Resolved
is related to LU-14382 Implement fallocate() support at MDT Resolved
is related to LU-14160 Implement fallocate FALLOCATE_FL_PUNC... Resolved
is related to LU-15167 fallocate does not increase quota usage Resolved
is related to LU-15877 sanity test_150b: fallocate failed, e... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/1c15b404-f4af-4275-9a77-f5a50b30ca56

keep default fallocate mode: 0
fallocate failed, error Operation not supported, mode 0, offset 62914560, len 4194304
 sanity test_150b: @@@@@@ FAIL: fallocate failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/sanity.sh:13441:test_150b()
  = /usr/lib64/lustre/tests/test-framework.sh:6576:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6623:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6465:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:13443:main()
Dumping lctl log to /autotest/autotest-1/2022-04-03/lustre-master_full-part-2_4283_1_29_b51025b5-001f-45f2-9eb2-ebe47692e36c//sanity.test_150b.*.1649017567.log


 Comments   
Comment by Patrick Farrell [ 18/Apr/22 ]

arshad512, are you able to take a look at this?

Comment by Arshad Hussain [ 18/Apr/22 ]

@Patrick Farrell, sorry, I missed this JIRA. I will have a look and update on this.

Comment by Patrick Farrell [ 18/Apr/22 ]

arshad512 - no worries, it's quite new.  Thanks for taking a look.

Comment by Arshad Hussain [ 19/Apr/22 ]

@Andreas, @Patrick

This is an interop bug, seen with a 2.14 client against a 2.15.0-RC3 server. A new check was added in lustre/ofd/ofd_dev.c when fallocate(punch) was introduced, and it always fails for these clients. This is because the osc layer did not set the correct flags in the o_valid field of the obdo when passing it to the server.

if ((oa->o_valid & (OBD_MD_FLSIZE | OBD_MD_FLBLOCKS)) !=                
    (OBD_MD_FLSIZE | OBD_MD_FLBLOCKS)) { 
    ...
}
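
As a minimal sketch of why the check above rejects such a request: the test passes only when every required bit is set, so a client that fills in the values but leaves the valid flags clear is refused. The DEMO_* names and bit values below are illustrative assumptions, not the definitions from the Lustre headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real ones live in the Lustre headers. */
#define DEMO_MD_FLSIZE   0x10ULL
#define DEMO_MD_FLBLOCKS 0x80ULL

/* True only when every bit in 'required' is also set in 'valid'. */
static bool demo_has_all_flags(uint64_t valid, uint64_t required)
{
	return (valid & required) == required;
}
```

With this helper, a request carrying only one of the two flags (or neither) fails the test, which matches the behaviour seen from the 2.14 client.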

This was reproducible (without the patch) on a 2.14 client:

# ./check_fallocate /mnt/lustre/a
fallocate failed, error Operation not supported, mode 0, offset 62914560, len 4194304

 

After patch on 2.14 client:

# ./check_fallocate /mnt/lustre/a
# ls -ali /mnt/lustre/a 
144115205272502273 -rwx------ 1 root root 125829120 Apr 19 05:56 /mnt/lustre/a

Since the check is valid, the patch must be back-ported to 2.14. Another thing to note is that, within check_fallocate.c, test_prealloc_nonsparse() always carries the correct o_valid flags, whereas test_prealloc_sparse() does not, and that is the case that was failing.
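
For reference, the failing call can be approximated with a few lines of C. This is a sketch under stated assumptions, not the code from check_fallocate.c: demo_prealloc() is a hypothetical helper that issues a single fallocate(2) call and reports the errno, using the mode, offset, and length from the error message above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Preallocate [offset, offset + len) in 'path' with the given mode.
 * Returns 0 on success or a negative errno value on failure.
 */
static int demo_prealloc(const char *path, int mode, off_t offset, off_t len)
{
	int fd = open(path, O_CREAT | O_RDWR, 0700);
	int rc = 0;

	if (fd < 0)
		return -errno;
	if (fallocate(fd, mode, offset, len) < 0)
		rc = -errno;
	close(fd);
	return rc;
}
```

Calling demo_prealloc("/mnt/lustre/a", 0, 62914560, 4194304) from an affected client would be expected to return -EOPNOTSUPP, matching the "Operation not supported" message in the test log.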

Comment by Gerrit Updater [ 19/Apr/22 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/47091
Subject: LU-15748 osc: Correctly pass obdo(o_valid) flags
Project: fs/lustre-release
Branch: b2_14
Current Patch Set: 1
Commit: 3dd05b38f5a3e779dea73b824ad3a6390e387327

Comment by Andreas Dilger [ 19/Apr/22 ]

It isn't really possible to retroactively fix older releases in this way. Even if this patch landed on b2_14 it would not help the sites that are running 2.14.0.

Do you know if 2.14.0 correctly populated these fields, but just didn't set the flags? In that case, one solution would be to disable the new flag check on master before 2.15.0 is released, and put it under:

 #if LUSTRE_VERSION_CODE > OBD_OCD_VERSION(2, 18, 52, 0)

and only check that the values are set for the next few releases. Also, we might (initially) make this check conditional on the use of FALLOC_FL_PUNCH_HOLE, since that is when the client started setting these flags.
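
A rough sketch of how such a compile-time version gate behaves is shown here. The DEMO_* macros are stand-ins: the one-byte-per-field packing is an assumption about how the version code is built, and the 2.15.0.0 build version is hypothetical.

```c
/* Assumed packing: one byte each for major, minor, patch, fix. */
#define DEMO_OCD_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical build version for this sketch: 2.15.0.0. */
#define DEMO_VERSION_CODE DEMO_OCD_VERSION(2, 15, 0, 0)

/* Returns 1 when the strict flag check would be compiled in. */
static int demo_strict_check_enabled(void)
{
#if DEMO_VERSION_CODE > DEMO_OCD_VERSION(2, 18, 52, 0)
	return 1;
#else
	return 0;
#endif
}
```

Under these assumptions the strict check stays compiled out for a 2.15 build and only becomes active once the tree version passes 2.18.52.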

Alternately, the server could check if the client version is >= 2.15.0 before enforcing the check, but this makes things complicated if backporting the punch feature to older releases (though at worst a sanity check is removed that might still be reasonably validated by checking that the o_size and o_blocks values are sane). Alternately, it could be checked by an OBD_CONNECT2_* flag already added in 2.15.0, but I'm a bit reluctant to burn a new flag for this minor issue (though it wouldn't be the end of the world if others think that is needed).

It looks like the lack of OBD_CONNECT_TRUNCLOCK would indicate a newer client (it was set by 2.14.0 clients and sent to both MDTs and OSTs, but not by 2.15.0 clients), which would give us something like:

/* was OBD_CONNECT_TRUNCLOCK           0x400ULL *locks on server for punch */
/* temporary usage until 2.21.53 to indicate pre-2.15 client, see LU-15748 */
#define OBD_CONNECT_OLD_FALLOC         0x400ULL /* missing o_valid flags */

        /*
         * fallocate() start and end are passed in o_size and o_blocks
         * on the wire.  Clients 2.15.0 and newer should always set
         * the OBD_MD_FLSIZE and OBD_MD_FLBLOCKS valid flags, but some
         * older client versions did not.  We permit older clients to
         * not set these flags, checking their version by proxy using
         * the presence of OBD_CONNECT_OLD_FALLOC (the former
         * OBD_CONNECT_TRUNCLOCK bit) to imply 2.14.0 and older.
         * 
         * Return -EOPNOTSUPP to also work with older clients not
         * supporting newer server modes.
         */
        if ((oa->o_valid & (OBD_MD_FLSIZE | OBD_MD_FLBLOCKS)) !=
            (OBD_MD_FLSIZE | OBD_MD_FLBLOCKS)
#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(2, 21, 53, 0)
            && !(tgt_conn_flags(tsi) & OBD_CONNECT_OLD_FALLOC)
#endif
            )
                RETURN(-EOPNOTSUPP);

        start = oa->o_size;
        end = oa->o_blocks;
        /* verify arguments are sane (len <= 0 also denied by client VFS) */
        if (start >= end)
                RETURN(-EINVAL);

        /*
         * mode == 0 (which is standard prealloc) and PUNCH is supported
         * Rest of mode options are not supported yet.
         */
        mode = oa->o_falloc_mode;
        if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
                RETURN(-EOPNOTSUPP);

NOTE the same problem exists in mdt_fallocate_hdl() but has not even been fixed with the LU-15551 patch. It will need a similar change to the logic.
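
The relaxed check proposed above can be modelled outside the kernel roughly as follows. This is only a sketch: struct demo_falloc_req and demo_check_falloc() are hypothetical, the DEMO_MD_* bit values are illustrative, and only the 0x400 connect bit is taken from the snippet in this comment.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values; the real definitions are in the Lustre headers. */
#define DEMO_MD_FLSIZE          0x10ULL
#define DEMO_MD_FLBLOCKS        0x80ULL
#define DEMO_CONNECT_OLD_FALLOC 0x400ULL

struct demo_falloc_req {
	uint64_t valid;	/* stands in for oa->o_valid */
	uint64_t start;	/* stands in for oa->o_size */
	uint64_t end;	/* stands in for oa->o_blocks */
};

/* Returns 0 if the request is acceptable, else a negative errno. */
static int demo_check_falloc(const struct demo_falloc_req *req,
			     uint64_t conn_flags)
{
	const uint64_t need = DEMO_MD_FLSIZE | DEMO_MD_FLBLOCKS;

	/* Missing valid flags are tolerated only from old clients. */
	if ((req->valid & need) != need &&
	    !(conn_flags & DEMO_CONNECT_OLD_FALLOC))
		return -EOPNOTSUPP;
	/* The range must be non-empty regardless of client version. */
	if (req->start >= req->end)
		return -EINVAL;
	return 0;
}
```

In this model a request with no valid flags is accepted when the connection carries the old-client bit and rejected with -EOPNOTSUPP otherwise, while the range sanity check applies to every client.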

Comment by Gerrit Updater [ 20/Apr/22 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/47098
Subject: LU-15748 osc: Make older clients work with newer servers without setting OBD_MD_* flags
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 22addaa242320addb7140c607a68f3e8500e8cc2

Comment by Gerrit Updater [ 28/Apr/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47098/
Subject: LU-15748 osc: fallocate interop for 2.14 clients
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 79053592966792a21912390b9db09b1ab1d61e2a

Comment by Peter Jones [ 28/Apr/22 ]

Landed for 2.15

Comment by Andreas Dilger [ 06/Jun/22 ]

I think the patch to fix the interop is backward and needs a minor fix.

Comment by Gerrit Updater [ 06/Jun/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47548
Subject: LU-15748 ofd: fix fallocate interop for older clients
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 98529e39d92e9d46e98b015327d52d0e382481b8

Comment by Gerrit Updater [ 06/Jun/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47549
Subject: LU-15748 ofd: test fallocate interop for older clients
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 709a3d698ba73805ae009ea82390960cb3b0ed3e

Comment by Gerrit Updater [ 21/Aug/22 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/48274
Subject: LU-15748 tests: fix fallocate interop for older clients
Project: fs/lustre-release
Branch: b2_14
Current Patch Set: 1
Commit: e02fda0198f7b6455dc78a433685504b0c864893

Comment by Gerrit Updater [ 29/Nov/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47548/
Subject: LU-15748 ofd: fix fallocate interop for older clients
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 91cb04ef800a8ae9384777286ee1282a262ee89e

Comment by Peter Jones [ 29/Nov/22 ]

Landed for 2.15
