Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15572

Interop sanity-flr failing with "cannot get UNLOCK lease"

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.15.0
    • Lustre 2.15.0, Lustre 2.15.1
    • None
    • 3
    • 9223372036854775807

    Description

      Running sanity-flr in interop mode fails a number of subtests:
      https://testing.whamcloud.com/test_sets/0a29fd64-d2e6-4cbf-b636-b1deae709128

      lfs mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
      

      Attachments

        Issue Links

          Activity

            [LU-15572] Interop sanity-flr failing with "cannot get UNLOCK lease"
            sarah Sarah Liu added a comment -

            Still seeing this error in sanity-flr test_50a in 2.15.1 interop with 2.12.9 el7.9 servers. In the current testing, it seems only happened to subtest 50a
            https://testing.whamcloud.com/test_sets/1f2bc891-06de-4859-97b1-f51a7ed6cf2f

            trevis-96vm2: trevis-96vm2.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
            trevis-96vm1: CMD: trevis-96vm1.trevis.whamcloud.com lctl get_param -n at_max
            trevis-96vm1: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
            trevis-96vm2: CMD: trevis-96vm2.trevis.whamcloud.com lctl get_param -n at_max
            trevis-96vm2: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
            lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
            /mnt/lustre/d50A.sanity-flr/f50A.sanity-flr layout generation from 2 to 1
             sanity-flr test_50A: @@@@@@ FAIL: split does not increase layout gen from 2 to 1 
            
            sarah Sarah Liu added a comment - Still seeing this error in sanity-flr test_50a in 2.15.1 interop with 2.12.9 el7.9 servers. In the current testing, it seems only happened to subtest 50a https://testing.whamcloud.com/test_sets/1f2bc891-06de-4859-97b1-f51a7ed6cf2f trevis-96vm2: trevis-96vm2.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid trevis-96vm1: CMD: trevis-96vm1.trevis.whamcloud.com lctl get_param -n at_max trevis-96vm1: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec trevis-96vm2: CMD: trevis-96vm2.trevis.whamcloud.com lctl get_param -n at_max trevis-96vm2: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16) /mnt/lustre/d50A.sanity-flr/f50A.sanity-flr layout generation from 2 to 1 sanity-flr test_50A: @@@@@@ FAIL: split does not increase layout gen from 2 to 1
            pjones Peter Jones added a comment -

            Landed for 2.15

            pjones Peter Jones added a comment - Landed for 2.15

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46614/
            Subject: LU-15572 util: mirror delete with old MDS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 79f8a4a38c71c7f06ac561ca4cd6d6b9cb5aa4fb

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46614/ Subject: LU-15572 util: mirror delete with old MDS Project: fs/lustre-release Branch: master Current Patch Set: Commit: 79f8a4a38c71c7f06ac561ca4cd6d6b9cb5aa4fb

            "Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/46614
            Subject: LU-15572 util: mirror delete with old MDS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cdd3a1299ca77face6078caa254ffa2b493cbf3e

            gerrit Gerrit Updater added a comment - "Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/46614 Subject: LU-15572 util: mirror delete with old MDS Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: cdd3a1299ca77face6078caa254ffa2b493cbf3e

            Bobijam, rather than pasting the patch here as a comment, could you please submit it as a patch to Gerrit and add a Test-Parameters line to run it in interop mode:

            Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61"
            Test-Parameters: clientversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61"
            Test-Parameters: serverversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61"
            Test-Parameters: clientversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61"
            
            adilger Andreas Dilger added a comment - Bobijam, rather than pasting the patch here as a comment, could you please submit it as a patch to Gerrit and add a Test-Parameters line to run it in interop mode: Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61" Test-Parameters: clientversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61" Test-Parameters: serverversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61" Test-Parameters: clientversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61"
            bobijam Zhenyu Xu added a comment -

            Could use following patch for 2.15 to catch older MDS ambiguous error code

            diff --git a/lustre/utils/lfs.c b/lustre/utils/lfs.c
            index 3a1511fe05..e5d245af3e 100644
            --- a/lustre/utils/lfs.c
            +++ b/lustre/utils/lfs.c
            @@ -2329,7 +2329,7 @@ again:
                    data->lil_ids[1] = mirror_id;
                    rc = llapi_lease_set(fd, data);
                    if (rc <= 0) {
            -               if (rc == -EINVAL && purge) {
            +               if (purge) {
                                    /* could be old MDS which prohibit fd==fdv */
                                    purge = false;
                                    goto again; 
            bobijam Zhenyu Xu added a comment - Could use following patch for 2.15 to catch older MDS ambiguous error code diff --git a/lustre/utils/lfs.c b/lustre/utils/lfs.c index 3a1511fe05..e5d245af3e 100644 --- a/lustre/utils/lfs.c +++ b/lustre/utils/lfs.c @@ -2329,7 +2329,7 @@ again:         data->lil_ids[1] = mirror_id;         rc = llapi_lease_set(fd, data);         if (rc <= 0) { -               if (rc == -EINVAL && purge) { +               if (purge) {                         /* could be old MDS which prohibit fd==fdv */                         purge = false ;                         goto again;
            bobijam Zhenyu Xu added a comment -

            I think it could be patch https://review.whamcloud.com/42116 "LU-14521 'flr: delete mirror without volatile file'" does not catch proper error code from old MDS (it expects -EINVAL from MDS while old server returns -EBUSY in mirror delete).

            bobijam Zhenyu Xu added a comment - I think it could be patch https://review.whamcloud.com/42116 " LU-14521 'flr: delete mirror without volatile file'" does not catch proper error code from old MDS (it expects -EINVAL from MDS while old server returns -EBUSY in mirror delete).
            bobijam Zhenyu Xu added a comment -

            The patch https://review.whamcloud.com/45636 "LU-15268 ‘mdt: reveal the real intent close error code" is a server only patch, it should something else causing the interop failure.

            bobijam Zhenyu Xu added a comment - The patch https://review.whamcloud.com/45636  " LU-15268  ‘mdt: reveal the real intent close error code " is a server only patch, it should something else causing the interop failure.
            bobijam Zhenyu Xu added a comment - - edited

            Could be an interop issue between 2.15.0 client and 2.14.0 MDS on mirror split operation.

            test_0b:
            lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
            error lfs mirror delete: cannot split '/mnt/lustre/d0b.sanity-flr/f0b.sanity-flr': Device or resource busy

            test_50A:
            lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
            error lfs mirror split: cannot split '/mnt/lustre/d50A.sanity-flr/f50A.sanity-flr': Device or resource busy

            test_60b:
            lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
            error lfs mirror split: cannot split '/mnt/lustre/d60b.sanity-flr/f60b.sanity-flr': Device or resource busy

            test_61b:
            lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
            error lfs mirror split: cannot split '/mnt/lustre/d61b.sanity-flr/f61b.sanity-flr': Device or resource busy

            test_203:
            lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16)
            error lfs mirror delete: cannot split '/mnt/lustre/f203.sanity-flr': Device or resource busy

            take test_60b as an example:
            on MDS, it received the file close intent for mirror split request, and finished it successfully

            00000100:00100000:1.0:1645359656.359528:0:991126:0:(service.c:2142:ptlrpc_server_handle_req_in()) got req x1725282051462784
            00000100:00100000:1.0:1645359656.359542:0:991126:0:(service.c:2299:ptlrpc_server_handle_request()) Handling RPC req@0000000030f1e911 pname:cluuid+ref:pid:xid:nid:opc:job mdt_rdpg00_001:67bf88ef-fd73-4c00-9a58-d65897456138+35:1227937:x1725282051462784:12345-10.240.25.52@tcp:35:lfs.0
            ...
            00000100:00100000:1.0:1645359656.362030:0:991126:0:(service.c:2348:ptlrpc_server_handle_request()) Handled RPC req@0000000030f1e911 pname:cluuid+ref:pid:xid:nid:opc:job mdt_rdpg00_001:67bf88ef-fd73-4c00-9a58-d65897456138+32:1227937:x1725282051462784:12345-10.240.25.52@tcp:35:lfs.0 Request processed in 2487us (2537us total) trans 85899346091 rc 0/0
            

            while on client, I don't see the failure reason

            00000080:00200000:1.0:1645359656.359344:0:1227937:0:(file.c:3896:ll_file_ioctl()) VFS Op:inode=[0x2000090a6:0x4e:0x0](00000000a9e821ea), cmd=c00c66f3
            00000002:00100000:1.0:1645359656.359354:0:1227937:0:(mdc_request.c:944:mdc_close()) @@@ matched open  req@00000000d7fc9409 x1725282051460736/t85899346086(85899346086) o101->lustre-MDT0000-mdc-ffff913d103c9000@10.240.25.55@tcp:12/10 lens 576/656 e 0 to 0 dl 1645359663 ref 1 fl Complete:RPQU/4/ffffffff rc 0/-1 job:'lfs.0'
            00000100:00100000:1.0:1645359656.359372:0:1227937:0:(client.c:742:ptlrpc_reassign_next_xid()) @@@ reassign xid  req@000000008e2795d6 x1725282051462784/t0(0) o35->lustre-MDT0000-mdc-ffff913d103c9000@10.240.25.55@tcp:23/10 lens 504/1072 e 0 to 0 dl 0 ref 1 fl New:QU/0/ffffffff rc 0/-1 job:''
            00000100:00100000:1.0:1645359656.359378:0:1227937:0:(client.c:1733:ptlrpc_send_new_req()) Sending RPC req@000000008e2795d6 pname:cluuid:pid:xid:nid:opc:job lfs:67bf88ef-fd73-4c00-9a58-d65897456138:1227937:1725282051462784:10.240.25.55@tcp:35:lfs.0
            ...
            00000100:00100000:1.0:1645359656.362161:0:1227937:0:(client.c:2210:ptlrpc_check_set()) Completed RPC req@000000008e2795d6 pname:cluuid:pid:xid:nid:opc:job lfs:67bf88ef-fd73-4c00-9a58-d65897456138:1227937:1725282051462784:10.240.25.55@tcp:35:lfs.0
            00000002:00100000:1.0:1645359656.362168:0:1227937:0:(mdc_request.c:854:mdc_free_open()) @@@ free open request, rq_replay=0  req@00000000d7fc9409 x1725282051460736/t85899346086(85899346086) o101->lustre-MDT0000-mdc-ffff913d103c9000@10.240.25.55@tcp:12/10 lens 576/656 e 0 to 0 dl 1645359663 ref 1 fl Complete:RQU/4/ffffffff rc 0/-1 job:'lfs.0'
            
            bobijam Zhenyu Xu added a comment - - edited Could be an interop issue between 2.15.0 client and 2.14.0 MDS on mirror split operation. test_0b: lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16) error lfs mirror delete: cannot split '/mnt/lustre/d0b.sanity-flr/f0b.sanity-flr': Device or resource busy test_50A: lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16) error lfs mirror split: cannot split '/mnt/lustre/d50A.sanity-flr/f50A.sanity-flr': Device or resource busy test_60b: lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16) error lfs mirror split: cannot split '/mnt/lustre/d60b.sanity-flr/f60b.sanity-flr': Device or resource busy test_61b: lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16) error lfs mirror split: cannot split '/mnt/lustre/d61b.sanity-flr/f61b.sanity-flr': Device or resource busy test_203: lfs mirror mirror: cannot get UNLOCK lease, ext 8: Device or resource busy (16) error lfs mirror delete: cannot split '/mnt/lustre/f203.sanity-flr': Device or resource busy take test_60b as an example: on MDS, it received the file close intent for mirror split request, and finished it successfully 00000100:00100000:1.0:1645359656.359528:0:991126:0:(service.c:2142:ptlrpc_server_handle_req_in()) got req x1725282051462784 00000100:00100000:1.0:1645359656.359542:0:991126:0:(service.c:2299:ptlrpc_server_handle_request()) Handling RPC req@0000000030f1e911 pname:cluuid+ref:pid:xid:nid:opc:job mdt_rdpg00_001:67bf88ef-fd73-4c00-9a58-d65897456138+35:1227937:x1725282051462784:12345-10.240.25.52@tcp:35:lfs.0 ... 00000100:00100000:1.0:1645359656.362030:0:991126:0:(service.c:2348:ptlrpc_server_handle_request()) Handled RPC req@0000000030f1e911 pname:cluuid+ref:pid:xid:nid:opc:job mdt_rdpg00_001:67bf88ef-fd73-4c00-9a58-d65897456138+32:1227937:x1725282051462784:12345-10.240.25.52@tcp:35:lfs.0 Request processed in 2487us (2537us total) trans 85899346091 rc 0/0 while on client, I don't see the failure reason 00000080:00200000:1.0:1645359656.359344:0:1227937:0:(file.c:3896:ll_file_ioctl()) VFS Op:inode=[0x2000090a6:0x4e:0x0](00000000a9e821ea), cmd=c00c66f3 00000002:00100000:1.0:1645359656.359354:0:1227937:0:(mdc_request.c:944:mdc_close()) @@@ matched open req@00000000d7fc9409 x1725282051460736/t85899346086(85899346086) o101->lustre-MDT0000-mdc-ffff913d103c9000@10.240.25.55@tcp:12/10 lens 576/656 e 0 to 0 dl 1645359663 ref 1 fl Complete:RPQU/4/ffffffff rc 0/-1 job:'lfs.0' 00000100:00100000:1.0:1645359656.359372:0:1227937:0:(client.c:742:ptlrpc_reassign_next_xid()) @@@ reassign xid req@000000008e2795d6 x1725282051462784/t0(0) o35->lustre-MDT0000-mdc-ffff913d103c9000@10.240.25.55@tcp:23/10 lens 504/1072 e 0 to 0 dl 0 ref 1 fl New:QU/0/ffffffff rc 0/-1 job:'' 00000100:00100000:1.0:1645359656.359378:0:1227937:0:(client.c:1733:ptlrpc_send_new_req()) Sending RPC req@000000008e2795d6 pname:cluuid:pid:xid:nid:opc:job lfs:67bf88ef-fd73-4c00-9a58-d65897456138:1227937:1725282051462784:10.240.25.55@tcp:35:lfs.0 ... 00000100:00100000:1.0:1645359656.362161:0:1227937:0:(client.c:2210:ptlrpc_check_set()) Completed RPC req@000000008e2795d6 pname:cluuid:pid:xid:nid:opc:job lfs:67bf88ef-fd73-4c00-9a58-d65897456138:1227937:1725282051462784:10.240.25.55@tcp:35:lfs.0 00000002:00100000:1.0:1645359656.362168:0:1227937:0:(mdc_request.c:854:mdc_free_open()) @@@ free open request, rq_replay=0 req@00000000d7fc9409 x1725282051460736/t85899346086(85899346086) o101->lustre-MDT0000-mdc-ffff913d103c9000@10.240.25.55@tcp:12/10 lens 576/656 e 0 to 0 dl 1645359663 ref 1 fl Complete:RQU/4/ffffffff rc 0/-1 job:'lfs.0'

            This is also causing sanity test_272d, test_272e, and test_272f to fail in interop testing.

            adilger Andreas Dilger added a comment - This is also causing sanity test_272d, test_272e, and test_272f to fail in interop testing.

            People

              bobijam Zhenyu Xu
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: